For years, extracting information from document images, especially those containing unstructured tables, handwritten content, and non-standard formatting, has posed a significant challenge in the tech world. Tech giants like Google, Microsoft, and Amazon have developed powerful Optical Character Recognition (OCR) and Natural Language Processing (NLP) systems, but most are optimized primarily for English and other widely used languages. These international systems often achieve low accuracy on Vietnamese, with its rich system of diacritics and multilayered phonetic structure, particularly when recognizing handwriting.
The research project titled "Development of Technology for Extracting Information from Document Images with Diverse Layouts, Tables, and Vietnamese Handwriting", led by Viettel’s Data Services and Artificial Intelligence Center (Viettel AI), received the VIFOTEC Consolation Prize in 2024. VIFOTEC is a national award jointly organized by the Vietnam Union of Science and Technology Associations, the Ministry of Science and Technology, the Vietnam General Confederation of Labor, and the Central Committee of the Ho Chi Minh Communist Youth Union. The research is one of the first core technologies for extracting information from Vietnamese texts to be developed and fully owned in Vietnam.
This technology can accurately recognize and process document images with complex structure, including unstructured tables, non-standard formats, and notably, Vietnamese handwriting. The recognition accuracy reaches up to 99% for printed text and 90% for handwriting. For chart and table recognition, Viettel AI’s technology achieves 98.6% accuracy, comparable to major players like Microsoft, Alibaba, Tencent, or IFLYTEK, while offering faster processing speeds. Rather than developing isolated products, Viettel AI has chosen a unique path by building an open-architecture technology platform. This allows flexible module integration and customization to suit various specific problem scenarios.
Based on this platform, Viettel AI has developed numerous practical solutions for real-world use. A standout example is Viettel IPA (Intelligent Process Automation), which is widely deployed across sectors like insurance, banking, and administrative services.
Viettel IPA enables the recognition of handwritten text and automatic information extraction from identification documents, invoices, contracts, and more. It can classify different types of documents and automate approval and verification processes, significantly optimizing paperwork handling in organizations. Notably, the product helps digitize millions of records and saves up to 80% of time and effort for businesses and institutions in their documentation workflows. It also supports document monitoring in cyberspace, identity verification, and automation of administrative text processing.
Viettel AI’s text recognition technology has previously received multiple prestigious domestic and international awards, including the IT World Awards, Make in Vietnam, and Vietnam Digital Awards. The project has also been presented at three leading global conferences in computer vision and AI: ICDAR (International Conference on Document Analysis and Recognition), DICTA (International Conference on Digital Image Computing: Techniques and Applications), and IEEE NICS (NAFOSTED Conference on Information and Computer Science). A significant component of the project, the fixed-form template processing module, has been granted a patent by the Vietnam Intellectual Property Office.
The Vietnam Science and Technology Innovation Awards (VIFOTEC) is a national award that honors projects with novelty, creativity, effectiveness, and high applicability. It aims to recognize contributions to the development of science, technology, and the socio-economic growth of the country.
Viettel AI, a unit under Viettel Group, is a pioneer in mastering and developing products and services in AI, Big Data, Robotics, and Digital Twin technologies. Today, the Viettel AI ecosystem includes a wide range of leading products in Vietnam, trusted by many large domestic and international organizations.