Viettel OCR - the solution to "automating" data

31/05/2022

The story of OCR and the hidden power of a "big data" processing technology.


In the digital world, there is a term called "dark data": unstructured data that cannot be used until it has been processed, analyzed, and organized. As data grows exponentially, so does the volume of "dark data". This requires organizations to be ready with solutions for handling big and very big data. Data plays an especially important role, but exploiting and optimizing it so that it becomes a valuable asset is not simple.


In the last 3 years, OCR document digitization has become attractive to many businesses because of its data processing power. But few know that as early as the beginning of the twentieth century, physicist Emanuel Goldberg developed a machine that read characters and converted them into standard telegraph code. He later developed what he called a "Statistical Machine" for searching microfilm archives using an optical code recognition system.


In 1931, he was granted US Patent No. 1,838,389 for this invention, which was later acquired by IBM. Around the same time as Goldberg's early work, Edmund Fournier d'Albe developed the Optophone, a hand-held scanner that, as it moved over a printed page, produced tones corresponding to specific letters or characters.


This laid the first foundation for automating recordkeeping, bringing "dark data" to light by generating structured data (SQL tables) from unstructured information (text, tables, images...) and integrating that data with an existing structured database. IBM quickly acquired Goldberg's patent and continued research and development. By 2002, being able to use OCR directly on mobile phones and desktops via cloud computing was considered a turning point.
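To make the idea of turning unstructured text into a SQL table concrete, here is a minimal Python sketch. It is only an illustration: the sample text, the field labels, the `customers` table, and the helper names `extract_fields` and `store_record` are all hypothetical, and it does not describe how any particular OCR product stores its results.

```python
import re
import sqlite3

# Hypothetical raw text, as an OCR engine might return it for an ID card.
RAW_TEXT = """
Full name: NGUYEN VAN A
ID number: 012345678901
Date of birth: 01/02/1990
"""

# Hypothetical field labels mapped to database column names.
FIELD_PATTERNS = {
    "full_name": re.compile(r"Full name:\s*(.+)"),
    "id_number": re.compile(r"ID number:\s*(\d+)"),
    "date_of_birth": re.compile(r"Date of birth:\s*(\d{2}/\d{2}/\d{4})"),
}


def extract_fields(text):
    """Return a dict of column name -> value (None when a field is not found)."""
    record = {}
    for column, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[column] = match.group(1).strip() if match else None
    return record


def store_record(conn, record):
    """Insert one extracted record into an illustrative `customers` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(full_name TEXT, id_number TEXT, date_of_birth TEXT)"
    )
    conn.execute(
        "INSERT INTO customers VALUES (:full_name, :id_number, :date_of_birth)",
        record,
    )
    conn.commit()


# An in-memory database stands in for the "existing structured database".
conn = sqlite3.connect(":memory:")
store_record(conn, extract_fields(RAW_TEXT))
rows = conn.execute("SELECT full_name, id_number FROM customers").fetchall()
print(rows)
```

Once the fields are in a table, they can be queried and joined like any other structured data, which is the sense in which "dark data" is brought to light.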


In Vietnam, although OCR was adopted later, it has so far achieved results in Vietnamese language processing comparable to those achieved elsewhere in the world (large technology companies worldwide tend to focus on English-language processing).


In 2020, under the provisions of Circular 23/2019/TT-NHNN, e-wallet and intermediary payment services were required to authenticate user accounts via ID card. This requirement, together with the State Bank's regulations on opening accounts, drove businesses to quickly apply OCR to extract information, automate data entry, and review information. Great demand, an expanding market, and government policies promoting digital transformation led Viettel Cyber Center to focus on researching and packaging an OCR solution that combines the following technologies:


- Optical character recognition (OCR) technology, which recognizes documents in PDF, image, and paper form;

- Natural language processing (NLP) technology, which automatically corrects information to ensure high semantic accuracy.
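The correction step in the list above can be pictured with a small Python sketch. This is an assumption-laden stand-in, not Viettel OCR's actual NLP component: it uses simple fuzzy string matching from Python's standard `difflib` module in place of a language model, and the vocabulary `KNOWN_PROVINCES` and the helper `correct_field` are hypothetical names chosen for illustration.

```python
import difflib

# Hypothetical vocabulary of values expected in one field (here, province names).
KNOWN_PROVINCES = ["Ha Noi", "Da Nang", "Hai Phong", "Can Tho", "Quang Ninh"]


def correct_field(ocr_value, vocabulary, cutoff=0.8):
    """Return the closest vocabulary entry, or the input unchanged if none is close enough."""
    matches = difflib.get_close_matches(ocr_value, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_value


# Typical OCR confusions: the digit 0 read in place of "o", "c" in place of "o".
print(correct_field("Ha N0i", KNOWN_PROVINCES))     # Ha Noi
print(correct_field("Hai Phcng", KNOWN_PROVINCES))  # Hai Phong
print(correct_field("Unknown", KNOWN_PROVINCES))    # Unknown (left as-is)
```

The `cutoff` threshold controls how aggressive the correction is. A higher value leaves more OCR output untouched, while a lower value risks "correcting" a valid value into the wrong vocabulary entry.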


The strength of Viettel OCR also comes from deep learning technology, which delivers recognition accuracy of over 99% for printed text, over 90% for handwriting, and up to 98% for information extraction, outperforming other developers in the same market by 4-5%.


