Robust reading of documents using deep learning

Grant number: 19/06667-3
Support type: Research Grants - Innovative Research in Small Business - PIPE
Duration: March 01, 2020 - February 28, 2022
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Roberto de Alencar Lotufo
Grantee: Roberto de Alencar Lotufo
Company:Neuralmind Inteligência Artificial Ltda
CNAE: Desenvolvimento e licenciamento de programas de computador customizáveis (development and licensing of customizable computer programs)
City: Campinas
Assoc. researchers: Rodrigo Frassetto Nogueira; Rubens Campos Machado
Associated research grant: 18/01188-7 - System for robust reading of text in images using deep learning, AP.PIPE
Associated scholarship(s): 20/04814-6 - Generation and Synthesis of form-like documents using Generative Adversarial Networks, BP.TT
20/04829-3 - Designing End-to-End Architectures for Robust Reading of Registration Documents, BP.TT


The automated reading of text in images is attracting growing interest because it enables a large number of commercial applications, such as automatic registration based on relevant information extracted from cadastral documents, instant queries in inspection services, semi-supervised checking and structuring in accountability, surveillance assistance through the recognition of vehicle and traffic signs, and recognition of serial numbers on containers and product packaging, among others. Robust Reading is the research area that studies written communication in unconstrained environments. The growing interest in this area is evidenced by the recent increase in the number of challenges in the Robust Reading Competition, one of several competitions held during the International Conference on Document Analysis and Recognition (ICDAR). In its seventh edition, the 2019 ICDAR Robust Reading Competition comprises six challenges, five of which were introduced in that edition. One of these, Scanned Receipts OCR and Information Extraction, is of particular interest to us. Character localization and classification, as well as word recognition in images, have been recognized as relevant topics by the scientific community since the 1980s. Although these problems have been studied for a long time, there is strong market demand for applications that achieve error rates lower than those of humans. This demand is corroborated by the recent increase in competitions dedicated to developing more efficient and effective technological solutions for this purpose, and by the involvement of high-technology companies in research and development in the field. Since 2012, deep learning methods have won all the challenges of the large-scale image recognition competition (ImageNet), reaching error rates below human error rates in 2015 and rendering traditional image processing and recognition techniques almost obsolete.
Only in 2017 did deep learning become part of all the winning solutions of the Robust Reading Competition. Even so, the best error rate for simultaneous text localization and recognition was 44%, still far from human performance. There is therefore significant room for improvement, which is likely to be realized within the next two years. Considering this market opportunity for technologies that significantly reduce error rates in Robust Reading, and the experience our team has gained in analyzing, implementing, prototyping and validating recently proposed approaches, this project aims to extend Phase 1 by designing a novel end-to-end architecture that is competitive with the current state of the art in robust reading. In Phase 2, we aim to expand the research team by building a strong core dedicated to developing the NeuralMind OCR algorithm: a model that is effective and reliable, recognizing cadastral data with the lowest possible error; efficient, requiring a minimal number of parameters so that it can run on mobile devices; robust to noise, i.e., able to operate under different lighting and document-preservation conditions; and scalable, accommodating multiple document types without reconfiguration. In addition, by combining all these attributes, we aim not only to detect and recognize text in cadastral documents, but also to infer, in a single pass, semantic information that allows the category of each field to be classified, resulting in a unified approach for any type of cadastral document. This project will position NeuralMind as a globally competitive company in the area of document digitization, recognition and reading based on artificial intelligence. (AU)

Articles published in other media outlets (2 total):
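UOL: Startups brasileiras usam inteligência artificial para diagnosticar Covid [Brazilian startups use artificial intelligence to diagnose Covid] (25/Aug/2020)
ResumoCast: Startups brasileiras usam inteligência artificial para diagnosticar Covid [Brazilian startups use artificial intelligence to diagnose Covid] – 25/08/2020 (25/Aug/2020)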