Research and Innovation: Robust reading of documents using deep learning

Abstract

The automated reading of text in images has attracted growing interest because it enables a large number of commercial applications, such as automatic registration based on relevant information from cadastral documents, instant queries in inspection services, semi-supervised checking and structuring in accounting, assistance in surveillance through vehicle and traffic-sign recognition, and recognition of serial numbers on containers and product packaging, among others. Robust Reading is the research area that explores written communication in unconstrained environments. The growing interest in this area is evidenced by the recent increase in the number of challenges in the Robust Reading Competition, one of several competitions held during the International Conference on Document Analysis and Recognition (ICDAR). In its seventh edition, the 2019 ICDAR Robust Reading Competition comprises six challenges, five of which were introduced in that edition. One of these challenges, Scanned Receipts OCR and Information Extraction, is of particular interest to us. Character localization and classification, as well as word recognition in images, are topics whose relevance the scientific community has recognized since the 1980s. Although these problems have been investigated for a long time, there is still strong market demand for applications that achieve error rates lower than those of humans. This is corroborated by the recent increase in competitions dedicated to developing more efficient and effective technological solutions for this purpose, and by the involvement of high-technology companies in research and development in the field. Starting in 2012, deep learning methods began winning all the challenges of the large-scale image recognition competition (ImageNet), reaching error rates below human level in 2015 and rendering traditional image processing and recognition techniques nearly obsolete.
Only in 2017 did deep learning become part of all the winning solutions of the Robust Reading Competition. Even so, the best error rate for simultaneous text localization and recognition was 44%, still far from human performance. We therefore conclude that there is significant room for improvement, which will probably be realized within the next two years.

Given this market opportunity for technologies that significantly reduce error rates in Robust Reading, and the experience our team has gained in analyzing, implementing, prototyping and validating recently proposed approaches, this project aims to extend Phase 1 by designing a novel end-to-end architecture that is competitive with the current state of the art in robust reading. In Phase 2, we aim to expand the research team by building a strong core group to develop the NeuralMind OCR algorithm: a model that is effective and reliable, recognizing cadastral data with the lowest possible error; efficient, requiring a minimal number of parameters so that it can run on mobile devices; robust to noise, i.e., able to operate under different lighting and document-preservation conditions; and scalable, accommodating multiple document types without reconfiguration. Combining all these attributes, we aim not only to detect and recognize text in cadastral documents but also to infer, in a single pass, semantic information that allows the category of each field to be classified, resulting in a unified approach for any type of cadastral document. This project will position NeuralMind as a globally competitive company in document digitization, recognition and reading based on artificial intelligence. (AU)

Grant number:	19/06667-3
Support Opportunities:	Research Grants - Innovative Research in Small Business - PIPE
Start date:	March 01, 2020
End date:	April 30, 2022
Field of knowledge:	Physical Sciences and Mathematics - Computer Science - Computer Systems
Agreement:	FINEP - PIPE/PAPPE Grant

Principal Investigator:	Roberto de Alencar Lotufo
Grantee:	Roberto de Alencar Lotufo


Company:	Neuralmind Inteligência Artificial Ltda
CNAE:	Development and licensing of customizable computer software

Associated researchers:	Rodrigo Frassetto Nogueira; Rubens Campos Machado

Associated research grant:	18/01188-7 - System for robust reading of text in images using deep learning, AP.PIPE
Associated scholarship(s):
21/13144-7 - Generation of appropriate synthetic data for the training of robust reading models, BP.TT
21/02589-8 - Generation of appropriate synthetic data for the training of robust reading models, BP.TT
20/14120-1 - Capture and pre-processing of images contained several cadastral documents, BP.TT
20/04829-3 - Designing End-to-End Architectures for Robust Reading of Registration Documents, BP.TT
20/04814-6 - Generation and Synthesis of form-like documents using Generative Adversarial Networks, BP.TT
