Research and Innovation: System for robust reading of text in images using deep learning

Abstract

There is growing interest in automatically extracting and reading text in images, because these techniques enable many commercial applications, such as reading text in web images and videos, identifying text captured in surveillance video, in maps and in engineering, and recognizing container serial numbers, vehicle license plates, signboards, price plates, text on packaging, and various announcements, among others. The term Robust Reading denotes the research area concerned with interpreting written communication in unconstrained environments. The strong interest in this area is evidenced by the growth in competitions and challenges under the Robust Reading Competition, organized by the International Conference on Document Analysis and Recognition (ICDAR / IAPR), which went from a single competition in 2003 to nine in 2017. The scientific community has recognized the importance of detecting, localizing, and classifying characters and words in images since the 1980s, and the methodology has evolved progressively through text segmentation techniques based on features such as shape, color, and texture, together with traditional pattern classification methods. Although this is a long-standing problem, the recent increase in competitions dedicated to developing more effective solutions shows that there is a market need for applications that achieve error rates near or below human ones. Since 2012, Deep Learning methods have won every large-scale image recognition competition held by ImageNet, achieving in 2015 error rates lower than those of humans and leaving traditional image processing and pattern recognition techniques practically obsolete. In the Robust Reading competitions, as of 2017 all winning techniques use Deep Learning.
However, the error rates in these competitions are still far from the human rate: for example, the best accuracy obtained in 2017 for simultaneous text localization and recognition was 44%. There is therefore significant room for improvement, which is likely to occur over the next three years. Considering this market opportunity for technologies that substantially reduce error rates in Robust Reading, this project aims to analyze and test the various architectures used by the Deep Learning techniques that won the Robust Reading 2017 competitions, in order to design a new architecture that is competitive with state-of-the-art Robust Reading systems. This new architecture will be evaluated and tested with data from the Robust Reading Competition and used to develop prototypes of two initial applications: automatic fuel price reading on gas station panels and advanced reading of vehicle license plates. The complete development of these applications, as well as others for the areas of logistics and security, will be the target of a Phase 2 PIPE project. The development of state-of-the-art Deep Learning techniques in Robust Reading is expected to form the core competency of NeuralMind, positioning it as a globally competitive artificial intelligence company in computer vision and text processing. (AU)

Grant number:	18/01188-7
Support Opportunities:	Research Grants - Innovative Research in Small Business - PIPE
Start date:	October 01, 2018
End date:	June 30, 2019
Field of knowledge:	Engineering - Electrical Engineering

Principal Investigator:	Roberto de Alencar Lotufo
Grantee:	Roberto de Alencar Lotufo

Company:	Neuralmind Inteligência Artificial Ltda
CNAE:	Development and licensing of customizable computer programs; Market research and public opinion polling
City:	Campinas

Associated research grant(s):	19/06667-3 - Robust reading of documents using deep learning, AP.PIPE
Associated scholarship(s):	18/21707-9 - Robust reading system applied for license plate and fuel prices, BP.TT
