
#PraCegoVer: automatic image audio description

Grant number: 19/24041-4
Support type: Scholarships in Brazil - Scientific Initiation
Effective date (Start): January 01, 2020
Effective date (End): December 31, 2021
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal researcher: Sandra Eliza Fontes de Avila
Grantee: Gabriel Oliveira dos Santos
Home Institution: Instituto de Computação (IC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil
Associated research grant: 13/08293-7 - CCES - Center for Computational Engineering and Sciences, AP.CEPID

Abstract

The Internet has become increasingly accessible, reaching a wide range of audiences. However, little progress has been made on the inclusion of people with disabilities, and the situation is even worse when it comes specifically to visual impairment, since much of the published content is purely visual (e.g., photos, advertising images). Every day, visually impaired people have their right of access to the Internet violated, a right guaranteed by Article 4, item I, of the Marco Civil da Internet. Thus, automatically describing the content of images with well-formed sentences is an essential task for the inclusion of visually impaired people on the Internet. Producing this kind of description, known as image captioning, is still a challenge, however. Image captioning aims to describe not only the objects contained in an image but also the semantic relationships between them; hence, in addition to visual interpretation methods, linguistic models are needed to express those relationships. Recently, images have been posted on social networks with the hashtag #PraCegoVer, accompanied by a brief description of their visual content. The PraCegoVer project, started in 2012, aims to disseminate a culture of accessibility on social networks through the audio description of images (the translation of visual content into text, following accessibility criteria) for the benefit of visually impaired people. Inspired by this movement, this Scientific Initiation project aims to investigate Machine Learning techniques for image captioning for audio description.
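To make the captioning pipeline concrete: typical neural captioners encode the image into a feature vector and then decode a sentence one word at a time, greedily picking the most likely next token. The sketch below illustrates only that greedy decoding loop, in pure Python with toy weights and a toy Portuguese vocabulary; the vocabulary, weight shapes, and feature vector are illustrative assumptions, not the project's actual architecture.

```python
import random

# Toy vocabulary (illustrative assumption, not the project's real vocabulary).
VOCAB = ["<start>", "<end>", "uma", "pessoa", "caminha", "na", "praia"]

def linear(vec, weights, bias):
    """Dense layer: `weights` is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def greedy_caption(image_features, w_step, b_step, max_len=10):
    """Greedily emit the highest-scoring word until <end> (or max_len)."""
    state = image_features[:]  # the encoder's feature vector seeds the state
    caption = []
    token = "<start>"
    for _ in range(max_len):
        # Combine the current token (one-hot) with the running state.
        one_hot = [1.0 if t == token else 0.0 for t in VOCAB]
        state = linear(one_hot + state, w_step, b_step)
        # Each state unit scores one vocabulary entry; take the argmax.
        token = VOCAB[max(range(len(VOCAB)), key=state.__getitem__)]
        if token == "<end>":
            break
        caption.append(token)
    return caption

# Toy usage with random weights (a real model would learn these).
random.seed(0)
feat = [random.uniform(-1, 1) for _ in range(len(VOCAB))]
w = [[random.uniform(-1, 1) for _ in range(2 * len(VOCAB))]
     for _ in range(len(VOCAB))]
b = [0.0] * len(VOCAB)
print(greedy_caption(feat, w, b))
```

A trained model would replace the random weights with learned CNN/RNN parameters, but the decoding loop keeps this shape.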
The main objectives are: 1) to build a multimodal database composed of images and their audio descriptions annotated in Portuguese (most existing databases target the English language); 2) to study and propose improvements to neural network architectures for image captioning; 3) to create a model for automatic audio description generation, which will be validated with an expert in the field. As a result, we hope to help include visually impaired people on the Internet, making it more accessible. (AU)
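Objective 1 calls for a multimodal database pairing each image with its Portuguese description. One possible record layout is sketched below; the field names and example values are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class CaptionedImage:
    """One dataset entry: an image plus its #PraCegoVer description.
    Field names are hypothetical, not the project's real schema."""
    image_path: str           # local path or URL of the image file
    description: str          # audio description text in Portuguese
    source_post: str = ""     # identifier of the original social-media post
    tags: list = field(default_factory=list)  # extra hashtags, if any

# Toy entry with made-up values.
entry = CaptionedImage(
    image_path="images/0001.jpg",
    description="Foto de uma praia ao pôr do sol.",
)
print(asdict(entry)["description"])
```

Storing entries as plain records like this keeps the database easy to serialize (e.g., to JSON) and to pair with the captioning models of objectives 2 and 3.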