
EMU Collection Infrastructure: acquisition of computing infrastructure for a Web Portal to Linguistic Resources and Data Related to Research in Artificial Intelligence

Grant number: 22/11254-2
Support Opportunities: Research Infrastructure Program - Collections
Duration: June 01, 2023 - May 31, 2026
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Marcelo Finger
Grantee: Marcelo Finger
Host Institution: Centro de Inovação da USP (INOVA). Universidade de São Paulo (USP). São Paulo, SP, Brazil
Associated researchers: Fabio Gagliardi Cozman


This proposal falls within the scope of the 2022 call for Multiuser Equipment for Information Depository Centers, Document Collections and/or Historiographic and Biological Collections, in the category Support for Research Infrastructure of Archives and Document Collections. Specifically, it seeks the infrastructure needed to build a portal that makes publicly available the various resources used and generated by artificial intelligence research in the associated projects and by their partners. These projects produce digital linguistic data for Brazilian Portuguese, comprising both corpora (collections) of raw text and of text with morphosyntactic, syntactic and semantic annotations, and digital audio recordings of Brazilian Portuguese speakers. They also generate databases on specific areas of interest, such as information on the Brazilian coast (the Blue Amazon) and on food production networks, as well as various computer programs that use Big Data and Deep Learning processing techniques. Three major categories of data produced by the center should be considered. First, text corpora: large collections of texts, on the order of billions of words, with or without morphosyntactic, syntactic and semantic annotations. Second, audio corpora: transcribed recordings of Brazilian Portuguese collected over the last 50 years. Third, structured and semi-structured databases containing Big Data on the C4AI study areas: georeferenced databases and ocean information (Amazônia Azul); databases of food production and agriculture networks; medical information databases for stroke diagnosis and recovery; and databases to inform public policy on artificial intelligence and the future of work. (AU)
