HUB FUNDOS: computational platform for the standardization of information in the investment fund industry

Grant number: 23/10398-3
Support Opportunities: Research Grants - Innovative Research in Small Business - PIPE
Duration: April 01, 2024 - December 31, 2024
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Aloisio Mota Rodrigues Junior
Grantee: Aloisio Mota Rodrigues Junior
Host Company: RTM Infraestrutura em Tecnologia da Informação Eireli
CNAE: Custom computer software development
Information technology consulting
Data processing, application service providers and internet hosting services
City: São Paulo
Associated researchers: Luiz Guilherme Miguel Jucá
Associated scholarship(s): 24/03872-3 - Testing and Implementation of the NER-BERT Algorithm in Compliance with ISO 20022 Standard, BP.TT
24/01554-4 - HUB FUNDOS - Computational platform for the standardization of information in the investment fund industry, BP.TT


Over the last ten years, the investment fund industry in Brazil has grown continuously at double-digit annual rates. Despite this constant growth, the industry has not yet entered the digital age: paper, telephone, e-mail and manual tasks dominate the daily routines of institutions and quota holders (investors holding fund shares), whether individuals or companies. In addition to quota holders, the fund industry is a complex ecosystem composed basically of managers, distributors, trustees, custodians, liability controllers, asset controllers, auditors, regulators and self-regulators, and clearing and settlement chambers. With so many actors in the ecosystem of this important industry, fluid management of the information relevant to its operation becomes essential. In this context, one of the main challenges is to establish effective and efficient communication between the different participants in the business processes involved in managing investment funds, so the standardization of all available documentation, and consequently of the information exchanged between all players, constitutes a necessary and crucial line of research. Much of the information from the different sources is known to be in an unstructured format, which makes the task of reading, identifying and subsequently registering the data of interest even more onerous. Many terms and different business domains from the investment fund industry must be evaluated and structured according to ISO 20022, an international standard for exchanging messages in the financial sector. This standard defines a universal language for financial communications, facilitating interoperability between different systems, improving transaction efficiency and providing benefits to the global financial industry.
A computational tool capable of incorporating resources for standardizing such information in line with this messaging standard would therefore be a differentiator in this niche market, especially for the Portuguese language. To that end, this scientific and technological research project proposes the development of a computational algorithm based on a Natural Language Processing (NLP) technique called Named Entity Recognition (NER) and on a pre-trained deep learning model (BERT), in order to automatically extract the different terms and expressions found in messages collected from different sources in the fund industry. The steps foreseen in the project include collecting data from documents whose texts are already digitized, as well as texts extracted from printed documents and information transcribed from audio obtained from meetings and phone calls, for example. After the construction of a single text data repository (textual corpus), the project foresees at least three further steps, whose purpose will be to investigate the best approaches for pre-processing and cleaning the dataset and, in particular, to develop a natural language model of the NER-BERT type. The NER model must be trained and validated so that the main elements identified in the exchanged messages are labeled and standardized in accordance with the recommendations of the ISO 20022 standard. (AU)
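
The final labeling step can be illustrated with a minimal sketch, assuming the NER model emits one BIO tag per token (a common convention for token classification, not something the abstract specifies). The label set, the `LABEL_TO_FIELD` mapping, its field names and the sample message are all hypothetical and invented for illustration; they are not the project's entity types or actual ISO 20022 element names.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from NER labels to standardized field names.
# These names are placeholders, not real ISO 20022 message elements.
LABEL_TO_FIELD: Dict[str, str] = {
    "FUND": "FundName",
    "AMOUNT": "Amount",
    "DATE": "Date",
}


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (label, text) entity spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # Start a new entity; a stray I- tag is treated as a beginning.
            if current_label is not None:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continue the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag: close any open entity.
            if current_label is not None:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        entities.append((current_label, " ".join(current_tokens)))
    return entities


def standardize(entities: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Place recognized entities under their (hypothetical) standard fields."""
    record: Dict[str, List[str]] = {}
    for label, text in entities:
        field = LABEL_TO_FIELD.get(label)
        if field is not None:  # labels without a mapping are skipped
            record.setdefault(field, []).append(text)
    return record


# Invented sample message with hand-written tags standing in for model output.
tokens = ["Resgate", "de", "R$", "10.000,00", "do", "Fundo", "Alfa", "FIA",
          "em", "05/04/2024"]
tags = ["O", "O", "B-AMOUNT", "I-AMOUNT", "O", "B-FUND", "I-FUND", "I-FUND",
        "O", "B-DATE"]
record = standardize(decode_bio(tokens, tags))
print(record)
```

In a real pipeline the `tags` list would come from a fine-tuned BERT token classifier rather than being written by hand, and the mapping would be derived from the ISO 20022 message definitions relevant to investment funds.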
