| Author(s): | dos Santos, Jose Antonio; Souza, Ellen; Bastos Filho, Carmelo J. A.; Albuquerque, Hidelberg O.; Vitorio, Douglas; Gouveia de Lucena, Danilo Carlos; Silva, Nadia; de Carvalho, Andre |
| Total number of authors: | 8 |
| Document type: | Scientific article |
| Source: | PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2024, PT I; v. 14967, 12 pp., 2025-01-01. |
| Abstract | |
The use of Transformers for text processing has attracted a great deal of attention in recent years. This is particularly true for sentence models, which have a high capacity to comprehend and generate text contextually, improving predictive performance in different Natural Language Processing tasks compared with previous approaches. Even so, several challenges remain when they are applied to long documents, especially in knowledge areas with very specific characteristics, such as legislative proposals. This study investigated different strategies for using BERT-based models in the retrieval of long documents written in Brazilian Portuguese. We used three corpora from the Brazilian Chamber of Deputies to build a dataset and assess the models, incorporating zero-shot and fine-tuning strategies. Five sentence models were evaluated: BERTimbau, LegalBert, LegalBert-pt, LegalBERTimbau, and LaBSE. We also assessed a summarized corpus of bills, given the input size limitation of the sentence models. Finally, we propose a hybrid model, named HIRS, that combines BM25 with a fine-tuned BERTimbau. According to the experimental results, HIRS outperformed all the other models, reaching a Recall of 84.78% for 20 retrieved documents. (AU)
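The abstract describes HIRS as a hybrid of BM25 and a fine-tuned BERTimbau sentence model, evaluated by Recall over the top 20 retrieved documents. The sketch below illustrates one common way to combine a lexical BM25 score with a dense sentence-embedding score; it is not the authors' HIRS implementation, and the toy corpus, the model choice (LaBSE, one of the models listed in the abstract), the min-max normalization, and the weight `alpha` are assumptions made purely for illustration.

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

# Hypothetical toy corpus standing in for bill texts from the Chamber of Deputies.
documents = [
    "Projeto de lei que dispõe sobre incentivos fiscais para energia solar.",
    "Proposta que altera o código de trânsito brasileiro.",
    "Projeto que institui programa de apoio à agricultura familiar.",
]
query = "incentivos para geração de energia solar"

# Lexical signal: BM25 over whitespace-tokenized, lowercased text.
tokenized_docs = [d.lower().split() for d in documents]
bm25 = BM25Okapi(tokenized_docs)
bm25_scores = np.asarray(bm25.get_scores(query.lower().split()))

# Dense signal: cosine similarity of sentence embeddings.
# LaBSE is used here only as an example of a sentence model from the abstract.
model = SentenceTransformer("sentence-transformers/LaBSE")
doc_emb = model.encode(documents, convert_to_numpy=True, normalize_embeddings=True)
query_emb = model.encode(query, convert_to_numpy=True, normalize_embeddings=True)
dense_scores = doc_emb @ query_emb  # cosine similarity, since embeddings are normalized

# Hybrid score: min-max normalize each signal and take a weighted sum.
def minmax(x):
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

alpha = 0.5  # assumed weight; the paper's actual combination scheme may differ
hybrid = alpha * minmax(bm25_scores) + (1 - alpha) * minmax(dense_scores)

# Rank documents and keep the top 20, matching the Recall@20 evaluation in the abstract.
top20 = np.argsort(-hybrid)[:20]
print([documents[i] for i in top20])
```

Recall@20 would then be computed as the fraction of relevant documents for the query that appear among `top20`; the 84.78% figure reported in the abstract refers to the authors' own dataset and fine-tuned models, not to this sketch.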
| FAPESP Process: | 13/07375-0 - CeMEAI - Centro de Ciências Matemáticas Aplicadas à Indústria |
| Grantee: | Francisco Louzada Neto |
| Support type: | Research Grants - Research, Innovation and Dissemination Centers - RIDCs (CEPIDs) |