JurisBERT: A New Approach that Converts a Classification Corpus into an STS One

Author(s):
Viegas, Charles F. O.; Costa, Bruno C.; Ishii, Renato P.
Total number of authors: 3
Document type: Scientific article
Source: COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2023, PT I; v. 13956, 17 pp., 2023-01-01.
Abstract

In this work we propose a new approach that transforms a classification corpus into an STS (Semantic Textual Similarity) one. We use BERT (Bidirectional Encoder Representations from Transformers) to validate our hypothesis, i.e., that a multi-level classification dataset can be converted into an STS dataset, which improves the fine-tuning step and demonstrates the value of the proposed corpus. In addition, we trained a BERT model from scratch on legal texts, called JurisBERT, which shows a considerable improvement in speed and precision while requiring fewer computational resources than other approaches. JurisBERT relies on the concept of a sub-language, i.e., a model pre-trained in a language (Brazilian Portuguese) is refined (fine-tuned) to better serve a specific domain, in our case the legal field. JurisBERT uses 24k pairs of ementas (appellate decision summaries) with degrees of similarity ranging from 0 to 3. We obtained this data from search mechanisms available on court websites in order to validate the model with real-world data. Our experiments showed that JurisBERT outperforms other models such as multilingual BERT and BERTimbau, with 3.30% better precision (F1), 5 times shorter training time, and accessible hardware, i.e., a low-cost GPGPU architecture. (AU)
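As a rough illustration of the conversion the abstract describes, the sketch below (Python) pairs documents from a multi-level classification corpus and assigns each pair a similarity degree from 0 to 3. The record structure and the scoring rule (shared class-path depth, capped at 3) are assumptions for illustration; the paper's exact mapping is not given in this record.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical record: the text of an ementa plus its multi-level class path,
# e.g. ("...", ["Civil", "Contratos", "Rescisão"]).
Record = Tuple[str, List[str]]

def shared_prefix_depth(a: List[str], b: List[str]) -> int:
    """Count how many leading class levels two label paths share."""
    depth = 0
    for la, lb in zip(a, b):
        if la != lb:
            break
        depth += 1
    return depth

def classification_to_sts(records: List[Record], max_degree: int = 3):
    """Turn a multi-level classification corpus into (text1, text2, score) pairs.

    Assumption: the similarity degree is the depth of the shared class prefix,
    capped at max_degree, mirroring the 0-3 similarity scale in the abstract.
    """
    pairs = []
    for (t1, p1), (t2, p2) in combinations(records, 2):
        score = min(shared_prefix_depth(p1, p2), max_degree)
        pairs.append((t1, t2, score))
    return pairs

# Toy usage with made-up data:
corpus = [
    ("Ementa A ...", ["Civil", "Contratos", "Rescisão"]),
    ("Ementa B ...", ["Civil", "Contratos", "Multa"]),
    ("Ementa C ...", ["Penal", "Crimes", "Furto"]),
]
for t1, t2, s in classification_to_sts(corpus):
    print(s, t1, "|", t2)
```

The resulting (text1, text2, score) triples could then feed a standard STS fine-tuning setup, with the 0-3 degrees normalized as the training target.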

FAPESP Process: 15/24485-9 - Future Internet applied to smart cities
Grantee: Fabio Kon
Support type: Research Grant - Thematic
FAPESP Process: 14/50937-1 - INCT 2014: on the Future Internet
Grantee: Fabio Kon
Support type: Research Grant - Thematic