PetroBERT: A Domain Adaptation Language Model for Oil and Gas Applications in Portuguese

Author(s):
Rodrigues, Rafael B. M. ; Privatto, Pedro I. M. ; de Sousa, Gustavo Jose ; Murari, Rafael P. ; Afonso, Luis C. S. ; Papa, Joao P. ; Pedronette, Daniel C. G. ; Guilherme, Ivan R. ; Perrout, Stephan R. ; Riente, Aliel F. ; Pinheiro, V ; Gamallo, P ; Amaro, R ; Scarton, C ; Batista, F ; Silva, D ; Magro, C ; Pinto, H
Total Authors: 18
Document type: Journal article
Source: COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022; v. 13208, 9 pp., 2022-01-01.
Abstract

This work proposes PetroBERT, a BERT-based model adapted to the oil and gas exploration domain in Portuguese. PetroBERT was pre-trained on the Petroles corpus and a private corpus of daily drilling reports, starting from both multilingual BERT and BERTimbau. The proposed model was evaluated on named entity recognition (NER) and sentence classification tasks and achieved interesting results, which show its potential for this domain. To the best of our knowledge, this is the first BERT-based model for the oil and gas context. (AU)

FAPESP's process: 14/12236-1 - AnImaLS: Annotation of Images in Large Scale: what can machines and specialists learn from interaction?
Grantee: Alexandre Xavier Falcão
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 18/15597-6 - Application and investigation of unsupervised learning methods in retrieval and classification tasks
Grantee: Daniel Carlos Guimarães Pedronette
Support Opportunities: Research Grants - Young Investigators Grants - Phase 2
FAPESP's process: 19/07665-4 - Center for Artificial Intelligence
Grantee: Fabio Gagliardi Cozman
Support Opportunities: Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program
FAPESP's process: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry
Grantee: Francisco Louzada Neto
Support Opportunities: Research Grants - Research, Innovation and Dissemination Centers - RIDC