
Semantic driven automated post-editing for Brazilian Portuguese

Grant number: 16/21317-0
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: March 01, 2017
End date: December 31, 2018
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Helena de Medeiros Caseli
Grantee: Marcio Lima Inácio
Host Institution: Centro de Ciências Exatas e de Tecnologia (CCET). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil

Abstract

Machine Translation (MT) is one of the most important applications (and subfields) of Natural Language Processing (NLP). Given a text in a source language, an MT system generates an equivalent version in a target language. After more than 70 years of research, during which various approaches have been proposed and implemented (such as rule-based MT, phrase-based statistical MT and neural MT), the field has not yet achieved the ambitious goal set at its inception: fully automatic, good-quality translation for unrestricted domains. As a rule, therefore, automatic translations still have to be post-edited by humans to become accurate and fluent in the target language. Manual post-editing, however, is an arduous process that requires specialized effort. In this context, several proposals for automated post-editing have emerged in recent years. This project aims to investigate automated post-editing based on semantic knowledge. One of the most traditional ways of representing textual semantics relies on the distributional hypothesis, which characterizes a word by the contexts in which it occurs. This contextual information can be captured by distributional semantic models (DSMs), which represent words as vectors in a high-dimensional space that associates each word with its contexts of occurrence. Thus, this project aims to investigate how DSMs can be applied to automated post-editing. This proposal is related to the MMeaning project (Regular Aid from FAPESP #2016/13002-0). (AU)
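As an illustration of the distributional idea described in the abstract, the following is a minimal sketch of a count-based DSM, not the method used in the project. The toy corpus, the window size, and the function names `build_cooccurrence_vectors` and `cosine` are all assumptions made for this example. Each word is represented by the counts of the words that occur near it, and two words are compared by the cosine of the angle between their vectors.

```python
from collections import Counter, defaultdict
import math


def build_cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector of counts of its neighbouring words.

    Hypothetical helper: a neighbour is any token at most `window`
    positions to the left or right within the same sentence.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[context] for context, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy Brazilian Portuguese corpus, invented for this sketch.
corpus = [
    "o gato bebe leite",
    "o cachorro bebe agua",
    "o gato come peixe",
    "o cachorro come carne",
    "a menina lê um livro",
]

vectors = build_cooccurrence_vectors(corpus, window=2)
sim_close = cosine(vectors["gato"], vectors["cachorro"])
sim_far = cosine(vectors["gato"], vectors["livro"])
print(f"gato ~ cachorro: {sim_close:.2f}")
print(f"gato ~ livro:    {sim_far:.2f}")
```

In this toy corpus, "gato" and "cachorro" share most of their contexts and therefore score higher than "gato" and "livro", which share none. An automated post-editing component could, under similar assumptions, use such similarity scores to rank candidate replacements for a word flagged as a translation error.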


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
CASELI, HELENA DE MEDEIROS; INACIO, MARCIO LIMA; CALZOLARI, N; BECHET, F; BLACHE, P; CHOUKRI, K; CIERI, C; DECLERCK, T; GOGGI, S; ISAHARA, H; et al. NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 7 pp. (16/21317-0, 16/13002-0)
INACIO, MARCIO LIMA; CASELI, HELENA DE MEDEIROS; QUARESMA, P; VIEIRA, R; ALUISIO, S; MONIZ, H; BATISTA, F; GONCALVES, T. Word Embeddings at Post-Editing. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2020, v. 12037, 12 pp. (16/21317-0, 16/13002-0)