(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

Automatic machine translation error identification

Author(s):
de Jesus Martins, Debora Beatriz [1] ; Caseli, Helena de Medeiros [1]
Total Authors: 2
Affiliation:
[1] Univ Fed Sao Carlos, Sao Carlos, SP - Brazil
Total Affiliations: 1
Document type: Journal article
Source: MACHINE TRANSLATION; v. 29, n. 1, p. 1-24, MAR 2015.
Web of Science Citations: 0
Abstract

Although machine translation (MT) has been studied for decades, the texts generated by state-of-the-art MT systems still contain several errors for many language pairs. To cope with this drawback, considerable effort has been devoted to post-editing those errors, either manually or automatically. Manual post-editing is more accurate but can be prohibitive when too many changes have to be made. Automatic post-editing demands less effort but can be less effective and can introduce new errors. One way to avoid unnecessary automatic post-editing and new errors is to select beforehand only the machine-translated segments that really need to be post-edited. This paper therefore describes experiments carried out to automatically identify MT errors generated by a state-of-the-art phrase-based statistical MT system. Although our experiments used a statistical MT engine, we believe the approach can also be applied to other types of MT systems. The experiments investigated the well-known machine-learning algorithms Naive Bayes, Decision Trees and Support Vector Machines. Using the decision tree algorithm, it was possible to identify wrong segments with around 77% precision and recall when a small training corpus of only 2,147 error instances was used. Our experiments were performed on English-to-Brazilian Portuguese MT, and although some of the features are language-dependent, the proposed approach is language-independent and can be easily generalized to other language pairs. (AU)

FAPESP's process: 13/50757-0 - Analysis and integration of multiword expressions in speech and translation
Grantee: Helena de Medeiros Caseli
Support Opportunities: Regular Research Grants
FAPESP's process: 13/11811-0 - Learning from the web how to translate and paraphrase texts
Grantee: Helena de Medeiros Caseli
Support Opportunities: Regular Research Grants
FAPESP's process: 11/03799-4 - Automatic post editing of machine translated texts
Grantee: Débora Beatriz de Jesus Martins
Support Opportunities: Scholarships in Brazil - Master
FAPESP's process: 10/07517-0 - Machine Translation Portal: resources and tools for Brazilian Portuguese
Grantee: Helena de Medeiros Caseli
Support Opportunities: Regular Research Grants