Automatic machine translation error identification

de Jesus Martins, Debora Beatriz; Caseli, Helena de Medeiros

Author(s):	de Jesus Martins, Debora Beatriz ^[1] ; Caseli, Helena de Medeiros ^[1] Total number of authors: 2
Author affiliation(s):	^[1] Univ Fed Sao Carlos, Sao Carlos, SP - Brazil Total number of affiliations: 1
Document type:	Scientific article
Source:	MACHINE TRANSLATION; v. 29, n. 1, p. 1-24, MAR 2015.
Web of Science citations:	0
Abstract
Although machine translation (MT) has been studied for decades, texts generated by state-of-the-art MT systems still contain errors for many language pairs. To cope with this drawback, considerable effort has gone into post-editing those errors, either manually or automatically. Manual post-editing is more accurate but can be prohibitive when too many changes have to be made. Automatic post-editing demands less effort but can also be less effective and can introduce new errors. One way to avoid unnecessary automatic post-editing and new errors is to select beforehand only the machine-translated segments that really need to be post-edited. This paper therefore describes experiments carried out to automatically identify MT errors generated by a state-of-the-art phrase-based statistical MT system. Although our experiments used a statistical MT engine, we believe the approach can also be applied to other types of MT systems. The experiments investigated the well-known machine-learning algorithms Naive Bayes, Decision Trees and Support Vector Machines. Using the decision tree algorithm, it was possible to identify wrong segments with around 77% precision and recall when a small training corpus of only 2,147 error instances was used. Our experiments were performed on English-to-Brazilian Portuguese MT, and although some of the features are language-dependent, the proposed approach is language-independent and can be easily generalized to other language pairs. (AU)
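The abstract reports precision and recall for identifying wrong segments. As a reading aid, the minimal sketch below shows how those two measures are computed for a binary "wrong" versus "ok" labelling. The function name `precision_recall`, the label strings, and the toy data are all hypothetical and are not taken from the paper or its corpus.

```python
def precision_recall(gold, predicted, positive="wrong"):
    """Return (precision, recall) for the `positive` class.

    gold and predicted are equal-length sequences of labels.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Segments correctly flagged as wrong.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # Correct segments that were flagged as wrong.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # Wrong segments that were missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, for illustration only (not the paper's data).
gold = ["wrong", "wrong", "ok", "ok", "wrong", "ok"]
predicted = ["wrong", "ok", "ok", "wrong", "wrong", "ok"]

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of flagged segments that are really wrong, and recall is the share of really wrong segments that get flagged. A filter placed before automatic post-editing needs both: low precision sends correct segments to post-editing, and low recall lets errors through.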

FAPESP grant:	13/50757-0 - Analysis and integration of multiword expressions in speech and translation
Grantee:	Helena de Medeiros Caseli
Support type:	Regular Research Grants


FAPESP grant:	13/11811-0 - Learning from the web to translate and paraphrase texts
Grantee:	Helena de Medeiros Caseli
Support type:	Regular Research Grants


FAPESP grant:	11/03799-4 - Automatic post-editing of machine-translated texts
Grantee:	Débora Beatriz de Jesus Martins
Support type:	Scholarships in Brazil - Master's


FAPESP grant:	10/07517-0 - Machine Translation Portal: resources and tools for Brazilian Portuguese
Grantee:	Helena de Medeiros Caseli
Support type:	Regular Research Grants
