(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Automatic machine translation error identification

Author(s):
de Jesus Martins, Debora Beatriz [1] ; Caseli, Helena de Medeiros [1]
Total number of authors: 2
Author affiliation(s):
[1] Univ Fed Sao Carlos, Sao Carlos, SP - Brazil
Total number of affiliations: 1
Document type: Scientific article
Source: MACHINE TRANSLATION; v. 29, n. 1, p. 1-24, MAR 2015.
Web of Science citations: 0
Abstract

Although machine translation (MT) has been studied for decades, the texts generated by state-of-the-art MT systems still contain several errors for many language pairs. To cope with this drawback, considerable effort has gone into post-editing those errors, either manually or automatically. Manual post-editing is more accurate but can be prohibitive when too many changes have to be made. Automatic post-editing demands less effort but can also be less effective and give rise to new errors. One way to avoid unnecessary automatic post-editing and new errors is to select beforehand only the machine-translated segments that really need to be post-edited. Thus, this paper describes experiments carried out to automatically identify MT errors generated by a state-of-the-art phrase-based statistical MT system. Although our experiments used a statistical MT engine, we believe the approach can also be applied to other types of MT systems. The experiments investigated the well-known machine-learning algorithms Naive Bayes, Decision Trees and Support Vector Machines. Using the decision tree algorithm, it was possible to identify wrong segments with around 77% precision and recall when a small training corpus of only 2,147 error instances was used. Our experiments were performed on English-to-Brazilian Portuguese MT, and although some of the features are language-dependent, the proposed approach is language-independent and can be easily generalized to other language pairs. (AU)

FAPESP grant: 13/50757-0 - Analysis and integration of multiword expressions in speech and translation
Grantee: Helena de Medeiros Caseli
Support type: Research Grants - Regular
FAPESP grant: 13/11811-0 - Learning from the web to translate and paraphrase texts
Grantee: Helena de Medeiros Caseli
Support type: Research Grants - Regular
FAPESP grant: 11/03799-4 - Automatic post-editing of machine-translated texts
Grantee: Débora Beatriz de Jesus Martins
Support type: Scholarships in Brazil - Master's
FAPESP grant: 10/07517-0 - Machine Translation Portal: resources and tools for Brazilian Portuguese
Grantee: Helena de Medeiros Caseli
Support type: Research Grants - Regular