Learning from the web how to translate and paraphrase texts

Grant number: 13/11811-0
Support Opportunities: Regular Research Grants
Start date: September 01, 2013
End date: August 31, 2015
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Helena de Medeiros Caseli
Grantee: Helena de Medeiros Caseli
Host Institution: Centro de Ciências Exatas e de Tecnologia (CCET). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil
Associated researchers: Eloize Rossi Marques Seno; Estevam Rafael Hruschka Júnior

Abstract

Automatic paraphrase recognition and machine translation are two subareas of Natural Language Processing (NLP) that share important similarities: both deal with parallel texts (texts expressing the same content), monolingual in the case of paraphrases and bilingual in the case of translations. However, only recently have a few studies explored the combination of methods and techniques from these two subareas (BANNARD; CALLISON-BURCH, 2005; CALLISON-BURCH et al., 2006; BARREIRO, 2008; PANG et al., 2003). This project aims to investigate the automatic extraction of paraphrases and of knowledge useful for machine translation, using the never-ending language learning (NELL) strategy and the web as the source of knowledge. Online knowledge repositories such as Wikipedia define, explain and exemplify knowledge in different ways. Online repositories of subtitles, such as OpenSubtitles and SubDB, and of lyrics, such as Lyrics, present versions of the same text in several languages. These repositories are valuable sources of information for methods, to be designed following the NELL strategy, that automatically extract paraphrases and knowledge useful for translation. NELL is a machine learning strategy based on the constant, incremental learning carried out by humans. The idea of NELL is to first learn simple concepts and the relationships between them, and then apply this knowledge to learn something new and more complex in the future (MITCHELL et al., 2008). This proposal is innovative in applying NELL to the two NLP subareas cited above and may give rise to integrated approaches, thus contributing to advances in these and other areas of research. (AU)

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
DE JESUS MARTINS, DEBORA BEATRIZ; CASELI, HELENA DE MEDEIROS. Automatic machine translation error identification. MACHINE TRANSLATION, v. 29, n. 1, p. 1-24, . (13/50757-0, 13/11811-0, 11/03799-4, 10/07517-0)