
The SCDtRanslator tool: conversion of documents from PDF to XML applied to the domain of medical papers of sickle cell disease

Grant number: 08/10621-4
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: April 01, 2009
End date: November 30, 2010
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Cristina Dutra de Aguiar
Grantee: Arthur Emanuel de Oliveira Carosia
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

Sickle cell disease (SCD) is a hereditary genetic disease that has no cure and requires early and adequate treatment to prolong patients' lives. In Brazil, SCD has not been studied very intensively, but some international scientific papers already describe relevant results. Exploring the knowledge described in these papers, with the aim of identifying patterns that indicate important or previously unknown relationships or that can be used to predict future facts, is essential to support new research in this area. A first challenge in exploring this knowledge is converting documents from PDF to XML. While SCD medical papers are usually available in PDF, XML allows the text to be manipulated easily so that text mining algorithms can be applied. Although some tools convert documents between different formats, they have several drawbacks that introduce conversion errors. Moreover, almost none of these tools is open source. The objective of this project is to develop the SCDtRanslator tool, which converts SCD scientific papers from PDF to XML. The tool focuses on correct conversion, taking into account the particular characteristics of the papers under analysis, and on the use of the resulting XML documents to extract data of interest. (AU)
