|Support type:||Scholarships in Brazil - Post-Doctorate|
|Effective date (Start):||October 01, 2010|
|Effective date (End):||August 31, 2011|
|Field of knowledge:||Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques|
|Principal Investigator:||Caetano Traina Junior|
|Home Institution:||Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil|
XML has become a major standard for information exchange and management, and is broadly used for data representation and storage. With the increased use of XML, especially on the Web, developing efficient search and retrieval techniques for XML data has become very important, particularly for the database (DB) and information retrieval (IR) communities. Because XML allows structured and unstructured data to be combined, recent DB and IR research shows a growing interest in merging techniques from the two fields, exploiting IR methods in DBs and vice versa, for example by extending DB-style XML query languages to support ranked results. In this project, we intend to develop a framework for efficiently producing ranked results for keyword search queries over large, heterogeneous XML document collections. We adopt the IR keyword search model, aiming to develop a keyword-based search environment that is better suited to searching and retrieving XML-encoded documents efficiently. Therefore, instead of relying on complex query languages such as XML-QL, XQL, or XQuery to search XML data, we retain the simple, widely used keyword query method and exploit XML's nested structure during query processing. In other words, we aim to develop a user-friendly technique for searching XML data in which users express queries in the simplest possible form (keywords), so that less control is given to the user and more of the logic is placed in the ranking mechanism to best match the user's needs.
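To make the idea of keyword queries that exploit XML's nested structure more concrete, the following is a minimal illustrative sketch and not the framework proposed in this project. It assumes a simple heuristic: a result is the most specific element whose subtree contains every keyword, and results are ranked by the fraction of subtree words that match the query. The function names `tokens` and `keyword_search`, the scoring formula, and the toy document are all hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET


def tokens(text):
    """Lowercase word tokens of a string."""
    return re.findall(r"\w+", text.lower())


def keyword_search(xml_text, keywords):
    """Return (score, tag, text) tuples, best first.

    A result is an element whose subtree contains every keyword while no
    single child subtree does, i.e. the most specific enclosing element.
    The score is an assumed heuristic, not a method from the project.
    """
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return []
    results = []

    def visit(elem):
        # Collect the word tokens of this element's whole subtree, bottom-up.
        bag = tokens(elem.text or "")
        child_covers = False
        for child in elem:
            child_bag, covered = visit(child)
            bag.extend(child_bag)
            bag.extend(tokens(child.tail or ""))
            child_covers = child_covers or covered
        covers = wanted <= set(bag)
        if covers and not child_covers:
            # Hypothetical score: share of subtree words that are keywords,
            # so smaller, more focused elements rank higher.
            hits = sum(1 for t in bag if t in wanted)
            results.append((hits / len(bag), elem.tag, " ".join(bag)))
        return bag, covers

    visit(ET.fromstring(xml_text))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Toy document (made-up data, for illustration only).
doc = """
<library>
  <book><title>XML keyword search</title><author>Smith</author></book>
  <book><title>Ranking in databases</title><author>Smith</author>
    <note>covers XML search over large document collections in depth</note>
  </book>
</library>
"""

for score, tag, text in keyword_search(doc, ["xml", "search"]):
    print(f"{score:.2f} <{tag}> {text}")
```

In this sketch the user supplies only keywords. The nesting of the document decides which element is returned: a single `title` when both words occur there, or a whole `book` when the keywords are spread across its children, as with the query `["smith", "ranking"]`.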