

Ranking of publications based on extraction of texts of the Internet

Author(s):
Henrique Przibisczki de Oliveira
Total Authors: 1
Document type: Master's Dissertation
Place of publication: Campinas, SP.
Institution: Universidade Estadual de Campinas (UNICAMP). Instituto de Computação
Defense date:
Examining board members:
Ricardo de Oliveira Anido; Jacques Wainer; Altigran Soares da Silva
Advisor: Ricardo de Oliveira Anido
Abstract

Several current ranking methods compare publication venues in terms of quality or impact. This information is very important for researchers choosing reputable venues in which to publish their work. Institutions could promote their researchers based on the quality of the venues where they have published. Information about venues can also help a government allocate resources to universities, or help companies evaluate the quality of a job candidate. There are several distinct measures for ranking venues, but what most of them have in common is the use of citations. Consequently, even if a venue is highly prestigious among its researchers, it will not be considered if it is not indexed in a citation database, because its "quality" cannot be measured. This work proposes to build a ranking of publication venues that obtains its information not from a database but from another data source: the Web. University professors' webpages are visited to extract their publications. The venue of each publication is then extracted, and the venues are ranked according to which ones researchers chose to show on their webpages. This method includes publication venues that do not appear in current databases, producing a new ranking of publication venues. Several interesting computational problems are discussed in this work: information search on the Internet, text segmentation, extraction of the components of a bibliographic citation, and clustering (AU)
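The abstract does not say how the extracted venues are scored or how variant venue names are grouped. As a minimal sketch, assuming venue strings have already been extracted from each professor's page, the ranking step could count how many distinct researchers list each venue. The functions `normalize_venue` and `rank_venues` are hypothetical, and the crude string normalization is a stand-in for the clustering step, not the method used in the dissertation.

```python
import re
from collections import Counter


def normalize_venue(name: str) -> str:
    """Hypothetical stand-in for venue clustering.

    Lowercases the name and replaces runs of digits (e.g. years),
    punctuation and underscores with a space. Unicode letters such
    as the 'ó' in 'Simpósio' are kept, because \\W is Unicode-aware
    for str patterns in Python 3.
    """
    name = name.lower()
    name = re.sub(r"[\W\d_]+", " ", name)
    return " ".join(name.split())  # collapse and trim whitespace


def rank_venues(pages: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank venues by the number of distinct researchers who list them.

    `pages` maps a researcher identifier to the venue strings extracted
    from that researcher's webpage (an assumed input format).
    """
    counts: Counter[str] = Counter()
    for venues in pages.values():
        # Count each normalized venue once per researcher, so a single
        # prolific page cannot dominate the ranking.
        counts.update({normalize_venue(v) for v in venues})
    # Sort by descending count, then alphabetically for deterministic ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, with `{"alice": ["VLDB 2008", "SIGMOD'09", "vldb"], "bob": ["VLDB (2010)", "SBBD 2009"], "carol": ["SBBD 2010", "VLDB 2011"]}`, the three VLDB variants merge, and the sketch ranks `vldb` (3 researchers) above `sbbd` (2) and `sigmod` (1). Counting researchers rather than raw publications is one possible design choice. Real venue strings such as "Proc. VLDB" versus "VLDB" would need a proper clustering step to merge.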