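A study on the role of similarity measures in the visualization of document collections (original title: Um estudo sobre o papel de medidas de similaridade em visualização de coleções de documentos)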

Frizzi Alejandra San Roman Salazar

Author(s):	Frizzi Alejandra San Roman Salazar
Document type:	Master's Dissertation
Press:	São Carlos.
Institution:	Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:	2012-09-27
Examining board members:	Maria Cristina Ferreira de Oliveira; Rosane Minghim; Guilherme Pimentel Telles
Advisor:	Maria Cristina Ferreira de Oliveira
Abstract
Information visualization techniques, such as similarity-based point placement, are used to generate visual representations of data that reveal patterns. These techniques are sensitive to data quality, which depends on a highly influential preprocessing step. This step involves cleaning the text and, in some cases, detecting terms and their weights, as well as defining a (dis)similarity function. There are few studies on how these (dis)similarity calculations affect the quality of visual representations of textual data. This work presents a study on the role of various (dis)similarity measures in generating visual maps. We focus primarily on two types of distance functions: those based on vector representations of the text (the Vector Space Model, VSM) and those obtained from direct comparison of text strings. We compare the effect of the two approaches on the visual maps obtained with point placement techniques. To this end, objective measures such as the Neighborhood Hit and the Silhouette Coefficient were employed to compare the visual quality of the generated maps. We found that both approaches have strengths but that, in general, the VSM showed better results as far as class discrimination is concerned. However, the conventional VSM is not incremental, i.e., new additions to the collection force the recalculation of the data space and of the dissimilarities previously computed. Thus, a new model based on an incremental VSM (the Incremental Vector Space Model, iVSM) has also been considered in our comparative studies. iVSM showed the best quantitative and qualitative results in several of the configurations considered. The evaluation results are presented, and recommendations are provided on the application of different similarity measures to visual text analysis tasks.
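
As a rough illustration of two ingredients named in the abstract, the sketch below computes a VSM-style dissimilarity between documents and a Neighborhood Hit score for a 2D map. It is not the dissertation's implementation: the TF-IDF weighting, the cosine dissimilarity, the Euclidean neighbor search, and all function names (tf_idf_vectors, cosine_dissimilarity, neighborhood_hit) are assumptions chosen for the example, and Neighborhood Hit is taken here in its commonly used sense, the average fraction of a point's k nearest neighbors on the map that share its class label.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors (dict: term -> weight) from tokenized documents.

    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # An all-zero vector shares nothing measurable with any other vector.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def neighborhood_hit(points, labels, k):
    """Average fraction of each point's k nearest map neighbors with the same label."""
    total = 0.0
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance on the 2D map.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        nearest = others[:k]
        total += sum(labels[j] == labels[i] for j in nearest) / k
    return total / len(points)


# Toy data (hypothetical): three tokenized documents and a four-point map.
docs = [["graph", "layout"], ["graph", "layout"], ["text", "mining"]]
vectors = tf_idf_vectors(docs)
d_same = cosine_dissimilarity(vectors[0], vectors[1])
d_diff = cosine_dissimilarity(vectors[0], vectors[2])

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
nh_clustered = neighborhood_hit(points, ["a", "a", "b", "b"], k=1)
nh_mixed = neighborhood_hit(points, ["a", "b", "a", "b"], k=1)
```

A map whose classes form compact groups scores close to 1 on this measure, while a map that interleaves classes scores close to 0, which is why such a score can serve as an objective proxy for class discrimination when comparing dissimilarity functions.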

FAPESP's process:	10/03100-8 - Approaching the problem of updating similarity computation for visualizing dynamic document collections
Grantee:	Frizzi Alejandra San Roman Salazar
Support Opportunities:	Scholarships in Brazil - Master
