Scholarship 14/09599-5 - Data mining, Data analysis

Grant number:	14/09599-5
Support Opportunities:	Scholarships abroad - Research
Start date:	August 01, 2015
End date:	January 31, 2016
Field of knowledge:	Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques

Principal Investigator:	Rosane Minghim
Grantee:	Rosane Minghim
Host Investigator:	Evangelos Milios

Host Institution:	Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Institution abroad:	Dalhousie University, Halifax, Canada

Associated research grant:	11/22749-8 - Challenges in exploratory visualization of multidimensional data: paradigms, scalability and applications, AP.TEM


Abstract

In recent years, researchers in various areas of knowledge related to data-intensive computing (such as machine learning, data mining, visualization and applications) have been directing efforts towards dealing with larger and more diverse data sources. Additionally, there has been a renewed motivation to integrate research in those fields, since they are complementary in the manner they approach data analysis. This project proposes integrating mining and visualization techniques in order to explore and understand textual data in novel ways, seeking to tackle current problems in Visual Text Analytics and focusing attention on strategic applications. Current research in visual text mining has not yet succeeded in handling multiple scales of information or in effectively associating two or more sets from distinct sources that treat the same subjects. For instance, when political discourse is carried out on a particular subject and that same subject is discussed in social media or in the news, the analyst cannot easily associate the two. In this project we propose to devise new strategies to tackle the problem of handling association between sets of documents from different sources. By choosing a flexible representation (in our case, a network representation), we can employ the large body of work on graphs and networks to perform visual analysis, mining, and partitioning of textual data sets, as well as content matching. This same framework also lends itself to multi-level representation through network partitioning, which is a valid strategy to support visual exploration of larger data sets.
The project entails the following three activities: 1 - handling data sets from separate sources (such as government debates and news) as separate but associated networks, by means of a graph representation as well as a selection of algorithms to partition the space; 2 - adapting multidimensional visualization techniques to explore such data sets; 3 - developing strategies to link two or more related data sets and incorporating them into the visual mining setup. Our baseline data set during this first year of joint work is that of the Canadian parliament. The data collection stage is already well under way, having been carried out over the last year in collaboration with the group at Dalhousie. In collaboration with other colleagues, and upon my return, our advances will also be applied to the Brazilian Congress. Target users range from interested citizens to news professionals and government policy makers. The main idea is to integrate the extensive text mining experience of the host institution with the visualization experience of the proponent and her group, in order to contribute to the study of how government structure is reflected in the media and in the general population. (AU)
