Discretização e geração de gráficos de dados em aprendizado de máquina

Richardson Floriani Voltolini

Full text
Author(s):	Richardson Floriani Voltolini Total Authors: 1
Document type:	Master's Dissertation
Press:	São Carlos.
Institution:	Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:	2006-11-17
Examining board members:	Maria Carolina Monard; Braulio Coelho Avila; Solange Oliveira Rezende
Advisor:	Maria Carolina Monard
Abstract
The great quantity and variety of information acquired and stored electronically and the lack of human capacity to analyze it, have motivated the development of Data Mining - DM - a process that attempts to extract new and useful knowledge from databases. One of the steps of the DM process is data preprocessing. The main goals of the data preprocessing step are to enable the user to have a better understanding of the data being used and to transform the data so it is appropriate for the next step of the DM process related to pattern extraction. A technique concerning the first goal consists of the graphic representation of records (examples) of databases. There are various methods to generate these graphic representations, each one with its own characteristics and objectives. Furthermore, still in the preprocessing step, and in order to transform the raw data into a more suitable form for the next step of the DM process, various data discretization technique methods which transform continuous database attribute values into discrete ones can be applied. This work presents some frequently used methods of graph generation and data discretization. Related to the graph generation methods, we have developed a system called DISCOVERGRAPHICS, which offers different interfaces for graph generation. These interfaces allow both advanced and beginner users, as well as other systems, to access the DISCOVERGRAPHICS system facilities. Regarding the second subject of this work, data discretization, we considered various supervised and unsupervised methods and proposed a new unsupervised data discretization method called K-MeansR. Using different evaluation measures and databases, all these methods were experimentally compared to each other and statistical tests were run to analyze the experimental results. These results showed that the proposed method performed better than many of the other data discretization methods considered in this work (AU)

Short URL