Automatic clustering based on nature inspired metaheuristics

Grant number: 17/06142-2
Support type: Scholarships in Brazil - Scientific Initiation
Effective date (Start): June 01, 2017
Effective date (End): May 31, 2018
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Adriane Beatriz de Souza Serapião
Grantee: Maynara Natalia Scoparo
Home Institution: Instituto de Geociências e Ciências Exatas (IGCE). Universidade Estadual Paulista (UNESP). Campus de Rio Claro. Rio Claro, SP, Brazil


Data clustering is one of the most important unsupervised techniques of data management, and it is used in many scientific and engineering applications, such as machine learning, data mining, pattern recognition and image processing. It consists of splitting a dataset into smaller subsets, called clusters. The partition of a dataset is obtained by establishing a function that assigns each object of the dataset to a subset, so that similar objects are placed in the same cluster. A fundamental challenge in cluster analysis is to determine the best estimate of the number of clusters, which is known as the automatic clustering problem. Choosing the appropriate number of clusters is difficult because of the lack of prior knowledge about the application domain, especially when the data have many dimensions, when the clusters differ widely in shape, size and density, and when groups overlap. In this project, three Swarm Intelligence algorithms will be used for the automatic clustering problem in numeric datasets. These algorithms will be developed to optimize partitioning criteria, expressed through clustering measures, in order to find the optimal number of clusters and the coordinates of the centroids. The bioinspired optimization methods Whale Optimization Algorithm, Cuckoo Search and Cat Swarm Optimization will be adapted to the clustering task using the partitional approach. To evaluate the results of these algorithms for automatic clustering, internal and external validation indexes will be used.
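The abstract does not state how a candidate solution encodes both the number of clusters and the centroid coordinates, nor which clustering measure is optimized. The sketch below illustrates one way this could be done with Cuckoo Search, and every concrete choice is an assumption, not the project's method: each candidate holds a hypothetical maximum of `K_MAX` genes of the form `[activation, c_1, ..., c_dim]`, and a centroid counts as active when its activation exceeds 0.5, so the number of clusters varies from candidate to candidate. The Davies-Bouldin index serves as the internal measure to minimize, and the search loop is a simplified Cuckoo Search (Lévy flights drawn with Mantegna's algorithm, then abandonment of the worst fraction `pa` of nests). The names `decode`, `davies_bouldin`, `fitness`, `levy_step` and `cuckoo_clustering` and all parameter values are illustrative.

```python
import math
import random

K_MAX = 5  # hypothetical upper bound on the number of clusters


def decode(vec, dim):
    """Read K_MAX genes [activation, c_1..c_dim]; a centroid is active when activation > 0.5."""
    step = dim + 1
    return [vec[k * step + 1:(k + 1) * step] for k in range(K_MAX) if vec[k * step] > 0.5]


def davies_bouldin(data, cents):
    """Davies-Bouldin index of the nearest-centroid partition (lower is better)."""
    labels = [min(range(len(cents)), key=lambda j: math.dist(x, cents[j])) for x in data]
    scatter = []
    for j, c in enumerate(cents):
        members = [x for x, lab in zip(data, labels) if lab == j]
        if not members:
            return math.inf  # penalize empty clusters
        scatter.append(sum(math.dist(x, c) for x in members) / len(members))
    total = 0.0
    for i in range(len(cents)):
        worst = 0.0
        for j in range(len(cents)):
            if i != j:
                sep = math.dist(cents[i], cents[j])
                if sep == 0:
                    return math.inf  # coincident centroids are invalid
                worst = max(worst, (scatter[i] + scatter[j]) / sep)
        total += worst
    return total / len(cents)


def fitness(vec, data):
    """Objective to minimize: DB index, or infinity if fewer than two clusters are active."""
    cents = decode(vec, len(data[0]))
    return davies_bouldin(data, cents) if len(cents) >= 2 else math.inf


def levy_step(beta, rng):
    """One Levy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0, sigma) / max(abs(rng.gauss(0, 1)), 1e-12) ** (1 / beta)


def cuckoo_clustering(data, n_nests=15, iters=200, pa=0.25, alpha=0.05, beta=1.5, seed=0):
    """Simplified Cuckoo Search over the variable-length centroid encoding."""
    rng = random.Random(seed)
    dim = len(data[0])
    gene_lo = [0.0] + [min(x[d] for x in data) for d in range(dim)]
    gene_hi = [1.0] + [max(x[d] for x in data) for d in range(dim)]
    lo, hi = gene_lo * K_MAX, gene_hi * K_MAX  # bounds for every coordinate of a nest
    size = len(lo)

    def random_nest():
        return [rng.uniform(lo[i], hi[i]) for i in range(size)]

    def clip(v):
        return [min(max(v[i], lo[i]), hi[i]) for i in range(size)]

    nests = [random_nest() for _ in range(n_nests)]
    fits = [fitness(n, data) for n in nests]
    for _ in range(iters):
        for i in range(n_nests):
            # Levy flight from nest i, scaled to the width of each coordinate's range
            new = clip([nests[i][d] + alpha * (hi[d] - lo[d]) * levy_step(beta, rng)
                        for d in range(size)])
            new_fit = fitness(new, data)
            j = rng.randrange(n_nests)  # the new egg competes with a randomly chosen nest
            if new_fit < fits[j]:
                nests[j], fits[j] = new, new_fit
        # Abandon the worst fraction pa of nests and rebuild them at random
        order = sorted(range(n_nests), key=fits.__getitem__)
        for i in order[n_nests - int(pa * n_nests):]:
            nests[i] = random_nest()
            fits[i] = fitness(nests[i], data)
    b = min(range(n_nests), key=fits.__getitem__)
    return decode(nests[b], dim), fits[b]


if __name__ == "__main__":
    # Toy data: three tight groups of five 2-D points each
    offsets = [(-0.3, 0.0), (0.3, 0.0), (0.0, -0.3), (0.0, 0.3), (0.0, 0.0)]
    data = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (5, 5), (10, 0)] for dx, dy in offsets]
    cents, score = cuckoo_clustering(data, seed=1)
    print(len(cents), "clusters, DB index =", score)
```

Because the activation genes let the number of active centroids change during the search, the optimizer estimates the number of clusters and the centroid positions at the same time, which is the essence of the automatic clustering problem described above. The same encoding and fitness function could, in principle, be reused with the Whale Optimization Algorithm or Cat Swarm Optimization by swapping only the search loop; an external index would additionally require ground-truth labels.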