Scalable descriptive models over extensive volumes of distributed data

Grant number: 19/09817-6
Support Opportunities: Regular Research Grants
Start date: February 01, 2020
End date: July 31, 2022
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Murilo Coelho Naldi
Grantee: Murilo Coelho Naldi
Host Institution: Centro de Ciências Exatas e de Tecnologia (CCET). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil
Associated researchers: Elaine Ribeiro de Faria Paiva; Ricardo Cerri; Ricardo José Gabrielli Barreto Campello

Abstract

The growing volume of data generated by today's technologies makes its analysis challenging. First, much of this data is not labeled when it is created, so the organization of its objects and the relationships between them are not explicit. Second, the methods used for the analysis must be scalable enough to reach their goals even as the amount of data grows. With these issues in mind, data clustering is a suitable part of the analysis, since it comprises a set of unsupervised techniques that categorize data automatically. These techniques make it possible to obtain a descriptive analysis of the data from implicit information: the relations between objects and their structure. However, traditional clustering techniques were developed for small, static data sets, and their limitations do not always allow them to scale to larger, distributed data sets or to data sets that grow continuously. This project aims to research clustering techniques applicable to incremental data sets, along two research fronts. The first is the adaptation of algorithms to scalable programming models, which allow divide-and-conquer methods to be used for data distribution and management. The second is research on clustering algorithms that generate a model that adapts as the data set grows, that is, the algorithm analyzes the data continuously. (AU)

Scientific publications (4)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
VALEJO, ALAN DEMETRIUS BARIA; DE OLIVEIRA DOS SANTOS, WELLINGTON; NALDI, MURILO COELHO; ZHAO, LIANG. A review and comparative analysis of coarsening algorithms on bipartite networks. European Physical Journal-Special Topics, v. 230, n. 14-15, p. 2801-2811, . (19/09817-6, 13/07375-0, 19/07665-4, 15/50122-0, 19/14429-5)
CANDIDO, PAULO GUSTAVO LOPES; SILVA, JONATHAN ANDRADE; FARIA, ELAINE RIBEIRO; NALDI, MURILO COELHO. Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation. APPLIED SCIENCES-BASEL, v. 12, n. 13, p. 22-pg., . (19/09817-6)
VALEJO, ALAN DEMETRIUS BARIA; DE OLIVEIRA DOS SANTOS, WELLINGTON; NALDI, MURILO COELHO; ZHAO, LIANG. A review and comparative analysis of coarsening algorithms on bipartite networks. European Physical Journal-Special Topics, . (19/09817-6, 19/07665-4, 13/07375-0, 15/50122-0, 19/14429-5)
ARAUJO NETO, ANTONIO CAVALCANTE; NALDI, MURILO COELHO; CAMPELLO, RICARDO J. G. B.; SANDER, JORG; IEEE COMP SOC. CORE-SG: Efficient Computation of Multiple MSTs for Density-Based Methods. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), v. N/A, p. 14-pg., . (19/09817-6)