Use of data complexity measures in the support of supervised machine learning

Abstract

Machine Learning (ML) techniques have been successfully employed to solve a wide range of data classification problems. Recent studies have sought to understand how quantitative measures of the complexity of classification datasets, such as the geometric overlap between classes, affect the performance of ML techniques. One contribution of this approach is a better understanding of the domains of competence and the limitations of these techniques. This project will first study different measures for characterizing the complexity of classification problems. Although the literature offers a variety of such measures, few studies examine their domains of applicability, that is, the types of application and analysis for which each measure is most appropriate. We then intend to use these measures to support reducing the complexity involved in solving a given classification problem. A first step in this direction is to preprocess the data so that the resulting datasets have lower complexity. Two preprocessing tasks will be investigated: noise identification and feature subset selection. On another front, the complexity of solving a classification problem will be reduced through a divide-and-conquer strategy, in which the goal is to find lower-complexity subproblems whose solutions can be combined to solve the original classification problem. (AU)
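As an illustration of what a complexity measure based on class overlap can look like, the sketch below computes the maximum Fisher's discriminant ratio (often called F1), a classic feature-overlap measure from the data complexity literature. It is not claimed to be the exact set of measures or the formulation used in this project. It assumes a binary problem with numeric features given as plain Python lists, and the function names are hypothetical. In this form, a larger value means that at least one feature separates the two classes well (lower complexity). Some later formulations rescale it as 1/(1 + max ratio) so that larger values mean higher complexity. The per-feature ratios can also rank features, which hints at how such measures might inform feature subset selection.

```python
import math
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Fisher's discriminant ratio of a single feature for two classes."""
    numerator = (mean(values_a) - mean(values_b)) ** 2
    denominator = pvariance(values_a) + pvariance(values_b)
    if denominator == 0:
        # Constant feature inside each class: perfectly separating if the means differ.
        return 0.0 if numerator == 0 else math.inf
    return numerator / denominator


def per_feature_ratios(X, y):
    """Fisher's ratio for every feature column of X (hypothetical helper, binary y)."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch only handles two-class problems")
    a, b = labels
    ratios = []
    for j in range(len(X[0])):
        col_a = [row[j] for row, label in zip(X, y) if label == a]
        col_b = [row[j] for row, label in zip(X, y) if label == b]
        ratios.append(fisher_ratio(col_a, col_b))
    return ratios


def max_fisher_ratio(X, y):
    """F1 in its original form: the largest per-feature Fisher's ratio."""
    return max(per_feature_ratios(X, y))


def rank_features(X, y):
    """Feature indices ordered from most to least discriminative (illustrative only)."""
    ratios = per_feature_ratios(X, y)
    return sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)


if __name__ == "__main__":
    # Feature 0 separates the classes; feature 1 overlaps completely.
    X = [[0.0, 1.0], [1.0, 3.0], [4.0, 1.0], [5.0, 3.0]]
    y = [0, 0, 1, 1]
    print(per_feature_ratios(X, y))  # [32.0, 0.0]
    print(max_fisher_ratio(X, y))    # 32.0
    print(rank_features(X, y))       # [0, 1]
```

In this toy data, feature 0 has class means 0.5 and 4.5 with within-class variances of 0.25 each, giving (4.0)^2 / 0.5 = 32, while feature 1 has identical class distributions and a ratio of 0.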

Scientific publications (24)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
LORENA, ANA C.; MACIEL, ARON I.; DE MIRANDA, PERICLES B. C.; COSTA, IVAN G.; PRUDENCIO, RICARDO B. C.. Data complexity meta-features for regression problems. MACHINE LEARNING, v. 107, n. 1, SI, p. 209-246, . (12/22608-8)
PISANI, PAULO HENRIQUE; POH, NORMAN; DE CARVALHO, ANDRE C. P. L. F.; LORENA, ANA CAROLINA. Score normalization applied to adaptive biometric systems. COMPUTERS & SECURITY, v. 70, p. 565-580, . (12/22608-8, 13/07375-0, 12/25032-0)
GARCIA, LUIS P. F.; LEHMANN, JENS; DE CARVALHO, ANDRE C. P. L. F.; LORENA, ANA C.. New label noise injection methods for the evaluation of noise filters. KNOWLEDGE-BASED SYSTEMS, v. 163, p. 693-704, . (16/18615-0, 13/07375-0, 12/22608-8)
PISANI, PAULO HENRIQUE; LORENA, ANA CAROLINA; DE CARVALHO, ANDRE C. P. L. F.. Adaptive algorithms applied to accelerometer biometrics in a data stream context. Intelligent Data Analysis, v. 21, n. 2, p. 353-370, . (12/25032-0, 13/07375-0, 12/22608-8)
BARELLA, VICTOR H.; GARCIA, LUIS P. F.; DE SOUTO, MARCILIO C. P.; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F.. Assessing the data complexity of imbalanced datasets. INFORMATION SCIENCES, v. 553, p. 83-109, . (13/07375-0, 15/01382-0, 12/22608-8)
OKIMOTO, LUCAS CHESINI; SAVII, RICARDO MANHAES; LORENA, ANA CAROLINA. Complexity Measures Effectiveness in Feature Selection. 2017 6TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), p. 6-pg., . (12/22608-8)
TRAMBAIOLLI, L. R.; SPOLAOR, N.; LORENA, A. C.; ANGHINAH, R.; SATO, J. R.. Feature selection before EEG classification supports the diagnosis of Alzheimer's disease. CLINICAL NEUROPHYSIOLOGY, v. 128, n. 10, p. 2058-2067, . (12/22608-8, 13/00506-1, 13/10952-9, 13/10498-6)
MORALES, PABLO; LUENGO, JULIAN; GARCIA, LUIS P. F.; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F.; HERRERA, FRANCISCO. The NoiseFiltersR Package: Label Noise Preprocessing in R. R JOURNAL, v. 9, n. 1, p. 219-228, . (12/22608-8, 13/07375-0, 11/14602-7)
MUNOZ, MARIO ANDRES; YAN, TAO; LEAL, MATHEUS R.; SMITH-MILES, KATE; LORENA, ANA CAROLINA; PAPPA, GISELE L.; RODRIGUES, ROMULO MADUREIRA. An Instance Space Analysis of Regression Problems. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, v. 15, n. 2, . (12/22608-8)
QUITERIO, THAISE M.; LORENA, ANA C.. Determining the Structure of Decision Directed Acyclic Graphs for Multiclass Classification Problems. PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), p. 6-pg., . (12/22608-8, 15/17291-3)
PIMENTEL, BRUNO ALMEIDA; DE CARVALHO, ANDRE C. P. L. F.. Statistical versus Distance-Based Meta-Features for Clustering Algorithm Recommendation Using Meta-Learning. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), p. 8-pg., . (12/22608-8, 17/20265-0, 16/18615-0)
PIMENTEL, BRUNO ALMEIDA; DE CARVALHO, ANDRE C. P. L. F.. Unsupervised Meta-Learning for Clustering Algorithm Recommendation. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), p. 8-pg., . (12/22608-8, 17/20265-0, 16/18615-0)
LORENA, ANA C.; MACIEL, ARON I.; DE MIRANDA, PERICLES B. C.; COSTA, IVAN G.; PRUDENCIO, RICARDO B. C.. Data complexity meta-features for regression problems. MACHINE LEARNING, v. 107, n. 1, p. 38-pg., . (12/22608-8)
OKIMOTO, LUCAS C.; LORENA, ANA C.. Data complexity measures in feature selection. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), p. 8-pg., . (12/22608-8)
SOUSA, ARUA DE M.; LORENA, ANA C.; BASGALUPP, MARCIO P.. GEEK: Grammatical Evolution for automatically Evolving Kernel functions. 2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, p. 8-pg., . (12/22608-8, 16/02870-0)
QUITERIO, THAISE M.; LORENA, ANA C.. Using complexity measures to determine the structure of directed acyclic graphs in multiclass classification. APPLIED SOFT COMPUTING, v. 65, p. 428-442, . (12/22608-8, 15/17291-3)
PIMENTEL, BRUNO ALMEIDA; DE CARVALHO, ANDRE C. P. L. F.. A Meta-learning approach for recommending the number of clusters for clustering algorithms. KNOWLEDGE-BASED SYSTEMS, v. 195, . (16/18615-0, 17/20265-0, 12/22608-8)
NAGAI, JAMES S.; SOUSA, HERIO; AONO, ALEXANDRE H.; LORENA, ANA C.; KUROSHU, REGINALDO M.. Gene Essentiality Prediction Using Topological Features From Metabolic Networks. 2018 7TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), p. 6-pg., . (12/22608-8)
SPOLAOR, NEWTON; LORENA, ANA CAROLINA; LEE, HUEI DIANA. Feature Selection via Pareto Multi-objective Genetic Algorithms. APPLIED ARTIFICIAL INTELLIGENCE, v. 31, n. 9-10, p. 764-791, . (09/12963-2, 12/22608-8)
PISANI, PAULO HENRIQUE; LORENA, ANA CAROLINA; DE CARVALHO, ANDRE C. P. L. F.. Adaptive Biometric Systems Using Ensembles. IEEE INTELLIGENT SYSTEMS, v. 33, n. 2, p. 19-28, . (12/22608-8, 13/07375-0, 12/25032-0)
RIVOLLI, ADRIANO; READ, JESSE; SOARES, CARLOS; PFAHRINGER, BERNHARD; DE CARVALHO, ANDRE C. P. L. F.. An empirical analysis of binary transformation strategies and base algorithms for multi-label learning. MACHINE LEARNING, v. 109, n. 8, . (16/18615-0, 13/07375-0, 12/22608-8)
DE MELO, VINICIUS V.; LORENA, ANA C.. Using Complexity Measures to Evolve Synthetic Classification Datasets. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), p. 8-pg., . (17/20844-0, 12/22608-8)
GARCIA, LUIS P. F.; RIVOLLI, ADRIANO; ALCOBACA, EDESIO; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F.. Boosting meta-learning with simulated data complexity measures. Intelligent Data Analysis, v. 24, n. 5, p. 1011-1028, . (12/22608-8, 13/07375-0, 18/14819-5, 16/18615-0)
GARCIA, LUIS P. F.; LORENA, ANA C.; DE SOUTO, MARCILIO C. P.; HO, TIN KAM. Classifier Recommendation Using Data Complexity Measures. 2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), p. 6-pg., . (12/22608-8)