
Uncertainty Quantification for Symbolic Regression: Towards Robust Models

Grant number: 25/19085-3
Support Opportunities: Scholarships in Brazil - Post-Doctoral
Start date: January 01, 2026
End date: June 30, 2026
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Fabricio Olivetti de França
Grantee: Julia Lena Reuter
Host Institution: Centro de Matemática, Computação e Cognição (CMCC). Universidade Federal do ABC (UFABC). Santo André, SP, Brazil

Abstract

Regression analysis is a statistical tool for explaining the relationship between measurable variables. It is often used in curve-fitting applications where interpolation and understanding the trends of the data are important. Another important use of regression analysis is in the physical sciences, where the scientist proposes a parametric function that fits the expectation when first-principle models are inadequate. Symbolic Regression (SR) is one option to automate the discovery of such models: it searches for a model that not only fits the data well but also adheres to certain desiderata. An often ignored topic in SR research is uncertainty quantification (UQ), which estimates the uncertainties of the collected observations, the proposed model, and its parameters. This is necessary to understand the limitations of the model, guide the search towards better and physically plausible models, and prompt additional data collection around uncertain regions. The sources of uncertainty can be aleatoric or epistemic. Aleatoric uncertainty is related to the data collection and originates from various sources, such as imprecise measurements, external noise, or even unmeasurable variables. This uncertainty is irreducible unless the source of the randomness is addressed (e.g., more precise measurement equipment is used). Knowing how much of the noise is irreducible helps to detect overfitting, for example. Epistemic uncertainty is related to the model structure and, most frequently, its numerical parameters. It arises when the model is over-simplified (e.g., a linear model is used) or when there is not enough data to calibrate the parameters. Since in SR not only are the parameters fitted to the data but the function structure is also selected from a vast search space of hypotheses, this uncertainty also affects the selection of the model structure.
As such, quantifying the uncertainties and mitigating their effects is important in SR algorithms to ensure a trustworthy model (or at least to know how much we can trust it), and to guide the search towards the most plausible model among the candidates given the UQ. This project will begin the broad study of UQ in the SR context with a thorough review of UQ in nonlinear regression and of the current applications of UQ to SR. We aim to study and propose the best approaches for mitigating the effects of uncertainties during the search for a model. Moreover, we will provide a reference implementation of a reliable and stable SR algorithm capable of handling the uncertainties and allowing users to explore robust models for their own observational data. Besides the candidate and the supervisor, this project will rely on collaboration with researchers from different fields who will share real-world data, their perspectives on UQ in SR, and their own desiderata. We expect that, by the end of this project, we will have submitted three papers to high-impact journals and conferences in computer science and will continue the collaboration with other research groups. (AU)
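
The distinction the abstract draws between aleatoric and epistemic uncertainty can be illustrated with a small numerical sketch. This is not the project's method or its reference implementation. It is a minimal example that assumes a fixed model structure y = t0 + t1*sin(x) (the kind of expression an SR search might return), Gaussian noise, and ordinary least squares. The function names `fit_with_uncertainty` and `simulate`, the true parameters, and the noise level are all hypothetical choices made for illustration. The residual standard deviation stands in for the aleatoric noise, and the parameter standard errors stand in for the epistemic uncertainty of the numerical parameters.

```python
import numpy as np


def fit_with_uncertainty(x, y):
    """Least-squares fit of y = t0 + t1*sin(x).

    Returns the fitted parameters, the estimated noise standard deviation
    (a proxy for aleatoric uncertainty), and the standard errors of the
    parameters (a proxy for epistemic uncertainty).
    """
    # Design matrix for the assumed model structure.
    X = np.column_stack([np.ones_like(x), np.sin(x)])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)

    residuals = y - X @ theta
    dof = len(y) - X.shape[1]  # degrees of freedom
    noise_var = residuals @ residuals / dof

    # Classical OLS parameter covariance: sigma^2 * (X^T X)^-1.
    param_cov = noise_var * np.linalg.inv(X.T @ X)
    return theta, np.sqrt(noise_var), np.sqrt(np.diag(param_cov))


def simulate(n, noise_std=0.5, seed=0):
    """Draw n noisy observations from the hypothetical true model 1 + 2*sin(x)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, size=n)
    y = 1.0 + 2.0 * np.sin(x) + rng.normal(0.0, noise_std, size=n)
    return x, y


# Same noise level, different amounts of data.
small = fit_with_uncertainty(*simulate(30))
large = fit_with_uncertainty(*simulate(3000))

print("noise std estimate (n=30, n=3000):", small[1], large[1])
print("parameter std errors (n=30):      ", small[2])
print("parameter std errors (n=3000):    ", large[2])
```

Under these assumptions, collecting more data shrinks the parameter standard errors, while the estimated noise level stays close to the value used in the simulation. That matches the abstract's point that aleatoric uncertainty is irreducible without better measurements, whereas epistemic uncertainty about the parameters can be reduced with additional observations. The sketch does not cover uncertainty about the model structure itself, which the abstract identifies as the additional difficulty specific to SR.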
