
Center for Data Science in Public Statistics

Grant number: 23/18026-8
Support Opportunities: Research Grants - Science Centers for Development
Start date: October 01, 2024
End date: September 30, 2029
Field of knowledge: Interdisciplinary Subjects
Principal Investigator: Carlos Eduardo Torres Freire
Grantee: Carlos Eduardo Torres Freire
Host Institution: Fundação Sistema Estadual de Análise de Dados (SEADE). São Paulo, SP, Brazil
City of the host institution: São Paulo
Principal investigators: Alvaro Augusto Comin ; Caetano Traina Junior ; Eduardo de Rezende Francisco ; Kelly Rosa Braghetto ; Paulo Roberto Miranda Meirelles ; Renato Sérgio de Lima
Associated researchers: Adriano Galindo Leal ; Agma Juci Machado Traina ; Alexandre Abdal Cunha ; Alexandre Jorge Loloian ; Alexandre Rocha de Azevedo ; André de Freitas Gonçalves ; André Luis Squarize Chagas ; Bernadette Cunha Waldvogel ; David Esmael Marques da Silva ; Denis Henrique Pinheiro Salvadeo ; Diego Bogado Tomasiello ; Elmer Mateus Gennaro ; Érica Souza Siqueira ; Gabriel Silva Cogo ; Gabriela Spanghero Lotta ; Gustavo de Oliveira Coelho de Souza ; Hilda Carvalho de Oliveira ; Isabela Moutinho Sobral ; Jasmil Aparecido de Oliveira ; Juliana Teixeira de Souza Martins ; Larissa Marques Sartori ; Lucas Malta Mingardo ; Luís Paulo Bresciani ; Margret Althuon ; Maria Paula Ferreira ; Mariana Abrantes Giannotti ; Mateus Canniatti Ponchio ; Mateus Humberto Andrade ; Matheus Henrique Cunha Barboza ; Monica La Porte Teixeira ; Mônica Landi ; Murillo Marschner Alves de Brito ; Nelson Marconi ; Ney Lemke ; Paulo Borlina Maia ; Ricardo Corrêa Gomes ; Rogério dos Santos Acca ; Tainá Andreoli Bittencourt ; Vagner de Carvalho Bessa ; Valmir Jose Aranha ; Victor Callil
Associated research grant(s): 25/13932-6 - Women in the STEM Labor Market: Trajectories and Inclusion, AP.R
25/13979-2 - Studies in Software Engineering in the Linux Kernel: Code Duplication and Continuous Integration Practices, AP.R
Associated scholarship(s): 25/05395-0 - Mapping and Mitigating Bottlenecks in the Linux Kernel Development Model, BP.DD
25/22052-0 - Administrative informations, BP.IC
25/22559-7 - Data scraping on digital platforms, BP.IC
25/22064-8 - Administrative informations, BP.IC
25/18939-9 - Production and analysis of data for the research line on urban mobility, BP.TT
25/16794-3 - Spatial Data for Urban Mobility, BP.IC
25/15710-0 - Intelligent Analysis of News Data to Support Investment Policy Development in São Paulo State, BP.MS
24/22432-4 - Methodological approaches to foster best practices in Software Engineering and Open Science in Research Software projects, BP.IC
25/09811-9 - Production and analysis of data for labor market research, BP.TT
25/11608-7 - Data Governance for Data Science Processing and Analysis Environments, BP.DD
25/06893-4 - Decentralized Data Governance Platform for Supporting the Production of Public Indicators, BP.IC
25/06891-1 - Data Governance for Data Science Analysis and Processing Environments, BP.IC
25/05538-6 - Public Statistics Data Science Center: Structuring and Generating the Economic Activity Monitor, BP.TT
24/22291-1 - Using the Grouping Operator from SQL for Data Preparation processes (ETL) using similarity, BP.IC

Abstract

The Center for Data Science in Public Statistics (CCDEP) will seek solutions to the following research problem: how can high-frequency data be used to produce public policy indicators? To this end, the project will be organized into lines of research with the following characteristics: production of new knowledge for a specific public policy area; use of high-frequency, large-volume, structured or unstructured data; and partnerships with public bodies, non-governmental entities, and public and private companies to obtain and analyze data and disseminate results. The Center will initially include five lines of research:

1. ECONOMIC ACTIVITY MONITOR: Use high-frequency data and digital administrative records from public agencies, public service concessionaires and other sources to develop timely, geographically disaggregated indicators, predictive models and analyses of the São Paulo economy.
2. WORK MONITOR: Combine traditional databases with different collection methods (via assisted telephone, Interactive Voice Response and Internet) to produce indicators on employment and unemployment, training and entry into the job market.
3. MOBILITY MONITOR: Develop indicators from large sets of automatically collected data (user location and ticketing) linked to data traditionally used in transport planning and management (origin-destination surveys) and GIS data (infrastructure, firms, education, health, etc.).
4. PUBLIC SECURITY MONITOR: Develop new indicators and improve methodologies for indicators of criminal events, using new data sources and technologies for automating text reading, classification and coding, such as machine learning models.
5. TECHNOLOGIES IN DATA AND SOFTWARE ENGINEERING: Develop technologies for data analysis and for generating applications supported by data science and current enabling technologies: study and development of theoretical aspects; generation of algorithms and technologies; building, producing, supporting and hosting applications and their associated tools. These activities will be aimed at generating practical solutions to support the Center's other lines of research.

The Center brings together a network of partners, with SEADE as the host institution, an agency with a tradition of producing statistics for public policies in São Paulo and for the citizens of São Paulo State. The partners will be research institutions (such as USP, Unesp, FGV, UCL and Cebrap), non-governmental entities (such as the Brazilian Public Security Forum) and government bodies (such as São Paulo State secretariats, Metro and CPTM). New partnerships may be established throughout the Center's existence. SEADE technicians, researchers from partner institutions and public managers from the agencies make up a multidisciplinary project team.

The data management plan provides for a computational infrastructure for storing, processing and disseminating data, with the flexibility and capacity for ETL (extraction, transformation and loading) processes and the development of machine learning models, in addition to high-quality computing resources, capacity and software (free and proprietary). Research results will be shared among partners and disseminated in dashboards, data repositories and APIs; analytical bulletins; scientific articles; audiovisual content; and events. Official communication channels (such as a website, social networks and a communications consultancy) will disseminate the content to audiences with different profiles, mainly government bodies, which will be direct beneficiaries of the new knowledge produced by the Center.

SEADE's physical and administrative-financial infrastructure and its staff will be used at the Center, which will be structured for long-term operation, with planning and resources for, initially, 4 years. (AU)


Scientific publications (6)
(The scientific publications listed on this page originate from the Web of Science or SciELO databases. Their authors have cited FAPESP grant or fellowship project numbers awarded to Principal Investigators or Fellowship Recipients, whether or not they are among the authors. This information is collected automatically and retrieved directly from those bibliometric databases.)
NEPOMUCENO, PEDRO IVO SIQUEIRA; BRAGHETTO, KELLY ROSA. Managing semantic evolution in databases: From theory to implementation. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, v. 177, p. 14-pg., . (23/18026-8, 23/00779-0)
DE AGUIAR, ERIKSON J.; TRAINA, AGMA J. M.; HELAL, SUMI. MedTimeSplit: Continual dataset partitioning to mimic real-world settings for federated learning on Non-IID medical image data. 2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, BIGDATA, v. N/A, p. 10-pg., . (24/13328-9, 16/17078-0, 23/18026-8, 23/14759-0)
DE AGUIAR, ERIKSON J.; TRAINA, AGMA J. M.; HELAL, SUMI. SentinelAdvMedical: toward adversarial attacks detection on medical image classification via Out-Of-Distribution strategies. MEDICAL IMAGING 2025: COMPUTER-AIDED DIAGNOSIS, v. 13407, p. 7-pg., . (21/08982-3, 23/18026-8, 16/17078-0, 23/14759-0, 24/13328-9)
DO CARMO, ERICA PETERS; TRAINA JR., CAETANO; TRAINA, AGMA J. M. FairMed-FL: Federated Learning for Fair and Unbiased Deep Learning in Medical Imaging. 2025 IEEE 38TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS, v. N/A, p. 6-pg., . (16/17078-0, 23/18026-8)
HONORATO, EDUARDO S.; UCHIDA, MARIANA AYA S.; TRAINA, AGMA J. M.; WOLF, DENIS F.. Improving U-Net with Attention Mechanism for Medical Image Segmentation Applications. 2025 IEEE 38TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS, v. N/A, p. 6-pg., . (24/13328-9, 23/18026-8, 16/17078-0)
TINARRAGE, RAPHAEL; PONCIANO, JEAN R.; LINHARES, CLAUDIO D. G.; TRAINA, AGMA J. M.; POCO, JORGE. ZigzagNetVis: Suggesting Temporal Resolutions for Graph Visualization Using Zigzag Persistence. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, v. 31, n. 10, p. 18-pg., . (22/13190-1, 20/07200-9, 23/18026-8, 21/07012-0, 20/10049-0, 16/17078-0)