Data Governance for Data Science Processing and Analysis Environments

Grant number: 25/11608-7
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)
Start date: July 01, 2025
End date: June 30, 2029
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Kelly Rosa Braghetto
Grantee: Rafael Hideki Suguimoto
Host Institution: Instituto de Matemática e Estatística (IME). Universidade de São Paulo (USP). São Paulo, SP, Brazil
Associated research grant: 23/18026-8 - Center for Data Science in Public Statistics, AP.CCD

Abstract

Data can be considered a new form of fuel in the modern era. The Big Data paradigm drives organizations to become data-driven by implementing real-time data processing to achieve immediate results, applying predictive analytics to forecast trends and optimize operations, and leveraging machine learning technologies. Despite these efforts, organizations often struggle to fully realize the potential of their analytical data. This is largely due to their traditional reliance on monolithic data architectures for storing and processing data from multiple sources, and on centralized teams responsible for handling tasks across various organizational domains. Such approaches may lead to significant challenges in data governance, compromising the quality, integrity, security, and availability of organizational data.

Efficient techniques for collecting, preparing, and integrating data are essential for governing high-frequency and high-granularity data from diverse sources [1]. Furthermore, there is a growing need to decentralize data architectures, enabling different data assets to be interconnected, easily accessible, and carefully monitored by the respective domain experts within the data ecosystem. This decentralization fosters opportunities to observe and analyze additional relationships among data and helps prevent the formation of data silos [2]. To this end, solutions such as DataOps and Data Mesh can be adopted.

DataOps is derived from the DevOps movement, a collaborative and multidisciplinary effort within organizations to automate the continuous delivery of new software versions while ensuring their reliability and correctness [3]. In this context, DataOps incorporates principles such as communication, automation, cross-functional collaboration, continuous delivery, and continuous improvement. It aims to shorten the development lifecycle of data analytics by optimizing end-to-end processes, reducing costs, and supporting the delivery of high-quality data products [1].

This methodology can be combined with the concept of Data Mesh, an architectural and organizational data solution based on decentralization by data domains, where each domain is responsible for managing its own data and data products. Data Mesh treats data as a product to ensure the delivery of high-quality outputs, applies general-purpose tools and infrastructure to each domain to reduce the need for specialized expertise, and enforces federated governance to ensure interoperability across domains while maintaining robust data governance [4].

Therefore, in this doctoral project, we aim to contribute to the advancement of Data Mesh research in two main areas. The first involves the implementation of an open-source, decentralized data architecture for scenarios in which data governance, transparency, and clearly defined responsibilities are vital for managing sensitive data. The second aims to address the gap in academic literature regarding the technical aspects of implementing this approach by proposing a methodology for Data Mesh implementation, along with a comprehensive mapping of technologies, methodologies, challenges, and existing solutions.
