
Multiparametric analysis of the phase problem in protein crystallography by deep learning

Grant number: 18/23675-7
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Effective date (Start): February 01, 2019
Effective date (End): January 31, 2020
Field of knowledge: Biological Sciences - Biochemistry
Principal Investigator: Andre Luis Berteli Ambrosio
Grantee: Mateus Piovezan Otto
Host Institution: Instituto de Física de São Carlos (IFSC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated research grant: 13/07600-3 - CIBFar - Center for Innovation in Biodiversity and Drug Discovery, AP.CEPID


The phase problem is notorious in X-ray protein crystallography. Fundamentally, technological limitations of the detection systems mean that only the intensities of the diffracted waves are recorded, and the information on the phases of the waves scattered constructively by the components of the crystal is lost. As a consequence, the electron density distribution in the unit cell cannot be calculated directly from the measured data. Currently, two experimental approaches can be applied to overcome this problem: (I) partial replacement of the ordered aqueous solvent by electron-dense ions (metals or halides), or (II) selective quantification of the dispersive (wavelength-dependent) component of the atomic scattering factor. Alternatively, prior information, in the form of known crystal structures that are functionally related or homologous to components of the crystal, may serve as the source of an initial set of phases. Although challenging to apply, these methods have already enabled the determination of more than a hundred thousand atomic models of the most diverse proteins and their complexes. In this project, drawing on the structural information already available in the Protein Data Bank, we propose a multiparametric analysis of the phase problem based on machine learning with neural networks (or decision trees). Our hypothesis is that an extensive statistical mapping of observations about known phase distributions, used as a predictive model, can support conclusions about the target phase values in unsolved data sets, thus eliminating the need for additional experiments or previously known structures. Recent advances in machine learning, some of which are listed in this proposal, have yielded solutions to problems previously considered insurmountable and therefore substantiate our proposal.
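To make the loss of information concrete, here is a minimal one-dimensional sketch in Python. It is not part of the project's method. It assumes a hypothetical toy "unit cell" with two point-like atoms on a 16-point grid, and the function names are illustrative only. The density is recovered exactly when amplitudes and phases are both available, but not when the phases are discarded and only the measurable amplitudes remain.

```python
import cmath
import math


def structure_factors(density):
    """Forward transform: F(h) = sum_x rho(x) * exp(2*pi*i*h*x/N)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def fourier_synthesis(amplitudes, phases):
    """Inverse transform: rho(x) = (1/N) * sum_h |F(h)| * exp(i*phi(h)) * exp(-2*pi*i*h*x/N)."""
    n = len(amplitudes)
    density = []
    for x in range(n):
        total = sum(
            cmath.rect(amp, phi) * cmath.exp(-2j * math.pi * h * x / n)
            for h, (amp, phi) in enumerate(zip(amplitudes, phases))
        )
        density.append(total.real / n)
    return density


def max_error(a, b):
    """Largest pointwise difference between two density maps."""
    return max(abs(u - v) for u, v in zip(a, b))


# Hypothetical toy density (an assumption for illustration, not real data):
# two atoms of different weight on a 16-point grid.
true_density = [0.0] * 16
true_density[3] = 6.0
true_density[10] = 8.0

factors = structure_factors(true_density)
amplitudes = [abs(f) for f in factors]            # recoverable from measured intensities
true_phases = [cmath.phase(f) for f in factors]   # lost in a real experiment

with_phases = fourier_synthesis(amplitudes, true_phases)
without_phases = fourier_synthesis(amplitudes, [0.0] * len(amplitudes))

print("error with true phases:", max_error(with_phases, true_density))
print("error with phases set to zero:", max_error(without_phases, true_density))
```

The first error is at the level of floating-point rounding, while the second is of the same order as the atomic weights themselves. In other words, the amplitudes alone do not determine where the atoms are.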
