Doctoral defense (Disputation) of Jochen Sieg on 22 March 2024, 12:30
22 March 2024, by Reinhard Zierke
Invitation to the disputation, open to members of the university, held as part of the doctoral examination procedure of Jochen Sieg
Dissertation title: Methods for Processing and Analyzing Protein Structure Collections for Data-Driven Structure-Property Relationship Modeling
Effective prediction of the properties of biomolecules could answer crucial research questions: Which biomolecule would be an effective drug for a particular disease? Will a mutation in a patient be pathogenic? Which biomolecule can break down materials like plastics? The structure-property relationship paradigm is a central concept stating that a biomolecule’s structure determines its properties. Especially for proteins, the so-called building blocks of life, the amount of high-quality three-dimensional structure data has increased tremendously in recent years. Data-driven prediction methods, like machine learning, are a promising choice for predicting properties from this structure data. However, such methods are subject to data limitations and need protein representations suited to the nature and properties of proteins.
In this work, methods were developed to analyze and process data sets in order to improve data-driven property prediction. First, a machine learning-based interpretability method was developed to analyze the predictive features of a data set for a given property-prediction task. The technique was first applied to analyze unbiasing strategies in benchmark data sets for structure-based virtual screening in drug discovery. It was then extended with the Shapley values framework and used to interpret stabilizing protein adaptations for protein engineering. Besides revealing important domain-specific trends, the analyses demonstrated that data limitations are a profound bottleneck in structure-property modeling.
Obtaining more data is often not possible. An effective alternative can be to process the existing data to derive better protein representations for the task at hand. Two processing methods were developed that describe relevant protein variability using structure ensembles. The first method enumerates alternative conformations from AltLoc annotations to represent proteins’ inherent flexibility. The second method constructs structure ensembles through the similarity of residue 3D micro-environments to represent the structural changes upon single mutations. Both methods can be applied to entire protein structure collections and provide essential data and an improved representation of proteins for various property-prediction tasks, method development, and molecular modeling.
The presentation will be given in English.
Date and time: Friday, 22 March 2024, at 12:30
Location: ZBH, Albert-Einstein-Ring 8-10, Room 005
Supervisor: Prof. Dr. Matthias Rarey
Prof. Dr. Matthias Rarey
Chair of the Doctoral Committee for Computer Science (Fachpromotionsausschuss Informatik)