Structural pattern recognition for chemical-compound virtual screening

  1. García Hernández, Carlos Jesús
supervised by:
  1. Francesc Serratosa Casanelles Supervisor

Defense university: Universitat Rovira i Virgili

Defense date: 12 November 2021

Committee:
  1. Benoit Gaüzère Chair
  2. Benjami Martorell Masip Secretary
  3. Jesús Vicente de Julián Ortiz Member

Type: Thesis

Teseo: 701714

Abstract

Studying molecules and predicting their properties is an open problem in chemistry and drug design. Using computers to perform those analyses is known as cheminformatics. It aims to tackle the dimensionality problem and reduce the time and resources required to analyze millions of molecules. Drug discovery and design must satisfy important safety and efficacy objectives; it is therefore inherently a multi-objective optimization process, which makes machine learning and graph theory standard tools in cheminformatics research. Molecules are naturally structured as networks, which makes them well suited to study through graph representations, where nodes represent atoms and edges represent chemical bonds. An alternative to this straightforward representation is the extended reduced graph, which summarizes chemical structures using pharmacophore-type node descriptions to encode the relevant molecular properties. Once we have a suitable way to represent molecules as graphs, we need to choose the right tool to compare and analyze them. Graph edit distance is used to solve the error-tolerant graph matching problem; it estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications (known as edit operations) each have an associated edit cost (also known as a transformation cost), which must be determined depending on the problem. This study investigates the effectiveness of a graph-only molecular comparison that employs extended reduced graphs and graph edit distance as a tool for ligand-based virtual screening applications. Such applications estimate the bioactivity of a chemical from the bioactivity of similar compounds. An essential part of this study focuses on using machine learning and natural language processing techniques to optimize the transformation costs used in the molecular comparisons with the graph edit distance.
Overall, this work presents a framework that combines graph reduction and comparison with optimization tools and natural language processing to identify bioactivity similarities in a structurally diverse group of molecules. We confirm the efficiency of this framework with several cheminformatics tests applied to regression and classification problems over different publicly available datasets.
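
To make the role of the transformation costs concrete, the following is a minimal Python sketch of graph edit distance on two tiny labeled graphs. It is not the thesis's implementation: the node labels ("aromatic", "donor", "acceptor"), the cost values, and the function names are all assumptions chosen for illustration, and the exhaustive search is only practical for very small graphs.

```python
from itertools import permutations

def graph_edit_distance(g1, g2, node_subst, node_indel=1.0, edge_indel=1.0):
    """Exact GED by exhaustive search (illustrative; tiny graphs only).

    A graph is a pair (labels, edges): labels maps node -> label and
    edges is a set of frozenset({u, v}) pairs (undirected, unlabeled).
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    nodes1 = list(labels1)
    # Each g1 node is sent to a distinct g2 node, or to None (deletion).
    targets = list(labels2) + [None] * len(nodes1)
    best = float("inf")
    for assignment in set(permutations(targets, len(nodes1))):
        mapping = dict(zip(nodes1, assignment))
        cost = 0.0
        # Node substitutions and deletions.
        for u, v in mapping.items():
            cost += node_indel if v is None else node_subst(labels1[u], labels2[v])
        # g2 nodes that received no g1 node must be inserted.
        used = {v for v in mapping.values() if v is not None}
        cost += node_indel * (len(labels2) - len(used))
        # Count g1 edges whose image is an existing g2 edge.
        preserved = 0
        for edge in edges1:
            a, b = tuple(edge)
            image = frozenset((mapping[a], mapping[b]))
            if None not in image and image in edges2:
                preserved += 1
        cost += edge_indel * (len(edges1) - preserved)  # edge deletions
        cost += edge_indel * (len(edges2) - preserved)  # edge insertions
        best = min(best, cost)
    return best

def unit_subst(a, b):
    # Assumed baseline: any label change costs 1.
    return 0.0 if a == b else 1.0

def tuned_subst(a, b):
    # Assumed "tuned" cost: donor <-> acceptor is cheaper than other changes.
    if a == b:
        return 0.0
    return 0.5 if {a, b} == {"donor", "acceptor"} else 1.0

# Hypothetical reduced graphs with pharmacophore-style node labels.
g1 = ({0: "aromatic", 1: "donor", 2: "acceptor"},
      {frozenset((0, 1)), frozenset((1, 2))})
g2 = ({0: "aromatic", 1: "acceptor"},
      {frozenset((0, 1))})

d_unit = graph_edit_distance(g1, g2, unit_subst)    # 3.0
d_tuned = graph_edit_distance(g1, g2, tuned_subst)  # 2.5
```

Lowering a single substitution cost changes the distance between the same two graphs from 3.0 to 2.5, which is why the choice of transformation costs affects which compounds are judged similar.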