Streamlining minimal bacterial genomes: analysis of the pan-bacterial essential genome, and a novel strategy for random deletions in Mycoplasma pneumoniae
- Daniel Joseph Shaw
- Luis Serrano Pubull (Director)
- Maria Lluch Senar (Co-director)
Defending university: Universitat Pompeu Fabra
Defence date: 13 December 2019
- Manel Porcar (Chair)
- Marc Guell Cargol (Secretary)
- Luis García Morales (Member)
Type: Thesis
Abstract
This thesis focuses on tools designed to increase our knowledge of bacterial genetics, and on how this knowledge can help in the process of genetic engineering. It is split into two main areas: the first concerns the development of a methodology that allows for random genome reductions in Mycoplasma pneumoniae; the second is an exploration of the essential genes across the bacterial domain.
In the first part, we document the development and iteration of a novel protocol for the random deletion of genetic material in M. pneumoniae. Traditionally, genome reduction methodologies rely on an a priori justification of what to delete. However, these assumptions may be biased by our incomplete knowledge of all gene functions and of their epistatic interactions with the rest of the genome. As such, our determinations of which areas of a genome can be removed successfully may be neither accurate nor optimal. To address this, we developed a methodology that removes sections of the genome at random, thus bypassing any implicit bias about what to delete. We show that the methodology is effective and describe the iterations we undertook to improve its efficacy. We further show that it is self-selective for strains harbouring a genome reduction and that it can produce a high level of variation in both the size and the location of deletions, and we outline a modified sequencing protocol capable of detecting and localising deletions in a heterogeneous pool in a high-throughput manner.
The second part of the thesis concerns the identification of trends in which genes are considered essential across the bacterial domain. Over the last two decades, the genomes of thousands of bacteria have been fully sequenced, and despite their great diversity they still share commonalities at the level of shared genes. However, there is no data on how essential to life these near-universal genes are.
Far fewer bacterial species have had their essential genes identified, but we compiled as many as we could find that shared a common gene disruption and sequencing methodology. We constructed a database of genes from a sample of 47 species spanning 8 phyla, clustering the genes into groups of homologs and assigning essentiality data from the individual studies and functional data from the Clusters of Orthologous Genes (COG) database. We then interrogated this database for trends in which genes were conserved and which were essential. Our list of highly conserved genes agrees well with those found by previous groups. However, when essentiality is considered, we find very few genes that can be considered universally essential, and the vast majority of these pertain to the translational machinery. We also found a subset of genes that are very highly conserved but rarely essential to cell survival. With regard to genome size versus essentiality, we found that while there is little correlation between the number of genes a genome contains and the number of essential genes, the composition of a bacterium's essential genome does change with complexity. The essential genes of a minimal genome are dominated by transcription, translation and DNA replication/repair genes, but as complexity increases, the number of essential genes relating to cellular signalling and housekeeping rises, along with a modest increase in metabolism genes.
These two parts can work synergistically to improve our knowledge of genome engineering. Random genome deletions can both help minimise bacterial genomes and provide information on more complex networks of essentiality, because they delete multiple genes simultaneously. This knowledge of essentiality can then be queried against a larger database to begin uncovering which networks or individual genes can be deleted, or are at least non-essential, in a large number of species. This in turn can help us build a greater understanding of which systems are more viable deletion targets in the future, and which appear to have functionalities that we should strive to preserve.
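The database interrogation described above can be sketched in Python. It has two parts: profiling each homolog cluster by conservation and by essentiality, and querying for genes that are non-essential across many species. This is a minimal sketch and not the thesis pipeline. The record layout (`HomologCluster`), the functions `profile`, `classify`, `deletion_candidates` and `essential_composition`, every threshold, and the four-species toy dataset are all hypothetical and were invented for illustration. The only external convention used is that COG functional categories are single letters (for example J for translation).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class HomologCluster:
    """One group of homologous genes (hypothetical record layout).

    essential_in maps each species that carries a member of the cluster to
    True if that member was called essential in the species' screen.
    """
    name: str
    cog_category: str             # single-letter COG functional category
    essential_in: Dict[str, bool]


def profile(cluster: HomologCluster, n_species: int) -> Tuple[float, float]:
    """Return (conservation, essentiality) for one cluster.

    conservation: fraction of all sampled species that carry a homolog.
    essentiality: fraction of those carriers in which it was essential.
    """
    carriers = len(cluster.essential_in)
    essential = sum(cluster.essential_in.values())
    conservation = carriers / n_species
    essentiality = essential / carriers if carriers else 0.0
    return conservation, essentiality


def classify(cluster: HomologCluster, n_species: int,
             conserved_min: float = 0.9, essential_min: float = 0.9,
             rarely_max: float = 0.25) -> str:
    """Coarse label for a cluster; every threshold here is an assumption."""
    conservation, essentiality = profile(cluster, n_species)
    if conservation < conserved_min:
        return "variable"
    if essentiality >= essential_min:
        return "conserved, essential"
    if essentiality <= rarely_max:
        return "conserved, rarely essential"
    return "conserved, sometimes essential"


def deletion_candidates(clusters: List[HomologCluster],
                        min_species: int = 3) -> List[str]:
    """Clusters seen in >= min_species species and never called essential."""
    return [c.name for c in clusters
            if len(c.essential_in) >= min_species
            and not any(c.essential_in.values())]


def essential_composition(clusters: List[HomologCluster],
                          species: str) -> Counter:
    """Count one species' essential genes per COG category."""
    return Counter(c.cog_category for c in clusters
                   if c.essential_in.get(species, False))


# Invented toy data: four species, four clusters. NOT results from the thesis.
N_SPECIES = 4
CLUSTERS = [
    HomologCluster("cluster_1", "J", {"sp1": True, "sp2": True, "sp3": True, "sp4": True}),
    HomologCluster("cluster_2", "T", {"sp1": False, "sp2": False, "sp3": False, "sp4": True}),
    HomologCluster("cluster_3", "S", {"sp1": False, "sp2": False, "sp3": False, "sp4": False}),
    HomologCluster("cluster_4", "E", {"sp1": True, "sp2": False}),
]

if __name__ == "__main__":
    for c in CLUSTERS:
        print(c.name, classify(c, N_SPECIES))
    print("deletion candidates:", deletion_candidates(CLUSTERS))
    print("sp4 essential genes by COG:", essential_composition(CLUSTERS, "sp4"))
```

Conservation and essentiality are kept as separate measures so that a cluster can be highly conserved yet rarely essential. In the toy data, cluster_2 and cluster_3 fall into that class, and cluster_3 is the only cluster returned by `deletion_candidates`. Counting each species' essential genes per COG category with `essential_composition` is one simple way to compare how essential-genome composition shifts between smaller and larger genomes.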