Front matter

Abstract

The discovery of DNA fundamentally changed the world, revolutionizing our understanding of life and the practice of medicine. After a century of studying DNA, medicine entered a new frontier with the completion of the nearly 20-year, billion-dollar effort to sequence the first human genome. We can now sequence a human genome in a matter of days for hundreds (not billions) of dollars. Technological advances and medical geneticists’ robust efforts to interpret human variation have led to exponential growth in clinical sequencing. The medical genetics community currently faces three primary challenges: (1) variant interpretation; (2) overcoming difficult detection problems (e.g., structural variants and low-frequency variants); and (3) moving beyond a linear, poorly representative reference genome. The work herein addresses how to overcome two specific detection problems.

First, I present a novel approach for detecting exon-level copy number variation using exome sequencing. The vast majority of existing sequencing data lacks the power to detect small copy number variants, leaving a significant blind spot in our understanding of genetic variation. I demonstrate how modifying the exome capture step to capture multiple samples simultaneously significantly reduces the inter-sample variance and improves copy number discrimination. I then demonstrate the utility of a novel statistical algorithm designed specifically for multiplexed-capture exome sequencing.

Second, I outline the shortcomings of noninvasive exome sequencing in prenatal genetics. Utilizing cell-free fetal DNA in maternal circulation, we can diagnose a wide range of genetic conditions noninvasively. Prior efforts have suggested the possibility of noninvasive fetal genome and exome sequencing. However, to date, no one has demonstrated accurate fetal genotyping purely from cell-free DNA. I use probability theory to demonstrate why these efforts have failed and suggest a path forward for noninvasive fetal genotyping.

Finally, I briefly outline my ongoing work in prenatal genetics and propose a validation study to further interrogate exon-level copy number variation.

Dedication

To my loving and endlessly supportive wife and parents,

it’s all happening…


The only true currency in this bankrupt world

is what you share with someone else

when you’re uncool.

– Cameron Crowe

Acknowledgments

This work would not have been possible without the support of my PhD mentor, Kirk Wilhelmsen. Kirk provided mentorship, advice, and care in and out of the laboratory. Most importantly, Kirk encouraged me to pursue research and avenues to best serve my career – even when my path did not further his own research program.

My scientific training began with Sara Heggland at the College of Idaho. I spent four years working under Sara’s mentorship; she, more than anyone, taught me the scientific method and galvanized my path to research. I am forever grateful she took me under her wing. After I graduated from college, David Reif, Richard Judson, Matthew Martin, and others at the National Center for Computational Toxicology (NCCT) allowed me to find my passion for data science. Working at NCCT gave me the practical skills to thrive in a data science PhD.

Thank you to Stan Ahalt and all of the people at the Renaissance Computing Institute for welcoming and supporting me, particularly Chris Bizon and Phil Owen. Thank you to my committee, Bradford Powell, Stan Ahalt, Neeta Vora, and Yun Li. In addition, I want to thank all of my collaborators for this work, especially: Jonathan Berg, Alicia Brandt, Piotr Mieczkowski, Karen Dorman, Amber Ivins, and Fengshen Kuo.

Thank you to the UNC MD-PhD program for accepting and supporting me through both medical school and my PhD training.

Finally, thank you to my family and friends, whose support outside of training has enabled and enriched my life and success.

Preface

I have submitted portions of Chapters 2 and 3 for peer-reviewed publication; both manuscripts are currently under review. Both manuscripts represent a collaborative effort, particularly in the data collection, but I independently performed all analyses and produced the text, code, figures, and tables herein. The manuscript derived from Chapter 2 includes the following authorship: Dayne L Filer, Fengshen Kuo, Alicia T Brandt, Christian R Tilley, Piotr A Mieczkowski, Kimberly Robasky, Chris Bizon, Jeffery L Tilson, Bradford C Powell, Darius M Bost, Clark D Jeffries, Jonathan S Berg, and Kirk C Wilhelmsen. The manuscript derived from Chapter 3 includes the following authorship: Dayne L Filer, Piotr A Mieczkowski, Alicia Brandt, Kelly L Gilmore, Bradford C Powell, Jonathan S Berg, Kirk C Wilhelmsen, and Neeta L Vora.