Chapter 3 Noninvasive prenatal exome sequencing pitfalls

3.1 Introduction

The beneficial health outcomes from newborn screening programs (NBS) are indisputable. We envision future NBS will begin with prenatal genetic testing to enable care in the immediate newborn period, and open up new possibilities for in utero and genetic therapies. During pregnancy, placental DNA is released into maternal circulation, enabling noninvasive interrogation of fetal genetics (noninvasive prenatal testing, NIPT). NIPT has a well-established clinical utility in screening for common chromosomal abnormalities such as Down syndrome with high sensitivity and specificity.108 More recently, efforts have demonstrated sequencing-based testing for de novo pathogenic variants in a list of 30 genes associated with dominant Mendelian disorders109 and PCR-based testing for a small number of recessive Mendelian disorders.110 Using relative haplotype dosage analysis (RHDO),111 multiple groups have successfully diagnosed single gene disorders112114 including a new offering of noninvasive prenatal diagnosis for cystic fibrosis in the UK Public Health Service.115 RHDO typically relies on collecting parental, and ideally proband, genetic information to resolve parental haplotypes; Jang et al. demonstrated success in diagnosing Duchenne muscular dystrophy by estimating haplotypes solely from maternal long-read sequencing.113 Scotchman et al. provide an excellent review summarizing the history of noninvasive testing.84 To date, no one has reported reliable fetal genotyping purely from maternal cell-free DNA using a sequencing-based approach.

To begin NBS with prenatal genetic testing, we believe we first need a reliable noninvasive test only requiring a maternal sample. Others could reasonably argue the availability of carrier screening, and the immeasurably small risk of invasive testing,116 removes the need for noninvasive testing. Such an argument, however, dismisses (1) the ethical and practical issues surrounding the necessity of involving the biological father, (2) the fact that many genetic disorders arise due to de novo mutations, and (3) the understandable fear and apprehension around invasive testing (especially for rare conditions). Additionally, we believe the prenatal diagnosis community should focus work on sequencing-based (as opposed to PCR-based) approaches. Sequencing generalizes across disorders more easily than PCR techniques, allows multiplexing to a degree not feasible using PCR, and will only continue to decrease in cost.

Snyder et al. provide a review of previous attempts to perform noninvasive fetal genome sequencing, illustrating the cost-infeasibility and suggesting more targeted approaches such as exome sequencing (ES).111,117119 As an exploratory exercise, we performed ES on cell-free DNA (cfES) from three pregnant women with singleton fetuses.

3.2 Methods

3.2.1 Participant selection

Genetic counselors identified pregnant women with suspected genetic disorders based either on family history or fetal sonographic findings. We enrolled three women, blinded to their family history and sonographic findings. All participants were consented and enrolled at UNC Hospitals by certified genetic counselors with approval from the UNC Institutional Review Board (IRB Number: 18-2618); we do not include any identifying information in this manuscript.

3.2.2 Exome sequencing and analysis

We collected cell-free DNA from maternal plasma, prepared sequencing libraries for the Illumina platform, and performed exome capture using the IDT xGen Exome Research Panel v1.0 (Cases 1 & 2) or Agilent SureSelect Human All Exon v7 (Case 3). We processed the data using a novel analytic pipeline developed in Snakemake97 using Anaconda environments for reproducibility (provided in supplemental materials). Briefly, sequencing reads were aligned to hg38 (excluding alternate contigs) using BWA-MEM,96 then base quality scores were re-calibrated using GATK4.37,40,41 We only retained non-duplicate, properly-paired reads with unambiguous mapping and mapping quality >30 for each read. We called variants using bcftools,39 requiring basepair quality scores >20. We suggest the review by Seaby et al. for more information on the specifics of collecting and processing ES data for clinical use.120 Analyses were restricted to the regions overlapping between the IDT and Agilent capture platforms. For cell-free analyses, we required 5 alternate allele-supporting read-pairs, and at least 80 total read-pairs. Using the identified single-nucleotide variants, we applied a novel empirical Bayesian procedure to estimate the fetal fraction (FF; the proportion of placental/fetal to maternal sequencing reads). We then estimated maternal and fetal genotypes using a maximal likelihood model incorporating the FF estimate and observed proportion of minor allele (alternate) reads (PMAR).

3.2.3 Genotyping algorithm

Represent maternal and fetal genotype pairs, given by the random variable \(G\), with capital and lowercase letters, where ‘A’ and ‘B’ represent the major and minor alleles (e.g. ‘AAab’ represents the fetus uniquely heterozygous for the minor allele).

Let \(X,Y\) be random variables for major and minor allele read counts. Define the fetal fraction and PMAR as the random variables \(F\) and \(M\). Then, by definition, \(\text{E}[M] = \text{E}[Y/(X + Y)]\). It’s easily proven:

\[\begin{align} \text{E}[M \rvert G = \text{AAab}, F = f] &= \frac{f}{2} \tag{3.1} \\ \text{E}[M \rvert G = \text{ABaa}, F = f] &= \frac{1 - f}{2} \tag{3.2} \\ \text{E}[M \rvert G = \text{ABab}, F = f] &= \frac{1}{2} \tag{3.3} \\ \text{E}[M \rvert G = \text{ABbb}, F = f] &= \frac{1 + f}{2} \tag{3.4} \\ \text{E}[M \rvert G = \text{BBab}, F = f] &= 1 - \frac{f}{2} \tag{3.5} \end{align}\]

We can then rearrange equations (3.1) and (3.5) and solve for the expected fetal fraction in terms of the PMAR:

\[\begin{align} \text{E}[F \rvert G = \text{AAab}, M = m] &= 2m \tag{3.6} \\ \text{E}[F \rvert G = \text{BBab}, M = m] &= 2 - 2m \tag{3.7} \end{align}\]

Given the average population allele frequency for sequenced variants, we know the probability distribution of maternal/fetal genotypes under Hardy-Weinberg, \(\text{Pr}\{G = g\}\). As shown above, given the fetal fraction, \(F = f\), we know the expected PMAR for each genotype, \(M\). We observe the major and minor allele reads, \(\mathbb{X}\) and \(\mathbb{Y}\) respectively, and wish to estimate \(\mathbb{G}, \hat{\mathbb{G}}\).

We employ an empirical Bayesian expectation-maximization algorithm to identify loci with unique fetal heterozygosity, i.e. \(g \in \{\text{AAab}, \text{BBab}\}\). We pick reasonable starting values for the fetal fraction, \(F = f\), and the average minor allele frequency, then iteratively update the expected allele distribution and expected PMAR values until some convergence:

  1. Initialize the genotype probabilities, \(p_g^* = \text{Pr}\{G = g\}\), and the expected PMAR, \(m_g^* = m_g\), based on reasonable estimates for the average minor allele frequency and fetal fraction

  2. Update \(\hat{\mathbb{G}}\): \[ \hat{g}_i = \mathop{\text{argmax}}\limits_{g \in G}\left\{p_g^*\mathcal{L}(g \rvert m_g^*,x_i,y_i)\right\}, Y_{i} \sim \text{Bin}(x_i + y_i, m_g^*) \tag{3.8} \]

  3. Update the genotype probabilities: \[ p_g^* = \frac{\sum_i \text{I}(\hat{g} = g) + N\text{Pr}\{G = g\} - 1}{\sum_g\left\{\sum_i \text{I}(\hat{g} = g) + N\text{Pr}\{G = g\} - 2\right\}} \tag{3.9} \] where \(N\) is the weight given to the initial estimate of the genotype probability, \(\text{Pr}\{G = g\}\).

  4. Update the expected PMAR: \[ m_g^* = \frac{\sum_i y_i\text{I}(\hat{g} = g) + Nm_g - 1}{\sum_i(x_i + y_i)\text{I}(\hat{g} = g) + N - 2} \tag{3.10} \] where \(N\) is the weight given to the initial estimate of the PMAR, \(m_g\).

  5. Continue updating \(\hat{\mathbb{G}}\) (3.8), \(p_g^*\) (3.9), and \(m_g^*\) (3.10) until \(\hat{\mathbb{G}}\) converges.

  6. For all loci \(j\), such that \(\hat{g} \in \{\text{AAab}, \text{BBab}\}\), calculate \(\hat{f}_j\): \[ \hat{f}_j = \begin{cases} \displaystyle\frac{2y_j}{x_j + y_j}, & \hat{g} = AAab \\[15pt] 2 - \displaystyle\frac{2y_j}{x_j + y_j}, & \hat{g} = BBab \end{cases} \tag{3.11} \]

  7. Let \[ \hat{f} = \text{median}\left(\hat{f}_j\right) \tag{3.12} \]

  8. Calculate the expected PMAR using the fetal fraction estimate, \[ m_g = \text{E}[M|\hat{f},g] \tag{3.13} \]

  9. Finally, for all loci, \(i\), estimate \(\hat{g}_i \in \hat{\mathbb{G}}\), \[ \hat{g}_i = \mathop{\text{argmax}}\limits_{g \in G}\left\{\mathcal{L}(g \rvert m_g,x_i,y_i)\right\}, Y_{i} \sim \text{Bin}(x_i + y_i, m_g) \tag{3.14} \]

3.2.4 Data availability

The data that support the findings of this study are available on request from the corresponding author. The raw sequencing data are not publicly available due to privacy or ethical restrictions. Allele depths, with the alleles masked and genomic location rounded to 10 kilobases are available in the self-contained R121 package reproducing the analysis herein (https://github.com/daynefiler/filer2020B).

3.3 Results

Using the final set of filtered reads, we analyzed single nucleotide loci with >80x coverage and at least 5 reads supporting the alternate allele. At each analyzed site, we alternate allele sequencing depth and total sequencing depth to estimate the fetal fraction and maternal-fetal genotypes using our novel algorithm (Figure 3.1). Table 3.1 lists the known genetic diagnoses for the three cases presented. Genetic counselors recruited the three participants; investigators and cfES analysis was blinded to the eventual genetic diagnoses. In Cases 1 & 2, specific gene sequencing based on family history and sonographic findings, respectively, provided genetic diagnoses. To date, Case 3 does not have a genetic diagnosis. We learned the mother in Case 1 carries a deletion of exon 1 in the gene most-often responsible for Menke’s syndrome (ATP7A). Neither exome capture platform targets ATP7A exon 1; therefore, cfES could not have identified the diagnosis for Case 1 with the platform used. In Case 2, we identified the causal variant using cfES. In this case, we correctly genotyped the fetus, but lacked the power to make the genotyping call with any level of confidence acceptable for clinical use (Figure 3.1B, note the widely-overlapping distributions at the causal variant). We did not identify any known pathogenic variants in the sequencing of Case 3, and despite performing genome sequencing on the newborn, we still do not have a genetic diagnosis for the family.

Distribution of observed proportion of minor allele reads (PMAR) values for the three cases across the possible maternal-fetal genotype pairs. Uppercase letters give the estimated maternal genotype, lowercase letters give the estimated fetal genotype; ‘A/a’ indicates the reference allele, ‘B/b’ indicates the alternate allele. Solid lines show the normal approximation for the theoretical distribution of binomial probabilities, given the frequency of the estimated genotypes. The vertical line in (B) shows the observed PMAR for the known pathogenic variant, rs140468248.

Figure 3.1: Distribution of observed proportion of minor allele reads (PMAR) values for the three cases across the possible maternal-fetal genotype pairs. Uppercase letters give the estimated maternal genotype, lowercase letters give the estimated fetal genotype; ‘A/a’ indicates the reference allele, ‘B/b’ indicates the alternate allele. Solid lines show the normal approximation for the theoretical distribution of binomial probabilities, given the frequency of the estimated genotypes. The vertical line in (B) shows the observed PMAR for the known pathogenic variant, rs140468248.

Table 3.1: Case summaries. GA: gestational age at the time of blood draw for cfES. FF: estimated fetal fraction. Depth: median depth used to estimate genotypes (does not include duplicated/filtered reads). %Dup: percentage of total mapped read pairs discarded as PCR and/or optical duplicates. %Filt: percentage of total mapped read pairs discarded for improper pairing and/or mapping quality.
GA Clinical findings Genetic diagnosis FF Dep %Dup %Filt
1 32w2d 5 prior pregnancies affected with X-linked recessive Menke’s syndrome Menke’s syndrome; del. ATP7A exon 1 0.117 241 42.80 21.96
2 24w5d Fetal sonogram at 21w5d showed femoral bowing with shortened length (<3% for GA) bilaterally Osteogenesis imperfecta type VIII; P3H1 c.1120G>T (rs140468248) 0.122 152 33.32 22.09
3 34w0d Fetal sonogram at 19w0d showed bilateral club foot with bilateral upper limb arthrogryposis None, to date, despite exome and genome sequencing of newborn 0.169 330 53.67 32.65

In Case 3, in addition to cfES, we performed exome sequencing (ES) on fetal, maternal, and paternal samples. Based on previous work demonstrating the differential length of maternal and fetal fragments,122125 we interrogated the distribution of presumed maternal and fetal reads (Figure 3.2). We identified maternal and fetal reads by identifying sites with unique heterozygosity in the direct maternal and fetal ES results; at the informative sites, we extracted reads supporting the allele unique to the mother or fetus. In total, we identified 654,619 maternal reads and 279,508 fetal reads. We found, as others have, a higher proportion of fetal reads falling below 150 basepairs; however, we also observed a slightly higher proportion of longer reads, as well.

Distribution of maternal versus fetal fragment length in Case 3. (A) shows the density; (B) shows the emperic cumulative distribution. The horiztonal axis shows the fragment length (insert size taken from aligned read-pairs). Blue lines show maternal reads, orange lines show fetal reads. We only included cfES reads supporting alleles unique to the mother or fetus, as identified from the direct maternal and fetal ES.

Figure 3.2: Distribution of maternal versus fetal fragment length in Case 3. (A) shows the density; (B) shows the emperic cumulative distribution. The horiztonal axis shows the fragment length (insert size taken from aligned read-pairs). Blue lines show maternal reads, orange lines show fetal reads. We only included cfES reads supporting alleles unique to the mother or fetus, as identified from the direct maternal and fetal ES.

Rabinowitz et al. proposed the Hoobari method which incorporates fragment lengths into fetal genotype estimates,125 finding the difference in accuracy varied from -0.25% to 1.89% when using versus not using fragment length in their exome analyses. To explore the utility of correcting for fragment length in our analysis, we interrogated the PMAR as a function of the short read proportion (fraction of reads with insert sizes less than 140 basepairs; Figure 3.3). We selected 140 as the cutoff based on the Hoobari algorithm. Overall, we found no meaningful relationship between the short read proportion and the observed PMAR and chose not to incorporate fragment length into our genotype estimates.

Proportion of minor allele reads (PMAR) as a function of the short read proportion for genotypes estimated as `AAab.’ Short reads defined as fragments less than 140 basepairs. (A-C) show Cases 1 to 3, respectively. Gray points show the individual sites; blue contour lines show the two-dimensional distribution of values.Proportion of minor allele reads (PMAR) as a function of the short read proportion for genotypes estimated as `AAab.’ Short reads defined as fragments less than 140 basepairs. (A-C) show Cases 1 to 3, respectively. Gray points show the individual sites; blue contour lines show the two-dimensional distribution of values.Proportion of minor allele reads (PMAR) as a function of the short read proportion for genotypes estimated as `AAab.’ Short reads defined as fragments less than 140 basepairs. (A-C) show Cases 1 to 3, respectively. Gray points show the individual sites; blue contour lines show the two-dimensional distribution of values.

Figure 3.3: Proportion of minor allele reads (PMAR) as a function of the short read proportion for genotypes estimated as `AAab.’ Short reads defined as fragments less than 140 basepairs. (A-C) show Cases 1 to 3, respectively. Gray points show the individual sites; blue contour lines show the two-dimensional distribution of values.

Returning to Case 3, we interrogated the fetal genotyping accuracy at all sites with cell-free genotype estimates and reliable calls from the direct fetal sample. Overall, we found a 50.91% accuracy (Table 3.2). Table 3.3 provides the full set of maternal, fetal, and cell-free calls.

Table 3.2: Case 3 fetal versus cell-free genotype calls. ‘0’ represents the major allele; ‘1’ represents the minor allele. Sites with cell-free estimates and reliable direct fetal calls included (reliable defied as passing all quality checks and having a total sequencing depth greater than 30).
Cell-free
0/0 0/1 1/1
Fetal 0/0 1,063 1,857 9
0/1 3,598 7,079 1,454
1/1 76 2,197 1,391
Table 3.3: Maternal, fetal, and cell-free genotype calls. ‘0’ represents the major allele; ‘1’ represents the minor allele. Sites with cell-free estimates and reliable direct fetal calls included (reliable defied as passing all quality checks and having a total sequencing depth greater than 30).
Maternal Fetal Cell-free N
0/0 0/0 0/0 468
0/1 1,159
0/1 0/0 64
0/1 352
1/1 0/0 1
0/1 2
0/1 0/0 0/0 387
0/1 107
1/1 6
0/1 0/0 3,072
0/1 1,967
1/1 713
1/1 0/0 68
0/1 458
1/1 1,291
1/1 0/0 0/1 1
1/1 2
0/1 0/0 3
0/1 1,308
1/1 648
1/1 0/1 1,601
1/1 23

3.4 Discussion

Without the ability to reliably exclude maternal DNA fragments, noninvasive sequencing-based methods to genotype the fetus either require additional sequencing of parental samples or distinguishing genotypes by the proportion of minor allele reads (PMAR). Here, we make no attempt to utilize parental genetic information and demonstrate the difficulty of inferring the genotypes directly from the PMAR. We model the PMAR as a binomial proportion; given the fetal fraction, one can prove the true PMAR defines the maternal and fetal genotypes (supplemental document).

The theoretical bounds of the binomial distribution, therefore, confine our ability to discriminate maternal-fetal genotypes. Using the normal approximation for the binomial variance (valid when the number of observations, i.e. sequencing depth, times the binomial proportion, i.e. PMAR, is greater than 10), we can clearly explain the poor results we observed (Figures 3.4 and 3.5). At sequencing depths up to 500x, the 95% confidence intervals on PMAR distributions still overlap for fetal fractions up to roughly 0.17 (Figure 3.4). When we calculate the degree of distribution overlap (a proxy for classification error rate), we see required sequencing depths in excess of 8,000x for low fetal fraction samples (Figure 3.5). We, therefore, cannot expect cell-free sequencing to reliably differentiate genotypes without substantially higher depth or additional genetic information. No amount of cleverness in the analysis can overcome the fundamental variance bounds when estimating binomial proportions.

95% confidence intervals on the binomial proportions for possible maternal-fetal genotype pairs across increasing fetal fractions. Confidence intervals represent a sequencing depth of 500x. Average fetal fractions by gestational age (in weeks) given in light gray.126

Figure 3.4: 95% confidence intervals on the binomial proportions for possible maternal-fetal genotype pairs across increasing fetal fractions. Confidence intervals represent a sequencing depth of 500x. Average fetal fractions by gestational age (in weeks) given in light gray.126

Expected misclassification rate (Weitzman overlapping coefficient; i.e. the area of overlapping distributions in Figure 3.4) considering ABab versus ABbb as a function of sequencing depth and fetal fraction. The dashed horizontal line shows 5% error. The theoretical error rates for ABab vs ABaa are symmetric and equal; however, the frequency of errors will depend on the population frequency of the reference versus alternate allele.

Figure 3.5: Expected misclassification rate (Weitzman overlapping coefficient; i.e. the area of overlapping distributions in Figure 3.4) considering ABab versus ABbb as a function of sequencing depth and fetal fraction. The dashed horizontal line shows 5% error. The theoretical error rates for ABab vs ABaa are symmetric and equal; however, the frequency of errors will depend on the population frequency of the reference versus alternate allele.

The sequencing herein suffers from three problems: (1) inadequate sequencing depth; (2) biased PMAR values from the removal of duplicate reads; (3) errors in sequencing and/or PCR. We have already illustrated the inadequate depth, but emphasize that the theoretical results we present speak to the final depths (not the raw sequencing depth). In our three cases, we excluded over half the reads taken off the sequencer due to sequencing quality thresholds (Table 3.1). We observe the evidence of problems (2) and (3) by observing the high proportion of both duplicate reads and PMAR values outside the theoretic distributions. Additionally, we observed very poor accuracy in the Case 3 genotype estimates.

Typical sequencing workflows start with randomly fragmenting DNA molecules to build sequencing libraries. Standard bioinformatic practices suggest we remove read-pairs with identical endpoints, because the duplicate read-pairs more likely represent PCR amplification of a single molecule than two molecules with the same fragmentation. Cell-free DNA molecules are shorter than nuclear DNA, not requiring manual fragmentation, and have a non-random distribution of endpoints.123 Therefore, compared to standard sequencing libraries, the likelihood of observing true duplicates in cell-free libraries increases and we cannot necessarily assume duplicates represent PCR amplification. However, for this work we have no way of differentiating reads representing true duplicate molecules versus PCR duplicates and thus excluded duplicate reads from our analysis.

Assuming adequate depth and appropriate handling of duplicate reads and sequencing errors, incorporating the fragment length into the statistical model may prove more beneficial. The high variability of the binomial distribution for small \(n\) obfuscates any meaningful relationship between fragment length and PMAR in our data. We reiterate, however, incorporating fragment length may give better estimates of the binomial proportion but cannot decrease variance beyond the distribution bounds.

To solve the above issues, we advocate a more targeted approach with much greater sequencing depth and unique molecular identifiers. Unique molecular identifiers allow identification of sequencing errors and differentiate true versus artifactual duplicate reads. Given the depth requirements for estimating fetal genotypes by the PMAR, and the challenge of variants of uncertain clinical significance, we advocate against broad sequencing modalities on noninvasive samples. Recognizing that all capture methods introduce bias in the relative sequencing efficiency of different targeted regions,120 the sequencing depths needed for noninvasive fetal genotyping necessitate a targeted approach. Despite the challenges raised by this work, we believe assessing hundreds to thousands of basepairs, rather than the tens of millions targeted in ES, will prove economical and clinically reliable. Doing so, we hope, will foster population-level screening for Mendelian disorders during the prenatal period and, ultimately, unlock new avenues in the treatment of these disorders.