Introduction

Preconception carrier screening (PCS) refers to genetic testing of couples who do not have an a priori increased risk of having a child with a recessive genetic disease before they attempt to conceive.

More than 1,300 (autosomal and X-linked) recessive disorders have been identified so far,1 and these vary greatly in severity and age of onset. Although individually uncommon in general populations, Mendelian diseases are collectively reported to account for ~20% of infant mortality and ~10% of pediatric hospitalizations.2 Detection of carrier status enables identification of couples with a 25% risk of affected offspring. The primary aim of PCS is to provide such couples with informed reproductive choices, including prenatal diagnosis, preimplantation genetic diagnosis, accepting the genetic risk and preparing themselves for (the possibility of) having a child with a certain disease, sperm/egg donation, adoption, or refraining from having children. In certain communities, particularly those with a high incidence of specific severe diseases, disease prevention may be viewed as the primary goal.3 Another potentially favorable consequence of carrier screening is enabling early perinatal diagnosis and treatment that can profoundly reduce morbidity and mortality. The preconception period is considered the optimal timing for carrier screening because only then are all aforementioned reproductive options still applicable.

The prevalence of offspring with major anomalies is higher among consanguineous couples than among nonconsanguineous couples,4 mostly because of autosomal recessive disorders. Increased genetic risks also apply to subpopulations with high carrier frequencies of specific mutations.5 The concept of PCS has been applied for decades in some of these populations.6 Typically, one or a few diseases with a relatively high incidence in a population, or subpopulation, are tested in a targeted manner.5 However, with the introduction of next-generation sequencing techniques, simultaneous testing of much larger numbers of genes has become possible and cost-effective. Until now, mainly targeted, gene panel-based, next-generation sequencing approaches have been described.2,7,8,9 An approach using targeted analysis after untargeted whole-exome sequencing (WES) has been explored in one study that filtered for identical mutations in approximately 500 genes in both partners of four consanguineous couples.10 However, a systematic assessment of the use of WES for PCS has not been performed so far. In our study, we aimed to investigate WES for PCS in a broader setting and to develop a filter strategy that could be used for consanguineous and nonconsanguineous couples.

Materials and Methods

Different filter strategies for PCS were explored in WES data from eight consanguineous couples and five fictive nonconsanguineous couples. The resulting proposed filter strategy was applied to another 20 nonconsanguineous fictive couples.

Subjects

The eight consanguineous couples all have a child with suspected mitochondrial or mitochondrial-like disease. They gave consent for WES, including for their own DNA, in a research setting. The genetic cause of the child’s disease was established in four couples (1, 5, 6, and 7) (Table 1); the affected child was homozygous for the causative mutation(s) and the parents were identified as heterozygous carriers. Couples 1, 3, 5, 6, and 7 were of Moroccan origin, couples 4 and 8 were from Turkey, and couple 2 was from Iraq (Supplementary Table S1 online). For more details on the degrees of consanguinity of the eight couples, see Supplementary Table S1 online.

Table 1 Previously established disease-causing variants in consanguineous couples 1, 5, 6, and 7 and their identification by our filter strategy

The nonconsanguineous couples, originating from the Dutch population, were anonymized male and female exomes from existing diagnostic data sets that were randomly mixed. For the current proof-of-concept study, it was not necessary to use actual couples. The study was approved by the local Medical Ethical Committee.

Analysis

WES for the consanguineous couples and the nonconsanguineous couples was performed as previously described.11,12 Standard filter steps were location (exonic and canonical splice, the latter defined as ≤2 nucleotides intronic), sequencing depth/reads (≥8), and zygosity (homozygous variants removed). Variably applied filter steps were amino acid alteration (synonymous variants removed), annotation in databases (HGMD and/or ClinVar), allele frequency (dbSNP <5% versus <1%), mutation type (frameshift, nonsense, splice site, missense), presence of identical variants in both partners of the couple, presence of variants in the same genes in both partners of the couple, and presence of variants in genes included in gene panels. To establish an adequate cutoff for the allele frequency, disease prevalence was reviewed. Allele frequencies were calculated by taking the square root of the prevalence. The ClinVar database on pathogenicity of known variants was downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar on 2 March 2016 in variant call format. These variants were matched to the patient exome data using SNP identifiers, accounting for multiple SNPs at a given position. For multiple pathogenicity classes assigned to a genetic variant, only classes 4 (“likely pathogenic”) and 5 (“pathogenic”) were taken into consideration for further analysis.
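The square-root conversion mentioned above follows from Hardy-Weinberg proportions: if an autosomal recessive disease has prevalence q², the disease allele frequency is q. The following is a minimal sketch of that arithmetic, assuming Hardy-Weinberg equilibrium and a single disease allele per disease; the function names are illustrative and are not part of the study's pipeline.

```python
import math


def allele_frequency(prevalence: float) -> float:
    """Disease allele frequency q for an autosomal recessive disease,
    assuming Hardy-Weinberg proportions (prevalence = q**2)."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    return math.sqrt(prevalence)


def carrier_frequency(prevalence: float) -> float:
    """Heterozygous carrier frequency 2pq under the same assumption."""
    q = allele_frequency(prevalence)
    return 2.0 * q * (1.0 - q)


# A prevalence of 1 in 10,000 corresponds to an allele frequency of 1%.
q = allele_frequency(1 / 10_000)
carriers = carrier_frequency(1 / 10_000)
print(f"allele frequency: {q:.4f}, carrier frequency: {carriers:.4f}")
```

Under these assumptions, a prevalence of 1/10,000 gives an allele frequency of 0.01 and a carrier frequency of about 0.02. When several disease alleles contribute to one disease, each individual allele is rarer than this combined value.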

Gene panel

The gene list described by Bell et al. and Kingsmore,2,7 which consists of 508 items comprising 437 genes that supposedly cause 448 severe recessive diseases, was studied. A curated list of 459 genes after the removal of some (particularly nonrecessive) genes/diseases was created, with the addition of the genes/diseases for which preimplantation genetic diagnosis was performed in our center as well as those included in a PCS pilot in Groningen, the Netherlands;13 this list was used in our analysis.

Results

A first observation was the larger number of variants per individual in the consanguineous individuals compared to the nonconsanguineous. Considering the ethnicity of the consanguineous couples (Supplementary Table S1 online), this probably reflects population differences related to human evolutionary origin and migration.14

Proposed filter strategy

Variant filtering in all genes interrogated by WES. As a first and separate step, variants annotated as pathogenic or likely pathogenic in ClinVar and with a <1% or unknown dbSNP frequency were selected from the exome data, yielding an average of eight (5–13; n = 16) variants per individual for the consanguineous couples and five (0–11; n = 50) for the nonconsanguineous couples (Figures 1 and 2, Tables 1 and 2, and Supplementary Tables S2–S4 online). From the remainder, HGMD-annotated variants with a dbSNP frequency that was <1% or unknown were selected, resulting in 20 (11–33) and 11 (1–21) variants, respectively. The residual data were initially filtered for location, sequencing depth, zygosity, amino acid alteration, and SNP frequency. Subsequently, frameshift, nonsense, and splice site mutations were selected and added to the ClinVar-selected and HGMD-selected variants. This total of presumably pathogenic mutations yielded an average of 100 (84–151) variants for the consanguineous individuals and 80 (59–98) variants for the individuals of the 25 nonconsanguineous couples. These were compared between both partners of the couple; the variants in genes where the partner has at least one presumably pathogenic variant as well, and vice versa, were selected, resulting in an average of 29 (19–51) variants for the consanguineous couples and 15 (6–30) for the nonconsanguineous couples. X-linked variants were separately selected for the females, yielding three (1–5) variants and one (0–3) variant, respectively. All six previously identified disease-causing variants were present in the expected variant pools in our data set of the consanguineous carriers (Table 1).
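The tiered selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, and the helper functions are hypothetical, the gene names and frequencies are toy placeholders, and the generic filters (location, depth of at least 8 reads, zygosity) are applied only to the residual tier, as the paragraph above suggests.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LOSS_OF_FUNCTION = {"frameshift", "nonsense", "splice_site"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record for illustration only."""
    gene: str
    effect: str                           # e.g. "missense", "frameshift"
    dbsnp_freq: Optional[float] = None    # None means frequency unknown
    clinvar_class: Optional[int] = None   # 4 = likely pathogenic, 5 = pathogenic
    in_hgmd: bool = False
    depth: int = 30
    homozygous: bool = False
    exonic_or_splice: bool = True
    x_linked: bool = False


def is_rare(v: Variant, cutoff: float = 0.01) -> bool:
    """Keep variants with a dbSNP frequency below the cutoff or unknown."""
    return v.dbsnp_freq is None or v.dbsnp_freq < cutoff


def presumably_pathogenic(v: Variant) -> bool:
    """Tier 1: ClinVar class 4/5; tier 2: HGMD; tier 3: rare loss-of-function."""
    if not is_rare(v):
        return False
    if v.clinvar_class in (4, 5):
        return True
    if v.in_hgmd:
        return True
    passes_generic = v.exonic_or_splice and v.depth >= 8 and not v.homozygous
    return passes_generic and v.effect in LOSS_OF_FUNCTION


def select_couple_variants(female: List[Variant],
                           male: List[Variant]) -> Dict[str, List[Variant]]:
    """Keep variants in genes hit in both partners, plus the female's X-linked ones."""
    f = [v for v in female if presumably_pathogenic(v)]
    m = [v for v in male if presumably_pathogenic(v)]
    shared_genes = {v.gene for v in f} & {v.gene for v in m}
    return {
        "shared": [v for v in f + m if v.gene in shared_genes],
        "female_x_linked": [v for v in f if v.x_linked],
    }


# Toy data; gene names and frequencies are placeholders, not study results.
female = [
    Variant("GENE_A", "missense", 0.004, clinvar_class=5),
    Variant("GENE_B", "frameshift", depth=5),             # fails the depth filter
    Variant("GENE_X", "missense", in_hgmd=True, x_linked=True),
    Variant("GENE_C", "nonsense"),                        # partner has no hit in GENE_C
]
male = [
    Variant("GENE_A", "splice_site"),
    Variant("GENE_B", "nonsense"),
    Variant("GENE_D", "missense", 0.2, clinvar_class=5),  # too common
]
result = select_couple_variants(female, male)
print([v.gene for v in result["shared"]])
print([v.gene for v in result["female_x_linked"]])
```

In this toy couple only the variants in `GENE_A` survive the comparison between partners, and the female's `GENE_X` variant is kept separately. Every surviving variant would still require the manual evaluation described in the text.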

Figure 1 Strategy for whole-exome sequencing-based preconception carrier screening.

Figure 2 Average variant numbers resulting from our filter strategy for whole-exome sequencing-based preconception carrier screening.

Table 2 Pathogenic variants detected by our PCS strategy, filtering all genes interrogated by whole-exome sequencing, in two consanguineous and two nonconsanguineous couples

For the first two consanguineous and the first two nonconsanguineous couples, the presumably pathogenic variants in shared genes are displayed in Supplementary Table S3 online as examples. The majority could be discarded because of a high in-house and/or ExAC (http://exac.broadinstitute.org/) frequency, either in absolute terms or relative to disease prevalence, because a monogenic disease association was absent, and/or because they appeared to be sequencing artifacts (Supplementary Table S3 online). The remaining relevant variants are summarized in Table 2. If necessary, low-coverage variants may be confirmed by Sanger sequencing. Consanguineous couple 1 (C1) was identified as a clear carrier couple only for the pathogenic mutation in the ADD3 gene, which was causative of their child’s cerebral palsy. For the ABCA4 gene and the PRPH2 gene, both implicated in ophthalmologic phenotypes, a pathogenic mutation remained in only one of the partners (Supplementary Table S3 online); therefore, the finding is not relevant in the context of PCS. Consanguineous couple 2 (C2) was identified as a carrier couple of a known pathogenic ABCA4 mutation and of an FGG mutation that has been reported once in association with delayed clot formation/dysfibrinogenemia in the heterozygous state.15 It is unknown whether homozygosity for this particular mutation would result in a more severe phenotype (congenital afibrinogenemia, OMIM 616004). Furthermore, in both partners a rare PRRX1 missense substitution was detected (Supplementary Table S3 online). The latter has been reported in one association study of atrial fibrillation,16 whereas mutations in this gene are otherwise implicated in the agnathia–otocephaly complex (OMIM 202650), mostly in the heterozygous state, although one homozygous mutation has been reported. In nonconsanguineous couple 1 (NC1), no relevant presumably pathogenic variants were found. Nonconsanguineous couple 2 (NC2) was identified as a CFTR (cystic fibrosis, OMIM 219700) carrier couple, with the female carrying the classic ΔF508 mutation and the male carrying the mild/nonclassical CFTR mutation L997F.17,18

In X-linked genes, a pathogenic G6PD mutation (G6PD deficiency, OMIM 300908) was identified in consanguineous female 2 and a canonical splice site substitution in the PHF8 gene (Siderius X-linked mental retardation syndrome, OMIM 300263) was found in nonconsanguineous female 2 (Table 2 and Supplementary Table S4 online).

To compare our results with those obtained after applying an extensive gene panel, the same four couples were filtered for 459 autosomal and X-linked recessive genes. This failed to identify the pathogenic ADD3 variant in C1, the pathogenic ABCA4 variant in C2, and the PHF8 splice site substitution in NC2. The identification of mutations that pose some uncertainties, for example, regarding prediction of the homozygous phenotype, was not prevented (e.g., FGG). Only one (ANTXR2) of the six genes that were disease-causing in the consanguineous families was included in the gene panel.

Remaining variants. Beyond the annotated variants and presumed pathogenic variants, two variant categories remained (Figure 2, Supplementary Figure S1 and Supplementary Table S2 online). The first consists of missense variants that are not present in HGMD or annotated as at least likely pathogenic in ClinVar, with an average of 1,190 (961–2,269; n = 16) variants per individual in the consanguineous couples and 650 (513–924; n = 50) in the nonconsanguineous. Second, nonannotated in-frame deletions, insertions, and complex rearrangements remained: an average of 87 (69–104) in the consanguineous and 80 (42–116) in the nonconsanguineous individuals. These added up to a total of 1,277 (1,030–2,362) and 730 (598–980) remaining variants, respectively. The selection of variants present in the same genes in both partners reduced the numbers to 534 (385–1,150) and 178 (91–329). A further reduction was achieved by applying the gene list: 15 (8–35) and 5 (0–10) variants, respectively.

We used the data from the gene panel to assess the number of carrier couples in the entire cohort (Supplementary Figure S1 online, Supplementary Tables S2 and S5 online). Extrapolating this to the expected number of carrier couples gives an idea of the proportion of pathogenic variants that is missed by our strategy (see Discussion). A total of 5 unique variants in X-linked genes and 21 in the overlapping genes were found in the 33 couples (Supplementary Table S5 online). Again, not all of these variants remained (likely) pathogenic after evaluation. In C2, the aforementioned FGG variant was found. Additionally, as anticipated, consanguineous couple 6 (C6) was identified as a carrier couple of the pathogenic ANTXR2 mutation (Table 1). Among the nonconsanguineous couples, no carrier couple other than the previously mentioned CFTR carrier couple (NC2) was identified. In nonconsanguineous couple 3 (NC3), only one partner carried a pathogenic PKHD1 mutation (AR polycystic kidney disease, OMIM 263200), whereas the PKHD1 variant in the other partner could be discarded. In the X-linked analysis, only one relevant variant other than the G6PD (C2) variant was identified: a missense substitution in the AR gene (NC21), which has been reported in two boys with severe virilization defects due to partial androgen insensitivity19,20 and also in a boy with a normal phenotype.21 Whether the mutation has a phenotypic consequence appears to depend on the size of polymorphic repeats.20

Altogether, four of the eight consanguineous couples and one of the 25 fictive nonconsanguineous couples were identified as autosomal recessive carrier couples (Tables 1 and 2; Supplementary Table S5 online). X-linked carriership was established in one consanguineous and two nonconsanguineous females (Table 2 and Supplementary Table S5 online).

Discussion

We explored WES as a method for PCS and provided a filter strategy to rapidly identify the majority of pathogenic mutations in genes shared by the couple and in the female’s X-linked genes. A major advantage of our PCS approach is that WES, used increasingly as a standard cost-effective diagnostic technique with a short turn-around time of a few months, could easily be implemented in the same flow. Whether WES-PCS will be cost-effective will require a proper HTA (health technology assessment) study, but it is clear that in genetic testing the prevention of disease occurrence is (cost-wise) most rewarding and it is likely that, given the high number of unexpected recessive disease cases occurring every year, this is or will become cost-effective soon. Furthermore, using WES enables flexible adjustment of the genes included in the PCS test and of the exact filter strategies used. The most important benefit of PCS based on WES from a clinical point of view is the possibility to screen all known disease genes instead of a selected and (by definition) limited/arbitrary gene panel. Very rare recessive disease alleles, mainly relevant for consanguineous couples, are not likely to be included in such panels.

One of 25 nonconsanguineous couples was identified as a carrier couple. The expected number of nonconsanguineous carrier couples can be derived from the cumulative risk of being a carrier couple of any autosomal recessive disease in the population concerned. The latter was estimated to be ~2% (based on prevalence numbers of rare diseases provided by Orphanet,22 in which only diseases that can technically be identified by WES were considered and prevalence applicable to our northern European population was applied), including disorders ranging from severe to mild. Although our numbers are small, finding 1 nonconsanguineous carrier couple out of 25 is roughly in line with this. The identification of one CFTR carrier couple is also in agreement with the relatively common CFTR carriership in northern Europe.23 For females, a cumulative X-linked carrier risk of ~1% was estimated,22 again exclusively based on diseases that can be picked up by WES and including conditions with limited clinical relevance, whereby a de novo rate of one-third for lethal/nonreproducing X-linked disorders was taken into account.24 Our finding of 3 (presumed) carriers out of 33 exceeds this estimate, indicating, although in a limited data set, that probably not many variants are missed. For X-linked carriership, no differences were expected between consanguineous and nonconsanguineous couples; this was illustrated by our data.
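To put the comparison between observed and expected carrier numbers into perspective, a simple binomial calculation can be used. The sketch below assumes that couples (or females) are independent and that each has the same cumulative risk quoted above (~2% for autosomal recessive carrier couples, ~1% for X-linked carriership); the function name is illustrative and the calculation is not part of the original analysis.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of finding at least one carrier couple among 25 couples
# when each couple has a ~2% cumulative risk (about 0.40).
p_couple = prob_at_least(1, 25, 0.02)

# Chance of finding at least three X-linked carriers among 33 females
# when each female has a ~1% cumulative risk.
p_xlinked = prob_at_least(3, 33, 0.01)

print(f"P(>=1 carrier couple in 25): {p_couple:.3f}")
print(f"P(>=3 X-linked carriers in 33): {p_xlinked:.4f}")
```

With samples this small, such probabilities only indicate whether the observed counts are plausible under the stated risks; they do not provide a formal estimate of sensitivity.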

Selection of pathogenic variants. We selected presumably pathogenic variants based on published data (databases) and on predictions by the nature of the variant. Although databases are known to be imperfect, we initially hypothesized that by selecting variants associated with human disease, annotated in the Human Genome Mutation Database (HGMD, http://www.hgmd.cf.ac.uk/ac/index.php),25 additional filter steps such as frequency would not be necessary. However, the remaining numbers were much higher than expected and included high-frequency benign variants or variants associated with multifactorial disease. The same was true for variants that are annotated as (likely) pathogenic in ClinVar. Therefore, we added frequency as an additional step, which largely reduced the numbers without removing presumably pathogenic variants.

In addition, we selected nonannotated (likely) pathogenic variants in our strategy. Synonymous variants as well as homozygous variants in healthy adults were discarded from the variant pool. Other criteria for determining a variant’s pathogenicity are location and frequency. An allele frequency cutoff of <1% (corresponding with a disease prevalence of <1/10,000 for autosomal recessive diseases) was considered justifiable based on the aforementioned Orphanet disease frequency data.22 Most diseases with a prevalence above this cutoff have limited clinical relevance (e.g., congenital isolated thyroxine-binding globulin deficiency), are caused by mutation types that are not detected by sequencing analysis and therefore require a separate test anyway (e.g., SMA, fragile X syndrome), or are more frequent only in specific populations (e.g., thalassemia). The latter is generally well known for such populations, thereby enabling the addition of population-specific tests. Moreover, an allele frequency of <1% in the filter algorithm represents the frequency of a single disease allele, whereas for most diseases multiple disease alleles exist. Exceptions may be disease alleles that are responsible for the majority of disease cases (e.g., p.Phe508del mutation in cystic fibrosis or p.Lys304Glu mutation in MCAD deficiency) or (again population-specific) certain founder mutations. These can be assessed in or added to the data separately.

A frequency cutoff of <5% was also considered but resulted in a 50% increase in remaining variants (data not shown), which is expected to be particularly attributable to nonpathogenic variants. This will significantly increase the interpretation load at the end, which, in our opinion, is a greater burden than the separate mutation analysis/testing that may be necessary in specific populations. After applying the generic filter steps, we selected frameshift mutations, nonsense mutations, and canonical splice site mutations, which are often considered pathogenic due to predicted effects on the protein. Obviously, as for the annotated variants, all resulting variants will have to be evaluated because, for various reasons (annotation errors, database errors), not every one of these is actually pathogenic. Also, if coverage is borderline, then validation by Sanger sequencing is advisable. The number of variants resulting from our filter strategy for PCS was small enough to enable such follow-up in a diagnostic setting. With our strategy, only between 0 and ~20 variants needed manual verification, varying from fast checks (e.g., HGMD/ClinVar, OMIM, Orphanet prevalence list) for most to more extensive evaluation for a few. Of note, in-house variant frequency databases as well as the ExAC database appeared very useful for relatively quickly discarding a first set of variants, even though only variants with low or unknown dbSNP frequency were selected in our algorithm. Benign variants or variants of uncertain significance (VUS) may be present in the final results, with the latter posing the greatest clinical challenge. However, the majority of variants, even when analyzing the entire exome, could be clearly assigned as being either benign or pathogenic. A complicating factor in PCS compared to genetic testing in affected individuals is the lack of a phenotype to which to relate the genetic findings. This is not a WES-specific issue; it also applies to, for example, targeted sequencing of specific genes. The only way to circumvent this issue is to exclusively test for known pathogenic mutations. However, collecting these mutations is very time-consuming and, more importantly, once a certain mutation-based array is designed, mutations are not easily added. Given that new pathogenic mutations are identified on a daily basis, such a method would be outdated in no time.

Pathogenic mutations that may be missed by WES-based PCS. Pathogenic mutations may be missed either because they are not recognized as being pathogenic in the filtering process or because they are not detected by the sequencing technique. The first category consists mainly of amino acid substitutions (missense variants) and, to a lesser extent, in-frame rearrangements not yet known/described to be pathogenic. These variants are often difficult to interpret regarding pathogenicity. As illustrated by our data, their number in genes shared by the couple can be reduced to manageable levels by applying gene panels. However, reducing the variant numbers will not solve the interpretation issues. Therefore, we do not recommend including this category in a diagnostic setting at this point. It is necessary to obtain more experience (in a research setting) with the nature of the resulting variants to further evaluate their use in clinical practice. More importantly, increasing knowledge of pathogenic mutations and collection of these mutations in the aforementioned databases will eventually decrease this variant category. Also, newer filtering pipelines that are able to identify other presumably pathogenic subcategories (e.g., stop-loss, start-loss, and/or glycosylation site missense variants) will move variants from the unknown to the presumably pathogenic variant pool. In time, the same will be true for intronic variants, microRNA variants, and others.

Because conservation is an important factor in pathogenicity, one may argue for its inclusion in the filter strategy. However, conservation scores predicting (potential) variant pathogenicity, either in isolation (e.g., GERP26, PhyloP27) or incorporated in more comprehensive software tools (e.g., SIFT28, PolyPhen29, CADD30), are merely predictions; if accurate at all, they may give conflicting results when used to distinguish pathogenic from nonpathogenic rare alleles.31 This was also illustrated by a PolyPhen-based spot check in our own data, whereby 134 homozygous variants in a healthy individual were assigned as being pathogenic (data not shown). Taken together, conservation and pathogenicity predictions are more suited for manual variant verification than for a standard filter step. Notably, the issues of discarding unknown pathogenic mutations and encountering interpretation problems are not WES-specific; they are applicable to other sequencing methods as well. Another reason why disease-causing variants may be discarded in the filter process is that their allele frequencies exceed the filter cutoff. If relevant for the population concerned, then a complementary test and/or panel can be offered.

Second, there is a category of variants not detected by WES. One of the potential causes is insufficient callable coverage. Although WES coverage has improved considerably over the past years (~95% capturing of the exome32), coverage is not complete. In time, whole-genome sequencing techniques will replace WES in clinical genetic practice, thereby further improving completeness of the data.32 To ensure that the most relevant mutations in the population concerned are covered sufficiently, the presence (and expected frequencies) of these mutations should be evaluated in WES data available to the laboratory. If needed, a specific complementary mutation panel could be offered. Technical characteristics of the test are another non-WES-specific cause of missing mutations by sequencing analysis. The ability to detect exon deletions and repeat expansions varies between sequencing techniques. Including these in PCS may require additional test methods. Some relevant examples are indicated elsewhere in the discussion section.

Gene panels. Our data show that the number of presumably pathogenic variants resulting from WES-based PCS is small enough not to need a gene panel. In fact, by using an extensive gene panel, three of the four known pathogenic mutations in the consanguineous couples were discarded, whereas all four were identified in our algorithm without a gene panel. Also, our other data illustrate that several relevant disease-associated genes are not included in the panel (Supplementary Tables S3 and S4 online). The occasional encounter of a variant with interpretation uncertainties was not prevented by applying gene panels. Therefore, taken together, our results argue against the use of gene panels, especially when aiming for a generic PCS test applicable to a broad population with varying ethnicities. The main rationale behind gene panels is to prefilter out certain variants or genes that are deemed undesirable to report for preconception carrier status. Although we do not consider this necessary, if preferred, a gene panel could easily be incorporated in our pipeline before manual evaluation.

Also, creating a gene panel that is complete, given certain agreed criteria, is hardly possible. When omitting gene panels and thereby not restricting the analysis to autosomal-recessive genes, one might encounter autosomal-dominant mutations with potential clinical consequences for the tested individual. In that case, PCS transforms into presymptomatic testing, which is not deemed desirable in this context. However, when combining the data of both partners, the risk for identifying dominant disease genes is very small. Furthermore, if offspring are at risk for inheriting two dominant disease alleles, then this is of particular relevance in the context of PCS.

Another effect of leaving gene panels out is that carrier couples of, for example, relatively mild and/or treatable disease may be identified. We feel this does not differ significantly from issues encountered in current clinical genetics practice. Adequate counseling is the key for responsible consideration of consequences. Moreover, the manual evaluation of remaining variants includes not only pathogenicity but also clinical implications. With such a post hoc approach, the chances of missing highly relevant pathogenic mutations or genes are smallest. Defining the exact criteria of the findings to be reported is a difficult/arbitrary discussion and can be more easily accomplished with the small number of remaining variants in case-by-case decisions (Supplementary Tables S3–S5 online) of clinicians and laboratory geneticists. This stresses the importance of performing WES-PCS in a specialized, academic setting.

In conclusion, our results show that WES is suitable for PCS, both for consanguineous and nonconsanguineous couples, with the remaining number of variants being manageable in a clinical setting. Pathogenic mutations may be missed due to either technical limitations or the current lack of knowledge regarding new mutations; only the first is WES-specific and, with current WES performance, not a major issue. Specific mutations or genes may be tested complementarily. Overall, we think that WES is able to identify a higher proportion of relevant carrier couples compared to other PCS tests. The omission of gene panels convincingly added value in our experience. Adequate counseling of couples opting for WES-based PCS is critical. PCS may clearly identify a carrier couple or may pose some additional questions, for example, concerning disease severity, the difficulty of predicting the phenotype, and the issue of possibly manifesting carriership of X-linked disease; discussion of these questions in a multidisciplinary PCS team is crucial. Each couple should also be informed that the test is not 100% sensitive; therefore, residual carrier risks remain. It is important to develop and continuously curate databases with pathogenic mutations to reduce manual work in variant analyses and to increase sensitivity of PCS.

Disclosure

The authors declare no conflict of interest.