Introduction

Mitochondrial disease is a complex and heterogeneous collection of disorders that can result in death or prolonged disability. The prevalence of these disorders could be as high as 1 in 4,300 individuals, which makes it one of the most common forms of inherited illness1,2. There is no general cure for these disorders but the most widespread treatments are vitamin and nutritional supplementation, most commonly with L-carnitine, creatine and ubiquinone (UQ; a.k.a. Coenzyme Q), despite the fact that there is little evidence supporting their effectiveness3,4.

UQ is a redox-active lipid-like molecule that plays a number of critical roles in biological membranes. Its best characterized role is as a key electron carrier of the mitochondrial electron transport chain5. UQ is also a co-factor in a number of other enzymatic processes as well as a potential membrane antioxidant. The rationale for generalized UQ supplementation in mitochondrial disease is thus the hope that it might support mitochondrial function, and that its antioxidant function could ameliorate any increase in oxidative stress. Furthermore, UQ deficiency secondary to other mitochondrial defects is observed in a substantial subset of mitochondrial disease patients6.

There is, however, one patient population that could directly benefit from effective UQ supplementation: individuals suffering from primary UQ deficiency due to mutations in genes required for UQ biosynthesis. Although such patients have been much discussed7,8,9, we are not aware of any formal attempt to estimate the prevalence of primary UQ deficiency. At this point, approximately 70 patients have been described in the published literature, and it has been informally estimated that their prevalence may be less than 1 in 100,0008. Despite clear genetic evidence that UQ deficiency is the primary cause in these patients, UQ supplementation has not met with consistent success, possibly due to poor bioavailability of the highly lipophilic UQ molecule10,11. A better understanding of the possible prevalence of this disorder would help guide decisions regarding investigations into novel UQ formulations or potential drugs which could modulate the UQ biosynthesis pathway.

UQ is composed of a redox-active benzoquinone ring with a lipid tail consisting of a species-specific number of isoprenoid sub-units (ten in humans). Although UQ biosynthesis has been most extensively studied in yeast, human homologues of the critical genes have been identified7,8,9. Thirteen yeast genes are required for UQ biosynthesis (COQ1 – COQ11, YAH1, ARH1). In brief, COQ1 (or the human homologues PDSS1 and PDSS2 acting as a hetero-tetramer) assembles an isoprenoid tail from precursors produced by the mevalonate pathway. COQ2 joins this isoprenoid tail to a tyrosine-derived benzoquinone ring precursor, and COQ3, COQ5, COQ6 and COQ7 are responsible for various methylation and hydroxylation reactions affecting the benzoquinone ring. COQ8 appears to play a regulatory role by modulating phosphorylation of COQ3, COQ5 and COQ7. COQ8 has two human homologues, COQ8A (also known as ADCK3 or CABC1) and COQ8B (ADCK4); defects in either one can independently result in UQ deficiency12,13. The roles of COQ4 and COQ9 are not well defined, although COQ4 appears to play a role in the assembly of COQ2 – COQ7 into a complex and COQ9 is required for COQ7 function. ARH1 (human homologue FDXR) and YAH1 (FDX1L) transfer electrons to COQ6, while also participating in other pathways. There are two modification steps of the UQ benzoquinone ring that have yet to be assigned an enzyme.

To date, pathogenic variants in nine of these proteins (PDSS1, PDSS2, COQ2, COQ4, COQ6, COQ7, COQ8A, COQ8B and COQ9) have been shown to cause UQ deficiency in human patients7,9. We sought to leverage the recent availability of exome or genome sequences of very large numbers of individuals in order to estimate the frequency of known pathogenic variants in these genes. We used the NCBI ClinVar database14 and conducted a literature search to identify variants in the known UQ biosynthesis genes that result in illness and UQ deficiency. The gnomAD exome and genome database15, with sequences for 138,632 individuals divided into seven genetically distinct populations, was used to estimate the frequencies of these variants. Using these frequencies, we estimated the birth prevalence of individuals homozygous or compound heterozygous for known or predicted pathogenic genetic variants for primary UQ deficiency (assuming Hardy-Weinberg equilibrium) on a population-by-population basis. We then used known population sizes and distributions to estimate the actual numbers of afflicted individuals due to each variant world-wide, as well as in a population with the particular size and mix of the USA. Importantly, the calculation of the number of afflicted individuals on a per-variant, per-population, basis eliminates a potential confounding factor when working with large numbers of variants present at very low frequencies – namely, that many individual variants may be too rare to result in any homozygous or compound heterozygous individuals, and the traditional method of summing these frequencies could yield frequencies high enough to artificially suggest that individuals are affected.

It is likely that many pathogenic variants simply have not been clinically documented at this relatively early stage in our awareness of primary UQ deficiency. To account for this, we also estimated the number of individuals who would be homozygous or compound heterozygous for variants observed in gnomAD but that have not yet been observed in the clinic, focusing on predicted loss-of-function (LoF) or pathogenic missense mutations.

There are many challenges to making estimates of this nature. For example, it is not possible to conclusively determine the pathogenicity of missense variants based on sequence information alone. We attempt to address this by conservatively including only those variants independently predicted to be pathogenic by two separate bioinformatic algorithms (see Methods). There is also extreme variability in severity of primary UQ deficiency, ranging from neonatal lethality (with mouse studies suggesting that embryonic lethality is a possible outcome for null alleles for some genes16,17,18,19) to mild disease that becomes apparent only in later decades of life. This makes accurate predictions of disease prevalence based on allelic frequencies extrapolated from public databases of genomic variants challenging, which is why our results are best interpreted as birth prevalence of individuals homozygous or compound heterozygous for variants likely to cause disease. Actual disease prevalence would be expected to diverge from these estimates. We discuss these issues in greater detail below.

We found that the carrier frequencies for most previously identified pathogenic variants were low (averaging 1/6,420 for the populations in which they were present), and given known population sizes we estimated they would result in a total of 1,016 individuals worldwide due to homozygosity and an additional 649 due to compound heterozygosity, with a total of 192 in the USA. The addition of all predicted loss-of-function and pathogenic missense variants results in a predicted total of 123,789 individuals worldwide and 1,462 in the USA.

Methods

Identification of known pathogenic variants

We identified pathogenic variants of UQ biosynthesis genes (PDSS1, PDSS2, COQ2 – COQ7, COQ8A/ADCK3, COQ8B/ADCK4 and COQ9) using the NCBI ClinVar database and via PubMed literature searches. ClinVar is a public archive (https://www.ncbi.nlm.nih.gov/clinvar/) describing human genetic variants and their relationship to human health14. Variants are extracted from the peer-reviewed literature or directly reported by CLIA certified or ISO 15189 accredited clinical testing laboratories. Variant pathogenicity is reported by the submitter according to the ordinal scale recommended by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (“pathogenic”, “likely pathogenic”, “uncertain significance”, “likely benign” or “benign”)20. Note that ClinVar results cannot be used to directly estimate birth prevalence and the database does not include fields for incidence frequency. Each ClinVar entry describes a unique variant, and may be derived from multiple submissions.

We queried ClinVar (search conducted on 2017-03) for each gene (e.g., ‘COQ2[gene]’) and identified pathogenic variants using the following inclusion criteria:

(i) At least one submission describes the variant as “pathogenic” or “likely pathogenic”.

(ii) No submitter assigns a significance as “benign” or “likely benign”.

(iii) Variant only affects one gene (i.e., no multi-gene deletions or duplications).
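
The three inclusion criteria above can be expressed as a simple filter. The following is a minimal sketch only: the `ClinVarRecord` class and its field names are hypothetical simplifications for illustration, not the ClinVar schema, and the manual review described in the next paragraph is not modelled.

```python
from dataclasses import dataclass

PATHOGENIC = {"pathogenic", "likely pathogenic"}
BENIGN = {"benign", "likely benign"}


@dataclass
class ClinVarRecord:
    """Hypothetical, simplified view of one ClinVar variant entry."""
    variant: str
    genes: tuple        # gene symbols overlapped by the variant
    assertions: tuple   # one clinical-significance string per submission


def meets_inclusion_criteria(record):
    """Apply criteria (i) to (iii) to a single record."""
    calls = {a.lower() for a in record.assertions}
    has_pathogenic = bool(calls & PATHOGENIC)   # (i) at least one pathogenic call
    has_benign = bool(calls & BENIGN)           # (ii) no benign call allowed
    single_gene = len(record.genes) == 1        # (iii) exactly one gene affected
    return has_pathogenic and not has_benign and single_gene
```

A record with conflicting "pathogenic" and "benign" submissions fails criterion (ii), as does any multi-gene deletion or duplication under criterion (iii).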

Complete records for all variants meeting our inclusion criteria were manually reviewed, including confirming that each record matched the description in any cited studies. To ensure as complete a record of known variants as possible, we also conducted a systematic literature search via PubMed (search conducted on 2017-01), in which we reviewed all clinical studies in the search results for each gene name.

COQ2 transcript start

It was recently shown that the canonical COQ2 transcript, used by most previous studies, erroneously includes a 150 base N-terminal region that is only rarely, if ever, transcribed in humans21. Thus, any variant involving this region is unlikely to be pathogenic. For ease of comparability with previous studies we have retained the canonical numbering, but we have not included any variant affecting this region. For this reason, we did not consider variants such as p.Ala17Argfs (ClinVar allele ID 237155).

Estimation of variant birth prevalence

To determine the birth prevalence of these variants we used the Genome Aggregation Database (gnomAD, http://gnomad.broadinstitute.org/), an updated version of the previously released dataset from the Exome Aggregation Consortium (ExAC)15. The gnomAD release includes exome or genome sequences from a total of 138,632 individuals without severe pediatric diseases. The database assigns ancestry by a principal components analysis based upon a subset of samples of known ancestry. Most sequences clustered into one of seven geographic or endogamous groups (African, Ashkenazi Jews, East Asian, Finnish, non-Finnish Europeans, Latin Americans, and South Asian), with the remainder (3,234) considered to be ‘other’. Hence, almost all samples included in this dataset have an ancestry that is well-defined on a genetic basis. This dataset has undergone extensive quality control measures to remove poor-quality sequences and related individuals and to flag variants of questionable reliability15, and assessments of variant pathogenicity are provided via the SIFT and PolyPhen2 tools.

When querying gnomAD for each of our genes of interest, variants affecting protein-coding regions were considered equivalent to known pathogenic variants if they resulted in the same change to protein structure (i.e., the same amino acid conversion, a stop codon introduced in the same location, or a frameshift resulting in the same residue changes). For variants affecting splice sites, only variants that exactly matched the nucleotide changes of the pathogenic variants were included. We only considered variants that had passed gnomAD random forest filters.

We acquired estimates of population sizes from various sources (Table S1). The population estimates summed to 6 billion, accounting for 80% of total global population. Estimates of population sizes within the USA summed to 309 million (vs. a total population of approx. 319 million). To estimate the number of affected individuals, we used the individual frequencies for each variant in a population (i.e., not on summed frequencies), and estimates for each variant for a population were rounded down to the nearest whole number.
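
To illustrate why counts were derived per variant and per population rather than from summed frequencies, the sketch below compares the two approaches under Hardy-Weinberg assumptions. It is an illustrative reconstruction, not the authors' script; the function names and the example frequencies are hypothetical.

```python
import math


def expected_homozygotes(allele_freq, population_size):
    """Hardy-Weinberg expectation q^2 * N, rounded down to whole individuals."""
    return math.floor(allele_freq ** 2 * population_size)


def total_per_variant(allele_freqs, population_size):
    """Per-variant approach: round each variant's count down, then add."""
    return sum(expected_homozygotes(q, population_size) for q in allele_freqs)


def total_from_summed_frequency(allele_freqs, population_size):
    """Summed-frequency approach: add the frequencies first, then square."""
    return math.floor(sum(allele_freqs) ** 2 * population_size)


# Hypothetical example: 25 rare variants, each at q = 1e-4, in 10 million people.
freqs = [1e-4] * 25
print(total_per_variant(freqs, 10_000_000))            # each variant gives 0.1, so 0
print(total_from_summed_frequency(freqs, 10_000_000))  # 62.5 rounds down to 62
```

In this made-up case no single variant is frequent enough to yield even one expected homozygote, yet the summed frequency suggests dozens of affected individuals.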

Rates of compound heterozygosity were determined using data tables of missense and loss-of-function variants for each gene obtained from the gnomAD browser. An R script (available upon request) was written to systematically strip out undesirable variants (e.g., those affecting non-canonical transcripts) and make the multiple comparisons required. We calculated the predicted frequency of compound heterozygotes within each population in which it was possible (i.e., some variants were not observed in the same population, making compound heterozygosity for that variant impossible in that population). When determining rates of compound heterozygosity for the group of predicted loss-of-function (LoF) and pathogenic missense variants, we included known pathogenic/predicted LoF variant pairs in our calculations.
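
The pairwise comparison can be sketched as follows. This is an assumption-laden illustration in Python rather than the R script mentioned above (which we have not seen): it assumes Hardy-Weinberg proportions, under which an individual carrying variants i and j on opposite chromosomes is expected at frequency 2·q_i·q_j, and it skips variants not observed in the population.

```python
from itertools import combinations


def compound_het_frequency(allele_freqs):
    """Expected frequency of compound heterozygotes in one population.

    allele_freqs: allele frequencies of the pathogenic variants of one gene
    in that population. Variants with frequency 0 cannot form a pair there.
    """
    observed = [q for q in allele_freqs if q > 0]
    return sum(2 * qi * qj for qi, qj in combinations(observed, 2))


# Hypothetical frequencies for three variants; the third is absent here.
print(compound_het_frequency([0.001, 0.002, 0.0]))
```

Together with the homozygote term, these pairs account for the full square of the summed frequency: (Σq)² = Σq² + 2·Σ q_i·q_j over i < j.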

Pearson’s chi-square test of goodness of fit was calculated in Excel 14.0.7180.5002 (Microsoft, USA), and other calculations were performed in R. 95 percent confidence intervals were calculated with the exact binomial test.
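
For readers unfamiliar with exact binomial intervals, the sketch below computes a Clopper-Pearson interval using only the Python standard library. It is one common way to obtain an exact binomial confidence interval and is offered as an illustration; we assume, but the text does not state, that this corresponds to the interval returned by the R routine used.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); terms are computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve(func, target, increasing, iterations=100):
    """Bisection on [0, 1] for a monotonic function of p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, confidence=0.95):
    """Clopper-Pearson interval for k successes (e.g. variant alleles) in n trials."""
    alpha = 1.0 - confidence
    # Lower bound: P(X >= k | p) = alpha / 2, which increases with p.
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: P(X <= k | p) = alpha / 2, which decreases with p.
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper
```

The linear-time sum in `binom_cdf` is adequate for the small allele counts typical of rare variants; for large counts a statistics library would be the better choice.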

Identification of predicted pathogenic variants

To identify predicted pathogenic variants in the gnomAD database, we first excluded variants that did not pass quality-control filters and those in non-canonical transcripts (as defined by gnomAD, the canonical transcript is the longest consensus coding sequence translation with no stop codons). To identify LoF variants, we extracted those annotated as “stop gained”, “frameshift”, “splice donor” or “splice acceptor” and excluded variants which gnomAD had flagged as low-confidence LoF. To identify the missense variants that were most likely to be pathogenic, we extracted only those variants for which gnomAD reported an assessment of “probably damaging” and “deleterious” by PolyPhen2 and SIFT respectively. To reduce the risk of obtaining false positives, we excluded variants with high minor allele frequencies (MAF). Although a MAF cut-off of 0.5% has been suggested22, we chose a more conservative approach, instead using the highest observed MAF in the list of “known” pathogenic variants as a threshold: thus variants with a global MAF greater than 0.019% or a MAF for any population greater than 0.31% were excluded.
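
The selection steps above can be summarized as a single predicate. The sketch below is illustrative only: the dictionary keys are hypothetical names chosen for this example and do not correspond to gnomAD export columns, while the annotation strings and thresholds are taken from the text.

```python
LOF_ANNOTATIONS = {"stop gained", "frameshift", "splice donor", "splice acceptor"}
GLOBAL_MAF_MAX = 0.00019      # 0.019 %
POPULATION_MAF_MAX = 0.0031   # 0.31 %


def is_predicted_pathogenic(variant):
    """Return True if a variant passes all filters described in the text.

    `variant` is a dict with hypothetical keys: passes_qc, canonical,
    annotation, low_confidence_lof, polyphen, sift, global_maf,
    population_mafs (a list with one MAF per population).
    """
    if not variant["passes_qc"] or not variant["canonical"]:
        return False
    if variant["global_maf"] > GLOBAL_MAF_MAX:
        return False
    if max(variant["population_mafs"]) > POPULATION_MAF_MAX:
        return False
    if variant["annotation"] in LOF_ANNOTATIONS:
        return not variant["low_confidence_lof"]
    if variant["annotation"] == "missense":
        # Both tools must agree at their highest-confidence damaging call.
        return (variant["polyphen"] == "probably damaging"
                and variant["sift"] == "deleterious")
    return False
```

Requiring agreement between both predictors trades sensitivity for specificity, which matches the conservative intent stated above.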

Data Availability

The datasets analyzed during the current study are available at www.ncbi.nlm.nih.gov/clinvar/ and http://gnomad.broadinstitute.org/.

Results

Through ClinVar, we identified 552 reported genetic variants affecting UQ biosynthesis genes (for complete listing, see File S1). Of these, 143 were deletions or duplications affecting multiple genes (all 17 variants reported for COQ3 and COQ5 fell into this category), and were not considered further because the pathogenicity of these variants could potentially be related to the activity of multiple genes. Of the remainder, 315 were excluded because submitters did not assess them as pathogenic (in only one case, ClinVar variation 3645, COQ8A p.Phe331=, were there both pathogenic and benign interpretations – in this case, a MAF of 1.57% supports the benign interpretation). Fourteen of those remaining were subsequently excluded because close inspection of the records and cited works revealed a number of problems, including single-copy variants not consistent with the typically recessive nature of UQ deficiencies (4 records), duplicate records (4), risk factors mis-categorized as causative pathogenic variants (2), an incomplete ClinVar entry (1), a multi-variant haplotype not testable in gnomAD (1), reliance on a secondary, unreferenced, source (1), and one variant present in the untranscribed N-terminal region of COQ2 (see Methods).

Of the remaining 80 records, the majority (49) had been extracted from the peer-reviewed literature, 22 were from the genetic testing company GeneDx (MD, USA), with the remainder from 6 other testing labs (see Table S2 for detailed information). GeneDx and five other testing labs provided detailed assertion criteria for the determination of variant pathogenicity, all adhering to established standards.

To account for the possibility that not all known pathogenic variants are included in ClinVar, we carried out an independent review of the literature, identifying 17 additional pathogenic variants: 2 affecting COQ2, 3 COQ4, 1 COQ6, 9 COQ8A, and 2 COQ8B (see Table S2 for literature references).

In total, we identified 97 pathogenic variants. Of these, 57 resulted in a single residue substitution, 21 in frameshifts, 10 premature stop codons, 7 variants altering splice-site donor or acceptor regions in ways predicted to be pathogenic, and 3 single-residue indels (see Table S2 for a complete listing of all identified known pathogenic variants). COQ8A was most frequently affected, with 40 variants.

To better understand the birth prevalence of these variants we queried the gnomAD exome and genome database. We found 441 carriers, with 49 of 97 pathogenic variants present (Table 1). No variants were present in homozygous form, all missense variants were predicted to be damaging by PolyPhen2, SIFT, or both, and all premature stop, frameshift or splice site-disrupting variants were predicted to be high-confidence loss-of-function. All of these findings are fully consistent with the reported pathogenic nature of these variants. Global allele frequencies ranged from 4.1 × 10−6 to 1.7 × 10−4, yielding a combined frequency of 1.76 × 10−3, implying that 1/321,368 individuals will be homozygous for pathogenic variants at birth.
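
As a worked check of the final step, and assuming Hardy-Weinberg proportions, squaring the rounded combined allele frequency reproduces the order of magnitude reported; the small difference from 1/321,368 presumably reflects use of the unrounded frequency. The symbol q_combined is introduced here for illustration only.

```latex
q_{\text{combined}} = \sum_i q_i \approx 1.76 \times 10^{-3},
\qquad
q_{\text{combined}}^{2} \approx 3.10 \times 10^{-6} \approx \frac{1}{323{,}000}
```

Because the individual frequencies are summed before squaring, this figure counts any individual carrying two pathogenic alleles, whether identical or different, since $(\sum_i q_i)^2 = \sum_i q_i^2 + 2\sum_{i<j} q_i q_j$.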

Table 1 Known pathogenic variants from ClinVar or literature review that are represented in gnomAD sequence database.

Through casual observation it was apparent that several of the known pathogenic variants were not distributed evenly within the different populations. For example, the COQ8A p.Met555Ile variant was observed in 39 European or Finnish individuals, but in no other population, and the COQ8B p.Glu483* variant was observed in 10 individuals from South Asia but only in 1 European, despite the almost 4-fold greater number of European alleles genotyped. Indeed, the six variants with the greatest numbers of carriers had frequencies that were distributed unevenly between populations (Pearson’s chi-squared 24.3 to 537.5, p < 0.001) (Figure S1 - statistically significant differences were rarer among the variants with lower allele counts, potentially due to the decreased statistical power inherent in a lower sample size). Because of this unevenness, subsequent analysis was conducted on a population-by-population basis.
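
The goodness-of-fit comparison can be sketched as below, where the expected count for each population is the total number of observed variant alleles apportioned by that population's share of genotyped alleles. This is a minimal illustration, not the Excel calculation used: the allele numbers in the example are hypothetical (chosen only to mimic a 4-fold difference), and the p-value helper covers the two-population case (one degree of freedom) only.

```python
import math


def chi_square_goodness_of_fit(observed, allele_numbers):
    """Pearson chi-squared statistic and degrees of freedom.

    observed: variant allele counts per population.
    allele_numbers: total alleles genotyped per population.
    """
    total_observed = sum(observed)
    total_alleles = sum(allele_numbers)
    statistic = 0.0
    for obs, alleles in zip(observed, allele_numbers):
        expected = total_observed * alleles / total_alleles
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(observed) - 1


def p_value_one_df(statistic):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# 10 vs 1 variant alleles, with hypothetical allele numbers differing 4-fold.
stat, df = chi_square_goodness_of_fit([10, 1], [30_000, 120_000])
print(round(stat, 2), df, p_value_one_df(stat) < 0.001)
```

With more than two populations the same statistic applies, but the p-value would need the chi-squared distribution for the appropriate degrees of freedom from a statistics package or table.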

Each pathogenic variant was observed in an average of 1.9 populations (not counting ‘Other’), with an average allele frequency of 1.56 × 10−4 (Table 2). Combined estimates of Hardy-Weinberg homozygosity for all variants for each of the 7 populations averaged 1/5,492,983, ranging from 1/12,021,014 (Latin Americans) to 1/60,113 (Ashkenazi Jews). Predicted homozygous frequency for individual variants averaged 1/5.4 M, with the variant found at the greatest frequency being COQ4 p.Arg240Cys (with a 1/162 carrier frequency among Ashkenazi Jews which would result in the birth of homozygotes at a frequency of 1/104,733). With an estimated worldwide population of 10 M, this would imply 95 afflicted Ashkenazi Jews susceptible to UQ deficiency due to homozygosity for this one variant alone. Considering all variants across all populations, we can predict 1,016 homozygous-at-birth individuals globally, or 122 in the USA (Table 2).

Table 2 Population breakdown and predicted prevalence of afflicted individuals for known pathogenic variants present in the gnomAD database.

The presence of multiple variants within the same populations is consistent with the numerous reports of compound heterozygosity in patients with primary UQ deficiency7. When estimating birth prevalence of compound heterozygosity, pathogenic variants in COQ8A again exhibited a greater prevalence relative to other genes. In fact, the birth prevalence of compound heterozygotes for COQ8A among Ashkenazi Jews (1/725,578), Finns (1/1.4 M) or non-Finnish Europeans (1/1.6 M) alone were all individually greater than the combined prevalence of all other genes (1/17.1 M) (Table 3, full variant-by-variant breakdown in Table S3). We can estimate that 649 individuals worldwide are born as compound heterozygous for pathogenic genetic variants causing UQ deficiency, with 70 in the USA-like population.

Table 3 The predicted occurrence of compound heterozygotes of known pathogenic variants.

Premature stop codons, frameshifts or the disruption of canonical splice sites (LoF) or critical protein residues (via missense mutations) are all expected to result in significant impairments to protein function. Although we can expect an unknown proportion of these predicted pathogenic variants to result in embryonic lethality, those that do allow survival to birth are likely to result in clinically significant illness. We therefore determined the birth prevalence of all predicted pathogenic variants in UQ biosynthesis genes, as described in Methods. Across all UQ biosynthesis genes there were a total of 782 predicted pathogenic variants (including all known pathogenic variants), and 618 possible compound heterozygote combinations (summarized in Table 4, complete variant list in Table S4 and Table S5). The two genes with the highest frequency of predicted pathogenic variants (combining homozygotes and compound heterozygotes) were COQ8A and COQ8B, with cross-population average incidences of 1/193,621 and 1/198,391, resulting in a predicted 27,321 and 44,727 afflicted individuals worldwide, respectively, and 391 and 398 afflicted individuals in the USA. The gene with the lowest frequency was COQ3 (1/57 M), with only 146 predicted affected individuals worldwide, and none predicted in the USA. The population with the greatest total frequency of pathogenic variants was that of East Asia (1/20,170), with a predicted 79,423 afflicted individuals worldwide. The variant with the greatest prevalence in any population was COQ4 p.Arg240Cys in the Ashkenazi Jewish population, with a MAF of 0.0001719 (Table S4).

Table 4 Predicted prevalence of homozygous and compound heterozygous afflicted individuals for all known and predicted pathogenic variants.

Considering the occurrence of both homozygotes and compound heterozygotes averaged across all populations, our results predict a global birth prevalence of 1/52,092. However, not all the populations considered are of equal size, and the predicted number of afflicted individuals worldwide was 41,555 due to homozygosity and 85,581 due to compound heterozygosity, for a total of 123,789 (1/48,495). In the USA, our analysis predicts 1,462 afflicted individuals (1/211,917).

Discussion

Overall, our results predict a worldwide total of 123,789 individuals suffering from primary UQ deficiency, and 1,462 in a population with a composition similar to the USA. Of these, 1,665 and 192 respectively are due to variants that are known to be pathogenic, with the remainder due to predicted LoF and pathogenic missense variants (summarized in Fig. 1A and B). However, the extent to which known pathogenic variants contributed to the total varied between populations. The addition of predicted LoF variants had less impact for Western populations (Ashkenazi Jews, Finnish and non-Finnish Europeans: blue in Fig. 1), with inclusion of predicted pathogenic variants resulting in an average 3.5-fold increase in the number of afflicted individuals, relative to known pathogenic variants only (Fig. 1C). In contrast, in populations from non-Western, developing regions (South and East Asians, Latin Americans and Africans: red in Fig. 1), the addition of predicted LoF variants resulted in an average 122-fold increase in the number of afflicted individuals (Fig. 1C). The increased likelihood of pathogenic variants having been identified in Western populations is consistent with the reality of their relatively higher clinical coverage compared to non-Western populations, where the expense of clinical sequencing has limited the genetic characterization of patients suffering from mitochondrial disease. Our results imply that primary UQ deficiency is substantially under-diagnosed in Latin American, African and Asian populations.

Figure 1

Prevalence of primary UQ deficiency based on known and predicted pathogenic variants. (A) Predicted number of afflicted individuals due to compound heterozygosity or homozygosity of known or predicted pathogenic variants, as denoted on x-axis, for each population. (B) Frequency of afflicted individuals within each analyzed population. (C) Contribution of known or predicted pathogenic variants to the frequency or number of afflicted individuals within each population. The fold-difference between known pathogenic variants only, and the total of known and predicted pathogenic variants, is shown on the x-axis.

There are several factors that could induce error in our predictions. For example, LoF variants may be so harmful that a homozygous individual is not viable in the first place. That this is possible is supported by the embryonic lethality of the complete genetic ablation of PDSS2, COQ2, COQ3, COQ6 and COQ7 in mice16,17,18,19, with COQ4 exhibiting pre-weaning lethality17. In contrast, COQ8A-null23 and COQ9-null24 mice have been reported as viable. Indeed, among patients with pathogenic variants in the UQ biosynthesis genes likely to be necessary for life (PDSS1 – COQ7), very few are homozygous or compound heterozygous for severe variants expected to result in significant LoF. Among the severe variants (nonsense, frameshift, splice site affecting) for these genes described in the literature we reviewed, only COQ2 p.Asn401Ilefs*15 was present in homozygous form, resulting in multi-organ failure and death in an infant patient25, and there was only one patient compound heterozygous for LoF variants (COQ6 p.Trp447* and p.Gln461fs*47826). In all three variants the region affected was close to the C-terminus (closer than any other known pathogenic variant for these genes), implying that these patients may have retained some partially functional protein, and that other severe variants may have resulted in complete LoF and embryonic lethality.

It is therefore likely that some of the LoF variants that contribute to our final totals may not actually contribute to disease rates due to embryonic or pre-natal lethality. Variant severity is not easy to predict – for example, COQ9R239X mice that express a partial protein have a much more severe phenotype than COQ9Q95X mice with no measurable protein expression, presumably due to the destabilization of a multiprotein UQ biosynthesis complex by the truncated protein27. However, homozygous or compound heterozygous severe variants in PDSS1 through COQ7 account for only 6,142 out of 123,789 predicted individuals worldwide, and 219 out of 1,462 in the USA. This suggests that our predictions are not greatly inflated by the inclusion of embryonically lethal allelic combinations.

Our predictions may also suffer from the opposite problem - missense variants identified as damaging by SIFT or PolyPhen2 may, in fact, not have deleterious physiological effects. We attempted to address this by requiring our “predicted pathogenic” variants to be rated as highly likely to be deleterious by both PolyPhen2 and SIFT, but such prediction algorithms are clearly not infallible. For example, COQ4 p.Arg145Gly was rated as “tolerated” by SIFT, yet was reported in homozygous form in a neonate who died 4 h after birth, and it also failed to rescue Δcoq4 yeast28. It is therefore reasonable to expect a certain proportion of predicted-pathogenic missense variants to result in asymptomatic individuals. Interestingly, missense variants seem to be responsible for a lesser proportion of COQ8A and COQ8B-deficient individuals, with patients homozygous or compound heterozygous for LoF variants being relatively common29,30,31,32. Given that COQ8A alone can rescue COQ8-null yeast33, and COQ8A patients with truncating nonsense mutations shown to result in nonsense-mediated decay remained viable in their mid-20s32, it is likely that these genes may be relatively insensitive to some borderline-pathogenic missense variants. This has the potential to greatly impact our predictions, with homozygous or compound heterozygous variants in COQ8A and COQ8B accounting for 46,654 out of 123,789 predicted affected individuals worldwide, and 559 out of 1,462 in the USA.

COQ8A and COQ8B are also noteworthy in that most of the known patients have relatively well-defined, gene-specific, pathologies. Specifically, symptoms of ataxia (often associated with cerebellar atrophy or other neurological abnormalities) are found with 26 of the 29 known pathogenic variants of COQ8A, and all 13 of the published COQ8B pathogenic variants exhibited nephrotic syndrome (citations provided in Table S2). It would therefore be tempting to claim that our predicted patients would exhibit similar clinical conditions, with, for example, all predicted COQ8B patients suffering from nephrotic syndrome34. However, it is likely (and our results support) that only a subset of primary UQ deficiency patients have been identified at this point, and they may be non-representative of the actual patient population. Of note, many of the known pathogenic variants were identified in studies where clinicians screened cohorts of patients with specific subsets of well-defined symptoms. For example, our knowledge of COQ8B variants largely comes from two studies in which large numbers of patients with nephrotic syndrome were subjected to sequencing of either whole exomes or multi-gene panels designed for nephrotic syndrome29,35. A similar issue can be raised for the ataxic nature of COQ8A variants. For example, two studies described how, after identifying pathogenic COQ8A variants in ataxic patients, they proceeded to sequence COQ8A in other ataxic patients, identifying additional novel pathogenic variants30,32. Additional pathogenic variants were found in later studies in which COQ8A, alone or in combination with other UQ biosynthesis genes, was specifically sequenced in ataxic patients31,36. We hypothesize that future COQ8A or COQ8B patients identified via less targeted methodologies may present with more diverse clinical phenotypes, as is characteristic of other UQ biosynthesis genes such as COQ2 or COQ4.

There are also several factors that could increase the number of afflicted individuals beyond our estimates. For example, we conservatively assumed that primary UQ deficiency is always recessive; however, haploinsufficiency of COQ4 has been shown to cause clinically significant primary UQ deficiency37. Also, violations of Hardy-Weinberg equilibrium (e.g., consanguinity or populations with a large degree of endogamy) could increase the likelihood of an individual being born with two pathogenic variants. It is also noteworthy that 6 of the 29 missense variants known to be pathogenic would not have met our criteria for inclusion as “predicted” pathogenic variants, since they were not assigned the highest level of confidence for pathogenicity by both SIFT and PolyPhen2 (Table 1). This supports the conservative nature of our selection criteria.

Furthermore, there are several reasons why truly pathogenic variants may not appear on our list of known variants. Some variants may have been identified in clinics without being formally described in the literature. For example, COQ2 p.Met128Val and p.Arg387* have been cited as pathogenic in the secondary literature38, but without a formal research citation they would not have met our inclusion criteria as known pathogenic variants. Furthermore, although the latter variant was included as a predicted pathogenic variant, the former was assessed as ‘benign’ and ‘tolerated’ by PolyPhen2 and SIFT respectively, excluding it from our list of predicted pathogenic variants. In addition, many predicted pathogenic variants were more common in non-Western populations, meaning that they are less likely to have been identified in the existing clinical reports, which have focussed on Western populations. Additionally, our list of known pathogenic variants may not have included variants detected as part of recent large-scale studies39,40,41, and the fact that some UQ biosynthesis genes were found to be associated with disease earlier than others (e.g., COQ2 was first found in 200642, vs. COQ8B in 201329 and COQ7 in 201543) could have delayed the introduction of some genes into widely used genetic screening panels44, meaning that more patients were screened for some genes compared to others. Finally, after the literature review phase of our analysis was concluded, novel pathogenic variants have continued to be described in the clinical literature (e.g., COQ445, COQ646, COQ747, ADCK448,49), indicating that many remain to be reported.

Several aspects of our results point towards their general reliability. For example, there have been no reports of pathogenic variants in COQ3 or COQ5, which is consistent with our prediction of few individuals with primary UQ deficiency due to pathogenic variants in these genes (fewer than 2,000 individuals worldwide, and only 4 in the USA). Conversely, more patients with defects in COQ8A and COQ8B have been described than for any other UQ biosynthesis gene8, which corresponds to our finding that pathogenic variants in these genes make the greatest contribution to the number of individuals worldwide predicted to suffer from primary UQ deficiency, together accounting for more than half of the predicted 127,136 patients worldwide.

In conclusion, we have made the first estimates of the worldwide and within-population birth prevalence of individuals who are homozygous or compound heterozygous for pathogenic variants causing primary UQ deficiency by combining a decade's worth of clinical genetics with recently available large-scale whole exome/genome sequencing data. Our calculations suggest a minimum of 1,665 afflicted individuals worldwide or 192 in the USA (using only variants clinically shown to be pathogenic), up to a maximum of 123,789 worldwide or 1,462 in the USA (with all variants predicted to be pathogenic). Notably, the gap between predictions made using “known” vs. “predicted” pathogenic variants appears smallest in populations expected to have the greatest access to the modern methodologies of clinical genetics. This implies that healthcare providers have already made substantial headway in identifying individuals suffering from this disorder. However, it remains likely that the bulk of patients worldwide suffering from primary UQ deficiency have yet to be recognized.