Introduction

The aggregation of large-scale sequencing data and the increasing clinical use of whole-exome sequencing (WES) for genetic diagnosis are uncovering genetic variation that aids in the annotation and interpretation of disease alleles. Including more WES data from diverse populations that are underrepresented in genomic databases uncovers clinically significant genetic variation and advances our understanding of disease alleles and the underlying biology. Elucidating rare genetic diversity and how it contributes to Mendelian disorders has implications for community genetics, where a better understanding of ancestry-specific genomics can direct carrier testing, presymptomatic diagnosis, and potential interventions to delay the onset of symptoms in a community-specific manner.

Autosomal recessive mitochondrial complex I deficiency (MCID) accounts for approximately 23% of childhood respiratory chain deficiency cases [1]. Mitochondrial complex I is composed of 44 structural subunits and over 10 assembly factors, a complexity that is reflected in clinical manifestations that vary in severity and age of onset [2]. The clinical presentations range from infant-onset subacute necrotizing encephalomyelopathy (Leigh syndrome [MIM: 256000]) to adult-onset exercise-induced myopathy. Genetic studies have identified disease-causing variants in 11 nuclear-encoded complex I genes [MIM: 252010], including NDUFV1 [3,4,5,6,7,8,9,10,11,12,13].

In the present study, we report genetic findings from two unrelated probands from the South Asian population (country of origin: India; region: Southern India) who presented with divergent features of MCID. Proband 1 exhibited symptoms of MCID at 6 months of age and carried a novel homozygous c.1118T > C (p.(Phe373Ser)) missense variant in exon 8 of NDUFV1. Proband 2 was homozygous for the previously reported c.1156C > T (p.(Arg386Cys)) variant, also in exon 8 of NDUFV1 (Figs. 1c, 2a), but did not exhibit symptoms until 6 years of age [14,15,16]. The difference in clinical presentation and age of onset between these cases is consistent with the molecular impact of each variant as modeled on published crystal structures of mitochondrial complex I. The NDUFV1 p.(Arg386Cys) variant occurs at a higher frequency in the South Asian population, suggesting a founder effect with implications for community genetics [14].

Fig. 1

Brain imaging and NDUFV1 variants in probands 1 and 2. a T1-weighted axial brain MRI of proband 1 at 6 months demonstrates patchy diffusion involving the centrum semiovale, corona radiata (arrows), and periventricular external capsule (stars). T2-weighted axial brain MRI of proband 2 at 6 years of age shows diffuse white matter demyelination with gliotic areas and atrophy. b Pedigrees for family 1 and family 2, both from the South Asian population. Filled symbols denote individuals affected with MCID. Double lines denote consanguinity. c Sequence chromatograms showing the biallelic inheritance of NDUFV1 missense variants (black arrows) in probands 1 and 2 (lower panel, proband), consistent with consanguinity in the families

Fig. 2

a Schematic showing the intron-exon structure of human NDUFV1. The cDNA sequence from position 1111 (NM_007103.3) and the corresponding peptide sequence from Ile371 to Glu387 are shown. Genetic variants that affect function, or amino acid substitutions identified in patients with mitochondrial complex I deficiency, are marked in red. b Structure of a mammalian mitochondrion and close-up depicting the membrane-embedded respiratory chain complexes. The circled region shows the N module of complex I, composed of the NDUFV1, NDUFV2, and NDUFS1 proteins. Crystal structure of the N module derived from T. thermophilus (3IAM), where colors denote distinct proteins (blue = NDUFV1, pink = NDUFS1, and green = NDUFV2). Colored elements: Fe–S clusters = red and orange, FMN binding pocket = blue and green; the missense variants identified in probands 1 and 2 are marked with bold circles. Dashed red lines denote the trajectory of shuttled electrons. c Magnified image of the substituted amino acid close to the FMN binding pocket in proband 1. The amino acid in black protruding from the alpha helix is the phenylalanine at position 373. The p.(Phe373Ser) variant disrupts FMN binding to NDUFV1. d Magnified image of the amino acids close to the Fe–S cluster disrupted by the proband 2 substitution. Arginine at position 386 is shown in red. The p.(Arg386Cys) substitution will alter buffering of the Fe–S clusters and electron transfer across NDUFV1

Clinical reports

Proband 1

Proband 1, a one-year-old boy and the first-born child of third-degree consanguineous parents (Fig. 1b), was referred for developmental delay. He was born at full term by normal vaginal delivery and weighed 2.5 kg at birth. An excessive cry was noted. Seizures were first observed at the age of 6 months. His development was mildly delayed, with momentary loss of head control and rolling over at 7 months of age. He was diagnosed with myopia, and he started smiling and reaching for objects after he began wearing spectacles. On clinical examination at the age of 1 year, height was 78.5 cm (−0.9 SD), head circumference was 49 cm (+1 SD), and weight was 9.8 kg (−1 SD). He had bilateral low-set ears, nystagmus, mosaic pigmentary anomalies, hepatomegaly, spasticity in the lower limbs, extensor plantar responses, and brisk deep tendon reflexes. Electroencephalogram and fundus examinations were normal. Brain imaging showed diffuse hyperintensity in the cerebral, cerebellar, and brainstem white matter, and small cystic areas in the periventricular white matter (Fig. 1a). Magnetic resonance spectroscopy showed reduced N-acetyl aspartate, elevated choline, and an inverted double peak of lactate.

Proband 2

Proband 2, the second-born child of third-degree consanguineous parents (Fig. 1b), developed normally until 6 years of age, at which time he presented with neuroregression, mild cognitive decline with regressive speech deficits, bilateral optic atrophy, and marked motor decline. Proband 2 was of normal height and weight at 137 cm and 25 kg. A history of seizures was noted. A sibling with similar clinical features died at the age of 15 years. Physical examination showed a height of 137 cm (−1 SD), head circumference of 46 cm (−4 SD), and weight of 25 kg (−3 SD), with spasticity in all four limbs, clonus, and nystagmus. Optic atrophy was reported on ophthalmology evaluation. However, blood lactate and pyruvate levels were within the normal range. Brain imaging displayed diffuse white matter demyelination with cystic areas, consistent with neurodegeneration (Fig. 1a). Based on these features, proband 2 was given a diagnosis of leukodystrophy.

Material and methods

Ethics statement and clinical sample collection

The present study is a part of an ongoing combined clinical/research project (Ethical approval number: Indo-Foreign/Neuro/154/2015) that started in the year 2013 to recruit individuals with inherited neurodevelopmental disorders. A total of 450 consanguineous families from Southern India have been recruited to date. The parents provided written informed consent for WES. DNA was isolated from peripheral blood by standard procedures [17].

Whole-exome sequencing

Exome libraries of affected and unaffected genomic DNA were generated using the Illumina TruSeq DNA Sample Prep kit, following the manufacturer’s instructions. Parents-proband WES was performed on family 1. Coding sequences were prepared and captured with the Agilent SureSelect All Exon kit-v4 and sequenced on an Illumina HiSeq 2500 instrument as described previously [18]. Only proband 2 underwent WES in family 2, as previously described [19].

Variant calling and filtering

WES data were processed using GATK callers and SeqMule. ANNOVAR was used to functionally annotate the detected genetic variants [20]. Variants were further filtered against public databases such as the 1000 Genomes Project phase 3, ExAC, and the National Heart, Lung, and Blood Institute Exome Sequencing Project Exome Variant Server (ESP6500SI-V2). Variants flagged as low quality or putative false positives (Phred quality score < 20 and low quality by depth < 20) and variants with a minor allele frequency > 1% were excluded from the analysis.
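The exclusion criteria above can be illustrated with a minimal sketch. This is not the GATK, SeqMule, or ANNOVAR workflow used in the study; it only encodes the three stated thresholds. The `Variant` record, its field names, and the example values are hypothetical, and reading the two quality thresholds as independent exclusion criteria is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; everything else here is illustrative.
MIN_PHRED_QUAL = 20.0      # variants with Phred quality < 20 are excluded
MIN_QUAL_BY_DEPTH = 20.0   # variants with quality by depth < 20 are excluded
MAX_MAF = 0.01             # variants with minor allele frequency > 1% are excluded


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record (field names are assumptions)."""
    name: str
    qual: float               # Phred-scaled variant quality
    qual_by_depth: float      # quality normalized by read depth
    max_maf: Optional[float]  # highest MAF across databases; None if absent


def passes_filters(v: Variant) -> bool:
    """Return True if the variant survives the quality and frequency filters."""
    # Assumption: failing either quality threshold is enough to exclude.
    if v.qual < MIN_PHRED_QUAL or v.qual_by_depth < MIN_QUAL_BY_DEPTH:
        return False
    # A variant absent from every database (None) is treated as rare.
    if v.max_maf is not None and v.max_maf > MAX_MAF:
        return False
    return True


# Made-up records, used only to exercise the filter.
candidates = [
    Variant("rare_good_quality", qual=60.0, qual_by_depth=25.0, max_maf=0.0001),
    Variant("novel_good_quality", qual=60.0, qual_by_depth=25.0, max_maf=None),
    Variant("common_variant", qual=60.0, qual_by_depth=25.0, max_maf=0.05),
    Variant("low_quality", qual=10.0, qual_by_depth=25.0, max_maf=None),
]
retained = [v.name for v in candidates if passes_filters(v)]
```

In this sketch a variant absent from all reference databases is retained, which matches the handling of a novel variant such as c.1118T > C.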

Protein modeling

Crystal structures of bacterial [T. thermophilus (PDB: 3IAM)] and bovine [Bos taurus (PDB: 5LC5)] mitochondrial complex I were obtained from the Protein Data Bank (http://www.rcsb.org/pdb/). The bacterial and bovine sequences were mapped to the Homo sapiens sequence using ClustalW. PyMOL software was used to model the structure of the N modules of bacterial and bovine mitochondrial complex I.
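Mapping a human residue onto a homologous structure comes down to walking the aligned sequences column by column. The sketch below shows that bookkeeping for two gapped strings of the kind an aligner such as ClustalW produces. The function name and the toy sequences are hypothetical, and no real NDUFV1 alignment is reproduced here.

```python
from typing import Optional


def map_position(query_aln: str, target_aln: str, query_pos: int) -> Optional[int]:
    """Map a 1-based residue number in the query to the aligned target residue.

    Both arguments are rows of the same pairwise alignment, with '-' for gaps.
    Returns None when the query residue is aligned to a gap in the target.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    if query_pos < 1:
        raise ValueError("residue numbers are 1-based")

    q_index = 0  # residues seen so far in the query (gaps skipped)
    t_index = 0  # residues seen so far in the target (gaps skipped)
    for q_char, t_char in zip(query_aln, target_aln):
        if q_char != "-":
            q_index += 1
        if t_char != "-":
            t_index += 1
        if q_char != "-" and q_index == query_pos:
            return t_index if t_char != "-" else None
    raise ValueError("position lies beyond the end of the query sequence")


# Toy alignment (made-up sequences, not NDUFV1).
human_row = "MK-TAYIA"
homolog_row = "-KLTA-IA"
mapped = [map_position(human_row, homolog_row, p) for p in range(1, 8)]
```

With a real alignment, the returned number would still need to be reconciled with the residue numbering used in the PDB entry, which can differ from the sequence index.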

Results

Variant prioritization

In proband 1, seven rare variants, including one in NDUFV1 (NM_007103.3), were prioritized as candidates for the clinical presentation. Apart from NDUFV1, only one of the affected genes, NRXN2, has been associated with neurodevelopmental disorders; the homozygous missense variant in NRXN2 was ruled out because of its prevalence in public databases and inconsistencies in the associated phenotypic outcomes (Supplemental Table 1). In contrast, the novel homozygous missense c.1118T > C (p.(Phe373Ser)) NDUFV1 variant accounts for the MCID phenotype, which overlaps with the clinical features of proband 1. The NDUFV1 c.1118T > C (p.(Phe373Ser)) variant was not detected in the ExAC browser. The variant was submitted to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/variation/431452/; Submission Accession: SCV000588197.1).

In proband 2, three rare variants were prioritized, in the genes NDUFV1, KIRREL3, and TFG. Among these, the biallelic NDUFV1 variant c.1156C > T (p.(Arg386Cys)), located in exon 8 (numbered as in NG_013353.1; NM_007103.3), was predicted in silico to affect function by PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/). This prediction is supported by an NDUFV1 c.1157G > A (p.(Arg386His)) substitution previously associated with severe MCID and onset of symptoms in infancy (Fig. 2a) [21]. The p.(Arg386Cys) variant has previously been reported in additional unrelated individuals from the South Asian population, and its submission accession is SCV000566902.2. The parents were heterozygous for the variants identified in probands 1 and 2, respectively, as confirmed by Sanger sequencing and consistent with consanguinity (Fig. 1c).
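The coding-level and protein-level descriptions of the three NDUFV1 variants can be cross-checked with simple codon arithmetic. The sketch below assumes the usual convention that coding position 1 is the A of the ATG start codon, and it uses the standard genetic code. It does not consult the NM_007103.3 reference sequence, so it only lists which reference codons would be compatible with each reported change. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_position(c_pos: int) -> tuple:
    """Return (1-based codon number, 0-based offset within the codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3


def compatible_codons(c_pos: int, ref_base: str, alt_base: str,
                      ref_aa: str, alt_aa: str) -> list:
    """List reference codons for which the base change gives the stated substitution."""
    _, offset = codon_position(c_pos)
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[offset] != ref_base:
            continue
        mutated = codon[:offset] + alt_base + codon[offset + 1:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append(codon)
    return hits


phe373ser = compatible_codons(1118, "T", "C", "F", "S")  # c.1118T>C, p.(Phe373Ser)
arg386cys = compatible_codons(1156, "C", "T", "R", "C")  # c.1156C>T, p.(Arg386Cys)
arg386his = compatible_codons(1157, "G", "A", "R", "H")  # c.1157G>A, p.(Arg386His)
```

Under these assumptions, position 1118 falls in codon 373 and positions 1156 and 1157 fall in codon 386. The two Arg386 variants are both compatible only with a CGT or CGC reference codon, so the three descriptions are internally consistent.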

Protein modeling analysis

We modeled the functional impact of the homozygous NDUFV1 p.(Arg386Cys) variant of proband 2, relative to the previously characterized p.(Arg386His) variant that alters the same residue, on the bacterial and bovine crystal structures of mitochondrial complex I [22,23,24,25] (Fig. 2b). The p.(Arg386Cys) substitution is predicted to disrupt the protein–protein interactions that facilitate Fe–S cluster buffering. In the three-dimensional conformation of complex I, Arg386 is in close proximity to Cys385, which participates in Fe–S cluster stabilization and buffering within the complex (Fig. 2d). The Cys side chain is smaller in volume than the His side chain introduced by the p.(Arg386His) substitution, and the increased bulk of His may perturb local interactions between the Fe–S binding motifs of NDUFV1. In contrast, having two consecutive Cys residues at the Fe–S binding motif, as would be the case with the p.(Arg386Cys) substitution, is predicted to be less disruptive to the motif and may partly compensate for the substitution by facilitating Fe–S cluster buffering.

Protein modeling and crystal structure analysis of p.(Phe373Ser) suggest that the affected residue is located in the FMN/NADH binding site of complex I and is involved in FMN binding. Substitution of the highly hydrophobic Phe with the small, polar Ser is predicted to diminish the affinity of the active pocket of NDUFV1 for FMN, altering the first step of electron transfer that promotes the redox activity of complex I. We surveyed the published variants in NDUFV1 relative to the functions ascribed by yeast genetic screens to demonstrate the predictive value of evaluating pathogenicity relative to the available mammalian crystal structure (Fig. 2c). These findings are summarized in Supplemental Table 2.

Variant interpretation: ACMG guidelines

According to the variant classification guidelines of ACMG, genetic variant p.(Phe373Ser) is categorized as “likely pathogenic” whereas variant p.(Arg386Cys) is categorized as “pathogenic” (Supplemental Table 3).

Discussion

Mitochondrial complex I of the respiratory chain liberates electrons from NADH and transfers them to ubiquinone for ATP production. Missense variants in NDUFV1 can disrupt three important functions of complex I: (1) binding of FMN and NADH, (2) transfer of electrons between iron–sulfur clusters, and (3) the structural integrity required to maintain the interactions between complex I subunits. Yeast functional studies have proven useful for assaying deleterious alleles in NDUFV1, but they do not differentiate between alleles associated with clinical outcomes of differing severity. Modeling the impact of missense substitutions in NDUFV1 relative to these functions, using the bacterial and bovine complex I crystal structures, provides evidence to support the molecular impact of deleterious amino acid substitutions. Using this approach, we provide a rationale for how substitutions of the same amino acid residue, i.e., p.(Arg386Cys) and p.(Arg386His), result in MCID with different ages of onset and clinical outcomes. Both p.(Arg386Cys) and p.(Arg386His) were classified as variants with a strong functional impact in the yeast genetic screens [24]. Based on the bacterial and bovine crystal structures, the p.(Arg386His) substitution results in an amino acid side chain that is bulkier than that of the p.(Arg386Cys) substitution. The bulkier His side chain is predicted to disrupt the protein–protein interaction between NDUFV1 and NDUFS1, and the buffering between the Fe–S clusters through complex I, more severely than Cys. This provides a molecular rationale for the less severe, later-onset MCID associated with the p.(Arg386Cys) variant, as compared with p.(Arg386His), which is associated with early onset of severe clinical manifestations followed by death in infancy [21].

Functional validation of our novel p.(Phe373Ser) variant has not been performed in yeast. Modeling studies suggest that Phe373 lies in close proximity to Arg88, Lys111, Ala117, and Glu246, NDUFV1 residues at which clinically relevant MCID substitutions have been shown to be deleterious in functional assays. These amino acids maintain the conformation of the FMN/NADH binding pocket, implicating p.(Phe373Ser) in the disruption of complex I function. This analysis approach predicts that p.(Phe373Ser) affects complex I function and that the severity of clinical outcomes can be traced back to the nature of the amino acid substitution, with the substitutions most dissimilar to the wild-type amino acid negatively impacting FMN binding kinetics and clinical outcomes.

Heterozygous NDUFV1 c.1156C > T (p.(Arg386Cys)) was detected at a frequency of 0.0001 in the ExAC database. Although the allele is rare, it is noteworthy that 13 of the 15 c.1156C > T (p.(Arg386Cys)) alleles described were observed in South Asian populations, which reinforces a previous report that associated this allele with late-onset MCID in unrelated South Asian families [14]. Additionally, screening of our inherited neurodevelopmental disorder cohort of 450 consanguineous families identified NDUFV1 c.1156C > T (p.(Arg386Cys)) in one of the two families. Altogether, the recessive inheritance of c.1156C > T, independent of consanguinity, suggests that this allele is a founder variant, yet defining enrichment in this population is complicated by its underrepresentation in ExAC and related sequence databases [16]. Our results demonstrate the value of including genetically diverse populations in genomic medicine research. Consequently, first-tier genetic screening for c.1156C > T (p.(Arg386Cys)) may prove to have a high molecular diagnostic yield for South Asian children who present with cardinal features of late-onset MCID.
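One way to put a number on the observation that 13 of 15 alleles came from a single population is a one-sided binomial tail probability. The sketch below is illustrative only and is not an analysis performed in this study. The fraction of database alleles contributed by the population is left as a parameter, the value used here is a placeholder rather than an ExAC statistic, and the calculation assumes alleles are sampled independently, which related carriers would violate.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Probability of observing at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts stated in the text.
south_asian_alleles = 13
total_alleles = 15
observed_share = south_asian_alleles / total_alleles

# Placeholder assumption: share of all database alleles from the population.
# Replace with the real proportion before drawing any conclusion.
hypothetical_population_share = 0.10

tail_probability = binomial_upper_tail(
    south_asian_alleles, total_alleles, hypothetical_population_share
)
```

A small tail probability would indicate that the allele is concentrated in the population beyond what its share of the database predicts, but the result depends directly on the assumed share.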

In conclusion, our results provide a rationale for how substitutions of the same amino acid residue can be associated with different ages of onset. We also demonstrate the value of including ancestrally diverse populations in genomic medicine and clinical research studies to improve the reliability of molecular diagnosis and reduce global health disparities.