Investigating the influence of mtDNA and nuclear encoded mitochondrial variants on high intensity interval training outcomes

Mitochondria supply intracellular energy requirements during exercise. Specific mitochondrial haplogroups and mitochondrial genetic variants have been associated with athletic performance, and exercise responses. However, these associations were discovered using underpowered, candidate gene approaches, and consequently have not been replicated. Here, we used whole-mitochondrial genome sequencing, in conjunction with high-throughput genotyping arrays, to discover novel genetic variants associated with exercise responses in the Gene SMART (Skeletal Muscle Adaptive Response to Training) cohort (n = 62 completed). We performed a Principal Component Analysis of cohort aerobic fitness measures to build composite traits and test for variants associated with exercise outcomes. None of the mitochondrial genetic variants but eight nuclear encoded variants in seven separate genes were found to be associated with exercise responses (FDR < 0.05) (rs11061368: DIABLO, rs113400963: FAM185A, rs6062129 and rs6121949: MTG2, rs7231304: AFG3L2, rs2041840: NDUFAF7, rs7085433: TIMM23, rs1063271: SPTLC2). Additionally, we outline potential mechanisms by which these variants may be contributing to exercise phenotypes. Our data suggest novel nuclear-encoded SNPs and mitochondrial pathways associated with exercise response phenotypes. Future studies should focus on validating these variants across different cohorts and ethnicities.

www.nature.com/scientificreports/ with no studies looking at the more subtle effects of synonymous and non-coding changes 11,[17][18][19][20] . Further, these studies have often based haplogroup analyses on sequencing or genotyping of the mitochondrial hypervariable region(s) (~ 500-1,000 bp), with no consideration for the remaining mitochondrial genome (~ 15,000 bp) and the specific haplogroup of exercise participants. For instance, 3`UTR (untranslated regions) variants that do not directly affect protein function may however affect translation, mRNA shuttling to specific organelles, or epigenetic modification such as microRNA silencing 21 . Intronic variants may also lead to splice site changes directly contributing altered protein structure and function 22 . As Next Generation Sequencing has become more widely available and affordable, sequencing of the whole mitochondrial genome (16,569 bp) is now feasible to uncover genetic variants associated with physical fitness phenotypes. When used in combination with SNP genotyping arrays, it is possible to examine, not only the 37 mitochondrially-encoded genes, but variants within all nuclear NEMP genes simultaneously. Genetics may influence exercise response in conjunction with environmental factors such as diet, repeated exercise bouts, and age. Whilst these are modifiable, it is difficult to gauge the contribution of these factors to exercise response within short term exercise studies. Further, the additive effects of genetic variants to exercise response are not well understood as only a few genetic variants have been consistently replicated in the field. Therefore, the aim of the present study was to examine the association between genetic variants (i.e. mitochondrial variants and NEMPs), and aerobic fitness measures in the well-characterised Gene SMART cohort. We hypothesise that by utilising whole-mitochondrial sequencing, we will uncover novel genetic variants associated with exercise responses.

Exercise responses and principal component analysis (PCA). Participant characteristics and
response to exercise for all phenotypes are detailed in Table 1. P-values shown for delta variables are respective of one tail of a paired samples t-test.
From our mtDNA sequencing, we obtained an average depth coverage of 615X over the mitochondrial genome. Following sample annotation with Mitomaster, we found that there were 60 distinct haplogroups within the Gene SMART completed cohort of 62 participants. As such, there were no statistically significant associations between the mitochondrial haplogroups with exercise response traits. A summary table of the mitochondrial haplogroups found within the Gene SMART participants is shown in Table 2. The confidence scores (0-1) represent the number of mtDNA variants found in each participant that belong to their respective haplogroup.
Association between genetic variants (mitochondrial encoded and nuclear encoded) and exercise phenotypes. Following  www.nature.com/scientificreports/ chondrial genomic variants for each trait is shown in Fig. 1 23 . The exonic variants passing the nominal threshold from the mitochondrial association results are summarised in Table 3. 28 variants passed the nominal significance threshold of P unadjusted < 0.05 in various delta traits and principal components. Of these, 8 were located within the hypervariable control region and therefore discounted from further analyses. A further 2 genetic variants were located within a mitochondrial rRNA gene, 1 within the tRNA Leu gene, 1 within the mitochondrially encoded ATP synthase membrane subunit 6 (ATP6) gene, 2 within the mitochondrially encoded NADH: ubiquinone oxidoreductase core subunit 4 (ND4), 2 in mitochondrially encoded NADH: ubiquinone oxidoreductase core subunit 5 (ND5) and 1 in mitochondrially encoded cytochrome B (CYB). None of the mitochondrial genomic variants were associated with composite response traits or individual response traits at FDR < 0.05. A manhattan plot of the NEMP variants is shown in Fig. 2. A summary of the association statistics for the variants passing a nominal threshold of P unadjusted < 1e −4 in both the NEMP associations are shown in Table 4.
A full list of variants reaching the nominal P-value threshold (< 0.05) may be found in (Supplementary  Table SI). 6 SNPs in 5 distinct genes were associated with ΔTT and 2 SNPs in 2 distinct genes were associated with PC2. The most significant variant was rs2041840 associated with PC2 and located within NDUFAF7; we Table 2. Summary of mitochondrial Haplogroups within the Gene SMART study. *Sample SG168 contained sequence identical to the rCRS reference genome and therefore stratification into mtDNA haplogroup was not based on genetic variation but sequence homology to the reference. www.nature.com/scientificreports/ found that the rs12712528 variant also within NDUFAF7 had a moderate correlation with rs2041840 (R 2 = 0.5) Fig. 3e. This variant was also trending towards significance in the Δ-Weight and Δ-VO 2max response phenotypes ( Table 4). The T allele at rs2041840 was associated with a better response to exercise. The Locus Zoom plot (Fig. 3d) surrounding the MTG2 gene was also gene-rich with 11 proximal genes. The two associated variants (rs6062129 and rs6121949) were moderately correlated (R 2 = 0.5), however there were no SNPs found within the proximal genes. The locus zoom plot for the variants found within the AFG3L2 gene ( Fig. 3f) was proximal to 6 genes within 200 Kb. There was also a proximal SNP within the SLMO1 gene however this was not in linkage with the variants identified within the AFG3L2 gene.

Discussion
In this study, we utilised state-of-the-art mitochondrial sequencing, along with high-throughput targeted genotyping of mitochondrial-related variants encoded by the nucleus (NEMPs) to discover novel genetic variants associated with responses to exercise. A total of 28 mitochondrial and 4,325 nuclear encoded mitochondrial associated variants passed the nominal significance thresholds for the various candidate gene association tests.  www.nature.com/scientificreports/ We did not detect mitochondrial variants associated with exercise response, but we uncovered eight NEMPs in seven distinct genes associated with exercise response. It should be noted that we have attempted to control for the contribution of environmental stimuli on each exercise phenotype within our study. For instance, the www.nature.com/scientificreports/ arguably largest contributors to exercise response (diet, age, repeated bouts) were carefully scrutinised in order to ascertain the genetic contribution to each phenotypic trait. Further, we have ascertained which genetic variants contribute to multiple phenotypic traits using composite traits built with PCA data reduction.
Novel exercise loci. The most significant variant was associated with the composite exercise response phenotype and located within an intron of NDUFAF7 (rs2041840). The T allele was associated with better exercise response as shown by the positive beta values. NDUFAF7 encodes an arginine methyltransferase that is essential for mitochondrial complex I assembly 24 . We have showed that this variant was in a gene rich region with 8 proximal genes (Fig. 3a), indicating possible effects for this variant in any of the proximal genes or indeed for genes that may be further away from the loci. Specifically, the interaction between this variant and the Glutaminyl-Peptide Cyclotransferase (QPCT) and Protein Kinase D3 (PRKD3) genes has been previously described in a recent GWAS study 25 . In a recent RNAseq profiling study of exercise training, it was demonstrated that the QPCT gene was upregulated following 12 weeks of training 26 . As such, we expected the variant seen within the NDUFAF7 gene to be associated with differing levels of the QPCT transcript following prolonged exercise training.
The two intronic variants within the MTG2 gene were found to be associated with the change in time trial measures and appeared to be moderately linked (Fig. 2b). The MTG2 gene resides in a gene rich locus with 11 proximal genes. When assessed for functionality within the UCSC genome browser, we noted that both the MTG2 variants were located in a regulatory element (GH20J062181) that interacts with the MTG2 transcription start site. Further, there was a large amount of layered H3K27 acetylation at the variant site, and the linked MTG2 promoter region. The MTG protein regulates the assembly and function of the mitochondrial ribosome. As such, dysregulation of the gene could result in the downregulation of mitochondrial translation, and therefore a lower response to exercise training. The variants also showed a 20% recombination rate with the 5` www.nature.com/scientificreports/ region of the TAF4 gene. The TAF4 protein forms part of the transcription factor II D (TFIID) complex and has a central role in mediating promoter responses to transcriptional activators and repressors. Dysregulation of this gene could introduce global translational repression and therefore lack of response to HIIT training. This is supported by the positive effect size for the C and G alleles of the MTG2 variants rs6062129 and rs6121949 respectively (β = 587.7 s). An intronic variant within AFG3L2 was also shown to be associated with the composite exercise response phenotype (rs7231304), but this gene has not previously been associated with exercise response. However, mutations in AFG3L2 have been shown to cause spinocerebellar ataxia through the development of mitochondrial proteotoxicity 27,28 . As such, the intronic variation within this gene might inhibit exercise response through dysregulation of mitochondrial structure and function. Further, this variant is in a locus with 6 proximal genes (Fig. 3c), however no genes within this locus shared a recombination rate above 10%. There were two proximal SNPs with a moderate correlation to the SNP of interest also within the AFG3L2 genic region. When assessed for functionality through the UCSC genome browser, we noted that the SNP was within a DNAseI hypersensitivity region, and therefore may have effects on the mRNA half-life rather than protein functionality in response to training.
The T allele at the exonic rs7085433 variant in the TIMM23 gene was associated with the change in time trial phenotype (Δ-TT) causes a non-coding transcript of the TIMM23 gene. This gene is one of the targets of transcriptional activators NRF-1 and GA binding protein (GABP/NRF-2) 29 , in which we have previously shown genetic variants associated with athletic performance 30,31 . TIMM23 is one of the mitochondrial transmembrane subunits that form the mitochondrial protein import (TIM23) complex. Therefore, this subunit is essential for the transport of peptide containing proteins across the inner mitochondrial membrane. The non-coding transcript resulting from the variant would render the complex non-functional and as such impaired transport of biomolecules across the inner mitochondrial membrane may impair exercise potential. The effect size of this variant was very highly positive (β = 587.7 s) and therefore, this non-coding transcript may result in a slower time to complete the time trial.
The rs1063271 variant lies within the 3` Untranslated Region (UTR) of the SPTLC2 gene. UTR variants have been shown to influence transcript half-life; through the dysregulated binding of transcript shuttle proteins; or change the binding site of miRNAs resulting in epigenetic silencing of the gene 32 . As many current miRNA binding site analysis tools require targeted sequences, we examined the interaction between specific miRNAs previously found within exercise training with the 3`UTR of the SPTLC2 gene 33 . We noted that all of the miR-NAs included in our STarMir curation were able to bind to the 3`UTR of the SPTLC2 gene in both seed and seedless sites. As such, it was not possible to indicate a specific miRNA mechanism within the context of this study although we note that this genetic variant in SPTLC2 should be computationally explored in future studies. The SPTLC2 protein is involved in the de novo biosynthesis of sphingolipids by forming a complex with its counterpart; SPTLC1 34 . Overexpression of this protein has also been shown to cause elevated sphingolipid formation and therefore mitochondrial autophagy 35 . Much like the TIMM23 rs7085433 variant, the effect size for time to completion in Time Trial (β = 587.7 s) indicated that carriers of T allele/genotype have slower TT and therefore poorer response to exercise when compared to carriers of the C allele/genotype. We hypothesise that the T allele for this variant may induce a novel miRNA binding site in the transcript resulting in the silencing of the SPTLC2 gene.
The rs11061368 variant lies within an intronic region of the DIABLO gene. The protein encoded by this gene functions to induce apoptotic processes through the activation of caspases in the Cytochrome C/Apaf-1/ caspase-9 pathway. When viewed in UCSC genome browser, it was evident that the SNP was not affiliated with any regulatory elements and therefore we were unable to determine the true functionality of this intronic variant. However, we postulate a molecular mechanism that should be explored in future exercise related studies. Although the associated variant does not show functionality within this gene, the dysregulation of the DIABLO gene could prevent adequate muscle remodelling resulting in the lack of response to training. Further, the variant also lies ~ 50 Kb away from the Interleukin 31 (IL31) gene, a pro-inflammatory cytokine associated with the activation of Signal Transducer and Activator of Transcription 3 (STAT3) pathways, which have already been extensively studied and implicated in exercise training responses.
The FAM185A gene has to date had limited previous research and as such we were unable to elucidate any specific molecular function within the context of exercise training. However, the gene has been previously associated with plexus-forming angiogenesis within the context of foetal lung tissue 36 . It is plausible that the gene is involved in angiogenic processes outside embryonic development. Further, the gene is proximal to 9 other genes within 200 Kb. There was no evidence of high recombination rates with any of the proximal genes, and there were no proximal SNPs correlated with the rs113400963 polymorphism. When we assessed the polymorphism using the UCSC Genome browser to determine functional consequence of this variant and found no link between the intronic variant with any regulatory or epigenetic regions.

Mitochondrial.
None of the mitochondrial genetic variants identified in this study were associated with exercise response in the present study to a threshold of FDR < 0.05. Additionally, we lacked enough statistical power to associate mitochondrial haplogroup with exercise responses as the cohort was extremely heterogeneous. Although nominal significance was achieved, due to the hypervariable nature of the control region, we chose not to focus on the SNPs within this region.
The g.A8701G variant within the ATP6 gene causes a missense change within its respective protein (p.Thr59Ala) and has been well characterised in hypertensive cases 37 . This variant was nominally significant in both the Δ-LT phenotype and the PC3 composite trait within the cohort. As the Δ-LT trait provided a smaller www.nature.com/scientificreports/ contribution to PC3, the variant was assumed to be partially associated with a mixture of the Δ-TT and Δ-VO 2max phenotypes. The effect size of this variant indicated a poor response to exercise training (β = − 26.19 LT). Interestingly, all the variants associated with PC4 were related to the utilisation of the amino acid Leucine. Firstly, the g.A12308G variant within the mitochondrial coding region for the tRNA for Leucine. Whilst the effect of this variant was unclear, it appears to have influenced the composite phenotypes within PC4. Mutations within tRNA genes have previously been associated with reduction in organelle quantity and downregulation of protein synthesis 38 . Secondly, both synonymous variants in the ND4 (g.A11467G) and ND5 (g.G12372A) genes result in a codon that is used far less frequently (CUA [276] > CUG [42] ) in mitochondrial translation processes 39 . As the biosynthesis of tRNAs is costly with respect to intracellular energy levels, it is possible that the combination of the dysregulation of the tRNA leu and the codon usage frequency change in two subunits of the mitochondrial membrane respiratory chain NADH dehydrogenase (complex I) may result in premature intracellular energy (ATP) deficiency and contribute to the poor response to exercise training associated with these traits. It should be noted that the stringent thresholds for association in the mitochondrial association tests could also have resulted in false negative results. Additionally, mitochondrial genetic variants rarely influence phenotypic traits in isolation.
We have identified novel nuclear-encoded, mitochondrial-related SNPs and loci associated with adaptations to High Intensity Interval Training. Additionally, we have postulated the mode of action for different molecular mechanisms that may be responsible for the variability in response to exercise intervention. It should be noted that performing mitochondrial sequencing on muscle tissue as opposed to blood may yield more informative results with heteroplasmic associations due to the high concentration of mitochondria. We note that while we have utilised comprehensive sequencing and high throughput arrays in combination with robust exercise phenotypes, the variants associated with responses in this study, need to be replicated in larger cohorts of both the general population and elite athletes. Further, the variants assessed within the current study were tag SNPs within the genotyping arrays and further information may be gained from the imputation of additional SNPs within the regions we have discovered. This could be achieved by leveraging on large multi-centre initiatives such as the Athlome consortium 40 . Additionally, functional genomic analyses are required to determine the effect of these variants on the molecular pathways commonly involved in exercise response. Such studies could include transcriptomics, epigenetics and functional cell work in a multi-omics approach.

Methods
Participants. At the time of analysis, 77 participants had taken part in the study, 62 of whom successfully completed 4 weeks of High-Intensity Interval Training (HIIT) intervention protocol in the Gene SMART (Skeletal Muscle Adaptive Response to Training) study 41 at Victoria University, Australia. Ethical clearance for this study was provided by the Human Research Ethics Committee at Victoria University (Approval Number: HRE13-233), and the clearance was transferred to and also provided by the QUT Human Research Ethics Committee (Approval Number: 1600000342). All participants provided informed consent prior to the study and all methods were carried out in accordance with relevant guidelines and regulations. We analysed the 62 participants who did not drop out of the study and all had healthy BMI and were moderately trained with an age range of (31.33 ± 7.94 years).
The Gene SMART study design has been previously reported 41 . Briefly, participants were required to provide medical clearance to satisfy the inclusion criteria. Following familiarisation, baseline exercise performance was determined on a cycle ergometer during a 20 km time trial (TT), and two graded exercise tests (GXTs); these tests were administered a few days apart and no more than two weeks apart to limit temporal variability in performance.

Molecular methods.
Genomic DNA was extracted for 77 participants regardless of completion status from 2.0 mL of whole blood using a QIAmp DNA blood midi kit (QIAGEN, Hilden, Germany). Briefly, the concentration and purity of genomic DNA (gDNA) from all samples was assessed via Nanodrop spectrophotometry and Qubit fluorometry. We used an in-house sequencing method recently developed by our group at the Genomics Research Centre, Queensland University of Technology, Australia to sequence the whole mitochondrial genome of each participant 42 . Illumina Infinium Microarray was used on HumanCoreExome-24v1.1 bead chip to genotype all samples for ~ 550,000 loci. For all samples, 1 µg total gDNA was sent to the Australian Translational Genomics Centre, Queensland University of Technology Australia, for SNP genotyping on the arrays. Data filtering. A bioinformatics pipeline (SAMtools, BCFtools) was utilised to generate variant call files (VCF) for all samples as described previously 42 . VCF files were then aligned to the revised Cambridge Reference Sequence (rCRS) and all sequences were stringently left aligned back to this reference genome to account for the single end (SE) reads generated from Ion Torrent sequence information. FASTA files were generated for all samples and then merged VCF and FASTA files were produced for the entire data set. The merged FASTA files were annexed using MITOMASTER, a mitochondrial sequence database, to call haplogroups and obtain variant annotation information for all samples 43,44 . The merged VCF file was converted to PLINK (v1.90p) format using the function '-make-bed' for further association analysis. A detailed description of the analysis pipeline may be found in markdown format in the GRC computational genetics GitHub account (https ://githu b.com/GRC-CompG en/mitoc hondr ial_seq_pipel ine), including all necessary files (NEMP locations BED file) and scripts (mitochondrial Solarplot R script) to replicate our analyses within other data sets.
The ped file generated from Illumina GenomeStudio v2.0 software was converted into binary format. We did not impute any genotypes to prevent false positive associations and a larger multiple testing burden. There were 551,839 typed SNPs; subsequent SNP and individual filtering and trimming was based on (1) SNPs with > 20% missing data (239 removed), (2) individuals with > 20% missing data (0 removed), (3)  Exercise-response phenotypes. Participant stratification into high and low response groups lead to a loss of statistical power in association testing. As such, and to avoid classifying responders and non-responders via arbitrary thresholds, we chose to keep the phenotypes as continuous variables for association testing 50 .
To ascertain variants that were associated with exercise response for key physiological traits, we utilised the delta (Δ) change (Post phenotype-Pre phenotype) quantitative trait data for; peak power output (ΔWpeak in Watts); power at lactate threshold (ΔLT in Watts); peak oxygen uptake (ΔVO 2peak in mL/min/kg body weight); and time to completion measurement for a 20 km time trial (ΔTT in seconds). As the quantitative traits were all continuous and to keep maximal statistical power, we did not use arbitrary response thresholds. With multiple, correlated response phenotypes, we conducted a Principal Component Analysis (PCA) of the response phenotypes using the R package FactoMineR 51 . PCA is a dimensionality reduction method that computes linear combinations of the multiple response phenotypes into principal components (PCs) so that the variance between individuals is maximised. Every individual is then represented by one value for each PC, considered a composite trait of the different response phenotypes. A more detailed description of PCA for composite trait association testing is shown in Supplementary Fig. 1.
Missing phenotypic values were excluded from the phenotype table prior to PCA to prevent skewing of data and to maintain appropriate PCs. Following the PCA, these variables were set as "missing" for the association analysis. We also tested the individual response phenotypes and compared the significance levels of variants between the composite traits with those within each PC. This resulted in 4 PCs that cumulatively explained > 90% of the variance between participants.
Statistical analysis. Analysis of the response traits was performed in SPSS using a paired samples t-test.
SPSS was also used to test associations between mitochondrial haplogroups and exercise response with a Wald test. Analyses for the mitochondrial SNPs and NEMP SNPs were kept separate for analysis using different association models. We used PLINK V1.90p to perform quantitative linear association tests (95% CI) with both dominant and recessive models. An additive model was also attempted but yielded the same results as our dominant model. We adjusted all association results for age and effect sizes were determined using raw beta regression coefficient values (i.e. genotype X is associated with β [unit specific to trait of interest] changes in the phenotype). Variants that passed a nominal P-value threshold of P < 0.05 were considered for further analysis whereas variants that passed multiple testing adjustment using the Benjamini-Hochberg False Discovery Rate (FDR < 0.05) method were considered significant associations. We performed adjustment for multiple testing for each phenotype separately. Performance of an apriori power calculation for this study indicated that the linear modelling approach with additive genotypic effects for our sample size (n = 62) was sufficient for at least 80% power to detect SNP-based heritability of 13% or more at the relaxed alpha level of 0.05. SNP-based heritability estimates were approximated by genotype Vs outcome R 2 values from linear regressions. We also note that the Gene SMART cohort is a tightly controlled study with rigorous physiological measures, all performed in duplicate, which by itself significantly increase the power of the detected apriori.
All variants from the association tests were plotted in R using the tidyverse, ggplot2, and qqman packages. The script for the mitochondrial solar plot depicted in Fig. 1 has been adapted from Stephen turner's GitHub account 23 . Locus zoom plots were generated with the online locus zoom software using a compilation of the dominant and recessive nominal results from our association tests 52 . As the participants were all Caucasian, SNP linkage r 2 values were calculated with the HapMap CEU database. As intronic SNPs may affect genes far away, we termed genes within 200 Kb of the SNP of interest as "proximal" regardless of NEMP status.
We utilised the UCSC genome browser (hg19/GRCh37) to ascertain the genomic affect and therefore consequence of all statistically significant variants. The GeneHancer track was utilised to determine the regulatory element affect for each variant 53 . We chose to postulate molecular mechanisms for variants that were purely intronic and did not show affinity for epigenetic or transcription factor binding. However, these should be interpreted with caution and therefore we note that the variants in this category were found to be association based only and should be confirmed through replication analysis in larger exercise cohorts.