The role of control region mitochondrial DNA mutations in cardiovascular disease: stroke and myocardial infarction

Recent studies have associated certain types of cardiovascular disease (CVD) with specific mitochondrial DNA (mtDNA) defects, a link driven mainly by the central role of mitochondria in cellular metabolism. Given the importance of the control region (CR) in regulating mtDNA gene expression, the aim of the present study was to investigate the role of mtDNA CR mutations in two CVDs: stroke and myocardial infarction (MI). MtDNA CR mutations (both fixed and in heteroplasmy) were analysed in two demographically matched case-control samples, comprising 154 stroke cases, 211 MI cases and their corresponding control individuals. Significant differences were found: mutations m.16145 G > A and m.16311 T > C emerged as potential genetic risk factors for stroke (conditional logistic regression: p = 0.038 and p = 0.018, respectively), whereas the m.72 T > C, m.73 A > G and m.16356 T > C mutations could act as possible beneficial genetic factors for MI (conditional logistic regression: p = 0.001, p = 0.009 and p = 0.016, respectively). Furthermore, point heteroplasmy was more frequent in MI controls than in MI cases (logistic regression: p = 0.046; OR = 0.209, 95% CI [0.045–0.972]). These results point to a possible role of mtDNA CR mutations in the pathogenesis of stroke and MI, and show the importance of including this regulatory region in genetic association studies.

In general, these disparities could occur because mtDNA mutations in the CR may not be directly tied to any form of pathology, but could influence mitochondrial function through changes in mtDNA copy number, with profound effects on the expression of mitochondrial-encoded gene transcripts and on the related enzymatic activities (complexes I, III, and IV) 18,19 .
The ability of an mtDNA mutation to influence the development of CVDs is directly related to its prevalence and to the severity of its impact on mitochondrial function. In addition, several studies have demonstrated that, because the prevalence of the main etiological factors differs between intra- and extracranial arteries, the effect of mtDNA mutations in individuals with stroke or MI could be different [20][21][22][23] . The main aim of the present study was to investigate the role of CR mtDNA mutations (fixed or in heteroplasmy) in two CVDs: stroke and MI.

Results
Analysis of fixed and heteroplasmic mtDNA mutations with stroke. A detailed matrix of all mtDNA positions analyzed in stroke cases and controls is reported in Supplementary Table S1, and the frequencies of the fixed mutations found are shown in Table 1. The m.16145 G > A and m.16311 T > C mutations were overrepresented in stroke cases (5.2% and 18.2%, respectively) compared with controls (1.9% and 9.7%, respectively). After correction for the effect of CV risk factors with significant differences between stroke cases and controls (hypercholesterolemia 24 ), both mutations remained significantly associated with stroke (conditional logistic regression: p = 0.038 and p = 0.018, respectively) (Table 1).
Stability analyses were performed to predict the impact of these mutations. Several measures, such as the number of hits in the mtDNA phylogeny, the probability of mutation, the frequency in the population database and the conservation index (CI) at the nucleotide level, were calculated, and the results are shown in Table 1. These results identified m.16145 G > A and m.16311 T > C as non-stable positions, since they present a minimum of 37 hits in the phylogeny, a high probability of mutation, a high frequency of the variant in the population database (minor allele frequency [MAF] > 5%) for m.16311 T > C or a low frequency (MAF 1-5%) for m.16145 G > A, and a maximum nucleotide CI of 58% (Table 1). To assess the impact of m.16145 G > A and m.16311 T > C on the stability of mtDNA secondary structures, structures were predicted for both the revised Cambridge Reference Sequence (rCRS) and the mutant variant. The m.16311 T > C mutation appears to imply a conformational rearrangement, with the structure in Fig. 1 as the new predicted minimum free energy solution (−0.40 kcal/mol), which reduces the stability of the region. No structural or thermodynamic differences were found for m.16145 G > A.
The distribution of the heteroplasmic positions between stroke cases and controls is reported in Table 2. Eighty-eight stroke cases (57.1%) and eighty-six controls (55.8%) presented point and/or length heteroplasmy (PH and LH, respectively), and no significant differences were found between groups. The most prevalent variant detected was a length heteroplasmy located in the poly-C tract of the HVRII (between positions 303-315 of the mtDNA), which was present in 52% of stroke cases and in 46.7% of controls. Point heteroplasmies were found in six stroke cases and six controls, involving eight different positions of the mtDNA: 146, 150, 152, 185, 204, 16092, 16129 and 16399.
The stability analysis performed to predict the impact of these heteroplasmic positions is presented in Table 3. In general, these mutations have a minimum of 16 hits in the phylogeny, are located in hotspot positions, and have a high frequency of the minor variant in the population database and a low conservation index, indicating that these heteroplasmies have the typical characteristics of non-stable positions. Moreover, the frequency of the minor variant is relatively low (Table 3).
Analysis of fixed and heteroplasmic mtDNA mutations with MI. MtDNA positions studied for MI cases and controls are available in Supplementary Table S1, and the frequencies of fixed mutations are reported in Table 4. The m.72 T > C, m.73 A > G and m.16356 T > C mutations were more frequent in MI controls (12.3%, 49.3% and 3.3%, respectively) than in cases (7.6%, 38.9% and 1.4%, respectively). When corrected for the effect of CV risk factors with significant differences between MI cases and controls (hypertension and hypercholesterolemia 24 ), these mutations remained significantly associated with MI (conditional logistic regression: p = 0.001, p = 0.009 and p = 0.016, respectively) (Table 3). To predict the impact of these mutations, several measures were calculated to analyze the stability of each position, and the results are shown in Table 4. These results revealed that m.72 T > C, m.73 A > G and m.16356 T > C are non-stable positions, since they present a minimum of 9 hits in the phylogeny, a high probability of mutation, a high frequency of the minor variant in the population database (MAF > 5%) for m.73 A > G and a low frequency (MAF 1-5%) for m.72 T > C and m.16356 T > C, and a maximum nucleotide CI of 79% (Table 4). When the method described above was used to predict the impact of these three mutations on the stability of the secondary structure of the mtDNA, m.72 T > C, m.73 A > G and m.16356 T > C led to a folded structure with the same minimum free energy as the rCRS structure, which suggests that these mutations do not affect the stability of the region.
Classification of heteroplasmic positions in MI cases and controls is available in Table 2. One hundred and twenty-five MI cases (59.2%) and one hundred and twenty controls (56.8%) presented point and/or length heteroplasmy, with the length heteroplasmy located in the poly-C tract of the HVRII being the most prevalent variant in both MI cases (54.03%) and controls (48.34%). In this analysis, point heteroplasmy was significantly more frequent in MI controls (n = 11; 5.21%) than in cases (n = 2; 0.94%) (logistic regression: p = 0.046; OR = 0.209, 95% CI [0.045–0.972]). The stability analysis to identify the impact of these point heteroplasmic positions is presented in Table 3. All of them were considered non-stable positions. As stated above for stroke, these positions presented a minimum of 12 hits in the phylogeny, were located in hotspot positions, and had a high frequency of the minor variant in the population database and a low conservation index at the nucleotide level. No different trends were observed between the stability of these positions in MI cases and controls. Moreover, the frequency of the minor variant is relatively low (Table 3).

Distribution of mtDNA mutations between haplogroups.
Haplogroup assignment was previously performed by Umbria et al. 24 . A search for the mentioned positions in the updated mtDNA phylogeny (mtDNA tree Build 17 25 ) shows that, with the exception of m.72 T > C and m.73 A > G, which can be observed in several European haplogroups such as HV and H, the positions are found in many different haplogroups. Along the same lines, note that all the individuals identified in this study with m.16356 T > C belong to haplogroup U, although m.16356 T > C is found in different U subgroups (U2e3, U3a1c, U4 and U5b1). Hence, the analysis of the distribution of these mutations indicated that they were not associated with any particular mtDNA haplogroup. Therefore, these positions could also act as independent genetic factors.

Discussion
In western countries, where the burden of CVD is growing due to the effect of CV risk factors, several studies have already shown a strong contribution of genetic factors. However, little is known about the role of mtDNA CR mutations in the development of stroke and MI [7][8][9]11,[13][14][15]17 .
An association of several mtDNA alterations (fixed and in heteroplasmy) with the two diseases has been detected in the present study. As regards fixed mtDNA mutations, the set of mutations in stroke and MI cases was compared with that in controls, and significant differences were found in the two diseases: m.16145 G > A and m.16311 T > C emerged as potential genetic risk factors for stroke, and m.72 T > C, m.73 A > G and m.16356 T > C as possible beneficial genetic factors for MI. It has been previously described that CR mutations can be associated with multiple diseases, and that the same variant could have opposite effects (increasing or decreasing the risk) in two different diseases 9 . This finding would support our original hypothesis that the consequences of mtDNA mutations in the CR depend on the disease.
CR variants do not act directly on the electron transport chain (ETC) to alter mitochondrial bioenergetics or reactive oxygen species (ROS) generation; rather, they may affect mtDNA transcription 18,19 , because the CR contains the main regulatory sequences for replication initiation and transcription 3 .
Transitions 16145 G > A and 16311 T > C seem to have a pathogenic role in stroke. The analysis of the distribution of these mutations showed that they are located in many different haplogroups and, consequently, that they act as haplogroup-independent risk factors. The m.16145 G > A mutation is located between the MT-TAS sequence (nt. 16157-16172) and the MT-TAS2 sequence (nt. 16081-16138). In line with the classic strand-asynchronous mechanism, recent studies demonstrated that the 5' end of the D-loop is capable of forming secondary structures 26 , which act as a recognition site for molecules involved in the premature arrest of H strand elongation 27 . The biological importance of this region was confirmed by Brandon et al. 28 , who also observed multiple tumor-specific mutations in the pre-TAS region. These observations suggest that mutations arising near this conserved motif might be responsible for alterations in mtDNA replication and transcription. Along the same lines, m.16311 T > C has been found to be significantly associated with certain types of cancer [29][30][31] . This mutation was previously described by Chen et al. 29 in patients with prostate cancer, and it has also been reported in colorectal cancer 30 and, more recently, in acute myeloid leukemia 31 . This mutation is located between the control elements Mt5 sequence (nt. 16194-16208) and Mt3l sequence (nt. 16499-16506). In this case, our results showed that m.16311 T > C may reduce the stability of the secondary structure of this region, which would affect the degree of binding of mtDNA transcription factors and ultimately the intensity of transcription regulation 32 . In both cases, these findings suggest that mtDNA CR dysfunction may cause a decrease in mtDNA copy number, which could affect the efficiency of the ETC, lowering the ATP:ADP ratio and increasing ROS production 18,19 , and thereby contributing to stroke development.
Concerning MI, our results showed that m.72 T > C, m.73 A > G and m.16356 T > C act as beneficial factors for MI. Although a high percentage of individuals with these mutations belonged to haplogroups HV0, H or U, which have been reported to possibly have higher oxidative damage 9,24,33,34 , the distribution of mutations in positions 72, 73 and 16356 in our samples was independent of these haplogroups. Since the role of the mitochondrial genome in CVD susceptibility remains uncertain, it is difficult to explain how these mutations can decrease or counteract the progression of MI. Although m.72 T > C, m.73 A > G and m.16356 T > C have been previously related to certain types of cancer 35 , many studies consider them to be recurrent variants that are common in humans 36 .
Even though the most deleterious mutations are removed by natural selection, a wide range of milder bioenergetic alterations are introduced in certain populations 37 . Some of these variants, such as m.72 T > C, m.73 A > G and m.16356 T > C, could be advantageous and seen as a way to facilitate survival in specific environments. In contrast, other mutations, such as m.16145 G > A and m.16311 T > C, escape intraovarian selection and could cause significant mitochondrial defects and stroke development. Much of the progress in linkage disequilibrium mapping of complex diseases has been made using the major assumption of the common disease-common variant (CDCV) hypothesis, that is, that common alleles cause common diseases. After positive associations with common alleles were found (e.g., those found by Umbria et al. 24 ), it was necessary to replicate the results and then look for rarer variants, with potentially greater penetrance. However, all the mutations analysed in the present study had a minor allele frequency >5% (common variant) or between 1-5% (low-frequency variant). Although common variants are often associated with ORs between 1.2 and 1.5 38 , our results showed larger effect sizes for pathogenic variants, with OR values of 2.4-4.4, and similarly larger protective effect sizes for variants found at higher frequency in controls relative to cases (ORs < 0.5).
Our findings also showed a significant increase of point heteroplasmy in MI controls in comparison with cases. This result is contrary to expectations, because the presence of heteroplasmy has been commonly associated with aging and degenerative diseases, due to a decline in mitochondrial function in both of these processes 39 . Our registered heteroplasmic positions (16399, 16129, 16093, 16092, 73, 146, 150, 152, and 204) were located in hotspot positions of the hypervariable segments. Recent evidence indicates that an important fraction of the mutations detected in heteroplasmy are germinal or originated in very early stages of development 40 . Moreover, it is probable that germinal heteroplasmy has a beneficial or risk effect, and our results revealed that most point heteroplasmies were overrepresented in MI control individuals. This is not surprising, because some of the heteroplasmic positions detected, such as m.73 G > A, have been linked to possible beneficial genetic factors for MI, and it has also been suggested that other heteroplasmic positions, such as 146 T > C, 150 C > T or 152 T > C, may increase longevity 41 .
Many studies have shown that heteroplasmic variants without apparent genetic or cellular functional consequences are observed in apparently healthy individuals [42][43][44][45] . In the present study, the frequency of MI controls with point heteroplasmy in the CR (5.2%; 95% CI [0.045-0.972]) is slightly higher than that reported by Santos et al. 44 (3.81%; 95% CI [0.166-0.737]), but lower than that described by Ramos et al. 45 (7.9%; 95% CI [0.041-0.149]), indicating that heteroplasmy occurs with appreciable frequency in the general population 42 . This idea is further reinforced by the present data, since most of the point heteroplasmies detected were located in non-stable positions and no significant differences were found between the frequencies of point heteroplasmy in stroke cases and controls.
It should be noted that this study has some limitations, and further functional analyses may be required in large independent samples to clarify the contribution of mitochondrial control region variants to stroke and MI development. At the moment there are no experimental models that prove that alterations in these variants really have consequences. In fact, there are mtDNA regulatory sequences that have been assigned to the control region although they remain practically unproved. In addition, it must also be taken into account that mitochondrial genetic variation measured in blood may differ from that in cardiovascular tissues. Another important point is that, although mtDNA is population specific, the positions reported in this work are distributed worldwide (m.16145 G > A, m.16311 T > C and m.16356 T > C) or, at least, in several Eurasian haplogroups such as HV and H (m.72 T > C and m.73 A > G) (http://www.phylotree.org/). For this reason, it would be interesting to explore whether our results can be extrapolated to other populations, since in a different genetic context these positions might not have the same effects.

Conclusions
In conclusion, the statistically significant differences in the frequencies of variants in the mitochondrial control region sequence between stroke and MI cases and controls at five positions provide new evidence and a better understanding of the cellular mechanism by which mtDNA variants contribute to CVD, and endorse the importance of including this regulatory region of the mtDNA in genetic association studies.

Material and methods
46 in a cross-sectional, observational and descriptive study performed in Castile and Leon (centre-north region of Spain), whose design and analysis have already been described by Umbria et al. 24 .
For each individual, we also obtained information about history of hypertension (≥140/90 mmHg), history of diabetes, history of hypercholesterolemia (>200 mg/dl), cigarette consumption (smokers, former smokers and non-smokers), presence of overweight or obesity (body mass index ≥ 25 kg/m²), presence of an abdominal perimeter in the risk range (≥80 cm for women and ≥94 cm for men) and presence of high levels of triglycerides (≥170 mg/dl). The study was approved by the ethics committee of the Universitat Autònoma de Barcelona, and appropriate informed consent was obtained from all the individuals.
MtDNA sequence analysis and heteroplasmy authentication. The mtDNA sequences used in the present study were previously obtained by Umbria et al. 24 , although there they were used only to classify samples into mtDNA haplogroups. In brief, the control region of the mtDNA was amplified and sequenced between positions 15907 and 580 using the primers and conditions described by Santos et al. 44 . Moreover, the phylogenetically informative coding region polymorphisms 7028, 11719, 12308, 12705, 13708 and 14766 were further analyzed as previously detailed by Santos et al. 47 , to improve haplogroup classification.
In the present study, sequences were reassessed and analysed at the nucleotide level to identify not only fixed mutations but also mutations in heteroplasmy. The alignment to the revised Cambridge Reference Sequence (rCRS) 48 and the heteroplasmy detection were performed using the SeqScape 2.5 software (Applied Biosystems, Foster City, USA), with a value of 5% in the Mixed Base Identification option. Only sequences with satisfactory peak intensity and without background noise were considered. In this context, some samples were amplified and sequenced several times (using the same methodology described in Umbria et al. 24 ) to obtain accurate sequences for heteroplasmy detection. Moreover, additional analyses were performed in order to authenticate heteroplasmies.
The authentication of mtDNA heteroplasmy was performed following a strategy similar to that used by Santos et al. 44,49 :
1. PCR amplification and sequencing of the control region of the mtDNA.
2. To authenticate the results for samples presenting heteroplasmy in step 1, a second PCR amplification and sequencing were performed.
3. In addition, to exclude a possible contamination of the samples, an analysis of Short Tandem Repeat (STR) DNA profiling was carried out employing the AmpFlSTR® Identifiler® PCR Amplification Kit (Applied Biosystems, Foster City, USA) following the manufacturer's protocol.
Thus, point heteroplasmic positions were accepted if they appeared in all the validation steps and no evidence of sample contamination was detected.
Levels of heteroplasmy were determined using the height of peaks in the electropherograms 44 . To calculate the average heteroplasmic levels, the results obtained for at least two sequence reads of each heteroplasmic position were used.
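The averaging step can be illustrated with a minimal sketch. It assumes that the level of the minor variant in a read is the minor peak height divided by the sum of the two peak heights, which is one common convention and is not stated explicitly in the text; the function names and the peak heights are hypothetical.

```python
from statistics import mean


def minor_fraction(major_height: float, minor_height: float) -> float:
    """Fraction of the minor variant in one read (assumed formula)."""
    total = major_height + minor_height
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return minor_height / total


def average_heteroplasmy(reads: list[tuple[float, float]]) -> float:
    """Average the minor-variant fraction over at least two sequence reads."""
    if len(reads) < 2:
        raise ValueError("at least two sequence reads are required")
    return mean(minor_fraction(major, minor) for major, minor in reads)


# Hypothetical (major, minor) peak heights for one position in two reads.
reads = [(800.0, 200.0), (700.0, 300.0)]
level = average_heteroplasmy(reads)
print(f"average minor-variant level: {level:.2%}")
```

Requiring two reads in the function mirrors the rule stated above that at least two sequence reads of each heteroplasmic position were used.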

Data analysis. Statistical analyses.
To compare differences in the CR profile between cases and controls in both stroke and MI, all fixed and heteroplasmic mtDNA mutations were compiled into a matrix considering cases and controls analysed for each disease.
All fixed mtDNA mutations detected in cases and controls (present in a minimum of 10 individuals) were tested together using a conditional logistic regression analysis (forward stepwise model), adjusting the association analysis for the potential confounding effect of the CV risk factors previously detected by Umbria et al. 24 . Those authors used McNemar's test or the marginal homogeneity test to compare the frequency of the sociodemographic, biochemical and clinical characteristics mentioned above between stroke and MI cases and controls (Table 5). Hypercholesterolemia was considered a CV risk factor with a potential confounding effect for stroke, while both hypertension and hypercholesterolemia were considered for MI samples. Therefore, Odds Ratios (ORs) and their 95% Confidence Intervals (CIs) were calculated adjusting for the effect of these risk factors in each disease. To compare the presence or absence of point and length heteroplasmy, a logistic regression analysis was used to correct for the effect of the CV risk factors mentioned above 24 .
Finally, mtDNA mutations were reviewed to infer whether they were located in haplogroup-defining positions (previously examined in Umbria et al. 24 ) or act as independent genetic factors.
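To make the notion of an OR and its 95% CI concrete, the sketch below computes a crude (unadjusted) odds ratio with a Wald interval from a 2x2 table. This is a simplification and not the adjusted logistic model fitted in SPSS: it ignores matching and the CV risk factor covariates. The function name is hypothetical, and the counts are the point heteroplasmy counts reported for MI (2 of 211 cases, 11 of 211 controls).

```python
import math


def odds_ratio_wald(exposed_cases: int, unexposed_cases: int,
                    exposed_controls: int, unexposed_controls: int,
                    z: float = 1.96) -> tuple[float, float, float]:
    """Crude odds ratio with a Wald 95% CI computed on the log scale."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Point heteroplasmy in MI: 2/211 cases versus 11/211 controls.
or_crude, ci_low, ci_high = odds_ratio_wald(2, 209, 11, 200)
print(f"crude OR = {or_crude:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

The crude estimate is expected to differ from the adjusted OR of 0.209 reported in the Results, because the published model also corrects for hypertension and hypercholesterolemia.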
Statistical analyses were performed using IBM SPSS ver. 22.0 (SPSS Inc.). All differences were considered significant at p < 0.05.
Hits in the phylogeny, population database and Conservation Index (CI). The stability of fixed mtDNA mutations and point heteroplasmic positions was analysed as previously detailed by Ramos et al. 45 . The number of hits in the phylogeny for each position was compiled from the updated mtDNA phylogeny (mtDNA tree Build 17 25 ) and from Soares et al. 50 . From these data, the probability of mutation was calculated as the ratio between the number of hits observed at a position and the total number of hits. An mtDNA position was considered a hotspot if its mutation probability was ten times higher than the expected mean value. To calculate the frequency of each variant at a particular nucleotide position, a database of 3880 complete mtDNA sequences was used. Sequences were aligned using Clustal W and formatted for further frequency analyses using the SPSS software. The nucleotide conservation index (NCI) was estimated only across reference sequences of different primate species (for the list of species and accession numbers see Supplementary Table S2). Sequences were analyzed using the same method previously mentioned 45 .
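The stability measures can be sketched as follows. Two assumptions are made that the text does not spell out: the expected mean mutation probability is taken as 1 divided by the number of sites considered (a uniform expectation), and the conservation index is taken as the percentage of aligned species sharing the reference nucleotide. All names and numbers are hypothetical toy values, not data from the study.

```python
def mutation_probabilities(hits_by_position: dict[int, int]) -> dict[int, float]:
    """Probability of mutation per position: hits at the position / total hits."""
    total_hits = sum(hits_by_position.values())
    if total_hits <= 0:
        raise ValueError("total number of hits must be positive")
    return {pos: hits / total_hits for pos, hits in hits_by_position.items()}


def hotspot_positions(hits_by_position: dict[int, int], n_sites: int,
                      fold: float = 10.0) -> set[int]:
    """Positions whose probability exceeds `fold` times the expected mean.

    Assumption: the expected mean probability is uniform, 1 / n_sites.
    """
    expected_mean = 1.0 / n_sites
    probabilities = mutation_probabilities(hits_by_position)
    return {pos for pos, p in probabilities.items() if p > fold * expected_mean}


def conservation_index(reference_base: str, aligned_bases: list[str]) -> float:
    """Percentage of aligned species carrying the reference nucleotide (assumed)."""
    if not aligned_bases:
        raise ValueError("at least one aligned sequence is required")
    matches = sum(1 for base in aligned_bases if base.upper() == reference_base.upper())
    return 100.0 * matches / len(aligned_bases)


# Toy example: three positions with hits among 100 sites in total.
toy_hits = {1: 50, 2: 3, 3: 2}
toy_hotspots = hotspot_positions(toy_hits, n_sites=100)
toy_ci = conservation_index("T", ["T", "T", "C", "T"])
print(toy_hotspots, toy_ci)
```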
Structure prediction. Secondary structures were predicted to understand the structural impact of the different variants found. Secondary structures for each position were generated from the sequences (A-M) identified by Pereira et al. 26 . All sequences were submitted to the RNAfold web server (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) using the default parameters for DNA secondary structure calculations. The minimum free energy prediction and base pair probabilities were used to estimate the structural impact on the molecule.
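Comparing the rCRS and mutant structures requires building the mutant sequence before submission. The helper below is a minimal, hypothetical sketch of that step: it substitutes one base in a fragment using rCRS coordinates and checks the reference base first. The fragment shown is a toy sequence, not real mtDNA, and the folding itself is left to the RNAfold server.

```python
MTDNA_LENGTH = 16569  # length of the rCRS, whose numbering is circular


def apply_variant(fragment: str, fragment_start: int, position: int,
                  ref: str, alt: str) -> str:
    """Return the fragment with `ref` replaced by `alt` at an rCRS position.

    `fragment_start` is the rCRS coordinate of the first base of the fragment.
    The modulo handles fragments that run past position 16569 back to position 1.
    """
    offset = (position - fragment_start) % MTDNA_LENGTH
    if offset >= len(fragment):
        raise ValueError("position lies outside the fragment")
    if fragment[offset].upper() != ref.upper():
        raise ValueError("reference base does not match the fragment")
    return fragment[:offset] + alt + fragment[offset + 1:]


# Toy fragment (not real mtDNA) assumed to start at rCRS position 16305.
toy_fragment = "ACGACGTA"
toy_mutant = apply_variant(toy_fragment, 16305, 16311, ref="T", alt="C")
print(toy_fragment, "->", toy_mutant)
```

Checking the reference base before substitution guards against off-by-one errors in the fragment coordinates, which would otherwise silently produce the wrong mutant sequence.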

Data availability
The data that support the findings of this study are available within the paper and its Supplementary Material file.