MtDNA population variation in Myalgic encephalomyelitis/Chronic fatigue syndrome in two populations: a study of mildly deleterious variants

Myalgic Encephalomyelitis (ME), also known as Chronic Fatigue Syndrome (CFS), is a debilitating condition. There is growing interest in a possible etiologic or pathogenic role of mitochondrial dysfunction and mitochondrial DNA (mtDNA) variation in ME/CFS. Supporting such a link, fatigue is common and often severe in patients with mitochondrial disease. We investigated the role of mtDNA variation in ME/CFS. No proven pathogenic mtDNA mutations were found. We then investigated population variation. Two cohorts were analysed, one from the UK (n = 89 moderately affected; 29 severely affected) and the other from South Africa (n = 143 moderately affected). For both cohorts, ME/CFS patients had an excess of individuals without a mildly deleterious population variant. The differences in population variation might reflect a mechanism important to the pathophysiology of ME/CFS.

www.nature.com/scientificreports

Clinically proven pathogenic mtDNA mutations are recognized as a cause of maternally-inherited disorders, with a minimum prevalence rate of 1 mutation in 5,000 (20 per 100,000) people 7 . Such mtDNA mutations frequently cause multisystem disorders, with fatigue being prevalent in these patient groups 8 . Hundreds or even thousands of copies of mtDNA are present in a cell, with the number linked to the energetic demands of the cell. These copies can either be identical, a state called homoplasmy, or two or more species of mtDNA can be present in a state referred to as heteroplasmy. Thus, a possible approach is to investigate whether mtDNA mutations are present either at heteroplasmy levels sufficient to cause mtDNA disease or at sub-clinical levels that are too low to cause primary mtDNA disease, but are perhaps at sufficient levels to act as a risk factor or affect the course of a complex disease such as ME/CFS.
Beyond such recognised pathogenic mtDNA mutations, many studies have suggested a role for common mtDNA variants in complex diseases, with mtDNA variants either modulating susceptibility to a disease and/or affecting the course of the disease, including those where fatigue is an important feature of disease 9,10 . While many studies have reported a significant association of specific mtDNA haplogroups with a number of complex disorders, there is often substantial disagreement among different studies examining the same phenotype.
Another possibility is that rare mtDNA population variants might have a role in the disease process, since rare variants are predicted to be more mildly deleterious as such variants are removed by purifying selection over generations 11 . This is supported by recent works 12,13 , using a computational tool, MutPred, that has been widely used in the mitochondrial context 14 . Given this it might be expected to see a greater number of rare/mildly deleterious variants in any given patient group 15 , which may implicate a role in the disease process.
In addition to acting as a susceptibility factor to disease, it has also been suggested that mtDNA variants might modify the course of common complex diseases 16 . In line with this, a recent study of 193 ME/CFS patients and 196 age- and gender-matched controls reported that haplogroups J, U and H, as well as eight mtDNA SNPs, were significantly associated with particular ME/CFS symptoms in patients (course of disease), but not with increased susceptibility to ME/CFS (onset of disease) 10 .
In the current study, using mtDNA sequence data from ME/CFS patients from both the United Kingdom (UK) and South Africa (RSA), we ask whether mtDNA population variants alter susceptibility to ME/CFS. Due to the maternal inheritance pattern of mtDNA, linking mtDNA variation to common complex traits has been recognized as a difficult task, with models used in nuclear association genetics proving unsuitable 17,18 . There have been a number of improvements in the design of haplogroup association studies over the last 10 years, such as the importance of a replicate cohort being recognized 15,19 . However, there are still many doubts as to whether this simple methodology is the correct approach to assess the role of mtDNA variation in complex disorders. Indeed, a number of prior studies have applied the haplogroup association approach in complex diseases where fatigue is an important clinical feature, such as multiple sclerosis 9,20 , but the results were inconclusive. Here, we applied an improved approach which focuses on variants predicted computationally to be mildly deleterious, most of which are rare. This approach takes advantage of the advances in bioinformatics 21,22 using the mtDNA-server 23 and MutPred tools 14,23 .

Results
Heteroplasmy analysis. To investigate the possibility that ME/CFS patients harbour clinically proven mtDNA mutations, either above the threshold required for mtDNA disease or at a sub-threshold level, the complete mitochondrial genomes of the UK patients (n = 89 moderately affected; 29 severely affected) and the South African ME/CFS patients (n = 143) were analysed. Only two mutations associated with clinically proven mitochondrial disease were seen, m.3337G>A 24 in a patient and m.11778G>A 25 in a control participant, both in the South African cohort. However, these mutations were heteroplasmic at the 5% and 16.5% level respectively, and as a result these mutations are unlikely to have a phenotypic role. No other clinically proven mtDNA mutations were detected in any of the patient or control groups. It should be noted that such frequencies of low-level pathogenic mtDNA mutations are entirely consistent with large population studies considering this question 26 , as well as prior studies on this phenotype 10,27 .

Haplogroup distributions. Human mtDNAs can be assigned to one of several haplogroups. This traditional classification system was originally based on the presence or absence of one or a small number of likely benign polymorphisms 28 , rather than mutations with functional consequences. However, the advent of large-scale sequencing has led to the identification of a vast number of haplogroup-specific polymorphisms, allowing haplogroup classifications to be expanded into more and more subgroups 29 . To put our analysis into the context of prior studies, a simple haplogroup distribution analysis was performed.
To ensure a robust comparison between the groups from the UK and the RSA, only the frequencies of the nine haplogroups of European origin (HVUKTJIXW) from the RSA ME/CFS cohort were used, as the group from the UK (North East England) was of lower diversity. The haplogroup distributions for the UK and RSA cohorts (ME/CFS and control) are given in Table 1. Comparing the haplogroup distributions between the UK ME/CFS patients and controls with a Monte Carlo-based approach, no significant difference was observed (ns, p = 0.31). Similarly, there was no significant difference in haplogroup distribution between RSA patients and controls (ns, p = 0.55). Thus, as in the work of Billing-Ross et al., mtDNA haplogroup was not seen to affect the susceptibility to ME/CFS 10 .
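The Monte Carlo-based comparison of haplogroup distributions can be illustrated with a label-permutation test on the Pearson chi-square statistic. This is a minimal sketch under stated assumptions, not the routine of the statistics package used in the study: the function names, the number of permutations, the seed and the example haplogroup labels are all hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(cases, controls):
    """Pearson chi-square statistic for a 2 x k table of haplogroup counts."""
    case_counts, control_counts = Counter(cases), Counter(controls)
    n_cases, n_controls = len(cases), len(controls)
    total = n_cases + n_controls
    stat = 0.0
    for hap in set(case_counts) | set(control_counts):
        col_total = case_counts[hap] + control_counts[hap]
        for observed, row_total in ((case_counts[hap], n_cases),
                                    (control_counts[hap], n_controls)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p(cases, controls, n_perm=999, seed=0):
    """Estimate a p-value by shuffling case/control labels (margins stay fixed)."""
    rng = random.Random(seed)
    observed = chi_square_stat(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = chi_square_stat(pooled[:n_cases], pooled[n_cases:])
        if permuted >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical haplogroup assignments, for illustration only.
example_cases = ["H"] * 12 + ["U"] * 4 + ["J"] * 3 + ["T"] * 1
example_controls = ["H"] * 10 + ["U"] * 5 + ["J"] * 2 + ["T"] * 3
print(monte_carlo_p(example_cases, example_controls))
```

Shuffling the pooled labels keeps both the group sizes and the haplogroup totals fixed, which avoids relying on the large-sample chi-square approximation when some haplogroups are sparsely populated.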
Network analysis. Functional networks were produced for the UK cohort (Fig. 1a) and the RSA cohort (Fig. 1a). These networks represent all possible least complex phylogenetic trees, based on the variants included as described in Methods. Sequences that are very similar cluster together in smaller or larger nodes. Nodes are ordered along a phylogenetic tree with links indicating the variants by which a connected node deviated from another 30 . In both cohorts, controls were more abundant in nodes that were separated by several variants from the central, haplogroup H-dominated nodes. These peripheral nodes frequently contained mtDNAs that could be assigned to haplogroups T, K and U, although as noted in the previous section, no significant haplogroup associations were found.

[Table 1, fragment; column headings not recovered. Values are n (%). Row U: 5 (17), 13 (15), 11 (17), 32 (17), 20 (20). The preceding row ends: (7), 14 (14).]
Individuals with a potentially mildly deleterious mtDNA variant. Haplogroup association studies are confounded by a number of factors, including population stratification, with this and other factors resulting in elevated type 1 error 18,31 . These limitations, coupled with the relatively low statistical power of haplogroup association studies 32 , underscore the fact that new methods are required for discerning the pathogenic and/or etiological role of mtDNA mutations in a common complex disorder 21,22 . We determined the number of individuals with mtDNA sequences containing a variant with a MutPred score over the 0.5 threshold, such variants being considered "actionable hypothesis" or candidates for having a functional effect 14 , here referred to as "mildly deleterious".
The numbers of controls and patients in each population harbouring none, one, or more than one mildly deleterious variant are summarised in Table 2. Figure 2 illustrates these numbers as percentages of each group.
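The tally of individuals by number of mildly deleterious variants can be sketched as follows. The two cut-offs come from the text (MutPred score above 0.5; heteroplasmy level above 90%, with homoplasmic variants represented here as 1.0). The data structure, function names and example values are hypothetical and serve only as an illustration.

```python
from collections import Counter

MUTPRED_THRESHOLD = 0.5   # "actionable hypothesis" cut-off quoted in the text
HETEROPLASMY_MIN = 0.90   # only variants above this level are treated as inherited


def count_mildly_deleterious(variants):
    """Count qualifying variants for one individual.

    variants: list of (mutpred_score, heteroplasmy_level) tuples.
    """
    return sum(1 for score, level in variants
               if level > HETEROPLASMY_MIN and score > MUTPRED_THRESHOLD)


def summarise(group):
    """Tally individuals carrying none, one, or more than one qualifying variant."""
    tally = Counter({"none": 0, "one": 0, "more": 0})
    for variants in group.values():
        n = count_mildly_deleterious(variants)
        tally["none" if n == 0 else "one" if n == 1 else "more"] += 1
    return dict(tally)


# Hypothetical individuals, for illustration only.
example_group = {
    "P1": [(0.72, 1.0), (0.31, 1.0)],    # one variant above the MutPred cut-off
    "P2": [(0.64, 0.05)],                # high score but low-level heteroplasmy
    "P3": [(0.55, 1.0), (0.81, 0.97)],   # two qualifying variants
    "P4": [],                            # no variants
}
print(summarise(example_group))
```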
For the cohorts collected in the North East of England, 49 (55%) of the moderate ME/CFS patients were without a mildly deleterious variant. By comparison, only 17 (27%) of the controls from the United Kingdom had no such variants. Conversely, 45% of moderate ME/CFS patients had one or more mildly deleterious variants, compared to 73% of controls from the UK who harboured one or more such variants (see Fig. 2). These differences were found to be highly significant (p = 0.0008) with a Fisher's exact test. This observation suggests that individuals with ME/CFS are less likely to have an mtDNA variant that is predicted to have a functional impact.
As mentioned above, as many as 25% of those with ME/CFS are severely affected, being house or even bed bound. A cohort of such patients was recruited in the North East of England (UK). Of the 29 severe ME/CFS patients, 15 (52%) were without a mildly deleterious variant. Comparing these numbers with those of UK controls using a Fisher's exact test, a significant difference (p = 0.03) was found. Thus even the most severely affected patients had fewer mildly deleterious variants than the controls. It is important to note that when the moderately and severely affected patients are compared using a Fisher's exact test, no difference is observed (p = 0.83).
The analysis was repeated in the RSA cohort, including only those ME/CFS cases that fall within the nine haplogroups designated as European (HVUKTJIXW) to ensure that differences in lineage diversity could not impact upon the results. Of the 143 ME/CFS patients from the RSA, 80 (56%) were without a predicted mildly deleterious variant, whereas 40 (41%) of the healthy controls from the RSA had no such variants. That leaves 59% of controls from South Africa, compared to only 44% of ME/CFS patients, who harbour one or more mildly deleterious mtDNA variants. A Fisher's exact test again found these differences to be statistically significant (p = 0.03). Our analyses thus show that, for both UK and RSA cohorts, those with ME/CFS have fewer mildly deleterious variants than controls.
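The Fisher's exact comparisons reported in this section can be re-derived from the counts given in the text, using the control group sizes stated in Methods (n = 64 for the UK, n = 98 for the RSA). The sketch below is illustrative only. Each table is written as (patients without, patients with, controls without, controls with) a mildly deleterious variant, and the RSA row takes 80 of 143 patients as lacking such a variant, which is the reading consistent with the 44% of patients reported to carry one or more. The two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about how the statistics package computed its p-values, so small differences from the published figures are possible.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Counts taken from the text: (without, with) for each group.
comparisons = {
    "UK moderate vs UK controls": (49, 40, 17, 47),
    "UK severe vs UK controls": (15, 14, 17, 47),
    "UK moderate vs UK severe": (49, 40, 15, 14),
    "RSA patients vs RSA controls": (80, 63, 40, 58),
}
for label, table in comparisons.items():
    print(f"{label}: p = {fisher_exact_two_sided(*table):.4f}")
```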

Discussion
We have investigated a possible role of mtDNA variants in ME/CFS. This was done firstly by reviewing the sequence data of each individual for the presence of clinically proven pathogenic mtDNA mutations associated with primary mitochondrial disease. We ruled out the possibility that previously identified pathogenic mtDNA mutations are contributing to ME/CFS, a result that was not unexpected 10,27 . Secondly, we considered the possible role of population variants in ME/CFS, with a focus on variants that are predicted to be mildly deleterious by computational methods. These variants are frequently rare at the population level, resulting in an absence of population stratification and thus in a lower chance of false association or type 1 error. Such false associations are believed to be common in mtDNA association studies applying the haplogroup association model 18 . The current approach has been utilized in two prior studies 21,22 that considered Alzheimer's disease and cardiovascular disease respectively.
We compared the number of individuals in both the ME/CFS patient and control groups with variants classified as mildly deleterious, to the number of individuals without such variants. In both the UK and RSA cohorts, there was a significant difference between the patients and the controls, with the ME/CFS groups having a higher percentage of individuals without a variant predicted to be mildly deleterious. Although surprising, this observation was seen in two independent cohorts, suggesting that these differences are not due to chance. One of the reasons for this observation might have been that deleterious variants confer disease susceptibility in only severely affected ME/CFS patients or modulate fatigue severity among ME/CFS patients. However, when comparing patients that were severely affected with controls in the UK cohort, again ME/CFS patients had fewer variants predicted to be mildly deleterious. Furthermore, there was no difference in the number of patients with such variants between the moderate and severely affected groups. Taken together, these data suggest that our observation is not the result of a simple patient stratification effect, although additional replication cohorts of moderately and severely affected ME/CFS patients are needed to confirm this.

[Table 2, column heading only; table body not recovered: Number of individuals harbouring variants with MutPred scores >0.5 (% in brackets).]
It has been proposed that those with ME/CFS might not have a problem with ATP production 33 , but rather with ATP utilization. Therefore, we consider this genetic difference in mtDNA variation as an accurate observation which may prove to have a biological relationship to the function and regulation of the OXPHOS system and subsequent wide-ranging immediate and downstream consequences on energy metabolism 34,35 . It should be considered that the wider homeostasis and/or responsiveness of the various elements of energy metabolism are affected in ME/CFS, rather than merely single "segments" of energy pathways such as ATP production.
A number of mitochondrial abnormalities in ME/CFS have previously been reported, indicating that mitochondrial dysfunction may play a role in the pathogenesis of disease, at least in a sub-set of patients 36 . Previously, ME/CFS patients have been shown to have significantly lower mitochondrial function than healthy controls 5 . Detection of these mtDNA variants has the potential to be used as a tool for measuring mitochondrial dysfunction in ME/CFS. Another avenue of investigation in the mitochondrial field is copy number analysis. Mitochondrial DNA copy number (mtDNAcn) has been reported as an indirect representative of mitochondrial function and as a biomarker of disease in several studies. As an example, in Parkinson's disease (PD), mtDNAcn was elevated in the pedunculopontine nucleus (PPN), which is a brainstem region associated with progression of motor and non-motor symptoms of PD 37 . Other neurodegenerative conditions in which alterations in mtDNAcn have been reported include Multiple Sclerosis. The results in this phenotype are conflicting, with some showing evidence for reduced copy number 38 , while others indicate an increase 39 . Additionally, mtDNAcn was shown not to be associated with fatigue status in Primary Sjögren's Syndrome 30 . Taken together, these papers demonstrate that in heterogeneous diseases with a variable course, small single time point studies will produce data that are difficult to interpret and are likely to conflict between studies.

In conclusion, this is the first paper to demonstrate mitochondrial genetic differences between ME/CFS patients and controls. It also demonstrates the power of mtDNA analysis focused on variants likely to have a functional effect to detect differences between case and control cohorts where the traditional haplogroup association method frequently fails to do so.
Future studies need to include larger cohorts from multiple centres, within and between nations, with standardized sample handling. These studies need to take a multi-disciplinary approach linking genetics, including mtDNA copy number analysis, and bioenergetics. Given the changing nature of the disease, longitudinal studies would seem to be essential to further understanding by allowing us to determine how mtDNA variation and mitochondrial dysfunction relate to fluctuations in symptom severity.
Methods

Ethical approval and informed consent. This study was conducted in two independent cohorts, one from England and the other from South Africa. All experimental protocols were approved by the corresponding institutional committees, namely the CRN National Coordinating Centre (CRNCC) | NIHR Clinical Research Network (CRN) in England (UK ME/CFS -IRAS ID 221364) and the Health Research Ethics Committee (HREC) of the North-West University in South Africa (SABPA: NWU 00036-07-S6, CFS: NWU 00102-12). All study procedures were carried out in accordance with relevant guidelines and regulations of Newcastle University and North-West University. All participants gave informed consent and were over the age of 18.

Patient cohorts. We used two well-characterised cohorts of ME/CFS patients, one from the North East of England (n = 89 moderately affected; 29 severely affected) and the other from South Africa (n = 143 moderately affected). Both cohorts met the Fukuda diagnostic criteria 1 . Potentially confounding causes of fatigue, including depression, were excluded in all patients. The two control cohorts were regionally matched and had been collected for two prior studies. The controls from the North East of England (n = 64) were used previously for a variant load study on Alzheimer's disease 15 . The control cohort from South Africa (n = 98) comprised healthy high school teachers assembled previously for a study on hypertension and diabetes 22 .

Sequencing. Sequencing of the ME/CFS samples from both the UK and RSA was carried out at Source Bioscience using Fluidigm technology. The sequencing methodology for the controls has been described previously for the UK controls 15 and for the RSA controls 22 . The reference sequence used in all datasets was the revised Cambridge reference sequence (rCRS).

Network analysis.
After selective pruning of the total mtDNA variants to those in (a) protein-encoding genes with MutPred pathogenicity scores above 0.5, and (b) rRNA and tRNA variants, a functional maximum parsimony (MP) network analysis was performed with the NETWORK (version 5.0.0.3) software package (http://www.fluxus-engineering.com/sharenet.htm). Transversions, being chemically less likely to occur, were weighted three times more than transitions. Star contractions were performed using a maximum radius of 5. This step was followed by reduced median (RM) 40 processing (reduction threshold r was 1). The "Frequency >1" criterion was activated to exclude sequences which are unique to the dataset. Network figures were produced using NETWORK Publisher (version 2.1.1.
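The 3:1 weighting of transversions relative to transitions can be illustrated with a simple weighted difference count between two aligned sequences. This is only a sketch of the weighting idea, not the algorithm implemented in the NETWORK package. The function names are hypothetical, and the inputs are assumed to be equal-length strings containing only A, C, G and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_weight(ref, alt, transition_weight=1, transversion_weight=3):
    """Weight of a single-base difference: transitions 1, transversions 3."""
    if ref == alt:
        return 0
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return transition_weight if same_class else transversion_weight


def weighted_distance(seq_a, seq_b):
    """Sum of substitution weights over two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(substitution_weight(x, y) for x, y in zip(seq_a, seq_b))


# Hypothetical four-base sequences, for illustration only.
print(weighted_distance("ACGT", "GCGT"))  # one transition (A -> G)
print(weighted_distance("ACGT", "CCGT"))  # one transversion (A -> C)
```

Under such a weighting, two sequences separated by a single transversion are treated as further apart than two separated by a single transition, reflecting the lower likelihood of transversions noted above.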

Data analyses.
Sequencing data was processed using the online mtDNA-server (mtdna-server.uibk.ac.at) 23 tool. With this tool, homoplasmic and heteroplasmic variants were identified. The pathogenicity status of heteroplasmic variants was assessed using various clinical and online databanks, e.g. MitoMap, and the application of accepted scoring criteria 41,42 . For all other analyses described below, only homoplasmic variants and those with a heteroplasmy level above 90% were used, so that only inherited and not somatic variants were considered. Haplogroups were assigned using the online Haplogrep 2.0 tool (haplogrep.uibk.ac.at) 43 . The control data from the UK is Sanger sequence data, the processing of which is described in 44 .

Classification and selection of mtDNA variants, as compared by group. Analyses using variants predicted to be mildly deleterious are less prone to the effects of population stratification because they typically analyse rare variants that are not stratified between geographical locations. The variants are assessed using the MutPred program, which assigns a "pathogenicity" score between 0 and 1 (an amino acid change with a score of 0 is predicted to be perfectly benign). A score above 0.5 for an amino acid change is classified as an "actionable hypothesis" 14 . Variants with pathogenicity scores below the "actionable hypothesis" threshold (0.5) are considered less likely to be deleterious or to have an impact on protein function; instead, they are more likely to be common population variants. The inclusion of these more numerous but low-scoring, low-impact variants in the analyses could be problematic, especially because the resultant "noise" differs greatly among different population groups 45 . Therefore, as in some of our previous studies, we have not included variants scoring below 0.5 in the current analyses 22 .

Statistical analyses. All statistical analyses were performed using SPSS Statistics (Version 25), Prism (Version 23) or GraphPad (Version 7). Comparisons of haplogroup distribution between ME/CFS patient cohorts and their corresponding controls were performed using a Monte Carlo-based approach; this methodology is part of the standard package and is more accurate than the chi-square estimation. Fisher's exact tests were utilised to compare the number of patients and controls in each cohort that have mtDNA variants with MutPred scores above 0.5 (mildly deleterious) with those who do not.