A statistical algorithm showing coenzyme Q10 and citrate synthase as biomarkers for mitochondrial respiratory chain enzyme activities

Interpreting laboratory data to assess complex biological systems remains a major challenge, as in studies of mitochondrial function. The classical interpretation of biochemical data, which compares patients with reference values, may be insufficient; indeed, mitochondrial patients are still classified on the basis of probability criteria. We developed and applied a mathematical agglomerative algorithm to search for correlations among the biochemical variables of the mitochondrial respiratory chain and to identify populations displaying correlation coefficients >0.95. We demonstrated that coenzyme Q10 may be a better biomarker of mitochondrial respiratory chain enzyme activities than citrate synthase activity. Furthermore, this algorithm may be useful to re-classify mitochondrial patients or to explore associations among biochemical variables in other biological systems.

With this background, our aim was to develop and apply an exploratory statistical procedure to assess muscle CoQ content and CS activity as biomarkers of mitochondrial activity evaluated by the analysis of MRC enzyme activities. After the initial statistical assessment, subpopulations of individuals displaying a high linear correlation coefficient among the different biochemical variables were identified.

Material and Methods
Patients. Over the last 15 years, we have studied 448 muscle biopsies from patients with suspected mitochondrial disorders (age range 1 month to 16 years; mean 3.6 years). Both CoQ and CS results were available for 447 samples. Of these, 179 showed normal CoQ levels and normal activities of all MRC enzymes and of citrate synthase. Data were compared with those of a previously reported control population (N = 37; age range 2-16 years; mean 9.2 years) 11.
Ethical issues. The
Biochemical studies. Muscle biopsies were taken and prepared according to standard procedures.
NADH:cytochrome c oxidoreductase (complex I + III), succinate:cytochrome c reductase (complex II + III), succinate dehydrogenase (complex II), ubiquinol:cytochrome c oxidoreductase (complex III), cytochrome c oxidase (complex IV) and CS activities were determined by previously described spectrophotometric methods 13,14. Enzyme activities were expressed both as nmol/min·mg protein and as mU/U CS. Total muscle CoQ levels were determined by reverse-phase high-pressure liquid chromatography (HPLC; Waters, MA, USA) with electrochemical detection (Coulochem II, ESA, MA, USA) (Montero et al., 2008). CoQ values were expressed as nmol/g of total protein, measured by the Lowry method 15.
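The two reporting units for enzyme activities are related by a simple ratio. The sketch below assumes that both the MRC enzyme and CS activities are measured in nmol/min·mg protein and that "mU/U CS" means the ratio rescaled so that the CS denominator is expressed in units (1 U = 1 µmol/min, so 1 nmol/min = 1 mU). The helper `normalize_to_cs` is hypothetical and is not the laboratory's software.

```python
def normalize_to_cs(activity: float, cs_activity: float) -> float:
    """Return an MRC activity in mU per U of citrate synthase (CS).

    Both arguments are assumed to be in nmol/min·mg protein. Since
    1 nmol/min = 1 mU, the plain ratio is in mU/mU; multiplying by 1000
    converts the CS denominator from mU to U.
    """
    if cs_activity <= 0:
        raise ValueError("CS activity must be positive")
    return 1000.0 * activity / cs_activity


# Hypothetical values (not study data): complex IV 100, CS 200 nmol/min·mg protein.
print(normalize_to_cs(100.0, 200.0))  # 500.0
```

Because the protein term cancels in the ratio, CS-normalized values are insensitive to errors in the protein assay, which is one common reason for reporting both units.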
Statistical methods. Pearson linear correlation coefficients were first computed between MRC enzyme activities, CS activity and CoQ content in muscle homogenates from patients. Statistical significance was set at p < 0.01. Calculations were performed with R (version 3.2.3); see, for example, Ugarte et al. 16.
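For reference, the Pearson coefficient and the usual test of H0: ρ = 0 can be sketched as below. The p-value is obtained by comparing the t statistic with Student's t distribution on n − 2 degrees of freedom (this is the test reported by R's `cor.test` for the Pearson method); the sketch stops at the statistic to remain dependency-free. Function names and data are illustrative assumptions.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient between paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be paired samples with at least 3 observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0; compare with Student's t on n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired values (not study data).
coq = [1, 2, 3, 4, 5]
cs = [2, 4, 5, 4, 5]
r = pearson_r(coq, cs)
print(round(r, 4), round(t_statistic(r, len(coq)), 4))  # 0.7746 2.1213
```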
Two algorithms were developed to further explore the correlations detected among the biochemical variables, specifically to identify subpopulations of individuals in which a high correlation (r > 0.95) was reached. 1) Agglomerative procedure: initial linear axes were found using the robust algorithm developed by García-Escudero et al. 17, which was specifically designed to detect linear clusters. An iterative procedure then added individuals to the initial axes while a fixed high correlation was maintained.
The procedure was constructed as follows. Let $X$ and $Y$ be the variables of interest (which initially showed a linear association), and let $M^{(0)}$ be the set consisting of the three points nearest to the simple regression line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (estimated by ordinary least squares) fitted over the individuals selected by the linear clustering method of García-Escudero et al. 17; i.e., the three individuals with the smallest absolute residuals $|\hat{\varepsilon}_i| = |Y_i - \hat{Y}_i|$ are selected. Next, a "correlation loss measure" between the set $M^{(0)}$ and any other individual $i \notin M^{(0)}$ is defined as
$$CL(M^{(0)}, i) = r_{XY}(M^{(0)}) - r_{XY}(M^{(0)} \cup \{i\}),$$
where $r_{XY}(M)$ is Pearson's correlation coefficient between the variables $X$ and $Y$ within the set $M$. The iterative steps are:
1. Find $i^* = \arg\min_{i \notin M^{(0)}} CL(M^{(0)}, i)$, i.e., $i^*$ is the individual with the smallest correlation loss, and set $M^{(1)} = M^{(0)} \cup \{i^*\}$.
2. Step 1 is repeated, with the current set $M^{(k)}$ in place of $M^{(0)}$, while $r_{XY}(M^{(k)}) \geq r^*$, where $r^*$ is the desired correlation to be reached.
2) Divisive procedure: unlike the agglomerative procedure, this method needs no initial axis, because all individuals are taken as the starting set. An iterative procedure then deletes one case (individual) at each step until a fixed high correlation is achieved.
More explicitly, the procedure can be described as follows. Let $M^{(0)}$ be the set of all individuals in our target population. We define the "correlation gain measure" between the set $M^{(0)}$ and an individual $i \in M^{(0)}$ as
$$CG(M^{(0)}, i) = r_{XY}(M^{(0)} \setminus \{i\}) - r_{XY}(M^{(0)}),$$
where $r_{XY}(M)$ is Pearson's correlation coefficient between the variables $X$ and $Y$ within the set $M$.
1. Find $i^* = \arg\max_{i \in M^{(0)}} CG(M^{(0)}, i)$, that is, the individual with the largest correlation gain, and set $M^{(1)} = M^{(0)} \setminus \{i^*\}$.
2. Step 1 is repeated $k$ times until $r_{XY}(M^{(k)}) \geq r^*$, where $r^*$ is the desired correlation to be reached.
Both algorithms were implemented in R (version 3.2.3). After the application of the two algorithms, subpopulations displaying high correlations were identified.
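The two greedy procedures can be sketched in Python as follows. This is a minimal, hypothetical re-implementation for illustration, not the authors' R code: the robust linear clustering step of García-Escudero et al. is not reproduced, so the `candidates` passed to `ols_seed` merely stand in for the individuals selected by that method, and all function names, the `min_size` guard and the toy data are assumptions. Because $r_{XY}(M)$ is fixed within a step, minimizing the correlation loss is equivalent to maximizing $r_{XY}(M \cup \{i\})$, and maximizing the correlation gain is equivalent to maximizing $r_{XY}(M \setminus \{i\})$; the code uses these equivalent forms.

```python
import math
from typing import Iterable, Sequence, Set


def corr(x: Sequence[float], y: Sequence[float], idx: Iterable[int]) -> float:
    """Pearson's r between X and Y within the subset of individuals idx."""
    pairs = [(x[i], y[i]) for i in idx]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined correlation; treated as no association
    return sxy / math.sqrt(sxx * syy)


def ols_seed(x, y, candidates: Iterable[int], k: int = 3) -> Set[int]:
    """M(0): the k candidates with the smallest absolute OLS residuals."""
    cand = list(candidates)
    if len(cand) < k:
        raise ValueError("need at least k candidates")
    mx = sum(x[i] for i in cand) / len(cand)
    my = sum(y[i] for i in cand) / len(cand)
    sxx = sum((x[i] - mx) ** 2 for i in cand)
    if sxx == 0:
        raise ValueError("X is constant among the candidates")
    b1 = sum((x[i] - mx) * (y[i] - my) for i in cand) / sxx
    b0 = my - b1 * mx
    ranked = sorted(cand, key=lambda i: abs(y[i] - (b0 + b1 * x[i])))
    return set(ranked[:k])


def agglomerative(x, y, seed: Set[int], r_star: float = 0.95) -> Set[int]:
    """Add the individual with the smallest correlation loss while r >= r_star."""
    current = set(seed)
    outside = set(range(len(x))) - current
    while outside:
        # Smallest loss r(M) - r(M + {i}) <=> largest r(M + {i}).
        best = max(outside, key=lambda i: corr(x, y, current | {i}))
        if corr(x, y, current | {best}) < r_star:
            break  # any further addition would drop r below r_star
        current.add(best)
        outside.remove(best)
    return current


def divisive(x, y, r_star: float = 0.95, min_size: int = 3) -> Set[int]:
    """Remove the individual with the largest correlation gain until r >= r_star."""
    # min_size is an added safeguard, not part of the published description.
    current = set(range(len(x)))
    while len(current) > min_size and corr(x, y, current) < r_star:
        # Largest gain r(M - {i}) - r(M) <=> largest r(M - {i}).
        worst = max(current, key=lambda i: corr(x, y, current - {i}))
        current.remove(worst)
    return current


# Toy data (hypothetical): nine collinear points plus one outlier (index 9).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
seed = ols_seed(x, y, range(len(x)))
print(sorted(agglomerative(x, y, seed)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(divisive(x, y)))             # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

On the toy data both procedures recover the nine collinear individuals and exclude the outlier. Each step recomputes $r$ for every remaining candidate, so the cost is quadratic in the sample size per step, which is acceptable for cohorts of a few hundred individuals.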

Results
Biochemical results from the muscle biopsies of the entire patient cohort, together with our reference values, are shown in Table 1. Primary data for CoQ content and for CS and CIII activities are given in Supplementary Table 1.
Pearson correlation coefficients and significance values for the primary data are shown in Tables 2 and 3 (data normalized to total muscle protein concentration, CoQ content or CS activity). Correlation coefficients between CoQ and each MRC enzyme activity were higher than those between CS and the same enzymes (Table 2). When CoQ concentrations and MRC enzyme activities were normalized to CS activity, the highest correlation coefficient was observed between CoQ and complex II + III (Table 3). We then used CoQ, in the whole population, to normalize CS and MRC activities; under this normalization, the MRC complexes most strongly correlated with CS activity were complexes II, III and IV (Table 3).
Agglomerative and divisive algorithms. From the whole population (n = 447), observations with missing data and potential outliers were removed (1 to 7 observations, depending on the variable). With the agglomerative procedure, three initial linear axes were used to find the corresponding clusters of individuals with a high correlation (r > 0.95) between CoQ and CS, and the cluster with the largest number of individuals was chosen. The divisive method identified essentially the same subpopulation (98% of individuals in common, n = 214) (Fig. 1). We then analysed MRC enzyme activities and their association with either CS or CoQ. In the agglomerative method, three initial linear axes were considered, except for the pairs CS vs. complex I + III and CS vs. complex II + III, for which the agglomerative algorithm did not provide any sensible result and only the divisive method was applied. We selected the agglomerative population with the largest number of individuals and compared it with the cases selected by the divisive method, which offers only one solution. For most MRC enzymes, larger subpopulations were detected when activities were paired with CoQ than with CS (Table 4), especially for the CoQ-dependent activities (CI + III and CII + III).
The percentage of cases common to the agglomerative and divisive solutions (r > 0.95) was also higher for CoQ than for CS, except for complex III (Table 4).
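The text does not state which denominator the agreement percentage uses. A minimal sketch, assuming each subpopulation is stored as a set of patient identifiers, shows two plausible conventions: shared individuals relative to the smaller subpopulation, and the Jaccard index (shared individuals over the union). Both function names and the example identifiers are hypothetical.

```python
from typing import Set


def overlap_pct(a: Set[int], b: Set[int]) -> float:
    """Shared individuals as a percentage of the smaller subpopulation."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def jaccard_pct(a: Set[int], b: Set[int]) -> float:
    """Alternative: shared individuals as a percentage of the union."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Hypothetical patient IDs selected by each method.
agglo = set(range(100))
divis = set(range(2, 102))
print(overlap_pct(agglo, divis))            # 98.0
print(round(jaccard_pct(agglo, divis), 1))  # 96.1
```

The Jaccard form penalizes individuals found by only one method, so it is always less than or equal to the smaller-set percentage.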

Discussion
This is the first report to analyze CoQ and other MRC biomarkers in a large cohort of samples. We have developed a statistical algorithm to assess the feasibility of using both CoQ and CS as biomarkers for MRC activities.
After applying the first statistical approach (simple Pearson correlation) to the biochemical variables across the whole cohort of patients, several observations were made: 1) CoQ values were highly correlated with CS activity. 2) Correlations between CoQ values and MRC activities were stronger than those between CS activity and MRC activities (data normalized to total protein content). 3) With other normalization strategies for MRC activities (to either CS or CoQ values), CoQ was, as expected, highly correlated with CII + III when these activities were normalized to CS activity, and CS showed high correlation coefficients with both CIII and CIV activities when these were normalized to CoQ content. Thus, CoQ and CS clearly differed in their association with the individual MRC activities.
After these preliminary observations, we developed the algorithms to explore these associations further. Age was not considered as a potential confounding variable because it has been suggested that age is not related to the activities of most MRC enzymes or to CS activity 18.
The initial step of the agglomerative method searched for robust linear clusters to determine the initial linear directions for the iterative steps. Although this step provided only a single linear cluster as the optimal solution, we explored three different linear cluster solutions and applied the iterative steps to each of them, choosing the cluster with the largest number of individuals. To further validate the agglomerative algorithm, we compared its final results with those of a divisive method, which offered a unique solution from the whole cohort of patients. The agreement between the solutions provided by the two algorithms was very high (see Table 4). The algorithm was thus able to provide subpopulations of individuals with a high linear correlation coefficient between the variables of interest; in our case, a single subpopulation seemed to be the most reasonable solution.
We chose a correlation coefficient of 0.95 because it is remarkably high for biological variables of this complexity. Notably, the number of cases in which this correlation was reached was high, especially for the correlation between CoQ and CS, supporting the hypothesis that CoQ may be used as a marker for normalizing MRC activities. Normalization of MRC activities to CoQ therefore seems advisable for a better classification of mitochondrial patients, and CoQ content may be a good predictor of MRC alterations 10.
From the data shown in Table 4, in terms of the number of individuals displaying a correlation >0.95, CoQ was a better marker than CS for CI + III, CII + III and CII activities, whereas CS was better for CIII and similar to CoQ for CIV. Furthermore, in all cases except CS paired with CI + III and CII + III, the agreement between the agglomerative and divisive methods was very high, supporting the usefulness of this new statistical approach. A probable explanation for the failure of the agglomerative algorithm only for CS paired with CI + III and CII + III is that both of these MRC activities require CoQ for proper electron transfer. Moreover, the measurement of these activities, especially CI + III, is technically demanding.
Although a biological explanation is difficult, the proposed supramolecular organization of the MRC could illustrate why CoQ may be a better mitochondrial biomarker than CS. The individual MRC complexes (except complex II) can assemble into different supercomplexes. The supercomplexes proposed so far are the respirasome (complexes I, III and IV), complexes I and III, and complexes III and IV. Supercomplexes are presumed to be functional entities that follow a fluidity model, and they have been proposed to modulate respiratory chain efficiency and the production of reactive oxygen species 19. Genetic modulation of the interactions between complexes I and III and between complexes III and IV has shown that these associations define a dedicated CoQ pool and organize the electron flow to optimize the use of available substrates 20. Additionally, defects in mitochondrial enzymes that reduce CoQ, such as electron transfer flavoprotein dehydrogenase (ETFDH), or mtDNA depletion, which leads to defective respiratory complexes, cause CoQ deficiency 21. A high correlation between CoQ levels and the activities of respiratory complexes I, II and III is therefore expected.
In conclusion, we have developed a new algorithm for exploring associations among complex biological variables. In this specific example, we show that CoQ may be used as a biomarker for MRC activities; hence, its routine determination in the investigation of mitochondrial diseases seems advisable. A potential application of the agglomerative method is the re-classification of patients according to CoQ values, which might lead to a better understanding of mitochondrial disorders. Furthermore, this algorithm may be used to identify populations displaying high correlations among other biological variables.