Introduction

The term mitochondrial cytopathies describes a clinically heterogeneous group of metabolic disorders characterized by anomalies in both mitochondrial ultra-structure and the functioning of oxidative phosphorylation. The term “mitochondrial myopathy” was used for the first time by Luft (Luft et al, 1962) to describe an adult case of nonthyroid hypermetabolism with both uncoupling of oxidative phosphorylation and abnormal mitochondria in the skeletal muscle. The term was later used for all clinical manifestations that showed alterations in the number or structure of mitochondria in skeletal muscle. It was rapidly extended to varied syndromes with myopathies associated with other neurologic signs (DiMauro et al, 1985; Morgan-Hughes, 1986; Wallace, 1992). In recent years, it has become clear that respiratory chain deficits can also appear as multivisceral patterns, involving any tissue. This has led to the concept of mitochondrial cytopathies or mitochondrial diseases (Wallace, 1993). More recently, genetic studies have shown that mitochondrial DNA mutations are associated with some of these pathologies (Holt et al, 1988; Inui et al, 1991; Kobayashi et al, 1991; Lauber et al, 1991; Lestienne and Ponsot, 1988; Noer et al, 1991; Reardon et al, 1992; Rötig et al, 1989; Shoffner et al, 1990; Wallace et al, 1988; Zeviani et al, 1991).

Diagnosis of mitochondrial cytopathies is usually based on the results of a number of different investigations, including clinical examinations, biologic studies, histo-enzymologic studies, molecular biology studies, and biochemical assays of oxidative phosphorylation complexes on muscle biopsies.

Our biochemical assays on crude muscle homogenates or on permeabilized muscle fibers resulted in a set of eleven enzymatic and polarographic parameters (Letellier et al, 1992). Combining some of these parameters, mainly by standardizing them to citrate synthase (CS), succinate dehydrogenase (SDH), or protein concentration in the crude homogenate (CP), improves the overall understanding of the set. Nevertheless, except for clear-cut cases, analyzing the data using classical statistical methods (univariate analysis, regression analysis) remains complex. It is not always possible to conclude unequivocally whether or not a patient is affected by mitochondrial dysfunction. Indeed, with these methods of analysis, several problems make the diagnosis of mitochondrial cytopathies difficult. Firstly, in the field of mitochondrial disease analysis, it is not possible to perform the biochemical studies on an a priori defined control population. Usually, the control values (mean and standard deviation) are determined on patients who, after biochemical investigation, are found, a posteriori, to have “normal” respiratory complex values. This clearly implies a subjective division of the patients into two subgroups, affected and unaffected. Secondly, the small size of the population studied and the non-Gaussian distribution, including extreme values, make the construction of confidence intervals difficult. Thirdly, the large number of variables studied on each patient's biopsy, as well as sometimes apparently contradictory results, renders both analysis and interpretation problematic.

To solve these problems, we applied a multifactorial statistical approach, known as principal component analysis (PCA) (Jolliffe, 1986, 1992), to the whole set of data obtained in our biochemical study to identify patients with respiratory chain defects. The objective of this method is to reduce the number of variables and, hence, the dimensionality of the data set, whilst preserving as much of the original information as possible.

In this study, PCA was applied to the biochemical data obtained from 35 muscle biopsies of children suspected of having mitochondrial disease. For each respiratory chain complex, this method makes it possible to distinguish different subgroups of patients: (i) a control subgroup, (ii) a subgroup deficient for the studied complex, and (iii) an intermediate, borderline subgroup. These three subgroups are separated by the “first principal component” (Jolliffe, 1986, 1992), which allows the defect to be quantified.

PCA was also applied to the whole set of biochemical data, and indicated that most of the affected patients have a multiple deficit in respiratory chain complexes, and that there is a possible new subgroup of patients constituted by newborns with high values in all of the complex activities. PCA also allows the detection of the most discriminating variables for each respiratory chain complex. Our results show that PCA allows a localization of patients into subgroups and could help clinicians make diagnoses.

Results

Table 1 describes the main clinical features of the 35 children from whom a muscle biopsy was taken to look for a mitochondrial defect. For each biopsy, polarographic and biochemical measurements were performed. These measurements are summarized in Table 2 (see “Materials and Methods”). For a better localization of the deficit, these measurements were split into three subsets of variables, which were representative of the activities of complexes I, III, and IV (listed in Tables 3, 4 and 5). PCA was performed on each of these three subsets (Figs. 1, 2 and 3) and on the whole set of variables (Fig. 4).

Table 1 Case Reports and PCA Results
Table 2 Abbreviations
Table 3 Complex I Activity Data
Table 4 Complex III Activity Data
Table 5 Complex IV Activity Data
Figure 1

Principal component analysis (PCA) results for complex I. □: Complex I deficiency; •: No complex I deficiency. DRESP I, D I, and DNCR indicate the fictitious patients built with the minimal values of Table 3 for Resp. I, ΔI, and ΔNCR respectively, the other variables taking the mean value calculated on the control population; DCPLX I indicates the fictitious patient built with all of the previous variables at their minimal values. MCP I indicates the fictitious patient built with the mean value of all of the variables calculated on the control population defined by the first principal component (comp.1) > 0.

Figure 2

(A) PCA results for complex III. □: Complex III deficiency; •: No complex III deficiency. DRESP III, D III, and DSCCR indicate the fictitious patient built with the minimal values of Table 4 for Resp. III, ΔIII, and SCCR respectively, the other variables taking the mean value calculated on the control population; DCPLX III indicates the patient type built with all of the previous variables at their minimal value. MCP III indicates the fictitious patient built with the mean value of all of the variables calculated on the control population defined by comp.1 > 0. (B) Same as (A) without patients 4 or 15.

Figure 3

PCA results for complex IV. □: Complex IV deficiency; •: No complex IV deficiency. DRESP IV and DCOX indicate the fictitious patient built with the minimal values of Table 5 for Resp. IV and COX respectively; DCPLX IV indicates the fictitious patient built with all of the previous variables at their minimal value, the other variables taking the mean value calculated on the control population. MCP IV indicates the fictitious patient built with the mean value of all of the variables calculated on the control population defined by comp.1 > 0.

Figure 4

PCA results for all complexes. •: No complex deficiency; □: Deficiency in all complexes; : Deficiency in complex I or in complex IV only; : Deficiency in complexes I and III; : Deficiency in complexes I and IV.

The first two principal components (comp.1 and comp.2) represent 70% of the total variability for complex I, 65% for complex III, 75% for complex IV, and 54% for all complexes together. This allows us to use the first two principal components to describe the patients' data in a biplot (see “Materials and Methods”) (Figs. 1, 2, 3 and 4).

Projection of Patients

For each complex, the biplots (Figs. 1, 2 and 3) show at least two subgroups of patients, which can be assigned to a normal and an affected subgroup on the basis of the comp.1 value. Negative comp.1 values correspond to low values of the original variables in Tables 3, 4 and 5, which are usually regarded as reflecting a defect in the corresponding complex. The affected cases, represented by squares in Figures 1, 2 and 3, are always concentrated in the left half-plane of the biplot (comp.1 < 0), whereas the control subgroup is concentrated in the right half-plane (comp.1 > 0). Thus, comp.1 (the abscissa) clearly discriminates between cases with and without an oxidative phosphorylation deficiency. To make the localization of the patients in these two groups clearer, we built examples of fictitious patients (bottom of Tables 3, 4 and 5) harboring the minimal values found in Tables 3, 4 and 5 for one or several variables and the mean values calculated from the controls (comp.1 > 0) for the other variables (points D I, DRESP I, DNCR, D III, etc.). The fictitious patients with all of the variables for a complex set to their minimal values are indicated by DCPLX I, DCPLX III, and DCPLX IV in Figures 1, 2 and 3 respectively. All these fictitious patients plot on the left-hand side of the biplot and indicate the direction in which actual patients affected in the corresponding variable(s) will tend to plot. We have also plotted a “Mean Control Patient” with the mean values of all variables calculated from the control population (comp.1 > 0): MCP I, MCP III, and MCP IV in Figures 1, 2 and 3 respectively. As expected, the “mean control patients” are projected on the center of the control cloud. Nevertheless, a few patients with comp.1 close to zero remain difficult to classify (patients 16 and 17 are typical in this respect). Such patients are considered borderline and are the most difficult to interpret based on biochemical and polarographic data alone. Therefore, for these cases, other investigations such as clinical or histo-enzymatic studies are necessary before the final diagnosis can be made. These borderline subgroups are examined further in the “Discussion” section.
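The projection of patients and fictitious patients described above can be sketched as follows. This is a minimal illustration and not the original S-Plus analysis: the data are hypothetical, the standardization to unit variance is an assumption, the fictitious patients are assumed to be projected as supplementary points, and the `band` parameter defining the borderline zone around comp.1 = 0 is a hypothetical choice that does not appear in the study.

```python
import numpy as np

def fit_pca(x):
    """Standardize columns and return (mean, std, axes).

    comp.1 is oriented so that low raw values give negative scores; PCA signs
    are otherwise arbitrary. Scaling to unit variance is an assumption.
    """
    mean, std = x.mean(axis=0), x.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd((x - mean) / std, full_matrices=False)
    axes = vt.T.copy()
    if axes[:, 0].sum() < 0:          # simple orientation rule for this sketch
        axes[:, 0] *= -1
    return mean, std, axes

def project(rows, mean, std, axes):
    """Coordinates (comp.1, comp.2) of one or more patients."""
    return ((np.atleast_2d(rows) - mean) / std) @ axes[:, :2]

def classify(comp1, band=0.5):
    """band is a hypothetical half-width of the borderline zone around zero."""
    if comp1 < -band:
        return "affected"
    if comp1 > band:
        return "control"
    return "borderline"

# Hypothetical activities for 8 patients (rows) and 3 variables (columns).
x = np.array([
    [10.0, 20.0, 5.0], [11.0, 22.0, 5.5], [9.0, 19.0, 4.8], [12.0, 21.0, 5.2],
    [3.0, 6.0, 1.5], [2.0, 5.0, 1.0], [4.0, 8.0, 2.0], [7.0, 13.0, 3.4],
])
mean, std, axes = fit_pca(x)
scores = project(x, mean, std, axes)
labels = [classify(c) for c in scores[:, 0]]

# Fictitious patients, projected with the centering, scaling and axes fitted above.
controls = x[scores[:, 0] > 0]
mcp = controls.mean(axis=0)           # "mean control patient"
dcplx = x.min(axis=0)                 # every variable at its minimal value
d_first = mcp.copy()
d_first[0] = x[:, 0].min()            # only the first variable at its minimal value
for name, row in [("MCP", mcp), ("DCPLX", dcplx), ("D_first", d_first)]:
    print(name, project(row, mean, std, axes)[0])
print(labels)
```

In this toy example, lowering a single variable moves the fictitious patient toward negative comp.1 relative to the mean control patient, and lowering all variables moves it furthest.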

Projection of Variables

The arrows in Figures 1, 2 and 3 represent the original variables in the new plot. Their components indicate the extent of their participation in the construction of the new coordinates of the point representing a patient. That is, the higher the projection on an axis, the greater the participation in the corresponding coordinates of the point.

Figures 1, 2 and 3 show that the most discriminating parameters are the enzymatic determinations for complexes I, III, and IV and their standardized values. However, standardization by CS seems slightly less discriminating than that by SDH or CP. The same results are also obtained from regression analysis (applied to the same data) and from the scatterplots of the corresponding pairs of variables (Δ I versus Δ I/CP, Δ I versus Δ I/SDH, etc., data not shown).

Moreover, it is also apparent that the results of the polarographic studies (Resp. I, Resp. III, and Resp. IV) play a rather negligible role in localizing a precise defect in a respiratory chain complex. Figures 1, 2 and 3 show that the comp.1 coordinate of the corresponding vectors, ie, Resp. I, Resp. III, and Resp. IV, is always small; this means that polarographic studies are less discriminating for revealing a defect in a respiratory complex than the direct measurement of the complex activities themselves. This is confirmed by the fact that fictitious patients with the minimal respiration values (DRESP I, DRESP III, and DRESP IV respectively) map near the “Mean Control Patient” for each complex.
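Reading the discriminating power of a variable from its projection on comp.1 can be illustrated with a short sketch. The variable names and values below are hypothetical, and standardizing each variable to unit variance is an assumption; the code simply ranks variables by the absolute value of their loading on the first principal component.

```python
import numpy as np

def comp1_loadings(x, names):
    """Rank variables by the absolute value of their loading on comp.1."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    first_axis = vt[0]                 # unit vector defining comp.1
    pairs = zip(names, np.abs(first_axis))
    return sorted(pairs, key=lambda item: item[1], reverse=True)

# Hypothetical data: two strongly related activity variables and one
# respiration-like variable that varies little between patients.
names = ["delta_I", "delta_I_per_CP", "resp_I"]
x = np.array([
    [10.0, 20.0, 5.0], [11.0, 22.0, 4.2], [9.0, 19.0, 5.5], [12.0, 21.0, 4.6],
    [3.0, 6.0, 4.9], [2.0, 5.0, 5.3], [4.0, 8.0, 4.4], [7.0, 13.0, 5.1],
])
ranking = comp1_loadings(x, names)
for name, weight in ranking:
    print(f"{name}: {weight:.2f}")
```

The variable listed last contributes least to comp.1, which is the situation the text describes for the polarographic measurements.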

Stability of PCA Results

To obtain an idea of the stability of our PCA results, k-fold cross-validation was used. The basic idea of k-fold cross-validation is that all but k observations are used for the analysis, while the remaining k observations are left out. Of course, one must repeat the analysis and compare the results for all $C_n^k$ subsamples of size $n - k$ (where n is the sample size). With our data, for k = 1, 2 the results of PCA were practically the same for each of the complexes, both in the percentage of the variability explained and in the form of the biplots and the projections of the test sample observations onto the plane spanned by the first two principal components. In the case of k = 2, one obtains $C_{35}^{2} = 595$ biplot representations. The coordinates (comp.1 and comp.2) of a given patient in all of these biplots showed a very small estimated dispersion, ie, a low variance (on average equal to 0.01, and at the most, equal to 0.13), confirming the great stability of PCA. We illustrated this stability in the case of the complex III biplot by removing the two patients who are representative of one of the greatest dispersions (4 and 15). The comparison of Figure 2A and B shows no significant difference in the disposition of the 33 remaining patients.
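The stability check described above (a procedure often called leave-k-out) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original S-Plus code: the data are hypothetical, variables are standardized to unit variance, all patients (including those left out) are projected onto each refitted plane, and the sign of each refitted component is aligned with the full-data solution because PCA signs are arbitrary.

```python
import itertools
import math
import numpy as np

def fit_axes(x):
    """Return column means, standard deviations and the first two principal axes."""
    mean, std = x.mean(axis=0), x.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd((x - mean) / std, full_matrices=False)
    return mean, std, vt[:2].T

def leave_k_out_dispersion(x, k):
    """Variance of each patient's (comp.1, comp.2) over all leave-k-out refits."""
    n = x.shape[0]
    _, _, reference = fit_axes(x)
    coords = []
    for left_out in itertools.combinations(range(n), k):
        kept = np.delete(x, list(left_out), axis=0)
        mean, std, axes = fit_axes(kept)
        # Align the sign of each refitted axis with the full-data axis.
        signs = np.sign(np.sum(axes * reference, axis=0))
        signs[signs == 0] = 1.0
        coords.append(((x - mean) / std) @ (axes * signs))
    coords = np.array(coords)          # shape: (number of refits, n, 2)
    return coords.var(axis=0, ddof=1), len(coords)

# Number of subsamples for the study size quoted in the text (n = 35, k = 2).
print(math.comb(35, 2))

# Hypothetical data: 8 patients, 3 variables.
x = np.array([
    [10.0, 20.0, 5.0], [11.0, 22.0, 5.5], [9.0, 19.0, 4.8], [12.0, 21.0, 5.2],
    [3.0, 6.0, 1.5], [2.0, 5.0, 1.0], [4.0, 8.0, 2.0], [7.0, 13.0, 3.4],
])
dispersion, n_refits = leave_k_out_dispersion(x, k=2)
print(n_refits, dispersion.mean(), dispersion.max())
```

Small mean and maximum variances indicate that a patient's position in the biplot barely depends on which observations were removed.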

Discussion

Oxidative phosphorylation analysis on biopsies is often difficult and sometimes prevents clear conclusions on the presence of a respiratory chain defect. Indeed, the use of classical statistical methods (regression models, confidence intervals, etc.) may present problems such as:

The difficulty of globally analyzing, for each patient (or for each patient's complex), all of the measurements involved, including their standardization.

The difficulty of gaining access to a true control population for the same type of muscle. This leads one to extract, a posteriori, for each measured variable, a control population from the whole population studied. This operation is difficult, and more or less subjective.

The difficulty of constructing confidence intervals for healthy values because of the small size of the extracted control subsample and its unknown distribution function.

To circumvent these difficulties, we used the PCA method. The objective of this method is to reduce the number of variables whilst preserving the original information as much as possible. This method makes it possible to represent the original variables and the individuals in a graph and, without a priori selection, to distinguish several subgroups in the whole population and to assess the discriminating power of the original variables. Another benefit of PCA is the fact that, in practice, it is not limited by the number of individuals and variables.

We applied the PCA method to a set of 23 variables obtained from the biochemical study of 35 children suspected of being affected by a respiratory chain defect. To clearly identify the localization of a hypothetical respiratory chain defect, this analysis was performed on three subsets of data characteristic of each respiratory chain complex (Tables 3, 4 and 5).

The results of these analyses show that, for each of the complexes, the first two principal components explain at least 65% of the total variability. For this reason, the projection on the plane spanned by the first two principal components explains the structure of the data with a good classification power. This representation (Figs. 1, 2 and 3) allows the isolation of three populations (control, affected, and borderline) for complexes I, III, and IV of the respiratory chain. The affected cases are always concentrated in the same left half-plane and the abscissa clearly discriminates the respiratory chain deficit; the more negative the value of comp.1 (abscissa), the more pronounced the deficit.

The classification of patients for whom the value of comp.1 is in the neighborhood of zero sometimes raises a diagnostic issue. Indeed, these patients are difficult to classify by examining the individual original variables separately. These cases can be explained either by a low but still normal value of the complex, by a slight deficiency, or even by a physiologic compensation between several discordant variables. An advantage of the PCA method is that it identifies this borderline subgroup of patients, for whom it is impossible to conclude whether a respiratory chain defect is present. In these cases, further independent investigations (histo-enzymology, electron microscopy, molecular biology, analysis of other tissues or cell lines, etc.) are essential to reach a final diagnosis.

Further analysis of the borderline cases revealed, in some, a defect in another complex. This was the case for patients 16, 18, and 33, who were borderline for complex III but had a complex IV deficit; similarly, patients 16, 17, 23, and 25 were also borderline for complex III but had a complex I deficit; and patient 27, borderline for complex I, had a double deficit for complexes III and IV. This raises the question of associated defects that may be caused by (i) a pleiotropic mutation (eg, mitochondrial tRNA mutation or a deletion) that can directly but differently affect the activity of the respiratory chain complexes, or (ii) a mutation indirectly affecting the respiratory chain, eg, Menkes disease (Pedespan et al, 1999) or Friedreich's ataxia (Rötig et al, 1997).

We used the PCA method to analyze all of the variables together to see if it was possible to distinguish between the deficits in the different complexes, or if some diseases appeared as a particular subgroup. This analysis, given in Figure 4, shows that there is a clear discrimination between healthy (right half-plane) and affected (left half-plane) subjects along the abscissa, and that the patients affected in all complexes (5, 19, 20, 22, 28, 29, 30, 31, and 35) form a clear subgroup. Patient 27 seems to belong to this group, although it was impossible from Figure 1 to conclude that there was a deficit in complex I (borderline subgroup). This indicates that this patient could also suffer from a mild deficit in complex I along with the clear complex III and IV deficiencies. A third subset in the neighborhood of the origin (patients 4, 8, 16, 17, 23, 24, 25, and 34) can be distinguished between the subgroup of healthy patients and those affected in all complexes. This subgroup involves patients affected in one or two complexes or mildly affected in all complexes.

Among the normal subjects, newborns seem to be concentrated in the region of the biplot characterized by the highest values of comp.1 (1: 5 months; 7: 8 months; 9: 2 months; 13: 5 months; and 15: 2 months). This corresponds to high values of the activity of the complexes, especially for complex IV. This subset could indicate a special mitochondrial pathology with early onset. This is also indicated by the fact that two of three Leigh patients (7 and 13) belong to this subgroup. However, the existence of such a subgroup needs to be confirmed with a larger number of patients.

Additionally, there does not seem to be any difference between males and females in Figure 4. However, this set of 35 patients includes only 8 females.

The representation of the measurements for all complexes in Figure 4 gives a good representation of all of the respiratory chain defects. Nevertheless, a more detailed diagnosis requires an analysis of the data complex by complex, as performed in Figures 1, 2 and 3, especially in the case of the third subset situated between the subset of all deficiencies and those without deficits.

A problem encountered in mitochondrial cytopathy studies is identifying the most discriminating variables for the determination of a deficit. The orthogonal projection of the weight vectors (cf. biplot) on the abscissa, which is the most discriminating axis in our study, offers a solution: the greater the value of such a projection, the more discriminating the variable for the diagnosis. Analysis of the results shows that for all three complexes, the polarographic study is the least discriminating factor. This can be explained by the existence of a threshold in the expression of the enzyme deficiency on the respiratory flux. For instance, we have shown that as little as 30% of the cytochrome c oxidase (COX) activity is sufficient to maintain a normal respiratory rate (Letellier et al, 1992; 1993; 1994; Rossignol et al, 1999). Thus, it is difficult to demonstrate the presence of a deficit in a respiratory chain complex from the polarographic study. However, even if the polarographic measurements do not help to detect a respiratory chain defect, this does not imply that polarographic studies are of little value for the final diagnosis of a mitochondrial cytopathy. On the contrary, respiration rate measurements are the only way to detect an uncoupled respiratory chain, coenzyme Q (CoQ) and cytochrome c deficiencies, or defects outside the respiratory chain (pyruvate dehydrogenase [PDH], ATP synthase, etc.).

Standardization of the activities is often presented as a way to obtain better discriminating variables. The choice of the activity used for standardization has physiologic significance. Normalizing by CP means that the amount of mitochondrial complex activity is compared with the cell content. On the other hand, normalizing by CS or SDH means that the complex activity is compared with the content in mitochondria, based on a matrix or membranous activity. Figures 1, 2 and 3 show that the raw activities and their values normalized by CP and SDH project onto the same area of the biplot. In contrast, standardization by CS gives slightly different results, which can be explained by the wide variability of this activity in our hands. As a rule, all variables should be considered for analysis. Nevertheless, our PCA results show that the relative weights of the raw activities and their standardizations by CP and SDH are always greater than those of standardization by CS (when the first two principal components are used). The same results are also obtained from regression analysis, scatterplots, and the analysis of correlations (data not shown).

Another contribution of the PCA method is the detection of anomalous observations, which are projected far from the main cloud(s) of patients. Some of them may be due to transcription or coding errors, which have severe consequences for the diagnosis and can be corrected in this way.

In summary, the analysis of our data set, focused on the defects in respiratory chain complexes I, III, and IV, clearly differentiates between affected and unaffected subjects for these complexes (see Table 1). Our analysis also reveals a borderline subgroup of patients in between, which deserves further independent analysis. The consideration of all 23 variables together reveals a large number of patients with simultaneous deficits in all complexes, and a subgroup of newborns with high values of the variables. Other variables (eg, state 4 respiration, respiration on fatty acids, pyruvate dehydrogenase activity, ATP synthesis, etc.) can be added to the original set of data and treated in the same manner to detect other types of deficiencies, particularly those outside the respiratory chain.

In conclusion, the advantage of PCA is to give an objective conclusion, which is difficult to obtain by classical statistical methods because of the great number of variables, the limited number of patients, the large variability of the data, and the absence of an actual control sample. The PCA method, particularly the result presentation on simple graphs allowing a localization of patients in subgroups, could help clinicians in making diagnoses.

Materials and Methods

Patients

Thirty-five children, suspected of mitochondrial disease, were selected at the Centre Hospitalier Régional de Bordeaux, France. The patients, 27 male and 8 female, ranged from 2 days to 14 years of age (mean age: 31.20 months; median age: 12 months). Patients were diagnosed by clinical examinations, biologic studies (especially blood lactate and pyruvate levels) and histochemical studies (Table 1). Control muscle biopsies were taken from patients who underwent muscle biopsy for neuromuscular symptoms but were ultimately found free of any mitochondrial disease. Muscle biopsy was taken from the quadriceps muscle under general anesthesia (with informed consent from the parents).

Biochemical Study

Oxygen consumption rate was measured polarographically at 30°C on permeabilized muscle fibers as described in Letellier et al (1992). The respiratory activities (Resp. I, Resp. III, and Resp. IV) were expressed in natom O/min/mg fiber.

The polarographic study was complemented by biochemical assays in which the activity of each respiratory chain complex was measured. The activities of the different respiratory chain complexes were evaluated by their capacity to transfer electrons from a natural substrate of the chain, NADH, succinate, ubiquinol, or cytochrome c (II), to an electron acceptor, oxygen, ubiquinone, or cytochrome c (III). The choice of electron donor and acceptor and the use of specific inhibitors made it possible to restrict the transfer of electrons to a single part of the respiratory chain. Substrate oxidation rates or acceptor reduction rates were measured spectrophotometrically. The whole set of experiments and the methodologies and abbreviations used are summarized in Table 2. All of the enzymatic activities were measured on a crude muscle homogenate and expressed in μmoles of product formed per minute and per gram of tissue (Letellier et al, 1992).

To take into account a possible heterogeneity of mitochondrial distribution in the different biopsies, the results are normalized by CP, CS, and SDH activities. The biochemical data characterizing each patient are summarized in Tables 3, 4 and 5. The zero values of ΔI measurements in Table 3 (patients 2, 5, 29, and 30) are due to the insensitivity to rotenone inhibition revealed in the calculation of the final ΔI value; thus, these values are not in contradiction with the nonzero values of I+III measurements, but are the indication of an alteration, which will appear in Figure 1.
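The normalization step amounts to dividing each complex activity by a reference quantity measured on the same biopsy. The sketch below uses hypothetical values and a hypothetical helper name, `normalize`; it only illustrates the arithmetic, and the units of each ratio depend on the units of the reference.

```python
def normalize(activity, references):
    """Divide one complex activity by each reference quantity of the same biopsy."""
    return {name: activity / value for name, value in references.items()}

# Hypothetical values for a single biopsy (not taken from Tables 3, 4 and 5).
references = {"CP": 150.0, "CS": 12.0, "SDH": 1.5}
ratios = normalize(3.0, references)
for name, ratio in ratios.items():
    print(f"activity/{name} = {ratio}")
```

Each raw activity therefore yields up to three additional variables, which is how the number of variables entering the PCA grows beyond the number of direct measurements.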

PCA

PCA is concerned with interpreting the variance-covariance structure through a limited number of linear combinations of the original variables. Its general objectives are dimension reduction and ease of interpretation. To this end, the new variables (the principal components) are designed as linear combinations of the original ones such that the first new variables preserve most of the information contained in all of the original ones. If we restrict ourselves to the first two principal components, which, in our case, retain at least 54% of the variability (see “Results”), the results may be represented graphically by a biplot. The biplot makes it possible to project both the original variables and the transformed observations onto the plane spanned by the first two principal components. The original variables are represented by arrows, which indicate the proportion of the variability explained in the first two principal components. The direction of the arrows shows the relative “loadings” of the original variables on the first and second principal components, ie, the participation of the original variables in the construction of the first two principal components (for details see Jolliffe, 1986, 1992). PCA was performed according to the usual method (see, for example, Jolliffe, 1986, 1992) with S-Plus v. 3.2 (see Venables and Ripley, 1994, for details on this software), using the data of Tables 3, 4 and 5.
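As an illustration of the quantities behind a biplot, the following sketch computes patient coordinates (scores), variable loadings (the arrows), and the share of variability carried by each component. It is a minimal example in Python with NumPy, not the S-Plus procedure used in the study; the data are synthetic, and scaling each variable to unit variance is an assumption, since the scaling is not specified here.

```python
import numpy as np

def pca_biplot(data):
    """Return scores, loadings and explained-variance shares for a data matrix.

    Rows are patients and columns are variables. Columns are centered and
    scaled to unit variance (an assumption made for this sketch).
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # SVD of the standardized matrix: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                     # patient coordinates on comp.1, comp.2, ...
    loadings = vt.T                    # one row per variable, one column per component
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Synthetic illustration only: 6 hypothetical patients, 3 hypothetical variables.
rng = np.random.default_rng(0)
activity = rng.normal(10.0, 2.0, size=6)
toy = np.column_stack([
    activity,
    0.5 * activity + rng.normal(0.0, 0.2, size=6),
    rng.normal(5.0, 1.0, size=6),
])
scores, loadings, explained = pca_biplot(toy)
print(scores[:, :2])                   # points of the biplot
print(loadings[:, :2])                 # arrows of the biplot
print(explained[:2].sum())             # variability retained by the plane
```

Plotting the first two columns of `scores` as points and the first two columns of `loadings` as arrows gives the kind of biplot described above.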