Impact of analytical and biological variations on classification of diabetes using fasting plasma glucose, oral glucose tolerance test and HbA1c

Historically, diabetes has been diagnosed by measuring the fasting plasma glucose (FPG) concentration and the plasma glucose concentration two hours after an oral glucose load (oral glucose tolerance test, OGTT), and interpreting the results against recommended clinical thresholds. More recently, glycated haemoglobin A1c (HbA1c) has been included as a diagnostic criterion. Within-individual biological variation (CVi), analytical variation (CVa) and analytical bias of a test can affect the accuracy and reproducibility of the classification of a disease. A test with large biological and analytical variation increases the likelihood of erroneous classification of the underlying disease state of a patient. Through numerical simulations based on the laboratory results generated from a large population health survey, we examined the impact of CVi, CVa and bias on the classification of diabetes using FPG, OGTT and HbA1c. From the results of the simulations, HbA1c has comparable performance to FPG and is better than OGTT in classifying subjects with diabetes, particularly when laboratory methods with smaller CVa are used. Averaging the results of repeat laboratory tests reduces the combined (analytical and biological) variation and improves the consistency of the disease classification.

Diabetes mellitus is a chronic disease characterized by impaired glucose metabolism. The hallmark of this disease is persistent elevation of glucose concentration in the blood 1 . Historically, diabetes is diagnosed by measuring plasma glucose concentration and interpreting it against recommended clinical thresholds, which are dependent on the state of satiety (fasting, post-oral glucose load or random) of the patient. More recently, glycated haemoglobin A1c (HbA1c) has been included as a diagnostic criterion by several professional bodies, including the American Diabetes Association and World Health Organisation 1,2 .
The advantages and disadvantages of the different diagnostic tests have been well discussed 3,4 . These discussions tend to focus on the clinical performance or operational convenience of the tests, without explicit consideration of the impact of analytical and biological variation on the classification of diabetes. Yet the biological and analytical variations of a test can have a large impact on the accuracy and reproducibility of the classification of a disease. A test with large biological and analytical variation increases the probability that a patient's result falls further from the patient's true homeostatic set point. This increases the likelihood of erroneous classification of the underlying disease state of a patient 5 . Indeed, it has been reported that only 70% of patients diagnosed with diabetes using the oral glucose tolerance test were still classified as having diabetes when the test was repeated three weeks later 3 .
Accurate classification of patients is important for optimal clinical care. It also improves the selection of patients in clinical studies and sharpens the distinction between disease and control groups 5 . This in turn enhances the observed difference between disease and control groups. Occasionally, repeat testing is performed to resolve discrepant disease classification by different laboratory tests. However, there is currently a lack of consensus on how best to interpret the replicate results.
Through numerical simulations based on the laboratory results generated from a large population health survey, we examined (1) the impact of biological and analytical variations and between-laboratory method bias on the classification of diabetes using fasting plasma glucose (FPG), oral glucose tolerance test (OGTT) and HbA1c, and (2) the best strategy to interpret limited repeat blood testing to reduce the variation in result to achieve more accurate disease classification.

Methods
Study subjects. This study used data from the cross-sectional National Health Survey (Singapore) collected between 17 March 2010 and 13 June 2010. All participants provided written informed consent for further analysis of the collected data. The study design received ethics board approval (Medical & Dental Board, Health Promotion Board, ref: 005/2009) and its details are available in the official report. This study only involved statistical analysis of data previously collected from the survey, and the methods described in this study were performed according to relevant local guidelines and regulations.
Briefly, the survey employed a two-phase sampling strategy. In the first phase, the geographical zones and residential dwelling units were stratified and selected to yield a representative dwelling type distribution. In the second phase, 7,696 individuals were randomly sampled from households identified in phase 1 and invited to participate in the survey. Ethnic minorities were over-sampled to achieve an ethnic composition of 30% Chinese, 30% Malays, 30% Indians and 10% others.
Of the 7,512 eligible individuals aged 18 to 79 years, 4,337 participated in the survey (representing a participation rate of 57.7%). Only subjects without prior history of diabetes were included in the final analysis.
Blood tests. Blood samples from participants were collected after an overnight fast of at least ten hours, using standard phlebotomy procedure. The OGTT was performed by orally administering 75 g of glucose (Trutol), and measurement of the plasma glucose concentration was repeated two hours later.
World Health Organisation definition of glycaemic status. Normal FPG was defined as <6.1 mmol/L while impaired FPG was defined as 6.1 mmol/L to 6.9 mmol/L. Normal OGTT was defined as <7.8 mmol/L and impaired OGTT was defined as 7.8 mmol/L to 11.0 mmol/L. Normal HbA1c was defined as <6% while pre-diabetes was defined as 6% to 6.4%. Diabetes was defined as having any of the following: FPG ≥7.0 mmol/L, OGTT ≥11.1 mmol/L or HbA1c ≥6.5%.
American Diabetes Association definition of glycaemic status. Normal FPG was defined as <5.6 mmol/L while impaired FPG was defined as 5.6 mmol/L to 6.9 mmol/L. Normal OGTT was defined as <7.8 mmol/L and impaired OGTT was defined as 7.8 mmol/L to 11.0 mmol/L. Normal HbA1c was defined as <5.7% while pre-diabetes was defined as 5.7% to 6.4%. Diabetes was defined as having any of the following: FPG ≥7.0 mmol/L, OGTT ≥11.1 mmol/L or HbA1c ≥6.5%.
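The two sets of thresholds above can be written as a small lookup table. The following is a minimal sketch rather than the code used in the study; the names `THRESHOLDS`, `classify` and `has_diabetes` are hypothetical, and values that fall between the stated category limits (for example an FPG of 6.95 mmol/L) are assumed to belong to the impaired category.

```python
# Diagnostic thresholds as listed above (glucose in mmol/L, HbA1c in %).
# Each entry is (upper bound of "normal", lower bound of "diabetes").
THRESHOLDS = {
    "WHO": {"FPG": (6.1, 7.0), "OGTT": (7.8, 11.1), "HbA1c": (6.0, 6.5)},
    "ADA": {"FPG": (5.6, 7.0), "OGTT": (7.8, 11.1), "HbA1c": (5.7, 6.5)},
}


def classify(test, value, criteria="WHO"):
    """Return 'normal', 'impaired' or 'diabetes' for a single result."""
    normal_below, diabetes_from = THRESHOLDS[criteria][test]
    if value < normal_below:
        return "normal"
    if value >= diabetes_from:
        return "diabetes"
    # Assumption: anything between the two limits is impaired/pre-diabetes.
    return "impaired"


def has_diabetes(fpg, ogtt, hba1c, criteria="WHO"):
    """Diabetes is present if any one of the three tests is in the diabetes range."""
    results = {"FPG": fpg, "OGTT": ogtt, "HbA1c": hba1c}
    return any(classify(t, v, criteria) == "diabetes" for t, v in results.items())


# The same FPG result can fall into different categories under the two criteria.
print(classify("FPG", 5.8, "WHO"), classify("FPG", 5.8, "ADA"))
```

The only differences between the two criteria are the lower limits of the impaired FPG and pre-diabetes HbA1c categories; the diabetes thresholds are identical.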
Analytical variation and bias data. The data for analytical performance were extracted from external quality assurance programs and the literature 3,6 . The analytical variation was defined as the analytical coefficient of variation (CVa) of the laboratory tests as reported in the external quality assurance programs and literature. The CVa for the plasma glucose and HbA1c measurements were 2.5% and 3.5%, respectively. The between-laboratory bias for plasma glucose and HbA1c measurements ranged from −6% to +7% and from −3% to +2.5%, respectively. For HbA1c, an additional CVa of 2%, which can be achieved by certain laboratory methods, was also examined. The CVa for plasma glucose is the same for FPG and OGTT since they share the same laboratory method.
Table 2. Proportion of patients who are misclassified when laboratory testing for diabetes is repeated (World Health Organisation criteria). Between-laboratory positive biases of +7% and +2.5% were introduced for the fasting plasma glucose/oral glucose tolerance test and HbA1c, respectively. Between-laboratory negative biases of −6% and −3% were introduced for the fasting plasma glucose/oral glucose tolerance test and HbA1c, respectively.
Biological variation data. The within-person biological variation (CVi) data were obtained from the database curated by Ricos and her colleagues. The CVi for FPG, OGTT and HbA1c were 5.7%, 16.7% and 1.8%, respectively. The biological variations for plasma glucose and HbA1c were assumed to be similar in healthy subjects and patients with diabetes, as demonstrated previously 7 .
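To give a sense of scale, the CVa and CVi values quoted above can be combined into a single figure per test. The sketch below assumes the usual root-sum-of-squares rule for independent sources of variation; the text does not state the exact formula used, so `combined_cv` is an illustrative assumption.

```python
import math

# CVa and CVi values (in %) quoted in the text above.
CVA = {"FPG": 2.5, "OGTT": 2.5, "HbA1c": 3.5, "HbA1c (CVa 2%)": 2.0}
CVI = {"FPG": 5.7, "OGTT": 16.7, "HbA1c": 1.8, "HbA1c (CVa 2%)": 1.8}


def combined_cv(cva, cvi):
    """Root-sum-of-squares combination of two independent CVs (assumed rule)."""
    return math.sqrt(cva ** 2 + cvi ** 2)


for test in CVA:
    print(f"{test}: combined CV = {combined_cv(CVA[test], CVI[test]):.1f}%")
```

Under this assumption the combined variation is about 6.2% for FPG, 16.9% for OGTT, 3.9% for HbA1c at a CVa of 3.5% and 2.7% for HbA1c at a CVa of 2%. The OGTT figure is dominated by its CVi, whereas the HbA1c figure is dominated by its CVa.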
Statistical analysis. For the purpose of this study, the original laboratory results of the subjects derived from the National Health Survey were considered as the homeostatic set point (i.e. 'true values') of the subjects. They were used to assign the disease classification of the subjects according to the diagnostic criteria above.
To examine the reproducibility of the disease classification for each biochemistry test for diabetes, 10,000 random results were generated from a normal distribution, which incorporated the CVa and CVi, around the true value of each subject. This had the effect of simulating repeat testing of an individual patient. Each of the randomly generated results was then classified according to the diagnostic criteria above. The percentages of the simulated results falling into different disease classifications for all the subjects were compared against the disease classification using the 'true value' of each subject and summarized.
Table 3. Proportion of patients who are misclassified when laboratory testing for diabetes is repeated (American Diabetes Association criteria). Between-laboratory positive biases of +7% and +2.5% were introduced for the fasting plasma glucose/oral glucose tolerance test and HbA1c, respectively. Between-laboratory negative biases of −6% and −3% were introduced for the fasting plasma glucose/oral glucose tolerance test and HbA1c, respectively.
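The repeat-testing simulation described above can be sketched for a single subject as follows. This is an illustration under stated assumptions, not the study's code: CVa and CVi are combined by root-sum-of-squares, bias is applied as a multiplicative shift of the centre of the distribution, and the function names and the example subject (a true FPG of 7.2 mmol/L) are hypothetical.

```python
import math
import random


def classify(value, normal_below, diabetes_from):
    """Three-way classification from two thresholds."""
    if value < normal_below:
        return "normal"
    if value >= diabetes_from:
        return "diabetes"
    return "impaired"


def simulate_results(true_value, cva, cvi, n=10_000, bias_pct=0.0, rng=None):
    """Draw n simulated results around a subject's true value."""
    if rng is None:
        rng = random.Random()
    total_cv = math.sqrt(cva ** 2 + cvi ** 2)   # assumed combination rule
    centre = true_value * (1 + bias_pct / 100)  # assumed multiplicative bias
    sd = centre * total_cv / 100                # CV (%) converted to an SD
    return [rng.gauss(centre, sd) for _ in range(n)]


def agreement(true_value, cva, cvi, normal_below, diabetes_from, **sim_kwargs):
    """Fraction of simulated results classified the same as the true value."""
    true_class = classify(true_value, normal_below, diabetes_from)
    draws = simulate_results(true_value, cva, cvi, **sim_kwargs)
    same = sum(classify(x, normal_below, diabetes_from) == true_class for x in draws)
    return same / len(draws)


# Hypothetical subject: true FPG of 7.2 mmol/L, WHO thresholds, no bias.
print(agreement(7.2, 2.5, 5.7, 6.1, 7.0, rng=random.Random(0)))
```

Repeating this for every subject and tabulating the simulated categories against the category of the true value would give proportions of the kind reported in Tables 2 and 3.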
In clinical practice or research settings, repeat testing is sometimes undertaken to resolve discrepant classifications based on different laboratory tests. Generally, the tests are repeated no more than three times due to operational, financial and ethical constraints. To examine the best strategy to interpret these replicate results, two and three random results were generated for each subject as described above. The average value of the simulated results was then used to determine the disease classification for each of the subjects.
Furthermore, a 'best-of-two' (two of three) interpretation, where any two concordant disease classifications produced by the three simulated results were taken as the final classification for the subject, was also examined. The above exercises were simulated for 10,000 rounds for each subject. The percentages of the simulated results falling into different disease classifications for all the subjects were compared against the disease classification using the 'true value' of each subject and summarized.
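The interpretation strategies described above (a single result, the mean of two or three results, and two concordant classifications out of three) can be compared with a sketch like the one below. It is illustrative only: the function names are hypothetical, the combined CV is passed in as a single number, and the same three draws are reused across strategies for brevity, which may differ from how the study generated its replicates.

```python
import random
from collections import Counter


def classify(value, normal_below, diabetes_from):
    """Three-way classification from two thresholds."""
    if value < normal_below:
        return "normal"
    if value >= diabetes_from:
        return "diabetes"
    return "impaired"


def classify_by_mean(results, normal_below, diabetes_from):
    """Classify the average of the replicate results."""
    return classify(sum(results) / len(results), normal_below, diabetes_from)


def classify_two_of_three(results, normal_below, diabetes_from):
    """Class shared by at least two of three results, or None if all differ."""
    counts = Counter(classify(x, normal_below, diabetes_from) for x in results)
    label, votes = counts.most_common(1)[0]
    return label if votes >= 2 else None


def compare_strategies(true_value, total_cv, normal_below, diabetes_from,
                       rounds=10_000, seed=0):
    """Fraction of rounds in which each strategy reproduces the true class."""
    rng = random.Random(seed)
    sd = true_value * total_cv / 100
    true_class = classify(true_value, normal_below, diabetes_from)
    hits = {"single": 0, "mean of 2": 0, "mean of 3": 0, "two of three": 0}
    for _ in range(rounds):
        draws = [rng.gauss(true_value, sd) for _ in range(3)]
        hits["single"] += classify(draws[0], normal_below, diabetes_from) == true_class
        hits["mean of 2"] += classify_by_mean(draws[:2], normal_below, diabetes_from) == true_class
        hits["mean of 3"] += classify_by_mean(draws, normal_below, diabetes_from) == true_class
        hits["two of three"] += classify_two_of_three(draws, normal_below, diabetes_from) == true_class
    return {name: count / rounds for name, count in hits.items()}


# Hypothetical subject: true FPG of 7.2 mmol/L, combined CV of about 6.2%.
print(compare_strategies(7.2, 6.2, 6.1, 7.0))
```

Returning `None` when all three results disagree makes the non-concordant case explicit, so it is counted as a failure to reproduce the true classification rather than being assigned arbitrarily.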

Results
In total 3326 subjects without prior history of diabetes were included in this study. The demographic characteristics of the participants are summarised in Table 1. The distribution of the FPG, OGTT and HbA1c results of these subjects and the dispersion (as represented by 95% confidence intervals) of the results around the WHO diagnostic thresholds are shown in Supplemental Figs 1-3. The density plots of the laboratory results by race are shown in Supplemental Figs 4-6. The correlation among the three laboratory tests (FPG, OGTT and HbA1c) in the study population is provided in Supplemental Table 1.
Overall, FPG had the most consistent classification of subjects with diabetes, and was followed by HbA1c, whose performance improved and became comparable to FPG when a smaller CVa of 2% was considered (Tables 2 and 3). On the other hand, FPG and OGTT were most consistent in classifying normal subjects. HbA1c was most consistent in classifying subjects with pre-diabetes. The presence of positive bias improved the consistency of the classification of subjects with diabetes but worsened the consistency of classification of normal subjects. By contrast, the presence of negative bias improved the consistency of classification of normal subjects.
The potential impact of such misclassifications on disease prevalence can be assessed by simply dividing the number of misclassified subjects by the original number of subjects within each diagnostic category (Table 4). The prevalence of the impaired glycaemia/pre-diabetes categories was most affected by the misclassified subjects. This was due to a combination of a relatively high number of normal subjects being misclassified as having impaired glycaemia/pre-diabetes and a relatively low number of subjects who were originally classified in that category.
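The division described above is simple, but a short sketch shows why a small category is more sensitive to the same number of misclassified subjects. The function name and the counts are hypothetical and are not taken from the survey data.

```python
def potential_prevalence_increase(misclassified_into_category, original_in_category):
    """Relative increase (%) in a category's size caused by misclassified subjects."""
    if original_in_category <= 0:
        raise ValueError("original_in_category must be positive")
    return 100 * misclassified_into_category / original_in_category


# Hypothetical counts, chosen only to show the effect of a small denominator.
print(potential_prevalence_increase(50, 2000))  # large original category
print(potential_prevalence_increase(50, 200))   # small original category
```

With the same 50 misclassified subjects, the relative increase is ten times larger for the category that originally contained 200 subjects than for the one that contained 2,000.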
When the laboratory tests for diabetes were repeated more than once, the consistency of the disease classification improved over that of a single testing episode (Table 5). The average of the results of three repeat tests generally performed better than the average of the results of two repeat tests or the 'two of three' classification strategy. The notable exception to this was the classification of prediabetes using HbA1c, which was most consistently made by the 'two of three' classification strategy. When the 'two of three' strategy was used, there were rare occasions when the three repeated results were all classified differently (i.e. no concordant results).

Table 4. Impact of the subject misclassification secondary to analytical and biological variation alone on the potential increase in the prevalence of various diagnostic categories. This is calculated by dividing the number of misclassified subjects by the original number of subjects within each diagnostic category. Columns: original classification; potential increase in prevalence caused by misclassified subjects under the WHO criteria and the ADA criteria.

Discussion
This study provided an additional dimension to the discussion on the choice of laboratory test for identifying patients with glycaemic disorders in the community setting. The impact of biological and analytical variation on diabetes classification is related to the distribution of the laboratory results in the population examined as well as the diagnostic thresholds applied. For example, for the classification of prediabetes/impaired glycaemia, the performance of the different tests under the WHO diagnostic criteria was relatively comparable. This was because the ratio of the diagnostic interval, defined as [(upper diagnostic threshold − lower diagnostic threshold)/lower diagnostic threshold], to the combined biological and analytical variation was comparable between the three tests (Table 6).
On the other hand, when the ADA criteria were applied, the diagnostic intervals were widened for FPG and HbA1c while that of the OGTT remained unchanged. Hence, the ratios of the diagnostic interval to the combined variation for FPG and HbA1c were close to each other. This was reflected by the relatively comparable consistency in classification of subjects with prediabetes. By contrast, the OGTT had a markedly lower ratio of diagnostic interval to combined variation, which was accompanied by much lower consistency in classifying subjects with prediabetes.
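The ratio discussed in the two paragraphs above can be reproduced approximately from the thresholds and variation data given in the Methods. The sketch below is illustrative: it assumes that the combined variation is the root-sum-of-squares of CVa and CVi and uses the CVa of 3.5% for HbA1c, so the figures it prints need not match Table 6 exactly. All names are hypothetical.

```python
import math

# (CVa, CVi) in %, as quoted in the Methods.
CV = {"FPG": (2.5, 5.7), "OGTT": (2.5, 16.7), "HbA1c": (3.5, 1.8)}

# (lower, upper) limits of the impaired/pre-diabetes category.
INTERVALS = {
    "WHO": {"FPG": (6.1, 6.9), "OGTT": (7.8, 11.0), "HbA1c": (6.0, 6.4)},
    "ADA": {"FPG": (5.6, 6.9), "OGTT": (7.8, 11.0), "HbA1c": (5.7, 6.4)},
}


def diagnostic_interval_pct(lower, upper):
    """Width of the diagnostic interval as a percentage of its lower limit."""
    return (upper - lower) / lower * 100


def interval_to_variation_ratio(test, criteria):
    """Diagnostic interval divided by the combined CV (assumed root-sum-of-squares)."""
    lower, upper = INTERVALS[criteria][test]
    cva, cvi = CV[test]
    return diagnostic_interval_pct(lower, upper) / math.sqrt(cva ** 2 + cvi ** 2)


for criteria in INTERVALS:
    for test in CV:
        ratio = interval_to_variation_ratio(test, criteria)
        print(f"{criteria} {test}: ratio = {ratio:.2f}")
```

A larger ratio means the category is wide relative to the random scatter of a single result, so a subject whose true value lies inside it is less likely to be pushed across a threshold by variation alone.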
From the results of this study, HbA1c has comparable performance to FPG and is better than OGTT in classifying subjects with diabetes, particularly when laboratory methods with smaller CVa are used. Interestingly, some groups propose the testing strategy where a positive HbA1c test near the diagnostic threshold should be confirmed by OGTT, which is more likely to classify a patient erroneously.
For laboratory tests with small CVi, the CVa should be equally small so as not to increase the overall variability (noise) in the results. As a general rule of thumb, a CVa to CVi ratio of 0.75 is considered the minimum analytical requirement and a ratio of 0.25 is optimal 8 . At the former CVa specification, the analytical imprecision adds 25% to the overall test result variability, while at the latter it adds 3%. Because of the very tight CVi for HbA1c, the routine laboratory methods are unable to meet this stringent analytical requirement. Nevertheless, as shown in this work, choosing an HbA1c laboratory method with a smaller CVa that is currently routinely available can improve the diagnostic performance considerably. Alternatively, CVa can be reduced by repeat testing on the same blood sample 5 .
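The 25% and 3% figures follow from combining CVa and CVi as independent sources of variation: relative to CVi alone, the total variation is inflated by a factor of sqrt(1 + r²), where r is the CVa to CVi ratio. The sketch below works through this arithmetic; the function names are hypothetical, and the reduction of CVa by replicate measurement assumes independent replicates on the same sample.

```python
import math


def added_variability_pct(cva_to_cvi_ratio):
    """Percentage by which CVa inflates the total variation relative to CVi alone."""
    return (math.sqrt(1 + cva_to_cvi_ratio ** 2) - 1) * 100


def cva_of_mean(cva, n_replicates):
    """CVa of the mean of n independent replicate measurements of one sample."""
    return cva / math.sqrt(n_replicates)


for ratio in (0.75, 0.25):
    print(f"CVa/CVi = {ratio}: adds {added_variability_pct(ratio):.1f}%")

# HbA1c values from the Methods: CVa 3.5% or 2%, CVi 1.8%.
for cva in (3.5, 2.0):
    print(f"HbA1c CVa {cva}%: CVa/CVi = {cva / 1.8:.2f}")

# Effect of measuring the same sample in duplicate (assumes independent replicates).
print(f"CVa of duplicate mean at 3.5%: {cva_of_mean(3.5, 2):.2f}%")
```

For HbA1c, both the 3.5% and the 2% CVa give a CVa to CVi ratio above 1, which is why neither meets the 0.75 minimum requirement even though the smaller CVa still improves classification.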
Nonetheless, for single testing episodes, the impact of misclassification due to biological and analytical variations alone is perhaps better measured by the potential increase in disease prevalence. When assessed in this manner, the impaired glycaemia/prediabetes category is most vulnerable to such misclassification, followed by diabetes. This can potentially have significant implications for epidemiological studies. The classification of diabetes using FPG appears to be least affected by the random variations. By contrast, the presence of between-laboratory bias has different effects on different tests under different diagnostic criteria. In general, the ADA diagnostic criteria are more resilient than the WHO diagnostic criteria to the effect of between-laboratory bias. This study also sought to resolve the conundrum of interpreting the results of repeat laboratory tests. The use of the average of the results of the repeat laboratory tests has the effect of ameliorating the combined (analytical and biological) variation. The averaged result improves the consistency of the disease classification.
Table 5. Proportion of patients who are misclassified when laboratory testing for diabetes is repeated three times under different classification strategies.
In theory, HbA1c should be the test that most consistently classifies subjects with diabetes, given its narrow CVi and CVa (i.e. lowest total variability). However, another important factor that may affect the diagnostic performance of a qualitative test is the distribution of data around the diagnostic thresholds. Once this was considered, FPG performed better than HbA1c. Because the performance of these biomarkers is dependent on the laboratory performance and the population examined, including different age and ethnic groups that may show significantly different result distributions 9 , they should be verified with local data to optimise decision making.
Table 6. Ratio of diagnostic interval to the combined biological and analytical variation for World Health Organisation (WHO) and American Diabetes Association (ADA) diagnostic criteria. FPG = fasting plasma glucose, OGTT = oral glucose tolerance test, HbA1c = glycated haemoglobin A1c. Diagnostic interval is calculated as [(upper threshold − lower threshold)/lower threshold] × 100, e.g. for FPG under the WHO criteria, [(6.9 − 6.1)/6.1] × 100.