Introduction

Cardiovascular diseases (CVD) remain the leading cause of mortality worldwide, claiming around 18 million lives annually1,2. The prevalence of atherosclerotic cardiovascular diseases (ASCVD), the most common CVD, is estimated to be approximately 6% among the Saudi Arabian population2,3, and a 2021 study reported that 50% of Saudi Arabians are hyperlipidemic4. The great health and economic costs associated with ASCVD are preventable by maintaining ideal blood lipids’ ranges3,5, 6.

Elevated low-density lipoprotein cholesterol (LDL-C) has been identified as a major risk factor for ASCVD, and attaining LDL-C treatment targets has successfully shown major cardiovascular (CV) risk and mortality reduction7. Therefore, LDL-C is considered the primary treatment target in all clinical practice guidelines for dyslipidemia management, which categorize CVD and adjust their treatment recommendations on the basis of LDL-C levels8,9,10. Moreover, LDL-C is a screening parameter for dyslipidemia among those with family history of genetic dyslipidemias, as well as chronic diseases with high CV risk11.

In light of this, it is not inexplicable to foresee the dire consequences of obtaining an inaccurate LDL-C level; an underestimated LDL-C can lead to suboptimal treatment and losing LDL-C target attainment, while contrarily, an overestimated LDL-C can induce preventable therapeutic adverse effects and burdens health care resources5,6, 11, 12. The gold standard reference measurement method for LDL-C is the direct beta quantification (βQ), which is based on ultracentrifugation for separating apolipoprotein-B (ApoB)-containing particles13, yet, it is a laborious, costly, slow process, and is not always convenient for routine laboratory use14,15.

Several years of endeavors to assure the accuracy of LDL-C estimation have resulted in the derivation of various equations as an alternative to the ßQ method, which was initiated by William Friedewald in 197216. The Friedewald equation has been the most widely utilized in medical laboratories, yet it has some limitations. First and foremost, the equation does not account for interindividual variances in the triglyceride (TG):very-low-density lipoprotein cholesterol (VLDL-C) (TG:VLDL-C) ratio17; it is based on a presumption that this ratio is constantly 5:1, when in fact, it is highly variable through different TG and non-high-density lipoprotein cholesterol (non-HDL-C) levels14,17; the Lipid Research Clinics Prevalence Study reported a mean TG:VLDL-C ratio that ranges from 5.2 to 8.9 18. Therefore, it is invalid at TG levels > 4.5 mmol/L19; particularly observed in diabetes mellitus (DM) and metabolic disorders20. Some studies have also questioned its accuracy with TG levels < 4.5 mmol/L and at optimal TG levels (< 0.80 mmol/L) 20,21,22. Additionally, it overestimates LDL-C levels in type III hyperlipoproteinemia with high VLDL23, and inaccurately estimates LDL-C in alcoholic liver cirrhosis and nephrotic syndrome24. Furthermore, it requires a fasting state, because the cholesterol-poor chylomicrons can overestimate VLDL-C in a non-fasting sample, which ultimately underestimates LDL-C22. Moreover, it can be inaccurate with low levels of LDL-C (< 2.4 mmol/L) 12, however, levels that low were not tested when the Friedewald equation was derived16. The Friedewald equation’s shortcomings have been tolerable for years, but nowadays, LDL-C reduction became easily sustainable with the evolution of lipid lowering regimens; levels < 2 mmol/L are seen with proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibitors combined with statins25, and with the emerging small interfering RNA (siRNA) therapeutic agent (Inclisiran), LDL-C < 0.65 mmol/L was achieved26, which necessitates improving the accuracy of LDL-C estimating methods.

To overcome those setbacks, other equations have been formulated, some of which have also relied on a fixed factor for VLDL-C estimation, while others have not17,27,28,29,30,31,32. Of the most reliable ones are Martin–Hopkins and Sampson-NIH equations which were shown to improve the accuracy of LDL-C estimation as compared to Friedewald’s 33. However, the Friedewald equation is still being adopted in the majority of laboratories33.

Previous studies have shown that utilizing one equation to universally and equally predict LDL-C in all populations is not successful12,34,35,36. Accordingly, the equations derived over the years were population-specific. In our previous study37, we have validated the use of other equations in the Saudi Arabian population, and some were shown to be unsuccessful despite outperforming Friedewald’s in other populations. This could be explained by the genetic variations and possibly the variability in lifestyle factors that could both contribute to the divergence in lipid profiles among different populations38,39. Indeed, the prevalence of consanguineous marriages and familial hypercholesterolemia is among the highest in the world in the Saudi Arabian population40,41. In an attempt to improve LDL-C estimation, we endeavored to derive an equation that is specific to our population.

This study has two aims: first, to derive and validate a novel equation for estimating LDL-C from the standard lipid profile in the Saudi Arabian population, and second, to compare the directly measured LDL-C to those calculated by Friedewald, Martin–Hopkins and Sampson-NIH equations.

Results

A total of 2245 subjects were included in the study. The majority of screened records were excluded; this was mainly due to the abundance of non-Saudi patients, the availability of lipid parameters tested on different dates, and the missing HDL-C levels in a large portion of subjects. Two-thirds comprised the derivation cohort (cohort 1; n = 1497), and one-third comprised the validation cohort (cohort 2; n = 748). The overall study population had a median age of 52 (21) years, and 61.7% were women. Both cohorts were homogenous in terms of age and lipid levels, while cohort 1 had more women (58.2%) than men (41.8%); p ≤ 0.001. The medians of the lipid parameters were: 4.92 (1.53) mmol/L for TC, 3.08 (1.36) mmol/L for measured LDL-C, 1.26 (0.48) mmol/L for HDL-C, and 1.44 (1.26) mmol/L for TG. Thirty seven percent of the samples were hyperlipidemic; of those, 36.1% had elevated LDL-C, 42.5% had hypertriglyceridemia, and 21.4% had mixed hyperlipidemia. The characteristics of the two cohorts are outlined in Table 1.

Table 1 Characteristics of the study population.

TG:VLDL-C ratio showed very high inter-individual variations in our dataset, with a minimum value of 0.22 and a maximum of 190. Figure 1A shows the distribution of TG:VLDL-C ratios in relation to TG concentrations and across non-HDL-C categories. At low TG levels (approximately < 2 mmol/L), TG:VLDL-C ratios showed some elevation with increasing TG, however, this observation was lost as TG increased, and the overall ratios were independent of non-HDL-C levels. This relationship was further explored after log-transformation of TG:VLDL-C ratio and TG (Fig. S1 in the supplementary file); the plot failed to explain a clear pattern for their relationship, which was also independent of non-HDL-C concentrations. Figure 1B confirms the independence of the TG:VLDL-C ratio on non-HDL-C. At high TG levels (marked by the darker dots), the ratio is no longer consistently increasing with TG concentrations.

Figure 1
figure 1

The distribution of TG:VLDL-C ratio: (A) in relation to triglycerides concentrations and color coded by non-HDL-C categories (darker dots imply higher non-HDL-C levels). (B) in relation to non-HDL-C concentrations and color coded by triglycerides categories (darker dots imply higher TG levels). TG and non-HDL-C categories are based on the ACC/AHA guideline classification.

Derivation of the equation

Data from cohort 1 was used in a multiple regression model to derive an equation for LDL-C calculation from the standard lipid profile, age, and gender that is independent of TG:VLDL-C ratio. There was a strong association between the measured LDL-C values and those predicted by the model (multiple correlation coefficient R = 0.97), and 94.5% of the independent variables collectively explained LDL-C concentrations (adjusted R2 = 0.945). The model significantly predicted LDL-C levels (p ≤ 0.001). The contribution of gender to the overall prediction was not statistically significant, and the prediction by the equation did not significantly alter by distinguishing corrective factors for each gender. Hence, due to the slightly more accurate prediction for LDL-C with using the 0.024 as a corrective factor for the female gender, and to simplify our equation’s applicability, we have used this factor all over the dataset.

The derived equation is:

$${\text{LDL}} - {\text{C}}^{{\text{D}}} \left( {{\text{mmol}}/{\text{L}}} \right) \, = \, 0.{224 } + \, \left( {{\text{TC }} \times \, 0.{919}} \right) \, {-} \, \left( {{\text{HDL}} - {\text{C}} \times \, 0.{9}0{4}} \right) \, {-} \, \left( {{\text{TG }} \times \, 0.{236}} \right) \, {-} \, \left( {{\text{age }} \times \, 0.00{1}} \right) \, {-} \, 0.0{24}{\text{.}}$$

Validation of the derived equation

Table 2 compares the means of LDL-CDM to those estimated by the equations. There was no statistically significant difference between LDL-CDM and LDL-CD in both cohorts (p≥ 0.05), while LDL-C estimated by the remaining equations showed a significant difference when compared to the direct method (p ≤ 0.001).

Table 2 LDL-C levels estimated by the equations in each cohort.

Bland–Altman analysis tested the agreement between the direct and calculated LDL-C by the equations. Figures 2 and S2 illustrate Bland–Altman plots; the middle line (light colored) shows the mean bias of each equation in both cohorts. The Friedewald, Martin–Hopkins, and Sampson-NIH equations underestimated LDL-C by approximately 0.24, 0.15, and 0.17 mmol/L, respectively, while equationD showed a lower bias that was close to zero; approximately 0.001 mmol/L (Table 3).

Figure 2
figure 2

Bland–Altman plots of agreement between the directly measured and calculated LDL-C by the equations in both cohorts. The lines demonstrate the mean bias and 95% limits of agreement (± 2SD). LDL-C low-density lipoprotein cholesterol, SD standard deviation.

Table 3 Bland–Altman analysis: mean bias with the limits of agreement (LOA).

The performance of the equations was further tested by calculating the magnitude of error for all LDL-C values which was assessed in the predefined LDL-C stratified groups. The percentage of samples by the magnitude of error between direct and calculated LDL-C stratified by LDL-C categories is outlined in Table 4. Upon focus on the validation cohort, and across all LDL-C categories, equationD estimated LDL-C with the lowest bias as compared to Friedewald, Martin–Hopkins, and Sampson-NIH equations, while Friedewald showed the lowest accuracy with the highest magnitude of error among all of them.

Table 4 The percentage of samples by the magnitude of error between direct and calculated LDL-C by guideline-classified LDL-C categories in both cohorts.

At LDL-C levels of < 1.92 mmol/L, equationD estimated LDL-C with the lowest error (≤ 0.24 mmol/L) in 85% of subjects as compared to 58% with Friedewald, 64% with Martin-Hopkins, and 61% with Sampson-NIH equations. Additionally, an error of ≥ 0.77 mmol/L in this group was seen with only 1.8% of LDL-C estimated by equationD, Martin–Hopkins and Sampson-NIH equations, which were lower than Friedewald’s (3.6%). In the mid-ranges of LDL-C levels (1.93–2.57, 2.58–3.22, and 3.23–3.86 mmol/L), equationD also predicted LDL-C with the lowest errors as compared to the other equations; approximately 70% of LDL-C were predicted with an error between 0 and 0.24 mmol/L, as compared to approximately 50% with Friedewald and 59% with both Martin–Hopkins and Sampson-NIH equations. Larger errors (> 0.24 mmol/L) among these ranges were also observed in a smaller percentage of LDL-C calculated by equationD as compared to the other equations.

At higher LDL-C levels (≥ 3.87 mmol/L), an error of ≥ 0.77 mmol/L was the lowest with equationD (3%), followed by Martin–Hopkins (4%), Sampson-NIH (8%), then with Friedewald (12%) equation which was associated with the highest error. Furthermore, the highest percentage of subjects with the lowest error (≤ 0.24 mmol/L) in these groups was also observed with equationD (53%), followed by Sampson-NIH (40%), Martin–Hopkins (39%), and finally Friedewald (37%) equations.

Moreover, the accuracy of the equations, as determined by their ability to correctly classify LDL-C according to guideline-stratified categories, was calculated in percentage. In both cohorts, and in all categories, except for LDL-C < 1.92 mmol/L, equationD showed the best classification for LDL-C among the equations, followed by Martin–Hopkins, Sampson-NIH, and finally Friedewald equations, which showed the least appropriate classifier for all LDL-C categories (Fig. 3 and S3).

Figure 3
figure 3

The accuracy of the equations in classifying LDL-C according to the guideline categories in: (A) derivation cohort and (B) validation cohort.

The magnitude of error by the equations was sub-analyzed in a group of samples with LDL-C levels ≥ 4.9 mmol/L (Fig. 4 and Table S2). Similarly, the lowest bias in LDL-C estimation was observed with equationD; 61% of subjects were estimated with an error of ≤ 0.24 mmol/L, as compared to 53.8% with Sampson-NIH, 48.7% with Martin–Hopkins, and 46.2% with Friedewald equations. Higher errors of ≥ 0.51 mmol/L were seen with only 10.3% of subjects with equationD, as compared to 18% with both Martin–Hopkins and Sampson-NIH equations and 25.6% with Friedewald equation. Moreover, the misclassification of LDL-C at this level was the lowest with equationD as compared to the other three equations (Fig. S4).

Figure 4
figure 4

The bias of the equations in estimating LDL-C ≥ 4.9 mmol/L (yellow colors denote the lowest errors of < 0.24 mmol/L).

Finally, the performance of the equations was assessed in different TG categories. Across all TG levels up to 5.63 mmol/L, LDL-CDM strongly correlated with the estimated LDL-C by the four equations, with a slightly higher correlation with equationD in most groups (Figs. 5 and S5-S10), which was also associated with the lowest mean bias (Tables S3 and S4). With TG levels ≥ 5.64 mmol/L, the lowest mean bias between the direct and calculated LDL-C was observed with equationD and Martin–Hopkins as compared to other two equations, which was slightly lower with Martin–Hopkins equation (0.01 mmol/L) in cohort 1, and with equationD (0.06 mmol/L) in cohort 2 (Table S4).

Figure 5
figure 5

The correlation between direct and calculated LDL-C across TG categories in the validation cohort.

The percentage of samples by the magnitude of error between direct and calculated LDL-C stratified by TG categories is outlined in Table 5. EquationD predicted LDL-C with the lowest magnitude of error in all TG levels among the four equations. At high TG levels (2.25–5.63 mol/L), equationD estimated 53% of subjects with a low error (≤ 0.24 mmol/L), as compared to 48% with Martin–Hopkins, 33% with Sampson-NIH, and 21% with Friedewald equations. Moreover, in this category, the highest error of ≥ 0.77 mmol/L was observed with Friedewald followed by Sampson-NIH equations in 28% and 11% of subjects, respectively. While only 5% of subjects were predicted with high errors with both equationD and Martin–Hopkins equation. At TG levels ≥ 5.64 mmol/L, Friedewald equation failed to predict any LDL-C levels with errors ≤ 0.24 mmol/L, and 77% of subjects were predicted with an error ≥ 0.77 mmol/L. The lowest error (≤ 0.24 mmol/L) was observed with equationD in 22% of subjects as compared to 11% with Martin–Hopkins and Sampson-NIH equations. While all of the equations were associated with higher errors at this level, equationD and Martin–Hopkins equation showed better predictive abilities than Sampson-NIH equation, which predicted 44% of subjects with an error of ≥ 0.77 mmol/L.

Table 5 The percentage of samples by the magnitude of error between direct and calculated LDL-C by guideline-classified TG categories in both cohorts.

Discussion

Early detection of dyslipidemia is the key for ASCVD prevention that also mitigates its associated health and economic costs. Accurate LDL-C estimation is fundamental to classify CVD and initiate the suitable therapeutic intervention, nevertheless, it remains a very common challenge in the medical laboratory12. Several efforts have been made to overcome the barriers of safely implementing the Friedewald equation in different high CV risk conditions, such as the metabolically compromised states with hypertriglyceridemia (TG > 4.5 mmol/L), in familial hyperlipidemia (LDL ≥ 4.9 mmol/L) with a high risk of premature myocardial infarctions as well as in very low LDL-C levels12.

Despite the various trials to derive an accurate LDL-C estimating method, Friedewald and others have formulated equations that relied on the concept that TG:VLDL-C ratio is constant; for example, a factor of 5, 6.85, and 6 have been suggested16,28, 32. This perception is the source of error in these equations. VLDL-C is highly variable; TG content within VLDL particles ranges from 50 to 70% while 10–25% is comprised of cholesteryl esters42, hence, assuming a fixed ratio is erroneous, and would never attribute to TG:VLDL-C ratio’s variability. Furthermore, the fluctuation of VLDL particle sizes also influences its lipid content and their relation to one another, which also limits relying on this ratio43. Moreover, VLDL-C can be inaccurately quantified in hypertriglyceridemic samples because the TG-rich lipoproteins (TRLs) are prone to sticking to centrifugation tubes, hence their possible false low levels, and LDL-C estimating equations based on a fixed or an adjustable factor that falls within a narrow range for TG:VLDL-C derived from these samples only maximizes the error43.

Based on the assumption that VLDL-C can be almost accurately calculated in fasting states, we calculated VLDL-C and found that using a “constant” factor to substitute “5” (or 2.2 when mmol/L is used) in Friedewald equation to reflect TG:VLDL-C in every individual is not possible, even with ideal TG levels. This factor was shown to be highly inconsistent between individuals; the mean TG:VLDL-C ratio in our study population was 6.03, and it ranged from 0.22 to 190. Figure 1 confirms the high variability of the ratio in relation to the status of TG and non-HDL-C concentrations; the two parameters that mostly reflect the core lipids that can alter this ratio17. The independence of TG:VLDL-C ratio on non-HDL-C was also observed in Martin’s 180-cell table of the medians of TG:VLDL-C ratios across different TG and non-HDL-C concentrations, and was also in-line with Sampson’s finding17,43. Ideally, higher VLDL-TG increases with mild TG elevation due to TG incorporation and storage within VLDL-C particles, which in turn causes its cholesterol content to appear lower, hence elevates the ratio. However, with further TG elevation, the loss of this observation might be explained by other factors affecting the TG:VLDL-C. Accordingly, we sought to derive a novel method for accurate LDL-C estimation that is independent of TG:VLDL-C. Direct LDL-C was quantified in our laboratory using a standard homogenous enzymatic method, which has been previously shown to be highly correlated with the ultracentrifugation (βQ) method45 and has been adopted as a practical gold standard for direct LDL-C estimation that is widely employed in research as the reference for LDL-C estimating equations12,46,47,48,49.

Using the standard lipid profile, age and gender, we were able to derive an equation that surpassed Friedewald, Martin–Hopkins, and Sampson-NIH equations in estimating LDL-C in our population. Overall, equationD showed the highest agreement with LDL-CDM among the four equations in both cohorts. It was associated with the lowest bias that was close to zero (0.001 mmol/L) as compared to Friedewald (0.24 mmol/L), Martin–Hopkins (0.15 mmol/L), and Sampson-NIH (0.17 mmol/L) equations. The largest percentage of subjects with the highest magnitude of error was observed with the Friedewald equation, followed by Martin–Hopkins and Sampson-NIH equations. On the other hand, a magnitude of error of > 0.24 mmol/L was the least observed with equationD as compared to the other three methods. According to the guideline LDL-C classification, the very-low LDL-C category (< 1.92 mmol/L) was estimated with an error of > 0.24 mmol/L in only 14% of subjects with equationD, as compared to 41% with Friedewald, 36% with Martin–Hopkins, and 39% with Sampson-NIH equations. Low and borderline LDL-C-ranges (1.93–3.86 mmol/L) were 22% more accurately predicted by equationD than Friedewald, and 12% and 14% better than Martin–Hopkins and Sampson-NIH equations, respectively. At levels ≥ 3.87 mmol/L, LDL-C estimated by equationD was also more accurate than the other equations by approximately 15%. Despite its association with the lowest errors among the equations, equationD still predicted a few samples with an error of > 0.24 mmol/L. This was observed in higher LDL-C categories as the number of samples dropped; which suggests that a larger sample might be required for the equation to be validated.

Furthermore, our equation’s accuracy has shown a lower misclassification of LDL-C according to the guideline categories. In all LDL-C classes, except for very-low, the misclassification was the lowest with equationD; around 70% of samples were correctly classified by equationD as compared to approximately 60–65% by the other equations. The slightly lower percentage observed with equationD, Martin–Hopkins, and Sampson-NIH equations as compared to Friedewald in the very-low class of LDL-C however, does not imply their lower accuracy; as the Friedewald equation better classified LDL-C in this class of LDL-C only, it was in fact associated with a higher amount of error. Thirty-eight subjects in this category were misclassified by our equation but not by Friedewald. A possible explanation is the smaller sample size in this category that actual percentages were not detected. Furthermore, estimating equations are prone to misclassify LDL-C at certain TG levels even with very low LDL-C32,50,51,52. This was observed in this category, but could not be ascertained because our data did not yield a sufficient sample with very high TG levels to be accounted for by the derived equation. Moreover, a small percentage of correctly classified samples by all the four equations was observed in the high LDL-C category, however, a clear explanation cannot be elucidated.

Our sub-analysis on the clinically established LDL-C cutoff for the diagnosis of familial hyperlipidemia (FH) (≥ 4.9 mmol/L) 53 showed that equationD predicted LDL-C with the lowest bias and it was associated with the least misclassification for this level among the other equations. This illustrates the impact of our equation in promoting primary cardiovascular prevention among those with genetic hyperlipidemia at high CV risk, which in turn aids in the reduction of their associated health and economical burden.

Linear regression showed a high correlation between LDL-CDM and those calculated by the equations in different TG concentrations up to 5.63 mmol/L. However, since these correlations do not guarantee better LDL-C prediction, we have looked at their absolute errors within TG-stratified groups. In TG levels up to 5.63 mmol/L, equationD was associated with the lowest error, followed by Martin–Hopkins, Sampson-NIH, then Friedewald equations. However, due to the overall very small number of subjects with TG > 5.63 mmol/L, a clear relationship between direct and estimated LDL-C could not be observed, and the performance of these equations at this TG level could not be validated.

The inaccuracy of the Friedewald equation and the superiority of others over it have been previously documented in other populations12,29, 54, 55, which is in line with our findings. In our previous study37, we have tested the performance of other equations in addition to Friedewald; Cordova, Hata, Puavilai, Chen, Ahmadi, Hattori, and Vujovic equations, and none have shown an error as low as equationD.

The impact of our equation on clinical outcomes and patients’ care is paramount, because the accuracy of LDL-C estimation is directly linked to the correct adjustment of ASCVD treatment, and therefore CV risk reduction. Large LDL-C under/overestimations can have life-threatening effects from treatment delays or unnecessary therapies, and our equation can overcome this matter among Saudi Arabians, even among FH patients.

Despite the worth of obtaining a population-specific LDL-C estimating equation for our population, its performance on non-Saudi Arabians cannot be currently ascertained. Alternatively, we recommend the current implementation of the Martin–Hopkins equation for non-Saudis in our laboratories due to its previous validation in different populations and its superiority to Friedewald’s. Furthermore, we recommend validating equationD in a larger dataset, in non-Saudis residing in our region, and in other populations to test its performance.

To the best of our knowledge, this is the first study to derive a population-specific equation for LDL-C estimation in Saudi Arabians. Moreover, our sample size is nearly five times larger than Friedewald’s and various other similar studies55. This study has some limitations. The LDL-C was directly measured in our laboratory using the homogenous enzymatic assay rather than the gold standard βQ method. Some homogenous assays could be associated with poor analytical performance in certain diseases. Moreover, they may not exhibit complete LDL-C specificity in the presence of abnormal lipoproteins, and its accuracy could sometimes be influenced by elevated TGs56,57. As shown by some studied, the homogenous assays might also be sometimes discordant with the ßQ method at low LDL-C levels58. Calculated VLDL-C levels can also reflect intermediate-density lipoprotein cholesterol (IDL-C) and lipoprotein (a) (Lp(a)), which constitutes a negligible amount of non-HDL-C, except in the very rare cases of elevated Lp(a) and type III dyslipidemia, which were not captured from our records, however, due to their rarity, these cases are not expected to alter our VLDL-C values and change the observed TG:VLDL-C ratio, and since our equation is independent of VLDL-C, it certainly does not impact its accuracy. Moreover, our study could not include a sufficient sample with TG levels > 5.63 mmol/L, hence, our equation was not tested in severe hypertriglyceridemia. Finally, a larger sample is required to validate its performance across all LDL-C and TG levels before its employment in clinical practice.

In conclusion, our novel equation outperformed other equations in estimating LDL-C across its levels, and in TG as high as 5.63 mmol/L, which can promote CV risk prevention and lessen the healthcare costs for those with familial hyperlipidemia. We recommend validating equationD on a bigger sample size, in other populations, and in severe hypertriglyceridemia before its implementation in clinical laboratories.

Methods

Study design and subjects’ selection

This is a cross-sectional, data-based, derivation and validation study that evaluated the accuracy of the Friedewald, Martin–Hopkins, and Sampson-NIH equations and derived a novel equation for LDL-C estimation in Saudi Arabians following up in King Abdulaziz University Hospital (KAUH), Jeddah, Saudi Arabia. Medical records were screened for LDL-C tested between 2009 and 2022. The inclusion criteria were: Saudi Arabian subjects of both genders, aged ≥ 18 years old, tested for complete fasting lipid profile. The origin of ethnicity was indicated in the demographics section of the medical record of each subject. The exclusion criteria were: subjects with non-fasting samples, missing lipid parameters, and subjects with lipid parameters tested on different occasions. Blood samples were not collected for lipid testing for non-fasting subjects; this was indicated on the lipid panel results for non-fasting samples in our laboratory. To augment the general applicability of our results, we included both normolipemic and hyperlipidemic subjects, both genders, and no TG level restrictions were applied. The total study population was randomly allocated to either a derivation or a validation data set; cohorts 1 and 2, respectively. Both cohorts comprised a wide range of LDL-C and TG levels. This study was performed in accordance with the Declaration of Helsinki principles. The institutional review board of King AbdulAziz University approved our study (Reference No. 314/22). An Informed consent was waived by the Research Ethics Committee (REC) of the Unit of Biomedical Ethics of King AbdulAziz University. This was mainly due to the nature of our study as we have used electronic-based data that posed no risks to the study subjects, in addition to the impracticality of obtaining consents for samples taken during the preceding years.

Data collection and biochemical measurements

Data extraction was performed electronically. The four major lipid parameters were obtained in mmol/L: total cholesterol (TC), TG, LDL-C, and HDL-C. Non-HDL-C levels were calculated by: [TC − HDL-C]. LDL-C levels were also calculated by the Friedewald equation as: LDL-CF (mmol/L) = [TC − HDL − TG/2.2] 16, by Martin–Hopkins equation (LDL-CM) as: [TC − HDL − (TG/adjustable factor)] through an excel-based automated calculator (available online through Johns Hopkins School of Medicine: https://ldlcalculator.com/) 17, and by Sampson-NIH equation (LDL-CS) as: (TC/0.948) − (HDL-C/0.971) − [(TG/8.56) + ((TG × non-HDL-C)/2140)–(TG2/16,100)]–9.44 43,44. For both Martin–Hopkins and Sampson-NIH equations, LDL-C was calculated in mg/dL, which was then converted to mmol/L. Since the samples were obtained from fasting subjects, where chylomicrons and their remnants are nearly non-existent, VLDL-C levels were calculated as: [TC − HDL-C − LDL-C] 59. Fasting blood sugar (FBS) and glycosylated hemoglobin (HbA1C) levels were also collected if possible.

LDL-C (hereafter referred to as LDL-CDM) was directly and quantitatively measured using the SIEMENS Dimension Vista® System. It is a homogenous method with a two-reagent format; detergent 1 solubilizes non-LDL particles only. The yielded cholesterol is then hydrolyzed by cholesterol esterase (CE) and oxidized by cholesterol oxidase (CO) in a non-color forming reaction. Detergent 2 then solubilizes the remaining LDL particles. LDL-C, in its soluble form, undergoes hydrolysis and oxidation by CE and CO, respectively, resulting in cholestenone and hydrogen peroxide (H2O2). Catalyzed by peroxidase, H2O2 gives color in the presence of N,N-bis (4-sulfobutyl)-m-toluidine, disodium salt (DSBmT), and 4-aminoantipyrine (4-AA) which is measured by a bichromatic (540, 700 nm) endpoint technique. The formed color is directly proportional to the amount of LDL-C within the sample. All other parameters were directly quantified in the central hospital laboratory using the same SIEMENS autoanalyzer Dimension Vista® System.

Statistical analysis

Normality of the tested variables was analyzed by Kolmogorov–Smirnov test. The variables deviated from the expected normality, hence were explained as median with interquartile range (IQR). The 2 cohorts were compared by Mann–Whitney and chi-square (χ2) tests. The American College of Cardiology/American Heart Association (ACC/AHA) guideline cutoffs were used to classify samples as those with elevated LDL-C, hypertriglyceridemia, and mixed hyperlipidemia, and for obtaining TG, non-HDL-C, and LDL-C categories used for the analysis60,61,62. We first looked into the distribution of TG:VLDL-C ratio across different TG and non-HDL-C categories as follows: TG levels were grouped into: < 1.69, 1.69–2.24, 2.25–5.63, and ≥ 5.64 mmol/L, and non-HDL-C levels into: ≤ 2.57, 2.58–3.22, 3.23–3.86, 3.87–4.51, 4.52–5.16, ≥ 5.17 mmol/L. Multiple regression analysis was done to derive a novel equation for LDL-C estimation; hereafter referred to as equationD, and equationD-derived LDL-C values are referred to as LDL-CD. Paired t-test compared LDL-CDM to those estimated by the equations.

Bland–Altman analysis was performed to detect the degree of agreement between the direct and calculated LDL-C levels (bias and limits of agreements (± 2SD) were calculated). The accuracy of the equations was determined by their ability to correctly classify LDL-C according to guideline-stratified categories of: ≤ 1.92, 1.93–2.57, 2.58–3.22, 3.23–3.86, 3.87–4.5, ≥ 4.5 mmol/L59. The differences between LDL-C values by the two methods was estimated as: Δ LDL-Ccalculated–LDL-CDM (mmol/L), and the proportion of subjects with magnitude of errors of < 0.13, 0.13–0.24, 0.25–0.50, 0.51–0.76, and ≥ 0.77 mmol/L stratified by the aforementioned LDL-C categories were calculated54. A similar sub-analysis was performed on a group of LDL-C ≥ 4.9 mmol/L. To further validate the performance of the equations across different TG levels, the absolute errors were also calculated in the TG-stratified groups. The correlation between direct and estimated LDL-C in these groups were assessed by a linear regression. Data analysis was done using IBM SPSS Statistics software version 28.0.0 (SPSS Inc., Chicago, IL, USA), and statistical significance was set at a P-value of < 0.05 for all tests.