Introduction

The success of genome-wide association studies has fueled interest in genetic risk prediction of multifactorial diseases, such as type 2 diabetes, cardiovascular disease and non-familial cancers. The known contribution of genetic variants to the prediction of most diseases is still limited,1, 2 as the variants identified to date together explain only a small part of the heritability.3 Further research is needed to find out the extent to which genetic variants can improve the prediction of multifactorial diseases.

To investigate the potential predictive ability of genetic risk models, researchers are using modeling studies to quantify the area under the receiver operating characteristic (ROC) curve (AUC) as a measure of discriminative accuracy.4, 5, 6, 7, 8 These studies have demonstrated that hundreds of genetic variants are required to obtain an AUC of 0.70 when their effect sizes are small (odds ratio (OR) <1.2),5 and that the upper limit of the AUC is determined by the heritability of the disease and the population disease risk.5, 9 For example, when the heritability of the disease is 10% and the population disease risk is 20%, the maximum AUC value that can be obtained by genetic risk models will be around 0.80.5

The modeling methods published to date have similarities and differences in terms of input parameters, underlying assumptions and output. For example, all methods assume multiplicative joint effects of genetic variants, but to express the effect sizes of the variants some methods use relative risks (RRs) as input, whereas others use ORs. These differences may affect the AUC and lead to different inferences about the predictive ability of genetic risk models. The size of this impact is not obvious, however, because the AUC is known to be an insensitive metric that may fail to detect the contribution of significant risk factors.10 As it is unknown whether these differences between the modeling strategies affect conclusions on predictive ability, we reviewed published modeling methods that aim to estimate the potential predictive ability of genetic risk models. We compared their input parameters, underlying assumptions and output, and investigated the agreement of estimated AUCs between the methods in several hypothetical scenarios. We also assessed the accuracy of the estimated AUCs by attempting to reproduce the AUC values reported in several published empirical studies.

Methods

Analytical and simulation methods

We compared the five published methods that aim to investigate the predictive ability of genetic risk models by quantifying the AUC.4, 5, 6, 7, 8 In this paper, the methods are referred to by the name of the first author. Three methods use analytic formulas and two use simulations to obtain the AUC. First, the analytic method by Lu7 calculates the frequencies and likelihood ratios of all genotype combinations separately for cases and controls from the population disease risk and the RRs and frequencies of all genetic variants. The AUC is subsequently obtained from the distributions of likelihood ratios in cases and controls. Second, the analytic method by Moonesinghe4 obtains the AUC using a formula that requires RRs and frequencies for dominant or recessive effects of the variants. This method approximates the distributions of the number of risk genotypes in cases and controls by normal distributions, which are subsequently used to obtain the AUC. Third, the analytic method by Gail8 computes the RRs of all possible genotype combinations for the entire population and for cases, and uses these distributions to obtain the AUC. The simulation methods by Pepe6 and Janssens5 both first construct genotype data for the individuals of a hypothetical population according to the frequencies of the genetic variants. Based on these data and the ORs, they estimate each individual's disease risk, which is then used to assign a disease status to each individual in the hypothetical data set. Using the estimated disease risks and disease status, the methods finally calculate the AUC. The two simulation methods differ in how the genetic effects of the variants are considered. The method by Pepe requires per allele frequencies and ORs to construct individual genotype data and estimates disease risks using a logistic regression equation, whereas the method by Janssens can use per genotype, per allele or dominant/recessive effects of the risk allele to construct genotype data and estimates disease risks using Bayes’ theorem.
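The enumeration strategy shared by the analytic methods of Lu and Gail can be illustrated with a short Python sketch. It is not the authors' code: the function names, the per genotype RR input and the choice to anchor absolute risks on the population risk are our assumptions. Genotype frequencies follow Hardy–Weinberg Equilibrium, combinations are formed assuming independent inheritance and multiplicative effects, and the AUC is the probability that a random case has a higher risk than a random control, with ties counted as one half. Ranking combinations by risk is equivalent to ranking them by likelihood ratio, since the likelihood ratio of a combination increases monotonically with its risk.

```python
import itertools
import math

def hwe_genotype_freqs(p):
    """Frequencies of 0, 1 and 2 risk alleles under Hardy-Weinberg equilibrium."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def enumerated_auc(allele_freqs, genotype_rrs, pop_risk):
    """AUC obtained by enumerating all 3**n multi-locus genotype combinations.

    allele_freqs: risk allele frequency of each variant
    genotype_rrs: per variant, the RRs for 0, 1 and 2 risk alleles (first entry 1.0)
    pop_risk:     population disease risk, used to anchor the absolute risks
    """
    per_variant = [list(zip(hwe_genotype_freqs(p), rrs))
                   for p, rrs in zip(allele_freqs, genotype_rrs)]
    # Independent inheritance: combination frequency = product of genotype frequencies.
    # Multiplicative model: combination RR = product of genotype RRs.
    combos = [(math.prod(r for _, r in c), math.prod(f for f, _ in c))
              for c in itertools.product(*per_variant)]
    # Risk of the reference combination, chosen so that the mean risk equals pop_risk
    baseline = pop_risk / sum(rr * f for rr, f in combos)
    cells = {}
    for rr, f in combos:
        risk = baseline * rr
        if risk > 1:
            raise ValueError("implied absolute risk exceeds 1")
        key = round(rr, 12)  # pool combinations with the same risk so they count as ties
        case, ctrl = cells.get(key, (0.0, 0.0))
        cells[key] = (case + f * risk, ctrl + f * (1 - risk))
    # AUC = P(risk_case > risk_control) + 0.5 * P(tie), scanning risks in increasing order
    auc, ctrl_below = 0.0, 0.0
    for key in sorted(cells):
        case, ctrl = cells[key]
        auc += case * (ctrl_below + 0.5 * ctrl)
        ctrl_below += ctrl
    return auc / (pop_risk * (1 - pop_risk))
```

For example, `enumerated_auc([0.25] * 10, [(1.0, 1.2, 1.44)] * 10, 0.05)` enumerates 3**10 = 59 049 combinations. The list triples with every added variant, which is why this approach runs out of memory for larger numbers of variants. With multiplicative RRs, the implied risk of the highest-risk combinations can also exceed 1 when many strong effects are combined; the sketch then stops with an error rather than truncating risks.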

We documented the modeling strategy, input parameters, assumptions and output. To ensure that all these items were assessed for all methods, a checklist of the documented items was made and the five methods were reviewed again. If an item was not explicitly mentioned, deductive reasoning was used to document it. For example, if a method constructed the combined effect of all genetic variants by multiplying the effects of each single variant, we recorded that the method assumed independent genetic effects. Data extraction was done by two researchers (SK, LCK) independently and discrepancies were discussed with a third researcher (ACJWJ).

Table 1 presents an overview of the modeling strategy, input parameters, assumptions and output of the methods. To obtain AUC values, the methods use different input parameters. All methods require effect estimates and frequencies of the included genetic variants, but the effect sizes have to be entered differently: the method by Lu can handle both ORs and RRs, the methods by Pepe, Janssens and Gail require ORs, and the method by Moonesinghe requires RRs. All but two methods require an estimate of the population disease risk, and the simulation methods additionally need a specification of the population size.

Table 1 Overview of input parameters, assumptions and output of the modeling methods

All methods assume that (i) the combined effect of the genetic variants on disease risk follows a multiplicative (ie, log-additive) risk model; (ii) genetic variants are inherited independently, that is, there is no linkage disequilibrium between the variants; and (iii) genetic variants have independent effects on disease risk, that is, there is no interaction among variants. Methods that need to convert allele frequencies into genotype frequencies additionally assume that genotype frequencies are in Hardy–Weinberg Equilibrium. Two methods assume that the disease is rare. Finally, the methods differ in how the effects of genetic variants are coded. Two methods assume per allele (additive) effects of the risk allele, one assumes that the effects vary between genotypes, and one assumes dominant or recessive effects of the risk alleles. The fifth method makes no assumptions about the genetic effects and allows them to vary between the variants considered.
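Assumptions (i) to (iii), together with Hardy–Weinberg Equilibrium, can be summarized in a few formulas. The notation is ours, not that of the original papers: g_j is the number of risk alleles at variant j, p_j the risk allele frequency, r_0 the risk of carrying no risk alleles, RR_j(g_j) the genotype relative risk with RR_j(0) = 1, and K the population disease risk. Methods that take ORs as input apply the same product form to the odds rather than to the risk; under the rare disease assumption the two approximately coincide.

```latex
P(D \mid g_1,\dots,g_n) = r_0 \prod_{j=1}^{n} RR_j(g_j),
\qquad
RR_j(g_j) =
\begin{cases}
RR_j^{\,g_j} & \text{per allele (additive on the log scale)}\\
RR_j^{\,\mathbf{1}[g_j \ge 1]} & \text{dominant}\\
RR_j^{\,\mathbf{1}[g_j = 2]} & \text{recessive}\\
RR_j(1),\ RR_j(2)\ \text{free} & \text{per genotype}
\end{cases}

P(g_1,\dots,g_n) = \prod_{j=1}^{n} P(g_j),
\qquad
P(g_j = 0) = (1-p_j)^2,\quad P(g_j = 1) = 2p_j(1-p_j),\quad P(g_j = 2) = p_j^2

K = \sum_{g} P(g)\,P(D \mid g) = r_0 \prod_{j=1}^{n} \mathrm{E}\!\left[RR_j(g_j)\right]
```

The last line follows from the first two: because genotypes at different variants are independent, the expectation of the product of RRs factorizes into a product of per-variant expectations.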

We selected methods that obtain the AUC as a measure of predictive ability, but most of them can obtain other measures of the predictive ability of (genetic) risk models as well. Moonesinghe's method provides a formula specifically for calculating the AUC, whereas all other methods can also be used to obtain other plots and metrics, such as risk distributions and predictiveness curves. The simulation methods can additionally be used to compare risk models, for example, by reclassification measures.

Data analysis and data generation

To investigate the agreement in estimated AUCs, we applied the five methods in various hypothetical scenarios. Scenarios were defined as any combination of (i) the number of genetic variants included, chosen to be 10 or 50; (ii) the OR of the risk allele, set to 1.1, 1.4 or 2.0; (iii) the risk allele frequency, set to 0.05 or 0.25; and (iv) the disease risk in the population, set to 1 or 25%, as listed in Table 2. In these hypothetical scenarios, we assumed that all genetic variants had the same risk allele frequencies and ORs.
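The scenario grid can be written out explicitly. A minimal Python sketch, with hypothetical key names, enumerates the 2 × 3 × 2 × 2 = 24 combinations using the values stated above. Each scenario applies the same risk allele frequency and OR to every variant.

```python
from itertools import product

# Values as stated for the hypothetical scenarios; key names are illustrative only
SCENARIO_GRID = {
    "n_variants": [10, 50],
    "odds_ratio": [1.1, 1.4, 2.0],
    "risk_allele_freq": [0.05, 0.25],
    "population_risk": [0.01, 0.25],
}

def scenarios(grid=SCENARIO_GRID):
    """Every combination of grid values as a dict (same OR and frequency for all variants)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]
```

Each resulting dictionary would then be passed, after any necessary recoding, to each of the five methods.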

Table 2 Estimated area under the ROC curve for hypothetical values of the input parameters

To assess the accuracy of the estimated AUCs, we investigated whether the methods could reproduce the AUCs of published empirical studies (Table 3). We selected studies that assessed the AUC of genetic risk models and reported the ORs and frequencies of the genetic variants included in the model. Population disease risks were taken from the empirical studies listed in Table 3 or, if they were not reported in the original paper, from other epidemiological studies. Random factors, such as rounding of values and random deviations from Hardy–Weinberg Equilibrium, may have affected the empirical AUC. We therefore considered a method to have accurately reproduced an empirical study when its predicted AUC was similar, but not necessarily identical, to the empirical AUC.

Table 3 Estimated area under the ROC curve for input parameters from published empirical studies

Because the methods differ in how genetic variants must be entered (as per allele, per genotype or dominant/recessive effects of the risk alleles), transformations were needed when the specified frequencies and risk estimates did not match those required by a method. The specified values of the frequencies, ORs and population risk were used to construct a (3 × 2) genotype by disease status contingency table, from which all required frequencies and risk ratios (OR/RR) were calculated. Hardy–Weinberg Equilibrium was assumed to obtain genotype frequencies when allele frequencies were specified.
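The transformation through a (3 × 2) genotype by disease status table can be sketched in Python. The text does not state how the table was filled, so the sketch makes an explicit assumption. It treats the specified effect as a per allele OR acting on the log-odds scale and solves for the baseline odds that reproduce the population risk. The function names are hypothetical. Dominant and recessive RRs and ORs are then read off by pooling the relevant genotype rows.

```python
import math

def genotype_by_status_table(p, or_per_allele, pop_risk):
    """Joint probabilities of genotype (0/1/2 risk alleles) and disease status.

    Assumes HWE and a logistic per allele model whose intercept is solved so that
    the average risk equals pop_risk.
    """
    f = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    beta = math.log(or_per_allele)

    def risks(alpha):
        return [1 / (1 + math.exp(-(alpha + g * beta))) for g in range(3)]

    lo, hi = -30.0, 30.0  # bisection on the baseline log-odds
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(fg * rg for fg, rg in zip(f, risks(mid))) < pop_risk:
            lo = mid
        else:
            hi = mid
    r = risks((lo + hi) / 2)
    cases = [fg * rg for fg, rg in zip(f, r)]
    controls = [fg * (1 - rg) for fg, rg in zip(f, r)]
    return f, cases, controls

def recoded_effects(p, or_per_allele, pop_risk):
    """Per genotype, dominant and recessive effect measures read from the 3 x 2 table."""
    f, ca, co = genotype_by_status_table(p, or_per_allele, pop_risk)
    risk = [c / fg for c, fg in zip(ca, f)]
    odds = [c / k for c, k in zip(ca, co)]
    return {
        "genotype_rr": (risk[1] / risk[0], risk[2] / risk[0]),
        "dominant_rr": ((ca[1] + ca[2]) / (f[1] + f[2])) / risk[0],
        "dominant_or": ((ca[1] + ca[2]) / (co[1] + co[2])) / odds[0],
        "recessive_rr": risk[2] / ((ca[0] + ca[1]) / (f[0] + f[1])),
        "recessive_or": odds[2] / ((ca[0] + ca[1]) / (co[0] + co[1])),
        "carrier_freq": f[1] + f[2],
        "homozygote_freq": f[2],
    }
```

Because the per allele OR is imposed on the odds, the heterozygote OR read back from the table equals the input OR. The derived RRs are smaller than the corresponding ORs, and this gap widens as the disease becomes more common.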

For the simulation methods, genotypes and disease status were generated for 100 000 individuals, and all simulations were repeated 100 times to obtain robust estimates of the AUC. Presented AUC estimates are averages over the 100 runs. All analyses were performed using software written in the R language (version 2.12.1).11 Further details, including the mathematical explanation of the five methods and the source code or references to it, are provided in the Supplemental Material.
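The simulation procedure can be sketched in Python with NumPy. This illustrative sketch follows the spirit of Pepe's logistic approach and is not either author's code. Genotypes are drawn per allele under Hardy–Weinberg Equilibrium and independence between variants. Risks follow a logistic model with log(OR) weights and an intercept that we calibrate by bisection, so that the mean simulated risk equals the population risk. Disease status is then drawn from each individual's risk, and the AUC is computed with the Mann–Whitney statistic. Janssens' method derives risks with Bayes' theorem instead; that variant is not shown. Function and parameter names are hypothetical.

```python
import numpy as np

def rank_auc(scores, labels):
    """Mann-Whitney AUC; tied scores get average ranks (half credit)."""
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)                       # last 1-based rank of each tie group
    ranks = (ends - (counts - 1) / 2.0)[inverse]   # average rank within each tie group
    n1 = int(labels.sum())
    n0 = labels.size - n1
    return (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def simulate_auc(allele_freqs, odds_ratios, pop_risk, n=100_000, runs=100, seed=1):
    """Average AUC over repeated simulated populations (logistic per allele model)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(allele_freqs, dtype=float)
    beta = np.log(np.asarray(odds_ratios, dtype=float))
    aucs = []
    for _ in range(runs):
        # Risk allele counts 0/1/2: HWE within variants, independence between variants
        genotypes = rng.binomial(2, p, size=(n, p.size))
        lin_pred = genotypes @ beta
        # Calibrate the intercept by bisection so the mean risk equals pop_risk
        lo, hi = -30.0, 30.0
        for _step in range(60):
            mid = (lo + hi) / 2
            if np.mean(1 / (1 + np.exp(-(mid + lin_pred)))) < pop_risk:
                lo = mid
            else:
                hi = mid
        risk = 1 / (1 + np.exp(-((lo + hi) / 2 + lin_pred)))
        status = (rng.random(n) < risk).astype(int)  # disease status drawn from risk
        # Round so that equal risks reached via different genotypes tie exactly
        aucs.append(rank_auc(np.round(risk, 12), status))
    return float(np.mean(aucs))
```

The defaults mirror the 100 000 individuals and 100 repetitions described above. Smaller values of `n` and `runs` run faster for exploration but give noisier averages.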

Results

Table 2 shows the AUC values estimated by the five methods for the hypothetical scenarios. As expected, higher risk allele frequencies, higher ORs and larger numbers of genetic variants yielded higher AUC values for all methods. The differences in AUC between the methods were larger when the AUC values were higher, for example, when higher ORs or more genetic variants were considered. In most scenarios, the AUC values calculated using the two simulation methods were identical to two decimal places. The analytic method of Moonesinghe consistently produced lower AUC values than the simulation methods, particularly when recessive effects of the variants were assumed. The same results were observed when the risk allele frequency was 75%, except that the recessive model then yielded higher AUC values than the dominant model (data not shown). The analytic method of Lu yielded lower AUC estimates than the simulation methods when AUCs were higher (>0.80). The analytic method of Gail obtained AUC values similar to those of the simulation methods when the disease risk was 5%, but overestimated the AUC when the disease risk was 25%. Neither the method by Lu nor that by Gail could compute the AUC when the number of genetic variants was 50.

Table 3 presents the estimated AUCs for the scenarios that used the frequencies, ORs and population risks obtained from published empirical studies. The AUCs estimated using the simulation methods, and those from the analytic methods of Gail and Lu whenever these could be computed, were consistent with those of the empirical studies, whereas the analytic method of Moonesinghe underestimated all empirical AUCs. When the number of genetic variants was 15 or higher, the analytic methods by Gail and Lu were unable to compute the AUC because of computer memory limitations.

Discussion

This paper reviews the five methods that have been proposed to investigate the potential predictive ability of genetic risk models by quantifying the expected AUC. The five modeling methods share the same main assumptions but differ in their modeling strategy. Estimates of the AUC differed between the methods when one or more variants had stronger effects and absolute AUC values were higher. The two simulation methods obtained nearly identical estimates and both accurately reproduced the AUCs of published empirical studies.

Modeling studies are used to estimate the potential predictive ability of genetic risk models on the basis of hypothetical epidemiological data. When the modeling is based on published ORs and frequencies rather than on hypothetical values, some methods may be more flexible than others. If the coding of genetic variants assumed in a method differs from that used in the literature, transformations are needed. Such transformations, for example converting per allele effects into dominant/recessive effects of the risk alleles, may not be valid in reality or in our examples. They may explain the differences between estimated and published AUC values when transformations were applied, as for the method by Moonesinghe (Table 3), and may also have contributed to the differences in AUC values between the methods.

Although the methods share similar assumptions, they differ in the way the AUC is obtained, and some details of the calculation can be considered limitations. For example, the analytic methods by Gail and Lu cannot obtain the AUC for larger numbers of genetic variants because they calculate the frequencies of all possible genotype combinations. As the number of combinations grows exponentially with the number of variants, at some point these methods reach the limits of computer memory. Using a computer with a 2.33-GHz processor and 2 GB of RAM, we observed that the AUC could not be computed when the number of variants exceeded 14. When the number of genetic variants is larger, Gail’s method can still be used by assuming a log-normal distribution of RRs for the genotype combinations in the population. Another example is that most methods assume that all variants have either per allele, per genotype or dominant/recessive effects, rather than allowing the type of effect to differ between variants. When the number of variants is large, most current empirical risk prediction studies use weighted risk allele counts (weighted risk scores), which is similar to assuming per allele effects. Assuming per genotype effects is more flexible, as it can express per allele as well as dominant/recessive effects of the variants. In contrast, assuming only dominant/recessive effects of risk variants may not adequately capture allelic effects, which may explain why the method of Moonesinghe underestimated the AUC values when recessive effects were assumed. Even though the AUC is known to be an insensitive metric,10 these differences in assumptions about the genetic effects had a substantial impact on the estimated AUC values.
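Both the exponential growth and the log-normal shortcut can be made concrete in a short Python sketch. With three genotypes per variant there are 3^n combinations: 4 782 969 for 14 variants and 14 348 907 for 15. The log-normal function uses a standard result. If log(RR) is normally distributed with variance σ² in the population and the disease is rare, log(RR) in cases is shifted by σ² with unchanged variance, so that AUC = Φ(σ/√2). This illustrates the idea under a rare disease assumption and may differ in detail from Gail's implementation. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def n_combinations(n_variants):
    """Number of multi-locus genotype combinations with three genotypes per variant."""
    return 3 ** n_variants

def lognormal_auc(allele_freqs, genotype_rrs):
    """AUC assuming log(RR) is normal in the population and the disease is rare.

    Cases then have log(RR) shifted by sigma^2 with the same variance, so
    AUC = Phi(sigma / sqrt(2)), where sigma^2 is the population variance of log(RR).
    """
    var = 0.0
    for p, rrs in zip(allele_freqs, genotype_rrs):
        f = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # HWE genotype frequencies
        logs = [math.log(r) for r in rrs]
        mean = sum(fi * li for fi, li in zip(f, logs))
        # Independent variants: variances of log(RR) add across variants
        var += sum(fi * (li - mean) ** 2 for fi, li in zip(f, logs))
    return NormalDist().cdf(math.sqrt(var) / math.sqrt(2))
```

Unlike exhaustive enumeration, the log-normal calculation needs only one pass over the variants, so its cost grows linearly rather than exponentially with their number.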

We reviewed five methods that estimate the AUC of genetic risk models. We did not evaluate two other modeling approaches for the predictive ability of genetic risk models, because they do not estimate the AUC from published epidemiological data on genetic variants, that is, from ORs and frequencies. First, in a theoretical paper on the predictive ability of multiple genetic variants, Pharoah et al12 described how genetic profiling yields a distribution of risk that can be useful for selecting high-risk groups in disease prevention. Second, Wray et al13 described three different models for genetic risk prediction that assume different underlying distributions of disease risk in the population. The methods by Pharoah et al and Wray et al use the same assumptions as the five methods discussed here, including a multiplicative risk model for joint effects and independent effects of genetic variants.

All methods are methodologically simple and use assumptions that are generally reasonable. They assume that the combined effect of the genetic variants on disease risk follows a multiplicative risk model with independent effects (ie, no statistical interaction terms are included in the model) and that genetic variants are inherited independently. All five methods might be further improved by including gene–gene and gene–environment interactions, but their performance so far seems adequate given the current understanding of the joint contribution of genetic variants to disease risk. Many empirical studies currently calculate weighted risk scores, which acknowledge differences in effect size between risk alleles. Of the modeling studies, some explicitly obtain weighted risk scores,5, 6, 8 whereas others consider different effect sizes for risk alleles in other ways.4, 7
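A weighted risk score is the sum of risk allele counts weighted by the log of the per allele OR. Its exponent therefore equals the product of the ORs, which is the multiplicative model shared by all five methods. A minimal Python sketch, with hypothetical function names and assuming per allele coding:

```python
import math

def weighted_risk_score(risk_allele_counts, odds_ratios):
    """Sum of risk allele counts (0/1/2) weighted by log(OR) per allele."""
    return sum(c * math.log(o) for c, o in zip(risk_allele_counts, odds_ratios))

def unweighted_risk_score(risk_allele_counts):
    """Simple count of risk alleles, ignoring differences in effect size."""
    return sum(risk_allele_counts)
```

For counts (2, 1, 0) and ORs (1.2, 1.5, 1.1), the exponent of the weighted score is 1.2² × 1.5 = 2.16, the odds relative to carrying no risk alleles, whereas the unweighted count is simply 3.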

In conclusion, the five most commonly used methods for quantifying the AUC of genetic risk prediction models have similar assumptions, but differ with regard to the input parameters required and the AUC values estimated. The simulation methods yielded consistent AUC estimates and both accurately replicated published empirical AUC values. The simulation methods provide valuable insight into the potential predictive ability of genetic risk models.