Introduction

Genome-wide association studies (GWAS) have identified hundreds of gene polymorphisms associated with common diseases; however, every effort to explain the heritability of a disease by the single nucleotide polymorphisms (SNPs) detected in GWAS has failed1,2,3. In 2010, the Wellcome Trust Case Control Consortium reported a genome-wide association study of copy number variations (CNVs) for eight common diseases and concluded that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases4. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants2,5. However, it has not yet been reported that rare variants account for a large part of the heritability of any disease. Although many papers have reported the contribution of a set of variants to heritability using quantitative genetic analysis, no paper has discussed the estimation of the heritability of a single polymorphism. Here I describe a novel method to calculate the heritability of an individual polymorphism, whether a SNP or a CNV.

Results

Definitions and premises

  • The frequency of the risk allele in the general population: p.

  • The frequency of the non-risk allele in the general population: q.

  • The frequency of the risk allele in patients: u.

  • The frequency of the non-risk allele in patients: v.

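  • The prevalence of a disease: P.

Suppose that the frequencies of the risk and non-risk alleles in asymptomatic individuals are x and y, respectively. Because the general population consists of patients (proportion P) and asymptomatic individuals (proportion 1 − P), the following relationships hold:

p = Pu + (1 − P)x   [1]

q = Pv + (1 − P)y   [2]

The odds ratio, OR, is given by:

OR = uy / (vx)   [3]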
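As a minimal sketch, these relationships can be checked numerically. The Python below assumes that Equation [1] has the mixture form p = Pu + (1 − P)x, which reproduces the value p = 0.000039 quoted in the Methods example, and that OR is the allelic odds ratio (u/v)/(x/y). The function names are illustrative and are not taken from the paper.

```python
import math


def population_allele_frequency(u: float, x: float, prevalence: float) -> float:
    """Risk-allele frequency p in the general population.

    Assumption: the population is a mixture of patients (fraction
    `prevalence`) and asymptomatic individuals (fraction 1 - prevalence).
    """
    return prevalence * u + (1.0 - prevalence) * x


def allelic_odds_ratio(u: float, x: float) -> float:
    """Allelic odds ratio (u/v)/(x/y).

    Returns infinity when the risk allele is absent in asymptomatic
    individuals (x == 0) or fixed in patients (v == 0).
    """
    v, y = 1.0 - u, 1.0 - x
    if x == 0.0 or v == 0.0:
        return math.inf
    return (u * y) / (v * x)


# Inputs from the 16p11.2 duplication example in the Methods:
# u = 0.0039, x = 0, P = 0.01.
p = population_allele_frequency(u=0.0039, x=0.0, prevalence=0.01)
odds_ratio = allelic_odds_ratio(u=0.0039, x=0.0)
print(f"p = {p:.6f}, OR = {odds_ratio}")
```

Returning infinity for x = 0 mirrors the OR = +∞ case discussed later for variants detected only in patients.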

Reports of case-control studies usually give u, x and OR, so p can be calculated using Equation [1]. When only p and OR are available, for example from a SNP database, u or v must be calculated. Equations [1, 2, 3] do not yield reasonable solutions for u and v; instead, u and v can be estimated by approximate solutions.

To estimate heritability, the genotype frequencies of the first-degree relatives must first be calculated. Bayes’ method is needed for this, because the frequency of the risk genotype(s) among the relatives has to be calculated as a posterior probability. The following definitions are needed for these purposes.

  • A and a represent the dominant and recessive alleles, respectively.

  • The genotype frequency of AA for the proband: α.

  • The genotype frequency of Aa for the proband: β.

  • The genotype frequency of aa for the proband: γ.

  • The frequency of the risk genotype(s) of the general population: X1.

  • The frequency of the risk genotype(s) of the first-degree relatives: Y1.

The probability of each genotype for a sibling and for an offspring is shown in Table 1. The probability of each genotype for a parent, which is the same as that for an offspring, is omitted here. The procedure for calculating the genotype probabilities is described in the Methods section.

Table 1 Probability of each genotype of a sibling and an offspring.

The calculation of the heritability of a polymorphism, the main subject of this paper, is shown next.

Heritability of a polymorphism under an autosomal dominant (AD) model

When genotypes AA and Aa carry the same risk, Y1 for a sibling, YS1, is calculated using the expressions in Table 1 as follows:

YS1 = 1 − (q + v)²/4   [4]

Y1 for an offspring, YO1, is calculated as follows:

YO1 = 1 − qv   [5]

The inequality between the arithmetic mean and the geometric mean, (q + v)/2 ≥ √(qv), shows that YO1 > YS1 unless v equals q.
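The comparison of YS1 and YO1 can be illustrated with a short sketch. The closed forms used below are reconstructed under the assumptions that the proband's genotype frequencies follow u², 2uv and v², that the untransmitted parental alleles and the allele from the proband's partner are drawn from the general population, and that X1 = 1 − q² under Hardy-Weinberg proportions. The function name is illustrative.

```python
def ad_risk_genotype_frequencies(p: float, u: float) -> dict:
    """Risk-genotype (AA or Aa) frequencies under the dominant model.

    p: risk-allele frequency in the general population
    u: risk-allele frequency in patients
    Assumed forms: X1 = 1 - q^2, YS1 = 1 - ((q + v)/2)^2, YO1 = 1 - q*v.
    """
    q, v = 1.0 - p, 1.0 - u
    return {
        "X1": 1.0 - q * q,                   # general population
        "YS1": 1.0 - ((q + v) / 2.0) ** 2,   # sibling of a patient
        "YO1": 1.0 - q * v,                  # offspring of a patient
    }


# Inputs from the 16p11.2 duplication example (p = 0.000039, u = 0.0039).
freqs = ad_risk_genotype_frequencies(p=0.000039, u=0.0039)
for name, value in freqs.items():
    print(f"{name} = {value:.7f}")
```

When u equals p (so v equals q), all three frequencies coincide, which is the equality case of the arithmetic-geometric mean inequality.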

Let us consider the incidence rate of the disease among first-degree relatives, Q. When a polymorphism is involved in the disease in a subset of the patients, its share of the prevalence, P, is given by the population attributable risk, P(1 − v/q) (Fig. 1A). Suppose that the risk allele of the polymorphism is the only genetic cause of the disease. For first-degree relatives of patients who do not carry the risk allele, the incidence rate is no different from that in the general population. Therefore, Q exceeds P by P(1 − v/q)(Y1/X1 − 1) owing to the effect of this polymorphism (Fig. 1B). The incidence rate of the disease for a sibling, Qs, is then given by Equation [6]:

Qs = P + P(1 − v/q)(YS1/X1 − 1)   [6]

Figure 1 Schematic images of the prevalence of a disease in the general population, P, and the incidence rate of the disease among first-degree relatives, Q, for a polymorphism.

(A) P is represented by a circle. The area covered by the population attributable risk of a polymorphism, P(1 − v/q), is shaded gray. (B) Q is represented by the area covered by either the circle or the gray oval. Q exceeds P by P(1 − v/q)(Y1/X1 − 1). q: frequency of the non-risk allele in the general population. v: frequency of the non-risk allele in the patient group. X1: frequency of the risk genotype in the general population. Y1: frequency of the risk genotype among the first-degree relatives.

The incidence rate for an offspring, Qo, is given by Equation [7]:

Qo = P + P(1 − v/q)(YO1/X1 − 1)   [7]

Once Qs or Qo is estimated, the heritability of a polymorphism, hp2, is calculated using Falconer’s liability threshold model6.
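The liability-threshold step can be sketched as follows. This assumes the classic form of Falconer's estimate for relatives, h² = (t_P − t_Q)/(a·r), where t_P and t_Q are the standard-normal thresholds corresponding to the prevalence and to the recurrence rate, a is the mean liability of affected individuals, and r = 1/2 for first-degree relatives. Whether the paper applies exactly this variant is an assumption, and the function names are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def liability_threshold(rate: float) -> float:
    """Threshold (in SD units) above which a fraction `rate` of liabilities lies."""
    return _STD_NORMAL.inv_cdf(1.0 - rate)


def mean_liability_of_affected(prevalence: float) -> float:
    """Mean liability of affected individuals: density at the threshold / prevalence."""
    return _STD_NORMAL.pdf(liability_threshold(prevalence)) / prevalence


def falconer_heritability(prevalence: float, recurrence: float,
                          relatedness: float = 0.5) -> float:
    """Assumed Falconer form: (t_P - t_Q) / (a * r)."""
    t_pop = liability_threshold(prevalence)
    t_rel = liability_threshold(recurrence)
    a = mean_liability_of_affected(prevalence)
    return (t_pop - t_rel) / (a * relatedness)


# Illustrative call: prevalence 1%, hypothetical recurrence rate 2% in relatives.
print(falconer_heritability(prevalence=0.01, recurrence=0.02))
```

When the recurrence rate equals the prevalence, the two thresholds coincide and the estimate is zero, as expected for a polymorphism with no effect.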

Heritability of a polymorphism under an autosomal recessive (AR) model

It is known that some polymorphisms show a recessive effect. If the risk allele of a polymorphism shows a recessive effect, frequencies of the risk genotypes of a sibling and an offspring, YS1 and YO1, are represented as follows, respectively:

In the recessive model, the homozygote is the risk genotype. Therefore, the proportion of patients with the risk genotype among those who carry the risk allele is given by u²/(u² + 2uv). The incidence rates of the disease among siblings and among offspring, considering only the effect of the polymorphism, are given by the following equations, respectively:
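The quantities entering the recessive model can be sketched in the same way as for the dominant model. The forms below are derived here by the same transmission argument (proband genotype frequencies u², 2uv, v²; other alleles drawn from the general population) and are not reproduced from the paper's own equations; the recessive incidence equations themselves are not reconstructed. Here p and u denote the risk-allele frequencies in the general population and in patients, and the function name is illustrative.

```python
def ar_risk_genotype_quantities(p: float, u: float) -> dict:
    """Risk-genotype (risk-allele homozygote) quantities under the recessive model.

    Assumed forms: X1 = p^2, YS1 = ((p + u)/2)^2, YO1 = p*u, and the share of
    homozygotes among patients carrying the risk allele, u^2 / (u^2 + 2uv).
    Requires u > 0 for the homozygote share to be defined.
    """
    v = 1.0 - u
    return {
        "X1": p * p,                          # general population
        "YS1": ((p + u) / 2.0) ** 2,          # sibling of a patient
        "YO1": p * u,                         # offspring of a patient
        "homozygote_share": u * u / (u * u + 2.0 * u * v),
    }


# Hypothetical inputs chosen only for illustration.
result = ar_risk_genotype_quantities(p=0.2, u=0.5)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Under these assumed forms the ordering is reversed relative to the dominant case: the sibling frequency is at least as large as the offspring frequency, again by the arithmetic-geometric mean inequality.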

Heritability of a polymorphism under other inheritance models

hp2 can be estimated for a polymorphism under any other inheritance model, as long as the frequency of the risk genotype(s) among the first-degree relatives can be calculated. If a polymorphism is located on an autosome and the OR of the heterozygote is smaller than that of the homozygote, the hp2 of this polymorphism is smaller than the hp2 under the AD model and larger than the hp2 under the AR model.

Calculation of the heritability of two or more polymorphisms

Falconer’s method is based on the calculation of the “liability thresholds” for the prevalence of a disease in the general population and for the recurrence rate in the first-degree relatives. These thresholds are measured in standard deviations, and heritability is estimated from the difference between the two6. The heritability of two or more polymorphisms can also be calculated. For this purpose, the second term of Equation [6] or [7] should be calculated for each polymorphism, and the sum of these terms added to P.
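A minimal sketch of this summation for offspring under the AD model follows. It assumes that the second term of Equation [7] is P(1 − v/q)(YO1/X1 − 1), as stated in the Figure 1 legend, with YO1 = 1 − qv and X1 = 1 − q² taken as assumptions. The second variant in the list is hypothetical and serves only to show the addition.

```python
def ad_offspring_excess(prevalence: float, p: float, u: float) -> float:
    """Excess incidence in offspring contributed by one polymorphism.

    Assumed form: P * (1 - v/q) * (YO1/X1 - 1), with YO1 = 1 - q*v
    and X1 = 1 - q^2 (dominant model).
    """
    q, v = 1.0 - p, 1.0 - u
    x1 = 1.0 - q * q
    yo1 = 1.0 - q * v
    return prevalence * (1.0 - v / q) * (yo1 / x1 - 1.0)


def combined_offspring_incidence(prevalence: float, variants: list) -> float:
    """Add the excess term of every (p, u) pair to the prevalence."""
    return prevalence + sum(ad_offspring_excess(prevalence, p, u) for p, u in variants)


P = 0.01
variants = [
    (0.000039, 0.0039),  # 16p11.2 dup values from the Methods example
    (0.0005, 0.002),     # hypothetical second variant, for illustration only
]
q_combined = combined_offspring_incidence(P, variants)
print(f"Combined Qo = {q_combined:.6f}")
```

The combined incidence can then be passed to the liability threshold calculation in place of the single-variant Qo.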

Estimation for various CNVs and SNPs reported in the literature

Most germline CNVs are heritable7. However, the mode of inheritance of a CNV is not always known. Furthermore, a de novo CNV is sometimes identified in association studies3. The heritability of a disease has often been estimated by twin studies. Monozygotic (MZ) twins share all germline polymorphisms, including de novo variants, whereas dizygotic (DZ) twins usually do not share a de novo polymorphism. Because heritability is calculated from the difference between the concordance rates of MZ twins and DZ twins, a de novo polymorphism also contributes to the estimate of heritability in a twin study. When the contribution of a CNV to the heritability of a disease is estimated by Falconer’s model, the recurrence risk of carrying the CNV for a sibling cannot be used in theory, because the CNV may be de novo in the proband. On the other hand, the recurrence risk for an offspring can be used, because all germline polymorphisms, including de novo ones, are in principle transmitted to the offspring.

Table 2 lists various CNVs and SNPs reported in the literature. The hp2 of these polymorphisms was calculated for offspring under the AD model. As shown in Table 2, CNVs generally have a larger hp2 (>0.01). A noteworthy result was that about 25% of the heritability of type 2 diabetes mellitus (T2DM) could be accounted for by one CNV, a value greater than the heritability estimated in 2012 to be explained by all variants identified in GWAS8. Another noteworthy result was that about 15% of the heritability of schizophrenia could be accounted for by four CNVs, although this value was smaller than the heritability (23%) estimated in 2012 to be explained by all variants identified in GWAS9. With regard to schizophrenia, the hp2 of a CNV detected only in patients (OR = +∞) turned out to be large. These analyses suggest that a large part of the missing heritability of common diseases could be accounted for by certain CNVs. 15q13.3 microdeletions have been reported to be associated not only with schizophrenia but also with idiopathic generalized epilepsy (IGE)2,10. Although accurate prevalence data for IGE, which comprises several types of epilepsy, could not be obtained, the hp2 for IGE was estimated to be 0.13–0.15 (not shown in Table 2). CNVs have been suspected to be involved in the pathophysiology of neuropsychiatric conditions11. The results of this trial estimation of hp2 suggest that CNVs might be the major genetic cause of neuropsychiatric disorders.

Table 2 Results of a trial to calculate hp2 of CNVs and SNPs using published data.

Comparison of the required number of polymorphisms to explain a heritability

Previous studies have estimated the heritability of sets of polymorphisms. In 2009, Pawitan et al.12 showed how many variants were needed to explain a heritability of 0.4. To confirm that the results calculated by the method described in the present study are consistent with those generated using other approaches, the numbers of genetic variants required under the AD model to explain a heritability of 0.4 were estimated for a disease prevalence of 0.01. In this estimation the additive effect of each hp2 was considered; in other words, an attempt was made to account for the “narrow sense” heritability. The results of the present method are compared with those of Pawitan et al. in Table 3. The required number of genetic variants calculated using the median of the range of variants in a category did not differ from their approximation for the same category, except for the common variants of category 1.

Table 3 Various categories of variants and the number of variants to explain heritability of 0.4.

Discussion

Estimations of the heritability of polymorphisms have mainly been conducted for SNPs found in GWAS1,2,3,12,13. It is thought that the heritability of common diseases is due to multiple genes of small effect size and that even qualitative disorders can be interpreted simply as the extremes of quantitative dimensions, that is, by quantitative genetic analysis14. Recent studies have demonstrated interaction effects and collective effects of SNPs in quantitative genetic traits15,16,17. Here, however, I discuss conventional quantitative analysis under the premise that polymorphisms have simple additive effects. In quantitative genetic analysis, authors have assumed a latent susceptibility (or liability) that varies between individuals12. The liability can be due to genetic and environmental factors, and heritability is defined as the proportion of the variance in liability due to genetic factors. To calculate the liability contributed by a SNP, the OR of the allele frequency or the OR of the risk genotype is the fundamental factor for estimating the penetrance in the analysis12,13. Therefore, when a SNP is detected only in patients (OR = +∞), the calculation is theoretically impossible in quantitative genetic analysis. In short, the analysis handles the quantitative effect of genes with a small effect size and does not assume the participation of a gene with such a large effect size (OR = +∞). In their 2010 estimation of the heritability of common CNVs, the Wellcome Trust Case Control Consortium likewise did not consider CNVs that were detected only in patients4. However, CNVs are sometimes detected only in patients, as shown in Table 2.

In this report, a novel method to calculate the heritability of a single polymorphism was presented. A trial estimation of the number of genetic variants required under the AD model to explain a given heritability showed that the results calculated by the method described here are consistent with those generated by quantitative genetic analysis, apart from the common variants of category 1 (Table 3). I did not introduce the penetrance into the calculation procedure but instead introduced the population attributable risk, which does not become infinite when OR is +∞. The method suggests that the heritability of some CNVs is quite large when calculated under the AD model. The mode of inheritance of CNVs is often unknown, and usually only an OR of allele frequency is available for a CNV. Although the heritability of CNVs was calculated only under the AD model, the results suggest that a large part of the missing heritability could be accounted for by re-evaluating CNVs that have already been found and by searching for novel CNVs with large hp2. The results of this study also suggest that CNVs might be the major genetic cause of neuropsychiatric disorders. In conclusion, CNVs were found to play important roles in the familial aggregation of common diseases.

Methods

Calculation of genotype probabilities for a sibling

To calculate the genotype probabilities for a sibling, Bayes’ method must be applied. An example of the calculation of genotype probabilities by Bayes’ method for the father of the proband is shown in Table 4. As a result, the posterior probability of each paternal genotype equals the general-population frequency of the father’s other allele (A or a), that is, the allele accompanying the transmitted one (A).

Table 4 An example of the calculation of genotype probabilities by Bayes’ method when the genotype of the proband is AA.
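The Bayesian update summarized in Table 4 can be sketched as below. The sketch assumes Hardy-Weinberg genotype frequencies in the general population as the prior for the father and Mendelian transmission as the likelihood of passing allele A to the proband; the function name and the dictionary layout are illustrative.

```python
def father_posterior_given_transmitted_A(p: float) -> dict:
    """Posterior genotype probabilities of the father, given that he transmitted A.

    Prior: Hardy-Weinberg frequencies in the general population (assumption).
    Likelihood: probability that each genotype transmits allele A.
    """
    q = 1.0 - p
    prior = {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}
    likelihood = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}
    joint = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(joint.values())  # marginal probability of transmitting A (equals p)
    return {g: joint[g] / total for g in joint}


# Hypothetical allele frequency chosen only for illustration.
posterior = father_posterior_given_transmitted_A(p=0.3)
print(posterior)
```

The posterior for AA equals p and the posterior for Aa equals q, which is the statement that the father's other allele follows the general-population allele frequencies.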

The genotype probabilities for a sibling are then calculated. The calculation procedure is shown in Table 5. In Table 5, P1 and P2 are the posterior probabilities of the genotypes of the father and the mother, respectively, and P3 is the conditional probability of the genotype of the sibling. A joint probability is the product of F, P1, P2 and P3. The sum of the joint probabilities for each genotype is shown in Table 1.

Table 5 The calculation procedure of the genotype probabilities for a sibling.
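The enumeration behind Table 5 can be sketched by summing joint probabilities over all cases. The sketch assumes that F is the proband genotype frequency (taken here as u², 2uv, v² split by parental origin), that each parent's untransmitted allele follows the general-population frequencies (the Bayes result above), and that the sibling inherits either allele of each parent with probability 1/2. The names are illustrative, and this is not a transcription of Table 5.

```python
from itertools import product


def sibling_genotype_probabilities(p: float, u: float) -> dict:
    """Genotype probabilities for a sibling of a patient, by full enumeration."""
    q, v = 1.0 - p, 1.0 - u
    population = {"A": p, "a": q}  # untransmitted parental alleles
    patient = {"A": u, "a": v}     # alleles carried by the proband
    probs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}

    # (pf, pm): alleles the proband received from father and mother.
    for pf, pm in product("Aa", repeat=2):
        f_proband = patient[pf] * patient[pm]            # F
        # (of, om): the other allele of the father and of the mother.
        for of, om in product("Aa", repeat=2):
            p1, p2 = population[of], population[om]      # P1, P2
            # The sibling receives one of the two alleles of each parent.
            for sf, sm in product((pf, of), (pm, om)):
                p3 = 0.25                                # P3 for this transmission
                n_a_upper = (sf == "A") + (sm == "A")
                genotype = ("aa", "Aa", "AA")[n_a_upper]
                probs[genotype] += f_proband * p1 * p2 * p3
    return probs


sib = sibling_genotype_probabilities(p=0.000039, u=0.0039)
print(sib)
```

Under these assumptions the aa probability reduces to ((q + v)/2)², so the dominant-model risk-genotype frequency for a sibling is one minus that value.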

Calculation of genotype probabilities for an offspring

Bayes’ method is not needed to calculate the genotype probabilities for an offspring. The calculation procedure is shown in Table 6. The sum of the joint probabilities for each genotype is shown in Table 1.

Table 6 The calculation procedure of the genotype probabilities for an offspring.
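For an offspring the calculation is direct, as the following sketch shows. It assumes that the offspring receives one allele from the proband (A with probability u, a with probability v) and one from a partner drawn from the general population (A with probability p, a with probability q); the function name is illustrative.

```python
def offspring_genotype_probabilities(p: float, u: float) -> dict:
    """Genotype probabilities for an offspring of a patient.

    One allele comes from the patient (frequencies u, v) and the other
    from an unrelated partner (frequencies p, q). Both are assumptions
    of this sketch.
    """
    q, v = 1.0 - p, 1.0 - u
    return {
        "AA": u * p,
        "Aa": u * q + v * p,
        "aa": v * q,
    }


off = offspring_genotype_probabilities(p=0.000039, u=0.0039)
print(off)
```

Because the probability of aa is vq, the dominant-model risk-genotype frequency for an offspring is 1 − vq under these assumptions.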

An example of calculation of heritability of a polymorphism

As an example of a common disease, let us take schizophrenia. The prevalence, P, of schizophrenia is reported as 0.01. A CNV (16p11.2 dup) is chosen as an example of a polymorphism18. The frequency of the risk allele in patients, u, is 0.0039 and the frequency of the risk allele in asymptomatic individuals, x, is 0. Therefore p is calculated as 0.000039 using Equation [1]. A prevalence of 1% places the liability threshold at +2.32635 SD in the general population, and the mean distance of the patients from the population median is calculated as +2.6652 SD. The incidence rate of the disease in a first-degree relative under the autosomal dominant model, considering only the effect of the CNV, is given by Equation [7]:

Qo = P + P(1 − v/q)(YO1/X1 − 1)

The incidence rate of schizophrenia is calculated as follows:

Qo ≈ 0.0119

This value can be used as the recurrence risk of the disease in first-degree relatives and corresponds to a threshold of +2.25998 SD. The heritability (hp2) of the CNV (16p11.2 dup) is then calculated by Falconer’s liability threshold model6, with the following result:
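The whole worked example can be traced end to end with the sketch below. It rests on several assumptions: Equation [1] as p = Pu + (1 − P)x, the dominant-model forms X1 = 1 − q² and YO1 = 1 − qv, the offspring incidence Qo = P + P(1 − v/q)(YO1/X1 − 1), and the classic Falconer estimate (t_P − t_Q)/(a·r) with r = 1/2. The variable names are illustrative, and the final value is what these assumed forms give rather than a figure quoted from the paper.

```python
from statistics import NormalDist

# Inputs stated in the text for schizophrenia and 16p11.2 dup.
P = 0.01      # prevalence
u = 0.0039    # risk-allele frequency in patients
x = 0.0       # risk-allele frequency in asymptomatic individuals

# Allele frequencies in the general population (assumed form of Equation [1]).
p = P * u + (1.0 - P) * x
q, v = 1.0 - p, 1.0 - u

# Risk-genotype frequencies under the dominant model (assumed forms).
x1 = 1.0 - q * q     # general population
yo1 = 1.0 - q * v    # offspring of a patient

# Incidence rate in offspring (assumed form of Equation [7]).
q_o = P + P * (1.0 - v / q) * (yo1 / x1 - 1.0)

# Liability thresholds and mean liability of patients, in SD units.
std = NormalDist()
t_pop = std.inv_cdf(1.0 - P)
t_rel = std.inv_cdf(1.0 - q_o)
a = std.pdf(t_pop) / P

# Falconer-type estimate for first-degree relatives (r = 1/2, assumed variant).
hp2 = (t_pop - t_rel) / (0.5 * a)

print(f"p = {p:.6f}, Qo = {q_o:.6f}")
print(f"t_pop = {t_pop:.5f}, t_rel = {t_rel:.5f}, a = {a:.4f}")
print(f"hp2 = {hp2:.4f}")
```

With these inputs the two thresholds and the mean liability should land close to the +2.32635 SD, +2.25998 SD and +2.6652 SD quoted in the text, which is some evidence that the assumed forms match those used for the example.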

Additional Information

How to cite this article: Nagao, Y. Copy number variations play important roles in heredity of common diseases: a novel method to calculate heritability of a polymorphism. Sci. Rep. 5, 17156; doi: 10.1038/srep17156 (2015).