Copy number variations play important roles in heredity of common diseases: a novel method to calculate heritability of a polymorphism

“Missing heritability” in genome-wide association studies, the failure of the detected variants to account for a considerable fraction of heritability, is a current puzzle in human genetics. To solve this puzzle, the involvement of genetic variants such as rare single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) has been proposed. Many papers have estimated the heritability of sets of polymorphisms; however, no paper has discussed the estimation of the heritability of a single polymorphism. Here I show a simple but rational method to calculate the heritability of an individual polymorphism, $h_p^2$. Using this method, I carried out a trial calculation of $h_p^2$ of CNVs and SNPs using published data. It turned out that $h_p^2$ of some CNVs is quite large. Noteworthy examples were that about 25% of the heritability of type 2 diabetes mellitus and about 15% of the heritability of schizophrenia could be accounted for by one CNV and by four CNVs, respectively. The results suggest that a large part of the missing heritability could be accounted for by re-evaluating the CNVs that have already been found and by searching for novel CNVs with large $h_p^2$.

Genome-wide association studies (GWAS) have identified hundreds of gene polymorphisms associated with common diseases; however, every effort to explain the heritability of a disease by the single nucleotide polymorphisms (SNPs) detected in GWAS has failed [1][2][3]. In 2010 the Wellcome Trust Case Control Consortium et al. reported a genome-wide association study of copy number variations (CNVs) for eight common diseases and concluded that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases 4. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants 2,5. However, it has not yet been reported that a large part of the heritability of a disease is accounted for by rare variants. Although many papers have reported the contribution of a set of variants to heritability by quantitative genetic analysis, no paper has discussed the estimation of the heritability of a single polymorphism. Here I describe a novel method to calculate the heritability of an individual polymorphism, whether a SNP or a CNV.

Definitions and premises.
• The frequency of a risk allele in the general population: $p$.
• The frequency of the non-risk allele in the general population: $q$.
• The frequency of a risk allele in patients: $u$.
• The frequency of the non-risk allele in patients: $v$.
• The prevalence of a disease: $P$.
Suppose the frequencies of the risk and non-risk alleles in asymptomatic individuals are represented by $x$ and $y$, respectively; then the following relationships hold:
$$p = Pu + (1 - P)x \qquad [1]$$
$$q = Pv + (1 - P)y \qquad [2]$$
The odds ratio, OR, is represented by the following:
$$\mathrm{OR} = \frac{uy}{vx} \qquad [3]$$
In reports of case-control studies, $u$, $x$, and OR are usually shown, and $p$ can be calculated by using Equation [1]. When only $p$ and OR are available, as in a SNP database, $u$ or $v$ must be calculated. Equations [1], [2], and [3] do not yield reasonable exact solutions for $u$ and $v$; instead, $u$ and $v$ can be estimated by approximate solutions.
First of all, the genotype frequencies of the first-degree relatives must be calculated to estimate heritability. Bayes' method is needed for this purpose, because the frequency of the risk genotype(s) in the relatives has to be calculated as a posterior probability. The following definitions are also needed.
• A and a represent the dominant and recessive alleles, respectively.
• The genotype frequency of AA for the proband: $\alpha$.
• The genotype frequency of Aa for the proband: $\beta$.
• The genotype frequency of aa for the proband: $\gamma$.
• The frequency of the risk genotype(s) in the general population: $X_1$.
• The frequency of the risk genotype(s) in the first-degree relatives: $Y_1$.
The probability of each genotype for a sibling and for an offspring is shown in Table 1. The probability of each genotype for a parent, which is the same as that for an offspring, is omitted here. The procedure for calculating the genotype probabilities is shown in the Methods section.
The calculation of the heritability of a polymorphism, the main subject of this report, is described next.
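The allele-frequency bookkeeping defined above can be sketched in a few lines of Python. This is a minimal sketch, not the original implementation: it assumes that Equation [1] has the mixture form p = P·u + (1 − P)·x, which reproduces the value p = 0.000039 quoted for the 16p11.2 duplication in the Methods, and the function names are hypothetical.

```python
def population_risk_allele_freq(P, u, x):
    """Risk-allele frequency p in the general population.

    Assumed form of Equation [1]: a mixture of patients (weight P)
    and asymptomatic individuals (weight 1 - P).
    """
    return P * u + (1.0 - P) * x


def odds_ratio(u, x):
    """Allelic odds ratio OR = u*y / (v*x), with v = 1 - u and y = 1 - x."""
    v, y = 1.0 - u, 1.0 - x
    if v * x == 0.0:
        # The denominator vanishes, e.g. a variant detected only in patients.
        return float("inf")
    return (u * y) / (v * x)


# Values quoted in the Methods for 16p11.2 dup in schizophrenia.
p_example = population_risk_allele_freq(P=0.01, u=0.0039, x=0.0)
or_example = odds_ratio(u=0.0039, x=0.0)
```

Returning infinity when the variant is absent from asymptomatic individuals mirrors the OR = +∞ case that the text discusses for CNVs detected only in patients.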

Heritability of a polymorphism under an autosomal dominant (AD) model. When genotypes AA and Aa have the same risk effect, the frequency of the risk genotypes in the general population is $X_1 = 1 - q^2$, and $Y_1$ of a sibling, $Y_{S1}$, is calculated using the expressions in Table 1 as follows:
$$Y_{S1} = 1 - \left(\frac{q + v}{2}\right)^2 \qquad [4]$$
$Y_1$ of an offspring, $Y_{O1}$, is calculated as follows:
$$Y_{O1} = 1 - qv \qquad [5]$$
The relation between the arithmetic mean and the geometric mean indicates that $Y_{O1} > Y_{S1}$ unless $v$ equals $q$. Let us consider the incidence rate of the disease among the first-degree relatives, $Q$. When a polymorphism is involved in only a part of the patient group, its share in the prevalence, $P$, is represented by the population attributable risk, denoted by $P(1 - v/q)$ (Fig. 1A). Suppose that the risk allele of a polymorphism is the only genetic cause of a disease. For the first-degree relatives of patients who do not have the risk allele, the incidence rate is not different from that in the general population. Therefore, for the effect of this polymorphism, $Q$ exceeds $P$ by this attributable share multiplied by $(Y_1/X_1 - 1)$ (Fig. 1B). The incidence rate of the disease for a sibling, $Q_s$, is then represented by Equation [6]:
$$Q_s = P + P\left(1 - \frac{v}{q}\right)\left[\frac{1 - \left(\frac{q + v}{2}\right)^2}{1 - q^2} - 1\right] \qquad [6]$$
The incidence rate for an offspring, $Q_o$, is represented by Equation [7]:
$$Q_o = P + P\left(1 - \frac{v}{q}\right)\left[\frac{1 - qv}{1 - q^2} - 1\right] \qquad [7]$$
Once $Q_s$ or $Q_o$ is estimated, the heritability of a polymorphism, $h_p^2$, is calculated by Falconer's liability threshold model 6.
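The AD-model quantities can be sketched in Python as follows. The sketch assumes the closed forms Y_S1 = 1 − ((q + v)/2)² and Y_O1 = 1 − qv, which follow from the Bayesian argument in the Methods and agree with the arithmetic-geometric mean remark above; the offspring ratio (1 − qv)/(1 − q²) is the one that appears in Equation [7] in the Methods. Function names are hypothetical.

```python
def ad_risk_genotype_freqs(q, v):
    """Risk-genotype (AA or Aa) frequencies under the AD model.

    q: non-risk allele frequency in the general population
    v: non-risk allele frequency in patients
    """
    X1 = 1.0 - q * q                    # general population
    Y_S1 = 1.0 - ((q + v) / 2.0) ** 2   # sibling of a proband (assumed closed form)
    Y_O1 = 1.0 - q * v                  # offspring of a proband
    return X1, Y_S1, Y_O1


def ad_incidence_in_relatives(P, q, v):
    """Incidence in siblings (Q_s) and offspring (Q_o) for one polymorphism."""
    X1, Y_S1, Y_O1 = ad_risk_genotype_freqs(q, v)
    attributable = P * (1.0 - v / q)    # share of P attributed to the polymorphism
    Q_s = P + attributable * (Y_S1 / X1 - 1.0)
    Q_o = P + attributable * (Y_O1 / X1 - 1.0)
    return Q_s, Q_o


# 16p11.2 dup in schizophrenia: P = 0.01, p = 0.000039, u = 0.0039.
Q_s, Q_o = ad_incidence_in_relatives(P=0.01, q=1.0 - 0.000039, v=1.0 - 0.0039)
```

When v equals q the attributable share is zero, so both incidence rates reduce to P, as expected for a polymorphism with no association.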
Heritability of a polymorphism under an autosomal recessive (AR) model. Some polymorphisms are known to show a recessive effect. If the risk allele of a polymorphism shows a recessive effect, the frequencies of the risk genotype of a sibling and of an offspring, $Y_{S1}$ and $Y_{O1}$, are obtained from Table 1 by taking the homozygote for the risk allele as the only risk genotype. The proportion of patients who have the risk genotype among the patients holding the risk allele is therefore represented by $u^2/(u^2 + 2uv)$. The incidence rates of the disease among siblings and among offspring, considering only the effect of the polymorphism, are then represented by the corresponding equations for $Q_s$ and $Q_o$.
Heritability of a polymorphism under other inheritance models. $h_p^2$ can be estimated for a polymorphism under any other inheritance model as long as the frequency of the risk genotype(s) for the first-degree relatives can be calculated. If a polymorphism is located on an autosome and the OR of the heterozygote is smaller than that of the homozygote, the $h_p^2$ of this polymorphism is smaller than $h_p^2$ under the AD model and larger than $h_p^2$ under the AR model.
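For the recessive model, the risk-genotype frequencies can be sketched in the same way. The closed forms used here, X_1 = p², Y_S1 = ((p + u)/2)², and Y_O1 = pu, are assumptions obtained by applying the reasoning of the Methods to the homozygote for the risk allele; they are not quoted from the original report, and the function name is hypothetical. Only the proportion u²/(u² + 2uv) is taken directly from the text.

```python
def ar_risk_genotype_freqs(p, u):
    """Risk-genotype (risk-allele homozygote) frequencies under the AR model.

    p: risk allele frequency in the general population
    u: risk allele frequency in patients
    """
    v = 1.0 - u
    X1 = p * p                      # general population
    Y_S1 = ((p + u) / 2.0) ** 2     # sibling of a proband (assumed closed form)
    Y_O1 = p * u                    # offspring of a proband (assumed closed form)
    # Proportion of risk-allele-carrying patients who are homozygous (from the text).
    homozygote_share = u * u / (u * u + 2.0 * u * v)
    return X1, Y_S1, Y_O1, homozygote_share


X1, Y_S1, Y_O1, share = ar_risk_genotype_freqs(p=0.1, u=0.3)
```

Under these assumed forms the sibling value is at least as large as the offspring value, the reverse of the ordering in the AD model.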
q: allele frequency of the non-risk allele for the general population. v: allele frequency of the non-risk allele for the patient group. $X_1$: frequency of the risk genotype of the general population. $Y_1$: frequency of the risk genotype of the first-degree relatives.
Table 2. Results of a trial to calculate $h_p^2$ of CNVs and SNPs using published data. Odds ratio (OR), risk allele frequency ($p$), and prevalence of disease ($P$) of each polymorphism are cited from the literature 3,10,18-30. $P$ of schizophrenia is cited from a review 31. *de novo CNV.
Calculation of the heritability of two or more polymorphisms. Falconer's method is based on the calculation of the "liability thresholds" for the prevalence of a disease in the general population and for the recurrence rate in the first-degree relatives. These thresholds are measured in standard deviations, and heritability is estimated from the difference between the two 6. The heritability of two or more polymorphisms can also be calculated. For this purpose the second term of Equation [6] or [7] is calculated for each polymorphism, and the sum of these terms is added to $P$.

Estimation of various CNVs and SNPs reported in the literature.
Most germline CNVs are heritable 7. However, the mode of inheritance of a CNV is not always known. Furthermore, a de novo CNV is sometimes identified in association studies 3. The heritability of a disease has often been estimated by twin studies. Monozygotic (MZ) twins share all germline polymorphisms, including de novo variants, whereas dizygotic (DZ) twins usually do not share a de novo polymorphism. Because heritability is calculated from the difference between the concordance rates of MZ twins and DZ twins, a de novo polymorphism also contributes to the heritability estimated in a twin study. When the contribution of a CNV to the heritability of a disease is estimated by Falconer's model, the recurrence risk of holding the CNV for a sibling cannot, in theory, be used, because the CNV may be de novo in the proband. On the other hand, the recurrence risk for an offspring can be used, because all germline polymorphisms, including de novo ones, are in principle transmitted to the offspring. Table 2 lists various CNVs and SNPs reported in the literature. The $h_p^2$ of these polymorphisms was calculated for offspring under the AD model. As shown in Table 2, CNVs generally have a larger $h_p^2$ (> 0.01) than SNPs. A noteworthy result was that about 25% of the heritability of type 2 diabetes mellitus (T2DM) could be accounted for by one CNV, a value greater than the previously estimated heritability explained by all identified variants in GWAS published in 2012 8. Another noteworthy result was that about 15% of the heritability of schizophrenia could be accounted for by four CNVs, although this value was smaller than the previously estimated heritability (23%) explained by all identified variants in GWAS published in 2012 9. With regard to schizophrenia, it turned out that the $h_p^2$ of a CNV that was detected only in patients (OR = +∞) is large.
The results of the analyses suggest that a large part of the missing heritability of common diseases could be accounted for by certain CNVs. 15q13.3 microdeletions have been reported to be associated not only with schizophrenia but also with idiopathic generalized epilepsy (IGE) 2,10. Although accurate prevalence data for IGE, which comprises several types of epilepsy, could not be obtained, the $h_p^2$ for IGE was estimated to be 0.13-0.15 (not shown in Table 2). CNVs have been suspected to be involved in the pathophysiology of neuropsychiatric conditions 11. The results of the trial estimation of the $h_p^2$ of a polymorphism suggest that CNVs might be the major genetic cause of neuropsychiatric disorders.

Comparison of the required number of polymorphisms to explain a heritability. Previous studies have estimated the heritability of sets of polymorphisms. In 2009 Pawitan et al. showed how many variants were needed to explain a heritability of 0.4 12. To confirm that the results calculated by the method described in the present study are consistent with those generated by other approaches, the numbers of genetic variants required under the AD model to explain a heritability of 0.4, when the prevalence of a disease is 0.01, were estimated. In this estimation the additive effect of each $h_p^2$ was considered; in other words, the estimation targeted the "narrow sense" heritability. The results obtained by the method in the present study are compared with those of Pawitan et al. in Table 3. The required number of genetic variants, calculated using the median of the range of variants in a category, was not different from their approximation for the same category, except for the common variants of category 1.

Discussion
Estimations of the heritability of polymorphisms have mainly been conducted for the SNPs found in GWAS [1][2][3]12,13. It is thought that the heritability of common diseases is due to multiple genes of small effect size and that even qualitative disorders can be interpreted simply as the extremes of quantitative dimensions, that is, by quantitative genetic analysis 14. Recent studies have demonstrated the interaction effects and the collective effects of SNPs in quantitative genetic traits [15][16][17]. However, I discuss here the conventional quantitative analysis under the premise that polymorphisms have simple additive effects. In quantitative genetic analysis, authors have assumed a latent susceptibility (or liability) that varies between individuals 12. The liability can be due to genetic and environmental factors, and heritability is defined as the proportion of the variance in liability due to genetic factors. In calculating the liability contributed by a SNP, the OR of the allele frequency or the OR of the risk genotype is the fundamental factor for estimating the penetrance 12,13. Therefore, when a SNP is detected only in patients (OR = +∞), the calculation is theoretically impossible in quantitative genetic analysis. After all, the analysis handles the quantitative effect of genes with a small effect size and does not assume the participation of a gene with such a large effect size (OR = +∞). In their 2010 estimation of the heritability of common CNVs, the Wellcome Trust Case Control Consortium et al. did not take into consideration the CNVs that were detected only in patients, either 4. However, CNVs are sometimes detected only in patients, as shown in Table 2.
In this report a novel method to calculate the heritability of a single polymorphism was shown. A trial estimation of the numbers of genetic variants required under the AD model to explain a given heritability showed that the results calculated by the method described in the present study are largely consistent with those generated by quantitative genetic analysis (Table 3). I did not introduce the penetrance into the calculation procedure but introduced the population attributable risk, which does not become infinite when the OR is +∞. The method in the present report suggested that the heritability of some CNVs is quite large when calculated under the AD model. The mode of inheritance of CNVs is often unknown, and only an OR of allele frequency is usually available for a CNV. Although the heritability of CNVs was calculated only under the AD model, the results suggest that a large part of the missing heritability could be accounted for by re-evaluating the CNVs that have already been found and by searching for novel CNVs with large $h_p^2$. The results of this study also suggest that CNVs might be the major genetic cause of neuropsychiatric disorders. In conclusion, CNVs turned out to play important roles in the familial aggregation of common diseases.

Methods
Calculation of genotype probabilities for a sibling. To calculate the genotype probabilities for a sibling, Bayes' method must be applied. An example of the calculation of genotype probabilities by Bayes' method for the father of the proband is shown in Table 4. As a result, the posterior probability of the father's other allele (A or a), given the allele he transmitted (A), equals the frequency of that allele in the general population.
The genotype probabilities for a sibling are then calculated. The calculation procedure is shown in Table 5. In Table 5, P1 and P2 are the posterior probabilities of the genotypes of the father and the mother, respectively, and P3 is the conditional probability of the genotype of the sibling. A joint probability is the product of F, P1, P2, and P3. The summation of the joint probabilities for each genotype is shown in Table 1.
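The sibling calculation can be checked by brute-force enumeration. The sketch below is illustrative and does not reproduce Tables 4 and 5, which are not available here. It assumes that each allele transmitted to the proband is A with probability u (the patient allele frequency), that the two transmitted alleles are independent, and, following the Bayesian result above, that each parent's other allele is A with the population frequency p. The function name is hypothetical.

```python
from itertools import product


def sibling_genotype_probs(p, u):
    """Enumerate sibling genotype probabilities given an affected proband.

    Alleles are coded 1 (A) and 0 (a).
    """
    q, v = 1.0 - p, 1.0 - u
    transmitted = {1: u, 0: v}   # allele a parent passed to the proband
    other = {1: p, 0: q}         # the parent's other allele (posterior = population frequency)
    probs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # f_t, m_t: alleles the father and mother transmitted; f_o, m_o: their other alleles.
    for f_t, f_o, m_t, m_o in product((0, 1), repeat=4):
        weight = transmitted[f_t] * other[f_o] * transmitted[m_t] * other[m_o]
        # The sibling receives either allele of each parent with probability 1/2.
        for from_father, from_mother in product((f_t, f_o), (m_t, m_o)):
            genotype = ("aa", "Aa", "AA")[from_father + from_mother]
            probs[genotype] += weight * 0.25
    return probs


sib = sibling_genotype_probs(p=0.1, u=0.3)
```

Under these assumptions the probability of genotype aa equals ((q + v)/2)², so the frequency of the AD risk genotypes in siblings is 1 − ((q + v)/2)².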

Calculation of genotype probabilities for an offspring. Bayes' method is not needed to calculate the genotype probabilities for an offspring. The calculation procedure is shown in Table 6. The summation of the joint probabilities for each genotype is shown in Table 1.
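For an offspring the calculation reduces to one allele transmitted by the proband and one drawn from the general population. The following sketch is an illustration under that assumption, with the proband's partner taken as a random member of the general population; it does not reproduce Table 6, and the function name is hypothetical.

```python
def offspring_genotype_probs(alpha, beta, gamma, p):
    """Genotype probabilities for an offspring of the proband.

    alpha, beta, gamma: proband genotype frequencies of AA, Aa, aa
    p: frequency of allele A in the general population
    """
    q = 1.0 - p
    transmit_A = alpha + 0.5 * beta   # proband passes on A
    transmit_a = gamma + 0.5 * beta   # proband passes on a
    return {
        "AA": transmit_A * p,
        "Aa": transmit_A * q + transmit_a * p,
        "aa": transmit_a * q,
    }


# Proband genotype frequencies for u = 0.3 under Hardy-Weinberg proportions (an assumption).
off = offspring_genotype_probs(alpha=0.09, beta=0.42, gamma=0.49, p=0.1)
```

When the proband genotype frequencies are u², 2uv, and v², the probability of aa is vq, so the frequency of the AD risk genotypes in offspring is 1 − qv, the numerator that appears in Equation [7].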
An example of calculation of heritability of a polymorphism. As an example of a common disease, let us choose schizophrenia. The prevalence, $P$, of schizophrenia is reported as 0.01. Here, a CNV (16p11.2 dup) is chosen as an example of a polymorphism 18. The frequency of the risk allele in patients, $u$, is 0.0039 and the frequency of the risk allele in asymptomatic individuals, $x$, is 0. Therefore $p$ is calculated as 0.000039 using Equation [1]. The prevalence of schizophrenia (1%) corresponds to a liability threshold of +2.32635 SD in the general population. The mean distance of the patients from the median of the normal distribution is calculated as +2.6652 SD. The incidence rate of the disease in a first-degree relative under the autosomal dominant model, considering only the effect of the CNV, is represented by Equation [7]:
$$Q = P + P\left(1 - \frac{v}{q}\right)\left[\frac{1 - qv}{1 - q^2} - 1\right] \qquad [7]$$
The incidence rate of schizophrenia is calculated as follows: . + . × ( .