A powerful association test for qualitative traits incorporating imprinting effects using general pedigree data

Zhou, Ji-Yuan; He, Hai-Qiang; You, Xiao-Ping; Li, Shao-Zhan; Chen, Ping-Yan; Fung, Wing Kam

doi:10.1038/jhg.2014.109

Download PDF

Original Article
Published: 18 December 2014

A powerful association test for qualitative traits incorporating imprinting effects using general pedigree data

Ji-Yuan Zhou¹,
Hai-Qiang He¹,
Xiao-Ping You¹,
Shao-Zhan Li¹,
Ping-Yan Chen¹ &
…
Wing Kam Fung²

Journal of Human Genetics volume 60, pages 77–83 (2015)Cite this article

361 Accesses
1 Citations
7 Altmetric
Metrics details

Subjects

Abstract

For qualitative traits and diallelic marker loci, the pedigree disequilibrium test (PDT) based on general pedigrees and its extension (Monte Carlo PDT (MCPDT)) for dealing with missing genotypes are simple and powerful tests for association. There is an increasing interest of incorporating imprinting into association analysis. However, PDT and MCPDT do not take account of the information on imprinting effects in the analysis, which may reduce their test powers when the effects are present. On the other hand, the transmission disequilibrium test with imprinting (TDTI*) combines imprinting into the mapping of association variants. However, TDTI* only accommodates two-generation nuclear families and thus is not suitable for extended pedigrees. In this article, we first extend PDT to incorporate imprinting and propose PDTI for complete pedigrees (no missing genotypes). To fully utilize pedigrees with missing genotypes, we further develop the Monte Carlo PDTI (MCPDTI) statistic based on Monte Carlo sampling and estimation. Both PDTI and MCPDTI are derived in a two-stage framework. Simulation study shows that PDTI and MCPDTI control the size well under the null hypothesis of no association and are more powerful than PDT and TDTI* (based on a sample of nuclear families randomly selecting from pedigrees) when imprinting effects exist.

Leveraging information between multiple population groups and traits improves fine-mapping resolution

Article Open access 10 November 2023

Robust association tests for quantitative traits on the X chromosome

Article 10 September 2022

Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture

Article 29 July 2019

Introduction

Imprinting, also known as parent-of-origin effects, is a genetic phenomenon that a certain gene is differentially expressed between paternal and maternal alleles, which is an important epigenetic factor in the study of complex traits.^{1, 2} Currently, some researches have demonstrated that some complex diseases are related to imprinting effects, such as Beckwith–Wiedemann, Prader–Willi and Angelman syndromes.^{3, 4, 5} Morison et al.⁶ constructed an imprinted gene database to collect genes that show imprinting effects (http://igc.otago.ac.nz). For diallelic marker loci, Zhou et al.⁷ proposed the pedigree parental-asymmetry test (PPAT) and Monte Carlo (MC) PPAT (MCPPAT) based on general pedigree data to test for parent-of-origin effects. PPAT is generally suitable for complete pedigrees (without any missing genotypes), while MCPPAT can deal with pedigree data with missing genotypes (incomplete pedigrees) based on MC sampling and estimation procedures, which can recapture most of the missing genotypes based on the observed genotypes in the pedigrees. On the other hand, in the literature, some parametric approaches have been developed to test for imprinting. Yang and Lin^{8, 9} proposed two likelihood methods for simultaneously detecting imprinting and maternally mediated effects in pedigrees. Lin et al.¹⁰ and Ainsworth et al.¹¹ suggested two multi-locus likelihood approaches for assessing parent-of-origin effects by using mothers and their offspring.

Currently, there has been an increasing interest of incorporating imprinting into association analysis. For a diallelic marker locus, Xia et al.¹² developed the transmission disequilibrium test with imprinting (TDTI*) for qualitative traits, which takes imprinting effects into account. Xia et al.¹³ further extended TDTI* to tackle quantitative traits and proposed the Q-C-TDTI test statistic. However, both TDTI* and Q-C-TDTI only accommodate two-generation nuclear families and thus are not suitable for extended pedigrees. On the other hand, the pedigree disequilibrium test (PDT) based on general pedigree data and its extension MCPDT based on MC sampling and estimation for dealing with missing genotypes are simple and powerful tests for association in the presence of linkage.^{14, 15} However, PDT and MCPDT do not take into consideration the information on imprinting effects in the analysis, which may reduce their test powers when the effects are present.

In this article, we first extend PDT to take imprinting effects into account and propose two new test statistics PDT_m and PDT_p, the maternal and paternal versions of PDT, based on complete pedigrees for qualitative traits. Then we suggest the PDTI statistic, which is derived in a two-stage test process for detecting association in the presence of linkage. In the first stage, we use the existing imprinting test PPAT to detect if there is paternal, maternal or no imprinting effects at the marker locus under study. In the second stage, we test for association by selecting an appropriate statistic among the three test statistics PDT_m, PDT and PDT_p according to the findings of PPAT. To make full use of pedigrees with some missing genotypes, we further extend PDTI and develop the Monte Carlo PDTI (MCPDTI) statistic based on MC sampling and estimation, which utilizes MCPPAT in the first stage to test for imprinting and then conducts association analysis in the second stage by choosing a suitable statistic among the three test statistics MCPDT_m (maternal version of MCPDT), MCPDT and MCPDT_p (paternal version of MCPDT) according to the findings of MCPPAT. As MC sampling and estimation require marker allele frequencies, so we assume that the population under study is homogeneous for MCPDTI. Simulation study under various settings is conducted to evaluate the validity and performance of the proposed tests. Simulation results show that PDTI and MCPDTI control the size well under the null hypothesis of no association. When the Hardy–Weinberg equilibrium in the population is not severely violated, MCPDTI still controls the type I error rates well under the null hypothesis of no association from our simulation study. Further, power comparison demonstrates that PDTI and MCPDTI are more powerful than PDT and TDTI* (based on a sample of nuclear families, where we randomly select one from each pedigree) when imprinting effects exist. On the other hand, PDTI and MCPDTI have almost the same performance as PDT in the absence of imprinting, which are still much more powerful than TDTI*.

Materials and methods

Background and notations

Suppose that we collect a sample of N pedigrees with the ith pedigree having n_i nuclear families, i=1, 2, …, N. For the jth nuclear family in the ith pedigree, assume that there are n_1ij case-parent trios (consisting of an affected child and his/her parents) and n_2ij discordant sib pairs (DSPs, being composed of one affected and one unaffected sibs), i=1, 2, …, N; j=1, 2, …, n_i. Consider a diallelic marker locus with two alleles M₁ and M₂. Let F_ij, M_ij and C_ijk denote the number of copies of M₁ for the father, the mother and the kth child (affected or unaffected) in the jth nuclear family of the ith pedigree, respectively. As such, F_ij, M_ij and C_ijk take possible values of 0, 1 or 2 for genotypes M₂M₂, M₁M₂ or M₁M₁. We also consider a disease susceptibility locus (DSL) with disease allele D and normal allele d. To distinguish the paternal and maternal alleles, we introduce four ordered genotypes D/D, D/d, d/D and d/d at the DSL, of which the allele before the slash is paternal and the allele after the slash is maternal. Further, the corresponding penetrances are denoted by f₂, f₁₀, f₀₁ and f₀, respectively. When the Mendel’s law holds, f₁₀=f₀₁, which means that there is no parent-of-origin effects; otherwise, there may be parent-of-origin effects at the DSL. The degree of imprinting is defined as I=(f₁₀−f₀₁)/2, which is used to measure the imprinting effects. If association exists, the marker allele M₁ is assumed to be in positive linkage disequilibrium (LD) with the disease. To this end, I>0 (I<0) suggests the presence of maternal (paternal) imprinting effect.¹⁶ Further, we assume mating symmetry in the parental generation,^{17, 18} that the population under study is homogeneous and that there are no maternally mediated effects at the DSL.

Existing PDT and PPAT statistics for complete pedigree data

We begin by describing the existing PDT (for testing for association) and PPAT (for the detection of parent-of-origin effects) statistics. For the jth nuclear family in the ith pedigree, let and , i=1, 2, …, N; j=1, 2, …, n_i, where T_ijk and NT_ijk represent the numbers of M₁ being transmitted and not being transmitted from the heterozygous parents in each case-parent trio, respectively; A_ijl and U_ijl denote the numbers of M₁ in the affected and unaffected sibs for each DSP, respectively; I_{{comparison statement}} is 1 if the comparison statement is true and is 0 otherwise. Taking the summation of the above statistics over all the nuclear families and all the pedigrees, we obtain and , and then the PDT and PPAT statistics can be formulated as , where and are the unbiased estimates of the variances of s and s_I under the null hypotheses of no association and no imprinting, respectively.^{7, 15}

PDTI for complete pedigree data

PDT is a powerful method for detecting association. However, it does not take into account the information on parent-of-origin effects in the analysis, which may suffer from much power loss when imprinting effects are present. To this end, following the idea of constructing TDTI* for nuclear families,¹² we propose the PDTI statistic for complete pedigrees in the following, which is based on the existing PDT, and two statistics PDT_m and PDT_p proposed in this article, to incorporate parent-of-origin effects into association study. The statistics PDT_m and PDT_p, the maternal and paternal versions of PDT, can be obtained by replacing s_ij involved in PDT by s_mij and s_pij, respectively, where

T_mijk and NT_mijk (T_pijk and NT_pijk) are the numbers of M₁ being transmitted and not being transmitted from the heterozygous mother (father), respectively; A_mijl and U_mijl (A_pijl and U_pijl) are the numbers of the maternal (paternal) M₁ allele in the affected and unaffected sibs for a DSP, respectively. The calculation of T_mijk−NT_mijk (T_pijk−NT_pijk) is direct, while the calculation of A_mijl−U_mijl (A_pijl−U_pijl) is a little more complicated as we need to determine which allele in each member of a DSP is maternal and which one is paternal. To do so, we need to simultaneously consider two members in each DSP together with their parents.

For easy description here, we ignore the subscripts for indexing the DSP, nuclear family and pedigree and simply use A−U, A_m−U_m and A_p−U_p to replace A_ijl−U_ijl, A_mijl−U_mijl and A_pijl−U_pijl, respectively. Note that A−U=0 involved in PDT when two members in a DSP have the same genotypes, which does not provide information to association. As such, A_m−U_m=0 and A_p−U_p=0 under this situation. Furthermore, if the mating type is M₁M₂ × M₁M₂, and one of two members in a DSP is homozygous and the other is heterozygous (for example, the affected and unaffected sibs, respectively, have genotypes C_A=M₁M₁ and C_U=M₁M₂), then the allele M₁ in the unaffected sib may come from either the father or the mother. Thus, we assume that both A_m−U_m and A_p−U_p are 0.5 for this situation, following the idea from Li et al.¹⁹ Similarly, both A_m−U_m and A_p−U_p are taken to be 0.5 for C_A=M₁M₂ and C_U=M₂M₂, −0.5 for C_A=M₁M₂ and C_U=M₁M₁ and C_A=M₂M₂ and C_U=M₁M₂. Table 1 shows the values of A_m−U_m and A_p−U_p based on all the 29 possible genotype combinations of a DSP and their parents, together with the corresponding conditional joint probabilities w_r’s given that two children are a DSP, r=1,2,…,29. It is shown in Supplementary Information that E(A_m−U_m)=0 and E(A_p−U_p)=0 under the null hypothesis of no association.

Table 1 Values of A_m−U_m and A_p−U_p for all the 29 possible genotype combinations of a DSP and their parents, together with the corresponding conditional joint probabilities w_r’s given that two children are a DSP

Full size table

Let and . Thus, E(s_m)=0 and E(s_p)=0 under the null hypothesis of no association. As such, the following PDT_m and PDT_p statistics are valid in testing for association

where and are, respectively, the unbiased estimates of the variances of s_m and s_p under the null hypothesis of no association, which have taken into account the dependencies among all the case-parent trios, the dependencies among all the DSPs and the dependencies between the case-parent trios and the DSPs within each pedigree (see Supplementary Information). As can be seen from the variance estimates, both the PDT_m and PDT_p statistics asymptotically follow a standard normal distribution under the null hypothesis of no association when the sample size is large.

Based on PDT_m, PDT and PDT_p, the proposed PDTI statistic is formulated as

where α_I is the prespecified significance level for the imprinting test PPAT, and z_αI/2 is the upper quantile of the standard normal distribution corresponding to the probability of α_I/2. The formula of the PDTI statistic demonstrates that, in the presence of paternal (maternal) imprinting effect, the PDT_m (PDT_p) statistic would be chosen to conduct the detection of association. Otherwise, PDT will be used to test for the association.

In the absence of association, the PDT_m, PDT, PDT_p and PPAT statistics asymptotically follow a standard normal distribution N(0, 1), respectively. However, they are not independent, and the covariance between PDT_m and PPAT under the null hypothesis of no association can be estimated as

where , which is the unbiased estimate of the covariance between s_m and s_I under the null hypothesis of no association (see Supplementary Information). Also, the covariance between PDT and PPAT, and that between PDT_p and PPAT, can be estimated in a similar way. Denote the estimates of the 2 × 2 variance–covariance matrices of the statistics (PDT_m, PPAT), (PDT, PPAT) and (PDT_p, PPAT) by and , respectively. Under the null hypothesis H₀ of no association, the asymptotic distributions of these three pairs of statistics are bivariate normal distributions with zero mean vectors and the variance–covariance matrices given above. Let t be the observed value of PDTI and denote the P-value of the test for association by P=P_H0(PDTI>|t|). Then, based on the asymptotic bivariate normal distributions, we obtain

where R(y)={y:−∞<y<−|t|}∪{y:|t|<y<∞}. Moreover, we have also developed a method on how to find the critical value of PDTI, which can be used to determine whether or not to reject the null hypothesis (see Supplementary Information).

Although we introduce the PDTI method under the assumption that M₁ is in positive LD with the disease allele D, PDTI remains valid for the negative LD between M₁ and D. Note that if M₁ is in positive LD with D, the positive (negative) values of PPAT indicate maternal (paternal) imprinting effect. Conversely, if M₁ is in negative LD with D (that is, M₂ is positively associated with D), the positive (negative) values of PPAT show paternal (maternal) imprinting effect. However, similar to TDTI*, the PDTI statistic keeps unchanged irrespective of the positive or negative LD, which signifies the validity of PDTI for practical use (the proof omitted for brevity).

MCPDTI for incomplete pedigree data

In the case in which there are missing genotypes in pedigrees, we extend PDTI and propose MCPDTI by using MC sampling, which may recapture most information on missing genotypes based on the observed genotypes. The existing MCPDT and MCPPAT statistics are the MC versions of PDT and PPAT, respectively. Now, we turn to introduce the MCPDT_m and MCPDT_p statistics, the MC versions of PDT_m and PDT_p, respectively.

For easy description here, we delete the subscripts for indexing the nuclear family and pedigree. Let s_m denote the contribution from a pedigree to the PDT_m statistic in Equation (1). Let G_o and G_u be the sets of the observed and unobserved (missing) genotypes in the pedigree, respectively, and S be the observed phenotypes (disease affection statuses). For the case with missing data, s_m is unavailable, while its conditional expectation given G_o can be estimated by s_mMC,^{7, 14} which is based on the MC simulation scheme and can be formulated as follows, , where G_ux’s are independent samples randomly drawn from P(G_u|G_o), x=1,2,…,X, which can be efficiently accomplished based on the peeling algorithm using the SLINK software.²⁰ Subsequently, MCPDT_m can be calculated by using Equation (1) with s_m replaced by s_mMC. Similarly, MCPDT_p can be given by replacing s_p by s_pMC.

Note that if all the pedigrees are assumed to be drawn from a certain underlying population and S is treated as random, the expectations of s_mMC and s_pMC are 0 under the null hypothesis of no association. The idea of the corresponding proof is similar to Zhou et al.⁷ and Ding et al.,¹⁴ and hence we omit it here for brevity. As such, the MCPDTI statistic can be formulated by

Obviously, MCPDTI shares some common properties with PDTI. First of all, MCPDTI is not influenced by positive or negative LD between M₁ and D. Next, MCPPAT and three MCPDT-type statistics are not independent. Hence, to clarify the asymptotic null distribution of MCPDTI, the derivation of their covariances among them is required. On the other hand, note that the MC scheme relies on marker allele frequencies. This means that MCPDTI needs the assumption of the population under study being homogeneous. The derivation of the critical value of MCPDTI is similar to that of PDTI (omitted here for brevity).

Results

Settings

In this section, a simulation study is conducted to investigate the performance of the proposed PDTI and MCPDTI statistics and to make the power comparison of PDTI and MCPDTI with the existing PDT and TDTI*. A homogeneous population is considered. We simulate one marker and one DSL with the recombination fraction being fixed at 0.001. Three sets of haplotype frequencies for haplotypes DM₁, dM₁, DM₂ and dM₂ are used to simulate the test powers: {0.13, 0.02, 0.12, 0.73}, {0.22, 0.03, 0.03, 0.72} and {0.23, 0.12, 0.02, 0.63}, with the allele frequencies of M₁ being taken as 0.15, 0.25 and 0.35, respectively. The allele frequency of disease allele D is fixed at 0.25. For studying the empirical size of the proposed test statistics under the null hypothesis of no association, four haplotype frequencies for haplotypes DM₁, dM₁, DM₂ and dM₂ are just taken as the products of two allele frequencies on each haplotype, respectively. For example, if the allele frequency of M₁ is 0.15, then the frequency of haplotype DM₁ is 0.25 × 0.15=0.0375. However, the recombination fraction between the marker and the DSL is still fixed to be 0.001. Further, note that no association means no parent-of-origin effects. To this end, we simulate the empirical size of the proposed test statistics under no association and no parent-of-origin effect model (see below).

We consider three sets of two homozygote penetrances f₂ and f₀, {0.39, 0.26}, {0.44, 0.24} and {0.48, 0.22}, which are very similar to those in Ding et al.¹⁴ For each set of homozygote penetrances, we further consider three parent-of-origin effect models by setting the various values of f₁₀ and f₀₁: no parent-of-origin effect model, incomplete parent-of-origin effect model, and complete parent-of-origin effect model. Therefore, for each parent-of-origin-effect model, three sets of haplotype frequencies and three sets of homozygote penetrances lead to nine simulation settings, which are listed in Table 2.

Table 2 Simulation settings^a

Full size table

Figure 1 gives three types of common pedigree structures used in our simulation study: (a) two-generation family with 5 individuals, (b) three-generation family with 10 individuals, and (c) four-generation family with 12 individuals. To investigate the performance of the proposed tests under different sample sizes, 50 and 100 pedigrees under each pedigree structure are simulated in each replicate, which leads to the total sample size being 150 and 300, respectively. Note that here a pedigree was ascertained if at least one of the nonfounders was affected. For MCPDTI, 50 MC samples of missing genotypes are generated for each replicate using the SLINK software.²⁰ Both the true marker allele frequencies or those estimated from the genotyped founders in each replicate are used in the MC sampling where applicable.

To evaluate the performance of our proposed tests and compare PDTI and MCPDTI with the existing PDT and TDTI* tests, we consider the following five statistics: PDTI, PDT, TDTI*, MCPDTI_T, and MCPDTI_E. The first three statistics PDTI, PDT and TDTI* are obtained based on complete pedigree data without any missing genotypes. PDTI can be regarded as a golden standard for the association study incorporating parent-of-origin effects. Note that PDT does not take account of parent-of-origin effects, and hence its powers will not be influenced by different degrees of imprinting (that is, different parent-of-origin effect models). As such, here PDT can be considered as a reference line. Note that TDTI* can only accommodate two-generation nuclear family data. So, in our simulation, TDTI* is based on a sample of nuclear families, where only one nuclear family is randomly chosen from a pedigree. Further, MCPDTI_T and MCPDTI_E are on the basis of incomplete pedigree data with genotypes of individual 1 in two-generation families, individuals 1, 4 and 5 in three-generation pedigrees and individuals 1 and 3 in four-generation pedigrees being removed from complete pedigree data; MCPDTI_T and MCPDTI_E are based on the true and estimated marker allele frequencies, respectively.

Ten thousand replicates under each simulation setting mentioned above are simulated to assess the empirical size and powers of the proposed tests at the significance levels of 5% and 1%. The corresponding significance level of the imprinting test in the first stage is taken to be the same as the association test in the second stage. The size and power results at the 1% significance level can be seen in Supplementary Table S1 and Supplementary Figure S1, respectively.

To have a better understanding of the performance of PDTI and MCPDTI statistics, we conduct two additional simulation studies as shown in Supplementary Information. One is based on the settings with the less frequency of disease allele D (being fixed at 0.1) and the family sample consisting of 150 and 200 pedigrees under each pedigree structure, which leads to 450 and 600 pedigrees in total, respectively. For detailed simulation settings, see Supplementary Table S2. The other simulation study is based on the settings with the frequency of disease allele D being taken as 0.1, 0.15 and 0.2 under Hardy–Weinberg disequilibrium at a candidate locus, where the inbreeding coefficient ρ is taken to be 0.03 and 0.05. The family sample in each replicate consists of 100 pedigrees under each pedigree structure, where the number of pedigrees is 300 in total. The detailed simulation settings can be found in Supplementary Tables S3 and S4. The simulation results are shown in Supplementary Tables S5–S8 and Supplementary Figures S2–S5.

Size and powers of PDTI, PDT, TDTI* and MCPDTI

Table 3 displays the empirical type I error rates of PDTI, PDT, TDTI* and MCPDTI under nine simulation settings based on 150 and 300 pedigrees at the nominal 5% significance level, respectively. For 10 000 simulation replicates, the s.e. for the type I error rate estimate is (0.05 × 0.95/10 000)^1/2=2.18 × 10⁻³ when the nominal level is fixed at 5%. Note that in Table 3, there are altogether 54 observed type I error rates corresponding to the proposed tests (PDTI, MCPDTI_T and MCPDTI_E), and 2 out of the 54 are greater than the upper limit of the interval (0.04564, 0.05436), that is, 0.05±2 × s.e. Thus, generally speaking, the tests maintain the type I error rates satisfactorily.

Table 3 Type I error rates (in percentage (%)) of PDTI, PDT, TDTI* and MCPDTI under nine simulation settings based on 150 and 300 pedigrees at nominal 5% significance level, respectively

Full size table

Figure 2 gives the simulated powers of PDTI, MCPDTI_T, MCPDTI_E, PDT and TDTI* under nine simulation settings based on 150 and 300 pedigrees at the 5% significance level for no, incomplete and complete parent-of-origin effect models, respectively. We can see from each subplot of the figure that MCPDTI, by the MC sampling scheme, can recapture most information on the missing genotypes and has similar performance to PDTI based on complete pedigree data. MCPDTI based on the estimated marker allele frequencies (MCPDTI_E) almost has the same power as that based on the true marker allele frequencies (MCPDTI_T). When there are no parent-of-origin effects, with the haplotype frequencies, penetrances and sample size being fixed (for each simulation setting in Figures 2a and d), the powers of PDTI, MCPDTI_T and MCPDTI_E are very close to that of PDT and are all much higher than that of TDTI*, which only accommodates nuclear family data. However, for each simulation setting with incomplete or complete parent-of-origin effects (Figures 2b, c, e or f), PDTI, MCPDTI_T and MCPDTI_E are all more powerful than both PDT and TDTI*. The power of PDT based on pedigrees is much larger than TDTI*, even though PDT does not take account of imprinting but TDTI* does.

Furthermore, when the sample size is fixed for no, incomplete and complete parent-of-origin effect models (among Figures 2a, b and c or among Figures 2d, e and f), the power of PDT (the reference base) is almost not affected by the degree of imprinting. However, PDTI, MCPDTI and TDTI* under the incomplete and complete parent-of-origin effect models are more powerful, compared with no parent-of-origin effect model. The power difference between our proposed test PDTI/MCPDTI_T/MCPDTI_E and PDT is larger and the gap in power between PDT and TDTI* is smaller, by comparing the power results under the complete parent-of-origin effect model with the incomplete model. The powers based on 300 pedigrees are higher than those based on 150 pedigrees (Figure 2; d vs a, e vs b and f vs c), and the corresponding power improvement of PDTI and MCPDTI is more significant compared with that of PDT, as expected.

It is also shown in each subplot of Figure 2 that, with the sample size and the penetrances being fixed, the powers of all the five test statistics become larger when the simulation settings change from settings 1 to 3 (settings 4 to 6 or settings 7 to 9), due to the increase of the LD coefficient for three sets of haplotype frequencies. As for settings 1, 4 and 7 (or settings 2, 5 and 8/settings 3, 6 and 9), which are based on the same set of haplotype frequencies but different relative risks of two homozygote penetrances (f₂/f₀=1.50, 1.83 and 2.18), all the statistics have higher and higher power.

Discussion

In this article, we have extended PDT and MCPDT to PDTI and MCPDTI to incorporate parent-of-origin effects for association analysis based on general pedigree data. Simulation study is carried out to investigate the validity and performance of the proposed PDTI and MCPDTI tests and to compare PDTI and MCPDTI with the existing PDT and TDTI* for various sets of haplotype frequencies, parent-of-origin effect models, significance levels and sample sizes. Simulation results show that the proposed PDTI and MCPDTI control the size well under the null hypothesis of no association.

As for power comparison of PDTI and MCPDTI with PDT (ignoring the information on imprinting effects) and TDTI* (only accommodating two-generation nuclear families), under no parent-of-origin effect model, PDTI and MCPDTI have similar performance, which are comparable to PDT. Under incomplete parent-of-origin effect model, PDTI and MCPDTI, by using the information on the parent-of-origin effects and the extended pedigree data, outperform PDT and TDTI*. Note that in our simulation study, MCPDTI for incomplete pedigree data is even more powerful than PDT for complete pedigree data, although the power improvement is not so significant. However, as the degree of imprinting increases, under the complete parent-of-origin effect model, the powers of PDTI and MCPDTI are much higher than that of PDT and TDTI*. Moreover, with the sample size increasing, the improvement in power of our proposed methods over PDT is more distinct. Even though the frequency of the disease allele is smaller (being taken as 0.1), our methods are still more powerful than PDT and TDTI* in the presence of imprinting effects (see Supplementary Information). On the other hand, MCPDTI can recapture most of the missing genotypes based on the observed genotypes by the MC sampling and estimation. Our software MCPDTI is freely available at http://www.echobelt.org/web/UploadFiles/MCPDTI.html, which is implemented in R (http://www.r-project.org/).

In the calculation process of the proposed PDTI and MCPDTI test statistics, we first divide each extended pedigree in the collected sample into several nuclear families (two-generation families). For each nuclear family, PDTI and MCPDTI will use all the case-parent trios and all the DSPs in it. As such, the alleles of both parents will repeatedly be used in different case-parent trios, the alleles of some affected children may repeatedly be used in both case-parent trios and DSPs and the alleles of some unaffected children may repeatedly be utilized in different DSPs. Further, for different nuclear families from this extended pedigree, there may also be the repeated use of the alleles of some family members who can be the children of the elder generation and the parents of the younger generation. Thus the repeated use of these family members mentioned above will lead to the dependencies among the case-parent trios, the dependencies among the DSPs and the dependencies between the case-parent trios and the DSPs within each pedigree. However, it should be emphasized that PDTI and MCPDTI have taken account of all the dependencies (see Supplementary Information).

Although our simulation study is conducted under settings with positive LD coefficient between marker allele M₁ and disease allele D, PDTI and MCPDTI are still suitable for situations where M₁ is in negative LD with D. On the other hand, like other methods, the proposed PDTI and MCPDTI methods have their own limitations. PDTI is not applicable to the situation where there are some genotypes missing, and thus we recommend MCPDTI for this case to conduct association analysis. Moreover, MCPDTI needs the assumption of the underlying population being homogeneous. However, it still controls the size well if the violation of Hardy–Weinberg equilibrium is not so severe in the population (see Supplementary Information). In addition, although our simulation study is conducted by fixing the recombination fraction at 0.001, the proposed PDTI and MCPDTI are still more powerful than the existing PDT and TDTI* for the complete LD case and when the recombination fraction is 0 (see Supplementary Tables S9 and S10). Finally, we should mention that, in the data collection process of our simulation study, the requirement for a pedigree to be included is at least one affected nonfounder, while it is not necessary to require that each nuclear family within a large pedigree has at least one affected child. In practice, the pedigrees with more affected individuals are more likely to be recruited. This ascertainment bias due to the difference in the number of affected individuals in pedigree is not considered in the present study. How the ascertainment bias affects the proposed test statistics is our future work.

References

Pfeifer, K . Mechanisms of genomic imprinting. Am. J. Hum. Genet. 67, 777–787 (2000).
Article CAS Google Scholar
Reik, W . & Walter, J . Genomic imprinting: parental influence on the genome. Nat. Rev. Genet. 2, 21–32 (2001).
Article CAS Google Scholar
Falls, J. G ., Pulford, D. J ., Wylie, A. A . & Jirtle, R. L . Genomic imprinting: implications for human disease. Am. J. Pathol. 154, 635–647 (2001).
Article Google Scholar
Ziegler, A ., König, I. R . & Pahlke, F . A Statistical Approach to Genetic Epidemiology: Concepts and Applications, with an E-Learning Platform, (Wiley-Vch, Weinheim, Germany, 2010).
Book Google Scholar
Zhou, J. Y ., Mao, W. G ., Li, D. L ., Hu, Y. Q ., Xia, F . & Fung, W. K . A powerful parent-of-origin effects test for qualitative traits incorporating control children in nuclear families. J. Hum. Genet. 57, 500–507 (2012).
Article CAS Google Scholar
Morison, I. M ., Paton, C. J . & Cleverley, S. D . The imprinted gene and parent-of-origin effect database. Nucl. Acids Res. 29, 275–276 (2001).
Article CAS Google Scholar
Zhou, J. Y ., Ding, J ., Fung, W. K . & Lin, S . Detection of parent-of-origin effects using general pedigree data. Genet. Epidemiol. 34, 151–158 (2010).
Article Google Scholar
Yang, J . & Lin, S . Likelihood approach for detecting imprinting and in utero maternal effects using general pedigrees from prospective family-based association studies. Biometrics 68, 477–485 (2012).
Article Google Scholar
Yang, J . & Lin, S . Robust partial likelihood approach for detecting imprinting and maternal effects using case-control families. Ann. Appl. Stat. 7, 249–268 (2013).
Article Google Scholar
Lin, D ., Weinberg, C. R ., Feng, R ., Hochner, H . & Chen, J . A multi-locus likelihood method for assessing parent-of-origin effects using case-control mother-child pairs. Genet. Epidemiol. 37, 152–162 (2013).
Article Google Scholar
Ainsworth, H. F ., Unwin, J ., Jamison, D. L . & Cordell, H. J . Investigation of maternal effects, maternal-fetal interactions and parent-of-origin effects (imprinting), using mothers and their offspring. Genet. Epidemiol. 35, 19–45 (2011).
Article Google Scholar
Xia, F ., Zhou, J. Y . & Fung, W. K . A powerful approach for association analysis incorporating imprinting effects. Bioinformatics 27, 2571–2577 (2011).
Article CAS Google Scholar
Xia, F ., Zhou, J. Y . & Fung, W. K . Powerful tests for association on quantitative trait loci incorporating imprinting effects. J. Hum. Genet. 58, 384–390 (2013).
Article CAS Google Scholar
Ding, J ., Lin, S . & Liu, Y . Monte Carlo pedigree disequilibrium test for markers on the X chromosome. Am. J. Hum. Genet. 79, 567–573 (2006).
Article CAS Google Scholar
Martin, E. R ., Monks, S. A ., Warren, L. L . & Kaplan, N. L . A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am. J. Hum. Genet. 67, 146–154 (2000).
Article CAS Google Scholar
Strauch, K ., Fimmers, R ., Kurz, T ., Deichmann, K. A ., Wienker, T. F . & Baur, M. P . Parametric and nonparametric multipoint linkage analysis with imprinting and two-locus-trait models: application to mite sensitization. Am. J. Hum. Genet. 66, 1945–1957 (2000).
Article CAS Google Scholar
Hu, Y. Q ., Zhou, J. Y . & Fung, W. K . An extension of the transmission disequilibrium test incorporating imprinting. Genetics 175, 1489–1504 (2007).
Article Google Scholar
Weinberg, C. R . Methods for detection of parent-of-origin effects in genetic studies of case-parents triads. Am. J. Hum. Genet. 65, 229–235 (1999).
Article CAS Google Scholar
Li, X ., Sui, Y ., Liu, T ., Wang, J ., Li, Y . & Lin, Z . et al. A model for family-based case-control studies of genetic imprinting and epistasis. Brief. Bioinform. 15, 1069–1079 (2013).
Article Google Scholar
Weeks, D. E ., Ott, J . & Lathrop, G. M . SLINK: a general simulation program for linkage analysis. Am. J. Hum. Genet. 47, A204 (1990).
Google Scholar

Download references

Acknowledgements

We thank the Associate Editor and two reviewers for their insightful and helpful suggestions, which have led to improvement of the presentation. This work was supported by the National Natural Science Foundation of China (grants 81373098 and 81072386), Science and Technology Planning Project of Guangdong Province, China (grant 2013B021800038), the National and Guangdong University Students’ Innovation and Enterprise Training Project of China (grants 201312121014 and 1212113029), grant from School of Public Health and Tropical Medicine of Southern Medical University, China (grant GW201422) and the Hong Kong RGC GRF Research Grant (HKU 766511M). We are also grateful to Dr Jie Ding who shares his R code with us.

Author information

Authors and Affiliations

Department of Biostatistics, State Key Laboratory of Organ Failure Research, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, China
Ji-Yuan Zhou, Hai-Qiang He, Xiao-Ping You, Shao-Zhan Li & Ping-Yan Chen
Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
Wing Kam Fung

Authors

Ji-Yuan Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Hai-Qiang He
View author publications
You can also search for this author in PubMed Google Scholar
Xiao-Ping You
View author publications
You can also search for this author in PubMed Google Scholar
Shao-Zhan Li
View author publications
You can also search for this author in PubMed Google Scholar
Ping-Yan Chen
View author publications
You can also search for this author in PubMed Google Scholar
Wing Kam Fung
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Ji-Yuan Zhou or Wing Kam Fung.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies the paper on Journal of Human Genetics website

Supplementary information

Supplementary Information (PDF 178 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhou, JY., He, HQ., You, XP. et al. A powerful association test for qualitative traits incorporating imprinting effects using general pedigree data. J Hum Genet 60, 77–83 (2015). https://doi.org/10.1038/jhg.2014.109

Download citation

Received: 30 July 2014
Revised: 19 November 2014
Accepted: 24 November 2014
Published: 18 December 2014
Issue Date: February 2015
DOI: https://doi.org/10.1038/jhg.2014.109

This article is cited by

Generalized disequilibrium test for association in qualitative traits incorporating imprinting effects based on extended pedigrees
- Jian-Long Li
- Peng Wang
- Ji-Yuan Zhou
BMC Genetics (2017)