Introduction

Imprinting, also known as parent-of-origin effects, is a genetic phenomenon that a certain gene is differentially expressed between paternal and maternal alleles, which is an important epigenetic factor in the study of complex traits.1, 2 Currently, some researches have demonstrated that some complex diseases are related to imprinting effects, such as Beckwith–Wiedemann, Prader–Willi and Angelman syndromes.3, 4, 5 Morison et al.6 constructed an imprinted gene database to collect genes that show imprinting effects (http://igc.otago.ac.nz). For diallelic marker loci, Zhou et al.7 proposed the pedigree parental-asymmetry test (PPAT) and Monte Carlo (MC) PPAT (MCPPAT) based on general pedigree data to test for parent-of-origin effects. PPAT is generally suitable for complete pedigrees (without any missing genotypes), while MCPPAT can deal with pedigree data with missing genotypes (incomplete pedigrees) based on MC sampling and estimation procedures, which can recapture most of the missing genotypes based on the observed genotypes in the pedigrees. On the other hand, in the literature, some parametric approaches have been developed to test for imprinting. Yang and Lin8, 9 proposed two likelihood methods for simultaneously detecting imprinting and maternally mediated effects in pedigrees. Lin et al.10 and Ainsworth et al.11 suggested two multi-locus likelihood approaches for assessing parent-of-origin effects by using mothers and their offspring.

Currently, there has been an increasing interest of incorporating imprinting into association analysis. For a diallelic marker locus, Xia et al.12 developed the transmission disequilibrium test with imprinting (TDTI*) for qualitative traits, which takes imprinting effects into account. Xia et al.13 further extended TDTI* to tackle quantitative traits and proposed the Q-C-TDTI test statistic. However, both TDTI* and Q-C-TDTI only accommodate two-generation nuclear families and thus are not suitable for extended pedigrees. On the other hand, the pedigree disequilibrium test (PDT) based on general pedigree data and its extension MCPDT based on MC sampling and estimation for dealing with missing genotypes are simple and powerful tests for association in the presence of linkage.14, 15 However, PDT and MCPDT do not take into consideration the information on imprinting effects in the analysis, which may reduce their test powers when the effects are present.

In this article, we first extend PDT to take imprinting effects into account and propose two new test statistics PDTm and PDTp, the maternal and paternal versions of PDT, based on complete pedigrees for qualitative traits. Then we suggest the PDTI statistic, which is derived in a two-stage test process for detecting association in the presence of linkage. In the first stage, we use the existing imprinting test PPAT to detect if there is paternal, maternal or no imprinting effects at the marker locus under study. In the second stage, we test for association by selecting an appropriate statistic among the three test statistics PDTm, PDT and PDTp according to the findings of PPAT. To make full use of pedigrees with some missing genotypes, we further extend PDTI and develop the Monte Carlo PDTI (MCPDTI) statistic based on MC sampling and estimation, which utilizes MCPPAT in the first stage to test for imprinting and then conducts association analysis in the second stage by choosing a suitable statistic among the three test statistics MCPDTm (maternal version of MCPDT), MCPDT and MCPDTp (paternal version of MCPDT) according to the findings of MCPPAT. As MC sampling and estimation require marker allele frequencies, so we assume that the population under study is homogeneous for MCPDTI. Simulation study under various settings is conducted to evaluate the validity and performance of the proposed tests. Simulation results show that PDTI and MCPDTI control the size well under the null hypothesis of no association. When the Hardy–Weinberg equilibrium in the population is not severely violated, MCPDTI still controls the type I error rates well under the null hypothesis of no association from our simulation study. Further, power comparison demonstrates that PDTI and MCPDTI are more powerful than PDT and TDTI* (based on a sample of nuclear families, where we randomly select one from each pedigree) when imprinting effects exist. On the other hand, PDTI and MCPDTI have almost the same performance as PDT in the absence of imprinting, which are still much more powerful than TDTI*.

Materials and methods

Background and notations

Suppose that we collect a sample of N pedigrees with the ith pedigree having ni nuclear families, i=1, 2, …, N. For the jth nuclear family in the ith pedigree, assume that there are n1ij case-parent trios (consisting of an affected child and his/her parents) and n2ij discordant sib pairs (DSPs, being composed of one affected and one unaffected sibs), i=1, 2, …, N; j=1, 2, …, ni. Consider a diallelic marker locus with two alleles M1 and M2. Let Fij, Mij and Cijk denote the number of copies of M1 for the father, the mother and the kth child (affected or unaffected) in the jth nuclear family of the ith pedigree, respectively. As such, Fij, Mij and Cijk take possible values of 0, 1 or 2 for genotypes M2M2, M1M2 or M1M1. We also consider a disease susceptibility locus (DSL) with disease allele D and normal allele d. To distinguish the paternal and maternal alleles, we introduce four ordered genotypes D/D, D/d, d/D and d/d at the DSL, of which the allele before the slash is paternal and the allele after the slash is maternal. Further, the corresponding penetrances are denoted by f2, f10, f01 and f0, respectively. When the Mendel’s law holds, f10=f01, which means that there is no parent-of-origin effects; otherwise, there may be parent-of-origin effects at the DSL. The degree of imprinting is defined as I=(f10f01)/2, which is used to measure the imprinting effects. If association exists, the marker allele M1 is assumed to be in positive linkage disequilibrium (LD) with the disease. To this end, I>0 (I<0) suggests the presence of maternal (paternal) imprinting effect.16 Further, we assume mating symmetry in the parental generation,17, 18 that the population under study is homogeneous and that there are no maternally mediated effects at the DSL.

Existing PDT and PPAT statistics for complete pedigree data

We begin by describing the existing PDT (for testing for association) and PPAT (for the detection of parent-of-origin effects) statistics. For the jth nuclear family in the ith pedigree, let and , i=1, 2, …, N; j=1, 2, …, ni, where Tijk and NTijk represent the numbers of M1 being transmitted and not being transmitted from the heterozygous parents in each case-parent trio, respectively; Aijl and Uijl denote the numbers of M1 in the affected and unaffected sibs for each DSP, respectively; I{comparison statement} is 1 if the comparison statement is true and is 0 otherwise. Taking the summation of the above statistics over all the nuclear families and all the pedigrees, we obtain and , and then the PDT and PPAT statistics can be formulated as , where and are the unbiased estimates of the variances of s and sI under the null hypotheses of no association and no imprinting, respectively.7, 15

PDTI for complete pedigree data

PDT is a powerful method for detecting association. However, it does not take into account the information on parent-of-origin effects in the analysis, which may suffer from much power loss when imprinting effects are present. To this end, following the idea of constructing TDTI* for nuclear families,12 we propose the PDTI statistic for complete pedigrees in the following, which is based on the existing PDT, and two statistics PDTm and PDTp proposed in this article, to incorporate parent-of-origin effects into association study. The statistics PDTm and PDTp, the maternal and paternal versions of PDT, can be obtained by replacing sij involved in PDT by smij and spij, respectively, where

Tmijk and NTmijk (Tpijk and NTpijk) are the numbers of M1 being transmitted and not being transmitted from the heterozygous mother (father), respectively; Amijl and Umijl (Apijl and Upijl) are the numbers of the maternal (paternal) M1 allele in the affected and unaffected sibs for a DSP, respectively. The calculation of TmijkNTmijk (TpijkNTpijk) is direct, while the calculation of AmijlUmijl (ApijlUpijl) is a little more complicated as we need to determine which allele in each member of a DSP is maternal and which one is paternal. To do so, we need to simultaneously consider two members in each DSP together with their parents.

For easy description here, we ignore the subscripts for indexing the DSP, nuclear family and pedigree and simply use AU, AmUm and ApUp to replace AijlUijl, AmijlUmijl and ApijlUpijl, respectively. Note that AU=0 involved in PDT when two members in a DSP have the same genotypes, which does not provide information to association. As such, AmUm=0 and ApUp=0 under this situation. Furthermore, if the mating type is M1M2 × M1M2, and one of two members in a DSP is homozygous and the other is heterozygous (for example, the affected and unaffected sibs, respectively, have genotypes CA=M1M1 and CU=M1M2), then the allele M1 in the unaffected sib may come from either the father or the mother. Thus, we assume that both AmUm and ApUp are 0.5 for this situation, following the idea from Li et al.19 Similarly, both AmUm and ApUp are taken to be 0.5 for CA=M1M2 and CU=M2M2, −0.5 for CA=M1M2 and CU=M1M1 and CA=M2M2 and CU=M1M2. Table 1 shows the values of AmUm and ApUp based on all the 29 possible genotype combinations of a DSP and their parents, together with the corresponding conditional joint probabilities wr’s given that two children are a DSP, r=1,2,…,29. It is shown in Supplementary Information that E(AmUm)=0 and E(ApUp)=0 under the null hypothesis of no association.

Table 1 Values of AmUm and ApUp for all the 29 possible genotype combinations of a DSP and their parents, together with the corresponding conditional joint probabilities wr’s given that two children are a DSP

Let and . Thus, E(sm)=0 and E(sp)=0 under the null hypothesis of no association. As such, the following PDTm and PDTp statistics are valid in testing for association

where and are, respectively, the unbiased estimates of the variances of sm and sp under the null hypothesis of no association, which have taken into account the dependencies among all the case-parent trios, the dependencies among all the DSPs and the dependencies between the case-parent trios and the DSPs within each pedigree (see Supplementary Information). As can be seen from the variance estimates, both the PDTm and PDTp statistics asymptotically follow a standard normal distribution under the null hypothesis of no association when the sample size is large.

Based on PDTm, PDT and PDTp, the proposed PDTI statistic is formulated as

where αI is the prespecified significance level for the imprinting test PPAT, and zαI/2 is the upper quantile of the standard normal distribution corresponding to the probability of αI/2. The formula of the PDTI statistic demonstrates that, in the presence of paternal (maternal) imprinting effect, the PDTm (PDTp) statistic would be chosen to conduct the detection of association. Otherwise, PDT will be used to test for the association.

In the absence of association, the PDTm, PDT, PDTp and PPAT statistics asymptotically follow a standard normal distribution N(0, 1), respectively. However, they are not independent, and the covariance between PDTm and PPAT under the null hypothesis of no association can be estimated as

where , which is the unbiased estimate of the covariance between sm and sI under the null hypothesis of no association (see Supplementary Information). Also, the covariance between PDT and PPAT, and that between PDTp and PPAT, can be estimated in a similar way. Denote the estimates of the 2 × 2 variance–covariance matrices of the statistics (PDTm, PPAT), (PDT, PPAT) and (PDTp, PPAT) by and , respectively. Under the null hypothesis H0 of no association, the asymptotic distributions of these three pairs of statistics are bivariate normal distributions with zero mean vectors and the variance–covariance matrices given above. Let t be the observed value of PDTI and denote the P-value of the test for association by P=PH0(PDTI>|t|). Then, based on the asymptotic bivariate normal distributions, we obtain

where R(y)={y:−∞<y<−|t|}{y:|t|<y<∞}. Moreover, we have also developed a method on how to find the critical value of PDTI, which can be used to determine whether or not to reject the null hypothesis (see Supplementary Information).

Although we introduce the PDTI method under the assumption that M1 is in positive LD with the disease allele D, PDTI remains valid for the negative LD between M1 and D. Note that if M1 is in positive LD with D, the positive (negative) values of PPAT indicate maternal (paternal) imprinting effect. Conversely, if M1 is in negative LD with D (that is, M2 is positively associated with D), the positive (negative) values of PPAT show paternal (maternal) imprinting effect. However, similar to TDTI*, the PDTI statistic keeps unchanged irrespective of the positive or negative LD, which signifies the validity of PDTI for practical use (the proof omitted for brevity).

MCPDTI for incomplete pedigree data

In the case in which there are missing genotypes in pedigrees, we extend PDTI and propose MCPDTI by using MC sampling, which may recapture most information on missing genotypes based on the observed genotypes. The existing MCPDT and MCPPAT statistics are the MC versions of PDT and PPAT, respectively. Now, we turn to introduce the MCPDTm and MCPDTp statistics, the MC versions of PDTm and PDTp, respectively.

For easy description here, we delete the subscripts for indexing the nuclear family and pedigree. Let sm denote the contribution from a pedigree to the PDTm statistic in Equation (1). Let Go and Gu be the sets of the observed and unobserved (missing) genotypes in the pedigree, respectively, and S be the observed phenotypes (disease affection statuses). For the case with missing data, sm is unavailable, while its conditional expectation given Go can be estimated by smMC,7, 14 which is based on the MC simulation scheme and can be formulated as follows, , where Gux’s are independent samples randomly drawn from P(Gu|Go), x=1,2,…,X, which can be efficiently accomplished based on the peeling algorithm using the SLINK software.20 Subsequently, MCPDTm can be calculated by using Equation (1) with sm replaced by smMC. Similarly, MCPDTp can be given by replacing sp by spMC.

Note that if all the pedigrees are assumed to be drawn from a certain underlying population and S is treated as random, the expectations of smMC and spMC are 0 under the null hypothesis of no association. The idea of the corresponding proof is similar to Zhou et al.7 and Ding et al.,14 and hence we omit it here for brevity. As such, the MCPDTI statistic can be formulated by

Obviously, MCPDTI shares some common properties with PDTI. First of all, MCPDTI is not influenced by positive or negative LD between M1 and D. Next, MCPPAT and three MCPDT-type statistics are not independent. Hence, to clarify the asymptotic null distribution of MCPDTI, the derivation of their covariances among them is required. On the other hand, note that the MC scheme relies on marker allele frequencies. This means that MCPDTI needs the assumption of the population under study being homogeneous. The derivation of the critical value of MCPDTI is similar to that of PDTI (omitted here for brevity).

Results

Settings

In this section, a simulation study is conducted to investigate the performance of the proposed PDTI and MCPDTI statistics and to make the power comparison of PDTI and MCPDTI with the existing PDT and TDTI*. A homogeneous population is considered. We simulate one marker and one DSL with the recombination fraction being fixed at 0.001. Three sets of haplotype frequencies for haplotypes DM1, dM1, DM2 and dM2 are used to simulate the test powers: {0.13, 0.02, 0.12, 0.73}, {0.22, 0.03, 0.03, 0.72} and {0.23, 0.12, 0.02, 0.63}, with the allele frequencies of M1 being taken as 0.15, 0.25 and 0.35, respectively. The allele frequency of disease allele D is fixed at 0.25. For studying the empirical size of the proposed test statistics under the null hypothesis of no association, four haplotype frequencies for haplotypes DM1, dM1, DM2 and dM2 are just taken as the products of two allele frequencies on each haplotype, respectively. For example, if the allele frequency of M1 is 0.15, then the frequency of haplotype DM1 is 0.25 × 0.15=0.0375. However, the recombination fraction between the marker and the DSL is still fixed to be 0.001. Further, note that no association means no parent-of-origin effects. To this end, we simulate the empirical size of the proposed test statistics under no association and no parent-of-origin effect model (see below).

We consider three sets of two homozygote penetrances f2 and f0, {0.39, 0.26}, {0.44, 0.24} and {0.48, 0.22}, which are very similar to those in Ding et al.14 For each set of homozygote penetrances, we further consider three parent-of-origin effect models by setting the various values of f10 and f01: no parent-of-origin effect model, incomplete parent-of-origin effect model, and complete parent-of-origin effect model. Therefore, for each parent-of-origin-effect model, three sets of haplotype frequencies and three sets of homozygote penetrances lead to nine simulation settings, which are listed in Table 2.

Table 2 Simulation settingsa

Figure 1 gives three types of common pedigree structures used in our simulation study: (a) two-generation family with 5 individuals, (b) three-generation family with 10 individuals, and (c) four-generation family with 12 individuals. To investigate the performance of the proposed tests under different sample sizes, 50 and 100 pedigrees under each pedigree structure are simulated in each replicate, which leads to the total sample size being 150 and 300, respectively. Note that here a pedigree was ascertained if at least one of the nonfounders was affected. For MCPDTI, 50 MC samples of missing genotypes are generated for each replicate using the SLINK software.20 Both the true marker allele frequencies or those estimated from the genotyped founders in each replicate are used in the MC sampling where applicable.

Figure 1
figure 1

Pedigree structure used for the simulation studies. (a) Nuclear family; (b) three-generation family; (c) four-generation family. Genotypes of individual 1 in nuclear family, individuals 1, 4 and 5 in three-generation family and individuals 1 and 3 in four-generation family are assumed to be missing for the analysis based on incomplete data.

To evaluate the performance of our proposed tests and compare PDTI and MCPDTI with the existing PDT and TDTI* tests, we consider the following five statistics: PDTI, PDT, TDTI*, MCPDTIT, and MCPDTIE. The first three statistics PDTI, PDT and TDTI* are obtained based on complete pedigree data without any missing genotypes. PDTI can be regarded as a golden standard for the association study incorporating parent-of-origin effects. Note that PDT does not take account of parent-of-origin effects, and hence its powers will not be influenced by different degrees of imprinting (that is, different parent-of-origin effect models). As such, here PDT can be considered as a reference line. Note that TDTI* can only accommodate two-generation nuclear family data. So, in our simulation, TDTI* is based on a sample of nuclear families, where only one nuclear family is randomly chosen from a pedigree. Further, MCPDTIT and MCPDTIE are on the basis of incomplete pedigree data with genotypes of individual 1 in two-generation families, individuals 1, 4 and 5 in three-generation pedigrees and individuals 1 and 3 in four-generation pedigrees being removed from complete pedigree data; MCPDTIT and MCPDTIE are based on the true and estimated marker allele frequencies, respectively.

Ten thousand replicates under each simulation setting mentioned above are simulated to assess the empirical size and powers of the proposed tests at the significance levels of 5% and 1%. The corresponding significance level of the imprinting test in the first stage is taken to be the same as the association test in the second stage. The size and power results at the 1% significance level can be seen in Supplementary Table S1 and Supplementary Figure S1, respectively.

To have a better understanding of the performance of PDTI and MCPDTI statistics, we conduct two additional simulation studies as shown in Supplementary Information. One is based on the settings with the less frequency of disease allele D (being fixed at 0.1) and the family sample consisting of 150 and 200 pedigrees under each pedigree structure, which leads to 450 and 600 pedigrees in total, respectively. For detailed simulation settings, see Supplementary Table S2. The other simulation study is based on the settings with the frequency of disease allele D being taken as 0.1, 0.15 and 0.2 under Hardy–Weinberg disequilibrium at a candidate locus, where the inbreeding coefficient ρ is taken to be 0.03 and 0.05. The family sample in each replicate consists of 100 pedigrees under each pedigree structure, where the number of pedigrees is 300 in total. The detailed simulation settings can be found in Supplementary Tables S3 and S4. The simulation results are shown in Supplementary Tables S5–S8 and Supplementary Figures S2–S5.

Size and powers of PDTI, PDT, TDTI* and MCPDTI

Table 3 displays the empirical type I error rates of PDTI, PDT, TDTI* and MCPDTI under nine simulation settings based on 150 and 300 pedigrees at the nominal 5% significance level, respectively. For 10 000 simulation replicates, the s.e. for the type I error rate estimate is (0.05 × 0.95/10 000)1/2=2.18 × 10−3 when the nominal level is fixed at 5%. Note that in Table 3, there are altogether 54 observed type I error rates corresponding to the proposed tests (PDTI, MCPDTIT and MCPDTIE), and 2 out of the 54 are greater than the upper limit of the interval (0.04564, 0.05436), that is, 0.05±2 × s.e. Thus, generally speaking, the tests maintain the type I error rates satisfactorily.

Table 3 Type I error rates (in percentage (%)) of PDTI, PDT, TDTI* and MCPDTI under nine simulation settings based on 150 and 300 pedigrees at nominal 5% significance level, respectively

Figure 2 gives the simulated powers of PDTI, MCPDTIT, MCPDTIE, PDT and TDTI* under nine simulation settings based on 150 and 300 pedigrees at the 5% significance level for no, incomplete and complete parent-of-origin effect models, respectively. We can see from each subplot of the figure that MCPDTI, by the MC sampling scheme, can recapture most information on the missing genotypes and has similar performance to PDTI based on complete pedigree data. MCPDTI based on the estimated marker allele frequencies (MCPDTIE) almost has the same power as that based on the true marker allele frequencies (MCPDTIT). When there are no parent-of-origin effects, with the haplotype frequencies, penetrances and sample size being fixed (for each simulation setting in Figures 2a and d), the powers of PDTI, MCPDTIT and MCPDTIE are very close to that of PDT and are all much higher than that of TDTI*, which only accommodates nuclear family data. However, for each simulation setting with incomplete or complete parent-of-origin effects (Figures 2b, c, e or f), PDTI, MCPDTIT and MCPDTIE are all more powerful than both PDT and TDTI*. The power of PDT based on pedigrees is much larger than TDTI*, even though PDT does not take account of imprinting but TDTI* does.

Figure 2
figure 2

Powers of PDTI, MCPDTI, PDT and TDTI* under nine simulation settings at the 5% significance level. (a) One hundred and fifty pedigrees and no parent-of-origin effect model; (b) 150 pedigrees and incomplete parent-of-origin effect model; (c) 150 pedigrees and complete parent-of-origin effect model; (d) 300 pedigrees and no parent-of-origin effect model; (e) 300 pedigrees and incomplete parent-of-origin effect model; (f) 300 pedigrees and complete parent-of-origin effect model.

Furthermore, when the sample size is fixed for no, incomplete and complete parent-of-origin effect models (among Figures 2a, b and c or among Figures 2d, e and f), the power of PDT (the reference base) is almost not affected by the degree of imprinting. However, PDTI, MCPDTI and TDTI* under the incomplete and complete parent-of-origin effect models are more powerful, compared with no parent-of-origin effect model. The power difference between our proposed test PDTI/MCPDTIT/MCPDTIE and PDT is larger and the gap in power between PDT and TDTI* is smaller, by comparing the power results under the complete parent-of-origin effect model with the incomplete model. The powers based on 300 pedigrees are higher than those based on 150 pedigrees (Figure 2; d vs a, e vs b and f vs c), and the corresponding power improvement of PDTI and MCPDTI is more significant compared with that of PDT, as expected.

It is also shown in each subplot of Figure 2 that, with the sample size and the penetrances being fixed, the powers of all the five test statistics become larger when the simulation settings change from settings 1 to 3 (settings 4 to 6 or settings 7 to 9), due to the increase of the LD coefficient for three sets of haplotype frequencies. As for settings 1, 4 and 7 (or settings 2, 5 and 8/settings 3, 6 and 9), which are based on the same set of haplotype frequencies but different relative risks of two homozygote penetrances (f2/f0=1.50, 1.83 and 2.18), all the statistics have higher and higher power.

Discussion

In this article, we have extended PDT and MCPDT to PDTI and MCPDTI to incorporate parent-of-origin effects for association analysis based on general pedigree data. Simulation study is carried out to investigate the validity and performance of the proposed PDTI and MCPDTI tests and to compare PDTI and MCPDTI with the existing PDT and TDTI* for various sets of haplotype frequencies, parent-of-origin effect models, significance levels and sample sizes. Simulation results show that the proposed PDTI and MCPDTI control the size well under the null hypothesis of no association.

As for power comparison of PDTI and MCPDTI with PDT (ignoring the information on imprinting effects) and TDTI* (only accommodating two-generation nuclear families), under no parent-of-origin effect model, PDTI and MCPDTI have similar performance, which are comparable to PDT. Under incomplete parent-of-origin effect model, PDTI and MCPDTI, by using the information on the parent-of-origin effects and the extended pedigree data, outperform PDT and TDTI*. Note that in our simulation study, MCPDTI for incomplete pedigree data is even more powerful than PDT for complete pedigree data, although the power improvement is not so significant. However, as the degree of imprinting increases, under the complete parent-of-origin effect model, the powers of PDTI and MCPDTI are much higher than that of PDT and TDTI*. Moreover, with the sample size increasing, the improvement in power of our proposed methods over PDT is more distinct. Even though the frequency of the disease allele is smaller (being taken as 0.1), our methods are still more powerful than PDT and TDTI* in the presence of imprinting effects (see Supplementary Information). On the other hand, MCPDTI can recapture most of the missing genotypes based on the observed genotypes by the MC sampling and estimation. Our software MCPDTI is freely available at http://www.echobelt.org/web/UploadFiles/MCPDTI.html, which is implemented in R (http://www.r-project.org/).

In the calculation process of the proposed PDTI and MCPDTI test statistics, we first divide each extended pedigree in the collected sample into several nuclear families (two-generation families). For each nuclear family, PDTI and MCPDTI will use all the case-parent trios and all the DSPs in it. As such, the alleles of both parents will repeatedly be used in different case-parent trios, the alleles of some affected children may repeatedly be used in both case-parent trios and DSPs and the alleles of some unaffected children may repeatedly be utilized in different DSPs. Further, for different nuclear families from this extended pedigree, there may also be the repeated use of the alleles of some family members who can be the children of the elder generation and the parents of the younger generation. Thus the repeated use of these family members mentioned above will lead to the dependencies among the case-parent trios, the dependencies among the DSPs and the dependencies between the case-parent trios and the DSPs within each pedigree. However, it should be emphasized that PDTI and MCPDTI have taken account of all the dependencies (see Supplementary Information).

Although our simulation study is conducted under settings with positive LD coefficient between marker allele M1 and disease allele D, PDTI and MCPDTI are still suitable for situations where M1 is in negative LD with D. On the other hand, like other methods, the proposed PDTI and MCPDTI methods have their own limitations. PDTI is not applicable to the situation where there are some genotypes missing, and thus we recommend MCPDTI for this case to conduct association analysis. Moreover, MCPDTI needs the assumption of the underlying population being homogeneous. However, it still controls the size well if the violation of Hardy–Weinberg equilibrium is not so severe in the population (see Supplementary Information). In addition, although our simulation study is conducted by fixing the recombination fraction at 0.001, the proposed PDTI and MCPDTI are still more powerful than the existing PDT and TDTI* for the complete LD case and when the recombination fraction is 0 (see Supplementary Tables S9 and S10). Finally, we should mention that, in the data collection process of our simulation study, the requirement for a pedigree to be included is at least one affected nonfounder, while it is not necessary to require that each nuclear family within a large pedigree has at least one affected child. In practice, the pedigrees with more affected individuals are more likely to be recruited. This ascertainment bias due to the difference in the number of affected individuals in pedigree is not considered in the present study. How the ascertainment bias affects the proposed test statistics is our future work.