Test of rare variant association based on affected sib-pairs

Sha, Qiuying; Zhang, Shuanglin

doi:10.1038/ejhg.2014.43

Published: 26 March 2014

European Journal of Human Genetics volume 23, pages 229–237 (2015)

Abstract

With the development of sequencing techniques, there is increasing interest in detecting associations between rare variants and complex traits. Quite a few statistical methods for this purpose have been developed for unrelated individuals, whereas methods for family-based designs have received much less attention. Recent studies show that rare disease variants will be enriched in family data and thus family-based designs may improve power to detect rare variant associations. In this article, we propose a novel test of association between the optimally weighted combination of variants and the trait of interest for affected sib-pairs. The optimal weights are analytically derived and can be calculated from sampled genotypes and phenotypes. Based on the optimal weights, the proposed method is robust to the directions of the effects of causal variants and is less affected by neutral variants than existing methods are. Our simulation results show that, in all the cases, the proposed method is substantially more powerful than existing methods based on unrelated individuals and existing methods based on affected sib-pairs.

Introduction

Recent studies show that the large number of disease-associated variants identified through genome-wide association studies accounts for only a small portion of the presumed phenotypic variation.¹ One of the potential sources of missing heritability is the contribution of rare variants.^{2, 3, 4, 5, 6, 7} Recent advances in sequencing technology have made directly testing rare variants possible.^{8, 9} Therefore, there is increasing interest in detecting associations between rare variants and complex traits.

Recently, several statistical methods to detect associations between rare variants and complex traits have been developed for unrelated individuals. These methods can be roughly divided into three groups: burden tests, quadratic tests, and combined tests. Burden tests include the cohort allelic sums test,¹⁰ the combined multivariate and collapsing method,¹¹ the weighted sum statistic (WSS),¹² the variable minor allele frequency (MAF) threshold method,¹³ and the cumulative minor-allele test¹⁴ among others. Burden tests implicitly assume that all the rare variants are causal and the directions of the effects are all the same. If these assumptions are true, burden tests can be powerful tests; otherwise, burden tests can perform poorly.^{15, 16, 17, 18} Quadratic tests include C-alpha test,¹⁹ sequence kernel association test,¹⁵ and the test for Testing the effects of the Optimally Weighted combination of variants (TOW).¹⁷ Quadratic tests also include adaptive weighting methods^{20, 21, 22, 23, 24} since, as pointed out by Derkach et al,¹⁸ adaptive weighting methods are operationally similar to quadratic tests. Quadratic tests are robust to the directions of the effects of causal variants and are less affected by neutral variants than burden tests are. If most of the rare variants are causal and the directions of the effects of causal variants are all the same, then burden tests can outperform quadratic tests; otherwise, quadratic tests perform better. To increase the robustness of a test, Derkach et al and Lee et al proposed combined tests that combine information from burden and quadratic tests aiming to have advantages of both burden and quadratic tests.^{16, 18}

All of the aforementioned methods are for unrelated individuals. For any type of study design, the statistical power will be improved if rare variants can be enriched in the samples. If one parent has a copy of a rare allele, half of the offspring are expected to carry it, and hence, variants that are rare in the general population could be very common in certain families.²⁵ Therefore, family-based designs may have an important role in rare variant association studies. More recently, a couple of family-based rare variant association methods for quantitative traits^{26, 27} and for qualitative traits^{28, 29} have been developed.

In this article, based on affected sib-pair data, we propose a test for Testing the effects of the Optimally Weighted combination of variants (TOW-sib). TOW-sib is based on the score test for testing the optimally weighted combination of variants derived from the retrospective likelihood of affected sib-pairs, unrelated controls, and possible unrelated cases. The optimal weights are analytically derived and can be calculated from sampled genotypes and phenotypes. Based on the optimal weights, TOW-sib is robust to the directions of the effects of causal variants and is less affected by neutral variants than existing tests are. We use extensive simulation studies to compare the performance of the proposed method with that of existing methods based on unrelated individuals^{12, 17} and existing methods based on affected sib-pairs.²⁸ Our simulation results show that, in all the cases, the proposed method is substantially more powerful than existing methods based on either unrelated individuals or affected sib-pairs.

Materials and Methods

Consider a sample of n_s affected sib-pairs, n_a unrelated cases, and n_c unrelated controls. Each individual has been genotyped at M variants in a genomic region. Denote g_ji=(g_ji1,…,g_jiM)^T, g_ai=(g_ai1,…,g_aiM)^T, and g_ci=(g_ci1,…,g_ciM)^T as the genotypes of the j^th individual (j=1,2) in the i^th sib-pair, the i^th case, and the i^th control, respectively, where g_jim, g_aim, g_cim∈{0,1,2} are the numbers of minor alleles at the m^th variant. Let x_ji=Σ_m w_m g_jim, x_ai=Σ_m w_m g_aim, and x_ci=Σ_m w_m g_cim denote the weighted combinations of genotypic scores at the M variants for the j^th individual in the i^th sib-pair, the i^th case, and the i^th control, respectively, where w=(w₁,…,w_M) are weights whose values will be decided later. Denote the disease status of an individual by D, with D=0 indicating an unaffected individual and D=1 indicating an affected individual.

The retrospective likelihood is given by

where g₁* and g₂* range over all possible genotype pairs for a sib-pair and g* ranges over all possible genotypes for an individual. Choose g₀=(0,…,0) as the baseline genotype. Let r(g) be the relative risk of genotype g with respect to the baseline genotype. Following Schaid,³⁰ we use a log-linear model for the relative risk, ie, r(g)=e^{xβ}, with x representing the combination of genotypic scores of the genotype g. Denote the risk of an individual with the baseline genotype as Pr(D=1|g₀)=e^α. Then, the retrospective likelihood is given by

where x₁* and x₂* represent the combinations of genotypic scores of the genotypes g₁* and g₂*, respectively, and x* represents the combination of genotypic scores of the genotype g*.
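As a hedged sketch, the two forms of the retrospective likelihood can be written out from the definitions above. The reconstruction assumes that the disease statuses of the two sibs are conditionally independent given their genotypes and that Pr(D=1|g)=e^{α+xβ}; the layout and indexing are assumptions and may differ from the published display.

```latex
\begin{align*}
L &= \prod_{i=1}^{n_s}
      \frac{\Pr(D=1\mid g_{1i})\,\Pr(D=1\mid g_{2i})\,\Pr(g_{1i},g_{2i})}
           {\sum_{g_1^*,g_2^*}\Pr(D=1\mid g_1^*)\,\Pr(D=1\mid g_2^*)\,\Pr(g_1^*,g_2^*)}
      \;\prod_{i=1}^{n_a}
      \frac{\Pr(D=1\mid g_{ai})\,\Pr(g_{ai})}
           {\sum_{g^*}\Pr(D=1\mid g^*)\,\Pr(g^*)}
      \;\prod_{i=1}^{n_c}
      \frac{\Pr(D=0\mid g_{ci})\,\Pr(g_{ci})}
           {\sum_{g^*}\Pr(D=0\mid g^*)\,\Pr(g^*)} \\[4pt]
  &= \prod_{i=1}^{n_s}
      \frac{e^{(x_{1i}+x_{2i})\beta}\,\Pr(g_{1i},g_{2i})}
           {\sum_{g_1^*,g_2^*} e^{(x_1^*+x_2^*)\beta}\,\Pr(g_1^*,g_2^*)}
      \;\prod_{i=1}^{n_a}
      \frac{e^{x_{ai}\beta}\,\Pr(g_{ai})}
           {\sum_{g^*} e^{x^*\beta}\,\Pr(g^*)}
      \;\prod_{i=1}^{n_c}
      \frac{\bigl(1-e^{\alpha+x_{ci}\beta}\bigr)\,\Pr(g_{ci})}
           {\sum_{g^*}\bigl(1-e^{\alpha+x^*\beta}\bigr)\,\Pr(g^*)}
\end{align*}
```

In this form the factor e^α cancels in the sib-pair and case terms and survives only in the control term. At β=0 every factor reduces to a genotype probability, so α drops out of the null likelihood altogether.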

In Appendix A, we have shown that, under the assumption that the M variants are independent (our proposed test is still valid if this assumption is not true), the score test statistic to test the null hypothesis H₀:β=0 is given by

where p̂_m and â are the maximum likelihood estimates (MLEs) of p_m and a under the null hypothesis, p_m is the MAF at the m^th variant, and a=e^α/(1−e^α). Under the null hypothesis, the likelihood function becomes

Based on L₀, p̂_m has no explicit expression. Using the joint distribution of genotypes of a sib-pair given in Table 1, we can construct an expectation-maximization algorithm to calculate p̂_m (see Appendix B). We cannot estimate α based on L₀, because L₀ does not contain α. We propose to estimate α based on the full likelihood function

Table 1 The joint distribution of genotypes of a sib-pair
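The joint distribution referred to in this caption can be computed directly. The sketch below is not the published table; it assumes Hardy-Weinberg proportions, random mating, and Mendelian transmission, and enumerates the four parental alleles and the four transmissions. The function names are illustrative.

```python
from itertools import product


def sib_pair_joint(p):
    """Joint distribution joint[g1][g2] of minor-allele counts for two full sibs.

    Assumes parental alleles are independent draws with minor allele
    frequency p and that each child receives one random allele per parent.
    """
    q = 1.0 - p
    joint = [[0.0] * 3 for _ in range(3)]
    for father in product((0, 1), repeat=2):        # 1 = minor allele
        for mother in product((0, 1), repeat=2):
            pr_parents = 1.0
            for allele in father + mother:
                pr_parents *= p if allele else q
            # Indices of the alleles transmitted to child 1 and child 2.
            for f1, m1, f2, m2 in product((0, 1), repeat=4):
                g1 = father[f1] + mother[m1]
                g2 = father[f2] + mother[m2]
                joint[g1][g2] += pr_parents / 16.0
    return joint


def central_moment(joint, p, power):
    """E[(g1 + g2 - 4p)^power] under the joint distribution."""
    return sum(joint[g1][g2] * (g1 + g2 - 4.0 * p) ** power
               for g1 in range(3) for g2 in range(3))


if __name__ == "__main__":
    for row in sib_pair_joint(0.1):
        print(["%.6f" % value for value in row])
```

Each sib's marginal distribution is the Hardy-Weinberg distribution, and the second and fourth central moments of g₁+g₂ computed this way agree with the expressions 6pq and 6pq(pq+3) quoted in Appendix C.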

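Based on L_full, the MLE of a under the null hypothesis is â=(2n_s+n_a)/n_c. Using this estimate of a, U can be written as U=w^T u, where u=(u₁,…,u_M)^T with u_m=Σ_i(g_1im+g_2im)+Σ_i g_aim−âΣ_i g_cim. Let N=6n_s+2n_a+2n_câ², let v denote the estimated variance matrix of u under the null hypothesis, whose m^th diagonal element is N p̂_m(1−p̂_m), and let w=(w₁,…,w_M)^T. Then, the score test statistic is a function of the weights, T(w₁,…,w_M)=(w^T u)²/(w^T v w).

T(w₁,…,w_M) reaches its maximum when w=v⁻¹u. We define the statistic of the test for Testing the effect of an Optimally Weighted combination of variants for sib-pair data (TOW-sib), T_TOW−sib, as the statistic obtained with these optimal weights.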
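The maximizer w=v⁻¹u can be checked numerically. The sketch below assumes the quotient form T(w)=(w^T u)²/(w^T v w) with a diagonal v, that is, the independent-variants case. This form is consistent with the stated maximizer but is a reconstruction, and the values of u and v are made up for illustration.

```python
def score_stat(w, u, v_diag):
    """T(w) = (w'u)^2 / (w'vw) for a diagonal variance matrix v."""
    numerator = sum(wm * um for wm, um in zip(w, u)) ** 2
    denominator = sum(wm * wm * vm for wm, vm in zip(w, v_diag))
    return numerator / denominator


def optimal_weights(u, v_diag):
    """w = v^{-1} u, which reduces to u_m / v_m when v is diagonal."""
    return [um / vm for um, vm in zip(u, v_diag)]


# Hypothetical per-variant scores and variances; the second variant is protective.
u = [3.0, -1.5, 0.5]
v_diag = [2.0, 1.0, 0.25]

w_opt = optimal_weights(u, v_diag)
t_max = score_stat(w_opt, u, v_diag)               # equals sum(u_m^2 / v_m)
t_burden = score_stat([1.0, 1.0, 1.0], u, v_diag)  # equal weights, burden-style
```

With equal weights the protective variant cancels part of the signal from the risk variants, whereas the optimal weights take the sign of each u_m, which is one way to see why the resulting statistic is robust to the directions of the effects.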

We use a special permutation test to evaluate P-values of TOW-sib. Each permutation consists of the following steps: (1) Permute the multi-variant genotypes of the first sib in each sib-pair, the unrelated cases, and the unrelated controls, g_1i, g_ai, and g_ci, to obtain the permuted genotypes g̃_1i, g̃_ai, and g̃_ci. (2) In the i^th sib-pair, given g̃_1i, generate g̃_2i variant by variant according to the conditional distribution Pr(g₂|g₁) from Table 1. (3) Calculate T̃_TOW−sib, the value of T_TOW−sib based on the permuted genotypes (g̃_1i, g̃_2i), g̃_ai, and g̃_ci. We generate g̃_2i under the assumption that the M variants are independent. When the M variants are in linkage disequilibrium (LD), T_TOW−sib and T̃_TOW−sib may have different variances, although they have the same mean. To give T_TOW−sib and T̃_TOW−sib the same mean and the same variance, we standardize T_TOW−sib such that T_{TOW−sib−ST}=(T_TOW−sib−μ_TOW−sib)/σ_TOW−sib, where μ_TOW−sib and σ²_TOW−sib are the estimates of the mean and variance of T_TOW−sib (see Appendix C on how to calculate μ_TOW−sib and σ²_TOW−sib). Suppose we perform B permutations. Let T_{TOW−sib−ST}^(b) denote the value of T_{TOW−sib−ST} based on the data of the b^th permutation (b=0 denotes the original data). Then, the P-value of the test is given by the proportion of the B permuted values T_{TOW−sib−ST}^(b), b=1,…,B, that exceed T_{TOW−sib−ST}^(0).
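Step (2) of the permutation scheme can be sketched as follows. The conditional distribution Pr(g₂|g₁) is computed here from identity-by-descent sharing probabilities (1/4, 1/2, 1/4 for 0, 1, 2 alleles) under Hardy-Weinberg proportions, which is an assumed stand-in for Table 1. The function names and the example MAFs are hypothetical.

```python
import random


def hwe(g, p):
    """Hardy-Weinberg probability of carrying g minor alleles."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[g]


def sib_conditional(p):
    """cond[g1][g2] = Pr(second sib has g2 | first sib has g1) at MAF p."""
    q = 1.0 - p
    cond = []
    for g1 in range(3):
        row = []
        for g2 in range(3):
            # One allele shared IBD: the shared allele is a random allele of
            # sib 1, and the other allele of sib 2 is a fresh population draw.
            ibd1 = 0.0
            for shared in (0, 1):
                pr_shared = g1 / 2.0 if shared else 1.0 - g1 / 2.0
                other = g2 - shared
                if other in (0, 1):
                    ibd1 += pr_shared * (p if other else q)
            row.append(0.25 * hwe(g2, p) + 0.5 * ibd1 + 0.25 * (g1 == g2))
        cond.append(row)
    return cond


def draw_second_sib(first_sib, mafs, rng):
    """Generate the second sib variant by variant, treating variants as independent."""
    return [rng.choices((0, 1, 2), weights=sib_conditional(p)[g1])[0]
            for g1, p in zip(first_sib, mafs)]


if __name__ == "__main__":
    rng = random.Random(2014)
    print(draw_second_sib([0, 1, 0, 2], [0.005, 0.2, 0.01, 0.4], rng))
```

Because each variant is drawn independently, LD among variants is not reproduced in the generated second sib, which is why the text standardizes the statistic before comparing it with its permuted counterparts.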

For a simulation study with R replicates, the above procedure is computationally expensive. In our simulation studies, we use the pooling permutation method proposed by Guo and Lin to evaluate P-values.³¹ In the pooling permutation method, permuted samples from all the replicates are pooled together to form a joint sample from the null distribution. Suppose that we have R replicates and we perform B permutations for each replicate. Let T_{TOW−sib−ST}^(b,r) denote the value of T_{TOW−sib−ST} based on the data of the b^th permutation of the r^th replicate (b=0 denotes the original data). Then, the P-value of the test in the r^th replicate is given by the proportion of the B·R pooled permuted values T_{TOW−sib−ST}^(b,r′), b=1,…,B, r′=1,…,R, that exceed T_{TOW−sib−ST}^(0,r).

As the permutation samples are pooled across all replicates to form a sample from the null, B can be set much smaller than when only one sample is analyzed.
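The single-sample and pooled P-value computations can be sketched as below. The strict inequality and the absence of a +1 correction are assumptions, since the displayed formulas are not reproduced here, and the function names are illustrative.

```python
from bisect import bisect_right


def permutation_p_value(t_obs, t_perm):
    """Proportion of permuted statistics strictly larger than the observed one."""
    return sum(t > t_obs for t in t_perm) / len(t_perm)


def pooled_p_values(t_obs_by_replicate, t_perm_by_replicate):
    """P-value for each replicate against the permuted statistics of all replicates.

    t_obs_by_replicate:  list of R observed (standardized) statistics
    t_perm_by_replicate: list of R lists, each holding B permuted statistics
    """
    pooled = sorted(t for perms in t_perm_by_replicate for t in perms)
    n_pooled = len(pooled)
    # bisect_right returns how many pooled values are <= t_obs.
    return [(n_pooled - bisect_right(pooled, t_obs)) / n_pooled
            for t_obs in t_obs_by_replicate]
```

With R=500 replicates and B=20 permutations per replicate, each P-value is computed against 10 000 pooled null values, which is why a small B per replicate can suffice.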

We compare the performance of the proposed method with three existing methods: WSS,¹² sibpair-based weighted sum statistic (SPWSS),²⁸ and TOW.¹⁷ WSS and TOW are based on unrelated cases and controls, whereas SPWSS is based on affected sib-pairs, unrelated cases, and unrelated controls.

Simulation

The empirical Mini-Exome genotype data provided by the Genetic Analysis Workshop 17 (GAW17) are used for simulation studies. This data set contains genotypes of 697 unrelated individuals on 3205 genes. The genotypes of the GAW17 data set are extracted from the sequence alignment files provided by the 1000 Genomes Project for their pilot3 study (http://www.1000genomes.org). We choose four genes: ELAVL4 (gene1), MSH4 (gene2), PDE4B (gene3), and ADAMTS4 (gene4) with 10, 20, 30, and 40 variants, respectively. We merge the four genes to form a super gene (Sgene) with 100 variants, of which 86 are rare (MAF<0.01) and 14 are common (MAF≥0.01). We choose Sgene because the distributions of MAFs in the 100 variants in Sgene and in the 24 487 variants in all the 3205 genes are very similar.¹⁷

In our simulation studies, we generate genotypes based on the genotypes of the 697 individuals in Sgene. We use the program fastPHASE to infer haplotypic phase for the 697 individuals and calculate haplotype frequencies.³² To generate the genotype of an individual, we generate two haplotypes according to the haplotype frequencies. To obtain the genotypes of a family, we first generate genotypes of parents. Then the genotypes of children are generated from parental haplotypes by random transmission.

To generate a qualitative disease affection status, we use a liability threshold model based on a continuous phenotype (quantitative trait). An individual is defined to be affected if the individual’s phenotype is at least one standard deviation larger than the phenotypic mean. This yields a prevalence of 16% for the simulated disease in the general population. In the following, we describe how to generate a quantitative trait.
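The 16% prevalence follows from the normal tail probability beyond one standard deviation. A minimal check, assuming a normally distributed trait (function names are illustrative):

```python
from statistics import NormalDist


def prevalence(threshold_sd=1.0):
    """Pr(trait >= mean + threshold_sd * SD) for a normally distributed trait."""
    return 1.0 - NormalDist().cdf(threshold_sd)


def is_affected(trait, mean=0.0, sd=1.0, threshold_sd=1.0):
    """Liability-threshold rule: affected if at least threshold_sd SDs above the mean."""
    return trait >= mean + threshold_sd * sd


if __name__ == "__main__":
    print(round(prevalence(), 4))
```

The tail probability is about 0.159, which rounds to the 16% stated above.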

Under the null hypothesis, we generate trait values for unrelated individuals according to the standard normal distribution. For a family with m children, let Y₁=(y_F,y_M) and Y₂=(y₁,y₂,…,y_m) denote the trait values of the parents and the m children, respectively. Assume that (Y₁,Y₂) follows a multivariate normal distribution with mean vector zero and variance-covariance matrix Σ with blocks Σ₁₁, Σ₁₂, Σ₂₁, and Σ₂₂, where Σ₁₁=I₂ is the 2×2 identity matrix, Σ₁₂=Σ₂₁^T is the 2×m matrix with all entries equal to ρ, and Σ₂₂ is the m×m matrix with diagonal entries 1 and off-diagonal entries ρ.

This variance-covariance matrix indicates that the parents in each family are independent, and that the correlation coefficient between a parent and a child or between two children is a constant, ρ (in this study, ρ=0.2). To generate trait values of all members in each family, we first generate the trait values of the two parents from a standard normal distribution. Then, trait values of the children are generated from a normal distribution with mean vector Σ₂₁Σ₁₁⁻¹Y₁^T and variance–covariance matrix Σ₂₂−Σ₂₁Σ₁₁⁻¹Σ₁₂.
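As a worked example, the conditional distribution of the children's traits given the parents' traits can be written out explicitly. The block structure below is inferred from the description (independent parents with unit variance, all parent-child and sib-sib correlations equal to ρ), so it should be read as an assumption rather than the published display.

```latex
\begin{align*}
\Sigma_{11} &= I_2, \qquad
\Sigma_{21} = \Sigma_{12}^{T} = \rho\,\mathbf{1}_m\mathbf{1}_2^{T}, \qquad
\Sigma_{22} = (1-\rho)\,I_m + \rho\,\mathbf{1}_m\mathbf{1}_m^{T},\\
E(Y_2 \mid Y_1) &= \Sigma_{21}\Sigma_{11}^{-1}Y_1^{T} = \rho\,(y_F+y_M)\,\mathbf{1}_m,\\
\operatorname{cov}(Y_2 \mid Y_1) &= \Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
   = (1-\rho)\,I_m + (\rho-2\rho^{2})\,\mathbf{1}_m\mathbf{1}_m^{T}.
\end{align*}
```

For ρ=0.2, each child's conditional mean is 0.2(y_F+y_M), the conditional variance is 1−2ρ²=0.92, and the conditional covariance between two sibs is ρ−2ρ²=0.12.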

Under the alternative hypothesis, we choose n_cau rare variants (MAF<1%) as causal variants. The value of n_cau is determined by p_cau, the percentage of causal variants in rare variants. Let pp denote the percentage of protective variants in causal variants, then the number of protective variants and the number of risk variants are n_p=n_cau·pp and n_r=n_cau·(1−pp), respectively. For the j^th member in the i^th family, let and denote the genotypic scores of the risk variant and the protective variant, respectively. Assume that all causal variants have the same heritability. Then the disease model is given by , where and are coefficients and their values depend on the total heritability, and ɛ_ij is the trait value under the null hypothesis.

To generate affected sib-pairs, we generate families with two children. We keep generating families with two children until we have generated enough families with two affected children.
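The ascertainment step can be sketched for the trait model under the null hypothesis. Genotype generation and the causal-variant effects are deliberately omitted. The conditional moments used for the two children (mean ρ(y_F+y_M), variance 1−2ρ², covariance ρ−2ρ²) follow from the covariance structure described earlier. All function and constant names are hypothetical.

```python
import math
import random

RHO = 0.2          # parent-child and sib-sib trait correlation used in the text
THRESHOLD = 1.0    # affected if trait >= mean + 1 SD (mean 0, SD 1 under the null)


def draw_sib_traits(rng, rho=RHO):
    """Draw the two parents' traits, then the two children's traits given the parents."""
    y_father, y_mother = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    mean = rho * (y_father + y_mother)
    var = 1.0 - 2.0 * rho ** 2
    cov = rho - 2.0 * rho ** 2
    # Cholesky factor of the 2x2 conditional covariance matrix.
    a = math.sqrt(var)
    b = cov / a
    c = math.sqrt(var - b * b)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean + a * z1, mean + b * z1 + c * z2


def sample_affected_sib_pairs(n_pairs, rng):
    """Keep generating two-child families until n_pairs have both children affected."""
    kept, n_families = [], 0
    while len(kept) < n_pairs:
        n_families += 1
        y1, y2 = draw_sib_traits(rng)
        if y1 >= THRESHOLD and y2 >= THRESHOLD:
            kept.append((y1, y2))
    return kept, n_families


if __name__ == "__main__":
    pairs, n_families = sample_affected_sib_pairs(100, random.Random(17))
    print(len(pairs), "affected sib-pairs from", n_families, "families")
```

Only a few percent of families pass the filter, so the number of generated families is much larger than the number of sib-pairs retained.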

Results

In simulation studies, P-values are estimated using a pooling permutation method in which permuted samples from all the replicates are pooled together to form a joint sample from the null distribution.³¹ In each replicate, we perform 20 permutations. Type I error rates are evaluated using 10 000 replicated samples, whereas powers are evaluated using 500 replicated samples.

For type I error evaluation, we consider different haplotype structures (different genes), different sample sizes, different designs, and different significance levels. For 10 000 replicated samples, the 95% confidence intervals for type I error rates of nominal levels 0.05, 0.01, and 0.001 are (0.046, 0.054), (0.008, 0.012), and (0.0004, 0.0016), respectively. The estimated type I error rates of the proposed test are summarized in Tables 2 and 3. As shown by these tables, all the estimated type I error rates are within the 95% confidence intervals, which indicates that the proposed test is valid.
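The quoted intervals are consistent with a normal approximation to the binomial proportion, α ± 1.96·√(α(1−α)/R) with R=10 000 replicates. The sketch below assumes that this is the approximation used.

```python
import math


def type1_error_interval(alpha, n_replicates, z=1.96):
    """Normal-approximation 95% interval for an estimated type I error rate."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_replicates)
    return alpha - half_width, alpha + half_width


if __name__ == "__main__":
    for alpha, digits in ((0.05, 3), (0.01, 3), (0.001, 4)):
        low, high = type1_error_interval(alpha, 10_000)
        print(alpha, round(low, digits), round(high, digits))
```

An estimated type I error rate outside its interval would suggest an inflated or conservative test at that nominal level.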

Table 2 Estimated type I error rates of TOW-sib for the design of affected sib-pairs and unrelated controls based on 10 000 replicated samples

Table 3 Estimated type I error rates of TOW-sib for the design of affected sib-pairs, unrelated cases, and unrelated controls based on 10 000 replicated samples

For a fixed number of total cases and a fixed number of total individuals, power as a function of the number of affected sib-pairs is shown in Figure 1. The power of TOW-sib increases with the number of affected sib-pairs. The power of SPWSS increases with the number of affected sib-pairs while that number is less than 20% of the total number of cases, and decreases otherwise. Therefore, in the following discussion, the number of affected sib-pairs is set to half of the total number of cases in the design for TOW-sib and to 20% of the total number of cases in the design for SPWSS. The powers of TOW and WSS do not depend on the number of affected sib-pairs. In almost all the cases, TOW-sib is the most powerful test. When the percentage of causal variants is small (10%), SPWSS is more powerful than TOW and WSS if the number of affected sib-pairs is between 10 and 45% of the total number of cases. When the percentage of causal variants is large (50%), SPWSS is the least powerful test.

Power as a function of heritability and as a function of the percentage of protective variants is shown in Figures 2 and 3; TOW-sib is the most powerful test in all the cases. When the percentage of causal variants is small (10%), SPWSS is more powerful than TOW and WSS. When the percentage of causal variants is large (50%), SPWSS and TOW have similar power; both are less powerful than WSS if the percentage of protective variants is small and more powerful than WSS if the percentage of protective variants is large.

Figure 4 shows power as a function of the percentage of causal variants. TOW-sib is the most powerful test in all the cases, and its power is not affected much by the percentage of causal variants. As the percentage of causal variants increases, the powers of WSS and TOW increase, whereas the power of SPWSS decreases. The increase for WSS and TOW is expected, because a larger percentage of causal variants (equivalently, a smaller percentage of neutral variants) means a lower noise level. The decrease in the power of SPWSS is probably because it is easier to estimate weights when the percentage of causal variants is smaller. We also conduct a set of simulations to compare the powers for different values of ρ. The results (Supplementary Figure 1) show that the power comparisons have similar patterns for different values of ρ.

In summary, TOW-sib is the most powerful test in all the cases. Among the other three tests (WSS, SPWSS, and TOW), none is consistently more powerful than the other two.

Discussion

There is increasing interest to detect associations between rare variants and complex traits. Recently, several statistical methods for detecting rare variant associations by jointly considering multiple variants in a genomic region have been developed for unrelated individuals. However, statistical methods for detecting rare variant associations under family-based designs have not received as much attention as methods for unrelated individuals, although family-based designs have been shown to improve power to detect rare variants.^{28, 29} Motivated by the facts that rare disease variants will be enriched in family data³³ and a large number of affected sib-pairs for a variety of diseases has been collected by traditional linkage studies, we develop TOW-sib to detect associations between the optimal combination of rare variants in a genomic region and complex traits based on affected sib-pairs and unrelated individuals. TOW-sib is robust to the directions of the effects of causal variants and is also relatively robust to the number of neutral variants. The proposed method does not require a MAF filtering threshold and can be applied to genomic regions that contain both rare and common variants. Our simulations demonstrated that TOW-sib using affected sib-pairs can be dramatically more powerful than the methods based on unrelated individuals and the existing methods based on affected sib-pairs.

Although TOW-sib is derived under the assumption that variants are independent, our simulation results show that TOW-sib is still a valid test when variants are in LD. Our simulations for type I error evaluation are based on the LD structures of genes 1–4 and, in each gene, there are variants in strong LD (Supplementary Tables 1–4). The correct type I error rates of TOW-sib in our simulations (Tables 2 and 3) indicate that this test is valid even if variants are in LD.

The current version of TOW-sib cannot adjust for covariates, but it could be extended to do so. Denote z_ji, z_ai, and z_ci as the covariates of the j^th individual in the i^th sib-pair, the i^th case, and the i^th control, respectively. With covariates, the retrospective likelihood can be written as

Let , where x represents the combination of genotypic scores of the genotype g and z denotes covariates. Based on this likelihood, we can derive a score test statistic. However, the details of adjusting for covariates in TOW-sib need further investigation.

TOW-sib uses the optimal data-driven weights. TOW-sib belongs to the class of quadratic tests and thus is robust to the directions of the effects of causal variants. Other weights can also be used. For example, in the score test statistic T(w₁,…,w_M), we can use the weights suggested by Madsen and Browning,¹² that is, w_m∝1/√(p_m(1−p_m)), where p_m is the estimated MAF with pseudo-counts at the m^th variant. We call the score test T(w₁,…,w_M) with these weights WSS-sib. WSS-sib belongs to the class of burden tests. When most of the rare variants are causal and the directions of the effects of causal variants are all the same, WSS-sib can outperform TOW-sib; otherwise, TOW-sib should outperform WSS-sib. To increase the robustness of the tests, we can also construct combined tests by combining information from TOW-sib and WSS-sib. One thing we want to make clear is the term ‘optimal weight’. In this paper, the optimal weight only means that the selected weight maximizes the score test statistic; it does not mean that the selected weight gives the score test the maximum power.

In this study, we estimate a based on the full likelihood. Other estimates of a can also be used. Different estimates do not affect the type I error, but do affect power. Our simulations (results not shown) show that the MLE of a based on the full likelihood is a good choice. Our main purpose is to see whether the affected sib-pair design is more powerful than the case/control design, so we compare our proposed method with two methods based on the case/control design. We also compare our proposed method with one of the existing methods that are applicable to the affected sib-pair design. Although several recently developed methods^{28, 29} are applicable to the affected sib-pair design, we only compare with SPWSS²⁸ because SPWSS is most relevant to our proposed method.

References

1. McCarthy MI, Abecasis GR, Cardon LR et al: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008; 9: 356–369.
2. Manolio TA, Collins FS, Cox NJ et al: Finding the missing heritability of complex diseases. Nature 2009; 461: 747–753.
3. Marini NJ, Gin J, Ziegle J et al: The prevalence of folate-remedial MTHFR enzyme variants in humans. Proc Natl Acad Sci USA 2008; 105: 8055–8060.
4. Ji W, Foo JN, O'Roak BJ et al: Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat Genet 2008; 40: 592–599.
5. Cohen JC, Pertsemlidis A, Fahmi S et al: Multiple rare variants in NPC1L1 associated with reduced sterol absorption and plasma low-density lipoprotein levels. Proc Natl Acad Sci USA 2006; 103: 1810–1815.
6. Nejentsev S, Walker N, Riches D, Egholm M, Todd JA: Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science 2009; 324: 387–389.
7. Zhu X, Feng T, Li Y, Lu Q, Elston RC: Detecting rare variants for complex traits using family and unrelated data. Genet Epidemiol 2010; 34: 171–187.
8. Andrés AM, Clark AG, Shimmin L, Boerwinkle E, Sing CF, Hixson JE: Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genet Epidemiol 2007; 31: 659–671.
9. Metzker ML: Sequencing technologies – the next generation. Nat Rev Genet 2010; 11: 31–46.
10. Morgenthaler S, Thilly WG: A strategy to discover genes that carry multiallelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res 2007; 615: 28–56.
11. Li B, Leal SM: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 2008; 83: 311–321.
12. Madsen BE, Browning SR: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 2009; 5: e1000384.
13. Price AL, Kryukov GV, de Bakker PI et al: Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet 2010; 86: 832–838.
14. Zawistowski M, Gopalakrishnan S, Ding J, Li Y, Grimm S, Zollner S: Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. Am J Hum Genet 2010; 87: 604–617.
15. Wu M, Lee S, Cai T, Li Y, Boehnke M, Lin X: Rare variant association testing for sequencing data using the sequence kernel association test (SKAT). Am J Hum Genet 2011; 89: 82–93.
16. Lee S, Emond MJ, Bamshad MJ et al: Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet 2012; 91: 224–237.
17. Sha Q, Wang X, Wang X, Zhang S: Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet Epidemiol 2012; 36: 561–571.
18. Derkach A, Lawless J, Sun L: Robust and powerful tests for rare variants using Fisher’s method to combine evidence of association from two or more complementary tests. Genet Epidemiol 2012; 37: 110–121.
19. Neale BM, Rivas MA, Voight BF et al: Testing for an unusual distribution of rare variants. PLoS Genet 2011; 7: e1001322.
20. Han F, Pan W: A data-adaptive sum test for disease association with multiple common or rare variants. Hum Hered 2010; 70: 42–54.
21. Hoffmann TJ, Marini NJ, Witte JS: Comprehensive approach to analyzing rare genetic variants. PLoS One 2010; 5: e13584.
22. Lin D-Y, Tang Z-Z: A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet 2011; 89: 354–367.
23. Yi N, Zhi D: Bayesian analysis of rare variants in genetic association studies. Genet Epidemiol 2011; 35: 57–69.
24. Sha Q, Wang S, Zhang S: Adaptive clustering and adaptive weighting methods to detect disease associated rare variants. Eur J Hum Genet 2013; 21: 332–337.
25. Shi G, Rao D: Optimum designs for next-generation sequencing to discover rare variants for common complex disease. Genet Epidemiol 2011; 35: 572–579.
26. Fang S, Sha Q, Zhang S: Two adaptive weighting methods to test for rare variant associations in family-based designs. Genet Epidemiol 2012; 36: 499–507.
27. Liu D, Leal S: A unified framework for detecting rare variant quantitative trait associations in pedigree and unrelated individuals via sequence data. Hum Hered 2012; 73: 105–122.
28. Feng T, Elston R, Zhu X: Detecting rare and common variants for complex traits: sibpair and odds ratio weighted sum statistics (SPWSS, ORWSS). Genet Epidemiol 2011; 35: 398–409.
29. Zhu Y, Xiong M: Family-based association studies for next-generation sequencing. Am J Hum Genet 2012; 90: 1028–1045.
30. Schaid DJ: General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol 1996; 13: 423–449.
31. Guo W, Lin S: Generalized linear modeling with regularization for detecting common disease rare haplotype association. Genet Epidemiol 2009; 33: 308–316.
32. Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 2006; 78: 629–644.
33. Feng T, Zhu X: Genome-wide searching of rare genetic variants in WTCCC data. Hum Genet 2010; 128: 269–280.

Acknowledgements

Research reported in this article was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number R03 HG006155. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The Genetic Analysis workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (http://www.1000genomes.org).

Author information

Authors and Affiliations

Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
Qiuying Sha & Shuanglin Zhang

Corresponding author

Correspondence to Shuanglin Zhang.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies this paper on the European Journal of Human Genetics website.

Appendices

Appendix A

Score Test Statistic

Using notations in the Method section, from Equation (1), the log retrospective likelihood is given by

Then,

Let P=(p₁,…,p_M)^T and . Note that and

for m=1,…,M. We have

and

Similarly, we have

Let , , U*=(U,0,0)^T denote the score vector, and I denote the information matrix. Then, the score test statistic is given by

Appendix B

Expectation-maximization Algorithm to Estimate Allele Frequency Based on Sib-pairs and Unrelated Individuals

Consider a variant with two alleles. Let B denote the minor allele and p denote the frequency of allele B. We use the following notations.

N: the number of unrelated individuals

N_f: the number of sib-pairs

n: the number of minor alleles in genotypes of the N unrelated individuals

n_ij: the number of sib-pairs with genotype pair (i,j) or (j,i)

n_ij^k: the number of sib-pairs with genotype pair (i,j) or (j,i) in which the pair of genotypes shares k alleles IBD

E-step:

M-step: where
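One way to realize such an EM algorithm is sketched below. It is not the authors' E-step and M-step expressions. It treats the IBD configuration of each sib-pair as missing data, assumes Hardy-Weinberg proportions with IBD sharing probabilities 1/4, 1/2, and 1/4, and counts the expected numbers of minor and of distinct founder alleles. All names are illustrative.

```python
def hwe(g, p):
    """Hardy-Weinberg probability of carrying g minor alleles."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[g]


def latent_configs(g1, g2, p):
    """(probability, distinct founder alleles, minor alleles among them) for
    every IBD configuration compatible with the sib-pair genotypes (g1, g2)."""
    q = 1.0 - p
    configs = [(0.25 * hwe(g1, p) * hwe(g2, p), 4, g1 + g2)]   # 0 alleles IBD
    for shared in (0, 1):                                      # 1 allele IBD
        rest1, rest2 = g1 - shared, g2 - shared
        if rest1 in (0, 1) and rest2 in (0, 1):
            pr = 1.0
            for allele in (shared, rest1, rest2):
                pr *= p if allele else q
            configs.append((0.5 * pr, 3, g1 + g2 - shared))
    if g1 == g2:                                               # 2 alleles IBD
        configs.append((0.25 * hwe(g1, p), 2, g1))
    return configs


def em_allele_frequency(pair_counts, n_unrelated, n_minor_unrelated,
                        p_start=0.2, max_iter=1000, tol=1e-12):
    """EM estimate of the minor allele frequency.

    pair_counts maps a genotype pair (g1, g2) to its number of sib-pairs;
    n_unrelated individuals carry n_minor_unrelated minor alleles in total.
    """
    p = p_start
    for _ in range(max_iter):
        minor = float(n_minor_unrelated)
        total = 2.0 * n_unrelated
        for (g1, g2), count in pair_counts.items():
            configs = latent_configs(g1, g2, p)
            norm = sum(pr for pr, _, _ in configs)
            if norm == 0.0:
                continue
            # E-step: expected minor and distinct founder alleles for this pair.
            minor += count * sum(pr * m for pr, _, m in configs) / norm
            total += count * sum(pr * d for pr, d, _ in configs) / norm
        p_new = minor / total   # M-step: ratio of expected allele counts
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p
```

For example, `em_allele_frequency({(0, 0): 90, (0, 1): 6, (1, 1): 4}, n_unrelated=100, n_minor_unrelated=12)` combines sib-pair and unrelated data. With no sib-pairs, the estimate reduces to simple allele counting.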

Appendix C

Mean and Variance of TOW-sib

It is easy to know that . In the following, we will calculate the variance of T_TOW−sib.

Let g₁ and g₂ denote genotypes of a sib-pair, x=g₁+g₂, and p (q=1−p) denote the MAF. Using the distribution given by Table 1, we have

E(g₁−2p)⁴=2pq, var(x)=6 pq, and E(x−4p)⁴=6pq(pq+3).
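These moments can be checked by conditioning on the number of alleles the two sibs share identical by descent (0, 1, or 2 with probabilities 1/4, 1/2, and 1/4). The derivation below assumes Hardy-Weinberg proportions and is offered as a worked check rather than as the authors' own derivation.

```latex
\begin{align*}
\operatorname{cov}(g_1,g_2) &= \tfrac14\cdot 0 + \tfrac12\cdot pq + \tfrac14\cdot 2pq = pq,\\
\operatorname{var}(x) &= \operatorname{var}(g_1)+\operatorname{var}(g_2)+2\operatorname{cov}(g_1,g_2)
                       = 2pq+2pq+2pq = 6pq,\\
E(x-4p)^4 &= \tfrac14\,\bigl(4pq+24p^2q^2\bigr) + \tfrac12\,(18pq) + \tfrac14\,(32pq)
           = 6pq\,(pq+3).
\end{align*}
```

The three terms in the last line are the fourth central moments of x given 0, 1, and 2 alleles shared IBD. The variance 6pq for a sib-pair, against 2pq for an unrelated individual, matches the coefficients 6 and 2 in N=6n_s+2n_a+2n_câ².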

We know that , , and . Let n=n_s+n_a+n_c, x_i=g_1im+g_2im−4p_m for i=1,…,n_s, for i=1,…,n_a, for i=1,…,n_c, and y_i is similarly defined for the k^th variant as x_i for the m^th variant.

We can calculate the variance of T_TOW−sib if we note that

where N₁=9n_s+n_a+n_câ⁴, N₂=(6n_s+2n_a+2n_câ²)²−34n_s−4n_a−4n_câ⁴, and ;

where N₃=(6n_s+2n_a+2n_câ²)², N₄=2((4n_s+n_a+n_câ²)²), E(x₁²y₁²) is estimated with , E(x_n²y_n²) is estimated with , and cov(x_n, y_n) is estimated with .

About this article

Cite this article

Sha, Q., Zhang, S. Test of rare variant association based on affected sib-pairs. Eur J Hum Genet 23, 229–237 (2015). https://doi.org/10.1038/ejhg.2014.43

Received: 07 July 2013
Revised: 06 November 2013
Accepted: 30 December 2013
Published: 26 March 2014
Issue Date: February 2015
DOI: https://doi.org/10.1038/ejhg.2014.43
