Introduction

Waist-to-hip ratio (WHR) is a common anthropometric measurement of body fat distribution, in particular central abdominal fat. A larger WHR indicates more intra-abdominal fat deposition and is an established risk factor for type 2 diabetes1, 2 and cardiovascular disease.3, 4 Moreover, it has been demonstrated that body shape, rather than weight, is a better predictor of cardiovascular risk.5, 6 WHR is a heritable trait,7 and many studies have investigated the genetic influence on body fat distribution.8, 9 Recent genome-wide association studies (GWAS) of common variants conducted in different ethnic populations have reported several genetic loci associated with WHR.10, 11, 12, 13, 14 However, these findings explain only a modest percentage of the genetic variance of WHR. The involvement of rare (minor allele frequency (MAF) ≤0.01) variants has not previously been well studied owing to their poor representation on commercial genotyping arrays.

Our study aims to detect associations between rare variants and WHR to better understand the genetic etiology underlying central adiposity. Although rare variants potentially have larger effect sizes than common variants, there is little statistical power to detect association signals when analyzing individual variants.15 However, taking into account the cumulative effects of multiple rare variants in specific genes or genetic regions can strengthen association signals, thereby increasing the power to detect rare variant associations. We analyzed exome sequence data using several rare variant association methods: Combined Multivariate Collapsing (CMC),16 Burden of Rare Variants (BRV),16, 17, 18 Weighted Sum Statistic (WSS),19, 20 Variable Threshold (VT)21 and Sequence Kernel Association Test (SKAT).22

Exome sequencing targets the protein-coding variants in the human genome. It is a proven approach to detect causal variants for Mendelian disorders.23 There is also great interest in using exome sequencing to elucidate the involvement of rare variants in the genetic etiology of complex traits. With that goal in mind, the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) was established. This project sequenced samples from ~7000 individuals who were selected for 12 primary traits and also had data on 59 secondary phenotypes.24, 25, 26 One of the secondary phenotypes, WHR, was available on 1534 European-American (EA) and 1216 African-American (AA) women and 784 EA and 296 AA men. It has previously been demonstrated that there are significant differences in the distribution of WHR measurements and in the underlying genetic etiology of WHR between men and women,10, 13 which argues for sex-specific analysis. For this study, we therefore limited our analysis to women; men were not analyzed owing to their small sample size.

Subjects and methods

Study sample ESP and quality control

A total of 6823 participants in NHLBI-ESP underwent exome sequencing in approximately equal numbers at either the Broad Institute of MIT/Harvard or the University of Washington. In brief, paired-end sequencing (2 × 76 bp) was performed on either the Illumina Genome Analyzer II or HiSeq2000 sequencers (Illumina, San Diego, CA, USA) to an average depth of ~90 ×. The Broad Institute used Agilent SureSelect Human All Exon 50 Mb capture target (Agilent, Santa Clara, CA, USA) while the University of Washington used Roche NimbleGen SeqCap EZ (Roche, Basel, Switzerland). Single-nucleotide variants (SNVs) were called using glfMultiples of the UMAKE pipeline at the University of Michigan that implements a maximum likelihood method to perform multi-variant calling. Reads were mapped to human reference (hg19) with Burrows-Wheeler Aligner (BWA)27 and summarized in Binary Align/Map (BAM) files as joint calling input and further refined by duplicate removal, recalibration and indel re-alignment using the Genome Analysis ToolKit (GATK).28 Low-quality reads with phred-scaled mapping quality <20 were excluded. A support vector machine (SVM) classifier, which is also part of the UMAKE pipeline, was used to separate likely true-positive and false-positive variant sites, and those variant sites that were likely to be false positives were excluded. A total of 1 908 614 SNVs passed the SVM filter. Subsequent data quality control was performed using Variant Association Tools (VAT),29 unless otherwise denoted. 
We then removed (i) variant calls with a read depth ≤10 ×; (ii) 284 variant sites with an average read depth across all samples >500 ×, as these regions are likely to contain copy number variants that can induce incorrect variant calls; and (iii) variant sites that deviated from Hardy–Weinberg equilibrium in either EA (N=2592) or AA (N=2779) with P-values<5 × 10−8 based on an exact test.30 To alleviate bias from different capture targets, we removed variant sites missing >10% of their genotypes and, in the association analysis step, samples with a genotype missing rate >10% per gene. Only those genes with ≥3 variant sites were analyzed.
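
The depth and missingness filters described above can be sketched as follows. This is a minimal illustration and not the VAT implementation: the data layout (a list of sites, each holding per-sample genotype and depth pairs) and the function name filter_sites are assumptions made for the example, and the Hardy–Weinberg exact test is omitted.

```python
from statistics import mean

MIN_CALL_DEPTH = 10      # calls with depth <= 10x are set to missing
MAX_MEAN_DEPTH = 500     # sites with mean depth > 500x are dropped
MAX_SITE_MISSING = 0.10  # sites missing > 10% of genotypes are dropped


def filter_sites(sites):
    """Apply depth and missingness filters to a list of variant sites.

    Each site is a dict with 'id' and 'calls', where 'calls' is a list of
    (genotype, depth) tuples and genotype is the minor-allele count 0/1/2.
    This layout is hypothetical and chosen only for illustration.
    """
    kept = []
    for site in sites:
        depths = [depth for _, depth in site["calls"]]
        if mean(depths) > MAX_MEAN_DEPTH:
            continue  # possible copy number variant region
        # Mask individual low-depth calls rather than dropping the whole site.
        genotypes = [g if d > MIN_CALL_DEPTH else None for g, d in site["calls"]]
        missing_rate = sum(g is None for g in genotypes) / len(genotypes)
        if missing_rate > MAX_SITE_MISSING:
            continue
        kept.append({"id": site["id"], "genotypes": genotypes})
    return kept
```

Masking low-depth calls before computing the site-level missing rate means that a site with many poorly covered samples is removed by the 10% missingness filter, which matches the order of the steps in the text.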

Although self-reported ethnicity was available, we designated EA and AA using Multidimensional Scaling (MDS)31 and removed 30 samples owing to indeterminate race and 27 samples where there was a discrepancy between self-reported and MDS ethnicity. ESP sequenced a number of duplicate samples for quality control, and several related individuals were also included in the study. To detect cryptic duplicate and related samples, kinship analysis was performed.29, 31 For duplicate sample pairs, the one with the best sequence quality was retained in the analysis. To avoid inflated type I error owing to inclusion of related individuals, only one individual per related group was retained; selection was based upon availability of phenotype data as well as quality of sequence data. A total of 13 EA samples and 25 AA samples were removed owing to relatedness. We also evaluated whether reported sex differed from chromosomal sex.29 Fifteen individuals for whom reported sex differed from chromosomal sex or who had Turner or Klinefelter syndrome were removed from the analysis.

Phenotype quality control was performed using PhenoMan.32 We excluded females who were aged either <18 or >90 years, had a height <140 cm, a body mass index (BMI) <15 kg/m2 or had unrealistic WHR values, presumably owing to data entry errors. In total 24 EA and 30 AA females with WHR phenotype data were excluded from the analysis, and 1510 EA and 1186 AA were available for analysis.
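
As a minimal sketch of the exclusion rules just listed (not PhenoMan itself), the check below encodes the stated age, height and BMI cutoffs. The function name and argument names are hypothetical. The text does not state which WHR values were considered unrealistic, so the WHR bounds are left as an optional argument with no default.

```python
def passes_phenotype_qc(age, height_cm, bmi, whr, whr_bounds=None):
    """Return True if a participant passes the stated phenotype filters.

    whr_bounds is an optional (low, high) tuple; the cutoffs for
    "unrealistic" WHR are not given in the text, so none is assumed here.
    """
    if age < 18 or age > 90:
        return False
    if height_cm < 140:
        return False
    if bmi < 15:
        return False
    if whr_bounds is not None:
        low, high = whr_bounds
        if not (low <= whr <= high):
            return False
    return True
```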

Association analyses

Association analysis was performed in EA and AA separately using linear regression models, which included a number of covariates that could be confounders: age, age squared, BMI, and current smoking status. Covariate selection was performed using PhenoMan under backward selection. In order to control for population stratification and substructure, we estimated MDS components separately for EAs and AAs and included the first two components in the regression model. Additionally, a dummy variable was included in the regression model to control for sampling procedure, cohort membership and capture target.

All the association tests were performed using the VAT software.29 Before association testing, all variants were annotated using SeattleSeq Variation Annotation 134, and gene regions were defined by the Reference Sequence (RefSeq) database. We applied five rare variant association tests using a linear regression framework, analyzing EAs and AAs separately. Only potentially protein function-altering variant sites, that is, missense, nonsense and splice sites, were analyzed. Four fixed-effect methods, CMC, BRV, WSS and VT, were used in the analysis; they differ in allele coding, weighting and maximization procedures. The CMC uses an indicator variable to code whether a rare variant is present or absent within a gene region, while the BRV regresses the trait on the number of rare variants within a gene region for each individual. The WSS implements BRV coding but weights each variant according to its overall sample MAF, thereby up-weighting rarer variants. The VT also employs BRV coding but maximizes the test statistic over allele frequency thresholds, adjusting for multiple testing. The random-effects test SKAT, which implements a variance component score statistic, was also used to analyze the data. For SKAT, we used Quantile–Quantile (QQ) normalized WHRs, as it was computationally infeasible to obtain permutation-based empirical P-values, while for the fixed-effect tests unadjusted phenotype values were used. For all tests with the exception of VT, we analyzed rare variants with MAF≤0.01 based on the population-specific MAF for EAs and AAs estimated from the entire ESP sample. A total of 15 405 genes with 200 526 SNVs and 15 599 genes with 215 400 SNVs were analyzed for EAs and AAs, respectively. For the VT method, low-frequency variants were also analyzed using MAF≤0.05; a total of 15 602 genes with 210 718 SNVs and 15 904 genes with 238 574 SNVs were analyzed for EAs and AAs, respectively.
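
The difference between the CMC, BRV and WSS codings can be illustrated with a short sketch. It is not the VAT code. It assumes a complete genotype matrix for one gene (rows are individuals, columns are rare variant sites, entries are minor-allele counts), it takes the BRV score to be the sum of minor-allele counts, and it uses a Madsen–Browning-style weight 1/sqrt(q(1−q)) for WSS; the exact weight used in the analysis is not specified in the text.

```python
from math import sqrt


def cmc_code(genotypes):
    """CMC: 1 if the individual carries any rare allele in the gene, else 0."""
    return [1 if any(g > 0 for g in person) else 0 for person in genotypes]


def brv_code(genotypes):
    """BRV: number of rare alleles carried in the gene (assumed coding)."""
    return [sum(person) for person in genotypes]


def wss_code(genotypes, mafs):
    """WSS: BRV-style sum with each site weighted by 1/sqrt(q(1-q)).

    The weight is an assumption for illustration; it gives rarer
    variants (smaller q) a larger contribution to the score.
    """
    weights = [1.0 / sqrt(q * (1.0 - q)) for q in mafs]
    return [sum(w * g for w, g in zip(weights, person)) for person in genotypes]
```

In each case the resulting per-individual score would be entered as the predictor in the linear regression of WHR on the score and the covariates.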

Although in the data quality-control steps variant sites missing >10% of variant calls were removed, even lower levels of missing data can still increase type I error. We therefore replaced missing genotypes with an imputed genotype based on the population-specific ESP allele frequencies.17 For all rare variant association methods, P-values were obtained empirically using adaptive permutation with the exception of SKAT for which analytical P-values were acquired owing to the computational intensity of this method. Finally, meta-analysis was used to combine test results from EA and AA, using a sample-size based method33 for the CMC, BRV, WSS and VT methods and the MetaSKAT package22 for SKAT.
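
Two steps in this paragraph lend themselves to a short sketch: replacing a missing genotype with a value derived from the allele frequency, and combining population-specific results with a sample-size-weighted Z-score. Both functions are illustrations under stated assumptions and not the code used in the study: the imputed value is taken to be the expected minor-allele count 2 × MAF, and the meta-analysis is taken to be the weighted Z-score form Z = Σ√N_i Z_i / √ΣN_i with the sign of each Z_i given by the direction of β.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def impute_missing(genotypes, maf):
    """Replace missing genotypes (None) with the expected count 2 * maf."""
    return [2.0 * maf if g is None else g for g in genotypes]


def signed_z(p_value, beta):
    """Convert a two-sided P-value (0 < p <= 1) and effect sign to a Z-score."""
    z = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return z if beta >= 0 else -z


def sample_size_meta(results):
    """Combine (p_value, beta, n) tuples; return (z, two-sided p)."""
    numerator = sum(sqrt(n) * signed_z(p, beta) for p, beta, n in results)
    denominator = sqrt(sum(n for _, _, n in results))
    z = numerator / denominator
    return z, 2.0 * _STD_NORMAL.cdf(-abs(z))
```

For example, sample_size_meta([(4.00e-8, -0.131, 1510), (0.99, -0.00042, 1186)]), using the IKBKB CMC values quoted in the Results, gives a combined P-value on the order of 10−5 under these assumptions, showing how a null result in one population dilutes a strong signal in the other.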

Functional evaluation of variant sites

To estimate the evolutionary conservation of the nucleotide and the amino-acid residue changes, PhyloP and GERP scores were used. PhyloP indicates nucleotide conservation based on multiple alignments of 100 vertebrate species under a null hypothesis of neutral evolution. GERP provides position-specific estimates of evolutionary constraint using maximum likelihood evolutionary rate estimation. To assess potential functional consequences, PolyPhen-2, PROVEAN, SIFT, CADD, MutationTaster and LRT were included. PolyPhen-2 implements a naive Bayes classifier to predict the possible impact of an amino-acid substitution from sequence alignments and protein structural properties. SIFT and PROVEAN compute a combined score based on the degree of conservation of amino-acid residues in the sequence alignments; PROVEAN can also measure the potential impact of indels. CADD objectively integrates multiple annotations into one measure (C score) for each variant. MutationTaster employs a Bayes classifier to calculate the probability that an alteration is harmful. The LRT method utilizes the log likelihood ratio of the conserved relative to the neutral model to measure the deleteriousness of a mutation. Using the bioinformatics tools listed above, we annotated the rare variants analyzed in selected genes.

Results

We performed whole-exome sequencing data analysis on 1510 EA and 1186 AA women to study the association between the quantitative trait WHR and rare variants. These analyzed samples were ascertained from six population-based cohorts and classified into different phenotypic cohorts according to their associated primary phenotypes and were sequenced using one of four in-solution capture targets (Supplementary Table S1). EA women had a mean WHR of 0.84±0.087 (refers to mean±SD hereafter) and AA women had a mean WHR of 0.85±0.092. Additional phenotypic information on the study participants can be found in Table 1. The P-values for covariates that were included in the regression analysis are listed in Supplementary Tables S2 and S3. The distributions of variants categorized by type, for example, missense, and frequency, that is, ≤0.01 and >0.01 for each population are shown in Supplementary Table S4. More rare variant sites are observed in AAs compared with EAs (ie, P<2.2 × 10−16 for both missense and synonymous variants, proportion test), while EAs have more rare nonsense and splice sites than AAs (P<2.2 × 10−16 and P=6.63 × 10−13, respectively). For all types of coding variants with a MAF>1%, more sites are observed in AAs compared with EAs, but only missense and synonymous variant sites showed a statistical difference (P=7.95 × 10−4 and P=7.29 × 10−4, respectively). Although rare coding variants are predominant within the sample, only a small proportion of rare coding variants are shared in EA and AA (8.7, 9.6, 6.4 and 5.9% for missense, synonymous, nonsense and splice sites, respectively). Although variant sites with a MAF>0.01 occur less frequently than rare variants, a larger proportion are shared in both populations (35.9, 40.1, 32.0 and 27.6% for missense, synonymous, nonsense and splice sites, respectively). Notably, synonymous variants have the largest proportion of variants shared between populations regardless of MAF.

Table 1 Phenotypic information for women analyzed for waist-to-hip ratios

Population specific QQ plots and Manhattan plots for each gene-based test are shown in Supplementary Figures S1–S5. The QQ plots demonstrate that for each rare variant association test type I error is well controlled. The most significant genes associated with WHR (P<0.0005 in any of the five gene-based tests) in EA, AA or meta-analysis are listed in Supplementary Tables S5–S7. Most of the genes with suggestive association with WHR have consistent results for fixed-effect rare variant association methods (CMC, BRV, WSS and VT) while other genes show suggestive associations for the random-effects test SKAT.

An exception was IKBKB (inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta; MIM 603258), which yielded P-values below the exome-wide significance threshold (P=2.5 × 10−6, the Bonferroni-corrected threshold for testing 20 000 genes) and was significantly associated with WHR in EA females for all the fixed-effect rare variant association methods (eg, CMC results β=−0.131; P=4.00 × 10−8, Table 2), while the SKAT P-value of 4.03 × 10−6 was slightly above the exome-wide significance threshold (Table 2).
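
The exome-wide threshold quoted above follows from dividing a family-wise error rate by the number of genes tested. The sketch below reproduces that arithmetic; the error rate of 0.05 is implied by the stated threshold (2.5 × 10−6 × 20 000) rather than given explicitly, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# alpha = 0.05 is inferred from the stated threshold for 20 000 genes.
EXOME_WIDE = bonferroni_threshold(0.05, 20_000)


def is_exome_wide_significant(p_value, threshold=EXOME_WIDE):
    """True if a gene-based P-value falls below the exome-wide threshold."""
    return p_value < threshold
```

With these definitions the CMC P-value of 4.00 × 10−8 passes the threshold, while the SKAT P-value of 4.03 × 10−6 does not.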

Table 2 Association analysis results for IKBKB

Among the EA women in our data set, eight rare IKBKB missense variants were analyzed (Figure 1a, Supplementary Table S8). Three variant sites are novel to ESP. Each of the eight IKBKB missense variants was observed in only a single heterozygous EA carrier in our data set. Based on consensus prediction across multiple annotation methods, four variant sites likely have functional consequences, while for two others it is unclear whether they have an impact on protein function and two variant sites are most likely benign. The aggregated effect of IKBKB rare variants decreases WHR in EA (β=−0.131), and each of the eight EA rare variant carriers has a WHR lower than the mean (0.84), with the exception of one individual who carries the rare variant at rs150441824 (WHR=0.88), which is predicted to be benign. The carrier of the rare variant at rs202226005, which is also predicted to be benign, has a WHR of 0.78. Although her WHR is lower than the mean, it is still greater than those of the other females carrying IKBKB variants that are most likely to be functional. IKBKB does not show an association with WHR in AA (CMC result β=−0.00042; P=0.99; meta-analysis with EA: P=3.86 × 10−5, Table 2). For AA, 11 rare missense variants within IKBKB were discovered by exome sequencing and none overlap with those observed in EA (Figure 1a, Supplementary Table S8). Additionally, AA rare variant carriers have a mean WHR of 0.85±0.084, and six AA rare variant carriers have WHRs that are greater than the mean WHR (0.85±0.092) for AA women (Figure 1a). Thus the allelic architecture of central adiposity may differ for women of African versus European ancestry.

Figure 1

IKBKB rare variants associated with decreased WHR in EA. (a) WHR distributions in 1510 EA (blue) and 1186 AA (red) are shown in the upper panel. WHR distributions in IKBKB rare missense variant carriers in EA (blue solid triangle) and AA (red solid dot) are shown in the bottom panel. y Axis represents each IKBKB rare variant analyzed in gene-based test ordered by increasing WHRs. (b) Potential functional consequences for IKBKB rare variants in insulin mechanism. A full color version of this figure is available at the European Journal of Human Genetics journal online.

Several additional genes were nominally associated with WHR (P<5 × 10−4, Supplementary Tables S5–S7). Most of them are novel and not within regions previously identified to be associated in GWAS of common variants, with the exception of one gene, COBLL1 (COBL-like protein 1; MIM 610318; CMC result β=0.04; P=2.33 × 10−4, Supplementary Table S5). A few studies10, 12, 13, 14 reported 3′ UTR and intergenic COBLL1 variants to be significantly associated (P<5 × 10−8) with increased WHR; however, these SNPs were not analyzed because they were not exonic and hence were not included in the capture array. To test whether common variant associations could influence the rare variant association signal, we obtained genotype data from HapMap3 within a 642-kb region surrounding COBLL1 (Supplementary Figure S6). A common intronic variant in the WHR data set, rs6712203, was found to be in a linkage disequilibrium (LD) block with the GWAS index SNPs (r2=0.79 for rs10195252 and r2=0.85 for rs6717858, Supplementary Figure S6). Conditional regression was performed adjusting for rs6712203 to eliminate potential effects of those index SNPs (Supplementary Table S10). The COBLL1 rare variant association signal remained nominally significant after adjustment (CMC result P=3.57 × 10−4, Supplementary Table S10), indicating that the association signal from rare variants is likely to be independent of common variant associations. Additionally, some of the nominally associated genes that were not previously shown to be associated with WHR have data from functional studies that suggest their involvement in adipocyte metabolism or insulin signaling, including CTSB (MIM 116810) (adipocyte metabolism), FOXO1 (MIM 136533) (adipocyte metabolism), ITIH5 (MIM 609783) (adipocyte metabolism) and PAQR3 (MIM 614577) (insulin signaling) (see Supplementary Tables S5–S7 for additional information on β and P-values for these genes).

Additionally, we performed rare variant association analyses of the two WHR-associated genes (IKBKB and COBLL1) for the traits waist circumference and hip circumference (Supplementary Table S11). Interestingly, rare variants in IKBKB were nominally associated with waist circumference (P=9.21 × 10−6), and those in COBLL1 were weakly associated (P=0.0080). Neither gene was associated with hip circumference (P>0.1).

To control for multiple testing arising from the use of five gene-based methods, we applied a Bonferroni correction. After adjusting for the five methods (CMC, BRV, WSS, VT and SKAT), rare variants in IKBKB still showed a significant association with WHR (CMC result P=2.00 × 10−7) and COBLL1 showed a suggestive association (CMC result P=0.0012, Supplementary Table S12), indicating that correcting for the number of methods did not materially change these conclusions.
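
The method-level adjustment can be checked with a few lines of arithmetic. The sketch assumes the adjustment is the usual Bonferroni multiplication of each P-value by the number of methods, capped at 1, which is consistent with the values quoted in the text (4.00 × 10−8 × 5 = 2.00 × 10−7 and 2.33 × 10−4 × 5 ≈ 0.0012); the function name is hypothetical.

```python
N_METHODS = 5                   # CMC, BRV, WSS, VT and SKAT
EXOME_WIDE_THRESHOLD = 2.5e-6   # threshold stated for testing 20 000 genes


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


ikbkb_adjusted = bonferroni_adjust(4.00e-8, N_METHODS)   # IKBKB, CMC, EA
cobll1_adjusted = bonferroni_adjust(2.33e-4, N_METHODS)  # COBLL1, CMC, EA
```

Under this assumption the adjusted IKBKB P-value remains below the exome-wide threshold, whereas the adjusted COBLL1 P-value does not.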

We approximated the strength of genetic contributions to WHR by comparing effect sizes between WHR loci from GWAS10, 13, 14 and rare variants in our study. Rare variants in IKBKB have an aggregated effect of decreasing WHR by 0.13, which is much greater than the largest effect size from GWAS, 0.06 for the LYPLAL1 locus associated with WHR in women.10, 13, 14 For the COBLL1 locus, carriers of the T allele of the common variant rs10195252 have a WHR increased by 0.053, and the increase is consistent among different GWAS.10, 13, 14 However, rare variants in COBLL1 have an aggregated effect of increasing WHR by 0.04, which is on about the same scale as the effect size of the common variant. Therefore, there is no clear evidence from this study that rare variants have larger effects than common variants for the WHR phenotype.

Discussion

We analyzed exome sequence data from 1510 EA and 1186 AA females using several methods to detect rare variant associations with the quantitative trait WHR. Rare variants in IKBKB are strongly associated with WHR in EA females. In addition, we found a nominally significant association in COBLL1, a gene that was previously shown to be associated with WHR in GWAS of common variants. Several other nominal rare variant associations with WHR involve genes previously shown to have a role in adipocyte and insulin metabolism.

IKBKB encodes the protein IKKβ, which phosphorylates inhibitor of NF-κB (IκB) to dissociate the inhibitor/NF-κB complex and activate NF-κB in inflammation. IKKβ is a key mediator in inflammation pathways through several mechanisms.34 Obesity- or nutritional overload-induced IKKβ/NF-κB activation initiates the inflammatory process and ultimately results in insulin resistance in hepatocytes and adipocytes.35, 36 In addition to impairment of peripheral insulin sensitivity, the IKKβ/NF-κB pathway affects glucose metabolism in pancreatic islets by activation of inflammation, causing islet β-cell failure in type 2 diabetes.37 Furthermore, overnutrition evokes activation of the IKKβ/NF-κB pathway in the hypothalamus, leading to inflammation and the disruption of central insulin and leptin signaling, thus impairing central nervous system control of food intake and promoting body weight gain.38, 39 Consequently, inhibition of IKKβ activation has also been identified as a potential therapeutic target to reverse inflammation in obesity-associated type 2 diabetes.40 Meanwhile, some rare homozygous null mutations in IKBKB have been reported in patients with severe combined immunodeficiency; these result in loss of IKKβ expression and therefore impairment of immune activation.41 Additionally, a reported GWAS signal downstream of IKBKB for the phenotype tissue plasminogen level is in partial LD with a coding variant in the nearby tissue plasminogen activator (PLAT) gene (MIM 173370),42 which suggests that central adiposity with insulin resistance may be correlated with plasminogen activator through IKBKB. Interestingly, our results showed that IKBKB rare variants were associated with reduced WHRs, which suggests that they are protective against abdominal obesity (Figure 1a). IKBKB amino-acid alterations could change the IKKβ protein substructure and thereby inactivate IKKβ function in inflammation pathways. Impaired IKKβ function may reverse insulin resistance and promote normal regulation of food intake and metabolic homeostasis (Figure 1b).

Rare variants within COBLL1 displayed a nominal association with increased WHR. Previously, GWAS also reported an association between common variants in COBLL1 (rs10195252 and rs6717858) and increased WHR.10, 12, 13, 14 Conditional analysis adjusting for the common ESP SNP rs6712203 suggested that the COBLL1 rare variant association signal is unlikely to be influenced by the associated GWAS common SNPs. Moreover, other common variants in this locus were also found to be associated with elevated fasting insulin,43 increased high-density lipoprotein cholesterol44 and risk of developing type 2 diabetes.45 These prior association findings provide evidence that genetic variation in this region may contribute to multiple biological traits, which could potentially influence WHR.

Among novel genes nominally associated with WHR, some have been found to have a functional impact on adipocyte and insulin metabolism. Cathepsin B, encoded by the CTSB gene, contributes to adipocyte cell death and macrophage infiltration into adipose tissue associated with adipocyte hypertrophy.46 FOXO1 (Forkhead box protein O1), encoded by the FOXO1 gene, is a transcription factor that is essential to the decision for a preadipocyte to commit to adipogenesis.47 ITIH5 (inter-alpha-trypsin inhibitor heavy chain family, member 5), encoded by ITIH5, is highly expressed in adipose tissue and is increased in obesity while being reduced after diet-induced weight loss.48 PAQR3 (progestin and adipoQ receptor family member III), encoded by PAQR3, was found to modulate insulin signaling through the phosphoinositide 3-kinase pathway.49 Although these genes did not reach exome-wide significance, these related studies provide additional evidence that rare variants in these genes may have a role in WHR.

In addition to the association with WHR, we identified that rare variants in IKBKB showed evidence of association with waist circumference but showed no association with hip circumference. Therefore, rare variants in these WHR-associated genes might exert their impact primarily through waist circumference instead of hip circumference, indicating that waist circumference might be a potential driver behind the WHR rare variant association.

Interestingly, most of our significant findings are confined to either EAs or AAs, with the majority of associations being found for EAs. This may be due to differences in allelic architecture between populations and in the effect sizes of causal variants. Most variant sites are EA or AA specific or are private variants, as seen, for example, in IKBKB. Additionally, only a small proportion of the analyzed rare variants (8.7% for missense sites) was shared by EAs and AAs. AAs have a greater number of rare variants compared with EAs, which should increase the power to detect WHR associations unless a greater proportion of AA-specific variants are non-causal. However, the sample size for AAs is approximately three-fourths the size for EAs, reducing the power to detect associations in AAs. The failure to replicate associations between EAs and AAs may indicate that some of these findings are false positives. Our rare variant association study with the current sample size (N=1510 for EAs and N=1186 for AAs) is underpowered, which can increase false-positive findings. Simulation has been used to evaluate the necessary sample sizes for rare variant association studies. These simulation studies suggest that, for the most part, large sample sizes, for example, 50 000 individuals, are necessary for sufficient power.50

Replication of our findings using the currently available HumanOmni5Exome BeadChip is not possible, owing to the large number of variants that are absent from the exome chip (Supplementary Tables S6–S7). For example, of the 8 variants observed in EA for the IKBKB gene only 2 are found on the exome chip, while for COBLL1, of the 26 variants observed for EA only 7 are present on the exome chip. Therefore, deep targeted sequencing or whole exome or genome data sets with information on WHR are necessary to replicate our findings.

When association results for the gene-based tests were compared, the results for most genes were correlated; for example, genes with suggestive results showed similar results across the burden tests, although some genes had a result that was unique to one method (Supplementary Tables S5–S7). Moreover, although performing a variety of gene-based tests comes at a cost of multiple testing that can reduce power, there is no single uniformly most powerful test, and the performance of tests varies depending on the underlying genetic model, which is unknown.51 For example, the fixed-effect test BRV is powerful when the majority of variants have an effect that is unidirectional, while the variance component test SKAT is powerful when either a small proportion of variants are causal or the causal variants have bidirectional effects. Even after performing a Bonferroni correction for five gene-based methods, the association between rare variants in IKBKB and WHR remained exome-wide significant (Supplementary Table S12).

In summary, we performed the first study to detect associations between rare variants and the complex trait WHR, applying a variety of rare variant association methods to exome sequence data. Our study provides a preliminary understanding of the role of rare variants in WHR and potentially in insulin response pathways, which may also contribute to obesity and type 2 diabetes. Although many of our findings are intriguing, the study is limited by lack of replication and small sample size; replication and functional studies are needed to confirm the results and evaluate whether these same genes also have a role in obesity and type 2 diabetes.

Web resources

1000 Genomes: http://www.1000genomes.org/

Combined Annotation Dependent Depletion (CADD): http://cadd.gs.washington.edu/

DistiLD: http://distild.jensenlab.org/

Database of Genotypes and Phenotypes (dbGaP): http://www.ncbi.nlm.nih.gov/gap

dbSNP: http://www.ncbi.nlm.nih.gov/SNP/

dbSNP138: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/snp138.txt.gz

Exome Aggregation Consortium (ExAC): http://exac.broadinstitute.org/

Exome Variant Server (EVS): http://evs.gs.washington.edu/EVS/

GERP: http://mendel.stanford.edu/SidowLab/downloads/gerp/

HapMap3: ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/

LRT: http://www.genetics.wustl.edu/jflab/lrt_query.html

MetaSKAT: http://www.hsph.harvard.edu/skat/metaskat/

MutationTaster: http://www.mutationtaster.org/

PhenoMan: https://code.google.com/p/phenoman/

PhyloP: http://compgen.bscb.cornell.edu/phast/

PolyPhen2: http://genetics.bwh.harvard.edu/pph2/

PROVEAN: http://provean.jcvi.org/index.php

RefSeq database: ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/RefSeqGene/

SIFT: http://sift.jcvi.org/

Variant Association Tools (VAT): http://varianttools.sourceforge.net/

Data submission

The data analyzed are from NHLBI-ESP. The data, that is, BAM and VCF files, and phenotypes are available in the database of Genotypes and Phenotypes (dbGaP) (http://www.ncbi.nlm.nih.gov/gap). Additionally, variant-level data are available from dbSNP, the Exome Variant Server and ExAC databases.