Abstract
The impact of the parental origin of associated alleles in GWAS has been largely ignored. Yet sequence variants could affect traits differently depending on whether they are inherited from the mother or the father, as in imprinted regions, where identical inherited DNA sequences can have different effects depending on their parental origin. To explore parent-of-origin effects (POEs), we studied 21 quantitative phenotypes in a large Hutterite pedigree to identify variants with single parent (maternal-only or paternal-only) effects, and then variants with opposite parental effects. Here we show that POEs, which can be opposite in direction, are relatively common in humans, have potentially important clinical effects, and will be missed in traditional GWAS. We identified POEs for 11 phenotypes, most of which are risk factors for cardiovascular disease. Many of the loci identified have characteristics of imprinted regions and are associated with the expression of nearby genes.
Introduction
Genome-wide association studies (GWAS) typically treat alleles inherited from the mother and the father as equivalent, although variants can affect traits differently depending on whether they are maternal or paternal in origin. In particular, parent-of-origin effects (POEs) can result from imprinting, in which epigenetic modifications allow differential gene expression on homologous chromosomes, determined by the parental origin of each chromosome. Mutations in imprinted genes or regions can cause disease. For example, two very different diseases, Prader-Willi Syndrome and Angelman Syndrome, are caused by loss-of-function alleles in genes within an imprinted region on chromosome 15q11-13. Inheriting a loss-of-function mutation in the SNRPN gene from the father results in Prader-Willi Syndrome, whereas inheriting a loss-of-function mutation in the UBE3A gene from the mother results in Angelman Syndrome1,2. Long noncoding RNA genes at this and other imprinted regions silence (i.e., imprint) genes in cis. Imprinted genes are often part of imprinted gene networks, suggesting regulatory links between these genes3,4,5. More than 150 imprinted genes have been described in humans6, but there are likely many other, as yet undiscovered, imprinted loci.
Previous studies have used pedigrees to test maternal and paternal alleles separately for association with phenotypes or with gene expression to uncover new imprinted loci6,7,8,9,10. Kong et al.7 discovered one locus associated with breast cancer risk only when the allele is inherited from the father and another locus associated with type 2 diabetes risk only when the allele is inherited from the mother. Garg et al.8 reported parent-of-origin cis-eQTLs involving known or putative imprinted genes. Two additional studies, by Zoledziewska et al.11 and Benonisdottir et al.6, identified opposite POEs on adult height at known imprinted loci. Both studies reported associations with variants at the KCNQ1 gene, and one reported additional opposite POEs on height at two other known imprinted loci (IGF2-H19 and DLK1-MEG3)6. These studies provide proof of principle that alleles at imprinted loci can show POEs, some with opposite effects, on common phenotypes.
Many existing studies and methods that identify parent-of-origin effects use case/parent trios or case/mother duos12,13,14,15,16. Similar to Kong et al.7, our method does not require data on the parents and uses only the parent-of-origin informative alleles, which were assigned and phased using PRIMAL17. In contrast to Kong et al.7, which studied binary traits, our method tests for parent-of-origin effects on quantitative traits, similar to Benonisdottir et al.6, who tested for parent-of-origin effects on height.
No previous study has included a broad range of human quantitative phenotypes or has searched genome-wide for variants whose effects differ in direction depending on the parent of origin. To address this gap, we develop a statistical model that directly compares the effects of the maternal and paternal alleles to identify effects that differ, including those that are opposite. We apply this model in a study of 21 common quantitative traits measured in the Hutterites, a founder population of European descent for which we have phased genotype data17. We identify variants with effects only when maternally inherited or only when paternally inherited, as well as variants with opposite POEs. Some of the identified regions have characteristics similar to known imprinted genes. Overall, we show that this model can identify putative imprinted regions with POEs for a broad range of clinically relevant quantitative phenotypes.
Results
Genome-wide association studies (GWAS)
We first performed standard GWAS of 21 traits in the Hutterites (Supplementary Table 1). These studies identified one genome-wide significant association (p < 5 × 10⁻⁸) for each of five of the 21 traits: low-density lipoprotein cholesterol (LDL-C), triglycerides, carotid artery intima-media thickness (CIMT), left ventricular mass index (LVMI), and monocyte count. The results of all 21 GWAS are summarized in Supplementary Table 2 and Supplementary Fig. 1. Results for all variants in all GWAS are deposited in dbGaP (phs000185).
Parent-of-origin GWAS
We considered two possible mechanisms of POEs. In the first, the effect size of one parent's allele is close to zero and the effect size of the other parent's allele is different from zero. To detect these cases, we performed paternal-only and maternal-only GWAS. In the second, the maternal and paternal alleles may both have effect sizes different from zero, but the effects differ significantly from each other or are opposite in direction. To detect these types of POEs, we developed a model that tests for differences between parental effects (see Methods). This model is especially powerful for identifying variants with parental effects in opposite directions.
Maternal and paternal GWAS: Using the same phenotypes, genotypes, pedigree, and criteria for significance as in the standard GWAS, we tested for maternal and paternal effects on each trait by testing each parentally inherited allele for association with the trait of interest, similar to previous studies7,8,11. Variants were considered to have POEs if they had a p-value less than 5 × 10⁻⁸ for only one parent and were not significant in the standard GWAS (i.e., the LDL-C association on chromosome 19 and the triglycerides association on chromosome 11 were not considered to have POEs; see Supplementary Table 1). The most significant parent-of-origin associations are summarized in Table 1. All significant results of the maternal and paternal parent-of-origin GWAS for all 21 phenotypes are included in Supplementary Data 1 and 2.
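The single-parent criterion above can be written as a small decision rule. This Python sketch encodes it for one variant; the function name and all p-values except 3.11 × 10⁻⁸ are hypothetical, and only the 5 × 10⁻⁸ threshold and the exclusion of variants significant in the standard GWAS come from the text.

```python
GENOME_WIDE = 5e-8  # significance threshold used for all GWAS in the text


def classify_single_parent(p_maternal: float, p_paternal: float,
                           p_standard: float, threshold: float = GENOME_WIDE) -> str:
    """Classify one variant under the single-parent POE rule (hypothetical helper)."""
    if p_standard < threshold:
        # Significant in the standard GWAS, so not counted as a POE
        return "standard"
    maternal_hit = p_maternal < threshold
    paternal_hit = p_paternal < threshold
    if maternal_hit and not paternal_hit:
        return "maternal-only"
    if paternal_hit and not maternal_hit:
        return "paternal-only"
    # Significant for neither parent, or for both
    return "none"


# Hypothetical paternal and standard p-values, for illustration only
print(classify_single_parent(3.11e-8, 0.2, 1e-3))  # maternal-only
```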
Overall, seven phenotypes had genome-wide significant parent-of-origin associations: four in the maternal only GWAS and three in the paternal only GWAS. Three cardiovascular disease (CVD)-associated phenotypes (age at menarche, CIMT, LVMI) and one lung function phenotype (forced expiratory volume in 1 second [FEV1]) were associated with maternally inherited alleles only.
A maternally inherited allele at rs7184983 (G) on chromosome 16 was associated with younger age at menarche (p = 3.11 × 10⁻⁸) (Fig. 1). This SNP is located upstream of the BBS2 gene and is associated with increased expression of OGFOD1 in transformed fibroblast cells and tibial nerve (p = 6.3 × 10⁻¹⁰)18. The maternally inherited allele at rs4077567 (G) on chromosome 2 was associated with decreased CIMT (p = 3.02 × 10⁻⁸) (Supplementary Fig. 2). This SNP is in an intron of a long intergenic noncoding RNA gene, LINC00607, which is expressed in the aorta and in coronary and tibial arteries, all tissues potentially relevant to CIMT and atherosclerosis18. A maternally inherited allele at rs574232282 (G) in an intron of SCMH1 on chromosome 1 was associated with increased LVMI (p = 1.39 × 10⁻⁸) (Supplementary Fig. 3). SCMH1 is expressed in the aorta and in coronary and tibial arteries18. The SCMH1 protein associates with the polycomb group multiprotein complexes required to maintain the transcriptionally repressive state of certain genes18. Lastly, maternally inherited alleles at rs9849387 (A) and rs6791779 (C) on chromosome 3 were both associated with reduced FEV1 (p = 4.10 × 10⁻⁹ and 1.48 × 10⁻⁸, respectively) (Supplementary Fig. 4). The nearest gene to rs9849387 is ROBO2 (65 kb downstream), which is expressed in the lung as well as in brain and ovary18. The nearest gene to rs6791779 is MIR4444-1 (267 kb), whose expression has not been characterized.
Three other CVD-related phenotypes (systolic blood pressure, LDL-C, and total cholesterol) had associations with paternally inherited alleles only. The paternally inherited allele at rs12024326 (A) on chromosome 1 was associated with lower LDL-C levels (p = 8.06 × 10⁻¹⁰) (Fig. 2). rs12024326 is in an intron of ADCK3, and the same allele is associated with increased expression of ADCK3 in whole blood (p = 2.5 × 10⁻¹¹), as well as decreased expression of a neighboring gene, CDC42BPA, in brain (cerebellum), heart (left ventricle), esophagus, and tibial artery (p = 3 × 10⁻¹¹)18. The paternal G allele at rs4843650 on chromosome 16 was associated with increased LDL-C; this SNP is located in an intron of JPH3, which is expressed predominantly in the brain18. A SNP on chromosome 13 (rs1536182) was associated with systolic blood pressure when inherited from the father (Supplementary Fig. 5). The paternally inherited A allele at this SNP was associated with decreased systolic blood pressure, as well as decreased expression of its closest gene, LINC01055, a long intergenic noncoding RNA gene, in testis (p = 2.5 × 10⁻⁷)18. A paternally inherited allele at rs113588203 (G) on chromosome 1 was associated with lower total cholesterol (p = 1.76 × 10⁻⁸) (Supplementary Fig. 6). This SNP is intergenic, lying between RHOU (96 kb downstream), which is expressed across multiple tissues, and MIR4454 (331 kb), which is expressed in adipose, kidney, and heart tissues18.
GWAS for differential parent-of-origin effects: Because some imprinted regions include both maternally and paternally expressed genes, often with tissue-specific expression, we next tested for such differential effects on these 21 phenotypes. In these analyses, we compared the size and direction of the association between maternal and paternal alleles to identify variants with different, including opposite, effects on the phenotype. Such loci would be completely hidden in standard GWAS, in which paternally and maternally inherited alleles are combined. These opposite effect GWAS revealed 11 independent loci with opposite POEs for nine different traits, at least six of which are associated with CVD risk (Table 2 and Supplementary Fig. 7). All significant results of the parent-of-origin GWAS for all 21 phenotypes are included in Supplementary Data 3.
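To illustrate why opposite POEs are invisible when parental alleles are combined, here is a small Python simulation with hypothetical effect sizes (β_M = 0.5, β_P = −0.5) and independent maternal and paternal alleles. It uses ordinary least squares without the relatedness random effect, a deliberate simplification. The standard-GWAS slope is expected to be near (β_M + β_P)/2 = 0, while the maternal-only and paternal-only slopes recover effects of opposite sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n, maf = 5000, 0.3
beta_m, beta_p = 0.5, -0.5  # hypothetical opposite parental effects

# Independent maternally and paternally inherited alleles (1 = effect allele)
x_m = rng.binomial(1, maf, n).astype(float)
x_p = rng.binomial(1, maf, n).astype(float)
y = beta_m * x_m + beta_p * x_p + rng.normal(0.0, 1.0, n)


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope from regressing y on x with an intercept (no random effects)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


standard = ols_slope(x_m + x_p, y)  # alleles combined, as in standard GWAS
maternal = ols_slope(x_m, y)        # maternal-only test
paternal = ols_slope(x_p, y)        # paternal-only test
print(f"standard {standard:+.2f}, maternal {maternal:+.2f}, paternal {paternal:+.2f}")
```

With real pedigree data the same contrast would be run in a mixed model, but the cancellation in the combined-genotype slope is the same.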
A locus on chromosome 16, near the CDH8 gene (128 kb upstream), was associated with opposite POEs on age at menarche (Fig. 3). CDH8 is highly expressed in the brain, as well as in the aorta and pituitary gland. Two loci, on chromosomes 5 and 6, were associated with opposite POEs on body mass index (BMI) (Fig. 4). The most significant variant on chromosome 5 (rs77785972) is near a long intergenic noncoding RNA gene, LINC01340 (409 kb downstream), whose expression has not been well characterized. The SNP on chromosome 6 (rs17605739) lies within another long intergenic noncoding RNA gene, RP1-209A6.1, which is expressed at low levels in the tibial artery, bladder, spleen, lung, pituitary gland, and testis.
A SNP on chromosome 16 (rs1032596) was associated with opposite POEs on LDL-C (Supplementary Fig. 8). This SNP lies in an intron of another long noncoding RNA gene, LINC01081, which has been suggested to be imprinted because its downstream genes have been shown to have parent- and tissue-specific activity19. A region on chromosome 2 has opposite effects on LVMI (Supplementary Fig. 9). The associated SNPs are in an intron of XIRP2, which encodes a cardiomyopathy-associated protein expressed in skeletal muscle and heart left ventricle, suggesting that this gene could play a role in determining left ventricular mass20,21,22. In addition, the most significant SNP in this region, rs17616252 (along with multiple SNPs in LD with it), is a strong eQTL (p = 1.8 × 10⁻¹³) for XIRP2 in skeletal muscle, for XIRP2-AS1 in testis, and for B3GALT1 in transformed fibroblast cells (p = 9.9 × 10⁻⁹)18. Four variants in a microRNA gene on chromosome 1, MIR548F3, were associated with opposite POEs on triglyceride levels (Supplementary Fig. 10). The expression of MIR548F3 has not been characterized. SNP rs7033776, near MELK (27 kb downstream) on chromosome 9, was associated with opposite effects on total cholesterol (Supplementary Fig. 11). MELK is expressed in the colon and esophagus in addition to transformed lymphocytes and fibroblasts18.
Nine linked variants on chromosome 1 were associated with opposite POEs on blood eosinophil count (Supplementary Fig. 12). These variants are near IGSF21 (27 kb downstream), a member of the immunoglobulin superfamily that likely acts as a receptor in immune response pathways23. A variant on chromosome 3, rs12714812, was associated with opposite POEs on FEV1 (Supplementary Fig. 13). This variant has been shown to regulate the expression of CNTN3 (45 kb upstream) in heart and brain (p = 2.2 × 10⁻²⁰)18. Studies in mice have suggested that this gene is imprinted and maternally expressed in the murine placenta24. Variant rs142030841, in an intron of TPGS2 on chromosome 18, has opposite POEs on neutrophil count (Supplementary Fig. 14). This SNP is an expression quantitative trait locus (eQTL) for the noncoding RNA gene RP11-95O2.5 in skin, testis, breast, thyroid, and adipose tissue (p = 4.5 × 10⁻⁹), for CELF4 in tibial nerve and lung (p = 6 × 10⁻¹¹), and for TPGS2 in tibial artery and transformed fibroblast cells (p = 1.5 × 10⁻⁵)18.
Parent-of-origin effects on gene expression
To determine whether any of the associated variants also showed POEs on gene expression in the Hutterites, we used RNA-seq gene expression data from lymphoblastoid cell lines (LCLs) collected from 430 of the individuals in the GWAS sample. We first tested maternal and paternal alleles for association with the expression of genes that were detected as expressed in the LCLs and whose transcription start site was within 1 Mb of each associated SNP (Supplementary Table 3). There were no significant associations after correction for multiple testing, similar to a previous study6. However, because we considered these analyses exploratory, we show results for the five most significant parent-of-origin eQTLs (Table 3). We next applied the opposite effect model to each SNP in Table 2 and the expression of all genes that were detected as expressed in LCLs and whose transcription start site was within 1 Mb of the associated SNP (Supplementary Table 4). This resulted in 57 tests (one SNP for each of eight phenotypes, tested against 57 genes in total). The five most significant opposite effect eQTLs, none of which passed the Bonferroni threshold of 8.77 × 10⁻⁴ (0.05/57), are shown in Table 4. The most significant opposite effect eQTL was for POLR1E expression with the SNP on chromosome 9 (rs7033776) associated with total cholesterol (opposite effect eQTL p = 9.86 × 10⁻⁴) (Supplementary Fig. 15). POLR1E is involved in the purine metabolism pathway as well as DNA-directed polymerase activity. The same SNP, rs7033776, had modest opposite effects on the expression of three other genes in the region (PAX5, FBXO10, and FRMPD1), a pattern consistent with an imprinted region. Another SNP with opposite POEs on LVMI, rs16853098, was an opposite effect eQTL for STK39, a gene that has been previously associated with hypertension25.
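The Bonferroni threshold quoted above is the family-wise level divided by the 57 opposite effect eQTL tests; a level of 0.05 is inferred here because 0.05/57 reproduces the reported 8.77 × 10⁻⁴. A minimal Python check confirms that the top p-value falls just short of it:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 57)   # 57 opposite effect eQTL tests
top_p = 9.86e-4                              # POLR1E with rs7033776
print(f"threshold = {threshold:.3g}")        # threshold = 0.000877
print("significant:", top_p < threshold)     # significant: False
```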
Replication in SardiNIA
We attempted to replicate our opposite effect GWAS and single parent association results in the SardiNIA cohort (Supplementary Table 5). Ten SNPs significantly associated with opposite effects on nine phenotypes in the Hutterites (BMI, total cholesterol, eosinophil count, triglycerides, FEV1, LDL-C, LVMI, neutrophil count, and age at menarche) were analyzed using the same method in the Sardinian sample. The BMI-associated SNP on chromosome 5, rs77785972, showed suggestive evidence of replication in SardiNIA (p = 7.7 × 10⁻⁵; replication threshold 0.005, correcting for the ten tests performed). However, the difference in parental effect size (−1.90) was in the opposite direction. The differences in effect size and p-value could be due to differences in sample size, in transformations or other data standardization methods, or between the populations. Of the remaining eight single parent associations with six phenotypes (SBP, age at menarche, LDL-C, total cholesterol, CIMT, FEV1), only one SNP, associated with FEV1, was nominally significant in the Sardinians (p = 0.0391; replication threshold 0.003125 for 16 tests).
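Replication here involves two checks: the replication p-value must pass a Bonferroni threshold (0.05/10 = 0.005 for the ten opposite effect SNPs, 0.05/16 = 0.003125 for the 16 single parent tests), and the effect must point in the same direction as in the discovery sample. The Python sketch below, built around a hypothetical `check_replication` helper, encodes both. The discovery effect for rs77785972 is a placeholder positive value, because the text implies only its sign; both FEV1 effect values are likewise placeholders.

```python
from dataclasses import dataclass


@dataclass
class ReplicationCheck:
    threshold: float
    passes_threshold: bool
    same_direction: bool

    @property
    def replicated(self) -> bool:
        # Replication requires both significance and a consistent direction
        return self.passes_threshold and self.same_direction


def check_replication(p_rep: float, n_tests: int, discovery_effect: float,
                      replication_effect: float, alpha: float = 0.05) -> ReplicationCheck:
    """Bonferroni-corrected replication test with a sign check (hypothetical helper)."""
    threshold = alpha / n_tests
    same_direction = (discovery_effect > 0) == (replication_effect > 0)
    return ReplicationCheck(threshold, p_rep < threshold, same_direction)


# rs77785972 (BMI): discovery effect is a placeholder; only its positive sign is implied
bmi = check_replication(7.7e-5, n_tests=10, discovery_effect=1.0, replication_effect=-1.90)
print(bmi.threshold, bmi.passes_threshold, bmi.same_direction, bmi.replicated)

# FEV1 single parent SNP: effect values are placeholders
fev1 = check_replication(0.0391, n_tests=16, discovery_effect=1.0, replication_effect=1.0)
print(fev1.threshold, fev1.passes_threshold)
```

Separating the threshold from the direction check makes the BMI result explicit: it clears the p-value bar but fails the sign check.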
Discussion
In this study, we introduced a statistical method that allows assessment of standard GWAS signals along with measures of differential POEs on common quantitative phenotypes. As in previous parent-of-origin studies of fewer phenotypes6,7,8, we tested for associations of maternally or paternally derived alleles with each phenotype. We then extended this approach to identify variants for which maternally and paternally derived alleles have different, including opposite, effects on phenotypic values. Others have used similar methods to test for opposite effects on body weight and growth in mice26 or on methylation levels in humans27. In contrast, our study focused on 21 common disease-associated phenotypes in a single large pedigree, which allowed us to broadly survey the physiological effects of putative imprinted regions and the candidate genes at each associated locus.
Our studies of > 1000 Hutterites who are related to each other in a single pedigree allowed us to detect POEs, even when few genome-wide significant associations were detected in standard GWAS of the same phenotypes. Our method revealed genome-wide significant single-parent associations for seven of the 21 phenotypes examined (maternally inherited alleles for four phenotypes and paternally inherited alleles for three; Table 1) and opposite parent-of-origin effects for nine phenotypes, five of which also showed single-parent effects at different loci (Table 2). Overall, 11 of the 21 phenotypes examined showed genome-wide significant evidence of POEs at one or more loci. In contrast, standard GWAS of the same phenotypes, using the same markers in the same individuals, revealed genome-wide significant associations for only five traits.
It is notable that four of the nine significant opposite parent-of-origin effects (one each with LDL-C and triglycerides, and two with BMI) lie in or near long intergenic noncoding RNA genes (lincRNAs). LincRNAs are a feature of imprinted regions1, where they can silence the expression of genes on the opposite chromosome3,28. One of these variants, rs1032596, with an opposite parent-of-origin effect on LDL-C, is located in the LINC01081 gene. This noncoding RNA, along with LINC01082, regulates the FOXF1 enhancer, resulting in parent- and tissue-specific FOXF1 activity19, which provides experimental support for tissue-specific expression, a feature of imprinted regions.
Another locus with POEs in our study harbors a gene that previously published work has suggested to be imprinted. The variant associated with opposite POEs on FEV1 is an eQTL for CNTN3, which was shown to have exclusive maternal allele-specific expression in murine placentas24, although this finding may have been due to contamination with maternal cells29,30.
Other regions associated with POEs harbor genes involved in transcriptional repression (e.g., SCMH1 with LVMI on chromosome 1), or the associated SNPs are reported in GTEx as eQTLs for genes expressed in tissues relevant to the phenotype under investigation (e.g., the LVMI-associated SNPs are eQTLs for XIRP2, which is expressed in skeletal muscle and heart left ventricle)18. Overall, these expression patterns provide additional support for the idea that the parent-of-origin associations in our study are flagging imprinted regions or regions involved in the regulation of gene expression. Finally, we used gene expression in LCLs from the Hutterites to directly test for parent-of-origin eQTLs among SNPs associated with phenotypes in the parent-of-origin GWAS. Although none of the parent-of-origin eQTLs met criteria for significance after correction for multiple testing, the SNP on chromosome 9 with opposite POEs on total cholesterol levels was a borderline significant opposite parent-of-origin eQTL for POLR1E, and possibly for three other genes at the same locus (PAX5, FBXO10, and FRMPD1). The presence of multiple genes with potential parent-of-origin expression patterns further supports an imprinted locus. Because gene expression data were available only from LCLs in the Hutterites, our inferences about effects on expression are limited: imprinted regions are often tissue-specific and sometimes developmentally regulated1,2. Despite this limitation, the facts that many of the SNPs associated with POEs on phenotypes are themselves eQTLs in relevant tissues (GTEx) and that some show suggestive POEs on expression in Hutterite LCLs generally support the suggestion that some of the regions identified in this study are imprinted or have network interactions with imprinted genes31 in humans. Additionally, our data suggest that loci with POEs influence a broad spectrum of quantitative phenotypes that are themselves risk factors for common diseases.
In particular, the discovery of POEs for eight traits that are associated with cardiovascular disease risk is intriguing. These include metabolic phenotypes, such as BMI, total cholesterol, triglycerides, LDL-C, and age at menarche, that have indirect effects on cardiac health, as well as LVMI and CIMT, which more directly reflect cardiac health. Some of these phenotypes showed associations with paternally inherited alleles only (systolic blood pressure, LDL-C, total cholesterol), with maternally inherited alleles only (LVMI, CIMT, and age at menarche), and/or with opposite effect variants (BMI, LDL-C, triglycerides, total cholesterol, LVMI, age at menarche). It has been suggested that genomic imprinting evolved in the mammalian lineage as a way to regulate maternally and paternally expressed genes in the placenta during pregnancy and to modulate metabolic functions related to growth, where parental interests may be in conflict: paternal alleles favor growth of the fetus at the expense of the mother, whereas maternal alleles favor restricting resources to the fetus to preserve her own nutritional needs3,28,32. Our data show some effects that are consistent with this theory. For example, three independent paternally inherited alleles on chromosome 1 are associated with increased LDL-C (Fig. 2) and total cholesterol (Supplementary Fig. 7), and a paternal allele on chromosome 13 is also associated with increased systolic blood pressure (Supplementary Fig. 6). However, it is not always possible to interpret our results in light of this model, for example, the association of a maternal allele on chromosome 2 with decreased CIMT (Supplementary Fig. 3), or of a maternal allele on chromosome 16 with earlier age at menarche (Fig. 1); later age at menarche is associated with decreased cardiovascular risk, and earlier age at menarche with increased risk33.
However, because many of the traits associated with POEs in this study were measured in adults, and none in neonates, we are likely observing the downstream effects of processes that occurred in utero. Nonetheless, this kinship theory, or parental conflict hypothesis, could account for the enrichment of parent-of-origin associations, particularly those with opposite effects, among metabolic and CVD-associated traits1.
Although we identified two parent-of-origin associations with nominal significance in the SardiNIA cohort (an opposite effect association with BMI and a single parent maternal association with FEV1), there are many potential reasons for the overall lack of replication. For example, the most significant SNP may not be the causal SNP, and LD structure may differ between the Hutterites and the Sardinian population. It is also possible that different genetic backgrounds modify these associations. Lastly, the Hutterites and Sardinians are exposed to different environments and have different lifestyles, which could differentially affect the associations between genotype and phenotype.
Finally, we note that the parent-of-origin GWAS for 21 phenotypes in the Hutterites revealed overall twice as many genome-wide significant loci as standard GWAS of the same phenotypes in the same individuals, suggesting that variation at imprinted loci may account for some of the missing heritability of these phenotypes, and potentially of the diseases for which they confer risk. This idea is consistent with observations in both mice and humans34. In mice, POEs contribute disproportionately to the heritability of 97 traits, including those related to total cholesterol, weight, HDL, and triglycerides35. Exactly how much loci with POEs in humans contribute to phenotypic variation and disease risk overall remains to be determined, but our study provides compelling evidence that their contribution is likely to be substantial for many important traits.
Methods
Sample composition
The individuals in this study have participated in one or more of our studies on the genetics of complex traits in the Hutterites36,37,38. The more than 1500 Hutterites in our study are related to each other in a 13-generation pedigree including 3671 individuals. Informed consent was obtained from all subjects, under University of Chicago IRB-approved protocols.
Genotype data
Variants detected in the whole-genome sequences of 98 Hutterites were previously imputed to an additional 1317 individuals who were genotyped on one of three Affymetrix arrays (500k, 5.0, and 6.0) using PRIMAL, a high-accuracy pedigree-based imputation method17. PRIMAL is phasing and imputation software that uses pedigree-based identity-by-descent (IBD) information for accurate analyses. IBD segments are obtained with hidden Markov models and organized into an IBD clique dictionary, a data structure for efficient lookup queries that enables fast imputation. IBD cliques serve the role of ‘parents’ in family-based phasing. The method achieves very high accuracy on chromosomal segments that are shared IBD. PRIMAL was used to phase alleles and assign parent of origin for 83% of about 12 million autosomal SNPs. For these studies, we selected SNPs with MAF > 1% and genotype call rate > 85%, yielding 5,891,982 autosomal SNPs. Parent-of-origin allele call rates differed among individuals and between phenotypes (Supplementary Table 1).
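The variant-level filters described above (MAF > 1%, genotype call rate > 85%) can be sketched in Python with NumPy. The genotype encoding (an individuals × SNPs array of 0/1/2 allele dosages with `np.nan` for missing calls) and the `qc_filter` name are assumptions for illustration, not PRIMAL's actual output format.

```python
import numpy as np


def qc_filter(genotypes: np.ndarray, min_maf: float = 0.01,
              min_call_rate: float = 0.85) -> np.ndarray:
    """Boolean mask of SNPs (columns) with MAF > min_maf and call rate > min_call_rate.

    `genotypes` is an individuals x SNPs array of 0/1/2 allele dosages with
    np.nan marking missing calls (an assumed encoding).
    """
    called = ~np.isnan(genotypes)
    n_called = called.sum(axis=0)
    call_rate = n_called / genotypes.shape[0]
    # Allele frequency among called genotypes; all-missing SNPs get frequency 0
    allele_sum = np.nansum(genotypes, axis=0)
    freq = np.divide(allele_sum, 2.0 * n_called,
                     out=np.zeros(genotypes.shape[1]), where=n_called > 0)
    maf = np.minimum(freq, 1.0 - freq)
    return (maf > min_maf) & (call_rate > min_call_rate)


nan = np.nan
# Toy data: SNP 0 is monomorphic, SNP 1 has 80% call rate, SNP 2 passes both filters
G = np.array([
    [0, 0, 0], [0, 1, 1], [0, nan, 2], [0, 1, 1], [0, 0, 0],
    [0, nan, 0], [0, 2, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1],
])
print(qc_filter(G).tolist())  # [False, False, True]
```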
Phenotype data
We included 21 quantitative phenotypes that were previously measured in the Hutterites. Descriptions of each phenotype, as well as the exclusion criteria, transformations, and covariates used for each phenotype in the GWAS, are available in the Supplementary Methods (Supplementary Table 1). Detailed descriptions of 18 of the 21 phenotypes can be found in Cusanovich et al.36; the remaining three are described here. Height was measured in cm on a stadiometer with shoes removed. BMI was calculated as weight (kg, measured on a scale) divided by the square of height (m). Age at menarche was collected retrospectively by interview.
Genome-wide association studies
We used a linear mixed model, as implemented in GEMMA, to test for genome-wide association with each of the 21 phenotypes under an additive model, correcting for relatedness and relevant covariates (Supplementary Table 1). We used a significance threshold of p < 5 × 10⁻⁸. This threshold does not account for the extensive linkage disequilibrium in the Hutterites (which reduces the effective number of tests and so makes the threshold conservative) or for the 21 GWAS performed (which increases the number of tests and so makes it anti-conservative). On balance, we consider the threshold to be anti-conservative. The results are summarized in Supplementary Table 2.
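For intuition about the mixed model test, here is a NumPy sketch of a Wald test for one SNP in the model y = Wα + xβ + g + ε, with g ~ N(0, Aσ_g²) and ε ~ N(0, Iσ_e²). It treats the variance components as known and whitens with a Cholesky factor so that generalized least squares reduces to ordinary least squares. This is an EMMAX-style simplification, not GEMMA's exact per-SNP likelihood procedure, and the p-value uses a normal approximation. The `lmm_wald_test` name and the simulated sibling-pair data are hypothetical.

```python
import math

import numpy as np


def lmm_wald_test(y, W, x, A, sigma_g2, sigma_e2):
    """Wald test of beta in y = W alpha + x beta + g + e.

    g ~ N(0, A * sigma_g2) and e ~ N(0, I * sigma_e2), with the variance
    components treated as known (an EMMAX-style simplification).
    Returns (beta_hat, standard_error, two-sided normal-approximation p-value).
    """
    n = len(y)
    V = sigma_g2 * A + sigma_e2 * np.eye(n)
    L = np.linalg.cholesky(V)                 # V = L L^T
    X = np.column_stack([W, x])
    X_w = np.linalg.solve(L, X)               # whitened design: L^-1 X
    y_w = np.linalg.solve(L, y)               # whitened trait: L^-1 y
    cov = np.linalg.inv(X_w.T @ X_w)          # (X^T V^-1 X)^-1
    coef = cov @ X_w.T @ y_w                  # GLS estimates
    beta = float(coef[-1])
    se = math.sqrt(cov[-1, -1])
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


# Simulated sibling pairs (relatedness 0.5) with a hypothetical SNP effect of 0.8
rng = np.random.default_rng(1)
n = 400
A = np.eye(n)
for i in range(0, n, 2):
    A[i, i + 1] = A[i + 1, i] = 0.5
g = np.linalg.cholesky(0.5 * A) @ rng.normal(size=n)
x = rng.binomial(2, 0.3, n).astype(float)
W = np.ones((n, 1))  # intercept only
y = 1.0 + 0.8 * x + g + rng.normal(0.0, 1.0, n)
beta, se, p = lmm_wald_test(y, W, x, A, sigma_g2=0.5, sigma_e2=1.0)
print(f"beta = {beta:.2f}, se = {se:.3f}, p = {p:.1e}")
```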
Maternal and paternal GWAS
To evaluate the evidence for POEs, we tested maternal and paternal alleles separately for association with each phenotype, comparing phenotypic values across maternally inherited alleles and, separately, across paternally inherited alleles. We used a linear mixed model as implemented in GEMMA, which allows us to correct for relatedness as a random effect and for sex, age, and other covariates as fixed effects39. The linear mixed models for the parent-of-origin GWAS testing maternal alleles and paternal alleles are given in Eqs. 1 and 2, respectively.
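The typeset Eqs. 1 and 2 are not reproduced in this text. Based on the symbol definitions in the following paragraph and the standard single-SNP form of GEMMA's linear mixed model, they presumably read as follows (a reconstruction, not the authors' own typesetting):

\[
\mathbf{Y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{X}_{\mathbf{M}}\boldsymbol{\beta}_{\mathbf{M}} + \mathbf{g} + \boldsymbol{\varepsilon} \tag{1}
\]

\[
\mathbf{Y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{X}_{\mathbf{P}}\boldsymbol{\beta}_{\mathbf{P}} + \mathbf{g} + \boldsymbol{\varepsilon} \tag{2}
\]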
Here, n is the number of individuals, \(\mathbf{Y}\) is an n × 1 vector of quantitative traits, \(\mathbf{W}\) is an n × c matrix of covariates (fixed effects) including an intercept column of 1s, and \(\boldsymbol{\alpha}\) is a c × 1 vector of covariate coefficients. \(\mathbf{X}_{\mathbf{M}}\) is an n × 1 vector of maternal alleles and \(\mathbf{X}_{\mathbf{P}}\) is an n × 1 vector of paternal alleles; \(\boldsymbol{\beta}_{\mathbf{M}}\) and \(\boldsymbol{\beta}_{\mathbf{P}}\) are the effect sizes of the maternal and paternal alleles, respectively. \(\mathbf{g}\) is a vector of genetic effects with \(\mathbf{g}\sim \mathrm{N}(0, \mathbf{A}\sigma_{\mathrm{g}}^2)\), where \(\mathbf{A}\) is the genetic relatedness matrix, and \(\boldsymbol{\varepsilon}\) is a vector of non-genetic effects with \(\boldsymbol{\varepsilon}\sim \mathrm{N}(0, \mathbf{I}\sigma_{\mathrm{e}}^2)\). Parent-of-origin allele frequencies of significant SNPs are listed in Supplementary Table 6.
Differential effect GWAS (PO-GWAS)
To test for a difference between the effects of the same allele inherited from each parent, including opposite effects, we re-parameterized the test model (Eq. 3) from Garg et al.8, resulting in a regression model similar to that used in Wolf et al.26,27. The null model (Eq. 4) is a standard GWAS model that ignores the parent of origin of alleles. The test model (Eq. 3) is more significant than the null model when maternal and paternal alleles have differential effects on the outcome being modeled (a phenotype here, gene expression in Garg et al.8).
This new model (Eq. 5) allows us to estimate the difference in parental effects of the same allele directly, with the genotype included as a covariate.
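The typeset Eqs. 3 to 5 are likewise not reproduced here. A reconstruction consistent with the definitions in the following paragraph, assuming that Eq. 3 is the joint model with separate maternal and paternal terms, that Eq. 4 is the standard additive model (β below denotes its single allelic effect, our notation), and that \(\mathbf{X}_{\mathbf{PM}} = \mathbf{X}_{\mathbf{M}} + \mathbf{X}_{\mathbf{P}}\), is:

\[
\mathbf{Y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{X}_{\mathbf{M}}\boldsymbol{\beta}_{\mathbf{M}} + \mathbf{X}_{\mathbf{P}}\boldsymbol{\beta}_{\mathbf{P}} + \mathbf{g} + \boldsymbol{\varepsilon} \tag{3}
\]

\[
\mathbf{Y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{X}_{\mathbf{PM}}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon} \tag{4}
\]

Expanding the right-hand side shows the identity

\[
\mathbf{X}_{\mathbf{M}}\boldsymbol{\beta}_{\mathbf{M}} + \mathbf{X}_{\mathbf{P}}\boldsymbol{\beta}_{\mathbf{P}} = \mathbf{X}_{\mathbf{PM}}\frac{\boldsymbol{\beta}_{\mathbf{P}} + \boldsymbol{\beta}_{\mathbf{M}}}{2} + \frac{\mathbf{X}_{\mathbf{M}} - \mathbf{X}_{\mathbf{P}}}{2}\left(\boldsymbol{\beta}_{\mathbf{M}} - \boldsymbol{\beta}_{\mathbf{P}}\right),
\]

so Eq. 3 can be rewritten as

\[
\mathbf{Y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{X}_{\mathbf{PM}}\frac{\boldsymbol{\beta}_{\mathbf{P}} + \boldsymbol{\beta}_{\mathbf{M}}}{2} + \frac{\mathbf{X}_{\mathbf{M}} - \mathbf{X}_{\mathbf{P}}}{2}\left(\boldsymbol{\beta}_{\mathbf{M}} - \boldsymbol{\beta}_{\mathbf{P}}\right) + \mathbf{g} + \boldsymbol{\varepsilon} \tag{5}
\]

In this form, testing \(\boldsymbol{\beta}_{\mathbf{M}} - \boldsymbol{\beta}_{\mathbf{P}} = 0\) is a direct test for a parent-of-origin difference, while the genotype term absorbs the average effect that a standard GWAS would capture.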
Here, \(\mathbf{X}_{\mathbf{PM}}\) is an n × 1 vector of genotypes with possible values [0, 1, 2], equal to \(\mathbf{X}_{\mathbf{P}} + \mathbf{X}_{\mathbf{M}}\). \(\left(\boldsymbol{\beta}_{\mathbf{M}} - \boldsymbol{\beta}_{\mathbf{P}}\right)\) is the difference in parental effect size; if this difference is large and significantly different from 0, it suggests that a parent-of-origin effect exists at the variant. \(\frac{\left(\mathbf{X}_{\mathbf{M}} - \mathbf{X}_{\mathbf{P}}\right)}{2}\) is an n × 1 vector with possible values [−0.5, 0, 0.5], because \(\mathbf{X}_{\mathbf{M}} - \mathbf{X}_{\mathbf{P}}\) takes values [−1, 0, 1]. \(\frac{\left(\boldsymbol{\beta}_{\mathbf{P}} + \boldsymbol{\beta}_{\mathbf{M}}\right)}{2}\) is the average parental effect size, which is what a standard GWAS using genotypes captures. The genotypes are therefore included as a covariate, with the average parental effect size as the corresponding covariate coefficient. This differential effect GWAS was run in GEMMA using the BIMBAM format, which accepts non-integer (mean) genotype values40. Parent-of-origin allele frequencies of significant SNPs are listed in Supplementary Table 6.
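The reparameterization can be checked numerically. This NumPy sketch uses hypothetical parental effects, verifies that X_Mβ_M + X_Pβ_P equals X_PM(β_P + β_M)/2 + (X_M − X_P)/2 · (β_M − β_P) for every individual, and then fits the Eq. 5 design by ordinary least squares (omitting the relatedness random effect for brevity) to estimate the average effect and the parental difference.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
beta_m, beta_p = 0.3, -0.2  # hypothetical parental effects

x_m = rng.binomial(1, 0.4, n).astype(float)  # maternally inherited allele (0/1)
x_p = rng.binomial(1, 0.4, n).astype(float)  # paternally inherited allele (0/1)
x_pm = x_m + x_p                             # genotype, values 0/1/2
x_half_diff = (x_m - x_p) / 2                # values -0.5/0/0.5

# The reparameterization is an exact identity for every individual
lhs = beta_m * x_m + beta_p * x_p
rhs = x_pm * (beta_p + beta_m) / 2 + x_half_diff * (beta_m - beta_p)
assert np.allclose(lhs, rhs)

# Fit the Eq. 5 design by OLS (relatedness random effect omitted for brevity)
y = 1.0 + lhs + rng.normal(0.0, 0.5, n)
design = np.column_stack([np.ones(n), x_pm, x_half_diff])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
avg_effect, diff_effect = coef[1], coef[2]  # estimate (beta_p + beta_m)/2 and beta_m - beta_p
print(f"average {avg_effect:.3f}, difference {diff_effect:.3f}")
```

Coding the difference term as (X_M − X_P)/2 rather than X_M − X_P is what makes its coefficient equal to the full difference β_M − β_P rather than half of it.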
Parent-of-origin eQTL studies
RNA-seq data from LCLs were available for 430 Hutterites included in a previous study (50-bp single-end reads; median depth of 10.5 million reads)36. For this study, sequencing reads were reprocessed as follows. Reads were trimmed of adapters using Cutadapt (discarding reads < 5 bp) and then remapped to hg19 using STAR indexed with GENCODE version 19 gene annotations41,42. To remove mapping bias, reads were processed using the WASP mapping pipeline43. Gene counts were collected using HTSeq-count44. VerifyBamID was used to identify sample swaps, allowing us to include individuals who had previously been excluded45. Genes mapping to the X and Y chromosomes were removed; genes with a counts per million (CPM) value below 1 (expressed with fewer than 20 counts in the sample with the lowest sequencing depth) were also removed. limma was used to normalize and convert counts to log-transformed CPM values46. Technical covariates (RNA Integrity Number and RNA concentration) that showed a significant association (p < 0.05) with any of the top 10 principal components were regressed out.
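The counts-to-expression steps can be sketched in NumPy as follows. The log-CPM formula follows the one used by limma's voom (log2 of (count + 0.5) / (library size + 1) × 10⁶). The rule that a gene must reach CPM ≥ 1 in every sample, the helper names, and the simulated data are assumptions, since the text does not say across how many samples the CPM threshold is applied.

```python
import numpy as np


def cpm(counts: np.ndarray) -> np.ndarray:
    """Counts per million for a genes x samples count matrix."""
    return counts / counts.sum(axis=0) * 1e6


def filter_low_expression(counts: np.ndarray, min_cpm: float = 1.0) -> np.ndarray:
    """Mask of genes with CPM >= min_cpm in every sample (assumed rule)."""
    return (cpm(counts) >= min_cpm).all(axis=1)


def voom_log_cpm(counts: np.ndarray) -> np.ndarray:
    """log2-CPM with the offsets used by limma's voom."""
    lib_size = counts.sum(axis=0)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def regress_out(expr: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Residualize each gene (row) on technical covariates (samples x k) plus an intercept."""
    design = np.column_stack([np.ones(expr.shape[1]), covariates])
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    return (expr.T - design @ coef).T


counts = np.array([[100, 200, 150], [0, 1, 0], [900, 800, 850]], dtype=float)
keep = filter_low_expression(counts)
print(keep.tolist())  # [True, False, True]
log_expr = voom_log_cpm(counts[keep])

rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 20))  # simulated log-CPM values: 5 genes x 20 samples
tech = rng.normal(size=(20, 2))  # simulated stand-ins for RIN and RNA concentration
residuals = regress_out(expr, tech)
```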
Maternal and paternal parent-of-origin eQTL
LCL RNA-seq data were used to test the single parent model for the most significant SNP from the maternal-only or paternal-only GWAS for each phenotype. We selected all genes detected as expressed in the LCLs that reside within 1 Mb of each lead SNP. A summary of the SNPs and genes tested is given in Supplementary Table 3.
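Selecting candidate genes for each lead SNP amounts to a window query on gene coordinates. The sketch below assumes a gene qualifies if any part of its span lies within 1 Mb of the SNP; the `Gene` class, the `cis_genes` function and the gene coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def cis_genes(snp_chrom, snp_pos, genes, window=1_000_000):
    """Names of genes on snp_chrom whose span overlaps snp_pos +/- window.

    Using gene-body overlap (rather than, say, distance to the TSS) is an
    assumption; the text only says genes reside within 1 Mb of the SNP.
    """
    lo, hi = snp_pos - window, snp_pos + window
    return [g.name for g in genes
            if g.chrom == snp_chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation; in practice the list would first be restricted
# to genes detected as expressed in the LCLs.
genes = [
    Gene("GENE_A", "chr1", 100, 200),
    Gene("GENE_B", "chr1", 2_500_000, 2_600_000),
    Gene("GENE_C", "chr2", 100, 200),
    Gene("GENE_D", "chr1", 1_900_000, 2_000_000),
]
print(cis_genes("chr1", 1_000_000, genes))  # ['GENE_A', 'GENE_D']
```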
Differential parent-of-origin eQTL
LCL RNA-seq data were used to test the opposite effects model for the most significant SNP in each region associated with a phenotype in the parent-of-origin opposite effects GWAS. We selected all genes detected as expressed in the LCLs that reside within 1 Mb of each associated SNP. A summary of the SNPs and genes tested is given in Supplementary Table 4.
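Assuming each gene's expression is modelled with the same two genotype terms as the differential effect GWAS, the opposite-effects eQTL test can be sketched as an ordinary least squares fit with a t statistic on the \(\frac{\mathbf{X}_{\mathbf{M}} - \mathbf{X}_{\mathbf{P}}}{2}\) coefficient. It omits any kinship or random-effect term, so it is a simplified stand-in rather than the model actually fitted; the function name and simulated values are illustrative.

```python
import numpy as np


def differential_eqtl(expr, x_mat, x_pat):
    """OLS test of the opposite-effects term for one gene and one SNP.

    Fits expr ~ 1 + X_PM + (X_M - X_P)/2, where X_PM = X_P + X_M, and returns
    (estimate, t statistic) for the (X_M - X_P)/2 coefficient, i.e. b_M - b_P.
    No kinship/random effect is modelled (an assumed simplification).
    """
    y = np.asarray(expr, dtype=float)
    x_mat = np.asarray(x_mat, dtype=float)
    x_pat = np.asarray(x_pat, dtype=float)
    design = np.column_stack([np.ones_like(y), x_pat + x_mat, (x_mat - x_pat) / 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = len(y) - design.shape[1]
    sigma2 = resid @ resid / dof                       # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)    # OLS coefficient covariance
    return coef[2], coef[2] / np.sqrt(cov[2, 2])


# Simulated gene with opposite maternal (+1) and paternal (-1) effects in
# 430 individuals, the number with LCL RNA-seq data.
rng = np.random.default_rng(2)
x_mat = rng.binomial(1, 0.3, 430)
x_pat = rng.binomial(1, 0.3, 430)
expr = 1.0 * x_mat - 1.0 * x_pat + rng.normal(0.0, 1.0, 430)
estimate, t_stat = differential_eqtl(expr, x_mat, x_pat)
```

Because the simulated parental effects cancel on average, the additive term carries little signal here and the evidence for association sits mainly in the difference term.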
Replication in SardiNIA
The SardiNIA study is a longitudinal population-based cohort study, started in 2001, of quantitative traits of biomedical relevance, with a special emphasis on traits that influence aging. In a first survey, the project recruited 6148 individuals from four towns in the Lanusei Valley (east-central Sardinia), assessing 98 quantitative traits in over 62% of the eligible population living in the region (age 14–102 years); at least 96% of the initial cohort have all grandparents born in the same province. Recently, the study recruited 773 additional individuals, for a total of 6921 subjects. The longitudinal study, now in its 14th year and in its fourth phase, has collected longitudinal information on more than 1000 quantitative traits that can be scored on a continuous scale, including inflammatory markers and immune-related traits. Written informed consent was obtained from all participants. Genotypes were obtained after imputation with a reference panel generated from low-pass sequencing of 3514 Sardinians, as described in Sidore et al.47. Parent-of-origin genotypes were determined with a custom script applied to the 1308 complete trios present in the cohort. Trait measures were adjusted for age and age squared, and the residuals were inverse-normalized by quantile transformation. Single parent effects were tested separately for maternal and paternal alleles using EPACTS (test q.emmax)48. Opposite effects were tested with the same methods as in the Hutterites, described above.
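The trio-based assignment of parental origin and the trait normalization can be sketched as below. This is not the authors' custom script, whose details are not given; the single-SNP rule (origin is resolved only when exactly one transmission is consistent with the parental genotypes), the (rank − 0.5)/n quantile offset and the function names are assumptions. Sites where all three trio members are heterozygous are left unresolved here; haplotype phasing across neighbouring sites can resolve many of them.

```python
from statistics import NormalDist

import numpy as np


def parental_alleles(child, mother, father):
    """Assign a child's alleles to parents at one biallelic SNP.

    Genotypes are alternate-allele counts (0, 1, 2). Returns the
    (maternal, paternal) alleles as 0/1, or None when origin cannot be
    resolved: all three heterozygous, or a Mendelian inconsistency.
    """
    transmissible = {0: {0}, 1: {0, 1}, 2: {1}}
    options = [(m, p)
               for m in transmissible[mother]
               for p in transmissible[father]
               if m + p == child]
    return options[0] if len(options) == 1 else None


def adjust_and_inverse_normalize(trait, age):
    """Regress trait on age and age^2, then rank-based inverse-normal
    transform the residuals using quantiles (rank - 0.5) / n (an assumed
    offset; ties are broken by input order)."""
    trait = np.asarray(trait, dtype=float)
    age = np.asarray(age, dtype=float)
    design = np.column_stack([np.ones_like(age), age, age ** 2])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    resid = trait - design @ coef
    n = len(resid)
    ranks = np.empty(n)
    ranks[np.argsort(resid, kind="stable")] = np.arange(1, n + 1)
    normal = NormalDist()
    return np.array([normal.inv_cdf((r - 0.5) / n) for r in ranks])


print(parental_alleles(1, 0, 1))  # (0, 1): the alternate allele came from the father
print(parental_alleles(1, 1, 1))  # None: origin ambiguous at this single site
```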
Code availability
Code for PO-GWAS: https://github.com/smozaffari/PO_GWAS
Data availability
The accession number for the Hutterite data reported in this paper is dbGaP:phs000185.
References
Peters, J. The role of genomic imprinting in biology and disease: an expanding view. Nat. Rev. Genet. 15, 517–530 (2014).
Falls, J. G., Pulford, D. J., Wylie, A. A. & Jirtle, R. L. Genomic imprinting: Implications for human disease. Am. J. Pathol. 154, 635–647 (1999).
Patten, M. M., Cowley, M., Oakey, R. J. & Feil, R. Regulatory links between imprinted genes: evolutionary predictions and consequences. Proc. R. Soc. B 283, 20152760 (2016).
Gabory, A. et al. H19 acts as a trans regulator of the imprinted gene network controlling growth in mice. Development 136, 3413–3421 (2009).
Varrault, A. et al. Zac1 regulates an imprinted gene network critically involved in the control of embryonic growth. Dev. Cell 11, 711–722 (2006).
Benonisdottir, S. et al. Epigenetic and genetic components of height regulation. Nat. Commun. 7, 13490 (2016).
Kong, A. et al. Parental origin of sequence variants associated with complex diseases. Nature 462, 868–874 (2009).
Garg, P., Borel, C. & Sharp, A. J. Detection of parent-of-origin specific expression quantitative trait loci by Cis-association analysis of gene expression in trios. PLoS ONE 7, e41695 (2012).
Perry, J. R. et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature 514, 92–97 (2014).
Baran, Y. et al. The landscape of genomic imprinting across diverse adult human tissues. Genome Res. 25, 927–936 (2015).
Zoledziewska, M. et al. Height-reducing variants and selection for short stature in Sardinia. Nat. Genet. 47, 1352–1356 (2015).
Chuang, T. J., Tseng, Y. H., Chen, C. Y. & Wang, Y. D. Assessment of imprinting- and genetic variation-dependent monoallelic expression using reciprocal allele descendants between human family trios. Sci. Rep. 7, 1–12 (2017).
Howey, R. & Cordell, H. J. PREMIM and EMIM: tools for estimation of maternal, imprinting and interaction effects using multinomial modelling. BMC Bioinforma. 13, 149 (2012).
Ainsworth, H. F., Unwin, J., Jamison, D. L. & Cordell, H. J. Investigation of maternal effects, maternal-fetal interactions and parent-of-origin effects (imprinting), using mothers and their offspring. Genet. Epidemiol. 35, 19–45 (2010).
Weinberg, C. R. Methods for detection of parent-of-origin effects in genetic studies of case-parents triads. Am. J. Hum. Genet. 65, 229–235 (1999).
Weinberg, C. R., Wilcox, A. J. & Lie, R. T. A log-linear approach to case-parent–triad data: Assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am. J. Hum. Genet. 62, 969–978 (1998).
Livne, O. E. et al. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population. PLoS Comput. Biol. 11, e1004139 (2015).
GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Szafranski, P. et al. Pathogenetics of alveolar capillary dysplasia with misalignment of pulmonary veins. Hum. Genet. 135, 569–586 (2016).
Wang, Q., Lin, J. L. -C., Erives, A. J., Lin, C. -I. & Lin, J. J. -C. New insights into the roles of Xin repeat-containing proteins in cardiac development, function, and disease. Int. Rev. Cell Mol. Biol. 310, 89–128 (2014).
Nilsson, M. I. et al. Xin is a marker of skeletal muscle damage severity in myopathies. Am. J. Pathol. 183, 1703–1709 (2013).
GTEx Consortium. Human genomics. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucl. Acids Res. 44, D733–D745 (2016).
Brideau, C. M., Eilertson, K. E., Hagarman, J. A., Bustamante, C. D. & Soloway, P. D. Successful computational prediction of novel imprinted genes from epigenomic features. Mol. Cell. Biol. 30, 3357–3370 (2010).
Wang, Y. et al. Whole-genome association study identifies STK39 as a hypertension susceptibility gene. Proc. Natl Acad. Sci. USA 106, 226–231 (2009).
Wolf, J. B., Cheverud, J. M., Roseman, C. & Hager, R. Genome-wide analysis reveals a complex pattern of genomic imprinting in mice. PLoS Genet. 4, e1000091 (2008).
Cuellar-Partida, G. et al. Genome-wide survey of parent-of-origin effects on DNA methylation identifies candidate imprinted loci in humans. Hum. Mol. Genet. 27, 2927–2939 (2018).
Barlow, D. P. & Bartolomei, M. S. Genomic imprinting in mammals. Cold Spring Harb. Perspect. Biol. 6, a018382 (2014).
Okae, H. et al. Re-investigation and RNA sequencing-based identification of genes with placenta-specific imprinted expression. Hum. Mol. Genet. 21, 548–558 (2011).
Proudhon, C. & Bourc’his, D. Identification and resolution of artifacts in the interpretation of imprinted gene expression. Brief. Funct. Genom. 9, 374–384 (2011).
Chess, A. Monoallelic gene expression in mammals. Annu. Rev. Genet. 50, 317–327 (2016).
Haig, D. The kinship theory of genomic imprinting. Annu. Rev. Ecol. Syst. 31, 9–32 (2000).
Canoy, D. et al. Age at menarche and risks of coronary heart and other vascular diseases in a large UK cohort. Circulation 131, 237–244 (2015).
Laurin, C. A. et al. Partitioning phenotypic variance due to parent-of-origin effects using genomic relatedness matrices. Behav. Genet. 48, 67–79 (2017).
Mott, R. et al. The architecture of parent-of-origin effects in mice. Cell 156, 332–342 (2014).
Cusanovich, D. A. et al. Integrated analyses of gene expression and genetic association studies in a founder population. Hum. Mol. Genet. 25, 2104–2112 (2016).
Weiss, L. A., Abney, M., Cook, E. H. Jr. & Ober, C. Sex-specific genetic architecture of whole blood serotonin levels. Am. J. Hum. Genet. 76, 33–41 (2005).
Abney, M., McPeek, M. S. & Ober, C. Broad and narrow heritabilities of quantitative traits in a founder population. Am. J. Hum. Genet. 68, 1302–1307 (2001).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Servin, B. & Stephens, M. Imputation-based analysis of association studies: Candidate regions and quantitative traits. PLoS Genet. 3, e114 (2007).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2012).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).
van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
Anders, S., Pyl, P. T. & Huber, W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucl. Acids Res. 43, e47 (2015).
Sidore, C. et al. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat. Genet. 47, 1272–1281 (2015).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
Acknowledgements
We thank Catherine Stanhope for help with processing phenotype data, Mark Abney, John Novembre, and members of the Ober lab for useful discussions, Joe Urbanski and Lorenzo Pesce for assistance using Beagle, the many members of our field trip teams for help in phenotyping and collecting and processing samples, and the Hutterites for their continued support of our studies. This work was supported by NIH grants HL085197 and HD21244; and in part by NIH through resources provided by the Computation Institute and the Biological Sciences Division of the University of Chicago and Argonne National Laboratory, under grant 1S10OD018495-01. S.V.M. was supported by NIH Grant T32 GM007197 and the Ruth L. Kirschstein NRSA Award F31HL134315.
Author information
Contributions
S.V.M., D.L.N., and C.O. designed the study and wrote the paper. J.M.D., S.J.S., and R.M.L. provided clinical data. S.V.M. performed analyses. C.S. performed replication analyses. S.V.M., D.L.N., C.O., J.M.D., S.J.S., R.M.L., C.S., E.F., and F.C. discussed results and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mozaffari, S.V., DeCara, J.M., Shah, S.J. et al. Parent-of-origin effects on quantitative phenotypes in a large Hutterite pedigree. Commun Biol 2, 28 (2019). https://doi.org/10.1038/s42003-018-0267-4
DOI: https://doi.org/10.1038/s42003-018-0267-4