  • Letter

Genetic variants regulating expression levels and isoform diversity during embryogenesis

Abstract

Embryonic development is driven by tightly regulated patterns of gene expression, despite extensive genetic variation among individuals. Studies of expression quantitative trait loci (eQTL; refs 1–4) indicate that genetic variation frequently alters gene expression in cell-culture models and differentiated tissues (refs 5, 6). However, the extent and types of genetic variation impacting embryonic gene expression, and their interactions with developmental programs, remain largely unknown. Here we assessed the effect of genetic variation on transcriptional (expression levels) and post-transcriptional (3′ RNA processing) regulation across multiple stages of metazoan development, using 80 inbred Drosophila wild isolates (ref. 7), identifying thousands of developmental-stage-specific and shared QTL. Given the small blocks of linkage disequilibrium in Drosophila (refs 7–9), we obtain near base-pair resolution, resolving causal mutations in developmental enhancers, validated transcription-factor-binding sites and RNA motifs. This fine-grain mapping uncovered extensive allelic interactions within enhancers that have opposite effects, thereby buffering their impact on enhancer activity. QTL affecting 3′ RNA processing identify new functional motifs leading to transcript isoform diversity and changes in the lengths of 3′ untranslated regions. These results highlight how developmental stage influences the effects of genetic variation and uncover multiple mechanisms that regulate and buffer expression variation during embryogenesis.

Figure 1: Quantifying developmental and genetic variance.
Figure 2: Regulatory QTL at developmental enhancers.
Figure 3: 3′ Isoform QTL during embryonic development.
Figure 4: Developmental and genetic regulation of 3′ UTR length.

Accession codes

Primary accessions

ArrayExpress

References

  1. Jin, W. et al. The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nature Genet. 29, 389–395 (2001)

  2. Schadt, E. E. et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302 (2003)

  3. Cheung, V. G. et al. Mapping determinants of human gene expression by regional and genome-wide association. Nature 437, 1365–1369 (2005)

  4. Stranger, B. E. et al. Population genomics of human gene expression. Nature Genet. 39, 1217–1224 (2007)

  5. Pickrell, J. K. et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772 (2010)

  6. Stranger, B. E. et al. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 8, e1002639 (2012)

  7. Huang, W. et al. Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines. Genome Res. 24, 1193–1208 (2014)

  8. Zichner, T. et al. Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing. Genome Res. 23, 568–579 (2013)

  9. Massouras, A. et al. Genomic variation and its impact on gene expression in Drosophila melanogaster. PLoS Genet. 8, e1003055 (2012)

  10. Graveley, B. R. et al. The developmental transcriptome of Drosophila melanogaster. Nature 471, 473–479 (2011)

  11. Ulitsky, I. et al. Extensive alternative polyadenylation during zebrafish development. Genome Res. 22, 2054–2066 (2012)

  12. Derti, A. et al. A quantitative atlas of polyadenylation in five mammals. Genome Res. 22, 1173–1183 (2012)

  13. Smibert, P. et al. Global patterns of tissue-specific alternative polyadenylation in Drosophila. Cell Reports 1, 277–289 (2012)

  14. Brown, J. B. et al. Diversity and dynamics of the Drosophila transcriptome. Nature 512, 393–399 (2014)

  15. Korte, A. et al. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nature Genet. 44, 1066–1071 (2012)

  16. Casale, F. P., Rakitsch, B., Lippert, C. & Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nature Methods 12, 755–758 (2015)

  17. Brem, R. B., Yvert, G., Clinton, R. & Kruglyak, L. Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755 (2002)

  18. Tadros, W. & Lipshitz, H. D. The maternal-to-zygotic transition: a play in two acts. Development 136, 3033–3042 (2009)

  19. Lippert, C., Casale, F. P., Rakitsch, B. & Stegle, O. LIMIX: genetic analysis of multiple traits. Preprint at http://dx.doi.org/10.1101/003905 (2015)

  20. Nakamura, A., Yoshizaki, I. & Kobayashi, S. Spatial expression of Drosophila Glutathione S-transferase-D1 in the alimentary canal is regulated by the overlying visceral mesoderm. Dev. Growth Differ. 41, 699–702 (1999)

  21. Ghavi-Helm, Y. et al. Enhancer loops appear stable during development and are associated with paused polymerase. Nature 512, 96–100 (2014)

  22. Kvon, E. Z. et al. Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature 512, 91–95 (2014)

  23. Yoon, O. K., Hsu, T. Y., Im, J. H. & Brem, R. B. Genetics and regulatory impact of alternative polyadenylation in human B-lymphoblastoid cells. PLoS Genet. 8, e1002882 (2012)

  24. Best, A. et al. Tra2 protein biology and mechanisms of splicing control. Biochem. Soc. Trans. 42, 1152–1158 (2014)

  25. Sandberg, R., Neilson, J. R., Sarma, A., Sharp, P. A. & Burge, C. B. Proliferating cells express mRNAs with shortened 3′ untranslated regions and fewer microRNA target sites. Science 320, 1643–1647 (2008)

  26. Hilgers, V. et al. Neural-specific elongation of 3′ UTRs during Drosophila development. Proc. Natl Acad. Sci. USA 108, 15864–15869 (2011)

  27. Ji, Z., Lee, J. Y., Pan, Z., Jiang, B. & Tian, B. Progressive lengthening of 3′ untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc. Natl Acad. Sci. USA 106, 7028–7033 (2009)

  28. Hilgers, V., Lemke, S. B. & Levine, M. ELAV mediates 3′ UTR extension in the Drosophila nervous system. Genes Dev. 26, 2259–2264 (2012)

  29. Vu, V. et al. Natural variation in gene expression modulates the severity of mutant phenotypes. Cell 162, 391–402 (2015)

  30. Montgomery, S. B., Lappalainen, T., Gutierrez-Arcelus, M. & Dermitzakis, E. T. Rare and common regulatory variation in population-scale sequenced human genomes. PLoS Genet. 7, e1002144 (2011)

  31. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nature Protocols 7, 500–507 (2012)

  32. Lott, S. E. et al. Noncanonical compensation of zygotic X transcription in early Drosophila melanogaster development revealed through single-embryo RNA-seq. PLoS Biol. 9, e1000590 (2011)

Acknowledgements

This work was supported technically by the European Molecular Biology Laboratory (EMBL) Genomics Core facility, and financially by the European Research Council (ERC; FP/2007-2013), ERC advanced grant CisRegVar to E.E.M.F., EMBL predoctoral funds to E.E.M.F. and E.B.

Author information

Contributions

E.E.M.F., E.B., E.C. and N.K. designed the study, explored results and prepared the manuscript, with contributions from all authors. E.C. led the experiments with help from L.C., H.E.G., R.R.V., R.M.-F. and B.Z. N.K. led the data processing and QTL calling, with help from F.P.C., J.F.D. and O.S. D.H. led the biological analysis, with input from D.G. and N.K.

Corresponding authors

Correspondence to Ewan Birney or Eileen E. M. Furlong.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks S. Celniker and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 3′-Tag-seq provides an accurate measure of gene expression.

a, Tightly staged embryos at three embryonic time points were collected from 80 genetically diverse inbred lines: stages 5–6 (2–4 h), before lineage commitment, including blastoderm stages; stages 10–11 (6–8 h), when major cell lineages within the mesoderm and ectoderm are specified; and stages 13–14 (10–12 h), the onset of tissue differentiation. b, Confirming the developmental stage of all samples: 3′-Tag-seq gene expression levels for all 254 samples (including replicates) were correlated to time points from a reference embryonic time-course (modENCODE). Two hundred and forty-two samples showed their strongest, and a further five samples their second-strongest, correlation with the expected time points (indicated in red). The remaining mis-staged samples were discarded from the QTL analysis. c, 3′-Tag-seq is highly reproducible; correlation between two biological replicates (independent embryo collections, RNA extraction and 3′-Tag-seq) from the Drosophila Genetic Reference Panel (DGRP) line RAL-375. d, Gene expression level estimates from 3′-Tag-seq are highly correlated with standard RNA-seq. Scatter plot shows 5% trimmed means of expression levels from 22 RNA samples, obtained from 10–12 h staged embryos from 22 inbred lines, sequenced by both 3′-Tag-seq and standard RNA-seq. Spearman’s rho of means = 0.9, P < 2.2 × 10^−16.
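The staging check in panel b can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Spearman rank correlation (the caption names Spearman only for panel d), and the function names, time-point labels and toy expression vectors are hypothetical. A sample is kept when its expected time point gives the strongest or second-strongest correlation, which matches the 242 + 5 samples retained out of 254.

```python
from math import sqrt
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def staging_rank(sample: Sequence[float],
                 reference: Dict[str, Sequence[float]],
                 expected: str) -> int:
    """Rank (1 = strongest) of the expected time point among all correlations."""
    scores = {tp: spearman(sample, profile) for tp, profile in reference.items()}
    ordered = sorted(scores, key=lambda tp: scores[tp], reverse=True)
    return ordered.index(expected) + 1


def passes_staging(sample: Sequence[float],
                   reference: Dict[str, Sequence[float]],
                   expected: str,
                   max_rank: int = 2) -> bool:
    """Keep a sample if the expected time point is among the top `max_rank`."""
    return staging_rank(sample, reference, expected) <= max_rank
```

In this sketch the reference maps each time-point label to an expression vector over the same genes as the sample; a real analysis would use the modENCODE time-course named in the caption.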

Extended Data Figure 2 Variance decomposition analysis.

a, Out-of-sample prediction to assess the importance of the genotype by developmental component in the variance component analysis (shown in Fig. 1b). Left, comparison of the out-of-sample Pearson correlation coefficient without accounting for GxD interactions (x axes) versus the full model, including a trans GxD component (y axes). Right, corresponding analysis for the cis component. Prediction performance was assessed using tenfold cross-validation. Genes above the diagonal (red) indicate improved prediction by accounting for GxD. b, c, Biological and molecular function gene ontology (GO) enrichments are shown for all significant terms (Fisher’s exact test, P value < 0.05). Clustergrams reflect the fraction of genes shared by any two categories, with 0 indicating nested categories and 1 reflecting orthogonality. b, Enrichment for trans variation. Genes that function in DNA binding (for example, transcription factor activity) are almost exclusively enriched. c, Enrichment for cis variation. Genes that function in metabolic processes, RNA maturation and binding are enriched. d, F1 embryos from genetic crosses between three genotypes (RAL-517 × RAL-765 and RAL-362 × RAL-517), were collected at three stages of embryogenesis. Mean relative expression of maternal allele as a fraction of total expression is shown for each developmental stage. Point of balanced maternal/paternal expression is indicated by red dotted line.
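The GO enrichment in panels b and c relies on Fisher's exact test. A minimal sketch of such a test is given below, assuming a one-sided (enrichment) alternative, which the caption does not specify; the function and argument names are hypothetical. The P value is the upper tail of the hypergeometric distribution.

```python
from math import comb


def fisher_enrichment_p(overlap: int, set_size: int,
                        annotated: int, background: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    overlap    -- genes in the tested set that carry the GO term
    set_size   -- genes in the tested set
    annotated  -- genes in the background that carry the GO term
    background -- all genes considered (the tested set is a subset)
    Returns P(X >= overlap) under the hypergeometric null.
    """
    total = comb(background, set_size)
    upper = min(set_size, annotated)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

With a background of 10 genes, 5 of them annotated, drawing a set of 5 that are all annotated gives P = 1/252, the probability of picking exactly that set by chance.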

Extended Data Figure 3 Getting to the causal genetic variant of embryonic eQTL.

a, Overview of eQTL and 3iQTL, with numbers of genes with eQTL in different functional categories. b, The relationship between stage-specific eQTL (single-stage) and gene expression. Box plot, median expression levels of genes with stage-specific eQTL at each developmental stage. Genes are grouped by eQTL stage specificity. Genes do not appear more highly expressed at the stage at which the stage-specific eQTL was called compared to other stages. c, Gene ontology enrichment of common eQTL. y Axis shows −log10 P value (Fisher’s exact test) after FDR correction of selected biological process gene ontology terms enriched (positive) or depleted (negative). eQTL are enriched in genes involved in metabolic and enzyme-driven catalytic processes (catalytic activity, FDR-adjusted P value = 0.0193), and depleted in essential developmental genes. Although true globally, there are a surprising number of essential developmental regulators with expression variation during embryogenesis, including 103 transcription factors, 27 of which have relatively large effect sizes (a, bottom table). d, Global enrichment of eQTL in gene features calculated using multivariate logistic regression. Bars represent 95% confidence interval of odds ratios, showing increased/decreased likelihood for a genetic variant to be a QTL. e, Distribution of lead eQTL variants in a meta gene plot showing the gene body and 1 kb upstream/downstream. f, Representative example of QTL with multiple significant associated variants. Top, Manhattan plot showing all tested variants in region (strongly associated variants in red, neighbouring genes with unadjusted P values in grey). Middle top, median 3′-Tag-seq coverage for Major (dark red) and Minor (blue) genotypes. For comparison, the Major signal is shown in light red and the Minor signal in light blue. Middle, library-size-adjusted coverage for each line with Major (red) and Minor (blue) genotypes. Bottom, median signal of standard RNA-seq on a subset of lines. Box plot (right), normalized expression levels across time points for both genotypes. ecd has multiple associated variants within the exon of the gene; the lead variant is associated with an increase in the overall expression level of the gene in the Minor, compared with the Major, genotype.

Extended Data Figure 4 eQTL are highly concordant with three independent data sets.

a–d, Overlap between 3′-Tag-seq eQTL effects and different orthogonal data sets, showing genes with concordant (blue) and discordant (red) direction of effects. Black line shows linear fit through all data points. a, Comparison between common eQTL and expression level differences observed from standard RNA-seq of 22 lines at the 10–12 h stage. Only eQTL with at least two RNA-seq data points for both alleles and a predicted effect size of >0.25 at 10–12 h are shown. b, Comparison between common eQTL and ASE in embryos from two F1 crosses. Only genes with significant ASE (binomial test, P < 0.1) and consistent direction in all three developmental stages are shown. Circles, genes classified as maternal (ref. 32); triangles, non-maternal genes (zygotic). c, Comparison between embryonic common eQTL and adult eQTL (microarray-based expression) from the same population (ref. 9). Only genes for which the lead variant was tested in both studies are shown. d, Comparison between embryonic-stage-specific eQTL and adult eQTL from the same population (ref. 9). Only genes for which the lead variant was tested in both studies are shown.
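The ASE filter and the concordance comparison in panel b can be illustrated with a short sketch. It assumes an exact two-sided binomial test against an expected allelic ratio of 0.5 and counts concordance as agreement in the sign of the effect; both choices, and all names below, are assumptions made for illustration rather than details given in the caption.

```python
from math import comb
from typing import Sequence


def binomial_two_sided_p(reads_a: int, reads_b: int) -> float:
    """Exact two-sided binomial test of allelic balance (null ratio 0.5).

    Sums the probabilities of all outcomes that are no more likely than
    the observed split of reads between the two alleles.
    """
    n = reads_a + reads_b
    observed = comb(n, reads_a)
    extreme = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= observed)
    return extreme / 2 ** n


def concordance(eqtl_effects: Sequence[float], ase_effects: Sequence[float]) -> float:
    """Fraction of genes whose eQTL and ASE effects share the same sign.

    Genes with a zero effect in either data set are ignored.
    """
    pairs = [(e, a) for e, a in zip(eqtl_effects, ase_effects) if e != 0 and a != 0]
    if not pairs:
        raise ValueError("no genes with non-zero effects in both data sets")
    same = sum(1 for e, a in pairs if (e > 0) == (a > 0))
    return same / len(pairs)
```

For example, 8 reads from one allele and 2 from the other give P = 112/1024 ≈ 0.109 under this test, which would not pass the P < 0.1 threshold quoted in the caption.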

Extended Data Figure 5 Identifying the underlying cause of regulatory eQTL.

a, Global enrichment of QTL in DNaseI (DHS), TF-bound regions (ChIP) or occupied TF-binding sites (TFBS bound), compared with randomly selected regions of the genome using multivariate logistic regression. Bars represent 95% confidence interval of odds ratios. b, Heatmap showing distal QTL (>1 kb from TSS) overlap with putative developmental enhancers based on the presence of DHS, occupancy by two or more transcription factors (TFBS) or the histone modifications H3K4me1 and H3K27ac at one or more stages of embryogenesis. c, Frequency (left axis) and cumulative distribution (right axis) of eQTL >1 kb away from the TSS. d, Obtaining a high confidence set of bound TF motifs. Assessment of Drosophila TF ChIP data sets during embryogenesis with associated position weight matrices (PWMs). Only PWM-ChIP pairs with high area under the curve in receiver-operating characteristic curves and enrichment over shuffled motifs were included. e, Broken TFBS (in the Minor genotype) in regions bound by the TF (dark red) or DHS (light red) during embryogenesis. f, Created TFBS (in the Minor genotype) for the indicated TF in DHS regions during embryogenesis. g, Luciferase assays of regulatory QTL associated with the genes indicated. The activity of each Minor allele and Majormin (where the Minor allele variant was placed in the Major allele) is compared to that of the Major genotype (set to one). All values are expressed as mean ± s.d. for three biological replicates, normalized to values of the Major allele. For the regulatory region associated with CG9601, the lead variant is causal and has the same effect in the Minor or Major background. CG1113, CG5039 and CG31922 confirm the significant difference between the Major and Minor genotypes, but the lead variant tested is not causal. Student’s t-test, *P < 0.05, **P < 0.01, ***P < 0.001.
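The luciferase summary in panel g (mean ± s.d. of three biological replicates, normalized to the Major allele) can be sketched as below. This is an illustration under stated assumptions: each replicate is divided by the mean of the Major replicates, the s.d. is the sample standard deviation, and the test statistic is a pooled-variance Student's t. The caption does not specify these details, the names are hypothetical, and the P value (which needs the t distribution) is left out because the Python standard library does not provide it.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import List, Sequence, Tuple


def normalize_to_major(major: Sequence[float], other: Sequence[float]) -> List[float]:
    """Express replicate readings relative to the mean Major-allele reading."""
    reference = mean(major)
    return [value / reference for value in other]


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the replicates."""
    return mean(values), stdev(values)


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled-variance two-sample t statistic (df = len(a) + len(b) - 2)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
```

Normalizing the Major replicates against their own mean sets the Major genotype to one, as described in the caption, so the Minor and Majormin values read directly as fold changes.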

Extended Data Figure 6 QTL in occupied transcription factor binding sites.

a, Top, Manhattan plot showing all tested variants around CG17343 locus (lead variant in red, neighbouring genes with unadjusted P values in grey). Middle, ChIP-chip signal for the GATA transcription factor Pannier (Pnr) at 4–6 h (stages 8/9) and 6–8 h (stages 10–11, matching the middle QTL time point) of development. Bottom, Pnr PWM and changes in its motif in Minor allele (red). Box plot (right), normalized expression levels across time points for both genotypes. b, Luciferase assay for the promoter-proximal element, showing a 0.85-fold decrease in the Minor genotype over the Major genotype. When the Minor allele variant is placed in the Major genetic background (Majormin) expression is decreased to 0.36-fold. c, As in a for the Eogt locus. Note that the QTL is stage-specific, significantly affecting gene expression at the middle time point (6–8 h; indicated as filled boxes in boxplot), matching the occupancy of Pnr at this stage. d, Luciferase assay for the Eogt 3′ element. Although the GATA site mutation has little effect within the Minor genotype, it causes a significant reduction in expression when placed in the Major background (Majormin), reducing reporter gene expression to 0.47-fold. b, d, All values are expressed as mean ± s.d. for three biological replicates, normalized to values of the Major allele. Student’s t-test, **P < 0.01, ***P < 0.001.

Extended Data Figure 7 Global scans for allelic heterogeneity and epistasis.

a, b, A simple linear model was used to test for allelic heterogeneity (marginal effects of additional loci; a) and interactions/epistasis (b) for all variants within 500 bp (approximate size of regulatory elements) of the lead variant for each common eQTL. This process was iterated to remove all significant marginal effects and the residuals used to test for epistatic interactions with the lead variant. Resulting P values for both marginal and epistatic effects were corrected for multiple testing (Bonferroni) within each eQTL. The minimum P value for each eQTL was then corrected across all eQTL and used to control the FDR for epistatic and marginal effects. Distributions of these FDRs (a, b) are shown. Out of 1,164 gene eQTL tested, 9.3% have evidence for one or more marginal effects and 1.8% for an epistatic effect (FDR < 10%), despite limited power due to strong linkage and relatively modest sample sizes. c, Manhattan plots show Bonferroni-corrected P values for both marginal (top) and epistatic (middle top) effects, around the eQTL associated with the gene CG8564 (located ~35 kb downstream). Genetic variants are colour-coded based on their degree of linkage to the lead SNP (r^2). d, Table showing summary statistics for marginal and epistatic effects.
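The two-level multiple-testing correction described in panels a and b can be sketched as follows: Bonferroni within each eQTL, then an FDR correction of the per-eQTL minimum across all eQTL. The caption does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the gene labels in the usage comment.

```python
from typing import Dict, List, Sequence


def bonferroni_min(p_values: Sequence[float]) -> float:
    """Smallest Bonferroni-corrected P value among the variants of one eQTL."""
    return min(1.0, min(p_values) * len(p_values))


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def eqtl_fdr(per_eqtl_p: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Map each eQTL to its FDR after within-eQTL Bonferroni correction."""
    names = list(per_eqtl_p)
    gene_level = [bonferroni_min(per_eqtl_p[name]) for name in names]
    return dict(zip(names, benjamini_hochberg(gene_level)))


# Usage with hypothetical labels: an eQTL is reported when its FDR is below 0.10.
# fdr = eqtl_fdr({"geneA": [0.001, 0.2], "geneB": [0.4, 0.9]})
# hits = [name for name, q in fdr.items() if q < 0.10]
```

Taking the minimum within each eQTL before the across-eQTL step means each eQTL contributes a single test to the FDR calculation, regardless of how many neighbouring variants were scanned.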

Extended Data Figure 8 De novo motif discovery and enrichment of pA-associated RNA motifs.

a, Global enrichment of 3iQTL in gene features compared with randomly selected regions of the genome using multivariate logistic regression. Bars represent 95% profile confidence interval of odds ratios. Common 3iQTL are enriched for variants in both the 3′ and 5′ UTRs, while stage-specific 3iQTL are predominantly associated with 5′ regions, in keeping with their enrichment (both stage-specific eQTL and 3iQTL) for regulatory variants within enhancers and promoters. b, Schematic diagram of the alternative cleavage and polyadenylation machinery. CPSF and CSTF move from the RNA polymerase II C-terminal tail to the growing nascent mRNA strand. Scissors indicates the site of RNA cleavage (transcript end), which is then polyadenylated. c, De novo motif discovery identified known pA associated motifs at pA sites (E value is shown underneath motif logo). Two variations of the PAS motif (bound by CPSF) are the top two most enriched motifs. Right, the relative position of each site, centred on the transcriptional end site (TES). d, Two de novo discovered motifs (Supplementary Table 12) are highly enriched at pA sites and localized around the point of cleavage. e, Number of 3iQTL either breaking (top) or creating (bottom) motifs for known polyadenylation cleavage motifs (red), discovered positioned (orange), discovered unpositioned (dark green) and cisBP motifs (light green).

Extended Data Figure 9 3iQTL and utrQTL affecting known motifs.

a, b, Two representative examples for 3iQTL or utrQTL associated with disruption of a characterized RNA motif. Top, Manhattan plot showing tested variants around the CR42254 and CG8004 loci (lead variant in red, neighbouring genes with unadjusted P values in grey). Middle, median 3′-Tag-seq coverage between all tested lines for the Major (dark red) and Minor (blue) genotypes. For comparison, the Major and Minor genotype signals are shadowed. The sequences of RNA motifs broken and created by the QTL are shown in the middle right panel (disrupted base in red). Bottom, heatmap of 3′-Tag-seq coverage for each line.

Extended Data Figure 10 3iQTL can have phenotypes in vivo.

a, utrQTL breaking a PAS motif. Top, Manhattan plot of YL-1 locus (lead variant in red, neighbouring genes and unadjusted P values in grey). Middle, median 3′-Tag-seq coverage between all tested lines for the Major (dark red) and Minor (blue) genotypes. For comparison, the Major and Minor genotype signals are shadowed. Sequence of broken PAS motif (variant in red). Bottom, heatmap of 3′-Tag-seq coverage for each line. b, Luciferase assay for the YL-1 proximal pA region overlapping the lead SNP, with significantly higher activity in the Major compared to the Minor allele. When the Minor variant is placed in the Major background (Majormin), activity is decreased to the Minor level. Values are mean ± s.d. for three biological replicates, normalized to values of SV40 polyA control. Student’s t-test, **P < 0.01, ***P < 0.001. c, 3iQTL as hypomorphic alleles. Top, Manhattan plot showing tested variants around the vls locus (lead variant in red, neighbouring genes with unadjusted P values in grey). Middle, median 3′-Tag-seq coverage between all tested lines for the Major (dark red) and Minor (blue) genotypes. Bottom, heatmap of 3′-Tag-seq coverage for each line. d, Genetic epistasis experiment placing a loss-of-function vls2 allele (maintained over a CyO marked balancer chromosome) in trans to two Major or Minor 3iQTL alleles in the vls locus (shown in c: vls2/CyO × DGRP/DGRP). Each cross was performed in both directions; the table shows the average viability index for each line (brackets show RAL identifiers of DGRP lines used). Viability index is the ratio of flies with the genotype DGRP/vls2 over DGRP/CyO: an index of 1 indicates no genetic interaction between the DGRP line and the vls2 allele. On average ~80 flies were counted for each of the eight crosses. Student’s t-test, *P < 0.05.
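The viability index defined in panel d is a simple ratio and can be written out as a short sketch. The averaging over the two cross directions is assumed here to be the mean of the per-cross indices (the caption says only "average"), and the function names and counts in the usage comment are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def viability_index(n_dgrp_vls2: int, n_dgrp_cyo: int) -> float:
    """Ratio of DGRP/vls2 progeny to DGRP/CyO progeny from one cross.

    A value of 1 indicates no genetic interaction between the DGRP line
    and the vls2 allele; values below 1 indicate reduced viability.
    """
    if n_dgrp_cyo == 0:
        raise ValueError("no DGRP/CyO flies counted; index is undefined")
    return n_dgrp_vls2 / n_dgrp_cyo


def average_viability(crosses: Sequence[Tuple[int, int]]) -> float:
    """Mean index over reciprocal crosses, given (DGRP/vls2, DGRP/CyO) counts."""
    return mean(viability_index(a, b) for a, b in crosses)


# Usage with made-up counts for the two directions of one cross:
# average_viability([(40, 40), (20, 40)]) gives the mean of 1.0 and 0.5.
```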

Supplementary information

Supplementary Information

This file contains a detailed description of the experimental and computational methods. (PDF 868 kb)

Supplementary Data

This file contains Supplementary Tables 1-14. (ZIP 36761 kb)

Cite this article

Cannavò, E., Koelling, N., Harnett, D. et al. Genetic variants regulating expression levels and isoform diversity during embryogenesis. Nature 541, 402–406 (2017). https://doi.org/10.1038/nature20802

  • DOI: https://doi.org/10.1038/nature20802
