The industrial melanism mutation in British peppered moths is a transposable element

Journal name: Nature
Volume: 534
Pages: 102–105
DOI: 10.1038/nature17951

Discovering the mutational events that fuel adaptation to environmental change remains an important challenge for evolutionary biology. The classroom example of a visible evolutionary response is industrial melanism in the peppered moth (Biston betularia): the replacement, during the Industrial Revolution, of the common pale typica form by a previously unknown black (carbonaria) form, driven by the interaction between bird predation and coal pollution1. The carbonaria locus has been coarsely localized to a 200-kilobase region, but the specific identity and nature of the sequence difference controlling the carbonaria–typica polymorphism, and the gene it influences, are unknown2. Here we show that the mutation event giving rise to industrial melanism in Britain was the insertion of a large, tandemly repeated, transposable element into the first intron of the gene cortex. Statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred around 1819, consistent with the historical record. We have begun to dissect the mode of action of the carbonaria transposable element by showing that it increases the abundance of a cortex transcript, the protein product of which plays an important role in cell-cycle regulation, during early wing disc development. Our findings fill a substantial knowledge gap in the iconic example of microevolutionary change, adding a further layer of insight into the mechanism of adaptation in response to natural selection. The discovery that the mutation itself is a transposable element will stimulate further debate about the importance of ‘jumping genes’ as a source of major phenotypic novelty3.

Figures

    Figure 1: The carbonaria candidate region, and the position and structure of the carbonaria mutation.

    a, Approximately 400-kb candidate region (bounded by marker loci b and d (ref. 2)) indicating gene content and genotyping positions (vertical lines in the continuous grey bar). Intron–exon structure and orientation are illustrated separately for each gene (annotated in GenBank accession KT182637). b, Refined candidate region including candidate polymorphisms (lines on the grey bar). The intron–exon structure of cortex is shown for carbonaria (black moth) and typica (speckled moth), highlighting the presence of a large (22 kb) indel (orange) within the first intron. Exons 1A and 1B are alternative transcription starts followed by the shared exons 2–9. c, The only exclusive carbonaria–typica polymorphism within the candidate region. The structure of the insert, shown in the carbonaria sequence, corresponds to a class II DNA transposon, with direct repeats resulting from target site duplication (black nucleotides) next to inverted repeats (red nucleotides). Typica haplotypes (lower sequence) lack the 4-base target site duplication, the inverted repeats and the core insert sequence. The transposon consists of ~9 kb tandemly repeated two and one-third times (repeat units (RU)1–RU3), with three short tandem subrepeat units (green dots, SRU1–SRU9) within each repeat unit. Moth images were created from photographs taken by A.E.v.H.

    Figure 2: Recombination pattern and ageing of the carb-TE mutation.

    a, Nearest recombination between carbonaria (carb-TE present (orange)) and non-carbonaria (typica and insularia (light grey)) haplotypes (n = 107), 200 kb either side of the carb-TE (at position 0). Dark grey areas indicate boundaries within which recombination occurred. b, Multilocus linkage disequilibrium (r̄d) across the same sequence window among carbonaria and non-carbonaria haplotypes. Grey area indicates the widest 99% confidence region, across loci, for the null hypothesis (r̄d ≈ 0). Red lines represent the simulation-based upper bound under the extreme assumption that all alleles defining the carbonaria haplotype were initially exclusive to it (mean and 90% interval). c, Introgression of the ancestral carbonaria haplotype (black) into non-carbonaria haplotypes (grey; carb-TE absent (n = 144)). Red lines represent the simulation-based expectations (mean and 90% interval). d, Probability density for the age of the carb-TE mutation inferred from the recombination pattern in the carbonaria haplotypes (maximum density at 1819 shown by dotted line; first record of carbonaria in 1848 shown by dashed line).
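The multilocus linkage disequilibrium index r̄d in panel b (ref. 31) can be illustrated with a minimal Python sketch. It assumes haploid haplotypes coded as equal-length strings of allele symbols and scores the distance between two haplotypes at a locus as 0 (same allele) or 1 (different allele). The function name `rbar_d` is hypothetical, and this is not the authors' analysis code; the confidence regions and simulations shown in the panel are not reproduced here.

```python
import itertools
import math
from statistics import pvariance


def rbar_d(haplotypes):
    """Multilocus linkage disequilibrium index r̄d for haploid haplotypes.

    r̄d = (sum over locus pairs of cov(d_j, d_k)) /
          (sum over locus pairs of sqrt(var(d_j) * var(d_k))),
    where d_j holds the 0/1 distances at locus j for every pair of haplotypes.
    """
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of loci")
    pairs = list(itertools.combinations(range(len(haplotypes)), 2))

    # For each locus, the 0/1 distance between every pair of haplotypes.
    dist = [
        [int(haplotypes[a][j] != haplotypes[b][j]) for a, b in pairs]
        for j in range(n_loci)
    ]
    means = [sum(d) / len(pairs) for d in dist]
    variances = [pvariance(d) for d in dist]  # population variances

    numerator = 0.0
    denominator = 0.0
    for j, k in itertools.combinations(range(n_loci), 2):
        cov = sum(
            (x - means[j]) * (y - means[k]) for x, y in zip(dist[j], dist[k])
        ) / len(pairs)
        numerator += cov
        denominator += math.sqrt(variances[j] * variances[k])
    # Undefined when fewer than two loci are polymorphic.
    return numerator / denominator if denominator > 0 else float("nan")
```

For perfectly associated loci, such as the haplotypes "AA", "BB", "AA", "BB", the sketch returns 1; values near 0 indicate little association among loci.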

    Figure 3: Relative expression of cortex in developing wings of B. betularia.

    a, Average expression (across typica and carbonaria morphs) of all cortex splice variants (exons 7–9) relative to the control gene α-Spec in wing discs at different developmental stages (La6, sixth instar larvae; Cr2, day 2 crawler; Pu2, day 2 pupae; PDP, post-diapause pupae). Bars are s.e.m. b, Scaled images (created from photographs taken by I.J.S.) of B. betularia forewings at different stages. c, d, Tukey plots for relative expression of cortex 1B (c) and 1A (d) full transcripts in developing wings of the three carbonaria-locus genotypes (c/c, c/t and t/t) produced within the progeny of a c/t × c/t cross (no data for c/c at Cr2). Genotypes differ significantly for the 1B full transcript (P < 0.001, generalized linear model (GLM)), whereas genotypes do not differ for the 1A full transcript (P > 0.2, GLM). (Note the differing y-axis scales.) Equivalent graphs for the progeny of c/t × t/t crosses (which lack the c/c genotype) are presented in Extended Data Fig. 8.
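Panel a reports expression relative to the control gene α-Spec, with s.e.m. bars. The caption does not state the quantification formula, so the following sketch assumes the common comparative-Ct (2^−ΔCt) approach, in which target and reference amplicons are taken to amplify with equal efficiency. The function names and Ct values are hypothetical.

```python
import math
from statistics import stdev


def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Expression of a target transcript relative to a reference gene.

    Assumes both amplicons multiply by `efficiency` per cycle
    (2.0 = perfect doubling), giving efficiency ** -(Ct_target - Ct_reference).
    """
    delta_ct = ct_target - ct_reference
    return efficiency ** (-delta_ct)


def sem(values):
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return stdev(values) / math.sqrt(len(values))


# Hypothetical (target Ct, reference Ct) pairs for three wing discs at one stage.
samples = [(27.0, 22.0), (26.0, 22.0), (28.0, 22.0)]
levels = [relative_expression(t, r) for t, r in samples]
```

Under this assumption each one-cycle difference in ΔCt corresponds to a twofold difference in relative abundance.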

    Extended Data Fig. 1: BAC and fosmid haplotype tilepaths used to define carbonaria candidate polymorphisms.

    a, BAC and fosmid tilepaths of the carbonaria haplotype (black bars) and three typica haplotypes (different shades of grey). Two small regions not covered by BACs or fosmids were reconstructed using parent and offspring sequences from the same heterozygous family (FAM11). The positions of loci b and d (see Figure 1) are indicated by the dashed lines, and the carbonaria candidate region is highlighted in blue. Fosmid 25H14 containing carb-TE appears small because it is aligned against the typica reference sequence, which does not include the carb-TE. b, Alignment of three typica haplotypes against the carbonaria haplotype for a short section within the carbonaria candidate region, showing SNPs (dots are nucleotides identical to the carbonaria sequence). Polymorphisms in which all three typica alleles differed from carbonaria were treated as carbonaria candidates; polymorphisms in which the same allele occurred in carbonaria and at least one typica were excluded from further consideration.
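The candidate-filtering rule in panel b can be written as a short Python function. This sketch assumes the four haplotypes are already aligned to equal length and, following the panel's convention, that a dot in a typica sequence means the same nucleotide as carbonaria. Gap characters are treated as ordinary alleles. The function name and toy sequences are hypothetical.

```python
def carbonaria_candidates(carbonaria, typica_haplotypes):
    """Return 0-based alignment columns where every typica allele differs
    from the carbonaria allele.

    A '.' in a typica sequence denotes identity with carbonaria. Columns where
    any typica carries the carbonaria allele are excluded.
    """
    length = len(carbonaria)
    if any(len(t) != length for t in typica_haplotypes):
        raise ValueError("sequences must be aligned to the same length")
    candidates = []
    for i, carb_allele in enumerate(carbonaria):
        # Resolve the dot shorthand to the carbonaria nucleotide.
        typica_alleles = [carb_allele if t[i] == "." else t[i] for t in typica_haplotypes]
        if all(allele != carb_allele for allele in typica_alleles):
            candidates.append(i)
    return candidates


carb = "ACGTAC"
typica = ["..A.T.", "..A.G.", "..C..."]
print(carbonaria_candidates(carb, typica))  # [2]
```

Column 4 is rejected because the third toy typica shares the carbonaria allele there, mirroring the exclusion rule in the caption.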

    Extended Data Fig. 2: Validation of the 3-primer PCR carb-TE genotyping assay in a family and its application in a variety of wild-caught moths.

    a, Schematic alignment of carbonaria and typica haplotypes showing the positions of the three primers (A, B and C, not to scale) used in a single PCR to detect the presence or absence of the 22 kb carb-TE. In the presence of the carb-TE, primers A and C are too far apart to generate a product; the repeat structure of the carb-TE presents three annealing sites for primer B, but only the shortest B–C product is amplified with a 45-s extension (primer sequences are listed in Supplementary Table 1). b, carb-TE genotypes for father (lane 2), mother (lane 3) and 15 offspring (lanes 4–18); the two brightest bands in the size ladder (lane 1) are 300 bp and 1 kb. The parents were full siblings known to be heterozygous (c/t), and were therefore expected to produce c/c, c/t and t/t offspring. The larger band (primers B–C) indicates the presence of the carb-TE and the smaller band (primers A–C) its absence (typica allele in this family); heterozygotes have both bands. The individual in lane 15 (135F1-12) is the homozygous male used for whole genome sequencing. c, Presence or absence of the carb-TE in a carbonaria haplotype fosmid clone (lane 2), three different typica haplotype clones (lanes 3–5; one fosmid, two BACs), wild carbonaria homozygotes (lanes 6 and 7), wild carbonaria heterozygotes (lanes 8–10), typica with a flanking haplotype similar to the carbonaria haplotype but lacking the carb-TE (lanes 11–13), light insularia (lanes 14–16), intermediate insularia (lanes 17–19), dark insularia (lanes 20–22) and carbonaria-like insularia (lanes 23–25).
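The band-scoring logic in panels a and b can be summarized as a small decision function, together with the Mendelian 1:2:1 expectation for a c/t × c/t family. In this sketch 't' stands for any allele lacking the carb-TE (typica in this family), a lane with neither band is treated as a failed reaction, and the function names and band scores are hypothetical.

```python
from collections import Counter


def call_carb_te_genotype(bc_band, ac_band):
    """Genotype from the two diagnostic products of the 3-primer PCR.

    bc_band: larger B-C product present (allele carrying the carb-TE).
    ac_band: smaller A-C product present (allele lacking the carb-TE).
    """
    if bc_band and ac_band:
        return "c/t"
    if bc_band:
        return "c/c"
    if ac_band:
        return "t/t"
    return None  # no product: treat as a failed reaction, not a genotype


def expected_from_het_cross(n_offspring):
    """Expected genotype counts among offspring of a c/t x c/t cross (1:2:1)."""
    return {"c/c": n_offspring / 4, "c/t": n_offspring / 2, "t/t": n_offspring / 4}


# Hypothetical band scores (B-C present, A-C present) for four offspring.
scores = [(True, True), (True, False), (False, True), (True, True)]
observed = Counter(call_carb_te_genotype(bc, ac) for bc, ac in scores)
```

With only 15 offspring per family, expected class sizes are small, so any formal goodness-of-fit test on such counts would need an exact rather than an asymptotic method.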

    Extended Data Fig. 3: Hypothetical reconstruction of the birth of the carbonaria allele.

    Class II non-autonomous DNA transposition is mediated by two transposase monomers bound to the terminal inverted repeats (TIRs). The monomers form a dimer at the target site, which is cleaved to leave short direct-repeat overhangs. The transposable element, including its TIRs, is inserted, and finally the single-stranded cleaved sites are filled in to complete the target site duplication39. The unduplicated target site motif (CCTC) is common, and possibly ubiquitous, among non-carbonaria (typica and insularia) haplotypes, but a typica ancestor is more likely given the pattern of haplotype similarities and the presumed prevalence of typica haplotypes around 1800.
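How a staggered cut followed by fill-in produces a target site duplication on both flanks can be mimicked on strings. In this Python sketch only the CCTC motif comes from the caption; the element and flanking sequences are toy, hypothetical sequences, the function names are illustrative, and real transposase target choice and cleavage chemistry are not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element, tir_length):
    """True if the element's two ends are inverted repeats of each other."""
    return element[:tir_length] == reverse_complement(element[-tir_length:])


def insert_with_tsd(target, site, element):
    """Insert `element` at the first occurrence of `site` in `target`.

    The staggered cut leaves `site` as single-stranded overhangs on both sides;
    filling them in copies `site` to both flanks (target site duplication).
    """
    pos = target.find(site)
    if pos < 0:
        raise ValueError("target site motif not found")
    left = target[: pos + len(site)]  # left flank ending with one copy of the site
    right = target[pos:]              # right flank starting with the other copy
    return left + element + right


toy_element = "CAGTTG" + "AAAA" + "CAACTG"  # hypothetical TIRs around a toy core
allele = insert_with_tsd("AAACCTCGGG", "CCTC", toy_element)
```

The resulting toy allele reads CCTC, inverted repeat, core, inverted repeat, CCTC, which is the arrangement of direct and inverted repeats described for the carbonaria insert in Fig. 1c.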

    Extended Data Fig. 4: The rise and fall of carbonaria in the Manchester area.

    a, Frequency of the carbonaria phenotype from ~1800 to 2009. b, Corresponding frequencies of the carbonaria allele. The envelopes show the confidence intervals (50%, 90% and 99%) for the simulated trajectories. Dark-red dots, observations falling within the simulated trajectories; orange dots, additional data collected after 2002 (the year in which >85% of the field sample was collected). Stars indicate likely frequencies where historical data are scarce. Data and sources are listed in Supplementary Table 3.
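Panel b pairs phenotype frequencies with allele frequencies. Because carbonaria is dominant over typica, one standard conversion, assuming Hardy–Weinberg proportions and only two alleles (and so ignoring insularia), is q = 1 − √(1 − P), where P is the carbonaria phenotype frequency. The caption does not state that this exact conversion was used; the sketch below is illustrative and the function name is hypothetical.

```python
import math


def dominant_allele_frequency(phenotype_freq):
    """Frequency q of a fully dominant allele from its phenotype frequency P.

    Under Hardy-Weinberg proportions the recessive homozygote frequency is
    (1 - q)**2 = 1 - P, so q = 1 - sqrt(1 - P).
    """
    if not 0.0 <= phenotype_freq <= 1.0:
        raise ValueError("phenotype frequency must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - phenotype_freq)
```

For example, a carbonaria phenotype frequency of 0.75 corresponds to an allele frequency of 0.5 under these assumptions.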

    Extended Data Fig. 5: Position and tissue-specific expression of alternative first exons of the gene cortex.

    a, Illustration of cortex exon structure indicating the positions of thirteen alternative transcription starts and subsequent exons relative to the flanking genes in the b–d region (position of carb-TE indicated by orange bar). b, Expression of cortex transcripts with different start positions. End-point RT–PCR with a reduced number of cycles (35) was used to exclude transcripts with negligible dosage. Amplicon intensities are scaled from + (faint but visible) to +++ (strong PCR product). Negative PCRs represent expression below the detection threshold; this may occur even in the ‘origin’ tissue types (wing disc/pupa/testes) in which the alternative starts were discovered, because 5′ RACE used ~20 times more RNA template than the standard cDNA synthesis for the 35-cycle end-point PCRs. Ovaries were not used for 5′ RACE, which may have biased the detection of gonad-expressed starts towards testes. Test tissues are sixth-instar larval gonads and wing discs at different developmental stages (abbreviations as in Fig. 3).

    Extended Data Fig. 6: Examples of cortex splice variation pattern in typica and carbonaria developing wing discs.

    End-point PCR on wing disc cDNA amplified with primers in the first and last exons (E1–E9), with typica individuals to the left of the central ladder (the two brightest bands in the size ladder are 300 bp and 1 kb) and carbonaria individuals (all c/t heterozygotes) to the right. a, Exon 1A variants at the Cr2 stage. b, Exon 1B variants at the Cr4 stage. (See Fig. 3 for stage abbreviations.)

    Extended Data Fig. 7: Exonic structure and size distributions of cortex splice variants amplified by end-point RT–PCR with primers in exon 1A or 1B and exon 9.

    Size distributions of the PacBio reads are displayed for the two alternative first exons 1A (a) and 1B (b) of cortex. c, d, Comparison of carbonaria-locus genotypes (t/t pale blue fill, c/t light blue line, c/c dark blue line) measured with Fragment Analyzer. Relative fluorescence units (RFU) were averaged across individuals for fragments amplified with E1A–E9 (c) or E1B–E9 (d) primers. Before averaging, RFUs were standardized so that the total fluorescence (area under the curve) of each individual summed to 1. Arrows with the same numbers denote either similar exonic structure (E1A versus E1B variants) or fragment identity between the two sources of data (PacBio reads and Fragment Analyzer). The exonic structure of the six main splice variants is represented in matrices (a, b), in which white cells represent skipped exons (asterisk indicates the full transcript lacking the first 71 bp of exon 6). Apparent differences between melanic and non-melanic individuals in 1A splice variants 2 and 3 were not consistent among families.
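The standardization described for panels c and d (scaling each individual's trace to unit area before averaging by genotype) can be sketched as follows. The sketch assumes all traces are sampled on a shared fragment-size grid, which real Fragment Analyzer exports may only satisfy after interpolation, and uses the trapezoidal rule for the area; the function names are hypothetical.

```python
def standardize_trace(sizes, rfu):
    """Scale one electropherogram so its trapezoidal area under the curve is 1."""
    if len(sizes) != len(rfu) or len(sizes) < 2:
        raise ValueError("need matching size and RFU vectors of length >= 2")
    area = sum(
        (sizes[i + 1] - sizes[i]) * (rfu[i] + rfu[i + 1]) / 2
        for i in range(len(sizes) - 1)
    )
    if area <= 0:
        raise ValueError("trace has no positive area")
    return [value / area for value in rfu]


def average_traces(traces):
    """Point-wise mean of standardized traces sampled on the same size grid."""
    n = len(traces)
    return [sum(values) / n for values in zip(*traces)]
```

Standardizing before averaging keeps individuals with stronger overall amplification from dominating the genotype mean, so the averaged curves compare the relative composition of splice variants rather than total yield.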

    Extended Data Fig. 8: Tukey plots for relative expression of cortex full transcript in developing wing discs.

    c/t heterozygotes are compared with t/t homozygotes produced from c/t × t/t crosses (starting with exon 1B (a) or exon 1A (b)). Genotypes differ significantly for the 1B full transcript (P = 0.001, GLM), whereas genotypes do not differ for the 1A full transcript (P > 0.5, GLM). Note the differing y-axis scales. c, Sample sizes for cortex qPCR experiments by wing disc developmental stage and carbonaria-locus genotype.

    Extended Data Fig. 9: Orthology and functional domain conservation of cortex protein.

    a, Schematic illustration, not to scale, of molecular features of the B. betularia cortex protein sequence. b, Bootstrapped maximum likelihood consensus tree of fzy/cortex proteins, calculated with MEGA 6 from the propeller domain of the alignment in Supplementary Data 2. Branches are collapsed where partitions were reproduced in fewer than half of the bootstrap replicates. Major groups containing lepidopteran cortex (black circles), non-lepidopteran cortex (red circles), fzr/rap (yellow circles) or fzy/cdc20/cdh1 proteins (green circles) are similarly unequivocally defined in trees obtained by neighbour joining or maximum parsimony (not shown). c, 3D mapping of protein sequence conservation: lepidopteran cortex sequences onto a homology model of B. betularia cortex (top); all cortex sequences onto the same B. betularia model (middle); non-lepidopteran cortex sequences onto a model of D. melanogaster cortex (bottom). Molecular surfaces are shown in PyMOL using a spectrum from high (blue) to low (red) conservation. The mapping reveals a shared, presumed inter-blade D box-like degron-binding site (the pink segment is the superimposed D box-mimicking sequence from the structure of human APC/C (PDB accession 4ui9; ref. 40)). In contrast, surface regions corresponding to the facial KEN box or helical specificity determinant sites (white and grey ribbons, respectively, from the same structure) are much less conserved, suggesting that cortex proteins lack these functionalities. Note that the greater sequence variability in the non-lepidopteran set lowers overall sequence conservation (bottom), but the overall patterns in all panels are similar.
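The collapsing rule in panel b (keeping a branch only if its partition appears in at least half of the bootstrap replicates) can be sketched by counting clades across replicate trees. For simplicity this Python sketch represents each replicate as a set of clades (frozensets of taxon names) rather than parsing tree files, and treats partitions as rooted clades. It is not MEGA's implementation, and the names are hypothetical.

```python
from collections import Counter


def supported_clades(replicates, threshold=0.5):
    """Clades present in at least `threshold` of the bootstrap replicates.

    replicates: one set of clades per bootstrap tree, each clade a frozenset
    of taxon names. Clades falling below the threshold would be collapsed.
    """
    if not replicates:
        raise ValueError("need at least one replicate")
    n = len(replicates)
    counts = Counter(clade for tree in replicates for clade in set(tree))
    return {clade: c / n for clade, c in counts.items() if c / n >= threshold}


ab, cd, ac = frozenset({"A", "B"}), frozenset({"C", "D"}), frozenset({"A", "C"})
replicates = [{ab, cd}, {ab}, {ac}, {ab, cd}]
support = supported_clades(replicates)
```

Here the toy clade {A, C}, found in one of four replicates, is dropped, while {A, B} (0.75) and {C, D} (0.5) are retained.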

Tables

    Extended Data Table 1: Predicted functionality of B. betularia cortex isoforms (starting with exon 1A or 1B)

Accession codes

References

  1. Cook, L. M. The rise and fall of the carbonaria form of the peppered moth. Q. Rev. Biol. 78, 399–417 (2003)
  2. van’t Hof, A. E., Edmonds, N., Dalíková, M., Marec, F. & Saccheri, I. J. Industrial melanism in British peppered moths has a singular and recent mutational origin. Science 332, 958–960 (2011)
  3. Brookfield, J. F. Y. Evolutionary genetics: mobile DNAs as sources of adaptive change? Curr. Biol. 14, R344–R345 (2004)
  4. Barrett, R. D. H. & Hoekstra, H. E. Molecular spandrels: tests of adaptation at the genetic level. Nature Rev. Genet. 12, 767–780 (2011)
  5. Nadeau, N. J. & Jiggins, C. D. A golden age for evolutionary genetics? Genomic studies of adaptation in natural populations. Trends Genet. 26, 484–492 (2010)
  6. Martin, A. & Orgogozo, V. The loci of repeated evolution: a catalog of genetic hotspots of phenotypic variation. Evolution 67, 1235–1250 (2013)
  7. Stern, D. L. The genetic causes of convergent evolution. Nature Rev. Genet. 14, 751–764 (2013)
  8. Savolainen, O., Lascoux, M. & Merilä, J. Ecological genomics of local adaptation. Nature Rev. Genet. 14, 807–820 (2013)
  9. Hoekstra, H. E. & Coyne, J. A. The locus of evolution: evo devo and the genetics of adaptation. Evolution 61, 995–1016 (2007)
  10. Cook, L. M. & Saccheri, I. J. The peppered moth and industrial melanism: evolution of a natural selection case study. Heredity 110, 207–212 (2013)
  11. Chu, T., Henrion, G., Haegeli, V. & Strickland, S. Cortex, a Drosophila gene required to complete oocyte meiosis, is a member of the Cdc20/fizzy protein family. Genesis 29, 141–152 (2001)
  12. Saccheri, I. J., Rousset, F., Watts, P. C., Brakefield, P. M. & Cook, L. M. Selection and gene flow along a diminishing cline of melanic peppered moths. Proc. Natl Acad. Sci. USA 105, 16212–16217 (2008)
  13. Clarke, C. A. Biston betularia, obligate f. insularia indistinguishable from f. carbonaria (Geometridae). J. Lepid. Soc. 33, 60–64 (1979)
  14. Lees, D. R. & Creed, E. R. Genetics of insularia forms of peppered moth, Biston betularia. Heredity 39, 67–73 (1977)
  15. Kim, Y. & Nielsen, R. Linkage disequilibrium as a signature of selective sweeps. Genetics 167, 1513–1524 (2004)
  16. Cook, L. M., Sutton, S. L. & Crawford, T. J. Melanic moth frequencies in Yorkshire, an old English industrial hot spot. J. Hered. 96, 522–528 (2005)
  17. Feschotte, C. Transposable elements and the evolution of regulatory networks. Nature Rev. Genet. 9, 397–405 (2008)
  18. He, J. et al. Insights into degron recognition by APC/C coactivators from the structure of an Acm1-Cdh1 complex. Mol. Cell 50, 649–660 (2013)
  19. Whitfield, Z. J., Chisholm, J., Hawley, R. S. & Orr-Weaver, T. L. A meiosis-specific form of the APC/C promotes the oocyte-to-embryo transition by decreasing levels of the polo kinase inhibitor matrimony. PLoS Biol. 11, e1001648 (2013)
  20. Nadeau, N. J. et al. The gene cortex controls mimicry and crypsis in butterflies and moths. Nature http://dx.doi.org/10.1038/nature17961 (this issue)
  21. Ito, K. et al. Mapping and recombination analysis of two moth colour mutations, Black moth and Wild wing spot, in the silkworm Bombyx mori. Heredity 116, 52–59 (2016)
  22. González, J., Karasov, T. L., Messer, P. W. & Petrov, D. A. Genome-wide patterns of adaptation to temperate environments associated with transposable elements in Drosophila. PLoS Genet. 6, e1000905 (2010)
  23. Schlenke, T. A. & Begun, D. J. Strong selective sweep associated with a transposon insertion in Drosophila simulans. Proc. Natl Acad. Sci. USA 101, 1626–1631 (2004)
  24. Schrader, L. et al. Transposable element islands facilitate adaptation to novel environments in an invasive species. Nature Commun. 5, 5495 (2014)
  25. Casacuberta, E. & González, J. The impact of transposable elements in environmental adaptation. Mol. Ecol. 22, 1503–1517 (2013)
  26. Koga, A., Iida, A., Hori, H., Shimada, A. & Shima, A. Vertebrate DNA transposon as a natural mutator: the medaka fish Tol2 element contributes to genetic variation without recognizable traces. Mol. Biol. Evol. 23, 1414–1419 (2006)
  27. van’t Hof, A. E. et al. Linkage map of the peppered moth, Biston betularia (Lepidoptera, Geometridae): a model of industrial melanism. Heredity 110, 283–295 (2013)
  28. Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33, W465–W467 (2005)
  29. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnol. 29, 644–652 (2011)
  30. Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nature Methods 9, 179–181 (2011)
  31. Agapow, P.-M. & Burt, A. Indices of multilocus linkage disequilibrium. Mol. Ecol. Notes 1, 101–102 (2001)
  32. Lawson, D. J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012)
  33. Baxter, S. W. et al. Genomic hotspots for adaptation: the population genetics of Müllerian mimicry in the Heliconius melpomene clade. PLoS Genet. 6, e1000794 (2010)
  34. Reed, R. D., McMillan, W. O. & Nagy, L. M. Gene expression underlying adaptive variation in Heliconius wing patterns: non-modular regulation of overlapping cinnabar and vermilion prepatterns. Proc. R. Soc. Lond. B 275, 37–45 (2008)
  35. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013)
  36. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013)
  37. Šali, A. & Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993)
  38. Ashkenazy, H., Erez, E., Martz, E., Pupko, T. & Ben-Tal, N. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 38, W529–W533 (2010)
  39. Muñoz-López, M. & García-Pérez, J. L. DNA transposons: nature and applications in genomics. Curr. Genomics 11, 115–128 (2010)
  40. Chang, L., Zhang, Z., Yang, J., McLaughlin, S. H. & Barford, D. Atomic structure of the APC/C and its mechanism of protein ubiquitination. Nature 522, 450–454 (2015)

Author information

  1. These authors contributed equally to this work.

    • Arjen E. van’t Hof &
    • Pascal Campagne

Affiliations

  1. Institute of Integrative Biology, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB, UK

    • Arjen E. van’t Hof,
    • Pascal Campagne,
    • Daniel J. Rigden,
    • Carl J. Yung,
    • Jessica Lingley,
    • Neil Hall,
    • Alistair C. Darby &
    • Ilik J. Saccheri
  2. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK

    • Michael A. Quail

Contributions

I.J.S., A.E.v.H. and P.C. designed the study and wrote the paper; P.C., A.E.v.H. and D.J.R. produced the figures; A.E.v.H. directed molecular biology experiments; A.E.v.H., C.J.Y. and J.L. conducted molecular biology experiments; A.E.v.H. constructed the BAC and fosmid tilepaths; A.E.v.H. and A.C.D. assembled, finished and annotated sequences; P.C. analysed population genetic and gene expression data; I.J.S. collected the wild sample; I.J.S. and C.J.Y. reared the samples and performed dissections; D.J.R. and A.E.v.H. built the cortex tree; D.J.R. modelled the cortex structure; M.A.Q. constructed the fosmid library; and A.C.D. and N.H. advised on the design of sequencing strategies.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

The typica 1 haplotype (b–d interval) reference sequence has been deposited in GenBank under accession number KT182637; the B. betularia whole genome sequence has been deposited in the NCBI SRA database under accession number SRX1060178; the cortex splice variants have been deposited in GenBank under accession numbers KT235895–KT235906; Rps3A has been deposited in GenBank under accession number JF811439; α-spec has been deposited in GenBank under accession number KT182638.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Methods, Supplementary Figures 1–2 and Supplementary References.

Text files

  1. Supplementary Data 1

    This file contains the sequence alignment of the carbonaria and three typica haplotypes spanning the ‘b–d’ region (illustrated in Extended Data Figure 1).

  2. Supplementary Data 2

    This file contains the full-length sequence alignment, in aligned FASTA format, of cortex proteins and selected homologues. Incompleteness of some sequences at the N-terminus and some uncertainty regarding translation start sites have no impact on the phylogenetic tree, since it was calculated using only the propeller domain (see Extended Data Figure 9a).

Excel files

  1. Supplementary Table 1

    This table shows polymorphisms in the carbonaria candidate region.

  2. Supplementary Table 2

    This table contains polymorphisms in the locus b–d region.

  3. Supplementary Table 3

    This table contains carbonaria morph frequencies in the Manchester area.

  4. Supplementary Table 4

    This table contains PCR primers for cortex, control genes and candidate genes.

  5. Supplementary Table 5

    This table contains sources, including accession numbers, for cortex and Fizzy family sequences.
