Introduction

Genomics outside established model organisms

The initial lament that genomics ‘would accelerate the migration of biologists to the ‘superb six’: humans, mice, fruitflies, worms, yeast, and Arabidopsis’ (Murray, 2000) has failed to materialize (Crawford, 2001). Less than 5 years after the first draft of the human genome was published, nearly 600 eukaryotic genome-sequencing projects are completed or underway (cf. http://www.genomesonline.org/). The advantages of phylogenetically broad genome coverage are clear, and comparative analysis of diverse genomes will certainly continue to yield important insights into genome evolution and the relationships among branches of the tree of life. However, more than accumulating sequence data for comparative analysis, genomic research offers a unique opportunity to pursue a complete understanding of how genetic information is translated to produce an organism, and how modifications in genomic composition and organization give rise to biological diversity. In this quest, research on a new class of ‘emerging’ model organisms is an essential complement to the in-depth and finely detailed analysis of traditional genetic model organisms.

Evolutionary and ecological functional genomics

The relatively new field of ‘evolutionary and ecological functional genomics’ (EEFG), and its goal of finding ‘the genes that affect ecological success and evolutionary fitness in natural environments and populations’ (Feder and Mitchell-Olds, 2003), requires an expansion outside classical model organisms. Model organisms for EEFG must combine broad genetic and ecological tractability with naturally occurring, functional variation (Feder and Mitchell-Olds, 2003).

Lepidoptera in general, and butterflies in particular, offer outstanding opportunities for integrative research at the interface between genomes and biological complexity. In spite of their immense biological (very species rich), economic (pests, pollinators, and silk production), and societal (education and public understanding of science) value, available genomic resources in Lepidoptera have been limited. This situation is finally changing, and independent efforts to develop core resources are underway for several species of butterflies and moths. Here we provide an overview of the strengths of butterflies as models in EEFG and summarize current efforts to develop resources in two groups, Bicyclus and Heliconius, which offer unique and complementary opportunities to study the links between genomic, developmental, and phenotypic diversity. Within this context, we discuss the general challenges facing the research community and highlight the need for a community-wide effort to consolidate and extend ongoing research.

Butterflies as emerging model organisms in genomics

The strength of butterflies as research targets derives from their extraordinary diversity, coupled with the exceptional opportunities to study the origins and maintenance of variation at nearly every biological level. The historical roots of butterfly research are deep, and the current research community is very active in a variety of areas of ecology and evolution (Boggs et al., 2003) ranging from the molecular details of insect color vision (Briscoe and Chittka, 2001; Stavenga, 2002) to the analysis of human impact on biodiversity (Kotiaho et al., 2005; Mulder et al., 2005). Different species have provided some of the most important case studies on diverse topics in ecology and evolution. These include (1) population genetics and metapopulation dynamics focusing on the Glanville fritillary, Melitaea cinxia (Hanski, 2005), (2) long-distance migration of the monarch butterfly, Danaus plexippus (Brower, 1996; Wassenaar and Hobson, 1998; Froy et al., 2003), (3) studies of Batesian mimicry, host plant detoxification, and pigment production in Papilio swallowtails (Li et al., 2003; Nijhout, 2003), and (4) evolution and development of wing patterns in the buckeye, Junonia coenia, and a number of other species, including Bicyclus anynana and Heliconius (Beldade and Brakefield, 2002; McMillan et al., 2002; Marcus, 2005).

Evolution and development of butterfly wing patterns

Research on wing pattern formation is perhaps the most visually appealing example of the contribution butterflies can make to the understanding of the origins, maintenance, and modification of diversity. Virtually all of the more than 17 000 species of butterflies can be identified on the basis of the color patterns on their wings, and these highly diverse traits are emerging as invaluable systems for linking genes, gene networks, development, form, and function (Figure 1) (Nijhout, 1991; Beldade and Brakefield, 2002; McMillan et al., 2002; Brakefield et al., 2003; Evans and Marcus, 2006). Wings covered with colored scales (Figure 1d) are a morphological innovation of Lepidopterans and there is enormous pattern variation both within and across species. This variation is generally ecologically relevant and its adaptive value in natural populations has been extensively documented in relation to both biotic and abiotic factors (examples in Nijhout, 1991; Beldade and Brakefield, 2002; McMillan et al., 2002). Furthermore, the production and maintenance of this variation can be studied across a multitude of levels of biological organization (reviewed in Beldade and Brakefield, 2002; McMillan et al., 2002), ranging from the molecular details of pattern formation to the ecological relevance of pattern variation in natural populations (Figure 1).

Figure 1

Multidisciplinary research in two colorful dimensions. Panels illustrate the different levels at which the mechanisms governing the production and modification of butterfly wing patterns can be studied. (a) A number of developmental candidate genes have been implicated in the formation of particular pattern elements such as eyespots. The genes spalt (pink) and engrailed (green) are expressed in pupal wings (right) in the center and in the different color rings of the future adult eyespot (left) (Brunetti et al., 2001). (b) Wing color pattern has also been studied in terms of the cellular interactions that underlie pattern formation and which are best understood for eyespots. In early pupal wings, the cells at the center of the presumptive eyespot produce a ‘morphogen’, which diffuses away from the center (arrows) to create a concentration gradient (gray curve). Neighboring cells then become fated to synthesize a particular color pigment depending on the morphogen concentrations they experience (where the vertical lines intersect the gray curve). (c) In Heliconius, the ommochrome and melanin pathways (Nijhout, 1991) synthesize the pigment molecules that color the monochromatic scales. The deposition of different color pigments in different wing areas occurs late in pupal wing development (shown here for a pupa whose cuticle has been removed approximately 1 day before eclosion to expose the dorsal surface of the developing forewing). (d) The spatial arrangement of these scales in a single layer of parallel and overlapping rows produces the different pattern elements on the adult color phenotype (e.g. the eyespot on the photo). (e) Butterfly wing patterns play an important role in minimizing predation. The eyespots in B. anynana, for example, are thought to deflect predators’ attention away from the fragile body, as seen in this specimen photographed in the wild. (f) In addition, wing patterns have also been shown to play a role in mate selection and speciation. For example, the wing patterns in Heliconius provide both a source of ecological post-mating isolation and mating cues important in the incipient stages of speciation. See acknowledgements regarding source of photographs.
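The gradient-and-threshold logic described for panel (b) can be made concrete with a small numerical sketch. Everything quantitative here is an assumption made for illustration: the exponential decay of the morphogen, the decay length, the threshold values, and the generic fate labels are not measurements from B. anynana. The sketch only shows how a single smooth gradient read against fixed thresholds yields concentric rings.

```python
import math

# Illustrative thresholds (assumed values, not measured): the minimum
# morphogen concentration a cell must experience to adopt each fate.
FATE_THRESHOLDS = (
    (0.6, "center"),
    (0.3, "inner ring"),
    (0.1, "outer ring"),
)


def morphogen_concentration(distance, source_level=1.0, decay_length=5.0):
    """Concentration at `distance` (in cell diameters) from the focus.

    Assumes a simple exponential decay away from the source.
    """
    return source_level * math.exp(-distance / decay_length)


def cell_fate(concentration, thresholds=FATE_THRESHOLDS):
    """Return the fate of the highest threshold the concentration reaches."""
    for minimum, fate in thresholds:
        if concentration >= minimum:
            return fate
    return "background"


def eyespot_profile(n_cells=16, source_level=1.0, decay_length=5.0):
    """Fates of cells at distances 0 .. n_cells - 1 from the focus."""
    return [
        cell_fate(morphogen_concentration(d, source_level, decay_length))
        for d in range(n_cells)
    ]


if __name__ == "__main__":
    for distance, fate in enumerate(eyespot_profile()):
        print(distance, fate)
```

In this toy model, raising `source_level` or `decay_length` moves every ring boundary outward, whereas changing a single threshold moves only one boundary.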

During the last 15 years, explicit efforts to integrate methods and concepts from evolutionary and developmental biology have brought increased attention to research on butterfly wing patterns (Beldade and Brakefield, 2002; McMillan et al., 2002; Beldade et al., 2005; Joron et al., 2006a). This research has produced such exciting findings as the co-option of conserved pathways to produce evolutionary novelties (Brakefield et al., 1996; Brunetti et al., 2001; Reed and Serfas, 2004), the contribution of key developmental candidate genes to phenotypic variation (Beldade et al., 2002a; Kronforst et al., 2006), the mapping to the same genomic location of color pattern switch genes from different species (Joron et al., 2006b), and experimental tests of evolutionary constraints in morphological change (Beldade et al., 2002b; Frankino et al., 2005).

Two complementary systems

The African bush-brown B. anynana (Nymphalidae, Satyrinae) and species within the South-American genus Heliconius (Nymphalidae, Heliconiinae) have emerged as important players in research on how the reciprocal interactions between development and selection shape functional diversity (Figure 1).

The wings of these two Nymphalid clades are very different in shape and pattern (Figure 2). Furthermore, the striking phenotypic differences are accompanied by clear differences in ecological function and in the underlying genetic and developmental basis. Both groups are well suited for analysis at the molecular, organismal, and population levels and are textbook examples of natural polymorphisms. Heliconius is characterized by amazing geographic pattern divergence within species and pattern convergence between distantly related species (reviewed in Joron et al., 2006a), and B. anynana by striking seasonal variation and adaptive phenotypic plasticity (Brakefield and French, 1999). In both groups, wing patterns play a role in avoiding predation (Figure 1e) (Benson, 1972; Mallet and Barton, 1989; Kapan, 2001; Langham, 2004; Lyytinen et al., 2004; Brakefield and Frankino, 2006) and in mate selection (Figure 1f) (McMillan et al., 1997; Jiggins et al., 2001; Breuker and Brakefield, 2002; Robertson and Monteiro, 2005), but they seem to function in different manners. While the bright colors in Heliconius warn potential predators of the butterflies’ distastefulness (Langham, 2004), those on B. anynana are associated with different seasonal strategies to avoid predation (camouflaging the dull-brown butterfly against a background of dry leaves, or attracting predators’ attention away from the body against a green background; Figure 2a) (Brakefield and Frankino, 2006). These different ecological pressures lead to quite distinct modes of selection in natural populations: strong directional selection in Heliconius and divergent selection for opposite extreme phenotypes in the two seasonal environments experienced by B. anynana populations.

Figure 2

Extensive morphological variation in the wing patterns of B. anynana and Heliconius provides exciting opportunities for comparative work into the interplay between genes, development, and ecology. (a) Variation in Bicyclus wing patterns is extensive within and across species, and laboratory B. anynana provides the opportunity to study different types of variation (e.g. due to plasticity, to many alleles of small effect, or to single alleles of large effect) in detail. The B. anynana Stock Center in Leiden maintains over 20 lines with divergent phenotypes generated by artificial selection and over 30 mutant stocks carrying spontaneous mutations of large effect. The panel shows the ventral surface of both fore- and hindwing in different stocks of B. anynana. The first two photos on the left correspond to the ‘wild-type’ outbred stock and illustrate the seasonal polyphenism that results from plasticity in relation to temperature and humidity during development (Brakefield and Frankino, 2006) (on the left, a butterfly with conspicuous eyespots typical of the ‘wet season’, and to the right a dull-colored butterfly more typical of the ‘dry season’). The remaining photos correspond to different mutant stocks with altered eyespot patterns (from left to right: Bigeye with enlarged eyespots, spotty with extra eyespots on the forewing, Goldeneye with the typically black ring replaced with golden scales, and Missing with two eyespots absent from the hindwing). (b) The radiation in Heliconius color patterns couples both divergent evolution and multiple independent cases of convergent evolution. Different Heliconius species can be easily maintained in captivity and different populations or closely related species can be crossed to study naturally occurring variation. The panel shows geographic variation in the mimetic species, H. erato (top row) and H. melpomene (second row). The two species fall on divergent lineages in the genus, yet share identical wing patterns across their sympatric ranges and have undergone a parallel radiation into over 30 different geographic forms (Sheppard et al., 1985). Color pattern variation in these species is largely explained by changes at 4–5 loci or complexes of tightly linked loci of large effect. For example, allelic changes at the Cr locus in H. erato and in a complex of at least three tightly linked loci (N, Yb, Sb) in H. melpomene control most of the variation in yellow and white pattern elements among the five geographic races shown. See acknowledgements regarding source of photographs.

The genetic and developmental basis of wing pattern formation also seems distinct in the two target groups. Studies of laboratory populations of B. anynana have revealed both the presence of large amounts of segregating quantitative variation contributing to gradual response to artificial selection (Monteiro et al., 1994; Monteiro et al., 1997; Beldade et al., 2002b), and a number of spontaneous mutant alleles with a dramatic effect on phenotype (Beldade and Brakefield, 2002; Beldade et al., 2005). In Heliconius, in contrast, pattern variation is primarily attributable to a few genes of large effect with some minor effect modifiers (reviewed in Joron et al., 2006a). Differences in overall genetic architecture are emphasized by a more detailed analysis of specific candidate genes and pathways. The formation of butterfly eyespots, including those in B. anynana, involves expression of genes from classical wing development pathways (Brakefield et al., 1996; Brunetti et al., 2001; Reed and Serfas, 2004) in and around the area of the centers (foci) of presumptive eyespots with described organizing properties (French and Brakefield, 1995; Figure 1b). However, with the notable exception of tight linkage between wingless and the white/yellow color switch locus K in H. cydno (Kronforst et al., 2006), the bands and patches of color in Heliconius wings have so far shown no evidence for the involvement of the same developmental pathways (Reed and Gilbert, 2004; Jiggins et al., 2005; Tobler et al., 2005; Kapan et al., 2006; Joron et al., 2006a) or any type of patterning foci. Instead, genetic crosses and developmental mutants suggest that Heliconius patterns develop in a whole-wing proximo-distal manner, independently of wing veins (Reed and Gilbert, 2004). These two seemingly distinct patterning systems within Nymphalid butterflies offer an excellent opportunity for a broad understanding of pattern formation and of the ecological consequences of variation in phenotype.

Genomic resources in butterflies

Advances in available genomic resources are fueling genome-wide research in B. anynana and Heliconius. The functional analysis of genotypic and phenotypic variants can be pursued both at the level of the molecular details of gene function during wing development (e.g. using spontaneous mutations and genetic transformation techniques (Lewis et al., 1999; Weatherbee et al., 1999; Marcus et al., 2004; Lewis and Brunetti, 2006)) and, at the other end of the spectrum, at the level of the ecological analysis of the adaptive value of variant phenotypes (Benson, 1972; Kapan, 2001; Langham, 2004; Mallet and Barton, 1989). Ultimately, this research promises to identify the genes and gene regions that underlie adaptive variation, link these to the genetic and biochemical networks responsible for pattern formation, and generate a fuller understanding of the interplay between genomic, developmental, and evolutionary processes.

Genetic sequence information

Construction and analysis of both cDNA and gDNA libraries is expanding the amount of sequence information available in butterflies. In the last couple of years, moderate-scale sequencing of expressed sequence tags (ESTs) has catapulted gene discovery in B. anynana, Heliconius erato, and H. melpomene. ESTs derived from developing wings (Papanicolaou et al., 2005; Beldade et al., 2006) have been independently assembled, resulting in the identification of thousands of putative gene objects (Table 1; Figure 3). These, together with publicly available sequences from other Lepidopteran species, have been assembled in a dedicated and web-accessible database, ButterflyBase (via http://www.butterflybase.org), designed to optimize the retrieval of individual ESTs or assembled gene objects annotated based on sequence similarity and protein prediction algorithms (Papanicolaou et al., 2005).

Table 1 Resources in B. anynana and Heliconius
Figure 3

Overlap in EST-derived gene collections of Heliconius, B. anynana, and B. mori. The scaled Venn diagram (created using Vennmaster 0.17a; http://www.informatik.uni-ulm.de/ni/staff/HKestler/vennm/) shows the overlap between the collections of three Lepidoptera EST-cluster data sets from ButterflyBase (5721 clusters for B. anynana, 28 036 for B. mori, and 3632 for the pooled H. erato and H. melpomene collections) and the Drosophila proteins from FlyBase (69 920 peptide sequences from the Genome Annotation, Release 3). Each Lepidoptera collection was compared to the known Drosophila proteins using BLASTX similarity analysis, and to each other lepidopteran data set using BLASTN analysis. Lepidopteran gene clusters were assigned to the different areas of the Venn diagram based on a bit-score cutoff point of 70 bits. A total of 11 541 Lepidoptera clusters were significantly similar to proteins from the insect model Drosophila, and a subset of 1769 (white area) were conserved between all data sets. A total of 2161 gene clusters are shared across at least two Lepidoptera, but show no similarity with Drosophila peptides (black areas). The areas with clusters having no significant similarity to the other collections (in color) will likely decrease as the publicly available EST collections in lepidopterans increase, since a large proportion of these clusters likely reflect limitations of sampling cDNA (relatively few ESTs are available for butterflies) and sequencing (short reads make it harder to detect sequence homology). The use of a rather conservative BLAST significance cutoff (a minimum bit score of 70, corresponding to e-values lower than E−12) ensures lower rates of false positives (problematic when using gene collections that do not contain full-length sequences) but results in a potentially high number of false negatives (i.e. gene objects that do correspond to Lepidopteran homologs of annotated Drosophila peptides but which were not found significant here). Expansion of EST data sets for butterflies will improve the estimates not only for large proteins but also for rapidly evolving genes and for Lepidoptera- or butterfly-specific genes.
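The assignment rule used for the Venn diagram (a cluster counts as shared with another collection only if its best hit there reaches the 70-bit cutoff) can be sketched as follows. Only the cutoff value comes from the caption; the cluster identifiers and bit scores are invented for illustration, and the function names are hypothetical rather than part of ButterflyBase or Vennmaster.

```python
from collections import Counter

BIT_SCORE_CUTOFF = 70.0  # cutoff stated in the Figure 3 caption


def significant_partners(best_bit_scores, cutoff=BIT_SCORE_CUTOFF):
    """Collections in which a cluster has a hit at or above the cutoff.

    `best_bit_scores` maps a collection name to the best bit score the
    cluster obtained against it; a missing key means no hit at all.
    """
    return frozenset(
        name for name, score in best_bit_scores.items() if score >= cutoff
    )


def count_venn_regions(clusters, own_collection, cutoff=BIT_SCORE_CUTOFF):
    """Count clusters per Venn region (a region is a set of collections)."""
    counts = Counter()
    for best_bit_scores in clusters.values():
        region = significant_partners(best_bit_scores, cutoff) | {own_collection}
        counts[region] += 1
    return counts


# Hypothetical clusters from one collection, with made-up bit scores.
EXAMPLE_CLUSTERS = {
    "cluster_1": {"Drosophila": 212.0, "B. mori": 340.5, "Heliconius": 98.2},
    "cluster_2": {"Drosophila": 45.1, "B. mori": 120.0},
    "cluster_3": {"B. mori": 69.9},
    "cluster_4": {},
}

if __name__ == "__main__":
    regions = count_venn_regions(EXAMPLE_CLUSTERS, "B. anynana")
    for region, n in regions.items():
        print(sorted(region), n)
```

Here `cluster_3` illustrates the false-negative risk noted in the caption: a hit of 69.9 bits falls just under the cutoff, so the cluster is counted as having no match.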

Gene discovery projects in Heliconius and Bicyclus have generated much sequence information, providing the first step towards enabling the study of genome evolution in butterflies. Many of the gene objects identified in initial EST scans showed similarity to genes in publicly available collections. This analysis has enabled the identification of genes from different functional categories, including genes known to be involved in insect wing development (candidate genes for wing pattern variation) and common ‘housekeeping’ genes (valuable in comparative mapping studies, see below) (Papanicolaou et al., 2005; Beldade et al., 2006). However, there is a fairly large subset of coding regions that do not show clear homology to genes in publicly available collections (Beldade et al., 2006 and ButterflyBase), including those of the insect model Drosophila melanogaster and the Lepidopteran model Bombyx mori (with a recently published genome (Mita et al., 2004; Xia et al., 2004) and large-scale EST projects (Mita et al., 2003; Cheng et al., 2004)) (Figure 3). Particularly exciting are a few hundred fairly large predicted peptides that may be new or highly diverged genes in butterflies (Papanicolaou et al., 2005; Beldade et al., 2006). A functional analysis of these genes (e.g. with analysis of patterns of gene expression) and the expansion of gene collections within butterflies will help to better characterize these emerging patterns. In this respect, the planned addition of tens of thousands of ESTs for B. anynana by the Joint Genome Institute (http://www.jgi.doe.gov) will provide an exciting data set of the genes expressed in different tissues and developmental stages in butterflies, and a powerful basis for comparative studies of Lepidopterans.

From identified genes to the genetic dissection of variation

The accumulation of sequence information is accelerating the development of the next generation of genomic resources in Bicyclus and Heliconius and expanding ongoing genetic mapping and expression profiling efforts.

High-density linkage maps predominantly composed of amplified fragment length polymorphisms (AFLPs) and microsatellite markers are available for B. anynana and several Heliconius species (Table 1). These maps have been used to identify genomic regions that contribute to different types of phenotypic variation in the target Nymphalids (Jiggins et al., 2005; Tobler et al., 2005; Kapan et al., 2006; Joron et al., 2006b; van't Hof et al., 2007). Finer resolution mapping is being pursued by (1) adding gene-based markers throughout the genome (see below) and (2) using linked AFLP markers and bacterial artificial chromosome (BAC) libraries now available in H. erato, H. melpomene, H. numata, and B. anynana to develop markers in genomic regions of interest. The latter strategy has been used successfully in Heliconius to show that the NYbSb gene complex in H. melpomene, the P locus in H. numata, and the Cr locus in H. erato all map to homologous regions of the genome (Joron et al., 2006b). This finding has been interpreted to suggest that a conserved, yet relatively unconstrained, mechanism affects pattern variation in Heliconius, and to imply that both convergent and divergent change can occur by the recruitment of homologous genomic regions. Positional cloning of these regions, now ongoing in all three species, will allow deeper insights into the architecture, identity, and mode of action of this ‘developmental hotspot’ (cf. Richardson and Brakefield, 2003).

Current mapping efforts in both Bicyclus and Heliconius are concentrating on generating high-resolution gene-based maps. In this respect, ongoing EST projects are invaluable for the development of more markers for mapping and linkage analysis (Papanicolaou et al., 2005; Beldade et al., 2006). Sequence tags can be used to identify sequence polymorphisms in particular genes of interest, or, with targeted design, EST projects can directly combine gene and polymorphism discovery. In B. anynana and Heliconius, such a strategy has identified single-nucleotide polymorphisms and microsatellite repeats in thousands of gene objects (Beldade et al., 2006 and assembled ESTs in ButterflyBase). These types of markers are being added to existing linkage maps and will be a very powerful tool in moving from mapped regions to the identification of the actual genes that contribute to phenotypic variation. Particularly relevant are genes whose described role in wing development makes them good candidates for wing pattern variation (cf. Beldade et al., 2002a). In addition, ‘housekeeping’ genes recurrent in EST projects of all species provide a common suite of reference markers for gene-based maps. Ribosomal protein genes, in particular, are ubiquitous in even moderate-scale EST scans and are excellent anchors for comparative linkage analysis (Yasukochi et al., 2006) (Table 2). Initial analysis based on ∼30 orthologous markers mapped in H. erato, H. melpomene, and B. mori shows surprising levels of synteny (Jiggins et al., 2005; Kapan et al., 2006; Yasukochi et al., 2006). It will be exciting to confirm this observation for more markers in more species, as the conservation of gene order would be a powerful tool to eventually identify mapped loci by comparison of maps from different species.
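As an illustration of how microsatellite repeats can be screened for in assembled sequence tags, the sketch below uses a regular expression with a backreference to find tandem repeats of short motifs. It is a minimal example under stated assumptions and not the pipeline used in the cited studies: the motif lengths (2 to 4 bases), the minimum of five repeat units, and the example sequence are all chosen here for illustration.

```python
import re


def find_microsatellites(sequence, min_motif=2, max_motif=4, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest motif, so
    'CACACACACA' is reported as (CA)5 rather than as a longer unit.
    Runs of a single base (e.g. poly-A tails) are skipped.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run, not a microsatellite of interest here
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Invented sequence containing (CA)6 and (ATG)5 separated by unique flanks.
EXAMPLE_SEQUENCE = "GGCTT" + "CA" * 6 + "TTGACC" + "ATG" * 5 + "CCGTA"

if __name__ == "__main__":
    for start, motif, count in find_microsatellites(EXAMPLE_SEQUENCE):
        print(start, motif, count)
```

Repeats found in this way only become useful markers once they are shown to be polymorphic in the mapping families, which requires comparing alleles across individuals.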

Table 2 Ribosomal proteins as candidate anchor loci for comparative mapping

Gene mapping studies attempting to identify genes and gene regions contributing to variation in phenotype will be complemented with a detailed analysis of the changes in the levels of gene expression that accompany such variation. First generation high-density arrays composed of genes expressed during wing development are being tested in both Bicyclus and Heliconius (Reed et al., 2007). These resources will allow expression profiling of different parts of the developing wing and different variants of the same species. Furthermore, the availability of BAC libraries will allow the characterization of regulatory regions around those genes whose map location or expression changes are associated with variation in phenotype. As the community continues to identify genes and genetic regulatory regions associated with pattern formation and pattern variation, the tools to test the functional importance of these loci are being perfected. Germline transformation technology has been developed in B. anynana (Marcus et al., 2004), and will be the basis for the next generation of functional experiments such as gene-targeted expression or knockouts.

Extending genomic research in butterflies

Core resources for genomic research in butterflies have expanded substantially over the last few years. However, for butterflies to fully emerge as ecological and evolutionary genomic models, commitment of the whole research community is required. A concerted effort is crucial to stimulate the development of the shared resources and strategies required to turn butterflies into competitive players in the genomics era and to enable a more complete analysis of the questions that have made this group such powerful biological models over the last couple of hundred years.

Linking genomic, phenotypic, and ecological data

There is a rich history of collaborative multidisciplinary research in the butterfly community and the time has come to develop a common database containing both emerging genetic and genomic information and the vast amount of non-genomic data available for butterflies. Such a database would link genomic/genetic diversity data (physical/linkage maps, expression data, ESTs, sequence polymorphisms, and genomic sequences) and phenotypic diversity data (quantitative and qualitative descriptions of phenotypes, images, and pedigrees) within the context of clear spatial (e.g. habitats and sampling sites) and temporal scales. Equally important is the development of common tools to utilize such a database and permit detailed queries across species collections. These are challenging issues that require broad community participation. Fortunately, we are not alone and the challenges faced by the butterfly community are identical to those faced by other emerging model groups including Mimulus, Cichlids, Sticklebacks, Daphnia, and Dictyostelium. A number of bioinformatics solutions to these challenges are available including, for example, GMOD (http://www.gmod.org), a generalized open-source resource fully equipped with standard ontologies, file formats, web site and database options, and tools for organizing genomic data.
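To make the idea of linking genotype, phenotype, and sampling context more tangible, the following sketch builds a tiny in-memory relational database and runs one cross-table query. It is purely illustrative and does not reflect the GMOD schema or any existing butterfly database: the table layout, the trait and marker names, and all values are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema: one row per specimen, with phenotype and
# genotype records pointing back to it.
SCHEMA = """
CREATE TABLE specimen (
    specimen_id  INTEGER PRIMARY KEY,
    species      TEXT NOT NULL,
    site         TEXT NOT NULL,
    collected_on TEXT NOT NULL
);
CREATE TABLE phenotype (
    specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
    trait       TEXT NOT NULL,
    value       REAL NOT NULL
);
CREATE TABLE genotype (
    specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
    marker      TEXT NOT NULL,
    allele      TEXT NOT NULL
);
"""


def build_example_database():
    """Create an in-memory database filled with invented records."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    connection.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?, ?)",
        [
            (1, "B. anynana", "site_A", "2006-01-10"),
            (2, "B. anynana", "site_A", "2006-01-10"),
            (3, "B. anynana", "site_B", "2006-06-02"),
        ],
    )
    connection.executemany(
        "INSERT INTO phenotype VALUES (?, ?, ?)",
        [(1, "eyespot_size", 2.0), (2, "eyespot_size", 3.0), (3, "eyespot_size", 4.0)],
    )
    connection.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [(1, "marker_1", "A"), (2, "marker_1", "A"), (3, "marker_1", "B")],
    )
    return connection


def mean_trait_by_allele(connection, trait, marker):
    """Average a trait per species and allele at one marker."""
    query = """
        SELECT s.species, g.allele, AVG(p.value)
        FROM specimen AS s
        JOIN phenotype AS p ON p.specimen_id = s.specimen_id
        JOIN genotype AS g ON g.specimen_id = s.specimen_id
        WHERE p.trait = ? AND g.marker = ?
        GROUP BY s.species, g.allele
        ORDER BY s.species, g.allele
    """
    return connection.execute(query, (trait, marker)).fetchall()


if __name__ == "__main__":
    db = build_example_database()
    print(mean_trait_by_allele(db, "eyespot_size", "marker_1"))
```

Because site and collection date live on the specimen record, the same join can be restricted to a habitat or season, which is the kind of spatially and temporally explicit query described above.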

Prioritize genome sequencing

Very importantly, the community must push forward efforts to get at least one butterfly genome sequenced. Genome sequence information will provide an invaluable anchor for all genetic and genomic research in this group. Genome projects in Lepidoptera are so far restricted to moths, with B. mori being the only published effort (Mita et al., 2004; Xia et al., 2004). It is hoped that newly available physical maps (Yamamoto et al., 2006; Yasukochi et al., 2006) will accelerate assembly and annotation of the silkmoth genome, but it is still unclear how far this resource can be used in a detailed genetic analysis of butterflies. Butterflies and the Lepidopteran lineage containing B. mori probably diverged more than a hundred million years ago (Vane-Wright, 2004), and have quite distinct biological properties related to the contrast between the diurnal (butterflies) and the nocturnal (moths) lifestyles. Unfortunately, the same diversity that makes butterflies such attractive models has so far made community cohesion challenging. While genome projects continue to be a major financial and technical undertaking, the community will need to rally behind one or perhaps two species to be able to make the strongest possible argument for sequencing a butterfly genome. The creation of a ‘Butterfly Consortium’, similar to what has been put together for other organisms, is necessary to fuel discussions and overcome these types of challenges. As new technology reduces the cost of sequencing and enables the addition of new genomes (see Bonetta, 2006), the community will be well positioned to capitalize on the strength of lepidopteran diversity to study a wide array of biological processes.

Butterfly genomics eclosing

These are exciting times, as we witness the metamorphosis of butterflies from classical organisms in ecological and evolutionary analysis to players in the genomics era. Indeed, research on B. anynana and Heliconius highlights the utility of butterflies as models for evolutionary and ecological genomic research, both satisfying essential EEFG criteria (Feder and Mitchell-Olds, 2003). With expanding genomic resources, EEFG on butterflies promises to provide important insights into the links between developmental diversity, phenotypic variation, and macroevolution. Ultimately, the combination of new tools, extraordinary diversity, and a rich history of research in ecology and evolution will ensure that butterflies can fully realize the long-promised potential illustrated by the words of the nineteenth-century naturalist H.W. Bates, ‘the study of butterflies – creatures selected as the types of airiness and frivolity…, will some day be valued as one of the most important branches of the Biological Sciences’ (Henry Walter Bates, The Naturalist on the River Amazons, 1864).