Introduction

Through all the major transitions in genetics over the past 100 years (from early mutation and mapping studies involving countless crosses and phenotypic analyses, to karyotyping and polytene chromosome banding, to the application of allozymes in population-level surveys, to the advent of complete genome sequencing and the rise of ‘evo-devo’), the fly Drosophila melanogaster has maintained its uncontested status as a preeminent model organism (Brookes, 2001; Beller and Oliver, 2006). Several entire volumes have been devoted to its use in experimental genetics (Demerec and Kaufman, 1996; Powell, 1997; Sullivan et al., 2000; Henderson, 2003; Ashburner et al., 2005), and it is estimated that well over 1000 research groups worldwide use Drosophila as a key model (Clark et al., 2003). As Demerec and Kaufman (1996, p. 1) put it, ‘it would not be an exaggeration to say that we have learned more about the basic laws of heredity from the study of this fly than from work on all other organisms combined’.

As might be expected, the D. melanogaster genome was among the earliest to be sequenced for a metazoan (behind only Caenorhabditis elegans) and was the first animal genome to be completed using the now-standard shotgun approach (Adams et al., 2000; Celniker et al., 2002). Although genome sequences are available for other pairs of related species (for example, the rodents Mus musculus and Rattus norvegicus and the pufferfishes Takifugu rubripes and Tetraodon nigroviridis), the recently completed sequence of D. pseudoobscura (Richards et al., 2005) made Drosophila the first genus to be represented by genome sequences from multiple species. These have since been joined by 10 additional species: D. ananassae, D. erecta, D. grimshawi, D. mojavensis, D. persimilis, D. sechellia, D. simulans, D. virilis, D. willistoni and D. yakuba (Ashburner, 2007; Crosby et al., 2007; Drosophila 12 Genomes Consortium, 2007; Gilbert, 2007).

Drosophila species continue to contribute to important discoveries with regard to the complex nature of genome form, function and evolution. Most recently, studies of Drosophila genomes have provided new insights into the extent of chromosomal rearrangements (Bartolomé and Charlesworth, 2006), the occurrence of heterochromatic genes (Yasuhara and Wakimoto, 2006) and the potential role of some regions of noncoding DNA in both the origin of new genes (Levine et al., 2006; Casola et al., 2007) and the regulation of existing ones (Halligan and Keightley, 2006; Stark et al., 2007).

In light of these considerations, it is very surprising that only limited effort has been devoted to the study of genome size in Drosophila. Not including the data presented here, only 42 of the roughly 2600 species of Drosophila (that is, ∼1.6%) have been assessed with regard to this fundamental aspect of genome organization (Bosco et al., 2007; Gregory, 2008; see also Ashburner et al., 2005, which provides a preliminary subset of the present data). Many of the previous estimates have been based on non-best-practice methods, and even some that have used modern techniques such as flow cytometry have used potentially problematic staining protocols. This scarcity and limited reliability of drosophilid genome size data is a significant concern given the growing emphasis on interspecific comparisons in the family (Markow and O’Grady, 2006), the unparalleled focus on Drosophila in comparative genomics (Clark et al., 2003; Drosophila 12 Genomes Consortium, 2007; Stark et al., 2007) and the combined theoretical and practical importance of genome size data for large-scale genome research (Gregory, 2005a, 2005b).

As an early corrective, the present study provides new genome size data for 67 species in the family Drosophilidae using best-practice techniques involving more than 800 individual flies. The analysis includes representatives from the genera Chymomyza, Drosophila, Hirtodrosophila, Samoaia, Scaptodrosophila and Zaprionus; more than 40 of the species analyzed have never been the subject of genome quantification. In combination with previously published estimates, these genome size data are used to evaluate the phenotypic implications of bulk DNA content in the Drosophilidae. The results shed new light on the significance of genome size diversity in this family.

Materials and methods

Source of specimens

Specimens were obtained from the Tucson Drosophila Species Stock Center (http://stockcenter.arl.arizona.edu/), with the exception of the D. melanogaster Iso-1 strain which was provided by Jerry Rubin from the line sequenced in the D. melanogaster genome project. Species names and other taxonomic details follow the BioSystematic Database of World Diptera (http://www.sel.barc.usda.gov/Diptera/biosys.htm) and the TaxoDros Database (http://www.taxodros.unizh.ch/).

Flow cytometry

Genome sizes were estimated using neural tissue nuclei and a propidium iodide flow cytometry protocol described in detail elsewhere (Bennett et al., 2003; DeSalle et al., 2005). Briefly, this involved grinding heads in ice-cold Galbraith buffer using 2 ml Kontes Dounce tissue grinders with a type A pestle to free individual nuclei, filtering through 20 μm nylon mesh to eliminate large debris, staining for 1–9 h in 50 μg ml−1 propidium iodide and analysis using a Coulter Epics Elite flow cytometer with a laser tuned to 514 nm and 500 mW. Given potential problems with overlap between fluorescence peaks from species of Drosophila (Figure 1), sperm and blood of the green pufferfish, T. fluviatilis, were used as internal standards rather than D. melanogaster Iso-1 (cf. DeSalle et al., 2005). However, the relative DNA content of T. fluviatilis was first established by comparison with the female D. melanogaster Iso-1 strain (1C=0.18 pg), such that in the present study D. melanogaster can be considered an indirect secondary standard. All runs included at least five female D. melanogaster, so that linearity or other machine-related problems could be detected. A second measurement run was performed in any case in which deviations of more than 0.01 pg were registered among female D. melanogaster from the same sample colony. Standards and unknowns were run independently and directly against female D. melanogaster Iso-1 as necessary to distinguish among close peaks in the output histograms (Figure 1), and all genome size determinations were ultimately made using co-stained samples as per best-practice techniques (DeSalle et al., 2005).
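
The arithmetic behind an internal-standard estimate can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function name and all peak positions are hypothetical, and only the Iso-1 value of 0.18 pg is taken from the text. The sketch assumes that fluorescence is proportional to DNA content and that the two peaks being compared come from nuclei of the same ploidy level.

```python
def genome_size_from_peaks(sample_peak, standard_peak, standard_c_value):
    """Return the 1C value (pg) of a sample from co-stained peak positions.

    Assumes both peaks come from nuclei of the same ploidy level and that
    fluorescence is proportional to DNA content.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_c_value * sample_peak / standard_peak


ISO1_C_VALUE = 0.18  # pg, female D. melanogaster Iso-1 (value from the text)

# Hypothetical mean channel numbers, for illustration only.
iso1_peak = 100.0
pufferfish_peak = 220.0
unknown_peak = 150.0

# Step 1: calibrate the secondary standard against Iso-1.
pufferfish_c_value = genome_size_from_peaks(
    pufferfish_peak, iso1_peak, ISO1_C_VALUE
)

# Step 2: estimate the unknown against the calibrated secondary standard.
unknown_c_value = genome_size_from_peaks(
    unknown_peak, pufferfish_peak, pufferfish_c_value
)

print(f"secondary standard: {pufferfish_c_value:.3f} pg")
print(f"unknown species:    {unknown_c_value:.3f} pg")
```

Because the secondary standard cancels out of the two-step calculation, the final estimate depends only on the Iso-1 value and the two peak ratios, which is why D. melanogaster can be treated as an indirect standard.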

Figure 1

Sample histogram from the flow cytometric analysis of propidium iodide-stained neural nuclei of female D. melanogaster Iso-1 and D. virilis. 2C and 4C peaks were apparent for D. virilis, but any 4C nuclei from D. melanogaster were likely to have fallen under the D. virilis 2C peak as this species has a genome almost exactly twice as large. In cases where fly peaks were not so distantly separated, T. fluviatilis sperm was used as a relative internal standard, first calibrated against D. melanogaster Iso-1.

Comparative data

Data for various potential correlates of genome size were obtained from compiled lists available in the literature. Temperature-controlled developmental time data (duration from egg to adult in days at 18 °C) were taken from the Tucson Drosophila Species Stock Center, as reported in Markow and O’Grady (2006). Data for post-eclosion time to sexual maturity (in days) were also obtained from Markow and O’Grady (2006), whereas data with regard to sperm length (in mm) and body size (thorax length in mm) were derived from Pitnick et al. (1995), Kacmarczyk and Craddock (2000) and Moreteau et al. (2003). Karyotypic data were taken from Clayton and Wheeler (1975) and Clayton and Guest (1986). Total haploid chromosome number was computed independently of chromosome morphology, and a ‘karyotypic index’ that reflected a general measure of total chromosomal size was calculated according to the following formula:

where the letters refer to the number of chromosomes per morphological category as designated in Clayton and Wheeler (1975): V, V-shaped; J, J-shaped; R, rod; D, dot; v, small V-shaped; r, small rod.

Phylogenetic analyses

Two tree topologies (‘Tree 1: more genes’ and ‘Tree 2: more taxa’) were assembled from phylogenetic information available in the literature. The trees and detailed descriptions of their assembly are given in the Supplementary Material. Felsenstein's (1985) phylogenetically independent contrasts (PICs), positivized and forced through the origin, were calculated using log-transformed data under both tree topologies with the PDAP module (Midford et al., 2002) in Mesquite version 1.11 (Maddison and Maddison, 2006). Branch lengths were all set to 1 because the assembled trees did not contain branch length information. One degree of freedom was subtracted for soft polytomies (Purvis and Garland, 1993). In all comparative analyses (direct or phylogenetic), data from different strains and from male and female flies were averaged to give a single species value.
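
For readers unfamiliar with independent contrasts, the calculation can be sketched in a few lines. This is an illustrative re-implementation of Felsenstein's (1985) algorithm with all branch lengths set to 1, not the PDAP/Mesquite code used in the study. The tree, species labels and trait values are hypothetical, and the sketch assumes a fully bifurcating tree, so the degree-of-freedom adjustment for soft polytomies is not handled.

```python
import math


def contrasts(tree, traits, branch=1.0):
    """Return (node value, adjusted branch length, list of contrasts).

    tree   : a species name (tip) or a 2-tuple of subtrees
    traits : dict mapping species name to an (already transformed) value
    branch : length of the branch above this node (1 for every branch here)
    """
    if isinstance(tree, str):
        return traits[tree], branch, []
    left, right = tree
    x1, v1, c1 = contrasts(left, traits)
    x2, v2, c2 = contrasts(right, traits)
    # Standardized contrast between the two daughter nodes.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Weighted ancestral value; the branch above is lengthened to reflect
    # the uncertainty of that estimate.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    return value, branch + v1 * v2 / (v1 + v2), c1 + c2 + [contrast]


def pic_correlation(tree, x, y):
    """Correlation of positivized contrasts, forced through the origin."""
    _, _, cx = contrasts(tree, x)
    _, _, cy = contrasts(tree, y)
    # Positivize: make each x contrast positive and flip y to match.
    pairs = [(-a, -b) if a < 0 else (a, b) for a, b in zip(cx, cy)]
    sxy = sum(a * b for a, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), len(pairs)


# Hypothetical four-species example (values are not from Table 1).
tree = (("sp_A", "sp_B"), ("sp_C", "sp_D"))
genome_pg = {"sp_A": 0.18, "sp_B": 0.20, "sp_C": 0.26, "sp_D": 0.34}
dev_days = {"sp_A": 13.0, "sp_B": 15.0, "sp_C": 16.0, "sp_D": 20.0}

log_genome = {k: math.log10(v) for k, v in genome_pg.items()}
log_days = {k: math.log10(v) for k, v in dev_days.items()}

r, n_contrasts = pic_correlation(tree, log_genome, log_days)
print(f"r = {r:.2f} from {n_contrasts} contrasts")
```

A tree with n tips yields n − 1 contrasts, and because each contrast is a difference between sister lineages, the resulting correlation is not inflated by values shared through common ancestry.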

Results

General patterns of variation

The genome sizes of the 67 species analyzed in the present study varied nearly threefold, from 0.14 pg in male D. mauritiana (Rob strain) and Hirtodrosophila pictiventris to 0.40 pg in female Chymomyza amoena (Table 1). With only a few exceptions, male flies had significantly smaller estimated genome sizes than female flies in any given species (paired t-test, P<0.0001; Table 1). The mean genome size among all species of Drosophila studied to date is 0.21 pg±0.005 s.e. (Table 1; Gregory, 2008). Some previously published estimates suggest both smaller and larger values in the genus (Gregory, 2008); however, the lowest estimates must be interpreted with caution as they were obtained using older methods such as reassociation kinetics. Likewise, the value of 0.40 pg reported for D. nasutoides by Zacharias et al. (1982) was based on a Feulgen densitometry comparison of brain tissue versus a chicken blood standard, and may have been overestimated due to differences in DNA compaction level (and thus stain uptake) between the two tissue types. Bosco et al. (2007) reported values as high as 0.44 pg in some drosophilids, but these were acknowledged to be overestimates based on their choice of flow cytometric protocols. The present study therefore represents the most reliable estimate of the range of genome size diversity in Drosophila species and their relatives.
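
As a quick check on the scale of these values, the reported range can be expressed as a fold difference and converted from picograms to megabase pairs. The snippet below is a small sketch only: the conversion factor of 978 Mbp per pg is the standard one and is not stated in the text, and the function name is hypothetical.

```python
MBP_PER_PG = 978.0  # standard conversion, 1 pg of DNA is about 978 Mbp (not from the text)


def pg_to_mbp(pg):
    """Convert a haploid genome size in picograms to megabase pairs."""
    return pg * MBP_PER_PG


smallest_pg, largest_pg = 0.14, 0.40  # range reported in the text
fold_range = largest_pg / smallest_pg

print(f"fold range: {fold_range:.2f}")
for label, pg in [("smallest", smallest_pg),
                  ("D. melanogaster Iso-1", 0.18),
                  ("largest", largest_pg)]:
    print(f"{label}: {pg:.2f} pg is roughly {pg_to_mbp(pg):.0f} Mbp")
```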

Table 1 New flow cytometry-based haploid genome size estimates for 67 species from six genera in the family Drosophilidae

A nested analysis of variance (ANOVA) using all available data indicated that 5% of the variation among the Drosophila species studied thus far occurs at the level of subgenera within the genus, 59% among groups within subgenera, 14% among subgroups within groups and 22% among species within subgroups. However, it bears noting that this analysis is sensitive to the taxa that are included, especially in terms of the representation of higher-level categories. At the subgenus level, there is a significant difference in mean genome size between the subgenera Drosophila (mean=0.24±0.02 s.e.) and Sophophora (mean=0.20±0.004 s.e.) (t-test, P<0.04), which may reflect an underlying difference in modal chromosome numbers between the two subgenera (Drosophila: 2n=12; Sophophora: 2n=8) (Powell, 1997). Nevertheless, across the genus as a whole there were no significant correlations between genome size and either chromosome number or karyotypic index (a rough indicator of total chromosome size) with or without log transformation (all P>0.56, n=63). Phylogenetically corrected correlation analysis revealed a significant negative relationship between genome size and chromosome number, although this was marginally nonsignificant with one topology (Tree 1: r=−0.30, P<0.02; Tree 2: r=−0.23, P<0.09). No relationships were found between genome size and karyotypic index using either tree topology (Tree 1: P>0.43; Tree 2: P>0.18).

Finally, species were grouped according to biotic region of habitation following the designations provided by the BioSystematic Database of World Diptera (Afrotropical, Australasian/Oceanian, Nearctic, Neotropical, Oriental and Palaearctic) (Figure 2). Species from the Afrotropical region had the smallest mean genome size in absolute terms, but overall there were no clear differences among groups (ANOVA, P>0.05). Species inhabiting multiple regions had significantly smaller mean genomes than the pooled average of the other regions (t-test, P<0.05).

Figure 2

Mean genome sizes for species of Drosophila grouped according to biotic region, following the designations provided by the BioSystematic Database of World Diptera. AF, Afrotropical; AU, Australasian/Oceanian; NE, Nearctic; NT, Neotropical; OR, Oriental; PA, Palaearctic; MU, multiple regions. There were no statistical differences between species inhabiting the different regions, nor between species inhabiting multiple regions and those restricted to a single region. The number of species included from each biotic region is given in parentheses. Error bars represent standard error.

Phenotypic correlates

A significant positive correlation was found between haploid genome size and development time from egg to adult at 18 °C (r=0.48, P=0.0004, n=50), indicating that at a constant temperature, species with smaller genome sizes develop more quickly than those with larger genomes (Figure 3a). Hierarchical taxonomic correlation analyses (Gregory, 2002a) showed that this relationship becomes increasingly strong at the subgroup (r=0.56, P<0.008, n=21) and group (r=0.82, P<0.03, n=7) levels. The relationship between genome size and development time remained significant when controlled for phylogenetic nonindependence of species data using either of the two tree topologies (Tree 1: r=0.34, P<0.02; Tree 2: r=0.35, P<0.02; Figure 3b).
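
The link between a reported correlation coefficient, its sample size and its P-value can be reproduced approximately with the standard t-test for Pearson's r. The sketch below uses only the Python standard library and integrates the Student's t density numerically. The function names are hypothetical, and because the published r values are rounded, the output should be read as an approximation of the reported P-values rather than an exact recalculation.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * central_area)


# (r, n) pairs as reported in the text for development time.
reported = {
    "species": (0.48, 50),
    "subgroup": (0.56, 21),
    "group": (0.82, 7),
}

for level, (r, n) in reported.items():
    t = t_statistic(r, n)
    p = two_sided_p(t, n - 2)
    print(f"{level}: r={r}, n={n}, t={t:.2f}, P~{p:.4f}")
```

The comparison across levels shows why a larger r does not guarantee a smaller P-value: the group-level correlation is the strongest, but with only seven data points it carries the weakest statistical support of the three.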

Figure 3

The relationship between genome size and duration of development from egg to adult at 18 °C in 50 species of the genus Drosophila. (a) The relationship is significant at the species level using Pearson's correlation (r=0.48, P=0.0004). It also persists when the analysis is conducted at the subgroup (r=0.56, P<0.008, n=21) and group (r=0.82, P<0.03, n=7) levels. (b) The relationship is also significant with phylogenetically independent contrasts (data shown are from Tree 1: r=0.34, P<0.02; results were very similar using Tree 2: r=0.35, P<0.02; see Supplementary Material for details about the alternate trees).

Although data were available for a relatively small number of species, preliminary relationships were found between genome size and additional biological characteristics (Figure 4). A significant positive correlation was noted between genome size and post-eclosion time to sexual maturity in male flies (r=0.63, P<0.02, n=15), but not in female flies (P=0.79), the latter of which appears highly constrained in this character. A significant positive correlation was also observed between genome size and thorax length in both male flies (r=0.53, P<0.005, n=26) and female flies (r=0.47, P<0.02, n=26), and a significant positive association was found between genome size and sperm length (r=0.62, P<0.02, n=15). Unlike with developmental duration, these relationships did not remain significant following phylogenetic correction (all P>0.19), but this could reflect the sensitivity of such analyses to small sample sizes (Martins et al., 2002).

Figure 4

Relationships between genome size and various phenotypic parameters in the genus Drosophila. (a) Post-eclosion time to sexual maturity versus genome size in male flies (•, solid line; r=0.63, P<0.02) and female flies (○, dashed line; P=0.79). Eclosion time data were log(x+1)-transformed as male individuals of some species have a value of 0 for this parameter. (b) Thorax length versus genome size in male flies (•, solid line; r=0.53, P<0.005) and female flies (○, dashed line; r=0.47, P<0.02). (c) Sperm length versus genome size (r=0.62, P<0.02). Note that, unlike the correlation with development time (Figure 3), these relationships generally did not persist following phylogenetic correction.

Insights from 12 Drosophila genome sequences

The results of correlation analyses using data from the 12 sequenced Drosophila genomes are presented in Table 2. The number of predicted protein-coding genes per genome did not correlate with genome size, nor did the number of predicted pseudogenes. Although they did not vary greatly among species, the percentages of the genome composed of coding genes and introns were both inversely correlated with genome size, indicating that noncoding DNA amount is the driving force behind genome size diversity in this genus. In keeping with this, satellite DNA and (euchromatic) transposable element content were the strongest positive correlates of genome size.

Table 2 Results of correlation analyses against haploid genome size (pg) using data from the 12 sequenced Drosophila genomes

Discussion

The drosophilid genome size data set

The present study more than doubles the available genome size data set to 74 species of Drosophila, including representatives of several more species groups, and broadens the coverage of the family Drosophilidae by including members of five additional genera (Chymomyza, Hirtodrosophila, Samoaia, Scaptodrosophila and Zaprionus). Drosophilids display genome sizes much smaller than the average of all currently available insect data (0.22 vs 1.56 pg). Their genomes are also significantly smaller than the current average for Diptera (0.61 pg), and appear to be less than 25% as large as those of the next best-studied group of true flies, the mosquitoes (family Culicidae) (Gregory, 2008).

Since the late 1940s, genome size has been considered a (mostly) constant characteristic within species, and limited intraspecific variation is a standard assumption in measurements and comparative analyses of genome size. Modern methods of genome size estimation are sufficiently rapid and precise to allow investigations of intraspecific variation, but only if best-practice methods are implemented, as there are numerous sources of error that could compromise such analyses (Greilhuber, 1998).

Following the conventions of the botanical literature (in which the issue has been far more thoroughly investigated than in animals), only genome size differences within species that are based on demonstrable chromosomal or sequence-level effects are sufficiently convincing to qualify as ‘orthodox’ intraspecific variation (Greilhuber, 1998; Bennett and Leitch, 2005). Some species of Drosophila studied in this and other reports exhibit orthodox intraspecific variation on the basis of the chromosomal sex determination system(s) in the genus. Repeatable variation among highly inbred laboratory strains is also apparent in some species (Table 1; Bosco et al., 2007), but it is unclear to what extent this reflects the situation in nature. Some studies have been conducted to investigate this in Diptera (Rao and Rai, 1987; Black and Rai, 1988; Kumar and Rai, 1990; Vieira et al., 2002), but to date best-practice methods in combination with extensive sampling have yet to be brought to bear on the question.

Phenotypic consequences of genome size variation

On the basis of the data available at the time, Powell (1997, p. 303) argued that:

One of the major adaptive hypotheses about genome size is that it is related to rate of cell division and thus development time; furthermore, larger genomes are associated with larger cell volume, which often increases the size of the organism. This explanation does not seem to hold for Drosophila. Species with relatively large genomes like D. virilis and D. funebris are quite large and slowly developing, but D. willistoni, with [a] genome size nearly identical to D. funebris, is a rapidly developing small species. D. hydei and its close relatives including D. neohydei, are nearly as large as D. virilis. Finally, D. nasutoides is not particularly large or slowly developing, yet has a huge genome relative to other drosophilids.

In contrast to this assessment, the present study revealed a significant positive relationship between genome size and the temperature-controlled duration of development that persists when the data are analyzed at multiple taxonomic levels or corrected for phylogenetic nonindependence (Figure 3). Moreover, the pattern observed using the present data set indicates that this relationship is in fact consistent with the proposed counterexamples listed by Powell (1997) for which data are available. Thus, D. virilis development is indeed comparatively slow (20 days at 18 °C), and while D. funebris data were not available at 18 °C, its development is likewise relatively protracted when data from higher temperatures are considered (18 days at 21 °C). D. willistoni is not rapidly developing compared to other species in the present data set; rather, it exhibits both an above-average genome size (0.24 vs 0.21 pg; Powell, 1997) and slightly slower than average development (15.5 vs 15.1 days). D. neohydei was not included in the present analysis as developmental data were not provided in the source data set, but D. hydei was not an outlier in the correlation. D. nasutoides, with a reported (but somewhat questionable) genome size estimate of 0.40 pg, would be expected to exhibit slow development, and although data were not available for development at 18 °C, this species has been reported to take 19 days at 20 °C; this is slower than all other species in the data set assessed at 18 °C, with the notable exception of D. virilis. In addition, it appears that post-eclosion time to sexual maturity may be related to genome size in male (but not female) flies.

Although sample sizes need to be expanded considerably before any reliable conclusions can be drawn, the present results are suggestive of a positive association between genome size and body size (Figure 4). This is consistent with earlier reports of positive correlations between wing cell area and measures of body size, including thorax length, wing length, whole body length, fore-tibia length and body mass in a sample of six Hawaiian and two non-Hawaiian species of Drosophila (Kacmarczyk and Craddock, 2000). Cell size and body size were also reported to be correlated positively with genome size in the same species (Craddock et al., 2000), although details of this analysis were not provided.

At first sight it may seem remarkable that phenotypic relationships can be determined at all within the narrow absolute range of genome sizes exhibited by these insects, especially considering the additional error that is introduced by compiling phenotypic and genomic data from several different studies. However, in relative terms the reported range in Drosophila is at least 2.5-fold, and it should be borne in mind that correlations with cell size and metabolic rate can be found within birds, whose genome sizes vary by a smaller relative margin (∼2.2-fold; Gregory, 2002b). It would seem that in organisms like Drosophila, which reside near the extreme of diminutive body size and complex but rapid development, even small absolute variations in DNA content may be biologically relevant.

Sources and significance of genome size diversity

Large-scale features such as chromosome number do not appear to be related to genome size diversity in the genus Drosophila, but there is mounting evidence that several subgenomic components do scale with total genome size. For example, it has been noted that D. virilis has both longer introns (Moriyama et al., 1998) and lengthier stretches of microsatellites (Schlötterer and Harr, 2000) than D. melanogaster, in keeping with its roughly twofold larger genome. Rough comparisons between D. melanogaster and its close relative D. simulans also suggest that larger-genomed species may contain more transposable elements (Biémont and Cizeron, 1999; Vieira et al., 2002; Vieira and Biémont, 2004). Using data from the 12 Drosophila genomes, it appears that microsatellite and transposable element content, at least, probably do scale with genome size across the genus (Table 2), as do the total and pseudogenized copy numbers of chemoreceptor genes (Gardiner et al., 2008). However, copy number of the DINE-I element in particular does not scale with genome size (Table 2).

With regard to transposable elements, it has been argued that the stress inherent in the invasion of new environments triggers their spread, which generates significant genome size variation within, and later presumably among, species of Drosophila (Biémont et al., 2001; Vieira et al., 1999, 2002; Nardon et al., 2005). Although this is an interesting possibility, caution is warranted when extrapolating to larger taxonomic scales. Notably, species inhabiting multiple biotic regions exhibit smaller, not larger, average genome sizes than those of species restricted to a single region. Moreover, the climatic differences among biotic regions themselves, which surely differ in their degrees of environmental ‘stress’, do not appear to contribute substantially to the pattern of genome size diversity in the genus (Figure 2).

Recent analyses of introns in D. melanogaster have suggested that these sequences are under selective constraints even though they do not encode any protein products (Andolfatto, 2005; Marais et al., 2005). A subsequent comparison of the D. melanogaster and D. simulans genomes indicated that >50% of point mutations in both intronic and intergenic elements are removed by selection (Halligan and Keightley, 2006). Such high levels of selective constraint could be suggestive of function, for example relating to gene regulation, and it has been argued that ‘there is now increasing evidence that the Drosophila genome may be highly compact and contain very little nonfunctional DNA’ (Halligan and Keightley, 2006, p. 881). If so, then the maintenance of noncoding DNA in Drosophila genomes could be explained in terms of adaptive benefit at the organismal level, rather than through (or perhaps in addition to) mutation pressure exerted by selfish elements. However, this does not provide an answer as to why there is a general paucity of noncoding DNA in this genus, even as compared to other Diptera.

As Powell noted in 1997 (p. 304), ‘an adaptive basis of variation in genome size in Drosophila remains to be documented’, and it is therefore perhaps not surprising that many recent discussions have focused on neutralist models of genome size change (Petrov, 2002; Boulesteix et al., 2006; Halligan and Keightley, 2006). In contrast, the present study provides evidence that genome size is indeed of biological significance in Drosophila through its association with developmental rate (Figure 3) and perhaps also with body size and/or sperm size (Figure 4). Thus, there may be pressures to reduce genome size due to organismal effects, which are limited by genome-level constraints on the deletion of a moderate quantity of functional non-genic DNA.

Concluding remarks

The broad view of genomic diversity that is being adopted in Drosophila research by expanding beyond the core set of model species is a very welcome development. As a result, there is little doubt that flies in the family Drosophilidae will feature prominently in future studies of both genome sequences and genome sizes, and that the integration of these two levels of analysis will provide many mutually enlightening insights. In combination, such studies will make it possible to decipher the complex relationships that link the genome and the phenotype, which clearly extend beyond the influence of individual protein-coding genes. As a start, the role of (some) noncoding sequences in structural and regulatory capacities is becoming more widely appreciated, thanks in part to work with Drosophila, and the results of the present study suggest that the bulk amount of noncoding DNA is also important through effects on traits of fundamental biological importance such as development and morphology. Many challenging questions remain with regard to the mechanisms of gain and loss, potential functions and phenotypic consequences of noncoding DNA, but as they enter their second century of undisputed preeminence in genetics research, these humble flies are poised to contribute mightily in the quest to answer each of them.