Introduction

Globalization facilitates the introduction of invasive species that can damage native ecosystems, cause severe economic losses and threaten human health1,2. Early detection and rapid response are cornerstones of successful management strategies. However, identification can be problematic, first and foremost due to a lack of taxonomic expertise3,4. DNA-based approaches such as ‘DNA barcoding’ potentially provide relatively rapid and inexpensive species identifications4. Although the reliability and usefulness of DNA barcoding are the subject of extensive debate5,6,7,8, this method has been shown to effectively distinguish species in many animal groups9,10,11,12,13. By comparing a short, standardized fragment of the mitochondrial gene cytochrome c oxidase I (COI) to a reference DNA barcode library, animal specimens can usually be assigned to species, as long as the database contains relevant reference sequences14,15. This approach has been applied to native populations of diverse animals16,17,18,19,20,21 and a variety of invasive taxa3,22,23,24,25. Successful invaders are generally presumed to represent expansions of a small founder population and are therefore likely genetically uniform24,26,27,28,29. For example, the invasive gall wasp Quadrastichus erythrinae shows a complete lack of mitochondrial as well as nuclear diversity across the Pacific, including Japan, Hawaii, Guam and Samoa30, presumably reflecting a single outbreak starting with a small number of individuals. On the other hand, invasive populations may also represent multiple independent introductions, potentially from genetically distinct source populations31,32. Such a pattern of introduction may complicate the identification of invasive species via DNA barcodes and may require more extensive sampling to establish a reliable reference library.

Here we assess DNA barcode variation in invasive populations of the American cockroach, Periplaneta americana (Linnaeus), one of the most abundant, widely distributed and hated urban pests33,34. Though the native range of P. americana is unknown (possibly tropical Africa or South Asia34,35,36), all urban populations will be considered invasive in this study as it seems likely that they were established via human-aided dispersal. According to the World Health Organization, cockroaches are highly damaging pests worldwide in terms of potential health problems (allergies, asthma and transmission of pathogens by contaminating food) and costs for pest control37. Furthermore, cockroaches are uniquely unpopular as most people find them disgusting, associating their presence with uncleanliness and disease38.

Although abundant and living in close proximity to humans, surprisingly little is known about P. americana biology outside the laboratory. Aspects of infestation control have been examined39,40,41,42, but few studies have investigated the biology of P. americana in urban settings33,43. Most strikingly, data on genetic variation are all but absent. The goal of this study was twofold: (1) to assess the genetic diversity of P. americana in urban populations, thus creating a valuable reference DNA barcode library; (2) to test the potential of DNA barcoding for quick and accurate molecular identification of this urban pest species. To facilitate specimen collection, we set up a citizen science project based in New York City (NYC), a center of world commerce and home to an abundance of cockroaches as well as more than eight million people44,45,46. We received several hundred specimens from scores of locations in NYC and beyond, enabling the first large-scale study of genetic diversity and providing a reference DNA barcode library for this important urban pest. We found deeply divergent mitochondrial lineages that could naively be interpreted as indicative of multiple cryptic species. However, nuclear DNA sequence data revealed extensive gene flow between mitochondrial clades, consistent with a single biological species. Our findings highlight the value of expanded sampling to accurately delineate species boundaries via DNA barcoding, in particular for invasive species.

Results

The collection effort of 85 participants generated 284 specimens from 16 U.S. states and Argentina, Australia, Belize, Guyana, Spain and Venezuela (Supplementary Tab. S1). In addition, 24 P. americana COI GenBank records were included in genetic analyses representing samples from Iran, China and Korea (Supplementary Tab. S2). Specimen conditions ranged from well preserved to substantially damaged (Fig. 1). COI barcodes were successfully recovered from 238 cockroach specimens (including 223 P. americana specimens), while either PCR or sequencing failed repeatedly for 46 specimens (Supplementary Tab. S1). Seven sequences were trimmed at one end (range: 3–30 base pairs (bp)) due to low quality signals. The remainder were high-quality, full-length reads of 658 bp (N = 231) that contained 43 variable and parsimony-informative sites, as well as three singletons. Among the failed 46 specimens were eight specimens morphologically identified as the German cockroach (Blattella germanica). These specimens yielded a short PCR product (approximately 400 bp). Sequencing of one of these short PCR products revealed a putative nuclear pseudogene with a 285 bp internal deletion and multiple amino acid substitutions compared to available B. germanica GenBank records, including a complete mitochondrial genome (GenBank EU854321). This aberrant sequence was identical to B. germanica COI GenBank records KC473901 and KC473904, which thus presumably represent the same pseudogene. Except for that short sequence, no stop codons, unusual amino acid substitutions, or internal sequence deletions were found in any other sequence, indicating that all other sequences were likely functional mitochondrial sequences and not nuclear pseudogenes. In addition to COI barcodes, we analyzed a portion of the nuclear gene wingless (wg) for a subset of specimens. All wg amplifications yielded a full-length, high-quality sequence (N = 80; 77 P. americana and three P. fuliginosa specimens). The wg alignment for P. americana contained eight variable and parsimony-informative sites, as well as one singleton.
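The stop-codon part of the pseudogene screen mentioned above can be illustrated with a minimal Python sketch. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5, in which only TAA and TAG are stops), checks all three forward reading frames because the frame of a given read is not stated here, and uses invented toy sequences and hypothetical function names. It does not reproduce the software actually used for this check, and it does not cover the amino acid substitution or deletion criteria.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI translation
# table 5): only TAA and TAG terminate; TGA encodes Trp, AGA/AGG encode Ser.
STOP_CODONS = {"TAA", "TAG"}


def stops_per_frame(seq):
    """Count stop codons in each of the three forward reading frames."""
    # Assumption: alignment gaps are simply removed before scanning.
    seq = seq.upper().replace("-", "")
    counts = {}
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts[frame] = sum(codon in STOP_CODONS for codon in codons)
    return counts


def has_open_frame(seq):
    """True if at least one forward frame is free of stop codons."""
    return min(stops_per_frame(seq).values()) == 0


# Toy sequences (invented for illustration, not real COI data).
clean_read = "ATGGCATGATTAGGA"    # TGA is Trp here, so frames 0 and 2 stay open
suspect_read = "TAAATAGCTAA"      # every frame contains a TAA or TAG

print(stops_per_frame(clean_read), has_open_frame(clean_read))
print(stops_per_frame(suspect_read), has_open_frame(suspect_read))
```

A read with no stop-free frame would be flagged for closer inspection; a read that passes this test can still be a pseudogene, so the check is necessary rather than sufficient.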

Figure 1

Representative P. americana specimens, ranging from well preserved (left) to substantially damaged (right).

Scale bars are 5 mm.

The American cockroach showed an unusual pattern of genetic diversity with three major mitochondrial haplogroups (A, B, C) (Figs. 2, 3). Including the 24 GenBank sequences, the three groups comprised 15 haplotypes, six of which were represented in NYC (Supplementary Tables S1, S2). Mean p-distances between groups were larger (ranging from 2.38% to 4.65%) than within groups (mean p-distance ± SD, A: 0.12% ± 0.12%, B: 0.57% ± 0.52%, C: 0.03% ± 0.14%; N = 247 specimens) (Fig. 3). Among NYC specimens, p-distances were very similar to those in the complete dataset (mean p-distances between groups: 2.34%–4.63%; variation within groups, A: 0.10% ± 0.09%, B: 0.44% ± 0.52%, C: 0.00% ± 0%; N = 161 specimens). Thus, apparent ‘barcoding gaps’ existed between maximum intra-haplogroup and minimum inter-haplogroup p-distances (Fig. 3).
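The barcode-gap comparison can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences, skips any site where either sequence has a gap or missing base (treating '-', 'N' and '?' as missing is an assumption made here), and uses invented toy sequences and hypothetical function names. It is not the MEGA implementation used in this study.

```python
from itertools import combinations

# Characters treated as missing data (an assumption of this sketch).
MISSING = set("-N?")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping sites missing in either sequence."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def barcode_gap(sequences, groups):
    """Return (maximum within-group distance, minimum between-group distance)."""
    within, between = [], []
    for i, j in combinations(range(len(sequences)), 2):
        d = p_distance(sequences[i], sequences[j])
        if groups[i] == groups[j]:
            within.append(d)
        else:
            between.append(d)
    return max(within), min(between)


# Toy alignment (invented for illustration): two sequences per haplogroup.
toy_seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACTTAC", "ACGAACTT-C"]
toy_groups = ["A", "A", "B", "B"]

max_within, min_between = barcode_gap(toy_seqs, toy_groups)
print(f"max within = {max_within:.3f}, min between = {min_between:.3f}")
print("apparent barcode gap:", min_between > max_within)
```

An apparent gap exists whenever the smallest between-group distance exceeds the largest within-group distance, which is the comparison shown in Fig. 3.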

Figure 2

Klee diagram of COI barcodes.

A Klee diagram is a colorized matrix of indicator vector correlations; identical sequences have a correlation of 1 (ref. 87). The dataset comprises 247 P. americana sequences (including 24 from GenBank). The matrix is arranged according to a NJ tree. Labeled blocks of high correlation along the diagonal correspond to P. americana haplogroups A, B and C. The 15 individual haplotypes appear as sub-boxes. The color code for correlation coefficients is given on the right.

Figure 3

Histogram showing intra- and inter-haplogroup p-distances between P. americana mitochondrial COI sequences (N = 247).

An apparent barcode gap (red arrow) separates maximum distances within and minimum distances between haplogroups. Mean ± SD p-distances between groups are shown.

Mitochondrial (COI) and nuclear (wg) phylogenetic trees were highly discordant, i.e., nuclear gene analysis did not recover the three major clades (Fig. 4). Specimens from different COI haplogroups shared the same wg alleles and specimens from the same haplogroup carried different wg alleles. Overall, we found no genetic differentiation at the wg locus among COI haplogroups, indicating interbreeding (within NYC population: overall FST = 0.041, P = 0.100; individuals: N = 20 (haplogroup A), 26 (haplogroup B), 14 (haplogroup C); alleles: N = 7 (haplogroup A), 6 (haplogroup B), 6 (haplogroup C); pairwise comparisons: haplogroup A vs. B: FST = 0.069, P = 0.053; haplogroup A vs. C: FST = 0.045, P = 0.13; haplogroup B vs. C: FST = −0.007, P = 0.462). The three haplogroups had broadly overlapping geographic distributions, including within NYC, across the U.S. and elsewhere in the world (Fig. 5).

Figure 4

Lack of congruence between mitochondrial and nuclear data.

Comparison of NJ trees based on Tamura-Nei distances for mitochondrial (COI, N = 256) and nuclear (wg, N = 80) DNA sequences of P. americana and P. fuliginosa. Colors depict different COI haplogroups. Numbers on the wg tree give the sample sizes (number of individuals) and boxes in one row represent individuals with the same allele combination. Periplaneta fuliginosa was used as outgroup in both analyses. Bootstrap support for major nodes is shown. Abbreviations: N = number of individuals, * = homozygous individuals.

Figure 5

Distribution of P. americana COI haplogroups in NYC, continental U.S. and the world.

Sample sizes per site are given in parentheses, haplogroup colors and designations correspond to those in Figure 4. NYC zip codes and U.S. states are outlined on the maps in the upper left and upper right, respectively. Sequences retrieved from GenBank are marked with asterisks. Maps were created with Adobe Photoshop. Map templates are from d-maps.com (US, http://d-maps.com/carte.php?num_car=5222&lang=en; world, http://d-maps.com/carte.php?num_car=3267&lang=en) and U.S. Census Bureau (NYC zip codes, https://www.census.gov/geo/maps-data/data/cbf/cbf_zcta.html).

All P. americana COI haplotypes formed a monophyletic lineage separate from other Periplaneta species obtained in this study or represented in GenBank (Supplementary Fig. S1). The closest p-distance was found with Periplaneta australasiae (minimum sequence divergence 7.4%). In addition to P. americana, four other cockroach species were identified by COI barcode in this study, comprising about 6% of our specimens, including Smokybrown cockroach (P. fuliginosa) (N = 9), Brown-banded cockroach (Supella longipalpa) (N = 2), Turkestan cockroach (Shelfordella lateralis) (N = 1) and Madagascar hissing cockroach (Gromphadorhina portentosa) (N = 1) (Supplementary Tab. S1). Finally, two specimens, one from the southern U.S. (Georgia) and one from Venezuela, could not be identified by barcode (Supplementary Tab. S1).

Discussion

This is the first large-scale study of mitochondrial diversity in the American cockroach (Periplaneta americana). Surprisingly, we found three deeply divergent, widely distributed P. americana COI haplogroups. In a limited geographic area such as NYC, deeply divergent mtDNA lineages within a single species are unusual. Several scenarios, which are not mutually exclusive, can explain such a genetic pattern: historical introgression between species47,48,49,50,51,52,53, manipulation by endosymbiotic bacteria54,55, secondary contact of formerly isolated populations56,57, or reproductive barriers among sympatric cryptic species17,19. The last scenario does not apply in our case. The presence of a “barcode gap” between the maximum distances within and minimum distances between haplogroups has often been proposed to indicate cryptic species in native populations7,16,18,19,58,59 (however, for critical views see refs. 8, 55, 60, 61). Although we observed clear barcode gaps in P. americana, the nuclear data are indicative of a single biological species. Cryptic speciation should be reflected in a detectable differentiation at nuclear markers, in particular in such deeply divergent mtDNA clades (up to 4.6% sequence divergence). This was not the case and our analysis of nuclear sequences revealed extensive gene flow among COI haplogroups.

Discordant phylogenetic signals between maternally and biparentally inherited markers are sometimes due to infection with endosymbiotic bacteria54,55. Wolbachia bacteria, for example, transmit maternally and manipulate host reproduction in favor of infected females62,63. As mitochondria are maternally transmitted as well, selection favors those mtDNA types that are associated with Wolbachia infections, which can create unexpected mitochondrial population structures55. Vaishampayan and colleagues64 detected Wolbachia infections in cockroaches of the genera Blattella and Supella, but not in P. americana, suggesting that Wolbachia infections in P. americana might be rare or absent. In general, Wolbachia infections will substantially reduce mtDNA diversity in a given population and skew the frequency distribution of alleles towards a single or very few variants (the latter in cases of multiple infections)55,65,66. The pattern of mtDNA diversity detected in this study does, however, not reflect the typical pattern of reduced haplotype diversity found in Wolbachia-infected populations. We detected six different, divergent haplotypes in a single population of P. americana in NYC and this general pattern seems to hold for other populations around the globe. Thus, it seems unlikely that Wolbachia infections have played a major role in creating the unusual pattern of mtDNA variation detected in this study. However, additional work screening specifically for Wolbachia infections will be required to conclusively rule out this possibility. Likewise, there is currently no evidence for historical introgression from other Periplaneta species. All available P. americana COI barcodes formed a monophyletic lineage clearly separated from congeneric species (Supplementary Fig. S1). Future studies may, however, uncover overlap in mitochondrial haplotypes with other species not yet represented in databases.

Currently, the most likely explanation for the detected genetic pattern is multiple human-mediated introductions from allopatric source populations followed by global dispersal among commercial centers. In fact, the different haplogroups must have diverged long before human-aided dispersal, even if the highest mutation rate estimates of insect mtDNA are applied (10–20% sequence divergence per million years; see Papadopoulou et al. 2010 for a review on mtDNA clocks in insects67). Periplaneta americana's ability to inhabit human-built structures33,34 has probably facilitated its introduction to new areas. In general, human-mediated transport creates many opportunities for introduction and interbreeding of previously isolated species or populations1,2. For example, Ruddy Ducks artificially introduced to the UK from North America hybridize freely with the indigenous White-headed Duck, effectively threatening extinction of the native form68. In the present case, it appears that P. americana individuals from three or more historically isolated geographic populations are now effectively merged into a single global gene pool.
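The timing argument above can be made concrete with a short Python sketch. It assumes a strictly linear molecular clock with no correction for saturation or time dependence (a simplification introduced here, not a method used in this study) and combines the reported range of mean between-haplogroup p-distances (2.38% to 4.65%) with the quoted rate range of 10–20% pairwise divergence per million years. The function name is hypothetical.

```python
def divergence_time_years(divergence_pct, rate_pct_per_myr):
    """Time since divergence under a linear clock: pairwise divergence / rate."""
    return divergence_pct / rate_pct_per_myr * 1_000_000


# Smallest mean between-group distance combined with the fastest quoted rate
# gives the youngest plausible split under these assumptions.
youngest = divergence_time_years(2.38, 20.0)

# Largest mean between-group distance combined with the slowest quoted rate
# gives the oldest split within the same rate range.
oldest = divergence_time_years(4.65, 10.0)

print(f"youngest split: about {round(youngest):,} years")
print(f"oldest split:   about {round(oldest):,} years")
```

Even the youngest figure under these assumptions, on the order of 100,000 years, far predates any human-aided dispersal, which is the point of the comparison.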

Invasive species often represent multiple introductions from genetically distinct source populations and interbreeding may both be common and critical for long-term invasion success1,26,31,32,69,70,71,72. For example, interbreeding between distinct genetic lineages in the multicolored ladybird beetle (Harmonia axyridis) led to shifts in key life history traits enabling invasion success73,74. Conversely, invasive populations are often derived from small founding populations and this genetic bottleneck can entail a substantial reduction in genetic diversity and lead to inbreeding depression1. The success of invasive species in their non-native range might be temporary in many cases, because prolonged inbreeding generally leads to a decrease in fitness75,76,77 and, possibly, population extinctions78,79. For example, inbreeding depression has been suggested as one mechanism underlying the collapse of New Zealand populations of one of the world's worst invasive pests, the Argentine ant (Linepithema humile)80. Empirical research on the influence of multiple introduction events on invasion success has just begun1, but examples suggest that preventing repeated introductions may reduce adaptive potential in some cases32 and thus may facilitate the long-term control of seemingly well-established invasive pests.

The spread of invasive species and thus of pest species like P. americana is expected to be facilitated and intensified by globalization. Our case study shows that species delimitation using a classical DNA barcoding approach can be particularly problematic when studying populations of invasive species, because existing populations might represent a mix of individuals from genetically distinct source populations. In such cases, it will be especially important to supplement mitochondrial DNA barcoding with nuclear genetic and/or morphological data. Nonetheless, with the comparatively large reference library provided in this study, DNA barcoding now has great potential to be employed as a quick and reliable tool to identify urban populations of the American cockroach.

Methods

Specimen collection and processing

To obtain specimens of the American cockroach, we set up a citizen science project based in New York City (NYC). The project ran from November 2012 to January 2014, starting with a website including participant instructions (http://phe.rockefeller.edu/barcode/cockroachproject.html). We sought publicity via word of mouth and social media, as well as traditional outlets including television, newspapers and radio. Contributors were asked to provide specimen collection date and location and their name and contact information and to send unpreserved dead specimens via regular mail.

Each specimen was assigned a code, transferred to an individual 50 mL Falcon vial and stored at −30°C until further processing. Collection information was recorded in a spreadsheet (Supplementary Tab. S1). Six well-preserved P. americana specimens from each haplogroup (A, B and C) were morphologically identified using the key of Helfer81. These samples are stored at the American Museum of Natural History (see Supplementary Tab. S1 for unique specimen identifiers). Other cockroach specimens were identified to species via DNA barcodes. COI barcode sequences were available in GenBank for most of the common urban roaches (see Supplementary Tab. S2 for reference sequences), allowing us to assign species names even to morphologically unrecognizable specimens.

Genetic analysis

Genetic analysis was performed on 284 specimens (Supplementary Tab. S1). DNA was extracted from cockroach leg fragments using the QIAGEN® DNeasy® kit and stored at −30°C. The mitochondrial COI barcode region (658 base pairs) and a portion of the nuclear gene wingless (wg; 378 bp) were amplified in standard polymerase chain reactions (PCRs) using the primers LCO1490/HCO2198 (ref. 82) and wg550F (ref. 83)/wgcockR, respectively. The primer wgcockR (sequence: 5′ AACATGCACGCACACCTCTGCACCACGGACACC 3′) was designed specifically for this study, because longer fragments using primers wg550F/wgAbrZ (ref. 83) did not amplify consistently. PCRs were set up in 25 μl reaction volumes containing 14.3 μl AccuGENE® water, 0.2 μl AmpliTaq Gold®, 2.5 μl 10× PCR buffer, 2.5 μl MgCl2 (25 mM), 2.5 μl dNTPs (2 mM each), 2 μl DNA template and 0.5 μl of each of the respective primers (10 μM each). An initial denaturation step of 5 min at 95°C was followed by 40 cycles (95°C for 40 s; 55°C (COI) or 64°C (wg) for 40 s; 72°C for 40 s) and a final extension of 15 min at 72°C using an Eppendorf Mastercycler® proS. Purification and sequencing of PCR products were performed by commercial facilities (Macrogen USA or Eton Bioscience). All PCR products were sequenced in both directions. Sequences are deposited under accession numbers KM576918–KM577157 (COI) and KM591602–KM591681 (wg) in GenBank (Supplementary Tab. S1).
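As a cross-check of the reaction setup, the following Python sketch sums the listed component volumes and derives final concentrations by simple dilution (stock concentration × volume / total volume). The volumes and stock concentrations are those given above; the dictionary layout and variable names are illustrative only, and components without a stated stock concentration (water, polymerase, template) are included in the volume sum but not in the concentration calculation.

```python
REACTION_UL = 25.0  # stated total reaction volume in microlitres

# component name: (volume in ul, stock concentration or None, unit or None)
components = {
    "water": (14.3, None, None),
    "AmpliTaq Gold": (0.2, None, None),
    "10x PCR buffer": (2.5, 10.0, "x"),
    "MgCl2": (2.5, 25.0, "mM"),
    "dNTPs (each)": (2.5, 2.0, "mM"),
    "DNA template": (2.0, None, None),
    "forward primer": (0.5, 10.0, "uM"),
    "reverse primer": (0.5, 10.0, "uM"),
}

# The component volumes should add up to the stated reaction volume.
total_ul = sum(volume for volume, _, _ in components.values())

# Final concentration after dilution into the full reaction volume.
final_conc = {
    name: (stock * volume / REACTION_UL, unit)
    for name, (volume, stock, unit) in components.items()
    if stock is not None
}

print(f"total volume: {total_ul:.1f} ul")
for name, (conc, unit) in final_conc.items():
    print(f"{name}: {conc:.2f} {unit}")
```

Under these inputs the volumes sum to 25 μl and the final concentrations are 2.5 mM MgCl2, 0.2 mM of each dNTP and 0.2 μM of each primer in 1× buffer.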

Data management and analysis

The laboratory information management system (LIMS) implemented in the software Geneious® (version 7.1.7) with the biocode plugin (version 2.8.0) was used to track all workflows including collection data, extractions, PCRs and cycle sequencing84. Sequences were aligned and trimmed in Geneious®. Neighbor-joining trees based on Tamura-Nei distances with bootstrap support (1,000 replicates) were created using Geneious® Tree Builder and a Randomized Axelerated Maximum Likelihood85 tree (RAxML; version 8.1.X) was created using the GTR + gamma model. MEGA 6 (ref. 86) was used to assess the number of variable sites, parsimony-informative sites and singletons and to calculate p-distances with gaps deleted in pairwise comparisons. Tree-Parser-aided Klee diagrams were created as described in Stoeckle and Coffran (2013)87. The level of genetic differentiation for the nuclear gene wg among different COI haplogroups was calculated as pairwise FST in the software FSTAT 2.9.3.2 (ref. 88) following Weir and Cockerham (1984)89. Statistical significance assuming Hardy-Weinberg equilibrium was assessed using randomization tests with 10,000 iterations.

Five wg alleles segregating in the population were directly observed in homozygous individuals. Three additional alleles were inferred from heterozygous individuals by subtracting one of the five alleles observed in the homozygous state. In the vast majority of heterozygous individuals, the chromatogram could only be explained by a single combination of two of these eight alleles. Only in two heterozygous individuals was there ambiguity over the respective allele combination and therefore we omitted those individuals from the FST analysis.
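The allele inference described above can be sketched in Python. The sketch assumes that a heterozygous chromatogram is recorded as a consensus sequence with two-base IUPAC ambiguity codes at double-peak sites; this representation, the toy four-base alleles and all function names are assumptions made for illustration and do not reproduce the procedure actually carried out in Geneious®.

```python
from itertools import combinations_with_replacement

# Single bases and two-base IUPAC ambiguity codes (double peaks).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def explains(genotype, allele_1, allele_2):
    """True if the two alleles together produce exactly the observed genotype."""
    return all({a, b} == IUPAC[g]
               for g, a, b in zip(genotype, allele_1, allele_2))


def subtract_allele(genotype, known):
    """Infer the second allele by removing a known allele from a genotype."""
    other = []
    for g, k in zip(genotype, known):
        bases = IUPAC[g]
        if k not in bases:
            return None  # the known allele is not part of this genotype
        remaining = bases - {k}
        other.append(remaining.pop() if remaining else k)
    return "".join(other)


def compatible_pairs(genotype, alleles):
    """List every pair of named alleles that can explain the genotype."""
    named = sorted(alleles.items())
    return [(n1, n2)
            for (n1, s1), (n2, s2) in combinations_with_replacement(named, 2)
            if explains(genotype, s1, s2)]


# Toy alleles (invented for illustration, not real wg data).
alleles = {"a1": "ACGT", "a2": "ATGT", "a3": "ACGC", "a4": "ATGC"}

print(subtract_allele("AYGY", "ACGT"))    # second allele implied by a1
print(compatible_pairs("AYGT", alleles))  # one pair only: unambiguous
print(compatible_pairs("AYGY", alleles))  # two pairs: ambiguous genotype
```

A genotype explained by exactly one pair can be scored directly, whereas a genotype with more than one compatible pair corresponds to the ambiguous case that led to individuals being omitted from the FST analysis.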