Variation among 532 genomes unveils the origin and evolutionary history of a global insect herbivore

The diamondback moth, Plutella xylostella, is a cosmopolitan pest that has evolved resistance to all classes of insecticide and costs the world economy an estimated US $4-5 billion annually. We analyse patterns of variation among 532 P. xylostella genomes, representing a worldwide sample of 114 populations. We find evidence suggesting that South America is the geographical area of origin of this species, challenging earlier hypotheses of an Old-World origin. Our analysis indicates that P. xylostella has experienced three major expansions across the world, mainly facilitated by European colonization and global trade. We identify genomic signatures of selection in genes related to metabolic and signaling pathways that could be evidence of environmental adaptation. This evolutionary history of P. xylostella provides insights into the transoceanic movements that have enabled it to become a worldwide pest.

Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Diamondback moths, Plutella xylostella (regardless of age and sex), were collected from cruciferous vegetable fields at each sampling location. Field-collected samples were morphologically inspected and genetically checked with COI sequences to confirm their identity. The samples were preserved in 95% alcohol at -80 °C prior to DNA extraction.
We used an average of five fully sequenced individuals per site to give a robust picture of the genomic variability among individuals at that site, and to compare and contrast them with individuals from other sites. The number of sites was set to give comprehensive coverage of all geographical regions in which this species is present, including all zones that earlier work had suggested as the origin.
Within each sampling location, larvae, pupae, and adults were collected from cruciferous vegetable fields by our team members and local entomologists.
The global sample of P. xylostella was collected during 2012-2014 from 114 locations that cover broad regions throughout the world, with 13 samples from Africa and Madagascar, 43 samples from Asia, 13 samples from Europe, 26 samples from North America including Hawaii, 12 samples from South America, and 7 samples from Oceania. Our collection covered an extensive range of the eco-climatic index and areas that support differing numbers of annual generations, from regions with year-round persistence of P. xylostella to others that are only seasonally suitable for growth and development of the species.

Individuals with poor DNA quality were excluded. Also excluded were individuals with resequencing data that yielded a low mapping rate (<60%) and low genome coverage (<60%). None of the individuals that yielded adequate quality data were excluded for any of the regions or sites.
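The exclusion thresholds above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `SampleQC` record, the `partition_by_qc` helper, and the sample identifiers are hypothetical, and the sketch assumes that falling below either 60% threshold is enough to exclude an individual (the text could also be read as requiring both).

```python
from dataclasses import dataclass

# Thresholds taken from the text; how they combine is an assumption (see below).
MIN_MAPPING_RATE = 0.60
MIN_GENOME_COVERAGE = 0.60


@dataclass(frozen=True)
class SampleQC:
    """Hypothetical per-individual quality record (fractions in [0, 1])."""
    sample_id: str
    mapping_rate: float
    genome_coverage: float


def passes_qc(sample: SampleQC) -> bool:
    # Assumption: an individual is kept only if BOTH metrics reach 60%,
    # i.e. failing either threshold leads to exclusion.
    return (sample.mapping_rate >= MIN_MAPPING_RATE
            and sample.genome_coverage >= MIN_GENOME_COVERAGE)


def partition_by_qc(samples):
    """Split samples into (kept, excluded) lists, preserving input order."""
    kept = [s for s in samples if passes_qc(s)]
    excluded = [s for s in samples if not passes_qc(s)]
    return kept, excluded


# Made-up values for illustration only; these are not data from the study.
samples = [
    SampleQC("s1", 0.95, 0.90),
    SampleQC("s2", 0.55, 0.90),  # mapping rate below 60%
    SampleQC("s3", 0.80, 0.40),  # genome coverage below 60%
    SampleQC("s4", 0.60, 0.60),  # exactly at both thresholds, so kept
]
kept, excluded = partition_by_qc(samples)
```

Keeping the thresholds as named constants makes the criterion easy to audit and to change if the stricter or looser reading of the exclusion rule is intended.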
An average of approximately five individuals from each sampling location was used for DNA extraction and sequencing, giving a total of 532 individuals (with adequate quality data) from 114 locations across 55 countries worldwide. Using nuclear and mitochondrial genomes as well as COI sequences of the 532 individual samples, we analyzed the phylogenetic relationships and uncovered the origin and expansion routes of P. xylostella. We believe that our findings are convincing and reproducible.
Our sampling locations (114) were randomly selected in different regions according to the geographical and climatic conditions that are suitable for growth and development of P. xylostella. Within each of the locations, larvae, pupae and adults were randomly collected from cruciferous vegetable fields.
To avoid unintentional biases, each sample was allocated a cryptic code number that prevented anyone involved in handling or analysis from identifying the origin of the insect, DNA or associated genomic data. Only at the late stage of tree construction were the samples re-identified.
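One way such blinding could be implemented is sketched below in Python. This is an illustrative assumption, not the authors' procedure: the `assign_blind_codes` function, the code length, and the sample labels are hypothetical. The idea is that analysts see only the random codes, while the code-to-sample key is held separately until re-identification.

```python
import secrets


def assign_blind_codes(sample_ids, n_hex=8):
    """Return a dict mapping a random hex code to each original sample ID.

    The codes carry no information about the sample's origin. The returned
    key would be stored away from the analysts until unblinding.
    """
    key = {}
    for sample_id in sample_ids:
        code = secrets.token_hex(n_hex // 2)  # n_hex hexadecimal characters
        while code in key:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(n_hex // 2)
        key[code] = sample_id
    return key


# Hypothetical sample labels, for illustration only.
original_ids = ["site001_ind1", "site001_ind2", "site002_ind1"]
blind_key = assign_blind_codes(original_ids)

# Analysts work with the codes only; the key restores identities later.
blinded_labels = sorted(blind_key)
```

Sorting the codes before handing them over removes any trace of the original collection order.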