A new comprehensive trait database of European and Maghreb butterflies, Papilionoidea

Trait-based analyses explaining the different responses of species and communities to environmental changes are increasing in frequency. European butterflies are an indicator group that responds rapidly to environmental changes, and extensive citizen science contributions document changes in their abundance and distribution. Species traits have been used to explain long- and short-term responses to climate, land-use and vegetation changes. Studies often use limited trait sets, with the risk that the relative roles of different traits are not fully explored. Butterfly trait information is dispersed amongst various sources, and descriptions sometimes differ between sources. We have therefore drawn together multiple information sets to provide a comprehensive trait database covering 542 taxa and 25 traits, described by 217 variables and sub-states, of the butterflies of Europe and the Maghreb (northwest Africa), which should serve for improved trait-based ecological, conservation-related, phylogeographic and evolutionary studies of this group of insects. We provide these data in two forms, as basic data and as processed continuous and multinomial data, to enhance their potential usage.


Background & Summary
The taxonomy, distribution, and biology of European butterflies have been studied since the 18th century. Due to the precise knowledge of changes in their distribution and abundance, driven by extensive citizen science contributions, and their trophic specialisation and immediate responses to environmental changes 1, they are frequently used as indicators of environmental change 2. Recently, a series of comprehensive resources has been published for European butterflies, comprising a detailed taxonomic list 3, a dataset of 15,609 sequences of the COI mitochondrial marker for all Western-Central European species 4, a dated phylogenetic tree for all European species 5, atlases describing their detailed distributions 6, and climatic risk assessments 7. In turn, species traits are fundamental descriptors of feeding ecology, life-history, morphology, resource use, behaviour and physiological constraints 8. It has long been recognised 9 that the availability of such data is limited and incomplete. Only recently has a geographically extensive series of traits describing climatic preferences based on temperature and precipitation been produced 10 along with a series of six traits describing some features of feeding ecology,

Methods
Taxon and geographic coverage. Our dataset (542 taxa) represents the complete butterfly fauna of mainland Europe including the western parts of Russia, the European islands, Macaronesia and the Maghreb (North Africa). This includes all of the 496 species occurring in Europe according to the latest checklist of European butterflies 3, but we have also included taxa (Table 1) that are confined to the Maghreb or that, according to some sources, have traits diverging strongly from those of the nominate species within our study area (Fig. 1). The nomenclature is consistent with that used in the checklist 3. Trait information is recorded for all the families of butterflies included in the geographic area (Papilionidae, Hesperiidae, Pieridae, Riodinidae, Lycaenidae and Nymphalidae). For species that also occur beyond the study area, trait information was taken from the main study area, if possible. For example, for those species that have a pan-Palearctic distribution, only information from the European range was included in our dataset.
Trait information was gathered from sources including field guides, books and atlases, scientific papers, and some selected online resources (https://butterfliesoffrance.com, https://iucnredlist.org, https://lepidoptera.sk, http://luontoportti.com/suomi/en, http://leps.it, http://eurobutterflies.com, http://www.lepiforum.de, https://minambiente.it/home_natura, https://micheltarrier.com/micheltarrier-com/rhopalocera, http://babochki-kavkaza.ru, http://pyrgus.de, http://butterflyeurope.co.uk, http://lepiforum.de) and direct observation in the field. Species-specific information sources are given in the database 24 and website (https://butterflytraits.github.io/European-Butterfly-Traits/index.html). In cases with multiple sources of trait information, data from peer-reviewed papers were preferentially used; in practice these made up a small proportion of the total trait data. Where trait information differed between sources and the differences could be identified as representing trait diversity, all sets of information were included in the trait database. Where sources clearly conflicted, we used the information that we deemed the most reliable. When published information was lacking, we inferred traits using photographs from two reliable sources (https://www.leps.it and http://pyrgus.de) if the traits could be unequivocally determined. This included hostplants and hostplant types, egg-laying location, larval location, adult feeding, adult basking type and basking sites. Trait information based on photographs was independently assessed by the authors in order to check the validity of the inferences. For some taxa certain trait information was not available in any source, and thus it is missing in the database.
The first version of the trait database 24 was finalised after three steps had been completed for all taxa. These were 1) mining all the standard references (e.g. guides and atlases) for trait information, 2) filling gaps in trait information through a thorough literature search via Google Scholar and PubMed, and 3) emailing experts on particular taxa to ask for additional trait information.
Trait types. Our database covers the traits of all stages of the butterfly life cycle. Many trait types included in the database, along with their subdivisions into individual states, were derived from an earlier treatment of the butterflies of the British Isles 19, but the trait types have been extended for this database. Individual traits were defined prior to the beginning of data collation to allow for unambiguous coding. Comprehensive trait definitions are in the file traitdefinitions.pdf on the Dryad repository 24 and in the curated online version (https://butterflytraits.github.io/European-Butterfly-Traits/index.html). Most of the trait types in our raw data trait database (state table) are coded as binary variables, but a minority are continuous (Online-only Table 1). Most traits included in this dataset are divided into multiple sub-traits. For example, the trait 'overwintering stage' comprises four binary variables.
www.nature.com/scientificdata
Traits are divided into four main types: 'life history', 'morphological', 'resource-based', and 'behavioural'. Life history traits describe a species' life cycle related to reproduction and also to growth and survival, including the number of generations per year (voltinism), egg laying strategy (egg laying type) and overwintering stage. We use wing size (both forewing length and precisely defined wingspan) as key morphological traits because they have been used in previous trait-based analyses, being correlated with mobility 167-169, development time 170, and reproductive output 171, as size correlates with many aspects of life history 172,173. Wingspan is included because it also incorporates an approximate measure of thoracic size, and thus of flight muscle mass, which may influence flight capacity and dynamics. 'Resource-based traits' describe species' relationships with environmental resources.
Resources include consumables that can be depleted over time when used and utilities that are not depleted. For example, 'adult feeding' describes the range of resources consumed by adults, which may be temporarily or permanently depleted. Likewise, 'adult roosting' describes structures (utilities) used for roosting behaviour. These structures are resources: although they are not directly consumed, there may be a finite number of suitable features of this type within a location, which may limit local populations and may become the subject of both interspecific and intraspecific competition 25,26.
Some traits in the database are primarily behavioural, such as 'mate locating type', but these traits are also closely linked with traits that relate more directly to resource usage (in this case with 'mate locating location'); thus, behavioural traits can also be linked to resources. Larval hostplants are examined in detail in several traits because of their importance for the life cycle and population structure of butterflies. Some authors of previous work using butterfly traits have included 'habitat breadth' as a trait 14,174. However, the physical structures and vegetation types occupied by species are not traits themselves but the result of species occurring in those locations where their essential resources co-occur in spatial patterns and densities that they can use, and these can change substantially across the geographic area our database covers. Essentially, species habitats are defined by their resources 25,26, and the resource requirements that species have are fundamental traits. Biotope or habitat associations are therefore not included in this dataset, as they can be derived from the traits described in our database 24. Additionally, biotope traits have been shown to have poor reproducibility among different trait sources 23 and have been found to be less useful than other types of traits for understanding the responses of butterflies to environmental change over time at a large scale 19,23. At a smaller scale, biotope associations may be useful characteristics for aiding butterfly conservation and habitat classification, but any attempt to synthesise information describing habitat preferences from multiple sources at a large geographic scale would likely be both error-prone and too coarse for most analyses. We also did not include measures of climatic requirements and geographic ranges in our dataset since they are already publicly available in the CLIMBER dataset 10.
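To illustrate how a trait coded as several binary sub-states can be turned into a multinomial description, the following minimal Python sketch converts one row of binary states into proportions. It is an illustration under stated assumptions, not the procedure used to build the traits table: the four state names (egg, larva, pupa, adult), the taxon labels and all values are hypothetical.

```python
# Hypothetical state names for 'overwintering stage'; the real column
# names in the state table may differ.
OVERWINTERING_STATES = ["egg", "larva", "pupa", "adult"]

# Hypothetical state-table rows: one binary value per sub-state,
# None where no information is recorded.
state_table = {
    "Taxon A": {"egg": 0, "larva": 1, "pupa": 0, "adult": 0},
    "Taxon B": {"egg": 0, "larva": 1, "pupa": 1, "adult": 0},
    "Taxon C": {"egg": None, "larva": None, "pupa": None, "adult": None},
}


def to_proportions(row, states):
    """Convert binary sub-states into proportions that sum to 1.

    Returns None when any sub-state is missing or no state is recorded,
    so that missing data are not silently treated as zeros.
    """
    values = [row.get(state) for state in states]
    if any(value is None for value in values):
        return None
    total = sum(values)
    if total == 0:
        return None
    return {state: value / total for state, value in zip(states, values)}


multinomial = {
    taxon: to_proportions(row, OVERWINTERING_STATES)
    for taxon, row in state_table.items()
}
```

In this sketch a taxon recorded in two states receives a weight of 0.5 for each, which is only one of several possible conventions for summarising multiple recorded states.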

Data Records
The database 24 is deposited on the Dryad Digital Repository. Both it and the live version (https://butterflytraits.github.io/European-Butterfly-Traits/index.html) include species-specific information sources and a PDF file describing each of the variables in the raw state table and traits table (ButterflyTraitDefinitions.pdf). The live version includes a mechanism for feedback and adding new information. For some taxa there are missing data, and some traits currently have more missing values than others. Life history and hostplant-related traits are extensively covered with few missing values, whereas behavioural traits have the most missing values because they usually require direct observation in the field. The types of traits with missing data (Table 2) indicate where targeted fieldwork is required. Likewise, species with poor overall data also warrant targeted future effort.
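Users who want to see where the gaps lie in their own copy of the data could tabulate missing values per trait and per taxon. The Python sketch below shows one way to do this; the trait names, taxon labels and values are hypothetical placeholders and do not reproduce the database or Table 2.

```python
# Hypothetical extract of a trait table; None marks a missing value.
trait_table = {
    "Taxon 1": {"voltinism": 1, "hostplant_family": "Fabaceae", "mate_locating_type": None},
    "Taxon 2": {"voltinism": 2, "hostplant_family": "Poaceae", "mate_locating_type": "patrolling"},
    "Taxon 3": {"voltinism": 1, "hostplant_family": None, "mate_locating_type": None},
    "Taxon 4": {"voltinism": 1, "hostplant_family": "Violaceae", "mate_locating_type": None},
}


def all_traits(table):
    """Collect every trait name that appears in any row."""
    return sorted({trait for row in table.values() for trait in row})


def missing_fraction_by_trait(table):
    """Return {trait: fraction of taxa with no recorded value}."""
    n_taxa = len(table)
    return {
        trait: sum(1 for row in table.values() if row.get(trait) is None) / n_taxa
        for trait in all_traits(table)
    }


def missing_fraction_by_taxon(table):
    """Return {taxon: fraction of traits with no recorded value}."""
    traits = all_traits(table)
    return {
        taxon: sum(1 for trait in traits if row.get(trait) is None) / len(traits)
        for taxon, row in table.items()
    }


by_trait = missing_fraction_by_trait(trait_table)
by_taxon = missing_fraction_by_taxon(trait_table)

# Taxon with the largest share of missing traits, i.e. a candidate for
# targeted fieldwork in this toy example.
least_known = max(by_taxon, key=by_taxon.get)
```

Sorting the two summaries highlights the traits and taxa that would benefit most from additional observation.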

Technical Validation
The records included in the database are based on previously published information from field guides, ecological atlases and peer-reviewed journal articles, supplemented with the authors' personal observations. We are therefore confident as to their accuracy. When sources highlighted that records for a particular trait were doubtful, this information was not included in the dataset. The author team comprises experts on butterfly ecology from seven countries across Europe, which allowed repeated quality control and provided knowledge across the biomes of Europe. The authors have examined the dataset to check for errors and to assess the accuracy of the trait information included. All data included in the dataset are fully referenced, which allows anyone to go back to the original records for any piece of trait information. The dataset currently contains some missing values, especially for highly localised species, and we intend to keep the database 'live' and to manage updates with new information. Certain traits such as voltinism and phenology (flight months) are known to vary across the latitudinal gradient, as these traits may in part be responses to accumulated growing degree days 175. We are confident that we have captured the variability of these traits for the majority of species by consulting trait sources that encompass both the full European range and smaller areas. We will accept data into our live version of the database from existing resources, unpublished information and new published information. Each species has its own reference list so existing data can be checked and new information correctly integrated into the database. Data submission methods are described in the live database.

Usage Notes
We have provided the first extensive database of butterfly traits in Europe and North Africa. Of particular value are the species and geographic coverage and the extensive sets of traits that we have included. This provides a resource for improving our understanding of fundamental processes such as how traits define species co-occurrences and their responses to environmental change, their spatial dynamics, and their associations with vegetation structures. Since traits vary within different taxonomic groups, understanding their evolution and variability among different branches of the tree of life can also provide insights into phylogenetic constraints on species resource requirements and ultimately on their local abundance, large-scale occurrence and vulnerability to environmental change. As our trait database includes a large component of resource requirements for all life-history stages, it can also be used to aid conservation efforts by focusing on resources that may be limited for vulnerable species at small to large spatial scales. Additionally, the inclusion of behavioural traits within the database can contribute to increasing our understanding of the roles of behavioural characteristics in determining species occurrences and resource use.
We have minimised processing of the data within the state table of the database. Individual variables in this state table may have poor linear relationships or spurious negative correlations due to their statistical distributions and outlier effects, which can constrain both phylogenetic and ecological analyses. Although fuzzy methods of multivariate analysis may accommodate these issues 176, the processed multinomial and continuous variables with measures of variability provided in the traits table facilitate more conventional approaches to multivariate analysis. For some species there are missing data, and some traits currently have more missing values than others. Whilst updating the database will supply some missing data, there are imputation methods 177,178 that can be used to predict these values, and we are confident that, in the absence of verified data, imputed data can be used to retain both species and traits with missing values within analyses.
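As a rough illustration of how imputation can retain taxa with missing values, the Python sketch below fills each gap with the mean value of the most similar taxa. This simple nearest-neighbour averaging is a stand-in chosen for brevity and is not claimed to be one of the cited methods 177,178; the trait names, taxon labels and values are hypothetical, and traits are assumed to be numeric and scaled to the range 0 to 1.

```python
def trait_distance(row_a, row_b, traits):
    """Mean absolute difference over traits observed in both taxa.

    Returns None when the two taxa share no observed trait.
    """
    shared = [t for t in traits
              if row_a.get(t) is not None and row_b.get(t) is not None]
    if not shared:
        return None
    return sum(abs(row_a[t] - row_b[t]) for t in shared) / len(shared)


def impute_nearest(table, k=2):
    """Return a copy of the table with gaps filled from the k nearest taxa."""
    traits = sorted({t for row in table.values() for t in row})
    imputed = {taxon: dict(row) for taxon, row in table.items()}
    for taxon, row in table.items():
        for trait in traits:
            if row.get(trait) is not None:
                continue
            # Candidate donors: other taxa with this trait observed.
            donors = []
            for other, other_row in table.items():
                if other == taxon or other_row.get(trait) is None:
                    continue
                dist = trait_distance(row, other_row, traits)
                if dist is not None:
                    donors.append((dist, other))
            nearest = sorted(donors)[:k]
            if nearest:
                imputed[taxon][trait] = (
                    sum(table[other][trait] for _, other in nearest) / len(nearest)
                )
    return imputed


# Hypothetical binary traits; None marks the value to be imputed.
example = {
    "Taxon 1": {"univoltine": 1, "overwinters_as_larva": 0, "eggs_laid_singly": 1},
    "Taxon 2": {"univoltine": 1, "overwinters_as_larva": 0, "eggs_laid_singly": 1},
    "Taxon 3": {"univoltine": 0, "overwinters_as_larva": 1, "eggs_laid_singly": 0},
    "Taxon 4": {"univoltine": 1, "overwinters_as_larva": 0, "eggs_laid_singly": None},
}

filled = impute_nearest(example, k=2)
```

Imputed values should be flagged as such in any downstream analysis, and approaches that also use phylogenetic relatedness may be preferable where a tree is available.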