ROSIE, a database of reptilian offspring sex ratios and sex-determining mechanisms, beginning with Testudines

In contrast to genotypic sex determination (GSD), temperature-dependent sex determination (TSD) in amniotic vertebrates eludes intuitive connections to Fisherian sex-ratio theory. Attempts to draw such connections have driven over 50 years of research on the evolution of sex-determining mechanisms (SDM), perhaps most prominently among species in the order Testudines. Despite regular advancements in our understanding of this topic, no efforts have been published compiling the entirety of data on the relationships between incubation temperature and offspring sex in any taxonomic group. Here, we present the Reptilian Offspring Sex and Incubation Environment (ROSIE) database, a comprehensive set of over 7,000 individual measurements of offspring sex ratios in the order Testudines as well as SDM classifications for 149 species. As the name suggests, we plan to expand the taxonomic coverage of ROSIE to include all non-avian reptiles and will regularly release updates to maintain its comprehensive nature. This resource will enable crucial future research probing the ecology and evolution of SDM, including the presumed sensitivity of TSD to rapid environmental change.
Measurement(s): sex-determining mechanism • offspring sex ratio • sex determination • incubation environment. Technology Type(s): systematic review study design. Sample Characteristic - Organism: Testudines • Reptile. Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.16415784

Investigations of the ecology and evolution of TSD in chelonians are published routinely, and the state of our understanding of SDM in reptiles more broadly is summarized every few years 6,17–21. However, despite the metronomic publication of knowledgeable reviews, limited effort has been made to compile and publish the data from which this knowledge is drawn. In particular, only two efforts have attempted to organize chelonian offspring sex-ratio data, each with its own shortcomings. Paukstis & Janzen 25 represents the first effort, which includes offspring sex-ratio data spanning the diversity of non-avian reptiles, but only includes results from constant-temperature incubation experiments and can no longer be considered up to date. The more recent compilation 24 likewise spans non-avian reptiles while also including measurements of additional phenotypes beyond sex that are influenced by incubation temperatures. However, this database excludes studies on natural incubation and exogenous hormone application, two topics often investigated in the ecological/evolutionary literature 18,26–28. In addition, the authors' methods largely excluded data outside the scope of Web of Science (e.g., unpublished theses/dissertations, select journals).
Such offspring sex-ratio data are necessary to characterize TSD, but a complete understanding of SDM evolution is impossible without comprehensive data on the taxonomic distribution of both TSD and its counterpart, GSD. Several sources present these data for turtles but suffer from shortcomings much like those described above. For example, the Tree of Sex Database 4 has, to the best of our knowledge, not been directly updated since its initial release in 2014. In addition, neither it nor subsequent publications that indirectly expand its taxonomic coverage (e.g., 29) have taken advantage of SDM classifications presented in gray literature, such as publications from conservation breeding programs or unpublished theses/dissertations.
Here, we present the Reptilian Offspring Sex and Incubation Environment (ROSIE) database, a comprehensive compilation of offspring sex ratios and SDM in chelonians, with future plans to include data from all non-avian reptiles. Our database is easily updatable, and can be used to address a variety of key questions, including: 1) What is the ancestral SDM in chelonians, and how often have transitions between mechanisms occurred? 2) How does the relationship between incubation temperature and offspring sex (i.e., the sex-ratio reaction norm) evolve within and among species? 3) To what extent does TSD vary geographically? Temporally? Among species? Within species? Among clutches? Across generations?

Methods
We obtained hatchling sex-ratio data in turtles using Web of Science (v.5.35) to search for research published since the discovery of TSD (1966 12 ) until 31 December 2020. On two separate occasions (17 June 2020 and 7 January 2021), we searched all databases for topics including the following terms: sex AND determin* AND incubat*, along with either turtle* or a wildcard version of each species' taxonomic name to account for suffix variation (e.g., apalon* mutic* for both Apalone muticus and Apalone mutica). Taxonomy followed the 356 chelonian species identified in Turtle Taxonomy Working Group 30 . We reviewed additional publications and gray literature known to contain hatchling sex-ratio data, as well as research referenced within sources from the systematic search. Altogether, these methods returned 910 sources for evaluation. We evaluated sources obtained in the literature search based on the full text, and exclusions fell into the following categories: (1) inaccessible (n = 14), (2) study species was not a turtle (n = 116), (3) hatchling sex ratios were not reported (n = 269), (4) hatchling sex ratios were estimated based on incubation temperatures or durations (n = 48), and (5) hatchling sex-ratio data were previously reported elsewhere (n = 63). After exclusion, 400 sources remained for data extraction (Fig. 1).
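The screening arithmetic and the wildcard example above can be checked with a short sketch. The counts and the term apalon* mutic* come from the text; the dictionary labels and function names are our own, and Python's fnmatch is used only as a stand-in for Web of Science wildcard matching, whose exact behavior may differ.

```python
from fnmatch import fnmatchcase

# Counts reported in the text; the dictionary keys are illustrative labels.
SOURCES_EVALUATED = 910
EXCLUSIONS = {
    "inaccessible": 14,
    "study_species_not_a_turtle": 116,
    "sex_ratios_not_reported": 269,
    "sex_ratios_estimated_from_temperature_or_duration": 48,
    "data_previously_reported": 63,
}


def sources_retained(evaluated, exclusions):
    """Return the number of sources left after subtracting every exclusion."""
    return evaluated - sum(exclusions.values())


def matches_term(name, wildcard_term):
    """Case-insensitive glob match of a taxonomic name against a wildcard term.

    Assumption: '*' behaves like a shell glob; this only approximates the
    search engine's truncation operator.
    """
    return fnmatchcase(name.lower(), wildcard_term.lower())


print(sources_retained(SOURCES_EVALUATED, EXCLUSIONS))
for spelling in ("Apalone muticus", "Apalone mutica"):
    print(spelling, matches_term(spelling, "apalon* mutic*"))
```

Under these assumptions, the five exclusion categories sum to 510, which is consistent with the 400 sources retained, and the single wildcard term matches both spellings of the species epithet.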
From each source, we extracted data on incubation conditions and offspring sex measurements, including additional variables such as hatching success, incubation duration, and sexing methodology (Online-only Table 1) 31 . When variable values (mean incubation duration, sex ratio, etc.) were not provided in the text, tables, or figure legends, we extracted values from figures using WebPlotDigitizer (v4.4 32 ). For a number of sources (n = 42), we contacted the corresponding authors to request relevant materials to clarify sample sizes or other questions about the data. We examined all data and exclusions twice to ensure accuracy and, to avoid data replication, we examined data and manuscripts from lab/author groups to determine whether multiple sources analyzed the same information. When sources shared data, we excluded measurements from the more recent source(s) unless (1) additional samples were included, or (2) data were presented in a different format (e.g., sex ratio per shelf in each incubator vs. sex ratio per whole incubator).
We gathered chelonian SDM information in a stepwise manner. First, we compiled classifications from existing databases 4,29 , which were next evaluated for accuracy and supplemented with SDM classifications based on the offspring sex-ratio data collected as described above. Finally, we performed extensive online searches to identify sources supplying SDM classifications for additional species. Where possible for each species, we include relevant citations, SDM classifications, and classification confidence based on a combination of available data, data in closely related species, and author expertise.
Our offspring sex-ratio compilation contains data on 32.9% (117/356) of recognized chelonian species (Fig. 2), though the taxonomic distribution is highly skewed; just 7 species from 3 families (Chelydridae: Chelydra serpentina; Cheloniidae: Caretta caretta, Chelonia mydas, Lepidochelys olivacea; Emydidae: Trachemys scripta, Chrysemys picta, Emys orbicularis) were the focus of nearly half of all studies (45.2%; 241/533; note that some sources contain data on multiple species, hence the difference between total sources [n = 400] and total studies [n = 533]). The geographic distribution is likewise biased with most studies on wild populations of North American species, starkly contrasting the sampling of African populations (Fig. 3; but note that several African species are represented in captive colonies located outside the continent). Study design is likewise biased, with most (269/400) sources focusing solely or partly on the results of constant-temperature incubation, whereas 85 employ other forms of controlled or semi-controlled incubation conditions (e.g., fluctuating temperatures, temperature switch experiments, room temperature incubation), 124 contain results from natural regimes, and 16 do not define incubation conditions. In addition, 71 studies investigate the influence of chemical applications on offspring sex ratios. In all, the database contains over 7,000 individual measurements of offspring sex ratios, ranging from data on individual eggs to a whole population's nesting season and representing the sexing of nearly 200,000 turtle hatchlings and embryos.
Our SDM database contains confident SDM classifications for 149 chelonian species (Fig. 2) and unsupported or unlikely classifications for an additional 13 species. Of those with confidently assigned SDM, 24% (36/149) exhibit GSD, 19 of which are also represented in the sex-ratio database (Fig. 2). Apart from one species with a low-confidence SDM classification (Chitra chitra), the remaining species in the sex-ratio database (n = 97) comprise a subset of the 113 species confidently known to exhibit TSD. Overall, our collection of chelonian SDM represents a nearly 50% increase in taxonomic coverage relative to recently published summaries (149 vs 101 species 29 ).
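The coverage figures reported above are mutually consistent, as the following minimal sketch shows. All counts are taken from the text; the variable and function names are our own.

```python
# Counts reported in the text; names are illustrative.
RECOGNIZED_SPECIES = 356
SEX_RATIO_SPECIES = 117
FOCAL_STUDIES, TOTAL_STUDIES = 241, 533
CONFIDENT_SDM_SPECIES = 149
GSD_SPECIES = 36
PRIOR_SUMMARY_SPECIES = 101

# Species in the sex-ratio database, split by SDM classification.
GSD_IN_SEX_RATIO_DB = 19
TSD_IN_SEX_RATIO_DB = 97
LOW_CONFIDENCE_IN_SEX_RATIO_DB = 1  # Chitra chitra


def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100 * part / whole, digits)


species_coverage = percent(SEX_RATIO_SPECIES, RECOGNIZED_SPECIES)
focal_share = percent(FOCAL_STUDIES, TOTAL_STUDIES)
gsd_share = percent(GSD_SPECIES, CONFIDENT_SDM_SPECIES, digits=0)
tsd_species = CONFIDENT_SDM_SPECIES - GSD_SPECIES
coverage_increase = percent(
    CONFIDENT_SDM_SPECIES - PRIOR_SUMMARY_SPECIES, PRIOR_SUMMARY_SPECIES
)

print(species_coverage, focal_share, gsd_share, tsd_species, coverage_increase)
```

The three groups of species in the sex-ratio database (19 GSD, 97 TSD, and 1 low-confidence) sum to the 117 species reported, and the increase from 101 to 149 species works out to roughly 47.5%.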
As indicated by the name, we plan to expand ROSIE to encompass all non-avian reptile species. In our next update, we will incorporate data from the remaining reptilian orders (Crocodylia, Rhynchocephalia, and Squamata) following the methods described herein, including all data published through the end of 2020. Once ROSIE has reached this final taxonomic scope, we will push updates every other year to include newly available data and maintain the up-to-date nature of this resource.

Data Records
This database is hosted on GitHub (https://github.com/calebkrueger/ROSIE), and the raw data can be accessed via a unique, stable DOI through Zenodo 31 . The database consists of CSV files containing (1) extracted offspring sex ratio and incubation environment data with complete references, (2) SDM classifications, (3) excluded sources with complete references and exclusion criteria, and (4) metadata.
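As a rough illustration of how CSV files of this kind might be used, the sketch below pools male and female counts by species and incubation temperature. The column names (species, incubation_temp_c, n_males, n_females) and the sample rows are hypothetical placeholders invented for this example; they are not ROSIE records, and the actual headers should be taken from the metadata file in the repository.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows invented purely for illustration; they are NOT ROSIE data,
# and the column names are assumptions rather than the database's real headers.
SAMPLE_CSV = """species,incubation_temp_c,n_males,n_females
Species a,26.0,18,2
Species a,26.0,9,1
Species a,30.0,1,19
"""


def pooled_male_proportion(handle):
    """Pool counts over rows sharing species and temperature.

    Returns a dict mapping (species, temperature) to the proportion of males.
    Groups with no sexed individuals are skipped to avoid division by zero.
    """
    counts = defaultdict(lambda: [0, 0])  # [males, females]
    for row in csv.DictReader(handle):
        key = (row["species"], float(row["incubation_temp_c"]))
        counts[key][0] += int(row["n_males"])
        counts[key][1] += int(row["n_females"])
    return {
        key: males / (males + females)
        for key, (males, females) in counts.items()
        if males + females > 0
    }


ratios = pooled_male_proportion(io.StringIO(SAMPLE_CSV))
for (species, temp), proportion in sorted(ratios.items()):
    print(species, temp, round(proportion, 2))
```

To apply the same function to a downloaded file, the StringIO object can be replaced with an open file handle, after adjusting the column names to match the real headers.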

Technical Validation
The data have been thoroughly checked for accuracy by C.J.K. prior to release. The authors urge users to report errors or submit additional data and updates by emailing the corresponding author. Any errors identified can readily be corrected in future updates, which will occur every other year.

Code availability
No custom code was used in the creation of this database.