Main

Science is facing a reproducibility crisis. A recent Nature survey of 1,576 researchers from various disciplines found that more than 70% had been unable to reproduce another scientist's research, and 50% had failed to reproduce their own results1. Indeed, the issue of reproducibility has been raised across many fields of science. For instance, estimates of non-reproducible studies run as high as 89% in cancer research2 and 65% in drug research3, and even high-profile, ‘landmark’ studies are not free of reproducibility issues4. New scientific research builds on previous efforts, allowing methods for testing hypotheses to evolve continually5. Therefore, research results must be communicated with enough context, detail and circumstance to allow correct interpretation, understanding and, whenever possible, reproduction. Reproducibility is a cornerstone of the scientific process and must be emphasized in scientific reports and publications. Although best-practice guidelines have been published and adopted in areas such as computer science6 and clinical research7,8, for various reasons, guidelines for ensuring reproducibility are still largely absent in many (even large) research communities.

Along these lines, the issue of reproducibility may be especially difficult to address in ecology, given the less-controlled nature of many studies (for example, natural community surveys and field experiments). The issue has been noted only recently in ecology9,10, but is likely prevalent11,12. Because ecological studies often encompass uncontrollable or unaccountable factors13, it is especially important to report the applicable circumstances and methods in detail. Furthermore, ecological studies often depend on statistical models, so reporting specific modelling methods and decisions, and how they are intended to reflect biological knowledge or assumptions, holds particular importance for reproducibility in ecology14,15. Reporting these aspects has become more critical than ever, as the data and analytical tools underlying ecological studies are accumulating and evolving at an unprecedented rate in the age of big data16; ecological niche modelling (ENM) is a prominent example.

Ecological niche modelling

Also known as species distribution modelling (SDM)17,18,19, ENM uses associations between known occurrences of species and environmental conditions to estimate species’ potential geographic distributions. Although ENM and SDM are often used interchangeably in the literature20, ENM typically has a stronger focus on estimating parameters of fundamental ecological niches, whereas SDM is more focused on geographic distributions of species. ENM is widely applied across many aspects of ecology and evolution, and is increasingly incorporated in decision-making regarding land use and conservation21. ENM studies are proliferating rapidly; in particular, a popular ENM algorithm, Maxent22, has been cited in tens of thousands of research papers in the past decade alone. Though methods and assumptions in these studies vary greatly, to our knowledge, no evaluation of reproducibility of ENM or SDM studies has been conducted to date (but see ref. 21 for scoring key model aspects for biodiversity assessments). Furthermore, no guidelines on reporting essential modelling parameters exist, hindering accurate evaluation (for example, scoring21) of model methodology and reuse of published research. It is concerning that such a fast-growing and fast-evolving body of literature lacks assessment and guidelines for reproducibility.

Typically, ENM analyses take biodiversity data and environmental data (such as point observations of a species and climate) as input and use correlative or machine-learning methods to quantify underlying relationships, which are then used to make spatial predictions. This typical workflow of ENM — obtaining and processing data, model calibration, model transfer and evaluation — is shared widely across disciplines that rely on statistical models. Therefore, the fast development, broad use and application, and well-established workflow of ENM make it an excellent and representative example with which to tackle the challenges of reproducibility. Here, we assess the reproducibility of ENM studies via a comprehensive literature review and introduce a checklist to facilitate reproducibility of ENMs that can be extended to other areas of ecological research or other disciplines.

A checklist for ecological niche modelling

Although the role of ‘methods’ sections of scientific publications is to provide information that makes a study replicable, they are often highly condensed and lack the details needed for reproducibility, owing in large part to space limitations in journals. What is needed is a standardized format for reporting the full suite of details that constitute the critical information for ensuring reproducibility. Therefore, a compendium of crucial parameters and qualities — in effect, a metadata standard for ecological niche models — would be highly useful. A metadata standard establishes common use and understanding by defining a series of attributes and standardized terminology to describe them. Such standards have been applied in various fields, for example GeoTIFF for spatial rasters23 and Darwin Core and Humboldt Core for biodiversity data24,25. A metadata standard can provide a straightforward way to balance efficiency and accuracy in facilitating research reproducibility26 in ENM, as well as in scientific studies in general27,28,29.

Here we present a checklist for ENM, to demonstrate how to define general and flexible reproducibility standards that can be used across a wide range of sub-fields of ecology. We compiled a list of essential elements required to reproduce ENM results based on the literature to date, and organized the elements into four major topics: (A) occurrence data collection and processing, (B) environmental data collection and processing, (C) model calibration and (D) model transfer and evaluation (labels correspond to elements in Table 1). We justify the design of the checklist briefly, and provide detailed definitions, examples of reporting for each element, and related literature, in Table 1. We do not distinguish the relative importance among the checklist elements, as all are necessary to assure full reproducibility. We provide a template of the checklist for easier use (Supplementary Table 1). We envision this checklist as a dynamic entity that will continue to be developed and refined by the ENM/SDM community to keep pace with the state of the art in the field. We also provide access to the checklist on Github, as an open-source project where users can comment and suggest changes (https://github.com/shandongfx/ENMchecklist or https://doi.org/10.5281/zenodo.3257732).

Table 1 Details of the ENM checklist and representation of its elements (percentage) in a review of recent ecology and evolution literature (2017–2018; 163 papers)

Occurrence data (A)

Across many fields, online databases are growing and changing rapidly30, such that reporting data versions or providing complete datasets used in analyses is crucial to reproducibility. Occurrence data are increasingly available owing to mass digitization of museum specimens and increased interest and participation in observational data collection by citizen scientists31. Because the quality of occurrence data can vary significantly among data sources, data types and taxa32,33,34, it is vital to record data curation details to assure consistent quality and accuracy. The first attribute to report is the source of the data (A1; labels correspond to elements in Table 1 hereafter). If the occurrence data were the result of an online database query, the Digital Object Identifier (DOI), query and download date, or the version of the database must also be reported (A2), as online biodiversity data are accumulating rapidly and these data are often edited, corrected, improved or excluded over time35,36,37. The final dataset (that is, after editing and quality control), with the exception of sensitive information (for example, specific locations of endangered taxa), should be deposited in a data archive when data-use rights allow, thereby assuring reproducibility in case of changes to the original data source.
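As an illustration of reporting at this level of detail, the following minimal sketch uses the rgbif R package with a placeholder species and file names; it records the access date and record count alongside the downloaded data (rgbif's occ_download(), run with registered credentials, would additionally return a citable DOI for the exact query):

```r
# Minimal sketch (A1, A2): query GBIF via the rgbif package and record provenance.
# The species name and file names are placeholders for illustration.
library(rgbif)

occ <- occ_search(scientificName = "Puma concolor",
                  hasCoordinate = TRUE, limit = 1000)

# Save the records together with the source, access date and record count;
# occ_download() (with registered credentials) also issues a citable DOI
provenance <- data.frame(source      = "GBIF (https://www.gbif.org)",
                         access_date = as.character(Sys.Date()),
                         n_records   = nrow(occ$data))
write.csv(occ$data, "occurrences_raw.csv", row.names = FALSE)
write.csv(provenance, "occurrence_provenance.csv", row.names = FALSE)
```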

Whenever available, the ‘basis of record’ (A3) as used in Darwin Core, a community-developed standard for sharing biodiversity data24, should be reported. This field describes how records were originally collected, and thus can indicate different levels of quality and different auxiliary information available. For instance, a ‘MachineObservation’ via automated identification may be more prone to error than a ‘PreservedSpecimen’ collected and identified by an expert and deposited in a museum. Further, with a deposited specimen and catalogue number, researchers have the opportunity to examine the specimen physically to verify the identification38,39, whereas an observation may not be verifiable. Spatial uncertainty (see A6-3) can vary with the type of occurrence data, as well as with the time when the data were collected. For example, coordinates associated with older ‘PreservedSpecimens’ are usually georeferenced from descriptions of administrative units (for example, township, county or country), thus involving higher spatial uncertainty, whereas coordinates linked to recent ‘HumanObservations’ may have been reported directly from GPS devices, making them more accurate. Information regarding the uncertainty of occurrences can also facilitate evaluation of whether the spatial resolution of the environmental data utilized is appropriate (see B3). The spatial uncertainty in biodiversity data has long been recognized40,41, though quantification of such uncertainty has not been implemented systematically at large scales (thus A6-3 was excluded from our literature review; see below); this task could be facilitated by recently developed informatics tools42,43.
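Continuing the sketch above, records can be screened on these Darwin Core fields directly; the retained basis-of-record categories and the 1-km uncertainty cutoff below are illustrative choices, not recommendations:

```r
# Sketch (A3, A6-3): screen records by basis of record and spatial uncertainty,
# assuming GBIF-style Darwin Core fields as downloaded above
occ_raw <- read.csv("occurrences_raw.csv")

keep_basis <- c("PRESERVED_SPECIMEN", "HUMAN_OBSERVATION")
occ_clean <- subset(occ_raw,
                    basisOfRecord %in% keep_basis &
                      !is.na(coordinateUncertaintyInMeters) &
                      coordinateUncertaintyInMeters <= 1000)  # example cutoff: <= 1 km
```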

Increasingly, ecological research uses data from large-scale data aggregators (for example, the Global Biodiversity Information Facility (GBIF)). As with many sciences that rely on observational, rather than design-based, data collection, biodiversity data used in ENM have generally not been collected explicitly for this purpose. Thus, the spatial and temporal attributes of occurrences, and how they have been parsed or filtered in preparation for modelling, are essential details required to model ecological niches adequately17. Checking the extent of occurrences (A4) against expert-defined distributions (for example, regional floras) may reduce errors in identification or data transcription. Underrepresentation of the known distribution may suggest inadequate or biased sampling of occurrences, whereas spatial outliers may represent recent range expansion44,45, occasional or vagrant occurrences46, sink populations47, or errors of identification or georeferencing. The collection date of occurrence records may influence spatial accuracy; in general, records from before the 1980s lack precise point location data (that is, GPS coordinates) and are often georeferenced by hand from locality descriptions, with less precision33. Also, because environments change over time (for example, seasonal change, climate change and land-use change), the temporal range of the occurrence data (A5) must be specified to connect the data appropriately to the temporal dimension of environmental conditions48. Often, occurrence data are processed before modelling (A6). Common procedures include removing duplicate coordinates, excluding spatial and/or environmental outliers, and eliminating records with high spatial uncertainty43 or erroneous coordinates49. Additionally, scholars have proposed various ways to address the well-known issues of sampling bias50,51,52 and spatial autocorrelation53, often by imposing distance-based filters on occurrence data or incorporating spatial structure as a component in the modelling process54 (A7).
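These filtering steps translate into only a few lines of code, but each embeds a decision that should be reported. A minimal sketch, assuming the spThin package and an illustrative 10-km filter distance:

```r
# Sketch (A6, A7): remove duplicate coordinates, then apply a distance-based
# filter to reduce sampling bias; the 10-km distance is illustrative only
library(spThin)

occ_clean <- occ_clean[!duplicated(occ_clean[, c("decimalLongitude",
                                                 "decimalLatitude")]), ]
thinned <- thin(loc.data  = occ_clean,
                lat.col   = "decimalLatitude",
                long.col  = "decimalLongitude",
                spec.col  = "species",
                thin.par  = 10,    # minimum distance between retained records (km)
                reps      = 10,    # number of thinning replicates
                locs.thinned.list.return = TRUE,
                write.files = FALSE, write.log.file = FALSE)
```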

Environmental data (B)

Similar to occurrence data, sources for environmental data are numerous, and data often require processing before inclusion in ENM analyses. The source (B1), and database query/download date or version of the database must be reported (B2), as environmental data may be updated periodically (for example, WorldClim55,56) or may accumulate new data regularly through time (for example, PRISM57). Such information is also important for environmental variables derived from remotely sensed data (such as MODIS, Landsat). For example, NASA conducts regular quality assessments of MODIS data products and reprocesses data that may have been influenced by algorithm or calibration issues58.
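For example, climate layers can be obtained programmatically and their version noted in the study metadata; a minimal sketch with the raster package, which at the time of writing serves WorldClim v1.4 through getData():

```r
# Sketch (B1, B2): download bioclimatic layers and record version and access date
library(raster)

bio <- getData("worldclim", var = "bio", res = 10)  # 19 layers, 10 arc-min

env_meta <- data.frame(source      = "WorldClim v1.4 via raster::getData()",
                       access_date = as.character(Sys.Date()),
                       resolution  = "10 arc-min")
write.csv(env_meta, "environment_provenance.csv", row.names = FALSE)
```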

The spatial resolution of the environmental variables used (B3) can affect ENM results, as different ecological processes occur at different spatial scales59. It has been hypothesized that at broad scales, abiotic conditions have a more dominant role in determining species’ distributions than biotic conditions60,61, though increasing numbers of reported exceptions suggest that this pattern is context dependent62,63. In practice, using different spatial resolutions of environmental variables can produce different results64,65,66. Reporting the spatial resolution of environmental variables can also facilitate checking the match or mismatch with the spatial uncertainty of occurrences, given that coordinates are at times georeferenced from county centroids at coarse resolution33. In addition to reporting the spatial resolution used for modelling, aggregation or disaggregation methods used to align the spatial resolutions of variables (for example, if they came from different data providers) should also be reported.
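Both operations are standard raster manipulations, but the chosen aggregation factor and interpolation method should be reported. A sketch with the raster package, where other_layer stands in for a hypothetical layer from a second data provider:

```r
# Sketch (B3): align layers of different native resolutions
library(raster)

coarse  <- aggregate(bio[[1]], fact = 6, fun = mean)  # e.g., 10 -> 60 arc-min
# 'other_layer' is a placeholder raster from another provider
aligned <- resample(other_layer, bio[[1]], method = "bilinear")
```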

Providing the temporal range covered by the environmental variables (B4) is important for two reasons67,68. First, shorter temporal ranges can capture finer variation of environments (for example, extremes of daily temperature69), whereas longer temporal ranges capture longer-term trends in environmental conditions (for example, temperature seasonality). Second, it is helpful to evaluate how the temporal range of environmental data relates to the temporal range of occurrence data. For instance, associating occurrence data with environmental data from completely different time periods (for example, Last Glacial Maximum versus present) could be problematic, though the environmental data may need to include time lags to correspond to the life history of particular species70. The same reporting should be applied to information on future or past environments, as appropriate (D9–12). Similarly, the details of methods for processing and resampling of environmental data in temporal dimensions should also be reported.
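In code, this temporal matching amounts to a single reportable filtering step; for instance, restricting occurrences to the 1960–1990 window covered by the WorldClim v1.4 layers used above (the 'year' field follows Darwin Core):

```r
# Sketch (A5, B4): match the temporal range of occurrences to the climate layers
occ_matched <- subset(occ_clean, !is.na(year) & year >= 1960 & year <= 1990)
```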

Model calibration (C)

Typically, an ENM study first has to determine the geographic domain of interest (C1). Delimitation of the domain requires both ecological and practical justification, such as focusing on areas that have been accessible to a species71,72 and areas that have been sampled. Many ENM algorithms make use of background points22 that represent environmental conditions contrasting with those known to be occupied by the taxa of interest. Several aspects of background point selection can influence model outcomes, including the number of points (C2)73,74 and the algorithms used to select these points75,76 (C3).
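Because both the count and the sampling algorithm matter, they belong in the report. A minimal sketch with dismo::randomPoints(), using a common (but not universal) 10,000-point draw and a fixed random seed:

```r
# Sketch (C1–C3): draw background points within the calibration domain
library(dismo)

set.seed(42)  # report the seed so the random draw is repeatable
bg <- randomPoints(mask = bio[[1]], n = 10000)  # domain = non-NA cells of the mask
```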

The suite of environmental predictors that are used in ENM should be directly relevant to a species’ distributional ecology19, and the rationale for selecting those variables should be transparent (C4). However, as mechanistic relationships are often unknown, justification of variable selection procedures is necessary. Further, collinearity of environmental variables, a well-recognized issue in regression models, affects parameter estimation during model calibration77; one common strategy is to remove highly correlated environmental variable pairs following rule-of-thumb thresholds (for example, |r| > 0.4 or 0.7)77,78. Selecting one variable from a pair of variables can be subjective (for example, based on expert knowledge), objective (for example, using variable contribution to model fit79) or random; hence justification is required to ensure accurate interpretation and reproducibility of variable selection.
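One such procedure can be made fully explicit in a few lines; a sketch using the caret package and the |r| > 0.7 rule of thumb mentioned above:

```r
# Sketch (C4): drop variables until no pair exceeds the chosen correlation cutoff
library(raster)
library(caret)

env_vals <- extract(bio, bg)                   # values at background points
cor_mat  <- cor(env_vals, use = "complete.obs")
drop_idx <- findCorrelation(cor_mat, cutoff = 0.7)
env_keep <- dropLayer(bio, drop_idx)           # retained, less-correlated subset
```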

The version of the ENM software or algorithm used (C5 and C6) also needs to be provided, as these tools are often updated80 to include bug fixes or revised default settings. For instance, the default transformation method of Maxent raw output was changed from ‘logistic’ to ‘cloglog’ between versions 3.3 and 3.480. Dependent libraries for coded algorithms may change over time as well.
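In R-based workflows, capturing this information takes a few lines and can be archived with the results; a minimal sketch using only base R:

```r
# Sketch (C5, C6): record the exact toolchain used for the analysis
packageVersion("dismo")   # version of the modelling package
R.version.string          # version of R itself

# sessionInfo() lists R, the platform and all loaded package versions;
# saving its output alongside results documents the full toolchain
writeLines(capture.output(sessionInfo()), "session_info.txt")
```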

Parameterizations or model settings, and their justification (C7), are important for understanding how they may affect predictions. Examples of these settings include features and regularization values in Maxent81,82, covariate formulas for regression-based models, link functions in generalized linear models (GLMs)83, learning rate and maximum complexity in boosted regression trees (BRTs)84, and optimizer values in generalized additive models85. In practice, authors often use the default settings provided by the software or algorithm utilized, which may or may not yield robust models72,82,86, whereas in other cases, authors fine-tune parameters to achieve the best model performance81,87.
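Making such settings explicit in the analysis script, rather than relying on defaults, also makes them trivial to report. A sketch for Maxent via dismo (which requires maxent.jar and the rJava package), continuing the running example, with illustrative feature and regularization choices:

```r
# Sketch (C7): state model settings explicitly instead of accepting defaults
library(dismo)

occ_xy <- occ_clean[, c("decimalLongitude", "decimalLatitude")]
m <- maxent(x = env_keep, p = occ_xy, a = bg,
            args = c("betamultiplier=1.5",                  # regularization multiplier
                     "linear=true", "quadratic=true", "hinge=true",
                     "product=false", "threshold=false"))   # feature classes
```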

Model transfer and evaluation (D)

Understanding model performance requires model evaluation (D1). A first step is to assess model accuracy and significance — that is, whether the model can correctly predict independent presence (or absence) data and whether the model prediction is better than null expectations. Commonly used indices that measure model performance can be either threshold-independent (D2; for example, the area under the receiver operating characteristic curve, or ROC AUC88, and partial ROC89) or threshold-dependent (for example, the true skill statistic (TSS), sensitivity and specificity90); the latter approaches require reporting of thresholds and how they were derived. In addition to model accuracy, information criterion-based indices should be reported if they were used to select among competing models based on predictive performance and model complexity, or to generate ensembles of models. Authors should report whether and how data were partitioned to calculate the evaluation indices (D3) if genuinely independent testing data (that is, from different sources and methods of collection) were not available. Common approaches include random partitioning of occurrence datasets into training and testing sets (for example, the default in Maxent); among other methods, partitioning based on structured blocks (for example, separating occurrences into spatial blocks) is expected to assess model transferability better81,91. Given the variety of options regarding data separation, it is important to specify the methods used to ensure better reproducibility.
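A minimal sketch of a random k-fold partition and threshold-independent evaluation with dismo follows; block-based partitioning (for example, via the ENMeval package) would call for the same reporting of how the folds were formed:

```r
# Sketch (D1–D3): random partitioning and threshold-independent evaluation
library(dismo)

set.seed(42)
fold  <- kfold(occ_xy, k = 5)        # report k and the seed
train <- occ_xy[fold != 1, ]
test  <- occ_xy[fold == 1, ]

m  <- maxent(x = env_keep, p = train, a = bg)   # refit on the training fold
ev <- evaluate(p = test, a = bg, model = m, x = env_keep)
ev@auc                                          # report the AUC value itself
```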

Once a model is calibrated, it may then be transferred or projected onto another landscape or time. Generally, these predictions are initially continuous (D4) and sometimes are subsequently transformed into binary predictions using a particular threshold (D5). Researchers have proposed different ways of thresholding92,93 for different purposes and under varied assumptions, so these choices need to be reported.
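Continuing the sketch, the conversion is a single reportable step; the 'spec_sens' rule (maximizing the sum of sensitivity and specificity) is one published rule among several, and whichever is adopted should be stated along with the resulting threshold value:

```r
# Sketch (D4, D5): continuous prediction, then binary conversion at a stated threshold
pred_cont <- predict(m, env_keep)               # continuous output (D4)
thr       <- threshold(ev, stat = "spec_sens")  # max(sensitivity + specificity)
pred_bin  <- pred_cont >= thr                   # binary map (D5); report 'thr' itself
```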

Transferring a model across space and/or time may lead to extrapolation if the projected environments are novel relative to training environments. Several studies have found that environmental novelty48,94,95 (D6) and collinearity shift (D7; changes in the collinearity structure of covariates77,96) reduce predictive performance, and have recommended quantifying the novelty of the projected environments and the collinearity shift between the calibration and projection environments96,97. Further, different algorithms use different strategies to extrapolate (clamping, truncation or extrapolation94,98); for example, the default clamping function in Maxent uses the marginal values in the calibration area as the prediction for more extreme conditions in transfer areas22.
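Environmental novelty can be quantified with standard tools; a sketch using the multivariate environmental similarity surface (MESS) implemented in dismo, continuing the running example, where env_future is a placeholder stack of projection-time variables:

```r
# Sketch (D6): flag novel conditions in the transfer region with MESS;
# negative values indicate conditions outside the calibration range
library(dismo)

ref_vals <- extract(env_keep, train)            # reference conditions at training points
novelty  <- mess(x = env_future, v = ref_vals)  # 'env_future' is a placeholder stack
plot(novelty < 0)                               # areas at risk of extrapolation
```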

Assessing the state of reproducibility in ENM research

To assess the state of reproducibility in ENM research in the context of our proposed checklist, we reviewed current (2017–2018) ENM literature in eight widely read ecology and evolution journals: Global Ecology and Biogeography; Diversity and Distributions; Journal of Biogeography; Evolution; Evolutionary Applications; Molecular Phylogenetics and Evolution; Molecular Biology and Evolution; and Systematic Biology. Additional details of our review criteria are provided in Appendix 1, Supplementary Fig. 1, and Supplementary Tables 2 and 3.

Inclusion of elements of the checklist (32 in total) varied widely, ranging from fully reported (100%; C5 algorithm name) to not reported at all (0%; D7 collinearity shift), though documentation of the importance of this latter element is still limited in the literature77,96. Completeness of information across the checklist also varied among papers, ranging from 24% to 89%, averaging 54% (s.d. = 13%) of checklist elements reported in a given paper (Fig. 1).

Fig. 1: Completeness of checklist elements reported in the current literature.

Assessments are based on 163 articles published in eight ecology and evolution journals during 2017–2018. a, Percentage of papers that report each individual element of the checklist. b, Frequency distribution of the completeness (%) of checklist elements reported across articles.

Most studies (93%) fully reported sources of occurrence data (A1), but the date of access or version of the data source (A2) was included in only 22% of papers reviewed, and the basis of these records (A3) was described clearly in only 48% of papers. A relatively high proportion of papers (67%) reported the spatial extent (A4) of the occurrence data, but the temporal range (A5) was mentioned less frequently (26%). Few papers gave details of occurrence data processing, with reporting ranging from 18% to 35% across elements A6 and A7.

Although most papers we reviewed reported the source of environmental data (B1), they largely did not include the download date or version of the data source: only 27% of papers reported such information for model training (B2), and only 23% did so for environmental data used in model transfer (D10). The spatial resolution and the method of resampling layers with different spatial resolutions (B3) were generally reported (82%), although the temporal range (B4) was reported less frequently (42%). The pattern was the opposite for environmental layers used in model transfer: the temporal range (D12) was almost always reported (94%), but the spatial resolution (D11) was reported less frequently (72%).

Only 32% of papers fully reported information regarding the modelling domain (C1–C3). A high percentage of papers reported the variable selection procedure (C4; 70%). The ENM algorithm or software (C5) was always reported, though the corresponding version was reported less frequently (C6; 59%). In general, less than half of papers fully disclosed the parameters used for algorithms (C7; 45%).

Although model evaluation is critical for modelling studies, only 90% of papers presented information pertaining to model evaluation (D1). Surprisingly, less than half adequately reported how the evaluation dataset was generated (D3; 39%) or mentioned specific values for threshold-dependent evaluation indices (D2; 36%). For the model predictions generated, 51% of papers adequately specified the output format or acknowledged that default settings were used (D4). Among the papers that converted continuous predictions to binary, 92% specified the adopted threshold (D5). When transferring models to different times and/or regions, few papers specified the extrapolation strategy (D8; 36%). The novelty of projected environments (D6) was rarely evaluated (8%).

Lessons from ecological niche modelling

Reproducibility of scientific studies has been under major scrutiny in recent years, and numerous high-profile studies have been found to be irreproducible, in large part because current reporting and publishing practices do not provide sufficient information regarding the methodologies, decisions and assumptions involved. Despite being based on a relatively recently developed toolset, ENM is no exception. For thorough evaluations of proper use of ENM applications (for example, use of ENMs in biodiversity assessment21), a detailed and standardized description of the methods must be provided. The checklist presented here includes the bare minimum of categories and elements necessary to evaluate and replicate ENM analyses. However, the details reported in recent publications varied greatly: on average, papers in our review included only 54% of checklist items, a generally incomplete set of information for reproducibility. This shortcoming may reflect a lack of community expectations on model reporting, or even unawareness of alternative options and underlying caveats in the modelling workflow. We highlight several key areas that were particularly deficient in reporting, and thus need attention to make ENM studies reproducible (Box 1).

Improving reproducibility with software solutions

The rapid development of ENM can be attributed at least in part to increased access to relevant data; with such development, informatics tools offer one route by which to improve reproducibility99,100. Such tools include data management plans101, standardized metadata102,103, programming-language resources for recording data analysis steps (for example, R and rmarkdown) and version-control tools (for example, GitHub). Open-source programming languages such as R have allowed development of packages specifically designed for managing and processing large datasets in preparation for analysis. Exemplary packages include biogeo, which directly detects, corrects and assesses occurrence data quality42, and geoknife, a package designed specifically for managing United States Geological Survey gridded datasets104. Other packages help users to create reproducible workflows, such as zoon105, nicheA106, kuenm86 and Wallace107. In particular, Wallace provides a graphical user interface for building reproducible workflows, from data download to model output107. Borregaard and Hart11 described how the use of these new software tools is facilitating ecological research that is robust, transparent and thus reproducible. The functionality of these software solutions, however, depends on developers monitoring changes in data, modelling algorithms and software platforms (for example, R) to avoid incompatibility issues. As such, authors should report software versions for all such solutions to ensure reproducibility.
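These tools also make it straightforward to emit the checklist itself as a machine-readable artefact archived alongside model outputs; a minimal sketch, assuming the yaml package, with field names that mirror the checklist topics (A–D) and values drawn from the running example above:

```r
# Sketch: write a machine-readable checklist entry alongside the model outputs
library(yaml)

checklist <- list(
  occurrence  = list(source = "GBIF", access_date = as.character(Sys.Date())),
  environment = list(source = "WorldClim v1.4", resolution = "10 arc-min"),
  calibration = list(algorithm = "Maxent",
                     version   = as.character(packageVersion("dismo")),
                     settings  = "betamultiplier=1.5; linear/quadratic/hinge features"),
  evaluation  = list(index = "AUC", partition = "5-fold random split, seed 42")
)
write_yaml(checklist, "enm_checklist.yml")
```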

Implications for other fields

The design of the checklist presented here is based on a typical ENM workflow, involving steps of obtaining and processing data, and model calibration, transfer and evaluation. We emphasized reporting data origin and metadata; crucial steps in data processing, modelling decisions and model evaluation; and potential caveats in model transfer. Those concepts and principles are generalizable to other disciplines. Further, the specifics of the checklist that we have proposed for ENM studies could be readily generalized to be adopted by other fields, especially those that involve biological data, environmental data and statistical modelling.

Researchers have proposed similar solutions in other fields, such as climate change research29; however, to our knowledge, our checklist takes additional steps in refining the methodology workflow and is therefore more comprehensive. For example, information pertaining to occurrence data (data source, spatial and temporal range, and data cleaning procedures) can be generalized to other studies that rely on digitized biodiversity data and other categories of ‘big data’. The information regarding environmental data necessary to reproduce studies is similar across biological research, such as in studies of relationships between species richness and environmental gradients108. The modelling algorithm details in the checklist are applicable to other studies that use statistical models, such as linear regression models of abundance as a response to resource availability. The elements of model extrapolation (environmental novelty and collinearity shift) are also common issues for modelling practices that involve forecasting, for example, predictions of biodiversity or community changes under global change. In addition to these generally applicable elements, the checklist can easily be extended to incorporate information particular to a field.

Although the methods sections of most scientific publications lack the formal standardization needed for reproducibility and their length is frequently influenced by journal space limitations, the checklist approach can provide greater detail to ensure repeatability. The usual methods section, combined with a standardized checklist, will make papers easier to review and replicate. Other disciplines can and should design comparable checklists with similar concepts and levels of detail.

Closing remarks

ENM is increasingly used in ecological studies and incorporated into conservation decisions. Our literature review revealed numerous gaps that undermine the reproducibility of these studies. We recommend that researchers developing future ENM studies consider our checklist, extend and adjust it to meet study needs, with particular focus on elements that are commonly neglected (Table 1), and include this more structured metadata in publications (see the checklist template in Supplementary Table 1). This checklist provides an important tool for understanding and replicating previous studies, and also gives editors and reviewers an efficient way to gauge and promote ENM reproducibility29. As a general metadata framework linking observational data and statistical modelling, our checklist provides a starting point for adopting similar standards in other fields that rely on these methods, both within and beyond ecology.