The dataset comprises leafing and flowering data collected across the continental United States from 1956 to 2014 for purple common lilac (Syringa vulgaris), a cloned lilac cultivar (S. x chinensis ‘Red Rothomagensis’) and two cloned honeysuckle cultivars (Lonicera tatarica ‘Arnold Red’ and L. korolkowii ‘Zabeli’). Applications of this observational dataset range from detecting regional weather patterns to understanding the impacts of global climate change on the onset of spring at the national scale. While minor changes in methods have occurred over time, and some documentation is lacking, outlier analyses identified fewer than 3% of records as unusually early or late. Lilac and honeysuckle phenology data have proven robust in both model development and climatic research.
Background & Summary
Although phenology is now understood to be a key indicator of climate change impacts1,
The Western program ended in the mid-1990s, except for a few dozen sites reactivated in 1997 (these data are also available at http://meteora.ucsd.edu/cap/lilac.html#Clim)5. The Eastern network was terminated in 1986, re-initiated in 1988 and then expanded into a broader nationwide online phenology monitoring effort in 2009 (ref. 3).
The resulting dataset has been used for applications far beyond the original vision, from understanding vegetation feedbacks to climate6 to measuring large-scale variations in plant climatic adaptation7. The Extended Spring Indices8,
The observational dataset is unique in both its geographic and temporal coverage, and has considerable potential to support additional research and applications. Beginning in 2009, data collection on common and cloned lilacs continued alongside new data collection for hundreds of native species following very similar protocols15. The deep temporal record of the lilac and honeysuckle dataset has the potential to extend our understanding of trends among native species9. Long-term, continental-scale datasets are both rare and critically important to understand the causes and consequences of changing phenologies among cloned plants, ornamentals and native species16,
In the historic Eastern19 and Western20 programs, participants were directed to plant new lilac or honeysuckle clones, and/or to observe established common purple lilac shrubs in unshaded, flat, convenient locations, away from roads and away from microclimatic pockets (e.g., cold air drainages). A single clone line for lilacs (Syringa x chinensis ‘Red Rothomagensis’) and two clone lines for honeysuckles (Lonicera tatarica ‘Arnold Red’ and L. korolkowii ‘Zabeli’) were planted and monitored throughout the period of record. Observers were asked to record on paper cards the date on which each of five phenological events, such as first leaf or full bloom, occurred on each of their selected or planted lilac or honeysuckle individuals (Table 1). Early program coordinators noted that in some cases all individuals of a species at a site were aggregated for reporting (i.e., the observer reported a single date representing an ‘average’ for all individuals at the site for each phenological event); however, there is no documentation indicating at which sites this occurred.
After all five phenological events had occurred for the year, observers mailed the cards to the program coordinators. Data were collated on paper by program coordinators, and isobar maps of spring arrival were developed and shared with observers (e.g., Fig. 1)21,22. The duration of record at each site is mapped in Fig. 2. The Western and Eastern programs differed in the number of active sites (those with at least one observation record) by decade, as shown in Fig. 3.
The data were digitized and curated by Mark D. Schwartz, University of Wisconsin–Milwaukee, until the establishment of the USA National Phenology Network (USA-NPN) and the launch of its broader native plant monitoring program, Nature’s Notebook3,23. The remaining active lilac and honeysuckle observers and their observation sites were incorporated into this program in 2009.
In 2009, the lilac and honeysuckle protocols were changed from their original event-based format (e.g., ‘What was the date of first bloom?’) to a status-based format (e.g., ‘Do you see open flowers today?’) to be consistent with the new USA-NPN plant protocols15. The rationale for this change in approach is that recording of the phenophase status on each day that the plant is observed enables detection of repeated phenological events within a single year (e.g., a second flush of breaking leaf buds after a killing frost), as well as calculation of the uncertainty in the date of phenological events (e.g., open flowers were first observed on April 6, but the plant had not been checked since April 3)15. Definitions provided to observers for the five leaf and flower phenophases, along with their changes over time between 1956 and 2014, are in Table 1.
Nature’s Notebook observers are still encouraged to plant and observe cloned and common lilacs as part of a campaign (https://www.usanpn.org/nn/campaigns), with electronic messages that provide real-time feedback on the progression of spring, information-rich species profile pages, and digital merit badges for tracking lilacs23. The cloned honeysuckle cultivars, however, are considered invasive and their planting is no longer encouraged.
The lilac and honeysuckle phenological dataset is an Excel workbook, stored in the Dryad Digital Repository (Data Citation 1: Dryad 10.5061/dryad.0262m) and USGS ScienceBase (Data Citation 2: ScienceBase https://www.sciencebase.gov/catalog/item/5499b905e4b093dfafda3575). The first tab (observation_data) contains 116,662 records for the period 1956 to 2014 across the United States. Each record includes a unique identifier, a site identifier and geographical information (latitude, longitude and elevation), species name, individual plant identifier, phenophase identifier and description, date of onset for each phenophase, and quality control flags. The second tab (site_data) details the number of years in record, and years missing, by site.
For observational data prior to 2009, phenophase onset dates (‘events’) were the only records submitted for each of the five phenophases on each individual plant in a given year. Beginning in 2009, status records could be submitted for each phenophase on each individual plant15. For temporal consistency in this dataset, we converted status data for the period 2009–2014 to onset dates by using the date of the first ‘Yes’ record of the calendar year for each phenophase on each individual plant. Where one or more ‘No’ records precede this first ‘Yes’ record, the number of days since the last prior ‘No’ is also provided, from which the uncertainty in the onset date can be quantified. Because this information can be derived only from status data, it is available for roughly 5% of the records in the dataset.
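The status-to-onset conversion described above can be sketched as follows. This is a minimal illustration, not the USA-NPN production code: it assumes the status records for one plant, one phenophase and one calendar year have already been grouped, that each record is a hypothetical (date, status) pair with True for ‘Yes’ and False for ‘No’, and that same-day conflicting records have already been removed.

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def status_to_onset(
    records: list[tuple[date, bool]],
) -> Optional[tuple[date, Optional[int]]]:
    """Return (onset_date, days_since_prior_no) for one plant/phenophase/year.

    records: hypothetical (observation date, status) pairs; status is True
    for 'Yes' and False for 'No'. Returns None if no 'Yes' was reported.
    """
    prior_no: Optional[date] = None  # latest 'No' seen before the first 'Yes'
    for obs_date, is_yes in sorted(records):
        if is_yes:
            # Days since the last 'No'; None when no 'No' precedes the onset.
            gap = (obs_date - prior_no).days if prior_no is not None else None
            return obs_date, gap
        prior_no = obs_date
    return None
```

With the example from the Methods (open flowers first seen on April 6, last checked on April 3), `status_to_onset([(date(2012, 4, 3), False), (date(2012, 4, 6), True)])` returns the April 6 onset together with a 3-day gap, so the true onset lies somewhere in April 4 to 6.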
Although 2009 marked significant changes to the program (including the conversion from a paper to a digital database, and the conversion from ‘event-based’ to ‘status-based’ monitoring), the meaning of each phenophase definition remained unchanged from the Eastern program definitions, in terms of pinpointing an onset date. However, there are some differences to note between the Eastern and Western program definitions for a few phenophases. ‘First leaf’ was defined by occurrence in a single location on the plant in both programs until 2005, when it was changed to three locations in the Eastern program only. ‘First bloom’ was defined by occurrence of a single open flower on the plant in the Western program, and in the Eastern program by the ‘date when at least 50% of the flower clusters have at least one open flower’ for lilacs and the ‘date when about 5% of the flowers are open’ for honeysuckle. Although the differences in ‘First leaf’ definitions probably represent an insignificant difference in reported onset date, the differences in ‘First bloom’ definitions could potentially shift the reported onset of the phenological event by a few days.
After 2009, observers were also able to provide additional information, including the absence of a phenophase prior to the onset date (preceding ‘No’ records) and dates of phenophase presence after onset (subsequent ‘Yes’ records; e.g., indicating that open flowers were still present on the plant), as well as ancillary information about individual plants (e.g., shade status) and sites (e.g., degree of development). Ancillary data are described at http://www.usanpn.org/files/shared/files/USA-NPN_suppl_info_tech_info_sheet_v1.0.pdf. Ancillary data and raw status data (for the period 2009–2014) are available at https://www.usanpn.org/results/data.
Dataset identifiers are also provided to distinguish data from the Eastern, Western and Nature’s Notebook sets. The geographic division between the Eastern and Western programs runs through the Great Plains states, with the Dakotas, Nebraska, Kansas and Oklahoma in the Eastern program, and Texas in the Western program.
While information on monitoring frequency, observers and data management practices prior to 2009 is not available, the dataset is otherwise well-documented and has been proven reliable in several publications. Early work with the dataset explored the relationship between biological and climatological spring6,24,25. Further demonstration of the utility and validity of the lilac and honeysuckle dataset is found in the Spring Indices8. These models, developed using the dataset presented here, predict lilac leafing and blooming dates based on the strong relationship between day of year, heat accumulation, number of high-energy synoptic events and the timing of these phenological events8. Recent work has extended these indices across the continental United States, using both weather station data9 and gridded climate products26, and found strong relationships between Spring Indices (SI) predictions and the timing of phenological events for native species and crops (Fig. 4)9.
Minor cleaning of the Eastern and Western program data was conducted by the authors in 2013, and included the removal of duplicate records (4 records), those with null location information (35 records), and those with phenophase onset dates reported in an implausible order (72 records). For the publication of this dataset, we additionally excluded onset dates where status records conflicted (21 records; i.e., a ‘Yes’ and a ‘No’ status were reported on the same day; these flagged records are available via download at https://www.usanpn.org/results/data).
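The conflicting-status check can be illustrated with a short sketch. It assumes, hypothetically, that raw status records are tuples of (plant identifier, phenophase identifier, observation date, status); any onset date that falls on one of the returned keys would then be excluded.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def conflicting_days(
    records: list[tuple[str, str, date, bool]],
) -> set[tuple[str, str, date]]:
    """Return (plant_id, phenophase_id, date) keys with both 'Yes' and 'No'.

    records: hypothetical (plant_id, phenophase_id, obs_date, is_yes) tuples.
    """
    statuses = defaultdict(set)  # key -> set of statuses reported that day
    for plant_id, phenophase_id, obs_date, is_yes in records:
        statuses[(plant_id, phenophase_id, obs_date)].add(is_yes)
    # A key is in conflict when both True ('Yes') and False ('No') occur.
    return {key for key, seen in statuses.items() if len(seen) == 2}
```

Keeping the check at the level of individual plant, phenophase and day mirrors the definition in the text: a conflict is a ‘Yes’ and a ‘No’ for the same phenophase on the same plant on the same date.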
In the field, in natural or managed ecosystems and for a variety of plant species, it is possible to have multiple onsets of the same phenophase on an individual plant in a calendar year (e.g., a second flush of breaking leaf buds after a killing frost, or a second round of flowering in the fall). These multiple onsets are detected in the status data when the first series of consecutive ‘Yes’ records is followed by at least one ‘No’ record and then by a subsequent ‘Yes’ record. The second onset date is the date of this subsequent ‘Yes’ record. A flag field (‘Multiple_FirstY’) in the data file indicates those cases where more than one onset for a phenophase occurs in a calendar year; this occurs in less than 1% of the records in the data file.
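The multiple-onset rule above can be expressed as a scan over the time-ordered status series. This is a sketch under the same hypothetical (date, status) representation as before, for a single plant, phenophase and calendar year; the ‘Multiple_FirstY’ flag corresponds to the result having more than one element.

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def find_onsets(records: list[tuple[date, bool]]) -> list[date]:
    """Return every onset date for one plant/phenophase/calendar year.

    A 'Yes' counts as an onset when it is the first record of the year or
    when the preceding record was a 'No' (i.e., a run of 'Yes' records
    ended and a new one began).
    """
    onsets: list[date] = []
    previous: Optional[bool] = None  # status of the previous record
    for obs_date, is_yes in sorted(records):
        if is_yes and previous is not True:
            onsets.append(obs_date)
        previous = is_yes
    return onsets
```

For a series such as No (20 March), Yes (25 March), Yes (28 March), No (2 April), Yes (15 September), the function returns the 25 March and 15 September onsets, so the record would carry the multiple-onset flag.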
Occasionally, at sites where several observers monitor an individual plant, multiple onsets are calculated from the raw status data as a result of different observers ‘leapfrogging’ over each other with slightly different phenophase interpretations. For example, one observer might report a ‘Yes’ for ‘Open flowers’ when they judge that conditions on an individual lilac have just passed the threshold for occurrence. On a subsequent day, another observer might report ‘No’ if they judge that conditions on the same individual have not quite passed the threshold for occurrence. If the first observer subsequently reports a status of ‘Yes’, there will appear to be two distinct onsets. Where these likely false onsets occur in the data (i.e., where associated ‘Multiple_FirstY’ records are separated by very short time periods), it is reasonable to consider the first onset of the calendar year as the true onset, unless quality control flags indicate that the first onset is implausible. Groups are encouraged to resolve these issues, and the raw status data can be explored for comments from observers to confirm the occurrence of true multiple onsets within a calendar year.
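One way to apply this guidance is sketched below. The text does not define "very short time periods", so the `max_gap_days` threshold is a hypothetical parameter chosen only for illustration, and the set of implausible dates stands in for whatever quality control flags a user chooses to trust.

```python
from __future__ import annotations

from datetime import date
from typing import Iterable, Optional


def likely_leapfrog_pairs(
    onsets: Iterable[date], max_gap_days: int = 7
) -> list[tuple[date, date]]:
    """Consecutive onsets closer than a hypothetical max_gap_days threshold."""
    ordered = sorted(onsets)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if (b - a).days <= max_gap_days
    ]


def choose_true_onset(
    onsets: Iterable[date], implausible: frozenset = frozenset()
) -> Optional[date]:
    """First onset of the year that is not flagged as implausible."""
    for onset in sorted(onsets):
        if onset not in implausible:
            return onset
    return None
```

Flagged pairs are candidates for inspection rather than automatic deletion: a short gap is consistent with leapfrogging, but observer comments in the raw status data are still the best way to confirm or reject a second onset.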
We identified outliers in phenophase onset dates for individual plants, across the period of record, using Tukey boxplots27. We flagged records with onset day of year values more than 1.5 times the interquartile range above the third quartile or below the first quartile for each individual. We excluded individual plants with fewer than 10 years of data, and used the first onset of the calendar year in cases where there were multiple first ‘Yes’ records. Of 67,068 tested records (57% of the dataset), 0.88% were flagged as unusually early (‘Individual_Outlier’ of −1 in the dataset) and 0.58% were flagged as unusually late (‘Individual_Outlier’ of 1).
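The per-individual Tukey check can be sketched for one plant and one phenophase as below. The input maps year to the onset day of year (first onset only). Quartile conventions differ between software packages and the text does not state which one was used, so the choice of `statistics.quantiles(..., method="inclusive")` here is an assumption; borderline records may be flagged differently under another convention.

```python
from __future__ import annotations

import statistics


def tukey_flags(
    doy_by_year: dict[int, float], min_years: int = 10, k: float = 1.5
) -> dict[int, int]:
    """Return {year: -1 (early) | 0 | 1 (late)} for one plant and phenophase.

    Plants with fewer than min_years of data are not tested ({} is returned).
    """
    if len(doy_by_year) < min_years:
        return {}
    q1, _, q3 = statistics.quantiles(
        list(doy_by_year.values()), n=4, method="inclusive"
    )
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr  # Tukey fences
    return {
        year: (-1 if doy < low else 1 if doy > high else 0)
        for year, doy in doy_by_year.items()
    }
```

The −1/0/1 coding matches the ‘Individual_Outlier’ values described above, which makes it straightforward to compare a re-run of the check against the published flags.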
We applied a more complex outlier analysis28 for the dataset as a whole, to identify unusually early or late phenological events, based on the location of the observed plant and the interannual variation in climatological conditions at that location. This analysis relies on the assumption that plant phenology is influenced by environmental conditions, and therefore the variability of phenological observations made under similar environmental conditions must be relatively small (given unknown variation in microclimate, or genetic variation among organisms).
To apply this quality-control check, we used DAYMET, which is the highest spatial resolution gridded dataset freely available for the United States. Because DAYMET is available only from 1980 onward and was not yet available for 2014 at the time of this analysis, we were able to apply this check to 39,791 records (34% of the dataset); this quality check could be applied to earlier data if additional fine-scale climatological data become available. For each observation year and location where observations were made (including all multiple first ‘Yes’ dates), we calculated cumulative values from 1 January to the phenophase onset date (expressed as day of year; DOY) for the following meteorological parameters: surface minimum and maximum temperatures, precipitation, humidity, shortwave radiation, snow water equivalent, and day length. Daily average temperatures were calculated from the daily maximum and minimum temperatures and then accumulated in the same way. To this set of meteorological data, we added the latitude, longitude and elevation of each observation. Then, we applied the t-SNE dimensionality reduction algorithm29 to project the climatological and geographic variables into a two-dimensional space, which allows visualization and interpretation of the results. After that, we applied model-based clustering30 to group the observations into sets made under similar environmental conditions. Finally, the Tukey boxplot27 was applied to highlight the outliers present in each of the clusters. These outliers (with values more than 1.5 times the interquartile range of each cluster above its third quartile or below its first quartile) are highlighted as inconsistent observations. We flagged 0.74% of the records as unusually early (‘Inconsistency_Flag’ of −1 in the dataset) and 1.35% of the records as unusually late (‘Inconsistency_Flag’ of 1). The impact of inclusion or exclusion of these observations has yet to be explored.
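The accumulation step that builds the climatological features can be sketched as follows. The variable names (`tmin`, `tmax`, `prcp`) are assumed short names for the daily series, the day-of-year indexing (index 0 is 1 January) is an assumption, and whether the onset day itself is included in the sum is not stated in the text, so this sketch includes it. The later t-SNE projection, model-based clustering and per-cluster Tukey steps are not shown; the per-cluster step applies the same fence rule described for individual plants.

```python
from __future__ import annotations


def accumulate_to_onset(
    daily: dict[str, list[float]], onset_doy: int
) -> dict[str, float]:
    """Sum each daily variable from 1 January (DOY 1) through onset_doy.

    daily: hypothetical mapping of variable name to daily values for one
    location and year, where index 0 is 1 January. The onset day is
    included in the sum (an assumption).
    """
    totals = {name: sum(values[:onset_doy]) for name, values in daily.items()}
    if "tmin" in daily and "tmax" in daily:
        # Daily average temperature, then accumulated like the other variables.
        tavg = [(lo + hi) / 2 for lo, hi in zip(daily["tmin"], daily["tmax"])]
        totals["tavg"] = sum(tavg[:onset_doy])
    return totals
```

Each resulting dictionary, extended with the latitude, longitude and elevation of the observation, would form one row of the feature table passed to the dimensionality reduction step.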
The raw status data used to produce this dataset, and information about the Nature’s Notebook sites, plants and observers is housed in the National Phenology Database and is available for download via the USA-NPN website (https://www.usanpn.org/results/data). Lilac observations submitted after 2014, as well as honeysuckle fruiting data from the Western program for the period 1968–2009, can also be downloaded from this website.
How to cite this article: Rosemartin, A. H. et al. Lilac and honeysuckle phenology data 1956–2014. Sci. Data 2:150038 doi: 10.1038/sdata.2015.38 (2015).
USA National Phenology Network ScienceBase https://www.sciencebase.gov/catalog/item/5499b905e4b093dfafda3575 (2015)
We appreciate the many contributors to this dataset, both for the contemporary Nature’s Notebook program and as part of prior programs. The participation of individuals from the National Weather Service Cooperative Observer network made the depth and breadth of the dataset possible. The development of this dataset was supported by Cooperative Agreements G09AC00310 and G14AC00405 from the United States Geological Survey to the University of Arizona. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We also wish to acknowledge the multiple curators and phenological researchers who have contributed to this dataset over the years, especially: Byron O. Blair, Joseph M. Caprio, Dan Cayan, W. L. Colville, Mike Dettinger, Pierre Dubé, Charles Holetich, Richard J. Hopp, William Kennard, Helmut Lieth, Leonard Perry, Morrie T. Vittum, and Robert Wakefield. We are grateful to Katharine Gerst for assistance with outlier analyses, Theresa Crimmins for assistance with figures, and to Harold Shanafield (Oak Ridge National Lab) for contributions to data integration. We dedicate this manuscript to the memory of Joseph M. Caprio (1923–2011; https://www.usanpn.org/mem-caprio).
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0. Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.