The Ontario Climate Data Portal, a user-friendly portal of Ontario-specific climate projections

An easily accessible climate data portal, http://yorku.ca/ocdp, was developed and officially launched in 2018 to disseminate a super ensemble of high-resolution regional climate change projections for the province of Ontario, Canada. The spatial resolution is ~10 km × ~10 km and temporal resolution is one day, UTC. The data covers 120 years from 1981 to 2100. This user-friendly portal provides users with thousands of static and interactive maps, decadal variation trend lines, summary tables, reports and terabytes of bias-corrected downscaled data. The data portal was generated with an emphasis on interactive visualization of climate change information for researchers and the public to understand to what extent climate could change locally under different emission scenarios in the future. This paper presents an introduction to the portal structure and functions, the large extent of the datasets available and the data development methodology.


Introduction
The impacts of climate change are global in scope and unprecedented in scale. Governments and enterprises are making plans to mitigate and adapt to the changing climate. It has become critically important for researchers and policy makers to have easy access to climate change information at the right spatial and temporal scales for their studies and practices. Many users do not have the necessary expertise to collate the information into a form that meets their needs 1 . Since 2013, we have developed several versions of the data portal to disseminate Ontario-specific climate change data. We have been updating the portal with our newest results and improving the presentation and content based on feedback from users. In general, larger ensembles generate more robust conclusions. Conclusions based on a small ensemble of climate projections usually contain uncertainties that cannot be ignored and are prone to errors caused by imperfections in models and downscaling methods 2 . Therefore, in addition to our own data, the limited number of other data sources available for climate change projections for the Province of Ontario, Canada, are also used to develop the latest version of the data portal. Because climate data and projections are critical inputs for any climate change related risk/vulnerability assessment, the user-friendly Ontario Climate Data Portal (OCDP), with its robust Ontario-specific data, is intended to better support practitioners' climate change related risk/vulnerability assessments.
An effective web data portal for climate change will significantly facilitate the dissemination of information. For example, web data visualization provides a powerful tool for presenting climate change information, which makes abstract climate change information easier to understand; a web-enabled data warehouse also provides faster and easier access for different users. The web is now widely used to disseminate climate change information. Examples include the World Climate Research Program's Coupled Model Intercomparison Project Phase 5 (CMIP5) 3 , the National Oceanic and Atmospheric Administration (NOAA) 4 , the National Centers for Environmental Prediction (NCEP) 5 , the European Centre for Medium-Range Weather Forecasts (ECMWF) 6 , the Pacific Climate Impacts Consortium (PCIC) Archive Downscaled GCMS Portal 7 , the Coordinated Regional Climate Downscaling Experiment (CORDEX) data portal 8 , the Environment and Climate Change Canada (ECCC) data portal 9 and the Climate Change Data Portal (CCDP 10 ). This paper describes the latest version of our data portal, the OCDP. In this study, we use the IPCC endorsed multiple model/scenario approach to address uncertainties in future projections. We developed a super ensemble (collectively 209 members) of high resolution (~10 km × ~10 km) regional climate projections for Ontario based on all available and credible data sources at the time this study occurred; the source data for the super ensemble cover the entire Province of Ontario and are generated by credible developers using methods published in peer-reviewed journals. These projections were developed based on the IPCC Fifth Assessment Report (AR5) data under all four Representative Concentration Pathways (RCPs), namely RCP 2.6, RCP 4.5, RCP 6.0 and RCP 8.5 11 ; and this super ensemble serves as a common set of the most current and relevant projections. 
It will be used in future climate change related research and practices, and therefore will greatly improve the consistency and comparability among climate risk assessments for Ontario. To further improve data communication and access, the OCDP is developed with intuitive visualization for the general public and policy makers, and extensive data (over 10 TB) downloadable for climate academia and practitioners. Because the sizes of data linked to most web pages are large, the primary target users are desktop users. OCDP differs from other available portals in the following aspects: (1) it disseminates a set of super ensemble projections which permits addressing the uncertainty in assessments; (2) projections are based on a super ensemble (209 members) which combines members generated by both statistical and dynamical downscaling; (3) the dynamical downscaling components (42 members from 7 RCMs) in our projections are expected to better account for the impacts from local geophysical features (such as the Great Lakes, and Niagara Escarpment), which are critical to Ontario's local climate.

Results
OCDP contains five major components: Introduction, Maps, Time series, Data and Documents that cover different types of climate change information for Ontario. The Introduction page provides brief descriptions of methodologies and data sources used for generating the super ensemble of Ontario-specific climate change projections. We next describe the remaining four components of OCDP: Maps, Time series, Data and Documents (Fig. 1).
Maps. The Maps component includes both static and interactive climate maps over Ontario (left column of Fig. 1). These maps are based on summarized data over the Province (represented by 8964 grid points at ~10 km
Data for downloading. Many scientists and climate change adaptation policy makers need more detailed information and data to carry out additional analyses on their own. To meet this increasing demand, the portal provides more than 10 terabytes (TB) of data, including daily, monthly, annual and long-term averages of minimum, maximum and average temperatures and precipitation from each model under all RCPs; annual values of extreme climate indices are provided as well. All of these data are downscaled and bias-corrected so that users can use them directly in their analyses and practices (e.g. climate change impact assessment for specific sectors). We also provide simple code in popular programming languages such as Python, Matlab and R for reading the data available from OCDP. Unless otherwise specified, temperature and temperature change are given in °C, precipitation in mm, and precipitation change in %.

Documents.
In the Documents section, we present a description for each section of OCDP, a large number of summary tables for Ontario and some special reports. The description clarifies how to use the OCDP. The summary tables include a table for Ontario averages, 50 tables for census divisions and 151 tables for municipalities. Each table provides changes of the four basic variables and the 38 climate indices for the 2050s and 2080s under the 4 RCPs. Values in the tables for Provincial and regional averages are spatial averages over the grids within Ontario and each region, respectively. Values in the tables for municipalities are changes at the corresponding weather stations. In addition, we have created many specific reports to answer some common questions from the public and government agencies, and some examples of application of data from the portal. Examples include: "What are the projected temperature and precipitation over the Great Lakes Basin?"; "What will be the trend in frost free days in the future?"; and "How will climate change impact future building code updates?", etc.
Query tools and usage statistics. As shown in Fig. 1, the collapsible tree menu provides an easy way for users to reach different pages/data on the OCDP. Another query tool is a free version of the Google custom search engine on the OCDP. We input a web address and 3-5 key words for each page into the engine database. Thus, the customized engine can quickly find our pages on the OCDP. Google analytics is a valuable tool for monitoring web site usage. We have been using this tool to monitor usage of OCDP. OCDP has been under development since 2016 and officially published online on June 1, 2018. Since then, we have had 9,051 unique visitors access the site about 51,269 times. We have been constantly improving the portal based on feedback from visitors. www.nature.com/scientificdata www.nature.com/scientificdata/

Discussion
OCDP provides rich Ontario-specific climate change information; it has helped and will continue to help policy makers, researchers and the public to better understand the local climate impacts in Ontario under global warming. It will also become a valuable open repository of high resolution regional and local climate change data for academic users and other climate change practitioners. For example, the Environmental Commissioner of Ontario 17 has referred to many of the results in a report. Some local governmental agencies such as the Haliburton, Kawartha, Pine Ridge District Health Unit 18 and the Simcoe Muskoka District Health Unit 19 have used the OCDP data for their health risk and vulnerability assessments; and City of Orangeville 20 has used the OCDP projections to guide its climate change policy and adaptation plan development. The ready-to-use summary tables for policy makers at provincial, sub-region and municipal levels will be very helpful for users with limited resources or capacity to derive climate change information from raw data. The maps and time series figures will help users to understand the spatial and temporal distribution of climate change. The targeted users of this portal are primarily practitioners and academic researchers who most commonly access OCDP from desktop computers; therefore, we have designed web pages to be as compact as possible. To provide rich information on each page, the interactive pages are generally linked to one or several large datasets, which are automatically loaded to the user's device and displayed by the user's browser. However, this design has limitations for mobile phone users: the interactive maps load slowly and the interactive table screen becomes untidy. Therefore, we have blocked interactive maps for mobile users at present. We plan to improve the OCDP in future to make it more platform-friendly for smart phone users. 
Some future improvements for mobile device users include, but are not limited to, (1) providing an option for these users to turn on or off the interactive content on their devices; and (2) making the display less screen-size dependent.
Recently, ECMWF published a new generation reanalysis ERA5 21 to replace the ERA-Interim reanalysis, which stopped being produced on 31 August 2019. The strength of ERA5 over ERA-Interim has been demonstrated in some comparison studies 22,23 . It is therefore necessary to update ERA-Interim with the ERA5 in the next version of the OCDP. It is expected that the AR6 will be published in 2022 and the CMIP6 data will be available shortly thereafter when we will begin updating the portal with this new information.
The data portal is designed to disseminate Ontario-specific climate change data, and targeted users include climate impact researchers, government policy makers and other stakeholders who are interested in climate change and have some underlying knowledge of climate systems. Consequently, the design of the data portal follows other professional data portals focusing on climate change information presentation. We have endeavored to prepare and present data using standard formats to facilitate further analysis. We anticipate that some of the products from the portal may still be difficult to understand for users who do not have basic climate change knowledge, but we strive to continue improving the portal to expand its use to as wide a range of users as possible. Examples include describing SI energy equivalents using more straightforward indicators, such as water level change, growth in ice thickness or change in the duration of periods such as snow cover. At present, the portal focuses on future climate change projections, but some users are interested in historical climate change, so we are planning to include historical analyses of climate change information as well.

Methods
Data sources. The super ensemble of climate projections was developed using multiple credible sources of data, including conventional weather station observations, comprehensive reanalysis, and statistical and dynamical downscaled data. Brief descriptions of each of these data sources are provided in the following sub-sections.
Observations at conventional weather stations in Ontario. Daily 15-year complete precipitation data (0 day with missing data) and 101 stations that have at least 15-year complete temperature data. In this study, following the World Meteorological Organization (WMO) "3 and 5 rule" (https://climate.weather.gc.ca/glossary_e.html), we consider a year as having complete data if both the total number of days with missing precipitation and the total number of days with missing temperature are less than or equal to 36 (i.e. 3/month × 12 months) in that year; 151 stations meet this requirement. These observations are used to validate the modelled data and correct biases for each of the 151 municipalities. The reason for extending the data from the 20-year standard reference period (1986-2005) to 25 years is to boost the robustness of bias correction, because there are usually not enough complete data in the 20-year period for effective use in bias correction. The 15-year criterion may sound short, but it still complies with WMO rules (https://climate.weather.gc.ca/glossary_e.html) and is the most appropriate for Ontario based on data availability.
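The completeness rule above can be illustrated with a short Python sketch. It is not the authors' code: the function names and the record layout (one tuple per calendar day, with None marking a missing value) are assumptions made for illustration, and only the thresholds (36 missing days per year, 15 complete years) come from the text.

```python
from collections import defaultdict

MAX_MISSING_DAYS = 36    # 3 per month x 12 months, as stated in the text
MIN_COMPLETE_YEARS = 15  # minimum number of complete years per station


def complete_years(daily_records):
    """Return the sorted list of years that count as complete.

    daily_records: iterable of (year, precip, temp) tuples, one per calendar
    day, where None marks a missing value (hypothetical layout). Days that
    are absent from the input altogether are not counted as missing here.
    """
    missing_precip = defaultdict(int)
    missing_temp = defaultdict(int)
    years = set()
    for year, precip, temp in daily_records:
        years.add(year)
        if precip is None:
            missing_precip[year] += 1
        if temp is None:
            missing_temp[year] += 1
    # Both variables must satisfy the threshold in the same year.
    return sorted(
        y for y in years
        if missing_precip[y] <= MAX_MISSING_DAYS
        and missing_temp[y] <= MAX_MISSING_DAYS
    )


def station_qualifies(daily_records):
    """True if the station has at least MIN_COMPLETE_YEARS complete years."""
    return len(complete_years(daily_records)) >= MIN_COMPLETE_YEARS
```

A station would be screened by calling station_qualifies on its full daily record; complete_years additionally shows which years would be usable.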
Comprehensive reanalysis data. The critical information needed in our downscaling model development is comprehensive reanalysis data. Two major advantages of comprehensive reanalysis data are that (1) it is of high resolution, and (2) it accounts for multiple sources of meteorological/climatological data (as opposed to only observations at conventional weather stations) via assimilation algorithms [24][25][26] ; these advantages are more pronounced in vast regions like Northern Ontario, which has a very sparse network of conventional weather stations. There are currently multiple sources of high resolution daily reanalysis datasets that cover the entire province, including the ECMWF reanalysis climate data 24 , the NCEP North America Regional Reanalysis (NARR) 25 and the NCEP Climate Forecast System Reanalysis (CFSR) 26 . The ECMWF ReAnalysis (ERA) data, ERA-Interim, is a widely used reanalysis product, which provides surface data in 11 different resolutions. In this study, we used the 0.125° grid reanalysis data, which is the highest spatial resolution provided by the ECMWF web applications server.
Previous studies show that this dataset fits observational data very well in Ontario and is suitable for Ontario climate change studies 27,28 . The daily data in this study is calculated based on 6-hourly average temperature and 12-hourly precipitation, and minimum and maximum temperature from ERA-interim reanalysis data. (162) Bias Correction/Constructed Analogues with Quantile mapping reordering (BCCAQ) 30 . The PCIC data covers only three of the four IPCC RCPs: RCP2.6, RCP4.5 and RCP8.5. The LAMPS daily temperatures and precipitation are downscaled from GCMs to the ERA-Interim grid points (0.125° Lon × 0.125° Lat) using a combination of localized ensemble optimal interpolation (EnOI) and bias correction 28 . The LAMPS data cover all four IPCC RCPs (96 members).

Dynamically downscaled datasets.
To better account for the impacts of local geophysical features (e.g. the Great Lakes and the Niagara Escarpment) on Ontario's climate, our super ensemble includes 47 dynamically downscaled projections. Because dynamical downscaling is computationally expensive, research institutes often run only a few models under one or two scenarios. These dynamically downscaled members are provided by the North America node of the Coordinated Regional Downscaling Experiment (NA-CORDEX, 22 members) 31 , University of Toronto (UofT, 16 members) 32 and the University of Regina (CCDP, 9 members) 33 . The NA-CORDEX data archive contains outputs from regional climate model (RCM) runs over a domain covering most of North America using boundary conditions from global climate model (GCM) simulations in the CMIP5 archive. These simulations run from 1950 to 2100 with a spatial resolution of 0.22° (~25 km) or 0.44° (~50 km). Temperature and precipitation at daily and longer time scales are available 31,34 . The UofT members cover the entire Province of Ontario and the Great Lakes Basin and are generated using the US Weather Research and Forecasting (WRF) model driven by different GCM simulations in the CMIP5 archive. UofT has results for the RCP8.5 scenario only 32 . The data from the University of Regina are generated using the Regional Climate Model system (RegCM) and another model system, PRECIS (Providing Regional Climates for Impacts Studies), at a resolution of 25 km under the RCP4.5 and RCP8.5 emissions scenarios, driven by boundary conditions from the CMIP5 archive 33 . These studies show their methods are valid in downscaling temperature and precipitation for Ontario [28][29][30][31][32][33] .
Ultimately, our super ensemble is developed using the 209 members from the five credible academic institutes, including 40 (19.1%), 64 (30.6%), 15 (7.2%) and 90 (43.1%) members for RCP 2.6, 4.5, 6.0 and 8.5, respectively (Fig. 3). This is by no means a simple combining procedure. To facilitate comparison and analysis of the ensemble data, all downscaled data are re-gridded to the common high-resolution (0.125°) ERA-Interim grid points and bias-corrected with the Quantile-Quantile mapping (QQ-mapping) method 35,36 . QQ-mapping equates the cumulative distribution functions of observed data and downscaled data in the reference period. Biases in the downscaled data are then corrected under the assumption that this relationship also applies to the future downscaled data. For grid data, we assume the reanalysis variables represent observations, and correct the bias grid by grid, variable by variable. For station data, we use available observations within the 25-year window (1981-2005) and corresponding downscaled data within the same 25-year window to construct the relationship, and then correct biases in downscaled variables with this relationship. The downscaled station data are interpolated from grid data by the K-Nearest-Neighbors (KNN) algorithm 37 . Strict quality control is carried out to guarantee the integrity of the data (Fig. 4), which will be described in the following sub-section.
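As an illustration of the QQ-mapping idea, the following Python sketch maps each downscaled value to the observed value at the same empirical quantile of the reference period. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the empirical quantiles use linear interpolation between ranks, and the handling of values below the reference range is an assumed mirror of the rule the Methods give for values above it (the exceedance is added to the highest observed reference value).

```python
from bisect import bisect_right


def _probability(sorted_values, x):
    """Interpolated non-exceedance probability of x in a sorted sample."""
    n = len(sorted_values)
    if x <= sorted_values[0]:
        return 0.0
    if x >= sorted_values[-1]:
        return 1.0
    hi = bisect_right(sorted_values, x)  # first index whose value is > x
    lo = hi - 1
    frac = (x - sorted_values[lo]) / (sorted_values[hi] - sorted_values[lo])
    return (lo + frac) / (n - 1)


def _quantile(sorted_values, p):
    """Linearly interpolated quantile for a probability p in [0, 1]."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def qq_map(obs_ref, model_ref, model_values):
    """Bias-correct model_values using reference-period samples.

    obs_ref:      observed (or reanalysis) values in the reference period
    model_ref:    downscaled values in the same reference period
    model_values: downscaled values to correct (reference or future)
    """
    obs_sorted = sorted(obs_ref)
    mod_sorted = sorted(model_ref)
    corrected = []
    for x in model_values:
        if x > mod_sorted[-1]:
            # Above the reference range: add the exceedance to the highest
            # observed reference value (rule stated in the Methods).
            corrected.append(obs_sorted[-1] + (x - mod_sorted[-1]))
        elif x < mod_sorted[0]:
            # Below the reference range: mirrored rule (an assumption;
            # it could yield negative precipitation and would need a floor).
            corrected.append(obs_sorted[0] + (x - mod_sorted[0]))
        else:
            corrected.append(_quantile(obs_sorted, _probability(mod_sorted, x)))
    return corrected
```

In this sketch the same function serves grid points and stations; only the source of obs_ref differs (reanalysis values for grids, station observations for municipalities).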
Data re-gridding and bias correction. In general, the following bias correction methods are used to adjust downscaled results: (1) linear transformation, (2) local intensity scaling (LOCI), (3) power transformation, (4) variance scaling, (5) distribution mapping and (6) the delta-change approach. In order to reduce model biases in frequency and amplitude, we apply different bias correction methods to different downscaled variables, i.e. temperatures (Tm, Tx and Tn) and precipitation data. For long term (monthly or longer time scale) averages, the linear scaling method (LS) [38][39][40] and LOCI 41 are used. These simple methods aim to perfectly match the average monthly mean of corrected values with that of observed ones within the reference periods. We then employ this relationship to correct downscaled future data. For daily data, we use the QQ-mapping method to adjust the amplitude and frequency 35,36 . Since QQ-mapping equates cumulative functions of observed data and downscaled data in the reference period, it has the benefit of accounting for biases in all statistical moments of downscaled temperature and precipitation 42 .
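The linear scaling idea can be sketched in a few lines of Python. The text does not give the exact formulas, so this sketch assumes the common convention of an additive monthly offset for temperature and a multiplicative monthly factor for precipitation; the function names and argument layout are hypothetical.

```python
from statistics import mean


def monthly_means(months, values):
    """Mean of the values falling in each calendar month (1-12)."""
    groups = {}
    for month, value in zip(months, values):
        groups.setdefault(month, []).append(value)
    return {month: mean(vals) for month, vals in groups.items()}


def linear_scaling(ref_months, obs_ref, model_ref,
                   target_months, model_target, multiplicative=False):
    """Correct model_target so that reference monthly means match observations.

    multiplicative=False: additive offset (assumed here for temperature).
    multiplicative=True:  scaling factor (assumed here for precipitation);
                          the model monthly mean must be non-zero.
    """
    obs_mean = monthly_means(ref_months, obs_ref)
    mod_mean = monthly_means(ref_months, model_ref)
    corrected = []
    for month, value in zip(target_months, model_target):
        if multiplicative:
            corrected.append(value * obs_mean[month] / mod_mean[month])
        else:
            corrected.append(value + (obs_mean[month] - mod_mean[month]))
    return corrected
```

Applying the function to the reference-period model data itself reproduces the observed monthly means, which is the matching property described above; applying it to future data carries the same offsets or factors forward.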
We chose the ERA-Interim ~10 km (0.125°) resolution grid network as the common grid network for the projections because more than 85% of our data are close to the grid points (i.e., LAMPS data are on the grid points and PCIC and UofT data are very close to the grid points). We used the simple K-Nearest-Neighbors (KNN) algorithm (K = 4 for this study) 37 to re-grid all of the data to the common grid network and then carried out bias correction using the QQ-mapping method 35,36 . After bias correction, the total wet days and the total wet day precipitation of the downscaled data exactly equal those in the ERA-Interim data for the historical period (1986-2005). For daily extreme temperatures (minimum and maximum), we correct biases in the differences between the maximum/minimum and the average, then add the corrected differences to the corrected averages to get the full values. This method avoids unrealistic situations in which the minimum (maximum) temperature is higher (lower) than the average temperature as a result of applying QQ-mapping directly to the minimum (maximum) temperature. When downscaled values in the future are beyond the range of downscaled data for the reference period, a constant transfer function beyond the highest observed quantile is assumed 43,44 . The difference between the future downscaled value and the maximum value for the reference period is directly added to the highest observed value for the reference period.
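The re-gridding step can be illustrated with a small K-Nearest-Neighbors sketch in Python with K = 4. The text does not say how the four neighbours are combined or how distance is measured, so this sketch assumes an unweighted mean and plain Euclidean distance in longitude/latitude degrees; the function name and data layout are hypothetical.

```python
import heapq


def knn_regrid(src_points, src_values, target_points, k=4):
    """Estimate a value at each target point from its k nearest source points.

    src_points:    list of (lon, lat) tuples of the source grid
    src_values:    list of values at the source points (same order)
    target_points: list of (lon, lat) tuples of the common grid
    Neighbours are combined with an unweighted mean (an assumption).
    """
    result = []
    for tlon, tlat in target_points:
        # Indices of the k source points with the smallest squared distance.
        nearest = heapq.nsmallest(
            k,
            range(len(src_points)),
            key=lambda i: (src_points[i][0] - tlon) ** 2
            + (src_points[i][1] - tlat) ** 2,
        )
        result.append(sum(src_values[i] for i in nearest) / len(nearest))
    return result
```

A brute-force search like this is adequate for illustration; for thousands of grid points and long daily series, a spatial index built once per source grid would be the more practical choice.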

Indices calculations.
To characterize climate at a location, variables describing averages of weather, as well as indices describing other aspects of weather patterns such as anomalous, rare and extreme events, are essential 45,46 . These climate indices allow a statistical study of variations of the dependent climatological variables, including analysis and comparison of time series, means, extremes and trends. Therefore, annual, seasonal and monthly averages and 38 climate indices based on the four basic variables from each simulation are calculated at each of the 8964 grid points. The climate indices include the commonly used 27 core climate indices defined by the Expert Team (ET) on Climate Change Detection and Indices (ETCCDI) 45,46 and 11 additional indices defined by Canadian researchers. We provide a table for the definition of these indices (http://lamps.math.yorku.ca/OntarioClimate/index_app_data.htm#/indexDefinationTable). Downscaling models are generally evaluated with data for historical periods 47,48 . It is difficult to do effective validation on individual model results after bias correction based on the relationship between data utilized for the historical period. Fortunately, a large ensemble of multiple model results helps to reduce these concerns. The final product from this project, posted on OCDP, represents a common set of probabilistic projections of both long-term averages and extreme indices at a spatial resolution of ~10 km and various temporal scales (annual, seasonal, monthly and daily) for Ontario, Canada. Projections are developed for all four IPCC RCP emission scenarios (RCP 8.5, 6.0, 4.5, and 2.6). Most practitioners, including policy makers, would like climate change information specific to their own regions or communities. For these users we have prepared spatially averaged data. 
This includes averages over the entire Province of Ontario, for each of the 50 census regions (http://lamps.math.yorku.ca/OntarioClimate/assets/Locations/locationMap_bk.html), for each of the 151 municipalities, and at each of the 8964 grid points across the province, which are available in the corresponding sections on OCDP.
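To make the notion of a climate index concrete, the sketch below computes four of the ETCCDI core indices from one year of daily data at a single grid point, using the standard ETCCDI thresholds (frost days: minimum temperature below 0 °C; summer days: maximum temperature above 25 °C; Rx1day: largest one-day precipitation total; R10mm: days with at least 10 mm). The function names are illustrative, and the sketch is not taken from the portal's own code.

```python
def frost_days(tmin):
    """FD: number of days with daily minimum temperature below 0 degC."""
    return sum(1 for t in tmin if t < 0.0)


def summer_days(tmax):
    """SU: number of days with daily maximum temperature above 25 degC."""
    return sum(1 for t in tmax if t > 25.0)


def rx1day(precip):
    """Rx1day: maximum one-day precipitation total (mm)."""
    return max(precip)


def r10mm(precip):
    """R10mm: number of days with precipitation of at least 10 mm."""
    return sum(1 for p in precip if p >= 10.0)


def annual_indices(tmin, tmax, precip):
    """Collect the four indices for one year of daily values."""
    return {
        "FD": frost_days(tmin),
        "SU": summer_days(tmax),
        "Rx1day": rx1day(precip),
        "R10mm": r10mm(precip),
    }
```

Repeating such a calculation for every year, ensemble member and grid point would yield the annual index series from which averages and percentile ranges are derived.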
The values in the tables posted in the Document/Factsheet section on OCDP are determined following these steps: (1) calculate annual values for each ensemble member (model run); (2) calculate the temporal averages for 1990s, 2050s and 2080s, respectively; (3) average over the runs for each of the four RCPs and estimate the 50th percentile and the likely range (5-95th percentile range) of the values. For the 2050s and 2080s, only changes relative to the reference period are presented.
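The three steps above can be sketched in Python for a single variable and a single RCP. This is a minimal sketch under stated assumptions: the function names and the input layout (one dictionary of annual values per ensemble member) are hypothetical, the percentiles use linear interpolation between ranks, and the change is computed as an absolute difference, whereas precipitation changes on the portal are reported in %. The year ranges that define each period are passed in as parameters because the text does not spell them out.

```python
from statistics import mean


def period_mean(annual_values, start_year, end_year):
    """Step 2: temporal average of annual values over [start_year, end_year]."""
    return mean(v for y, v in annual_values.items() if start_year <= y <= end_year)


def percentile(values, p):
    """Linearly interpolated percentile, with p in [0, 100]."""
    ordered = sorted(values)
    pos = (p / 100.0) * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def summarize_changes(members, ref_period, future_period):
    """Step 3: ensemble statistics of the change relative to the reference.

    members:       list of {year: annual value} dicts, one per model run
                   (step 1 is assumed to have produced these annual values)
    ref_period:    (start_year, end_year) of the reference period
    future_period: (start_year, end_year) of the future period
    """
    changes = [
        period_mean(m, *future_period) - period_mean(m, *ref_period)
        for m in members
    ]
    return {
        "mean": mean(changes),
        "p5": percentile(changes, 5),
        "p50": percentile(changes, 50),
        "p95": percentile(changes, 95),
    }
```

Calling summarize_changes once per RCP and per future period would fill one row of a summary table with the ensemble average, the median and the 5-95th percentile range.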
Data portal design. We use several advanced web development tools to organize our data and make them available online in various formats, i.e., maps, pictures, tables or files. The software includes the basic web development languages: hypertext markup language (HTML), cascading style sheets (CSS) and JavaScript. In addition, specialized libraries implement specific functions: highcharts.js and highmaps.js (https://www.highcharts.com/) are used for web data visualization; easyUI (https://www.jeasyui.com/index.php) is used for the tree menu; W3.css (https://www.w3schools.com/w3css/) and Bootstrap (https://getbootstrap.com/) are used for improving user experience (UX) design; and Angular (https://angular.io/) is used for routing and navigation. All of these programs are open-source and free for academic use.
One picture is worth a thousand words. Data visualization is the presentation of data in a graphical format 49 . The spatial distribution of climate variables and indices helps users understand not only a local climate change signal but also the change relative to neighbouring regions, because climate change is spatially correlated among neighbouring regions. Maps are the most effective tool for presenting the spatial distribution of climate change information. Therefore, in the OCDP, we currently provide about one thousand climate maps that offer ease of use for a wide range of users. The goal of the interactive maps is to show summarized climate change information at a specific grid point or location in a tooltip that appears when the user moves the mouse over the grid point. Since there are many variables for different periods, it is difficult to show all the information in one tooltip. In this portion of the portal, we show an ensemble mean accompanied by the corresponding 5-95th percentile range of the variable over the three periods: the reference (Ref) period, 2050s and 2080s.
The base maps are generated using the geographic information system ArcGIS 49 . Since the portal is designed with client-side web development language, the data should be sent to the user's computer to display; online data transmission speed is an important factor that affects UX. The major problem with interactive data portals driven by large volumes of data is data transmission from the portal server to the client's browser. We have designed our portal to minimize data transmission at the user's initial access to the portal.
In addition, since more than 85% of users of our earlier versions of OCDP were desktop users, higher priority is given to desktop users in the updated data portal. Some functions of the portal are blocked for mobile users to avoid unexpected costs due to inadvertently large data requests. We have stored summary tables in the widely used comma-separated values (.csv) format, and large raw data in Matlab data (.mat) format with efficient file compression to reduce file sizes. For the convenience of users, we provide sample code to read the downloaded data in three popular computer languages (Python, R and Matlab), found in the frequently asked questions (FAQ) section.

Data availability
The datasets of the projection ensemble have been deposited in public repositories. They can be found in In each sub dataset, there are many .csv files. See online documentation for more details about the dataset.

Code availability
All code associated with the data portal is open source and available on GitHub under the MIT license. We wrote the code in two popular languages: JavaScript for the formal OCDP website and Python to facilitate data availability and interpretation. The 12 Python programs are designed for data reading, plotting, mapping and exporting on a local server (https://github.com/LAMPSYORKU/OntarioClimateDataPortal/tree/master/pythonCode). All of these Python files have also been uploaded to the Figshare data repository 50 . Users can clone or download the repository to their local machine and run the Python programs in the Jupyter notebook environment to manipulate the data. HTML5 and JavaScript files for the formal data portal can be found in Figshare 50 and GitHub (https://github.com/LAMPSYORKU/OntarioClimateDataPortal/tree/master/JavaScripts). Users who are interested in web application development can download and use them for their own projects.