A global 0.05° dataset for gross primary production of sunlit and shaded vegetation canopies from 1992 to 2020

Distinguishing gross primary production of sunlit and shaded leaves (GPPsun and GPPshade) is crucial for improving our understanding of the underlying mechanisms regulating long-term GPP variations. Here we produce a global 0.05°, 8-day dataset for GPP, GPPshade and GPPsun over 1992–2020 using an updated two-leaf light use efficiency model (TL-LUE), which is driven by the GLOBMAP leaf area index, CRUJRA meteorology, and ESA-CCI land cover. Our products estimate the mean annual totals of global GPP, GPPsun, and GPPshade over 1992–2020 at 125.0 ± 3.8 (mean ± std) Pg C a−1, 50.5 ± 1.2 Pg C a−1, and 74.5 ± 2.6 Pg C a−1, respectively, in which EBF (evergreen broadleaf forest) and CRO (crops) contribute more than half of the totals. They show clear increasing trends over time, in which the trend of GPP (also GPPsun and GPPshade) for CRO is distinctively greatest, and that for DBF (deciduous broadleaf forest) is relatively large and GPPshade overwhelmingly outweighs GPPsun. This new dataset advances our in-depth understanding of large-scale carbon cycle processes and dynamics.


Background & Summary
Gross primary production (GPP) is a vital component of the terrestrial carbon budget and plays a prominent role in the global carbon cycle [1][2][3][4] . Accurate estimation of terrestrial GPP is critical for understanding the interaction between the terrestrial biosphere and the atmosphere in the context of global climate change 5,6 , projecting future change 7 , and informing climate policy decisions 8 . Therefore, characterizing the spatiotemporal variation of GPP 9 is a key issue in the climate change study.
GPP is closely related to vegetation types [10][11][12] , meteorological factors [13][14][15][16][17] , soil moisture 18,19 , and other factors. In particular, GPP is affected by vegetation canopy structures 12,20,21 , e.g., sunlit and shaded leaves. Sunlit leaves can absorb direct and diffuse radiation simultaneously, and light saturation is easy to occur when the radiation is high, while shaded leaves can only absorb diffuse radiation and the absorbed radiation intensity is generally between the light compensation point and the light saturation point [22][23][24] . The two components of GPP derived from sunlit (GPP sun ) and shaded leaves (GPP shade ) have drawn increasing attentions recently due to two reasons. First, commonly used sun-induced chlorophyll fluorescence (SIF), which is strongly correlated with GPP at various scales [25][26][27][28][29][30] , is mainly emitted from sunlit leaves [31][32][33][34] and GPP sun can also be used to retrieve key photosynthetic parameters such as maximum carboxylation velocity (V cmax ) 35 . Besides, the contribution of GPP shade to the total GPP increased with the increases of leaf area index (LAI) and diffuse radiation ratio 36,37 . Therefore, it is of great

Methods
Model description. The model used in this study is a revised version of the two-leaf light use efficiency (TL-LUE) model 48 . The revised TL-LUE model adds the atmospheric CO 2 concentration regulation scalar and modifies air temperature regulation scalar. GPP is divided into GPP shade and GPP sun 52 . It is described as Eqs. (1-3): Fig. 1 Workflow of the study. CCI represents European Space Agency Climate Change Initiative Land Cover. CRUJRA represents the Climatic Research Unit and Japanese reanalysis. VPD is vapor pressure deficit. T a is air temperature. T s and C s are regulation scalars for temperature and CO 2 concentration. The dswrf is downward solar radiation flux. TL-LUE is two-leaf light use efficiency model. GPP sun and GPP shade are GPP derived by sunlit and shaded leaves.
www.nature.com/scientificdata www.nature.com/scientificdata/ sun m su su s s s where ε msu and ε msh are the maximum light use efficiency of sunlit and shaded leaves (detailed in Table 1), respectively. APAR su and APAR sh are the absorbed PAR of sunlit and shaded leaves, respectively. They were expressed as: + . × where α is albedo; θ is the solar zenith angle; β is the leaf angle, which is set as 60°; Ω is clumping index (detailed in Table 1); C is multiple scattered radiation (unit: W m −2 ); PAR dir and PAR dif (unit: W m −2 ) are the incoming direct and diffuse photosynthetically active radiation above the canopy; PAR dif,u (unit: W m −2 ) denotes diffuse PAR under the canopy 48,53 ; LAI su and LAI sh are the LAI of sunlit and shaded leaves; R represents the sky clearness index and equals to S/(S 0 cosθ), where S is solar radiation in W m −2 , and S 0 is the solar constant (1367 W m −2 ); θ is the representative zenith angle for diffuse radiation transmission 53 . The regulation scalars of temperature (T s ) 52 , water stress (W s ) 48 , and atmospheric CO 2 concentration (C s ) 54 55 (details in Table 1). VPD max and VPD min are the VPD when GPP reaches the maximum and minimum, respectively 38 . If the value of VPD is greater than or equal to VPD max , W s is equal to 0 and if the value of VPD is less than or equal to VPD min , W s is set to 1 48 . Γ * is the CO 2 compensation point in the absence of dark respiration (calculated by Eq. 16) 54 ; C i is the intercellular concentration of CO 2 (ppm). where C a is the atmospheric CO 2 concentration (using NOAA global monthly mean CO 2 concentration at the unit of ppm), and χ is the ratio of leaf-internal to ambient CO 2 56 . K is the Michaelis-Menten coefficient of Rubisco and η* is the viscosity of water relative to its value at 25 °C (0.8903) 57 . K c and K o are the Michaelis-Menten coefficients of Rubisco for CO 2 and O 2 , respectively, and P o is the partial pressure of O 2 (21 kPa) 56 . R is the molar gas constant (8.314 J mol −1 K −1 ).
Input data and processing. Eddy covariance flux measurements from the FULLSET daily product in the FLUXNET2015 dataset (https://fluxnet.org) were used for model calibration and validation. We selected site years data according to the following requirements: the missing observations of air temperature (TA_F), vapor pressure deficit (VPD_F), CO 2 mole fraction (CO2_F_MDS), incoming photosynthetic photon flux density (PPFD_IN), or shortwave radiation (SW_IN_F), gross primary production (GPP_DT_CUT_MEAN) in one year are less than 2 months. A linear interpolation was applied to fill the individual missing values, which accounted for about 2% of the total measurements. About 75% of the sites were randomly selected to calibrate the revised TL-LUE model parameters for each vegetation type, and the remaining sites were applied for parameters validation. The sites and years selected for calibration and validation are detailed in Table S1. The shortwave-to-PAR conversion parameter in global was estimated to vary between 0.39 to 0.53 [58][59][60] . With the comparison between measurements of PAR and shortwave radiation in sites, 0.43 is most suitable for this study. The spatial distribution of calibration and validation sites is shown in Figure S1.
The land cover dataset we used is European Space Agency Climate Change Initiative Land Cover (ESA-CCI land cover) at a 300 m spatial resolution for every year from 1992 to 2020 (https://cds.climate.copernicus.eu/ cdsapp#!/dataset/satellite-land-cover). We resampled the raw data to 0.05°×0.05° using nearest neighbour resampling. ESA-CCI land cover dataset uses the United Nations Land Cover Classification System (LCCS), thus, we converted them to match the International Geosphere-Biosphere Program land cover scheme (IGBP) 61 . In particular, ESA-CCI land cover provides land cover data for the years before 2000, which makes it possible to study changes in GPP caused by changes in land cover types over long-term series.
The Climatic Research Unit and Japanese reanalysis (CRUJRA) version 2.2 data 69 provided 6-hourly at 0.5° resolution meteorological variables, such as downward solar radiation flux (dswrf, unit: J m −2 6 h −1 ), specific humidity(spfh, unit: kg kg −1 ), the temperature at 2 m (tmp, unit: K), pressure (pres, unit: Pa), from 1992 to 2020. Daily dswrf used in this study (unit: J m −2 d −1 ) was converted from the CRUJRA dswrf dataset by summing four 6-hourly data per day. Temperature (unit: K) and vapor pressure deficit (VPD, unit: kPa) by taking the mean of the 6-hourly data of each day in CRUJRA. The daily dswrf was corrected with site shortwave radiation data. The global monthly CO 2 concentration (unit: ppm) is available on www.esrl.noaa.gov/gmd/ccgg/trends/. VPD was calculated using specific humidity, pressure, and temperature as Eqs. ( where t air is the air temperature at the unit of °C. VP sat means the saturated vapor pressure (kPa); spfh represents specific humidity(kg kg −1 ); press is pressure(Pa).
Calibration of model parameters. The maximum LUEs of the sunlit (ε msu ) and shaded (ε msh ) leaves spatially differ due to the changes in vegetation canopy structures and vegetation species types 70 , which leads to the distinct ε msh and ε msu of different vegetation types. To reduce the GPP simulation bias caused by ε msh and ε msu of sunlit and shaded leaves, the daily data of 68 sites (480 site years) in the FLUXNET2015 dataset (details in Table S1) were used for parameter optimization with the shuffled complex Evolution-University of Arizona method 49,71 . The agreement index (d) was used as the optimization criterion. This index was widely used in parameter optimization 49,72 . It ranges from 0 (complete disagreement) to 1 (complete agreement). Parameter values identified when d maximizes are optimization results. The calculation of d is as Eq. (25):  Table 1. The parameters of deciduous needleleaf forests (DNF) were consistent with DBF settings in modelling. The R 2 of observed GPP (GPP EC ) and estimated GPP (GPP) in WSA and SAV were 0.39 and 0.41, respectively, and the R 2 values of other vegetation types were all above 0.5, in which the R 2 of DBF was the highest (0.89). The calibration results of different vegetation types are shown in Figure S2.

Data Records
The dataset provides global gridded 0.05° GPP, GPP shade and GPP sun at three temporal resolutions (8-day, monthly, annual) from 1992 to 2020. The units are g C m −2 (8day) −1 , g C m −2 month −1 and g C m −2 a −1 , respectively. We divided the dataset into 29 folders by year, where each folder contains the data for the year at three temporal scales. The files were named as "GGGG_v21_TTTT.tif " and stored in the GeoTiff format, where GGGG represents GPP, GPP shade and GPP sun . For the 8-day scale, TTTT represents year and DOY (e.g. Shade_GPP_v21_1999_249.tif). For the one-month scale, TTTT represents year and month (e.g. Sun_GPP_ v21_1999_01.tif). For the annual scale, TTTT represents only year (e.g. GPP_v21_1999.tif). The scale factor of the monthly data is 0.1, that of the 8-day data is 0.01. The data type of the monthly and 8-day data is 16-bit integer, and that of the annual data is double. All datasets are publicly available from the DRYAD repository https:// doi.org/10.5061/dryad.dfn2z352k 73 .
The high annual average GPP values were mainly distributed in the Amazon, Indonesia and Congo Basin in the low latitude regions, which was about 3500 g C m −2 a −1 . The relatively high GPP (~2000 g C m −2 a −1 ) occurred in Southeast Asia, the southeast of North and South America, and south central Africa, and the GPP that ranged from 0 to 1500 g C m −2 a −1 accounted for 74.7% of global vegetation cover. The shaded and sunlit GPP (GPP shade and GPP sun ) had similar spatial distribution with GPP, but the value of GPP sun was lower than GPP shade near the equator (Fig. 2b,c). GPP shade that ranged from 0 to 1000 g C m −2 a −1 accounted for 78.7% of global vegetation cover, and GPP sun that ranged from 0 to 750 g C m −2 a −1 accounted for 80.8% of global vegetation cover.
www.nature.com/scientificdata www.nature.com/scientificdata/ From 1992 to 2020, the GPP of Malaysia, Southeast Asia, Indian Peninsula, central Africa, north and southeast South America, and central Europe all showed an obvious increasing trend, and the rate was close to 20 g C m −2 a −2 . The GPP scattered in central South America, east Africa, Central Asia, and Southeast Asia showed a significant decreasing trend, and the change rate was near to −20 g C m −2 a −2 (Fig. 3a). For www.nature.com/scientificdata www.nature.com/scientificdata/ GPP shade , eastern and southern Asia, central Europe, central and western Africa, northwest and southeast South America showed a significant increasing trend, and the change rate was above 10 g C m −2 a −2 , which was consistent with the trend of GPP. Besides, areas with reduced GPP shade (trend < −10 g C m −2 a −2 ) were sporadically distributed in Central Asia, Central South America, Eastern Africa, and Southeast Asia (Fig. 3b). For GPP sun , central and southeastern South America, southeastern Asia, and Europe showed the most obvious increase, with a rate of change of around 10 g C m −2 a −2 , which was similar to the spatial distribution of GPP and GPP shade trends. However, the value of GPP sun was lower than GPP shade (Fig. 3b,c). The vast majority of global vegetation cover with significant change exhibits an increasing trend (GPP trend >0: 92.0%, GPP shade trend >0: 91.2%, and GPP sun trend >0: 88.7%). In general, most global GPP, GPP shade , and GPP sun revealed increasing trends (Fig. 3).
The mean annual totals of global GPP, GPP sun , and GPP shade from 1992 to 2020 are 125.0 Pg C a −1 , 50.5 Pg C a −1 and 74.5 Pg C a −1 , respectively. Overall, the GPP proportions of individual vegetation types to total were similar for global GPP, GPP sun , and GPP shade . Among the 11 vegetation types, EBF contributed most GPP, followed by CRO. These two types accounted for more than half of the total GPP, GPP sun , and GPP shade (Fig. 4). The GPP of SAV, MF, WET, and WSA were 1.0 Pg C a −1 , 2.0 Pg C a −1 , 2.8 Pg C a −1 , and 2.9 Pg C a −1 , respectively, which were relatively low (Fig. 4a). The GPP sun were close to GPP shade for CRO, GRA, and WET, while the GPP sun were higher than GPP shade for SAV and WSA. The GPP shade of ENF, EBF, DNF, DBF, MF, and OSH with relatively complicated canopy structures were much higher than their GPP sun (Fig. 4b,c). Overall, GPP shade played a key role in GPP for forest types, while GPP sun was greater than GPP shade for non-forest types. In total, GPP shade contributes more to GPP than GPP sun . In addition, the GPP of all vegetation types showed an increasing trend. With one exception that the GPP sun of SAV showed a decreasing trend, the GPP sun and GPP shade of other vegetation types showed an increasing trend, among which GPP (also GPP sun , and GPP shade ) of CRO showed the distinctively greatest increasing trend. Except WSA, the increasing trend of GPP shade of other vegetation types is greater than that of GPP sun (Fig. 4d). It's worth noting that for DBF, the increasing trend of GPP is relatively large, and GPP shade overwhelmingly outweighs GPP sun . MF shows a similar phenomenon, but the overall trend is smaller.

Technical Validation
Validation of model parameters. Carbon flux data of 25 sites (170 site years) from the FLUXNET2015 dataset were selected (detailed in Table S1) for model validation. The comparison between GPP that estimated by the revised TL-LUE model with optimized ε msu and ε msh , and GPP measurements (GPP EC ) at each flux site is shown in Fig. 5. The revised TL-LUE model performed well in estimation of GPP for all vegetation types. All sites have R 2 above 0.5, except for AU-Gin (WSA) (R 2 = 0.49) and BR-Sa3 (EBF) (R 2 = 0.36).

Comparisons with other global GPP products.
Previous studies have shown that the differences in GPP estimation are usually caused by different model structures, parameter settings, and input data [74][75][76] . Here, we compare our global annual GPP with several global GPP products derived from remote sensing models, including data-driven models and LUE models (Fig. 6).
For data-driven GPP products, Li et al. 77 used SIF-GPP relationship and SIF observed by the Orbiting Carbon Observatory-2 (OCO-2) to obtain global GPP, which named GOSIF GPP (135.5 ± 8.8 Pg C a −1 , 2001-2017). And the WECANN product was produced using an artificial neural network (ANN) with SIF and other data sources as inputs 78 . The WECANN-GPP showed an obvious decreasing trend between 2007 and 2015, with a range of 110.1 to 118.2 Pg C a −1 . FluxSat GPP 79 , which was generated using satellite data-driven models based on the LUE framework, ranged from 148.1 to 156.7 Pg C a −1 during 2001 to 2019. In addition, a few recent studies have identified that there is a strong spatio-temporal correlation between near-infrared reflectance (NIRv) and GPP, suggesting that NIRv can be employed to estimate the GPP of vegetation 13,79 . The global total of GPP estimated from NIRv by Wang et al. 80 was 130.5 ± 3.2 Pg C a −1 during 1992-2018, close to our estimate. For the LUE models, the range of GPP obtained by the improved EC-LUE model 81 (2001-2020). The global annual GPP of FluxSat was the highest, and that of MOD17A2 was the lowest. Our estimated global GPP ranged from 120.02 to 132.65 Pg C a −1 , placing at the middle of the various GPP products. Anav et al. 76 suggested that previous observation-constrained estimates of global GPP 45,83 (e.g. based on either δ 18 O measurements of atmospheric CO 2 or eddy covariance flux upscaling) was at 120 Pg C a −1 for the period before 2010. Thus, our estimate agreed reasonably well with such estimates.
In addition, our estimated global GPP showed an overall increasing trend, which is consistent with most other GPP products. Only WECANN-GPP and EC-LUE GPP (after 2000) showed a significant declining trend. The declining trend of WECANN obviously associated with the degradation of GOME-2 SIF sensor, and the GOME-2 SIF data used for training was not corrected 84 . Uncertainties. Previous studies have indicated that different LAI products lead to clear differences in estimated GPP 85 , and the uncertainties of various LAI products are higher in low-latitude areas 86,87 . Since the land cover used by GLOBMAP-LAI is different from that used in this study, the corresponding LAI values of a small amount pixels (<0.01%) for SAV and WSA in low latitude areas are relatively high, which lead to abnormally high estimated GPP. These anomalies are not processed because of remaining the authenticity of data.
The model parameters varied in different areas, due to the plant species included in the same vegetation type. In particular, the LUE of C3 and C4 crops was greatly discrepant as many previous studies proved [88][89][90][91]  www.nature.com/scientificdata www.nature.com/scientificdata/ resolution of 500 m cannot completely match the flux tower data scale, which cause uncertainty in the parameter optimization. Simultaneously, the eddy covariance measurements also have some uncertainties, which inevitably affected the parameter optimization. www.nature.com/scientificdata www.nature.com/scientificdata/ Potential benefits and usages of this dataset. Facilitate researches on the relationship between SIF and GPP. As is known, SIF signals come mainly from sunlit leaves 31,92 . Most of the current studies on the relationship between SIF and the photosynthesis of sunlit leaves are at canopy and leaf scales, while similar studies at large scales are currently rare due to the lack of publicly available global or regional GPP datasets for sunlit leaves. In addition, it is known that there is a link between SIF and GPP across biome types 34 , but the relationship is not well quantified. The GPP sun and GPP provided in this study may help to explore the relationship between SIF and GPP sun or GPP in different ecosystem types at large scales.
Facilitate researches on the interactions between solar radiation and terrestrial carbon cycling. Compared with direct radiation, the increase of diffuse radiation can effectively promote carbon fixation 93 . Shaded leaves make more effective use of diffuse radiation 48 , and GPP shade plays a major role in the vegetation areas with more sheltered leaves or cloudy conditions 94 . The GPP shade and GPP sun cannot be directly measured over large regions. The GPP shade and GPP sun estimated by the revised TL-LUE model can capture the contribution of sunlit and shaded leaves to GPP in long-term and at large scales, and make it possible to quantify the carbon fixation increase (or decrease) influenced by the change in radiation fraction over a long period and at large scales, which facilitates further investigations on the interactions between radiation and terrestrial carbon cycling.
Facilitate researches on the dynamics, processes and drivers of GPP at large scales. This study provides 8-day and monthly GPP from 1992 to 2020, which allows for studying the changes in seasonal cycles (e.g. amplitude and phase changes of growing season) of GPP and its processes (GPP sun and GPP shade ) over many years. In addition, we employed a long-term ESA-CCI land cover data since 1992 (while most other dataset uses MODIS land cover since 2000) with the consideration of year to year land cover changes, enabling it to characterize the impact of land cover change on GPP. The dataset would help to dig the underlying mechanisms of climate and human impacts on global terrestrial GPP. www.nature.com/scientificdata www.nature.com/scientificdata/