A consistent and corrected nighttime light dataset (CCNL 1992–2013) from DMSP-OLS data

Remote sensing of nighttime light can observe artificial lights at night on the planet’s surface. The Defense Meteorological Satellite Program’s Operational Line Scan (DMSP-OLS) data (1992–2013) provide planet-scale nighttime light data over a long time span and have been widely used in areas such as urbanization monitoring, socio-economic parameter estimation, and disaster assessment. However, due to the lack of an on-board calibration system, sensor design defects, limited light detection range, and inadequate quantization levels, the applications of DMSP-OLS data are greatly limited by interannual inconsistency, saturation, and blooming problems. To address these issues, we used the power function model based on pseudo-invariant features, the saturation correction method based on regression model and radiance-calibrated data (SARMRC), and the self-adjusting model (SEAM) to improve the quality of DMSP data, and generated a Consistent and Corrected Nighttime Light dataset (CCNL 1992–2013). The CCNL dataset shows good performance in interannual consistency, spatial details of urban centers, and light blooming, which helps to fully explore the application potential of long time series nighttime light data.


Background & Summary
Remote sensing of nighttime light (NTL) can detect weak artificial light at night and obtain surface information completely different from that in the daytime, so it is widely used to monitor information and changes related to human activities 1,2 . In recent years, several nighttime light remote sensing satellites and their data products have become available, such as Luojia-1 and JL1-3B 3,4 . However, studies of long-term human activity based on changes in NTL time series can only rely on the Defense Meteorological Satellite Program's Operational Line Scan (DMSP-OLS) data available since 1992 and the Suomi National Polar-orbiting Partnership's Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) data available since 2012. Long NTL time series are widely used to study the urbanization process 5 , demographic changes 6 , economic changes 7 , and power consumption 8 . Therefore, obtaining high-quality NTL data for research on these long-term changes in human activity is of great significance.
The stable lights dataset from the Version 4 DMSP-OLS Nighttime Lights Time Series is the most widely used DMSP-OLS data product. However, it suffers from the problems of interannual inconsistency, saturation, and blooming 9-13 . (1) Interannual inconsistency. The DMSP-OLS NTL annual composites from 1992 to 2013 were acquired by sensors onboard six different satellites without a calibration mechanism 14 . The lack of onboard calibration, sensor degradation, and satellite orbit drift result in interannual inconsistency in the sum of NTL digital number (DN) values at global and regional scales 15 . (2) Saturation. Since stable light products are acquired under low moonlight illumination conditions, the sensors need to be set at a high gain to detect weak ground light, which leads to saturation in areas of high brightness, especially in urban centres 16 . Due to the 6-bit quantization and low dynamic range of OLS data, the DN value no longer increases with ground light intensity once it reaches 63. (3) Blooming. The possible reasons for the blooming effect can be summarized as changes in the sensor field of view during scanning, accumulation of geographic bias in data compositing, data resampling during onboard data storage, and atmospheric effects 12,17,18 .
Researchers have proposed several methods to address these problems. For the interannual inconsistency of the DMSP-OLS stable light data, one of the common relative calibration approaches is the pseudo-invariant feature (PIF) method 19 . Elvidge et al. 14 took Sicily as a PIF region and F12-1999 as the reference image and adopted a second-order polynomial function model to correct the interannual inconsistency of other images. Li et al. 20 proposed a stepwise calibration method, which used prior knowledge to identify anomalies in NTL time series curves and processed images from multiple sensors in turn. In response to the saturation problem of DMSP-OLS data, some studies have used auxiliary data such as vegetation index 21 , surface temperature 22 , and DMSP-OLS radiance calibrated data 23,24 to restore light information in saturated areas. To eliminate the blooming effects of DMSP-OLS data, Abrahams et al. 25 and Zheng et al. 18 treated the blooming effect as an image blur problem and assumed that the blooming brightness of an image pixel can be fitted with a Gaussian surface. Cao et al. 26 developed the self-adjusting model (SEAM) based on the spatial response function to correct the blooming effect without using other ancillary data. Zhuo et al. 27 proposed an improved SEAM model (iSEAM) that considers the spatial heterogeneity of the effective blooming distance and introduces land cover data. Based on the above methods, some global or regional NTL data products were generated to overcome one or more of these problems 14,28-31 . However, a global data product that addresses all three issues has yet to emerge.
Therefore, this study aims to address the three problems of interannual inconsistency, saturation, and blooming of DMSP-OLS stable light data and to produce a consistent and corrected nighttime light dataset (CCNL 1992-2013) from DMSP-OLS data. The CCNL dataset produced in this study will lay the foundation for creating complete sequence (1992-present) NTL data and provide valuable data for the applications of historical DMSP-OLS data.

Methods
Data collection. The datasets used in this study fall into two categories, as shown in Table 1. The first category is used to generate the global consistent and corrected nighttime light (CCNL) dataset and includes the DMSP-OLS stable light product and the radiance calibrated nighttime light product. The second category comprises the auxiliary datasets used to assess the quality and accuracy of the CCNL dataset, which include other types of NTL products, urban land products, and socio-economic statistics.
(1) Stable light product The nighttime stable lights product is one of the datasets belonging to Version 4 of the Global DMSP-OLS Nighttime Lights Time Series (1992-2013), which has been applied in various areas. It is a cloud-free annual composite product built from all the available archived DMSP-OLS smooth resolution data for each calendar year from six satellites: F10, F12, F14, F15, F16, and F18. The stable light products contain the cleaned-up average visible band digital number values of lights from cities, towns, and other sites, with fires discarded 7,32 . The background noise was identified and replaced with values of zero. Data values range from 1 to 63. Areas with zero cloud-free observations are represented by the value 255. The products are 30 arc-second grids, spanning −180 to 180 degrees longitude and −65 to 75 degrees latitude. The product is free and available at https://eogdata.mines.edu/dmsp/downloadV4composites.html.
(2) Radiance calibrated nighttime light product Global radiance calibrated nighttime lights were produced without sensor saturation by combining sparse data acquired at low gain settings with the operational data obtained at high gain settings, which can be related to radiances based on the pre-flight sensor calibration 7,16,22 . This product has the same resolution and coverage as the stable light product. Due to limitations in the acquisition of low gain data, the radiance calibrated product is only available for 7 different years (circa 1996, 1999, 2000, 2003, 2004, 2006, and 2010). The product is free and available at https://eogdata.mines.edu/dmsp/download_radcal.html. Radiance calibrated data significantly reduce the saturation effect by combining data acquired at different gains and have substantial advantages in spatial analysis. However, the number of images is too small to meet the requirements of some studies. Figure 1 shows that the radiance calibrated data portray the change of brightness in the urban core better than the stable light data, which is conducive to studying spatial and temporal changes within the city.
(3) NPP-VIIRS Version 1 VIIRS Day/Night Band Nighttime Lights is an NTL dataset superior to DMSP-OLS data, mainly providing daily and monthly composite products since April 2012 (https://eogdata.mines.edu/nighttime_light/monthly/v10/). Considering the temporal overlap of NPP-VIIRS data and DMSP-OLS data, the annual composite product was synthesized using the median of the monthly composite products of 2013, which reduces the impact of noise and improves data quality.
(4) EVI data for calculating EANTLI The EVI-adjusted NTL index (EANTLI) proposed by Zhuo et al. 21 can reduce the saturation effect in urban centers. It can be expressed mathematically as:

$$EANTLI = \frac{1 + (nNTL - EVI)}{1 - (nNTL - EVI)} \times NTL$$

where EVI is the annual maximum value of the enhanced vegetation index, NTL is the DMSP-OLS nighttime light intensity, and nNTL is the normalized NTL. EVI data are provided by the MODIS MOD13A2 V6 product (https://lpdaac.usgs.gov/products/mod13a2v006/), which contains 16-day vegetation index maps at a 1 km spatial resolution. To reconcile the difference in spatial resolution, we resampled the MODIS EVI product to 30 arcsec resolution using bilinear interpolation.
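As a rough illustration of how such an index behaves, the sketch below computes an EVI-adjusted NTL value for a single pixel. It assumes the form EANTLI = (1 + (nNTL − EVI)) / (1 − (nNTL − EVI)) × NTL and assumes that nNTL is obtained by dividing NTL by the maximum DN of 63; the function name `eantli` and the sample values are hypothetical and are not taken from the processing code of this study.

```python
def eantli(ntl, evi, ntl_max=63.0):
    """EVI-adjusted NTL index for a single pixel (assumed form, see lead-in)."""
    n_ntl = ntl / ntl_max  # normalized NTL; dividing by 63 is an assumption
    diff = n_ntl - evi
    if diff >= 1.0:
        raise ValueError("nNTL - EVI must be below 1 to avoid division by zero")
    return (1.0 + diff) / (1.0 - diff) * ntl


# Two saturated pixels (DN = 63) with different vegetation cover.
core = eantli(63, 0.1)    # sparse vegetation, e.g. a dense urban core
suburb = eantli(63, 0.5)  # more vegetation, e.g. a suburban pixel
```

Both pixels have the same saturated DN, but the pixel with less vegetation receives the larger index value, which is how the index restores contrast inside saturated urban areas.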

(5) Global urban data and socio-economic data
Since the source of stable light at night is mainly artificial light in urban areas, extracting urban extent is one of the most common applications of NTL data. To further validate the potential of the CCNL dataset for urban studies, the global urban boundaries dataset and global urban land dataset were used to evaluate the quality of CCNL. Li et al. 33 developed an automatic delineation framework to generate a 30 m resolution global urban boundaries (GUB) dataset in seven representative years (i.e., 1990, 1995, 2000, 2005, 2010, 2015, and 2018) using 30 m global artificial impervious area (GAIA) data 34 . The GUB dataset can be freely downloaded from http://data.ess.tsinghua.edu.cn/gub.html. The UrbanLand dataset of He et al. 35 was used as reference data for quantitative evaluation.
Framework. The DMSP-OLS NTL product suffers from three main problems, i.e., interannual inconsistency, saturation, and blooming effect, which affect the accuracy of urban extraction and the estimation of socio-economic indexes. The study first adopted three correction methods to rectify interannual inconsistency, saturation, and blooming effects, as illustrated in Fig. 2. We then used auxiliary datasets to evaluate the CCNL dataset in terms of transects, socio-economic statistics, and urban extraction.
Interannual inconsistency correction. As shown in Fig. 3, there is an apparent interannual inconsistency in the sum of NTL DN values at global and national scales. For example, we can observe a significant decrease in F15 satellite data from 2002 to 2003 in all regions, and this phenomenon is present throughout the stable light products. Penny et al. 19 showed that globally applicable NTL calibration minimizes interannual bias to a greater extent than regionally applicable NTL calibrations. Zhang et al. 37 pointed out that the methods of Wu et al. 30 and Zhang et al. 31 perform well in global-scale applications. Based on geographical location, uniform range distribution, and distance from the mainland, Wu et al. 30 selected Mauritius, Puerto Rico, and Okinawa as PIF regions, with the radiance calibrated data in 2006 as the reference image and the power function model as the correction model. Because of the simplicity of Wu's method, we selected it for interannual correction of the original images in this study. The regression model is as follows:

$$DN_c + 1 = a \times (DN_m + 1)^b$$

where DN_c is the pixel value after correction, DN_m is the original DN value, and a and b are the unknown coefficients in the model. Wu et al. 30 provided correction factors for the years 1992 to 2010, and we calculated the correction factors for the remaining years according to their method. The model coefficients are shown in Table 2. When there are two interannual calibration results for the same year, as in the cases of F142000 and F152000, we took the average of these two results as the final result.
Saturation correction. Recently, Hu et al. 23 proposed a saturation correction method based on regression model and radiance-calibrated NTL data (SARMRC), which uses the discrete radiance calibrated data to correct the saturation effect in the annual stable light data. Compared to other saturation correction methods, the SARMRC method does not require other types of data and performs very well, so we used it for saturation correction on a global scale. We identified the region with a DN value of 63 as the saturation region. Sample pixels were selected based on the difference between the DMSP-OLS stable light data and the radiance calibrated data from neighboring years. The DN values of the saturation zone can be obtained from the corresponding area of the radiance calibrated data and the logarithmic model. The specific equation is as follows:

$$DN_{LM} = a \times \ln(DN_R) + b$$

where DN_LM is the DN value corrected by the logarithmic model, DN_R is the radiance calibrated DN value of the saturated zone, and a and b are the coefficients of the regression model. Some stable light data have radiance calibrated data from two adjacent years; for example, the stable light data in 2008 can be corrected by the data from 2006 or 2010, respectively.
Therefore, for this type of stable light data, we used the weighted average of its two correction results as the final correction result, as follows:

$$DN_{LMDA} = \frac{R_1^2 \times DN_{LM1} + R_2^2 \times DN_{LM2}}{R_1^2 + R_2^2}$$

where DN_LMDA is the double-year adjusted DN value, DN_LM1 and DN_LM2 are the correction results obtained from the radiance calibrated data of the two different years, and R_1^2 and R_2^2 are the corresponding correlation coefficients between the stable light data and the radiance calibrated data.
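The interannual and saturation corrections described above can be sketched per pixel as follows. This is a minimal illustration rather than the production code of this study: it assumes the power function takes the form DN_c + 1 = a × (DN_m + 1)^b, that the logarithmic model takes the form DN_LM = a × ln(DN_R) + b, and that the double-year result is the R²-weighted mean of the two single-year results. All function names and coefficient values are hypothetical.

```python
import math


def calibrate_interannual(dn_m, a, b):
    """Power-function calibration, assumed form DN_c + 1 = a * (DN_m + 1) ** b."""
    return a * (dn_m + 1.0) ** b - 1.0


def correct_saturation(dn_stable, dn_radcal, a, b, saturation_dn=63):
    """Replace a saturated pixel with a log-model estimate a * ln(DN_R) + b.

    Unsaturated pixels, and pixels without a positive radiance calibrated
    value, are returned unchanged.
    """
    if dn_stable < saturation_dn or dn_radcal <= 0:
        return float(dn_stable)
    return a * math.log(dn_radcal) + b


def double_year_average(dn_lm1, dn_lm2, r2_1, r2_2):
    """Weight two single-year corrections by their R^2 values."""
    return (r2_1 * dn_lm1 + r2_2 * dn_lm2) / (r2_1 + r2_2)


# Hypothetical coefficients, for illustration only.
calibrated = calibrate_interannual(3, a=2.0, b=0.5)
unsaturated = correct_saturation(40, 100.0, a=10.0, b=5.0)
merged = double_year_average(70.0, 80.0, r2_1=0.9, r2_2=0.6)
```

Weighting by R² gives more influence to the reference year whose radiance calibrated data agree better with the stable light data.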
Blooming effect correction. We chose the SEAM model proposed by Cao et al. 26 , which does not require auxiliary data and works well. The SEAM model assumes that a pseudo light pixel (i.e., a bright pixel adjacent to the background) should have no light and that its value is contributed by the blooming effect of other bright pixels around it. The blooming effect can be quantitatively described by a spatial response function fitted with pseudo light pixels as samples. In this function, R′ is the value of brightness change due to the blooming effect, R_i represents the pixel value of the neighboring pixels in the moving window, N is the number of neighboring pixels, d_i is the Euclidean distance from the pseudo light pixel, and a and b are coefficients describing the blooming effect. Using the pseudo light pixels as samples, the coefficients a and b were obtained by regression analysis. After obtaining the coefficients, we can estimate the brightness value due to the blooming effect for any bright pixel. The final result was obtained by subtracting the brightness value caused by the blooming effect from the original brightness value. To alleviate the influence of saturation on blooming effect correction, we performed the saturation correction before the blooming effect correction on DMSP-OLS NTL images 27 .
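The idea of fitting a spatial response function on pseudo light pixels and subtracting the estimated blooming can be sketched as below. The functional form used here, R′ = a × Σ R_i / d_i^b over the neighbors in a moving window, is an assumption made for illustration and may differ from the published SEAM equation. The grid search over b with a closed-form least-squares solution for a is a simple stand-in for the regression analysis, and the window radius, the 4-neighbor definition of a pseudo light pixel, and all names are hypothetical.

```python
import math


def response_sum(grid, row, col, b, radius=1):
    """Sum of neighbor values weighted by inverse distance to the power b."""
    total = 0.0
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            if (r, c) != (row, col):
                total += grid[r][c] / math.hypot(r - row, c - col) ** b
    return total


def is_pseudo_light(grid, row, col):
    """A lit pixel with at least one background (zero) 4-neighbor."""
    if grid[row][col] <= 0:
        return False
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
            return True
    return False


def fit_blooming(grid, b_candidates, radius=1):
    """Grid-search b; for each b, solve a by least squares through the origin."""
    samples = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))
               if is_pseudo_light(grid, r, c)]
    best = None
    for b in b_candidates:
        s = [response_sum(grid, r, c, b, radius) for r, c in samples]
        y = [grid[r][c] for r, c in samples]
        denom = sum(v * v for v in s)
        if denom == 0:
            continue
        a = sum(si * yi for si, yi in zip(s, y)) / denom
        sse = sum((yi - a * si) ** 2 for si, yi in zip(s, y))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    if best is None:
        raise ValueError("no usable pseudo light pixels found")
    return best[0], best[1]


def remove_blooming(grid, a, b, radius=1):
    """Subtract the estimated blooming from every pixel, clipping at zero."""
    return [[max(0.0, grid[r][c] - a * response_sum(grid, r, c, b, radius))
             for c in range(len(grid[0]))] for r in range(len(grid))]


# Toy image: a bright core surrounded by a dim ring on a dark background.
toy = [[0, 0, 0, 0, 0],
       [0, 2, 2, 2, 0],
       [0, 2, 10, 2, 0],
       [0, 2, 2, 2, 0],
       [0, 0, 0, 0, 0]]
a_fit, b_fit = fit_blooming(toy, b_candidates=[0.5, 1.0, 1.5, 2.0])
cleaned = remove_blooming(toy, a_fit, b_fit)
```

In this toy image the dim ring acts as the set of pseudo light pixels, so the fitted function explains their values as spillover from their neighbors.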
Calculation platform. Google Earth Engine (GEE) is a cloud-based platform for geospatial analysis with many publicly available image datasets 38 . GEE has played an essential role in the fields of resource mapping, disaster monitoring, public health, and environmental protection 39-42 . We leveraged the computing power and the rich public datasets of the GEE platform to complete the entire data production process.

Data Records
The consistent and corrected nighttime light dataset from DMSP-OLS data (CCNL 1992-2013) 43 , in the WGS84 coordinate system with a spatial resolution of 30 arcsec (~1000 m), can be freely accessed at the Zenodo repository (https://doi.org/10.5281/zenodo.6644980). The data for each year are stored in GeoTIFF format (~300 MB). The current version of the product spans the globe from 75° N to 65° S latitude.

Technical Validation
The CCNL dataset was produced to support spatio-temporal analysis at both global and local scales, overcome important problems in DMSP data, and unlock the potential of NTL products. We evaluated the quality of CCNL in three aspects. Firstly, the spatial information of the CCNL dataset at the local scale was measured by transect analysis. Secondly, the effectiveness of city-scale extraction from the CCNL dataset was assessed at spatial and temporal scales. Finally, the performance of CCNL data in temporal analysis was verified at a large scale by using socio-economic data such as GDP and population.

(1) Comparison of transects on NTL images
It is well known that the DMSP-OLS stable light data suffer from severe saturation and blooming effects, which seriously affect the application of the data. The method we proposed in this research can effectively address these challenges. We compared the stable light data, EANTLI, CCNL, and NPP-VIIRS data by visual inspection and by randomly selected transects to evaluate the quality of the produced CCNL data. Nine cities around the world were selected to assess the quality of CCNL data, i.e., Beijing, Tokyo, New Delhi, Sydney, Sao Paulo, Johannesburg, Dallas, Paris, and Moscow (Fig. 4). These cities were selected because of their large urban extents, dramatic spatial variability of NTL, and the uniformity and representativeness of their global distribution.
Visually, NPP-VIIRS data can capture the spatial pattern of nighttime lights in urban interiors thanks to their high spatial resolution and large dynamic range, and can therefore be considered as 'ground truth'. The EANTLI data mitigate blooming and saturation effects by combining nighttime light data and the enhanced vegetation index (EVI). The CCNL dataset produced in this research recovers the light intensity values in saturated regions by taking advantage of the radiance calibrated data. As shown in Fig. 4, we can observe roads and landmarks in CCNL and EANTLI. For example, one can see Beijing's famous Chang'an Street and Tokyo's complex road network.
We selected a random latitude in each city's center and extracted the transect along that latitude from each dataset. The NPP-VIIRS data were taken as the reference data, and the correlation coefficient R between the reference and each of the other datasets was calculated separately. In the unsaturated region, the datasets show a similar pattern of variation. In the saturated area, however, the DMSP-OLS stable light data reach the maximum value of 63 and remain constant due to the saturation effect, so they cannot capture spatial differences in lighting within the city. Both EANTLI and CCNL reduce the saturation effect, to different degrees. The stable light dataset had the worst correlation, with a mean correlation coefficient R of 0.49 (Fig. 5), while the mean values of R for CCNL and EANTLI were 0.74 and 0.70, respectively. For CCNL, the spatial correlation coefficient R reaches 0.89 in Paris. However, the R-values for CCNL are not always higher than those for EANTLI: CCNL has the higher R-value in five of the nine cities. The lowest R-value is 0.56 for CCNL and 0.57 for EANTLI. In the cities where the R-values of CCNL are smaller than those of EANTLI, the two values are very close.
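The transect comparison above amounts to computing a Pearson correlation coefficient between each dataset's profile and the reference profile. A minimal sketch follows; the profile values are invented for illustration and are not measurements from any of the nine cities.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


# Hypothetical transects across a city center (values are illustrative only).
reference = [1, 8, 30, 60, 30, 8, 1]         # VIIRS-like profile with a sharp peak
saturated = [10, 40, 63, 63, 63, 40, 10]     # stable-light-like profile capped at 63
corrected = [10, 40, 70, 120, 70, 40, 10]    # profile after saturation correction

r_saturated = pearson_r(saturated, reference)
r_corrected = pearson_r(corrected, reference)
```

Because the capped profile is flat across the core, it tracks the reference peak less closely than the corrected profile, which is the behavior the transect analysis is designed to expose.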
(2) Evaluation of urban extent extraction Urban area extraction is the most common application field for NTL data, as most of the stable lights at night come from artificial lights in urban areas. We extracted urban areas using a fixed threshold method and evaluated the results' overall accuracy (OA). The GUB data 33 was used as reference data for the qualitative analysis, and UrbanLand 35 was used as reference data for quantitative evaluation.
The GUB dataset has a high resolution of 30 m, which can be used to verify the effectiveness of the CCNL dataset in urban extent extraction. We chose Beijing, Shanghai, and the Pearl River Delta region as visual test areas, as shown in Fig. 6, all of which have experienced significant urban expansion over the past few decades. Visually, the light intensity values in the stable light data do not change significantly near the city boundary of the GUB data, and the blooming effect can be observed in different areas. Due to this severe spillover effect, the stable light data are not a good indicator of the true extent of the city. Compared with the stable light data, the spatial pattern of CCNL is more similar to the GUB data. The values of CCNL change sharply near city boundaries, which can be easily visualized. Even some small cities and towns can be observed, and only some tiny towns are not extracted, suggesting that the CCNL dataset is effective in eliminating the blooming effects.
We used the UrbanLand dataset as reference data to quantitatively characterize the performance of CCNL in urban extraction. This study extracted city contours using a simple fixed-threshold method with thresholds derived from visual interpretation. The UrbanLand dataset extracts city limits with greater precision in six discrete years, i.e., 1992, 1996, 2000, 2006, and 2010. Table 3 shows the fixed thresholds for extracting city limits for different years in three cities, i.e., Paris, Tokyo, and Chicago, and the OA value for each dataset and each year, using the UrbanLand dataset as the reference dataset and the fixed threshold method. Results show that the overall accuracy of the CCNL dataset for extracting city ranges reaches over 93% in Paris and Chicago and 88% in Seoul. In Paris, Seoul, and Chicago, CCNL's OA increased by 1.94%, 1.43%, and 1.20%, respectively, compared to the stable light data.
In Fig. 7, we can observe Beijing's urbanization, the city's continuous expansion, and the suburbs' development. The UrbanLand dataset was used as reference data to calculate the OA of the extracted results (Table 4). The OA is greater than 90%, with an average value of 94.2%.
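The fixed-threshold extraction and the overall accuracy computation can be sketched as follows. This is a simplified illustration under stated assumptions: pixels with a value at or above the threshold are labeled urban, the reference is a binary urban mask on the same grid, and the threshold and pixel values are invented.

```python
def extract_urban(ntl, threshold):
    """Label pixels with NTL at or above the threshold as urban (1), else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in ntl]


def overall_accuracy(predicted, reference):
    """Fraction of pixels whose predicted label matches the reference label."""
    total = 0
    correct = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            total += 1
            correct += int(p == r)
    return correct / total


# Hypothetical 3 x 3 NTL patch and a matching reference urban mask.
ntl_patch = [[5, 20, 60],
             [3, 35, 80],
             [0, 10, 25]]
reference_mask = [[0, 0, 1],
                  [0, 1, 1],
                  [0, 0, 0]]

oa_30 = overall_accuracy(extract_urban(ntl_patch, 30), reference_mask)
oa_20 = overall_accuracy(extract_urban(ntl_patch, 20), reference_mask)
```

Lowering the threshold from 30 to 20 labels two non-urban pixels as urban in this patch, so the overall accuracy drops, which illustrates why the threshold has to be chosen per city and per year.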
(4) Performance of CCNL time series The density of use of regional lighting facilities can, to a certain extent, reflect the economic situation, energy consumption, and population of the region, and a large number of statistical studies have shown a high correlation between the intensity of nighttime light and these socio-economic data 6 . The CCNL dataset provides the time series of NTL from 1992 to 2013 and reduces the temporal inconsistencies in the original time series through the relative correction methods. To demonstrate the effectiveness of the interannual correction method, we selected ten countries and performed a correlation analysis using economic, population, and energy consumption data from the World Bank (Table 5). These ten countries were selected from developed and developing countries on different continents.
Regarding power consumption, the correlation coefficient (R) of CCNL is higher than that of the stable light data, with average R-values of 0.84 and 0.56, respectively, a substantial improvement. The correlation coefficient between urban population and NTL intensity is higher than that of the total population because artificial lights at night are mainly concentrated in urban areas and less distributed in the suburbs. In terms of the total population and urban population, the average R-value of CCNL increased by 0.17 and 0.14, respectively. In terms of GDP, however, CCNL performed worse than the stable light data in some countries, possibly because the change in GDP does not fully reflect the change in light intensity, especially in some developed countries. Another possible reason is that a global model for interannual correction may result in overcorrection in some regions.

Usage Notes
Similar to DMSP-OLS stable light data, the pixel value of CCNL data is a digital number, not a physical quantity. The auxiliary data used to eliminate the saturation effect are the DMSP-OLS radiance calibrated dataset (1996-2010). For years beyond this period (1992-1995 and 2011-2013), the correction of a saturated region relies on the same radiance calibrated data, so the processed data have the same spatial structure, which may not reflect the spatial change of the region. When using the CCNL dataset for temporal change analysis, it is recommended to analyze changes in the sum of NTL values at regional or national scale (window size of 200 pixels or more), because values at pixel scale and in small statistical regions may fluctuate strongly. This dataset can be used for monitoring human activities at local and global scales and for historical time series analysis.
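The recommendation to work with regional sums rather than single pixels can be illustrated with the short sketch below. It assumes each annual image is available as a two-dimensional grid of DN values and that the region of interest is given as a binary mask on the same grid; the names and the tiny example grids are hypothetical.

```python
def sum_of_lights(image, mask):
    """Sum the NTL values of all pixels that fall inside the region mask."""
    return sum(value
               for image_row, mask_row in zip(image, mask)
               for value, inside in zip(image_row, mask_row)
               if inside)


def regional_series(images_by_year, mask):
    """Return {year: sum of lights within the mask}, ordered by year."""
    return {year: sum_of_lights(image, mask)
            for year, image in sorted(images_by_year.items())}


# Hypothetical 2 x 2 annual images and a region covering three of the pixels.
images = {1993: [[2, 2], [5, 9]],
          1992: [[1, 2], [3, 4]]}
region = [[1, 0],
          [1, 1]]
series = regional_series(images, region)
```

Aggregating over a region averages out pixel-level fluctuations, so the resulting yearly series is a more stable basis for temporal comparisons than individual pixel values.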

Code availability
The source code for processing the DMSP-OLS Stable Light dataset to produce the CCNL dataset is available at https://doi.org/10.5281/zenodo.6100284 44 .

Table 5. The correlation coefficients (R) between NTL intensity and GDP, electricity, urban population, and total population in the 10 selected countries.