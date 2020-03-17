From light intensity to GDP levels

We convert the NPP/VIIRS luminosity data by subnational region (Chagang-do, Hamgyong-bukto, Hamgyong-namdo, Hwanghae-bukto, Hwanghae-namdo, Kangwon-do, P’yongan-bukto, P’yongan-namdo, P’yongyang-si, Yanggang-do) to GDP in US dollars using the empirical relationship between night-time light and GDP described in Li et al. (2013), which provides estimates of the elasticity of GDP to night-time light measured by the NPP/VIIRS for subnational regions in China. The range of NPP/VIIRS values observed for North Korean regions roughly corresponds to the lower part of the distribution of night-time lights for Chinese prefectures. Given this overlap and the robustness of the linkage found in Li et al. (2013), the estimates of total GDP by region in North Korea can be considered reasonably reliable.

We take population by region from the 2008 census in North Korea and extrapolate it to the year 2018 using the national-level population growth rate implied by the 2017 revision of the United Nations' World Population Prospects (United Nations, 2017). We do not attempt to model cross-regional migration in the period under scrutiny, which may bias our assessment of income convergence trends if internal migration took place from poorer to richer parts of the country during these years. The assumption of a common population growth rate across regions may also have a significant effect on the estimated dynamics of GDP per capita, but given the lack of information on internal migration patterns in the country, it is not possible to complement the projections with a model of internal population mobility.
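The two steps above can be illustrated with a minimal sketch. It assumes a log-log (constant-elasticity) relationship between regional light intensity and GDP and a constant compound growth rate for population. The function names, the intercept, the elasticity, and all input values are hypothetical placeholders chosen for illustration; they are not the coefficients reported in Li et al. (2013) nor figures from the census.

```python
import math


def gdp_from_lights(light_sum, intercept, elasticity):
    """Assumed log-log mapping: ln(GDP) = intercept + elasticity * ln(light)."""
    return math.exp(intercept + elasticity * math.log(light_sum))


def extrapolate_population(pop_base, annual_growth, years):
    """Compound a base-year population forward at a constant annual rate."""
    return pop_base * (1.0 + annual_growth) ** years


# Hypothetical inputs for a single region (placeholders, not source data).
INTERCEPT = 2.0          # placeholder regression intercept
ELASTICITY = 0.9         # placeholder elasticity of GDP to night-time light
light_2018 = 1500.0      # placeholder summed radiance for the region
pop_2008 = 2_000_000     # placeholder census population
national_growth = 0.005  # placeholder national annual growth rate

gdp = gdp_from_lights(light_2018, INTERCEPT, ELASTICITY)
pop_2018 = extrapolate_population(pop_2008, national_growth, 2018 - 2008)
gdp_per_capita = gdp / pop_2018
```

Because the same national growth rate is applied to every region, regional differences in GDP per capita dynamics in this sketch come only from the luminosity series, which is the limitation noted above.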

Estimating poverty in North Korean regions

In order to assess quantitatively the potential range of poverty rates both at the national level and in North Korean regions, we compute the share of people living in extreme poverty (those earning less than $1.90 a day) implied by imposing the income distributions of the economies that are most similar to North Korea in terms of income per capita, educational level, and demographic and sectoral structure. For this purpose, we construct the Euclidean distance between vectors composed of normalized measures of (a) employment share by economic sector (agriculture, manufacturing, services), (b) share of population by age (in 5-year groups) and gender, (c) share of persons by educational attainment level (primary, secondary, tertiary), and (d) GDP per capita, computed for North Korea and for all countries of the world for which these data are available. We select the seven closest economies and anchor their respective income distributions (each based on an estimated Beta-Lorenz curve) on the average income per capita of North Korea, obtained by transforming the luminosity data into GDP per capita figures.

The data for education and age structure are sourced from the Wittgenstein Centre for Demography and Global Human Capital (http://www.wittgensteincentre.org/dataexplorer). GDP per capita in constant 2011 PPPs comes from the World Economic Outlook (WEO), and employment by sector is sourced from the World Bank’s World Development Indicators (WDI) database. For North Korea, GDP per capita is estimated using luminosity data and the rest of the variables are taken from the North Korean 2008 census.

Similarity between economies is defined using several alternative methods, all of which take as input the four vectors that summarize the variables described above (age, employment, education, and GDP per capita). The methods either identify the k closest economies by Euclidean distance and average over them, or compute weighted averages over all economies using weights based on the Euclidean distance between these vectors. We start by calculating the Euclidean distance between the vectors of variables for North Korea (and its subnational regions) and those of all country/year pairs for which data are available. We normalize the distances for each vector to the range between 0 and 1 and sum them over the four vectors (age, employment, education, and GDP per capita) to obtain a similarity ranking that allows us to select the k countries whose characteristics are closest to those of North Korea.
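The ranking step can be sketched as follows. This is an illustrative implementation under stated assumptions: distances are rescaled by min-max normalization within each of the four blocks (the text only says they are normalized to lie between 0 and 1), and the labels and numbers in the toy data are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def min_max(values):
    """Rescale a list of distances to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def rank_by_similarity(target, candidates):
    """target: {block: vector}; candidates: {label: {block: vector}}.

    Returns (label, summed normalized distance) pairs, closest first.
    """
    labels = list(candidates)
    totals = {label: 0.0 for label in labels}
    for block, target_vec in target.items():
        raw = [euclidean(target_vec, candidates[lab][block]) for lab in labels]
        for lab, dist in zip(labels, min_max(raw)):
            totals[lab] += dist
    return sorted(totals.items(), key=lambda item: item[1])


def k_closest(target, candidates, k):
    """Labels of the k candidates with the smallest summed distance."""
    return [lab for lab, _ in rank_by_similarity(target, candidates)[:k]]


# Toy data with hypothetical country/year labels and values.
target = {"age": [0.3, 0.7], "employment": [0.5, 0.2, 0.3],
          "education": [0.4, 0.5, 0.1], "gdp": [7.0]}
candidates = {
    "A_2000": {"age": [0.3, 0.7], "employment": [0.5, 0.2, 0.3],
               "education": [0.4, 0.5, 0.1], "gdp": [7.0]},
    "B_2005": {"age": [0.35, 0.65], "employment": [0.45, 0.25, 0.3],
               "education": [0.4, 0.45, 0.15], "gdp": [7.2]},
    "C_2010": {"age": [0.6, 0.4], "employment": [0.1, 0.3, 0.6],
               "education": [0.1, 0.5, 0.4], "gdp": [9.5]},
}
ranking = rank_by_similarity(target, candidates)
closest_two = k_closest(target, candidates, 2)
```

Normalizing within each block before summing keeps a block with many components, such as the age-by-gender shares, from dominating the single-element GDP per capita block.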

Data availability restrictions lead to a group of 34 countries spanning 40 country/year observations that can be used to validate our poverty estimation method. For the validation exercise, we apply values of k ranging from 3 to 10 and assess the predictive power of the method for reconstructing income distributions for country/years where data are available. For a given k, the median poverty rate of the k countries with the smallest Euclidean distance to the vectors of socioeconomic characteristics (age, employment, education and GDP per capita) is used as the estimate. Alternatively, we employ a weighted average of the poverty rates of all potential comparator countries, with weights given by the inverse of the corresponding Euclidean distance to the economy of interest. Versions of this weighting scheme based on the second, third, and fourth powers of these weights are also applied and compared in the cross-validation exercise. Our evaluation metric is the root-mean-squared prediction error of the estimate relative to the actual survey-based poverty rate. For the construction of the vectors of characteristics, two alternatives are used, depending on whether the distance is based on the level of GDP per capita or on its logarithm. This leads to 24 potential variants of the method, defined by (a) the choice of a value of k = 3, …, 10 or of a weighting scheme (first to fourth power of the inverse Euclidean distance), which gives 12 options, and (b) whether GDP per capita enters in levels or in logs.
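The two families of estimators and the enumeration of the 24 variants can be sketched as below. The leave-one-out loop is an assumption about how the validation is organized, the function names are hypothetical, and the inverse-distance weights presuppose strictly positive distances.

```python
import math
import statistics


def knn_median(distances, poverty, k):
    """Median poverty rate of the k comparators with the smallest distance."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return statistics.median(poverty[i] for i in order[:k])


def inverse_distance_average(distances, poverty, power=1):
    """Average over all comparators, weighted by (1 / distance) ** power."""
    weights = [(1.0 / d) ** power for d in distances]
    return sum(w * p for w, p in zip(weights, poverty)) / sum(weights)


def rmse(predicted, actual):
    """Root-mean-squared prediction error."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def leave_one_out_rmse(dist_matrix, poverty, estimator):
    """Predict each observation from all the others (assumed scheme)."""
    n = len(poverty)
    predictions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        d = [dist_matrix[i][j] for j in others]
        p = [poverty[j] for j in others]
        predictions.append(estimator(d, p))
    return rmse(predictions, poverty)


# Enumerate the variants: 8 values of k plus 4 weighting powers,
# each with GDP per capita in levels or in logs.
schemes = [("median", k) for k in range(3, 11)]
schemes += [("weighted", power) for power in range(1, 5)]
variants = [(scheme, gdp_form) for scheme in schemes
            for gdp_form in ("level", "log")]
```

An estimator is passed to `leave_one_out_rmse` as a two-argument callable, for example `lambda d, p: knn_median(d, p, 7)`, with the distance matrix rebuilt for the level and log versions of GDP per capita.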

We conduct the validation exercise using the poverty rates predicted by all 24 variants for the 40 available country/year observations. Predictive power is evaluated using the root-mean-squared prediction error, and the results are presented in Table 1. The validation results confirm that the approach based on the median estimate from the k closest country/year observations is superior to the weighting method for all powers of the distance measure. In addition, estimates based on log-transformed GDP per capita tend to provide more precise poverty predictions within the group of fixed-k models. The distributions of prediction errors across choices of k and by country/year observation are depicted in Figs 6 and 7, respectively. The model with k = 7, which is the specification that performs best in the validation exercise, does not produce large outliers in the tail of negative prediction errors, which tend to be endemic for other choices of k.

Table 1 Poverty rates: root-mean-squared errors by model choice.

Fig. 6: Distribution of poverty rate prediction errors across and within choices of k. Validation sample composed of 40 country/year observations.

Fig. 7: Distribution of poverty rate prediction errors across and within validation country/year observations, for different modeling choices.

For our choice of k = 7, the economies most similar to North Korea in terms of age structure, sectoral composition, educational attainment and average income per capita are Romania in 1999, Albania in 2002, Madagascar in 2002, Georgia in 1996, Bangladesh in 2016, Vietnam in 2010, and Armenia in 1996. For this set of economies, we calculate the ratio of mean expenditure to GDP per capita using data sourced from PovcalNet, UNU-WIDER and Poverty Equity, and apply this ratio to GDP per capita in North Korean regions in order to anchor the income distribution. Finally, we combine the mean expenditure estimate for each North Korean region with the corresponding Lorenz curves estimated for each matched economy (Crespo Cuaresma et al., 2018) to obtain poverty rates. Our final estimate for a given region is the median of those rates. Overall, the set of possible choices of income distributions comprises 1559 Lorenz curve estimates.
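The anchoring step can be illustrated with a sketch that assumes the Beta Lorenz curve in Kakwani's parameterization, L(p) = p - theta * p^gamma * (1 - p)^delta, for which the headcount ratio H solves L'(H) = z / mu, with z the poverty line and mu the mean expenditure. Whether this matches the exact specification in Crespo Cuaresma et al. (2018) is an assumption. The parameter values, expenditure ratios, GDP per capita figure, and the use of a separate ratio for each matched economy are hypothetical.

```python
import statistics


def lorenz_slope(p, theta, gamma, delta):
    """First derivative of the assumed Beta Lorenz curve at population share p."""
    return 1.0 - theta * p ** gamma * (1.0 - p) ** delta * (
        gamma / p - delta / (1.0 - p))


def headcount(z, mean, theta, gamma, delta, tol=1e-10):
    """Solve L'(H) = z / mean by bisection (the slope is increasing in p)."""
    target = z / mean
    lo, hi = 1e-9, 1.0 - 1e-9
    if lorenz_slope(lo, theta, gamma, delta) >= target:
        return 0.0
    if lorenz_slope(hi, theta, gamma, delta) <= target:
        return 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lorenz_slope(mid, theta, gamma, delta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_daily_expenditure(gdp_per_capita, expenditure_ratio):
    """Annual GDP per capita times the expenditure ratio, per day."""
    return gdp_per_capita * expenditure_ratio / 365.0


# Hypothetical matched economies: (expenditure ratio, theta, gamma, delta).
matched = [(0.60, 0.60, 0.90, 0.50),
           (0.55, 0.70, 0.95, 0.45),
           (0.65, 0.55, 0.85, 0.55)]
POVERTY_LINE = 1.90           # dollars per day, as in the text
region_gdp_per_capita = 1700  # placeholder annual figure for one region

rates = [headcount(POVERTY_LINE,
                   mean_daily_expenditure(region_gdp_per_capita, ratio),
                   theta, gamma, delta)
         for ratio, theta, gamma, delta in matched]
regional_rate = statistics.median(rates)
```

Taking the median across the matched curves, as the text does, limits the influence of any single comparator whose distribution is unusually equal or unequal.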

We also assess the robustness of our estimates by applying alternative methods based on clustering. For each of the four vectors of variables, we use agglomerative (bottom-up) hierarchical clustering to find similar economies. This method starts with each country/year observation in its own cluster. At each step, the two most similar clusters are merged, and this is repeated until only one cluster is left. The dissimilarity between two clusters is given by the distance between their most dissimilar pair of elements (complete linkage). The resulting dendrogram can be used to select country/years sharing a cluster with North Korea. This method, however, delivers predictions with lower predictive power than those obtained by setting k = 7, which is our choice for the estimation of poverty rates.
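The clustering procedure can be sketched in a few lines. This is a plain, unoptimized illustration of complete-linkage agglomerative clustering. Stopping at a chosen number of clusters corresponds to cutting the dendrogram at that level, and that cut-off, the function names, and the toy points are assumptions made for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(points, n_clusters):
    """Merge clusters bottom-up until n_clusters remain.

    Cluster dissimilarity is the largest pairwise distance between
    members of the two clusters (complete linkage).
    """
    clusters = [[i] for i in range(len(points))]

    def dissimilarity(a, b):
        return max(euclidean(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dissimilarity(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def cluster_mates(clusters, target_index):
    """Indices of the observations sharing a cluster with the target."""
    for members in clusters:
        if target_index in members:
            return [i for i in members if i != target_index]
    return []


# Toy observations; index 0 plays the role of the economy of interest.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [5.0, 50.0]]
clusters = complete_linkage(points, 3)
mates = cluster_mates(clusters, 0)
```

In an application along the lines described above, the procedure would be run separately on each of the four vectors, and the country/years grouped with North Korea would serve as comparators.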