On the forecastability of food insecurity

Food insecurity, defined as the lack of physical or economic access to safe, nutritious and sufficient food, remains one of the main challenges included in the 2030 Agenda for Sustainable Development. Near real-time data on the food insecurity situation collected by international organizations such as the World Food Programme can be crucial to monitor and forecast time trends of insufficient food consumption levels in countries at risk. Here, using food consumption observations in combination with secondary data on conflict, extreme weather events and economic shocks, we build a forecasting model based on gradient boosted regression trees to predict the evolution of insufficient food consumption trends up to 30 days into the future in 6 countries (Burkina Faso, Cameroon, Mali, Nigeria, Syria and Yemen). Results show that the number of available historical observations is a key element for the forecasting model performance. Among the 6 countries studied in this work, for those with the longest food insecurity time series, that is Syria and Yemen, the proposed forecasting model makes it possible to forecast the prevalence of people with insufficient food consumption up to 30 days into the future with higher accuracy than a naive approach based only on the last measured prevalence. The framework developed in this work could provide decision makers with a tool to assess how the food insecurity situation will evolve in the near future in countries at risk. Results clearly point to the added value of continuous near real-time data collection at sub-national level.

2 Model performance as a function of data availability

Supplementary Figure 6: Model performance as a function of different dimensions. Panel (a) shows the average difference between the MSE of the naive approach and the MSE of the forecasting model across the different splits, as a function of the number of training points obtained by fixing the same time series length for all countries. Error bars correspond to the relative standard deviation. The green area indicates where our model outperforms the naive one, the red area the opposite. Panel (b) shows the case in which we instead considered the full time series but fixed the number of considered areas to be the same for all countries.

Data Sources
In this Section, we describe in detail all the data sources used in this study.

Prevalence of people with insufficient food consumption
The prevalence of individuals with insufficient food consumption is estimated from one of the most commonly used food security indicators, the Food Consumption Score (FCS), which represents households' dietary diversity and nutrient intake [1].
The FCS is based on household surveys containing questions about quantity and frequency of consumption of food items from different food groups during a 7-day reference period. For each household, food group consumption frequencies are then summed up in a weighted fashion, with more nutritious foods having higher weights:

FCS = Σ_i w_i x_i,

where w_i is the weight of food group i (see Table 1) and x_i represents the number of days the household has consumed items from food group i.
Each household is considered to have poor, borderline or acceptable food consumption based on the value of its FCS: households with FCS ≤ 21 are labelled as having a poor food consumption profile, households with 21 < FCS ≤ 35 a borderline profile, and households with FCS > 35 an acceptable food consumption profile. These thresholds have been defined based on assumptions on dietary patterns, but might be adapted on a case-by-case basis to better reflect local contexts [1].
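The FCS computation and classification described above can be sketched as follows. This is a minimal illustration: the function names and the example household are hypothetical, and the weights shown are the standard WFP values for the eight food groups (the authoritative values are those in Table 1).

```python
# Sketch of the FCS computation and household classification described above.
# Weights are the standard WFP values (cf. Table 1); each x_i is the number
# of days (0-7) the household consumed items from food group i.

FCS_WEIGHTS = {
    "cereals_tubers": 2.0, "pulses": 3.0, "vegetables": 1.0, "fruit": 1.0,
    "meat_fish": 4.0, "milk_dairy": 4.0, "sugar": 0.5, "oil_fats": 0.5,
}

def food_consumption_score(days_consumed):
    """FCS = sum_i w_i * x_i over the 7-day recall period."""
    return sum(FCS_WEIGHTS[g] * d for g, d in days_consumed.items())

def fcs_profile(fcs, poor=21.0, borderline=35.0):
    """Classify a household using the standard 21/35 thresholds."""
    if fcs <= poor:
        return "poor"
    return "borderline" if fcs <= borderline else "acceptable"

# Hypothetical household: cereals every day, pulses twice, etc.
household = {"cereals_tubers": 7, "pulses": 2, "vegetables": 4, "fruit": 1,
             "meat_fish": 1, "milk_dairy": 0, "sugar": 5, "oil_fats": 6}
score = food_consumption_score(household)  # 14 + 6 + 4 + 1 + 4 + 0 + 2.5 + 3 = 34.5
profile = fcs_profile(score)               # "borderline" (21 < 34.5 <= 35)
```

The area-level prevalence is then simply the share of surveyed households classified as poor or borderline.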
In order to obtain the prevalence of people with insufficient food consumption for a given geographical area and time window, a representative number of surveys is performed and the prevalence of households with poor or borderline food consumption is estimated. The data used in this study have been collected by WFP by means of daily remote phone surveys. The prevalence of people with insufficient food consumption for a given date is obtained by considering households interviewed during the previous d days, where d assumes different values in different geographical areas, as reported in Table 2.
The start and end dates of the time series analyzed in this study are reported, for each country, in Table 3. Missing daily values are inferred through linear interpolation (i.e. interpolating between the value for the previous day for the area under consideration and the corresponding value for the following day), but sub-national time series presenting more than 7 consecutive missing values (i.e. one week) are discarded. Table 4 reports the final list of first-level administrative units considered for each country, the overall population that they cover, and the share of the national population this corresponds to.
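The gap-filling rule above (interpolate short gaps, discard series with more than a week of consecutive missing days) can be sketched as follows. The helper name and the toy series are illustrative, not from the paper.

```python
import numpy as np
import pandas as pd

def clean_daily_series(s, max_gap=7):
    """Linearly interpolate missing daily values, but discard (return None)
    any series with more than `max_gap` consecutive missing days."""
    is_na = s.isna()
    # Length of the longest run of consecutive NaNs: group NaNs by the
    # cumulative count of observed values, then take the largest group.
    longest_gap = is_na.groupby((~is_na).cumsum()).sum().max()
    if longest_gap > max_gap:
        return None
    return s.interpolate(method="linear", limit_area="inside")

idx = pd.date_range("2021-01-01", periods=10, freq="D")
s = pd.Series([0.30, np.nan, np.nan, 0.36, 0.35, np.nan, 0.33, 0.32, 0.31, 0.30],
              index=idx)
cleaned = clean_daily_series(s)  # two-day gap filled with 0.32, 0.34
```

A series with, say, an eight-day gap would be dropped rather than interpolated.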

Prevalence of people using crisis or above crisis food-based coping
The prevalence of people using crisis or above crisis food-based coping is derived from another popular food security indicator, the reduced Coping Strategy Index (rCSI) [2].
The rCSI is based on household surveys containing questions about how often households had to adopt one or more strategies to cope with not having enough food or enough money to buy food, during a 7-day reference period. For each household, coping strategies are then summed up in a weighted fashion to determine the corresponding rCSI. The five inquired strategies and their relative weights are reported in Table 5.
In order to obtain the prevalence of people using crisis or above crisis food-based coping for a given geographical area and time window, a representative number of surveys is performed and the prevalence of households with rCSI ≥ 19 is considered, as defined by the Integrated Food Security Phase Classification framework [3]. The survey data used in this study for the prevalence of people using crisis or above crisis food-based coping are the same as for the prevalence of people with insufficient food consumption, hence all the information reported in the previous section on the time series spatial and temporal resolution apply for this indicator too. Furthermore, Yemen's March 2019 values were all replaced by means of linear interpolation (i.e. interpolating between the value for the previous day for the area under consideration and the corresponding value for the following day) because of some anomalies in the original data. Fig. 7 shows all the food-based coping time series analyzed in this study.

Supplementary Table 5: Coping strategies asked to build the rCSI indicator and their corresponding weights [2].

Coping strategy — Severity weight
Rely on less preferred or less expensive food — 1
Borrow food or rely on help from friends or relatives — 2
Limit portion sizes at meal times — 1
Restrict consumption by adults so that children can eat — 3
Reduce the number of meals eaten per day — 1
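The rCSI computation and the crisis-or-above prevalence described above can be sketched as follows. Function names and the example households are hypothetical; the weights are the standard rCSI severity weights from [2], and 19 is the IPC cut-off cited in the text.

```python
# Sketch of the rCSI computation; f_i is the number of days (0-7) strategy i
# was used during the 7-day recall period.

RCSI_WEIGHTS = {
    "less_preferred_food": 1, "borrow_food": 2, "limit_portion_size": 1,
    "restrict_adult_consumption": 3, "reduce_meals": 1,
}
CRISIS_THRESHOLD = 19  # IPC crisis-or-above cut-off

def rcsi(frequencies):
    """rCSI = sum_i w_i * f_i, the severity-weighted coping frequency."""
    return sum(RCSI_WEIGHTS[k] * v for k, v in frequencies.items())

def crisis_prevalence(households):
    """Share of surveyed households with rCSI >= 19."""
    flags = [rcsi(h) >= CRISIS_THRESHOLD for h in households]
    return sum(flags) / len(flags)

sample = [
    {"less_preferred_food": 7, "borrow_food": 3, "limit_portion_size": 4,
     "restrict_adult_consumption": 2, "reduce_meals": 3},   # rCSI = 26
    {"less_preferred_food": 2, "borrow_food": 0, "limit_portion_size": 1,
     "restrict_adult_consumption": 0, "reduce_meals": 1},   # rCSI = 4
]
prev = crisis_prevalence(sample)  # 1 of 2 households at or above crisis: 0.5
```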

Conflict-related fatalities
The Armed Conflict Location & Event Data Project (ACLED) is a publicly available repository providing real-time and historical data on political violence and protest events in nearly 100 countries [4]. Data about conflicts are made available each week after curation by individual researchers collecting information from available reports. Information is also provided with respect to the date (when the event happened), type of violence (what happened), actors (who is involved), and location (where the event happened). Moreover, for each event, the estimated number of fatalities is reported (ACLED does not independently verify reported fatalities, and includes this information as an estimate only, reflecting the content of media reports).
From these data, we build daily time series for each analyzed region and date by summing all the reported fatalities in a given region during the previous d days. If on a given day no event is recorded by ACLED for a given region, then we assume zero fatalities in that region for that day. The time series thus obtained are shown in
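The per-region rolling sum of fatalities can be sketched with pandas as follows. Column names, the parameter d, and the toy events table are illustrative; days with no recorded ACLED event are filled with zeros before the rolling sum, as described above.

```python
import pandas as pd

def fatalities_series(events, d, full_index):
    """Daily per-region sum of fatalities reported over the previous d days.
    Days with no recorded event count as zero fatalities."""
    daily = (events.groupby(["region", "date"])["fatalities"].sum()
                   .unstack("region")        # one column per region
                   .reindex(full_index)      # one row per calendar day
                   .fillna(0.0))             # no event -> zero fatalities
    return daily.rolling(window=d, min_periods=1).sum()

idx = pd.date_range("2021-01-01", periods=5, freq="D")
events = pd.DataFrame({
    "region": ["A", "A", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-02"]),
    "fatalities": [3, 2, 5],
})
out = fatalities_series(events, d=3, full_index=idx)
```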

Market prices
WFP monitors commodity prices in local markets on a monthly basis, and makes them available through its Economic Explorer (https://dataviz.vam.wfp.org/economic_explorer/prices).
Commodities are categorized in eight groups ('cereals and tubers', 'meat, fish and eggs', 'milk and dairy', 'miscellaneous food', 'non-food', 'oil and fats', 'pulses and nuts' and 'vegetables and fruits'). In this study, we consider commodities in the 'cereals and tubers' category (which includes rice, bulgur, millet, sorghum, etc.), since they are widely consumed foods across all the different geographical contexts analyzed.
As some administrative regions can include multiple markets, we first average the price of a given commodity across all markets in the region under consideration. We then normalize all prices between 0 and 1 to discard differences in prices across different commodities (since the relevant information is whether prices are increasing or decreasing, rather than their absolute value) and finally average all normalized prices across the different commodities in each region. This allows us to have a final unique time series describing cereal and tubers price variations within each region.
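The three-step aggregation above (average across markets, min-max normalize per commodity, average across commodities) can be sketched as follows. The function name, the (market, commodity) column layout, and the toy prices are illustrative assumptions.

```python
import pandas as pd

def regional_price_index(prices):
    """prices: DataFrame indexed by month, columns = (market, commodity)
    MultiIndex for one region. Returns a single normalized price series."""
    # 1. average each commodity across the region's markets
    by_commodity = prices.T.groupby(level="commodity").mean().T
    # 2. min-max normalize each commodity to [0, 1], so only the price
    #    dynamics matter, not the absolute price levels
    normalized = (by_commodity - by_commodity.min()) / (
        by_commodity.max() - by_commodity.min())
    # 3. average normalized prices across commodities
    return normalized.mean(axis=1)

idx = pd.date_range("2021-01-01", periods=4, freq="MS")
cols = pd.MultiIndex.from_tuples(
    [("market1", "rice"), ("market2", "rice"), ("market1", "millet")],
    names=["market", "commodity"])
prices = pd.DataFrame([[100, 110, 50], [105, 115, 55],
                       [110, 120, 60], [120, 130, 70]],
                      index=idx, columns=cols)
index_series = regional_price_index(prices)
```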
Missing monthly values (up to a maximum of 3 months) are inferred through linear interpolation (i.e. interpolating between the value for the previous month for the area under consideration and the corresponding value for the following month). For Cameroon, data gaps were too extensive and the variable was therefore discarded. The time series thus obtained are shown in Fig. 9.

Weather-related variables
WFP provides a publicly available platform called Seasonal Explorer (https://dataviz.vam.wfp.org/seasonal_explorer/rainfall_vegetation/visualizations) allowing users to assess sub-national, near-global information on the current and past rainfall seasons, such as the timing and intensity of drier or wetter than average conditions and their impact on vegetation status. The primary data sources are the CHIRPS gridded rainfall dataset produced by the Climate Hazards Group at the University of California, Santa Barbara [5] and the MODIS NDVI CMG data made available by NOAA-NASA [6].
Data are reported for each administrative region for each 10-day window of the month ('dekad') for the following indicators: rainfall amount (in mm), 1-month rainfall anomaly, 3-month rainfall anomaly, normalized difference vegetation index (NDVI), and NDVI anomaly. Anomalies are expressed relative to the long-term average: for example, a value of 73% of the 3-month rainfall anomaly in the second half of August means that the amount of rainfall for the three-month period ending on August 20th has been 73% of the average amount of rainfall measured in the same area during the same period of the year. The corresponding time series are shown in Figs

Permutation Entropy
A preliminary analysis of how predictable insufficient food consumption time series are can support the decision regarding the most suitable forecasting approach. To this aim, we use the Permutation Entropy (PE) as a model-free measure of time series predictability [7]. This approach is based on measuring the Shannon entropy of a time series by estimating the probabilities of observing trend patterns within the time series itself. The PE approach thus categorizes the continuous time series X into a small set of symbols, or alphabet, according to its trends.

Symbolization procedure
The first stage of the PE converts a continuous time series X into a sequence of discrete symbols X̃ by adopting a permutation technique. Let x(i), i = 1, ..., N, denote a sequence of observations from the system X. Each window of m consecutive values (taken with time delay τ) is mapped to the permutation that sorts its values, so that only the ordinal pattern of the window, i.e. its trend, is retained.

Shannon entropy
The PE of the time series X is given by the Shannon entropy of the permutation orders, normalized by its maximum value ln(m!):

H = − (1 / ln(m!)) Σ_π p_π ln p_π,

where p_π is the probability of encountering the pattern associated with permutation π. An important convenience of symbolic approaches is that they discount the relative magnitude of the time series [8]. This is important in our case because different geographical areas can differ largely in the prevalence of people with insufficient food consumption. The embedding dimension m and the time delay τ must be set in order to derive a reliable state space, and different procedural approaches exist to deal with this decision [9,10]. To find an appropriate embedding dimension for clustering a set of time series, we follow the procedure proposed by Scarpino & Petri [7]. The time delay is fixed to τ = 1 in order to obtain results from contiguous intervals. Finally, the metric used is the predictability, defined as χ = 1 − H. The closer χ is to 1, the more regular and predictable the time series is; the smaller χ is, the noisier and more random the time series is.

Supplementary Figure 15: Correlation analysis for Yemen.
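A minimal sketch of the normalized permutation entropy and the predictability χ = 1 − H, assuming the standard ordinal-pattern construction (function name and toy series are illustrative):

```python
import math
from collections import Counter

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy H in [0, 1] for embedding dimension m
    and time delay tau; predictability is chi = 1 - H."""
    patterns = Counter()
    for i in range(len(x) - (m - 1) * tau):
        window = x[i:i + m * tau:tau]
        # ordinal pattern: the permutation that sorts the window's values
        patterns[tuple(sorted(range(m), key=window.__getitem__))] += 1
    n = sum(patterns.values())
    H = -sum((c / n) * math.log(c / n) for c in patterns.values())
    return H / math.log(math.factorial(m))  # normalize by ln(m!)

monotone = list(range(100))                      # perfectly regular series
chi_regular = 1 - permutation_entropy(monotone)  # single ordinal pattern
```

A monotone series exhibits a single ordinal pattern, so H = 0 and χ = 1; noisier series spread probability mass over more patterns, pushing χ towards 0.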

Correlation Analysis
In order to determine, for each country, the most representative independent variables among the five weather-related ones (rainfall, 1-month rainfall anomaly, 3-month rainfall anomaly, NDVI and NDVI anomaly), we performed a correlation analysis based on Pearson's correlation coefficient. The correlations are calculated by first converting the time series that are not at daily resolution to daily resolution: we use different interpolation strategies depending on the indicator (e.g. backfilling in the case of cereals and tubers prices) and we aggregate them on the basis of the d parameter (e.g. we average the interpolated price of cereals and tubers over the previous d days). Finally, we average the correlations obtained between the different time series across the regions of the same country. The results of this correlation analysis are shown in Figs. 15, 16, 17, 18, 19, 20. All pairs of variables with a correlation above 0.45 were then considered, and in each case one of the two variables was discarded to avoid collinearity, as listed in Table 6. Table 6 also reports the Variance Inflation Factor (VIF) computed for all remaining variables; all values are below 3, indicating no significant multicollinearity.
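The pruning step (drop one variable of each pair with correlation above 0.45, then check the VIF of the survivors) can be sketched as follows. Which variable of a pair is dropped, the variable names, and the synthetic data are illustrative assumptions; the paper's actual choices are those listed in Table 6.

```python
import numpy as np

def pairwise_prune(X, names, threshold=0.45):
    """Drop one variable from each pair whose |Pearson r| exceeds the
    threshold (here: the later-listed one of the pair)."""
    corr = np.corrcoef(X, rowvar=False)
    drop = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if i not in drop and j not in drop and abs(corr[i, j]) > threshold:
                drop.add(j)
    keep = [k for k in range(len(names)) if k not in drop]
    return [names[k] for k in keep], X[:, keep]

def vif(X):
    """VIF of each column: 1 / (1 - R^2) when the column is regressed
    (with intercept) on all the other columns."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        resid = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
        out.append(1.0 / (1.0 - (1 - resid.var() / y.var())))
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.1 * rng.normal(size=500)   # nearly collinear with a
c = rng.normal(size=500)             # independent of both
kept, Xk = pairwise_prune(np.column_stack([a, b, c]),
                          ["rain", "rain_1m_anom", "ndvi"])
```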

Forecasting
The forecasting effort focuses on predicting the daily sub-national prevalence of people with insufficient food consumption up to 30 days in advance. There exists a variety of statistical and machine learning algorithms that can be adopted to reach this goal, ranging from simple strategies that assume little or nothing about the nature of the phenomenon to more complex ones that involve multiple features. Our approach to time series forecasting has been to favour machine learning algorithms, because they do not require any prior knowledge of the data distribution [11] and allow for more flexibility in integrating additional variables as predictors. Several studies confirm that machine learning algorithms have become increasingly popular in time series forecasting tasks [12,11,13,14,15].
In this study, we chose to use the eXtreme Gradient Boosting (XGBoost) algorithm [16] (https://xgboost.readthedocs.io/en/latest/), which has been widely used in many fields to achieve state-of-the-art results on well-known data challenges (e.g. Kaggle competitions). XGBoost belongs to the category of so-called ensemble learning approaches, a branch of machine learning methods that train and predict with many models at once to produce a single better output. In the case of XGBoost, the base model is a decision tree, which is considered best-in-class for handling small to medium-sized data. The XGBoost algorithm deals with supervised machine learning tasks, like classification and regression, so the time series forecasting process must be adapted to the regression setup. Since this setup does not support a multi-output design, we employ 30 different regression models, one per prediction horizon.
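The one-model-per-horizon ("direct") strategy above can be sketched as follows. To keep the sketch dependency-light, scikit-learn's GradientBoostingRegressor stands in for XGBoost; the lag set, the toy series, and all names are illustrative (the paper's actual lags are in Table 7).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_direct_models(series, lags, horizons):
    """Direct multi-horizon strategy: one regressor per horizon h, each
    trained on the same lagged inputs but with the target shifted h steps
    ahead."""
    models = {}
    max_lag = max(lags)
    for h in horizons:
        X, y = [], []
        for t in range(max_lag, len(series) - h):
            X.append([series[t - lag] for lag in lags])
            y.append(series[t + h])
        models[h] = GradientBoostingRegressor(n_estimators=50).fit(
            np.array(X), np.array(y))
    return models

t = np.arange(300)
series = 0.4 + 0.1 * np.sin(2 * np.pi * t / 30)   # toy prevalence series
lags = [1, 2, 3, 7]                                # illustrative lag set
models = fit_direct_models(series, lags, horizons=range(1, 31))

# Forecast all 30 horizons from the latest observed values.
latest = np.array([[series[-lag] for lag in lags]])
forecast = [models[h].predict(latest)[0] for h in range(1, 31)]
```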

Modeling framework
Supervised learning is a machine learning paradigm for learning the relationship between input features (X) and output targets (y). The goal of supervised learning is to build an artificial system that can learn the mapping between the input and the output. The input-output pairs with which the algorithm is trained are called training points.
The core of forecasting is knowledge of the past. Therefore, the formulation of the related supervised learning problem causes the input features and the output targets to look back in history, inspecting how the past has affected the near future. The basic concept is that an appropriate subset of temporal lags, starting from a historical reference date, explains the close evolution of the phenomenon investigated in a particular system.
In this study, we deal with systems representing the administrative regions of different countries. Each of these regions is described by different time series concerning food insecurity, as described in the Data Sources section. We need to pay attention to the heterogeneity of our systems: the multiple indicators translate into multivariate time series with different temporal resolutions.
The construction of the input-output samples relies on rolling a temporal window over the time series in order to collect a set of inputs based on predefined temporal lags and a set of outputs based on prediction horizons.
Suppose we analyze a system described by three short time series (A, B and C), of which we want to predict one (A). These time series have different temporal resolutions: 1 day, 3 days and 5 days, respectively. As described above, we need to define some temporal lags for these time series (e.g. A : [1,2,3,5]; B : [1,2]; C : [1,2,3]). Following the rolling scheme, we can obtain a set of training points for the first prediction horizon (h = 1), as shown in Figs. 21a and 21b. The creation of the training points related to the prediction horizon h = 3 follows the same criteria; the main difference regards the temporal gap between the latest date of each input sample and the corresponding output sample, as shown in Fig. 21c.
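The rolling construction described above can be sketched as follows, using the A/B/C example. The helper is hypothetical: it assumes the lower-resolution series are pre-aligned to daily indices, and a lag of k for a series with resolution r days means its value k observation steps (k·r days) in the past.

```python
def build_samples(series_dict, lags, resolutions, target, horizon):
    """Build (X, y) pairs by rolling over daily reference dates t,
    preserving each series' own temporal granularity."""
    X, y = [], []
    daily = series_dict[target]
    # earliest t for which every lagged value exists
    start = max(max(lags[s]) * resolutions[s] for s in series_dict)
    for t in range(start, len(daily) - horizon):
        row = []
        for s in sorted(series_dict):
            for k in lags[s]:
                row.append(series_dict[s][t - k * resolutions[s]])
        X.append(row)
        y.append(daily[t + horizon])
    return X, y

A = list(range(100))            # daily target series (toy values)
B = [2 * v for v in range(100)] # 3-day-resolution series, daily-aligned
C = [3 * v for v in range(100)] # 5-day-resolution series, daily-aligned
X, y = build_samples({"A": A, "B": B, "C": C},
                     lags={"A": [1, 2, 3, 5], "B": [1, 2], "C": [1, 2, 3]},
                     resolutions={"A": 1, "B": 3, "C": 5},
                     target="A", horizon=1)
```

For horizon h = 3, only the output index t + horizon changes, exactly as in the text.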
We apply this method to our task by analyzing each administrative region of our countries. More precisely, each region is characterized by a common set of indicators that translate into training points with the same structure. The heterogeneity of the systems is managed by preserving the temporal granularity of each time series among the various indicators during the rolling phase. At the end of the process, we obtain a set of training points for each prediction horizon and region. The choice of the lagged values for our indicators is reported in Table 7 and is guided by the observation that long history lengths do not benefit the prediction.

Supplementary Figure 22: Training points. These tables show the input features X (a) and output targets y (b) for the prediction horizon h = 1.

Model Evaluation
The scheme of our approach is the following:
1. Suppose we go back in time and find ourselves at the end of January 2020, with knowledge of the data history limited to that date. Our current aim is to predict food insecurity for the following 30 days;
2. On the basis of what was known until the end of January 2020, XGBoost can be trained to predict food insecurity for the following 30 days;
3. Suppose that time passes and we reach the end of February 2020. With new real data available, we can test the generated prediction;
4. We can now predict food insecurity for the following 30 days from the end of February 2020;
5. And so on, until we reach the end of August 2020.
In this way, we get 7 unbiased splits that simulate an application of our solution over time for each administrative region, as shown in Fig. 23.
Supplementary Figure 23: Nested cross-validation. The illustrative scheme of nested cross-validation used in this study.
The validation phase fits into this process by simply splitting the training points of each region into two sets: the first 80% of the samples are used for training and the remaining ones for validation. Validation is performed independently between the splits, ensuring an unbiased approach, as shown in Fig. 24.
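The expanding-window scheme above, with the 80/20 train/validation split inside each training window, can be sketched as follows. The helper, its parameters, and the illustrative date range are assumptions for the sketch, not the paper's code.

```python
import pandas as pd

def expanding_splits(index, first_cutoff="2020-01-31", n_splits=7,
                     test_days=30):
    """Expanding-window splits: train on everything up to each month-end
    cutoff, test on the following 30 days; the last 20% of each training
    window is held out for validation."""
    splits = []
    cutoff = pd.Timestamp(first_cutoff)
    for _ in range(n_splits):
        train = index[index <= cutoff]
        test = index[(index > cutoff) &
                     (index <= cutoff + pd.Timedelta(days=test_days))]
        n_val = int(0.2 * len(train))
        splits.append((train[:-n_val], train[-n_val:], test))
        cutoff = cutoff + pd.offsets.MonthEnd(1)  # advance to next month-end
    return splits

idx = pd.date_range("2019-01-01", "2020-09-30", freq="D")
splits = expanding_splits(idx)  # 7 (train, validation, test) triples
```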
Our validation scheme aims to optimize the prediction framework by both tuning model hyper-parameters and performing feature selection. The aim of this optimization is to find the configuration that returns the best performance as measured on a validation set. It can be performed following four common methods: manual search, grid search, random search and Bayesian model-based optimization. All these techniques are based on the minimization of an objective function: we want to find the configuration that yields the best objective score on the validation set. The problem with the optimization is that evaluating the objective function is extremely expensive: each time we try a different configuration, we have to train a model on the training data, make predictions on the validation data, and then calculate the objective score.

Grid search and random search are the most popular optimization techniques. However, these methods are relatively inefficient because they do not exploit previous iterative results when exploring the space of configurations: they are completely uninformed by past evaluations and, as a result, often spend a significant amount of time evaluating bad configurations. For this reason, we focus our attention on the Bayesian approach, which keeps track of past evaluation results. In this regard, we integrate into our model the Hyperopt library [17], which intelligently explores the search space while narrowing down to the estimated best configurations.

The objective function that we decided to use is a trade-off between the R² metrics on the training and validation sets:

L = w1 (R²_train − R²_val) + w2 (1 − R²_val),

where w1 = 0.7 and w2 = 0.3. The first term selects models that minimize the difference in R² between training and validation, in order to limit overfitting. The second term ensures that only models with a sufficiently high validation R² are selected.
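The trade-off objective can be expressed as a small scoring function; note that the exact functional form is a hedged reconstruction from the description in the text (the weights w1 = 0.7 and w2 = 0.3 are from the text, the combination of the two terms is inferred).

```python
def objective_score(r2_train, r2_val, w1=0.7, w2=0.3):
    """Reconstruction of the objective minimized during hyper-parameter
    search: the first term penalizes the train/validation gap (overfitting),
    the second rewards a high validation R^2."""
    return w1 * (r2_train - r2_val) + w2 * (1.0 - r2_val)

# A balanced model scores better (lower) than an overfitted one:
balanced = objective_score(0.80, 0.75)  # 0.7*0.05 + 0.3*0.25 = 0.11
overfit = objective_score(0.99, 0.40)   # 0.7*0.59 + 0.3*0.60 = 0.593
```

In a Hyperopt loop, this score would be returned from the function passed to the optimizer, which then proposes the next configuration to evaluate.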
Moreover, early stopping is triggered when no further improvement is measured on the validation set, in order to prevent overfitting and reduce computational costs.
In more detail, our hyper-parameter optimization explores the set of XGBoost parameters and corresponding values reported in Table 8.
Regarding indicator selection, we explore different configurations of the independent variables, deciding to keep or exclude each based on the results on the validation set. For each split, the starting configuration uses the default parameters of the XGBoost model and the univariate setting in which the only predictor is the target variable. The total number of Hyperopt configurations is 600 for each country.