Using aircraft location data to estimate current economic activity

Aviation is a key sector of the economy, contributing at least 3% to gross domestic product (GDP) in the UK and the US. Currently, airline performance statistics are published with a three-month delay. However, aircraft now broadcast their location in real time using the Automatic Dependent Surveillance-Broadcast (ADS-B) system. In this paper, we analyse a global dataset of flights since July 2016. We first show that it is possible to accurately estimate airline flight volumes using ADS-B data, which is available immediately. Next, we demonstrate that real-time knowledge of flight volumes can be a leading indicator of aviation's direct contribution to GDP in both the UK and the US. Using ADS-B data could therefore help move us towards real-time estimates of GDP, which would equip policymakers with the information to respond to shocks more quickly.


Data cleaning
We use the altitude and timestamp fields from the ADS-B messages to estimate how many flights each aircraft makes per month. Figure S2 shows that, for some aircraft, there are clear errors in altitude data. These errors could reduce the accuracy of our estimates of aircraft activity, so we clean the altitude data using median filtering.
To median filter, we first set a window size k = 5. For each altitude observation a_t, we calculate the median of the observations in the window [a_{t-2}, a_{t-1}, a_t, a_{t+1}, a_{t+2}]. If the altitude differs from this median, we replace it with the median.
To give a numeric example, suppose we observe the following altitude vector for an aircraft: A_1 = [10000, 10100, 10200, 2000, 10400, 10500, 10600]. There are 3 points that have enough neighbours to apply the median filter: a_3, a_4 and a_5.
The final vector A_1^medfilt = [10000, 10100, 10100, 10200, 10400, 10500, 10600] is a more realistic take-off trajectory. Figure S2 depicts the impact of median filtering on 3 aircraft over a given day. The upper panel shows an aircraft with fairly clean data, so filtering changes only 0.8% of observations. The middle and lower panels show aircraft with less clean data, so filtering changes 1.2% and 2.8% of their observations respectively.

Figure S2: Impact of median filtering on aircraft altitude profiles. The left panel shows the raw data, and the right panel shows the data after median filtering.
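The filtering step can be sketched as follows (a minimal sketch; the function name is ours, and edge points without a full window are left unchanged, as in the worked example):

```python
import numpy as np

def median_filter(altitudes, k=5):
    """Replace each altitude with the median of the size-k window of raw
    observations centred on it. Points near the edges that lack a full
    window are left as-is."""
    a = np.asarray(altitudes, dtype=float)
    out = a.copy()
    half = k // 2
    for t in range(half, len(a) - half):
        # Median is always taken over the original observations,
        # not over already-filtered values.
        out[t] = np.median(a[t - half : t + half + 1])
    return out

# Worked example from the text: the spurious 2000 ft reading is removed.
A1 = [10000, 10100, 10200, 2000, 10400, 10500, 10600]
print(median_filter(A1))  # → [10000. 10100. 10100. 10200. 10400. 10500. 10600.]
```

Note that a_3 changes from 10200 to 10100 even though it is a genuine reading: median filtering trades a small distortion of clean points for robustness to large spikes.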

Flight-counting algorithm
The raw data contains over 25 billion messages. Initially, there is one row for each message containing, among other fields, the aircraft's latitude, longitude, altitude and timestamp. To estimate monthly airline performance, we do not need such a large volume of data. To facilitate our analysis, we therefore reduce the data to one row for each unique flight. Each row contains the aircraft's take-off time and location, landing time and location, aircraft ID and the airline.
Counting unique flights from the ADS-B messages is a non-trivial task. First, each aircraft can make multiple flights per day, and the number of flights varies considerably with journey length.
We show the distribution of daily flight counts per aircraft in Figure S3.
There is a field in the ADS-B message identifying the aircraft, but no reliable field to identify separate flights. We therefore created an algorithm to identify separate flights from the raw altitude data, henceforth referred to as the flight-counting algorithm. The flight-counting algorithm uses the aircraft's altitude to identify the take-off and landing of each flight. It crawls through the time-ordered altitude observations and creates a take-off if the aircraft ascends above a certain threshold. Similarly, it creates a landing if the aircraft descends below a given threshold.
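The crawl can be sketched as follows (an illustrative sketch; the function name is ours, and the default thresholds are the values discussed later in this section):

```python
def count_flights(altitudes, takeoff_thresh=1000, landing_thresh=10000):
    """Walk through time-ordered altitude observations, recording a
    take-off when the aircraft ascends through takeoff_thresh and a
    landing when it later descends through landing_thresh. Returns a
    list of (event, index) pairs."""
    events = []
    airborne = False
    for t in range(1, len(altitudes)):
        prev, curr = altitudes[t - 1], altitudes[t]
        if not airborne and prev <= takeoff_thresh < curr:
            events.append(("takeoff", t))
            airborne = True
        elif airborne and prev >= landing_thresh > curr:
            events.append(("landing", t))
            airborne = False
    return events

# Two short flights in one altitude trace (feet):
trace = [0, 500, 5000, 20000, 12000, 8000, 0,
         0, 2000, 15000, 30000, 9000, 0]
print(count_flights(trace))
# → [('takeoff', 2), ('landing', 5), ('takeoff', 8), ('landing', 11)]
```

Tracking the airborne state ensures that each take-off is paired with at most one landing, so the flight count is simply the number of take-off/landing pairs.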
The figures below show how the algorithm identifies take-offs and landings for given aircraft over the course of a day. The counting algorithm reduces the data from one row per ADS-B message to one row per flight; we identify over 67 million flights this way. Returning to Figure S3, the upper panel shows that the range of daily flight counts is wide: many aircraft make just one flight, but some make up to ten. Most of this variation is explained by journey length, which varies systematically by airline. The lower panels show that easyJet aircraft, which fly short haul, make far more flights on average than Virgin Atlantic aircraft, which fly long haul.
Second, the raw data is not very clean, as shown in Figure S2. This can reduce the flight-counting algorithm's accuracy. Figure S4 depicts the impact of cleaning the altitude data. The left panel shows counts carried out on the raw data, which are often wrong. The right panel shows more accurate counts carried out on the cleaned data.
Even after filtering, the altitude data still contains occasional errors. To minimise their impact, we add some heuristics to the algorithm. If a take-off (landing) is recorded, we stipulate a lag of 30 minutes before the aircraft can land (take off) again. We select this lag as a reasonable minimum journey time for commercial flights. Figure S5 illustrates how adding the lag heuristic makes the counting algorithm more robust to any residual noise in the data.
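As an illustration of the lag heuristic, the sketch below counts flights from time-stamped altitude observations while suppressing any event within 30 minutes of the previous one (function and field names are ours; the thresholds follow the values given in this section):

```python
def count_flights_with_lag(obs, takeoff_thresh=1000, landing_thresh=10000,
                           min_gap_s=1800):
    """obs is a time-ordered list of (timestamp_seconds, altitude) pairs.
    A take-off (landing) is only recorded if at least min_gap_s seconds
    (30 minutes by default) have passed since the previous event."""
    events = []
    airborne = False
    last_event_ts = None
    for i in range(1, len(obs)):
        ts, curr = obs[i]
        _, prev = obs[i - 1]
        if last_event_ts is not None and ts - last_event_ts < min_gap_s:
            continue  # too soon after the previous event: treat as noise
        if not airborne and prev <= takeoff_thresh < curr:
            events.append(("takeoff", ts))
            airborne, last_event_ts = True, ts
        elif airborne and prev >= landing_thresh > curr:
            events.append(("landing", ts))
            airborne, last_event_ts = False, ts
    return events

# A noisy dip below 10,000 ft five minutes after take-off is ignored,
# so only one genuine flight is counted.
obs = [(0, 0), (60, 2000), (120, 12000), (300, 9000),
       (360, 12000), (3600, 12000), (3660, 8000)]
print(count_flights_with_lag(obs))  # → [('takeoff', 60), ('landing', 3660)]
```

Without the lag, the dip at t = 300 s would be recorded as a landing, inflating the flight count; with it, only the descent after a plausible journey time registers.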
Finally, we set the landing threshold at 10,000 feet, which is much higher than the take-off threshold of 1,000 feet. This is to minimise the impact of missing data on the algorithm's accuracy. Figure S6 shows that higher landing thresholds are much less vulnerable to missing data, although they record landing times slightly too early. We set the take-off threshold to pick up as many take-offs as possible while accounting for the fact that some airports are above sea level: we would not want to record aircraft that are moving on the ground, while sending ADS-B messages, as take-offs.

Figure S4: The left panel shows the counting algorithm on the raw data for given aircraft over a day. Noise in the data causes the algorithm to identify false positives, so it counts too many flights. The right panel shows the counting algorithm operating on the same data after median filtering. Visual inspection suggests that the algorithm is more accurate after cleaning.


Regularising models for out-of-sample forecasts
We cannot use a random train-test split to assess out-of-sample performance because time series data is not independently and identically distributed (i.i.d.). A random train-test split would put data in the training set that occurs after the data in the testing set. We would therefore use data from the future to fit a model that estimates the past, which would not be a valid measure of out-of-sample accuracy.
Instead, we use adaptive nowcasting to measure out-of-sample accuracy. For each period t in our dataset, we use periods ∈ [1, t − 1] as our training set. The trained model then estimates the flight volumes for each airline in period t. We record the mean absolute error (MAE) across airlines, and that is the test score for period t. Each time we increase t, we re-fit the model to add new training data (which is why we call it "adaptive"). This procedure only uses past data to predict the present, so we know performance is out-of-sample.
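The evaluation loop can be sketched as an expanding-window procedure (a minimal sketch; the function names are ours, and an ordinary-least-squares model stands in for the paper's specifications):

```python
import numpy as np

def adaptive_nowcast_mae(X, y, periods, fit, predict):
    """For each period t, train on all observations from periods before t,
    predict period t, and score the mean absolute error (MAE) across
    airlines in t. fit and predict are user-supplied model callables."""
    scores = {}
    for t in sorted(set(periods))[1:]:      # need at least one training period
        train, test = periods < t, periods == t
        model = fit(X[train], y[train])     # refit every period: "adaptive"
        scores[t] = np.mean(np.abs(y[test] - predict(model, X[test])))
    return scores

# Minimal usage with an OLS model on synthetic data (2 airlines, 5 periods):
rng = np.random.default_rng(0)
periods = np.repeat(np.arange(5), 2)        # period index of each observation
X = np.column_stack([np.ones(10), rng.normal(size=10)])
y = X @ np.array([2.0, 1.5]) + rng.normal(scale=0.1, size=10)

ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
ols_predict = lambda beta, X: X @ beta
scores = adaptive_nowcast_mae(X, y, periods, ols_fit, ols_predict)
print(scores)  # one out-of-sample MAE per period t >= 1
```

Because the training mask is strictly `periods < t`, no future observation ever influences the prediction for period t, which is the property the random split lacks.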
The models with airline and time dummies have many parameters. This could lead to overfitting, which would reduce out-of-sample performance. We therefore regularise our adaptive nowcast models using LASSO regression. Let β be the vector of parameters in a linear forecasting model for T time periods and N airlines:

\hat{\beta} = \arg\min_{\beta} \sum_{t=1}^{T} \sum_{i=1}^{N} \left( y_{i,t} - X_{i,t}\beta \right)^2 + \lambda \sum_{j} |\beta_j|,

where y_{i,t} is the flight count for airline i in period t, and X_{i,t} is their feature vector. LASSO applies a linear penalty, with weight λ, to the magnitude of each coefficient, which punishes more complex models. It also performs automatic variable selection, as the linear penalty sets many parameters exactly to zero. We tune using 5-fold cross-validation across all data from period 1 to t − 1 to find the optimal values of λ and β. Next we record the tuned model's predictions for period t, and measure the error for each of the N airlines. The performance score in period t is the MAE across the N airlines. Figure S11 depicts the distribution of nowcast errors over time for both the baseline and ADS-B models.
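The tuning step can be illustrated with scikit-learn's LassoCV, which performs the cross-validated search over λ (called alpha in scikit-learn). This is a sketch on synthetic data, not the paper's feature set: the columns of X here stand in for the airline dummies, time dummies and ADS-B counts.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic design: 20 candidate features, only 3 of which matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]              # informative features
y = X @ beta_true + rng.normal(scale=0.5, size=200)

# 5-fold cross-validation over a grid of penalties to choose lambda.
model = LassoCV(cv=5).fit(X, y)
print("chosen lambda:", model.alpha_)
print("non-zero coefficients:", int((model.coef_ != 0).sum()))
```

The automatic variable selection is visible in the output: most of the 20 coefficients are driven exactly to zero, while the informative ones survive.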

Different dummy choices for nowcasting airline performance
In the main text, we display results using only airline dummies. We made this choice because fitting month dummies with a short time dimension is difficult. Suppose we are nowcasting for December 2016. Our training data would include July to November 2016. As we had no observations for December, we would not be able to include the month dummies in the predictive model. In this section we show that our results are robust to all other choices of dummy variables. Table S2 shows results from these choices. Adding ADS-B data reduces MAE in all specifications.
The reductions range from 20% to 26% for the UK, and 11% to 21% for the USA, which is consistent with the results in the main text.
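For concreteness, the alternative dummy specifications can be built with pandas one-hot encoding (the column names and values below are hypothetical; note that a month absent from the training data, such as December above, simply has no dummy column to estimate):

```python
import pandas as pd

# Toy panel: one row per airline-month, with an ADS-B flight count.
df = pd.DataFrame({
    "airline":      ["EZY", "VIR", "EZY", "VIR"],
    "month":        ["2016-07", "2016-07", "2016-08", "2016-08"],
    "adsb_flights": [3100, 420, 3250, 435],
})

# Airline dummies only (the main-text specification):
airline_only = pd.get_dummies(df, columns=["airline"])

# Both airline and month dummies (a robustness specification):
both = pd.get_dummies(df, columns=["airline", "month"])
print(both.columns.tolist())
```

A model trained on this frame has no "month_2016-12" column, which is exactly why the December nowcast cannot use month dummies.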