Using aircraft location data to estimate current economic activity

Aviation is a key sector of the economy, contributing at least 3% to gross domestic product (GDP) in the UK and the US. Currently, airline performance statistics are published with a three-month delay. However, aircraft now broadcast their location in real time using the Automatic Dependent Surveillance-Broadcast system (ADS-B). In this paper, we analyse a global dataset of flights since July 2016. We first show that it is possible to accurately estimate airline flight volumes using ADS-B data, which is available immediately. Next, we demonstrate that real-time knowledge of flight volumes can be a leading indicator for aviation’s direct contribution to GDP in both the UK and the US. Using ADS-B data could therefore help move us towards real-time estimates of GDP, which would equip policymakers with the information to respond to shocks more quickly.


ADS-B data. We retrieve aircraft data from the ADS-B Exchange. Commercial aircraft in Europe have been required to broadcast ADS-B data since 2017 [21], and it has been mandatory for US aircraft since January 2020 [22]. These broadcasts include the aircraft's speed and location, specified as its altitude, latitude and longitude, alongside a timestamp. Each ADS-B message also includes a six-digit hexadecimal identification code assigned to the aircraft by the International Civil Aviation Organisation (ICAO). Among other things, the ICAO code makes it possible to link an aircraft to its operating airline by looking the code up in a corresponding database. ADS-B Exchange carries out this pre-processing, and the operating airline is included in each of the resulting ADS-B records.
ADS-B messages are unencrypted so that other aircraft can receive them, which means they are available to anyone with an ADS-B receiver. The ADS-B Exchange collects data from thousands of receivers [20], and the resulting database covers global flight activity (Fig. 1). We analyse the period from July 2016 to December 2018. Coverage has improved over time as the number of receivers feeding the database has grown (see Supplementary Fig. S1).
The raw data we analyse contains roughly 25 billion messages. First, we reduce this data from one row per message to one row per flight. Figure 2 shows how we identify take-offs and landings by analysing the altitude of an aircraft over time (see also …).
Published aviation statistics. Both the UK [16] and US [18] aviation authorities publish monthly airline statistics. These contain a range of performance indicators, such as flight volume and capacity utilisation, but are currently released with a three-month delay. Figure 3 depicts the monthly percentage change in both the airline statistics and the flight volumes calculated from the ADS-B data. Visual inspection suggests a strong correlation. However, both countries show clear seasonality (see Supplementary Figs. S7 and S8), which we account for in our later analysis.
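The take-off and landing detection described above (Figure 2), which reduces per-message records to one row per flight, could be sketched as a scan over each aircraft's altitude track. This is a minimal sketch under stated assumptions, not the procedure behind Figure 2: the `Flight` record and the `ground_alt_ft` threshold of 500 ft are hypothetical, and a flight is only counted when both a ground-to-air and an air-to-ground transition are observed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Flight:
    """Hypothetical per-flight record: take-off and landing timestamps."""
    takeoff_time: float
    landing_time: float


def detect_flights(track: List[Tuple[float, float]],
                   ground_alt_ft: float = 500.0) -> List[Flight]:
    """Reduce one aircraft's (timestamp, altitude) messages to a list of flights.

    A take-off is the first message above `ground_alt_ft` after a message at or
    below it; a landing is the next message back at or below it. Segments that
    start or end airborne (e.g. coverage gaps) yield no flight.
    `ground_alt_ft` is an assumed threshold, not a value from the paper.
    """
    flights: List[Flight] = []
    on_ground: Optional[bool] = None    # unknown until the first message
    takeoff_ts: Optional[float] = None
    for ts, alt in sorted(track):       # process messages in time order
        low = alt <= ground_alt_ft
        if on_ground is True and not low:
            takeoff_ts = ts             # ground -> air transition
        elif on_ground is False and low and takeoff_ts is not None:
            flights.append(Flight(takeoff_ts, ts))  # air -> ground transition
            takeoff_ts = None
        on_ground = low
    return flights
```

The resulting flights, tagged with the operating airline from each ADS-B record, can then be counted per airline and month to give the ADS-B flight volumes.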
Finally, we collect economic data from the UK Office for National Statistics (ONS) [17] and the US Bureau of Economic Analysis (BEA) [19]. Both publish GDP broken down by industry, from which we use the air transport series. The UK series is monthly and dates back to 1997; the US series is quarterly and dates back to 2005. When analysing these series, we check key time series properties, such as stationarity, to avoid drawing misleading conclusions (see Supplementary Figs. S9 and S10).

Results
Estimating airline flight volume. For each airline, we aim to generate rapid estimates of flight volume over time. Because some airlines are much larger than others, we normalise each airline's flight volume so that its first period equals 100, and re-scale subsequent periods relative to the first. For example, an airline whose original flight counts were (5000, 6000, …, 6500) would be normalised to (100, 120, …, 130). A normalised flight volume of 120 therefore reflects a flight volume 20% higher than in the first period.
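The indexing step above amounts to dividing each airline's series by its first-period count and multiplying by 100. A minimal Python sketch; the function names are illustrative and not taken from the paper:

```python
from typing import Dict, List, Sequence


def normalise(counts: Sequence[float]) -> List[float]:
    """Index a series so the first period equals 100 and later periods are relative to it."""
    if len(counts) == 0 or counts[0] == 0:
        raise ValueError("need a non-empty series with a non-zero first period")
    base = counts[0]
    return [100.0 * c / base for c in counts]


def normalise_panel(panel: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Apply the indexing separately to each airline's monthly flight counts."""
    return {airline: normalise(counts) for airline, counts in panel.items()}
```

For instance, `normalise([5000, 6000, 6500])` returns `[100.0, 120.0, 130.0]`, matching the example in the text with the intermediate months omitted.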
A reasonable baseline model would be an autoregressive (AR) model that estimates normalised airline flight volumes from their own history:
$$y_{i,t} = \alpha_i + \gamma_t + \beta\, y_{i,t-3} + \varepsilon_{i,t},$$
where $y_{i,t}$ is the (normalised) number of flights and $\varepsilon_{i,t}$ is a noise term for airline $i$ in month $t$. Because of the three-month publication lag for the official flight volume statistics, when nowcasting the flight volume for month $t$ we only have official data up to month $t-3$. The baseline therefore includes an AR(3) term, $y_{i,t-3}$, and $\beta$ is the weight on this term. We also derive binary ("dummy") variables from the longitudinal data structure. The $\gamma_t$ are coefficients on dummy variables for each calendar month (12 in total), which proxy for seasonality; a positive value of $\gamma_t$ would reflect that flight volumes are usually higher than average in month $t$. The $\alpha_i$ are coefficients on 28 airline-specific dummy variables, which capture each airline's average growth over time; a positive value of $\alpha_i$ would reflect an increase in the mean flight volume for airline $i$ across the time period.
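To make the baseline concrete, the sketch below stacks a panel of normalised flight volumes into a design matrix with the lag-3 term, airline dummies ($\alpha_i$) and calendar-month dummies ($\gamma_t$), and fits it by ordinary least squares with NumPy. It assumes a balanced panel stored as an `(n_airlines, n_months)` array, and the function names are hypothetical. Because a full set of airline dummies already acts as an intercept, the sketch drops the January dummy to avoid perfect collinearity, so month effects are measured relative to January; this is a choice made for the sketch, not something stated in the paper.

```python
import numpy as np


def build_baseline_design(y: np.ndarray, month_of_year: np.ndarray, lag: int = 3):
    """Stack a balanced panel into (X, target) for the AR baseline with dummies.

    y: (n_airlines, n_months) normalised flight volumes y_{i,t}.
    month_of_year: (n_months,) calendar month 1..12 of each column.
    Each row of X is one (airline i, month t) pair with t >= lag, with columns
      [ y_{i,t-lag} | airline dummies (alpha_i) | month dummies (gamma_t, Jan dropped) ].
    """
    n_air, n_t = y.shape
    rows, target = [], []
    for i in range(n_air):
        for t in range(lag, n_t):
            airline = np.zeros(n_air)
            airline[i] = 1.0
            month = np.zeros(11)
            m = int(month_of_year[t])
            if m != 1:                  # January is the reference month
                month[m - 2] = 1.0
            rows.append(np.concatenate(([y[i, t - lag]], airline, month)))
            target.append(y[i, t])
    return np.array(rows), np.array(target)


def fit_baseline(y: np.ndarray, month_of_year: np.ndarray, lag: int = 3):
    """Ordinary least squares fit; coef[0] is the estimate of beta."""
    X, target = build_baseline_design(y, month_of_year, lag)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef, X, target
```

Each month contributes one row per airline, and only observations with a month three periods earlier enter the fit, mirroring the publication lag.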
To measure the performance boost from ADS-B data, we add this data to the baseline model. Denoting by $x_{i,t}$ the ADS-B flight count for airline $i$ in period $t$, and by $\delta$ the weight on the ADS-B term:
$$y_{i,t} = \alpha_i + \gamma_t + \beta\, y_{i,t-3} + \delta\, x_{i,t} + \varepsilon_{i,t}.$$
Table 1 shows that adding ADS-B data boosts the in-sample accuracy of all baseline models, regardless of whether month and airline dummies are included (see Supplementary Table S1 for similar results when the model includes month dummies or airline dummies alone). This shows that ADS-B data can help estimate dynamic airline-specific changes in flight volume, and does not merely proxy seasonality or differences in airline growth.
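Table 1 compares models by in-sample adjusted $R^2$. The sketch below shows that comparison on generic inputs: it fits ordinary least squares with an intercept and applies the usual adjustment $\bar R^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, where $p$ is the number of predictors excluding the intercept. The synthetic data in the demonstration is hypothetical; with real data, the airline and month dummies would be appended as extra columns, omitting one category of each so that the intercept is not duplicated.

```python
import numpy as np


def adjusted_r2(X: np.ndarray, y: np.ndarray) -> float:
    """In-sample adjusted R^2 of an OLS fit of y on X plus an intercept.

    X: (n, p) predictors, excluding the intercept column.
    """
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Hypothetical data: y depends on its lag-3 value and on an ADS-B count x.
    rng = np.random.default_rng(0)
    n = 200
    y_lag3, x = rng.normal(size=n), rng.normal(size=n)
    y = 0.3 * y_lag3 + 0.7 * x + 0.05 * rng.normal(size=n)
    print("baseline:  ", adjusted_r2(y_lag3[:, None], y))
    print("with ADS-B:", adjusted_r2(np.column_stack([y_lag3, x]), y))
```

The adjustment penalises each extra predictor, so a higher adjusted $R^2$ after adding $x_{i,t}$ indicates that the ADS-B term adds more explanatory power than an arbitrary extra column would.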
Our results so far suggest that ADS-B data improves in-sample estimates of airline flight volume. However, in-sample scores may overstate true predictive accuracy. To assess out-of-sample performance, we use one-step-ahead adaptive nowcasting [23], originally developed to generate rapid indicators of flu incidence from Google search volume. For each month $t$ in our dataset, we train the model on data from months $[1, t-1]$. To avoid overfitting, we penalise the model's coefficients using LASSO regression with 5-fold cross-validation (see Supplementary Methods for details on regularising out-of-sample forecasts). The penalised model then estimates the test month $t$. We record the estimate for each airline and use the mean absolute error (MAE) across airlines as the score for month $t$. As we move through the analysis period, increasing $t$, we re-fit the model with the new training data (hence "adaptive"). Because this procedure only uses past data to predict the present, performance is evaluated strictly out of sample.
Table 1. Estimating airline flight volume: in-sample results. In-sample adjusted $R^2$ scores from models built to generate rapid estimates of airline flight volume. All models are unpenalised linear regressions. The baseline model is autoregressive: it estimates the change in each airline's monthly flights using the most recently available flight count statistics. The ADS-B model additionally includes the ADS-B estimate of each airline's monthly flights as a predictor. The simple model does not include any further predictors. The complex model includes binary variables for each airline and month, to capture seasonality and differences in airline growth across the period. ADS-B data boosts performance across all model specifications, including against the more complex baseline. This shows that ADS-B data can help estimate dynamic airline-specific changes in flight volume, and does not only proxy seasonality or differences in airline growth.
Supplementary Fig. S11.
These results hold across a range of dummy specifications and training windows (see Supplementary Tables S2 and S3). ADS-B data therefore also improves out-of-sample estimates of airline flight volume.
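The adaptive procedure described in this section (re-fit on all months before $t$, predict month $t$, score by the MAE across airlines) can be sketched as a simple loop. For brevity the sketch swaps the paper's LASSO with 5-fold cross-validation for unpenalised least squares, supplied through a pluggable `fit_predict` function; with scikit-learn installed, a function returning `LassoCV(cv=5).fit(X_train, y_train).predict(X_test)` could be passed instead. The function names and the row layout (one row per airline-month, with an integer month index) are assumptions.

```python
from typing import Callable, Dict

import numpy as np


def ols_fit_predict(X_train: np.ndarray, y_train: np.ndarray,
                    X_test: np.ndarray) -> np.ndarray:
    """Unpenalised least squares with an intercept (stand-in for LASSO + 5-fold CV)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def adaptive_nowcast(month: np.ndarray, X: np.ndarray, y: np.ndarray,
                     first_test_month: int,
                     fit_predict: Callable = ols_fit_predict) -> Dict[int, float]:
    """One-step-ahead adaptive nowcasting with an expanding training window.

    month: (n_rows,) integer month index 1..T of each (airline, month) row.
    For every test month t >= first_test_month, the model is re-fitted on rows
    from months [1, t-1] only, then scored by the MAE across airlines in month t.
    """
    scores: Dict[int, float] = {}
    for t in sorted(set(month.tolist())):
        if t < first_test_month:
            continue
        train, test = month < t, month == t   # month t never enters training
        pred = fit_predict(X[train], y[train], X[test])
        scores[t] = float(np.mean(np.abs(pred - y[test])))
    return scores
```

Because each fit sees only earlier months, a sudden shift in the test month shows up fully in that month's MAE rather than being absorbed by the model.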
Estimating economic activity. We next analyse whether ADS-B data may help estimate aviation's direct contribution to GDP. Augmented Dickey-Fuller tests cannot reject the null hypothesis of a unit root for either the UK or the US aviation GDP series (UK: Dickey-Fuller = −1.4; US: Dickey-Fuller = −2.9; both p > 0.05), so we treat both series as non-stationary. Their distributions are not constant over time, so regressing on their levels directly risks spurious results. Instead, we use the rolling annual percentage change in GDP, which deals effectively with both non-stationarity and seasonality (see Supplementary Figs. S7 to S10).
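We read the rolling annual percentage change as the year-on-year change, comparing each period with the same period one year earlier; the sketch below follows that interpretation. Comparing like with like removes a fixed seasonal pattern, and working with changes rather than levels removes a trending level. The lag of 12 for the monthly UK series and 4 for the quarterly US series follows from the data frequencies given above; the function name is illustrative.

```python
from typing import List, Optional, Sequence


def annual_pct_change(series: Sequence[float],
                      periods_per_year: int) -> List[Optional[float]]:
    """Year-on-year percentage change: 100 * (z_t / z_{t-k} - 1), k = periods_per_year.

    Use k = 12 for a monthly series (UK) and k = 4 for a quarterly series (US).
    The first k periods have no year-earlier comparison and are returned as None.
    """
    k = periods_per_year
    return [None if t < k else 100.0 * (z / series[t - k] - 1.0)
            for t, z in enumerate(series)]
```

A quarterly series with a stable seasonal pattern, such as `[10, 20, 30, 40] * 3`, gives a change of `0.0` in every quarter after the first year. Whether the transformed series is stationary should still be checked, for example with the augmented Dickey-Fuller test in statsmodels (`statsmodels.tsa.stattools.adfuller`).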
Our baseline specification for the annual percentage change in aviation's direct contribution to GDP, $\Delta z_t$, is …
Figure 5. Adaptive nowcasting of aviation's direct contribution to GDP. We build adaptive nowcasting models [23] to generate rapid estimates of aviation's direct contribution to GDP in the UK and the US. We investigate whether models enhanced with real-time flight volume data would deliver more accurate estimates than nowcasting models based on historic GDP data alone. Given a training window w, for each period …
The in-sample results are promising: adding ADS-B data boosts adjusted $R^2$ from 31% to 55% for the UK and from 12% to 42% for the US. However, the short ADS-B time series leaves only 18 monthly periods for the UK and 6 quarterly periods for the US. These samples are too small to assess out-of-sample performance with an adaptive nowcasting model.
We previously showed that ADS-B data can help estimate official flight volumes. To gain greater insight into whether ADS-B data can improve nowcasts of aviation's direct contribution to GDP, we therefore substitute the official airline flight volume series for the ADS-B data. The official airline series are available for the full period covered by the aviation GDP data for both the UK (from 1997) and the US (from 2005). We again use adaptive nowcasting, but with fixed training windows of 60 months for the UK and 8 quarters for the US (see Supplementary Table S6 for results using other training window lengths).
… (Supplementary Table S5). This may be because the baseline AR model performs worse during volatile economic times.
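The fixed-window comparison described above could look like the sketch below: for each period $t$, an OLS model with an intercept is fitted on the preceding `window` periods only and used to predict period $t$, and the mean absolute error of a baseline that uses only the lagged GDP change is compared with that of a model that also uses the official flight-volume series. The lag structure (`lag` periods for the autoregressive term, contemporaneous flight volume standing in for real-time ADS-B data) is a hypothetical specification, since the paper's exact equation is not reproduced in this section; the window sizes of 60 (UK, monthly) and 8 (US, quarterly) come from the text.

```python
import numpy as np


def lagged(series: np.ndarray, lag: int) -> np.ndarray:
    """Row t holds series[t - lag]; the first `lag` rows are NaN."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    out = np.full(len(series), np.nan)
    out[lag:] = series[:-lag]
    return out


def rolling_nowcast_mae(z: np.ndarray, features: np.ndarray, window: int) -> float:
    """Fixed-window one-step-ahead nowcast of z; returns the mean absolute error.

    For each t >= window, fit OLS with an intercept on periods [t - window, t - 1]
    only, then predict period t from features[t].
    """
    errors = []
    for t in range(window, len(z)):
        A = np.column_stack([np.ones(window), features[t - window:t]])
        coef, *_ = np.linalg.lstsq(A, z[t - window:t], rcond=None)
        pred = np.concatenate(([1.0], features[t])) @ coef
        errors.append(abs(pred - z[t]))
    return float(np.mean(errors))


def compare_baseline_vs_flights(z, flights, lag: int, window: int):
    """Return (baseline MAE, MAE with flight volume) for the hypothetical
    specifications z_t ~ z_{t-lag} and z_t ~ z_{t-lag} + flights_t."""
    z = np.asarray(z, dtype=float)
    flights = np.asarray(flights, dtype=float)
    z_lag = lagged(z, lag)[lag:]          # drop rows with no lagged value
    z_obs, f_obs = z[lag:], flights[lag:]
    baseline = rolling_nowcast_mae(z_obs, z_lag[:, None], window)
    augmented = rolling_nowcast_mae(z_obs, np.column_stack([z_lag, f_obs]), window)
    return baseline, augmented
```

A fixed window, unlike the expanding window used for airline flight volumes, lets the coefficients drift over the long GDP history, which may matter when the relationship changes between calm and volatile periods.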

Discussion
We have assessed whether ADS-B data can help nowcast aviation statistics, which are currently published with a three-month delay. We first show that ADS-B data can accurately estimate airline performance, as measured by flight volume. Second, we show that real-time knowledge of flight volume is a leading indicator for aviation's direct contribution to GDP. We find that this indicator is most valuable during volatile periods, such as the crises between 2008 and 2012. Crisis periods are when rapid GDP estimates matter most to policymakers, who must take decisions quickly. In certain crises, such as disease outbreaks, real-time information on flight volumes may also be important beyond the economic domain.
The main limitation of our analysis comes from the novelty of ADS-B data. We do not have a long enough time series to determine whether ADS-B data, which we only had access to from July 2016 onward, can directly nowcast GDP out of sample. Future work will have access to a longer ADS-B time series and could therefore better evaluate out-of-sample performance. Continued monitoring of ADS-B data will also be important in case its value as an economic indicator changes. For example, airlines that knew ADS-B data were being used to assess their performance might instruct pilots to fly differently. However, this seems unlikely given the probable costs of flying more erratically. ADS-B data is therefore likely to be less prone to manipulation than other nowcasting data, such as internet search activity.
Finally, our analysis is restricted to aviation, which accounts for only 3-5% of GDP in total, including indirect contributions that are not analysed here. However, our methods could be extended to other sectors of the economy where data is shared at a similar level of granularity. As the availability of real-time data increases, we could develop accurate estimates for enough economic sectors to build a complete, real-time picture of the economy. In turn, policymakers would be able to respond more effectively to future crises.

Data availability
This study was a re-analysis of existing datasets that are publicly available at the locations described in the Methods section.