Background & Summary

The classic approach for estimating trade flows in the literature starts from the largest spatial scale, the global trade networks, studied through Multi-Regional Input-Output (MRIO) tables and global trade-flow databases. However, this coarse level of analysis does not provide direct insights into domestic and cross-border trade flows at the sub-national level1,2. This creates a gap between the information that describes the structure of the sub-national economy and the top-down information of Inter-Country Input-Output (ICIO) tables used to understand these regional economic characteristics. One way to address this issue is to construct a finer-grained representation of the scale and channels through which each regional industry interacts with the others. This information can provide a detailed description of the linkages between regions and sectors, along with their societal, economic and ecological implications. For example, understanding how critical a region is for domestic and global supply chains can help us prevent or mitigate the impact of future disruptions, predict regional demand under labor or demographic mobility, and trace the environmental footprint of its trade flows.

Despite these benefits, existing global input-output databases such as WIOD3, OECD-ICIO, EXIOBASE4, ESA FIGARO5 and Eora6 rarely provide information on trade flows at the sub-national level. To address this, researchers have attempted to construct datasets that provide estimates of the trade flows for each region. For example, there is a comprehensive database for an EU MRIO at the NUTS-2 level (Nomenclature of Territorial Units for Statistics, level 2) covering the period 2000–20107, and a further dataset with estimated EU regional trade flows for 2013 only8. While both are credible efforts, these datasets have not been updated, and changes in regional classifications over time (due to mergers across NUTS-2 regions and redefinitions) have made some of the findings less relevant for policy-makers9.

A regional MRIO table needs data on the intermediate goods used by firms in different sectors and on the goods consumed by households; however, such data are not readily available at that level. Moreover, regional information about exports and imports is also missing in most cases, as national statistical authorities neglect it and regional producers cannot easily build a comprehensive dataset themselves. Statistical agencies and policy-makers often turn to surveys to fill the gaps in existing secondary data, but surveys are often unrepresentative and expensive10,11. Regional table construction has therefore shifted away from survey-based tables towards datasets based on so-called non-survey or hybrid (partial-survey) methods12. The most widely adopted non-survey methods are Location Quotient (LQ) methods13,14,15, in which regional location coefficients derived from the sectoral employment distribution are used to adjust the national input-output tables. Other established methods include the commodity balance method (CB)16,17, GRAS18, the cross-entropy method (CE)8 and the cross-hauling adjusted regionalization method (CHARM)19. Hybrid (partial-survey) methods, which use regional information collected from surveys, follow a procedure very similar to the ones described above: they simply replace the LQ-based modification of national coefficients with estimates based on the information collected through the survey.

In the MRIO we provide in this paper, there are 272 regions and 10 sectors. The regions are classified at NUTS-2 level, and the sectors are classified at the statistical classification of economic activities in the European Community (NACE) level 1. For each year, we estimate inter-region and inter-sector trade data. Using existing data we are able to provide these estimates for the period 2008–2018.

Methods

In this study, we combine survey and non-survey methods to construct the database. There are 3 steps involved in constructing MRIO tables: (1) Estimating marginal accounts for each NUTS2 region; (2) Estimating regional IO values based on marginal accounts and the IO coefficients; (3) Estimating the inter-regional trade matrix. Table 1 lists the inputs used in each of these steps.

Table 1 Data list and their sources.

The MRIO tables at NUTS-2 level can be regarded as regional single-region input-output (SRIO) tables (colored in yellow in Fig. 1) linked together by trade matrices (colored in green). The MRIO tables were constructed by a hybrid method, which combines micro transport survey data with modelled outcomes. SRIO tables are produced using national IO tables and Eurostat regional accounts. Trade flows between regions are estimated from road freight flows20 that are anchored to the 2013 trade data8. In this study, the cross-entropy approach is employed to ensure maximum similarity between the target and the prior distribution.

Fig. 1 An MRIO at NUTS-2 level.

Estimating marginal accounts for each region

Each Single Region Input Output (SRIO) table in Fig. 1 has regional accounts, including taxes less subsidies, value added, imports and final demand, that need to be disaggregated from the national level using the commodity balance approach. The reference relationship is shown in Table 2. Regional gross value added was used to disaggregate taxes less subsidies, value added and output for each region by sector. Regional income statistics were used to distribute the demand categories (household demand and government demand) over regions; these categories include Household Final Consumption Expenditure (HFCE), Non-Profit Institutions Serving Households (NPISH) and General Government Final Consumption (GGFC). Gross capital formation is divided into three items: gross fixed capital formation (GFCF), changes in inventories and changes in valuables (INVNT)7. The formula used for estimating regional accounts is the following:

$${X}_{r}={X}_{n}\cdot \frac{{I}_{r}}{{\sum }_{r\in {S}_{n}}{I}_{r}}$$
(1)

where \({X}_{r}\) is the element of the SRIO for region r, \({X}_{n}\) is the corresponding element of the national IO table of country n, \({I}_{r}\) is the indicator used, and \({S}_{n}\) is the set of NUTS-2 regions of country n.
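As an illustration of Eq. 1, the following minimal Python sketch splits one national account across regions in proportion to an indicator. The function name `disaggregate`, the region labels and all numbers are hypothetical and are not taken from the dataset.

```python
def disaggregate(national_value, indicators):
    """Split a national account X_n across regions in proportion to I_r (Eq. 1)."""
    total = sum(indicators.values())
    if total == 0:
        raise ValueError("indicator total must be non-zero")
    return {region: national_value * value / total
            for region, value in indicators.items()}


# Hypothetical regional gross value added for the three regions of one country.
rgva = {"R1": 50.0, "R2": 30.0, "R3": 20.0}

# Hypothetical national sector output of 1000 (million dollars).
regional_output = disaggregate(1000.0, rgva)
```

By construction the regional values add up to the national value, so the national table is preserved when the regions are aggregated again.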

Table 2 Disaggregating regional accounts.

Estimating regional input-output table based on marginal accounts and input-output coefficients

Once the regional accounts are confirmed, intermediate demands are derived from Eq. 2.

$${Z}_{j}^{r}={X}_{j}^{r}-TAXSU{B}_{j}^{r}-V{A}_{j}^{r}$$
(2)

The Location Quotient (LQ) approach is used to add heterogeneity when estimating SRIO. According to the literature, the LQ approach is based on the assumption that regional and national technologies are identical and that regional trade coefficients differ from the national input coefficients based on their respective labor inputs. The LQ is defined by the following equation:

$$L{Q}_{ri}=\frac{em{p}_{ri}/{\sum }_{i}em{p}_{ri}}{{\sum }_{r}em{p}_{ri}/{\sum }_{r,i}em{p}_{ri}}$$
(3)

with \(em{p}_{ri}\) indicating regional employment of region r in industry i. \(L{Q}_{ri}\) describes the relative significance of regional employment in industry i compared to the national employment level in the same industry. If \(L{Q}_{ri}\ge 1\), it is assumed that the region is specialized in industry i. This implies that the regional industry can meet the regional demand requirements for its goods or services and therefore the regional coefficient is assumed to be equal to the national coefficient. However, if \(L{Q}_{ri} < 1\), it is assumed that the regional specialization is lower than the national average14, and the national coefficient is scaled down by \(L{Q}_{ri}\):

$${\widehat{Z}}_{ij}^{r}=\left\{\begin{array}{ll}L{Q}_{ri}\times {a}_{ij}^{nation}\times {X}_{j}^{r} & L{Q}_{ri} < 1\\ {a}_{ij}^{nation}\times {X}_{j}^{r} & L{Q}_{ri}\ge 1\end{array}\right.$$
(4)

where \({\widehat{Z}}_{ij}^{r}\) is the preliminary intermediate transaction from sector i to sector j of region r. The hat accent indicates a preliminary variable; \({a}_{ij}^{nation}\) is the national technical coefficient from sector i to sector j.

By means of these modifications to national technical coefficients, new coefficients should represent intermediate demands produced locally within the region.
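The LQ adjustment in Eqs. 3 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the two-region, two-sector employment table and the national coefficients are hypothetical, and `min(LQ, 1)` is used as a compact way of writing the two cases of Eq. 4.

```python
def location_quotients(emp):
    """emp[r][i] is employment of region r in industry i; returns LQ[r][i] (Eq. 3)."""
    n_ind = len(emp[0])
    national_by_industry = [sum(row[i] for row in emp) for i in range(n_ind)]
    national_total = sum(national_by_industry)
    lq = []
    for row in emp:
        regional_total = sum(row)
        lq.append([(row[i] / regional_total)
                   / (national_by_industry[i] / national_total)
                   for i in range(n_ind)])
    return lq


def preliminary_flows(lq_r, a_nat, x_r):
    """Eq. 4: scale a_ij by LQ_ri when LQ_ri < 1, keep a_ij when LQ_ri >= 1."""
    n = len(x_r)
    return [[min(lq_r[i], 1.0) * a_nat[i][j] * x_r[j] for j in range(n)]
            for i in range(n)]


# Hypothetical employment by region (rows) and industry (columns).
emp = [[80.0, 20.0],
       [20.0, 80.0]]
# Hypothetical national technical coefficients a_ij.
a_nat = [[0.2, 0.1],
         [0.3, 0.4]]

lq = location_quotients(emp)
# Preliminary intermediate flows for region 0 with hypothetical outputs X_j.
z_hat = preliminary_flows(lq[0], a_nat, x_r=[100.0, 50.0])
```

In this toy case region 0 is specialized in industry 0 (LQ of 1.6), so the first row keeps the national coefficients, while the second row is scaled by an LQ of 0.4.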

Once intermediate demands and regional accounts are established through the above steps, we balance the SRIO with the commodity balance method. That is, when a region has a surplus supply, it is expected to export to other regions or countries, and when a region’s demand cannot be met by itself, it is expected to import from other regions or countries. This can be expressed by the following equations:

$$e{x}_{i}^{r}=max({X}_{i}^{r}-{\sum }_{j}{Z}_{ij}^{r},0)$$
(5)
$$i{m}_{j}^{r}=max({X}_{j}^{r}-{\sum }_{i}{Z}_{ij}^{r},0)$$
(6)

After this step, we derive 272 balanced SRIOs.
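The balancing rule can be illustrated with a short Python sketch that transcribes Eqs. 5 and 6 literally as printed, for a single region. The function name and the numbers are hypothetical, and any further treatment of final demand in the balancing step is not shown because the text does not spell it out.

```python
def commodity_balance(x, z):
    """Return (exports, imports) for one region from output x and flows z (Eqs. 5, 6)."""
    n = len(x)
    # Eq. 5: output of sector i minus the row sum of its intermediate sales.
    exports = [max(x[i] - sum(z[i][j] for j in range(n)), 0.0) for i in range(n)]
    # Eq. 6: output of sector j minus the column sum of its intermediate inputs.
    imports = [max(x[j] - sum(z[i][j] for i in range(n)), 0.0) for j in range(n)]
    return exports, imports


# Hypothetical regional outputs and intermediate flows (million dollars).
x = [100.0, 50.0]
z = [[20.0, 5.0],
     [12.0, 8.0]]

exports, imports = commodity_balance(x, z)
```

The `max(..., 0)` truncation keeps both margins non-negative, which is what the cross-entropy step later requires of its export and import constraints.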

Estimating inter-regional trade matrix

The essence of estimating the trade pattern is to allocate total trade across pairs of regions and sectors as a probability distribution. We apply regional road freight flow data and trade data within EU regions in 2013 to build a prior distribution, and use the cross-entropy method to minimise the difference between this prior and the estimated distribution. We apply regional freight transportation data with sector information to estimate the share of freight flows between regions8,19.

$$\begin{array}{c}\min\ C({p}_{ij};{q}_{ij})=\sum _{i}\sum _{j}{p}_{ij}\cdot \ln \left(\frac{{p}_{ij}}{{q}_{ij}}\right)\\ \mathrm{s.t.}\ \sum _{i}\sum _{j}{p}_{ij}=1\\ \sum _{i}{p}_{ij}\times v={\mathrm{IM}}_{j}\\ \sum _{j}{p}_{ij}\times v={\mathrm{EX}}_{i}\end{array}$$
(7)

where \({q}_{ij}\) is the prior distribution obtained from previous MRIO works, \({p}_{ij}\) are the estimates we are after, and v is the total trade volume within regions available from the OECD-ICIOs.

We want to minimise the cross-entropy distance between the two distributions, subject to three constraints: (1) all elements must sum to 1; (2)/(3) the regional imports/exports must match the marginal trade derived from the SRIO tables.
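The text does not say which solver is used for Eq. 7. One standard way to obtain the minimum cross-entropy solution when the only constraints are row and column totals is iterative proportional fitting (also known as biproportional or RAS scaling), and the Python sketch below uses that swapped-in technique purely for illustration. The function name, the prior and the export and import totals are hypothetical, and the totals are assumed to be mutually consistent (both sum to v).

```python
def cross_entropy_fit(q, ex, im, v, iterations=500, tol=1e-12):
    """Scale prior q so that row sums equal EX_i / v and column sums equal IM_j / v."""
    row_target = [e / v for e in ex]
    col_target = [m / v for m in im]
    p = [row[:] for row in q]          # start from the prior distribution
    n_rows, n_cols = len(p), len(p[0])
    for _ in range(iterations):
        # Row step: match the export shares.
        for i in range(n_rows):
            s = sum(p[i])
            if s > 0:
                p[i] = [val * row_target[i] / s for val in p[i]]
        # Column step: match the import shares.
        for j in range(n_cols):
            s = sum(p[i][j] for i in range(n_rows))
            if s > 0:
                for i in range(n_rows):
                    p[i][j] *= col_target[j] / s
        # Stop once the row constraints still hold after the column step.
        gap = max(abs(sum(p[i]) - row_target[i]) for i in range(n_rows))
        if gap < tol:
            break
    return p


# Hypothetical prior shares q_ij and marginal trade totals (million dollars).
q = [[0.4, 0.1],
     [0.2, 0.3]]
ex = [60.0, 40.0]
im = [50.0, 50.0]

p = cross_entropy_fit(q, ex, im, v=100.0)
```

Because the target row and column shares each sum to 1, the first constraint of Eq. 7 is satisfied automatically once the two marginal constraints hold.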

Data Records

All input data and the output dataset are available on Zenodo21.

The European Multi Regional Input Output dataset for 2008–2018 contains 11 data files, one for each year, in XLSX format. Each file contains transactions between sectors within regions, shown as the green and yellow segments in Fig. 1. Figure 1 presents the structure of the environmental data for each year by region and sector. Each matrix includes 272 regions (deposited in Zenodo) and 10 sectors (Table 3). In total, 10 matrices are included in the database. The measured units for all environmental data are million dollars ($). The metadata for the datasets, including abbreviations of regions, countries and sectors and acronyms of variables, can be found in “Metadata” deposited at Zenodo (Tables 4, 5).

Table 3 Data records description for 10 NACE-1 sectors.
Table 4 Data records description for 28 EU countries.
Table 5 Concordance of sectors for Eurostat and ICIO.

For some of the input data included in Zenodo, we have made adaptations to match their classification of regions or sectors with the final results. Specifically, there is no regional gross value added data for the UK in Eurostat statistics because of Brexit. We use statistics from the UK Office for National Statistics as an alternative, and add a mapping to our NACE-1 sectors. This information is stored as “UK rgva.xlsx” in the “Regional account” folder. The NUTS-2 regional trade flows, which are not publicly accessible, are placed as “Trade Data EU 2013 ref.xlsx” in the “REGIO” folder.

For the technical validation, we use regional input-output table estimations provided by local governments. For Austria, we requested the data from the authors of the estimation12 and placed them in the “Austria” folder. For Finland, we relabelled the region names with NUTS-2 codes for each sheet in “io reg 2014.xlsx” and placed the file in the “Finland” folder. For Scotland, we relabelled the sector names at NACE-1 level, saved the result as “Scotland 2008.xlsx” and placed it in the “Scotland” folder (Fig. 2).

Fig. 2 Regionalisation and commodity balance.

Technical Validation

To validate the MRIO we derive, we use data from the most widely adopted MRIO tables (PBL-MRIO, hereafter)7,8 at NUTS-2 level. Following previous work in the MRIO literature, three indicators are used in this process: mean absolute deviation (MAD), the Isard-Romanoff similarity index (DSIM), and Pearson correlation. MAD measures the absolute distance between each element in the two matrices and DSIM measures the relative distance16.

$$MAD=\frac{1}{R\times S}\mathop{\sum }\limits_{r}^{R}\mathop{\sum }\limits_{s}^{S}\left|\hat{{z}_{ij}^{rs}}-{z}_{ij}^{rs}\right|$$
(8)
$$DSIM=\frac{1}{R\times S}\mathop{\sum }\limits_{r}^{R}\mathop{\sum }\limits_{s}^{S}\frac{\left|\hat{{z}_{ij}^{rs}}-{z}_{ij}^{rs}\right|}{\left|\hat{{z}_{ij}^{rs}}\right|+\left|{z}_{ij}^{rs}\right|}$$
(9)

where r, s indicate regions, and i, j indicate sectors.
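For readers who want to reproduce the comparison, a minimal Python sketch of the three indicators is given below, applied to two flattened matrices. The function names and the toy values are hypothetical. Treating cells that are zero in both tables as contributing zero to DSIM is an assumption, since Eq. 9 is undefined there.

```python
from math import sqrt


def mad(est, ref):
    """Mean absolute deviation between estimated and reference cells (Eq. 8)."""
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(est)


def dsim(est, ref):
    """Mean relative distance between estimated and reference cells (Eq. 9)."""
    total = 0.0
    for e, r in zip(est, ref):
        denom = abs(e) + abs(r)
        if denom > 0:  # assumption: cells that are zero in both tables add 0
            total += abs(e - r) / denom
    return total / len(est)


def pearson(est, ref):
    """Pearson correlation coefficient between the two sets of cells."""
    n = len(est)
    mean_e, mean_r = sum(est) / n, sum(ref) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(est, ref))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / sqrt(var_e * var_r)


# Hypothetical flattened trade-flow cells from two tables.
estimated = [10.0, 0.0, 4.0, 6.0]
reference = [8.0, 0.0, 6.0, 6.0]

scores = (mad(estimated, reference),
          dsim(estimated, reference),
          pearson(estimated, reference))
```

MAD depends on the units of the tables, whereas DSIM lies between 0 and 1, which makes DSIM easier to compare across sectors of very different size.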

Table 6 provides the comparison across 10 sectors. The MAD and DSIM values are relatively small and highly significant (p < 0.001), with a linear correlation of approximately 0.8.

Table 6 Comparison between two datasets by sector with 95% significance.

Given that the existing REGIO tables overlap with our data only for the period 2008–2010, we compare our trade flow results (MRIO) with REGIO for these three years by using their SRIO tables in Table 7, based on Eqs. 8, 9 and Pearson’s correlation. Since the sectors do not directly match across the two datasets, we reclassify sectors in both datasets into 7 sectors. Overall, DSIM is less than 0.5, and the correlations for Finland, Latvia, Netherlands, Malta, Slovakia and the UK are around 0.4 and statistically significant.

Table 7 Comparison between EU REGIO and MRIO by country.

Further, we use regional IO tables for specific countries, based on surveys conducted locally, as the ground truth against which we compare our data and the REGIO data. The actual trade-flow data cover SRIOs for regions in Austria12, Finland (https://github.com/pttry/alta) and Scotland (https://www.gov.scot/publications/input-output-latest/), as we could not find readily available trade flows for more countries. For MAD, both MRIO and REGIO differ substantially from the ground truth. For DSIM, both MRIO and REGIO capture around 30% similarity with the actual data. For Pearson’s correlation, MRIO has 0.36 for regions in Austria, and over 0.9 for regions in Finland and Scotland (Table 8).

Table 8 Comparison based on ground truth.

Following best practices in the MRIO literature22, we further provide estimates of the relative standard error (RSE) for our MRIO data and the ground truth (i.e. the original survey data from local governments). To compute these, we employ a Monte Carlo sensitivity analysis on the MRIOs and the ground truth under three standard deviation (σ) scenarios. Specifically, we generate a vector of emissions intensities s randomly from these datasets, as we found no empirical data to compare them with. We then introduce the stressor \(F=Xs\); X, Y, Z and A are known from the datasets. In each round, we add perturbations \({E}_{Z} \sim N(0,{\sigma }_{Z})\), \({E}_{F} \sim N(0,{\sigma }_{F})\) and \({E}_{Y} \sim N(0,{\sigma }_{Y})\) to Z, F and Y, and obtain the outcome \(C=s{(I-A)}^{-1}Y\). After 1000 simulations for each case, we collect the population of C results. From these we obtain the RSE reported in Table 9. Here MRIO refers to results from our estimated datasets, while “Ground truth” refers to results from the local surveys. In most cases, the mean RSE in the table is much smaller than 10%.
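The Monte Carlo procedure can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the exact procedure: the two-sector system is hypothetical, the perturbations are applied as relative (multiplicative) Gaussian noise, a single σ is used for Z, F and Y, and the output vector X is held at its baseline value when recomputing A and s.

```python
import numpy as np


def footprint(Z, F, Y, x):
    """C = s (I - A)^{-1} Y, with A = Z diag(x)^{-1} and s = F / x."""
    A = Z / x                       # divides column j of Z by x_j
    s = F / x                       # stressor intensity per unit of output
    return float(s @ np.linalg.solve(np.eye(len(x)) - A, Y))


def monte_carlo_rse(Z, F, Y, x, sigma, n_draws=1000, seed=0):
    """Relative standard error of C under relative Gaussian noise on Z, F and Y."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for k in range(n_draws):
        Zp = Z * (1.0 + rng.normal(0.0, sigma, Z.shape))
        Fp = F * (1.0 + rng.normal(0.0, sigma, F.shape))
        Yp = Y * (1.0 + rng.normal(0.0, sigma, Y.shape))
        draws[k] = footprint(Zp, Fp, Yp, x)
    return draws.std(ddof=1) / draws.mean()


# Hypothetical two-sector system: x = Z.sum(axis=1) + Y holds at baseline.
Z = np.array([[20.0, 5.0],
              [12.0, 8.0]])
Y = np.array([75.0, 30.0])
x = Z.sum(axis=1) + Y
F = np.array([10.0, 20.0])          # hypothetical stressor by sector

baseline = footprint(Z, F, Y, x)
rse_low = monte_carlo_rse(Z, F, Y, x, sigma=0.01)
rse_high = monte_carlo_rse(Z, F, Y, x, sigma=0.10)
```

At the unperturbed baseline the footprint equals the total stressor, which is a quick consistency check on the inputs before running the simulations.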

Table 9 Sensitivity analysis with Monte Carlo simulation.