Introduction

In the UK, as in many other countries, housing is a significant source of public and political discussion. Between 1997 and 2016, the UK housing affordability ratio, a measure calculated by dividing house prices by earnings, increased from 3.6 to 7.6 (ONS, 2017b). The same research showed that in some parts of London, such as Kensington and Chelsea, the ratio is over 25. This rapid increase in prices relative to earnings is popularly referred to as a “crisis” and has been covered extensively in the British press (Elledge, 2017; Financial Times, 2017; Orr, 2017; Hawkes Steve, 2017; H. Jones, 2017; Steel, 2017; Chan Ping, 2017).

One theory explaining the increase in the affordability ratio is housing financialisation and speculation. These two closely related concepts propose that housing is used as a financial tool rather than as a place to live. Housing financialisation has been discussed in depth in the literature by M. Aalbers (2016), Madden and Marcuse (2016), and Farha (2017), who discuss it in terms of political-economy, sociology and human rights, respectively. Other authors have considered affordability from the point of view of housing policy researchers (Rogers and Koh, 2017), and of course, economists discuss the concept of speculation (Yusupova et al., 2016). Financialisation/speculation is most evident in global cities such as London, New York, Hong Kong and Vancouver. In these cities, house prices have risen over 50% since 2011 (Farha, 2017). Research based on four decades of UK housing market data (Yusupova et al., 2016) concludes that housing speculation influences the market price.

A simulation based on US metropolitan areas (Favilukis and Van Nieuwerburgh, 2017) found that “out-of-town” purchasers increase house prices and decrease residents’ welfare. However, the same simulation also showed that the introduction of a tax turned these losses into welfare gains. This finding supports the actions of several governments which have introduced a property tax or other tax explicitly aimed at speculation (Heywood and Hackett, 2013; Government of Ontario, 2018; Government of British Columbia, 2017).

A specific example of such a tax is by-law 11,674 introduced by the city of Vancouver (City of Vancouver, 2017). The so-called Empty Homes Tax of 1% of assessed taxable value applies to domestic properties with no regular occupant. This tax, which came into force in 2018, is expected to generate CAD$30 million in its first year of operation from approximately 3300 properties (City of Vancouver, 2017). The effort is an attempt to reduce rental costs in the city by encouraging owners to rent out their empty property. However, the long-term effects of this tax are still unknown.

Low-use property in England and Wales

The London housing market is seen as a very attractive investment to foreign buyers (Fernandez et al., 2016). The impact of foreign ownership on the London housing market has been discussed by think tanks (Valentine, 2015; Green and Bentley, 2014; Heywood and Hackett, 2013) and government bodies (Q. Marshall, 2015; Wilson and Barton, 2016). However, as there is little quantifiable evidence, it remains difficult to build a clear picture.

The findings of academic research are split. Work commissioned by the office of the mayor of London (Scanlon et al., 2017; Wallace et al., 2017) found that foreign ownership is unlikely to affect property price, although foreign-owned properties were less likely to be permanently occupied. However, other authors argue that the asset value of the property is the driver of London's housing market (Gallent et al., 2017) and that investment properties increase property prices overall (Fernandez et al., 2016). The effect of international capital on house prices is discussed by Badarinza and Ramadorai (2018). They found that in London, house prices rise in areas favoured by investors from specific countries when that country experiences risk shocks. The authors describe this as a “flight to safety” effect and they note that global cities such as London, New York and Singapore are particularly vulnerable, supporting the findings of Fernandez et al. (2016). Building on previous work, Sá (2016) used a dataset of all property purchases by foreign-owned companies, allowing her to directly measure the impact of foreign investment on the housing market in England and Wales. She found that foreign investment increases house prices and that there is a trickle-down effect to less expensive properties; however, she did not find evidence that more homes are left empty in areas with more foreign buyers.

Whilst in London foreign ownership is perceived as a problem, in rural areas, second home ownership by British citizens is seen as the main issue. Second homes in tourist hot-spots such as Cornwall, the Lake District and Northern Wales can cause tensions that result in local responses such as the referenda banning second homes in several parts of Cornwall (BBC, 2017; Wilkinson, 2017; Hards, 2018) and taxes on second homes in Wales (Welsh Government, 2016). Tensions between local residents and out-of-town owners occur when local residents perceive negative effects of high levels of unoccupied property as outweighing the positive effects. A substantial body of research discusses whether second homes have a net positive or negative impact on local communities. This research dates back to the 1970s (Coppock, 1977), and a recent analysis found that the same themes that were discussed then are still important now (Müller and Hoogendoorn, 2013). Understanding the impact of second homes is complex, which makes developing mitigation strategies for negative effects difficult (Barnett, 2013). Researchers also point to lack of data as hampering work (Oxley et al., 2008). Other researchers discuss the impact in cultural rather than economic terms, describing the concept of “bridging social capital” (Gallent, 2014). Several researchers find defining the terms “second home” and “holiday home” difficult (Paris, 2009; Wallace and Bevan et al., 2005; Oxley et al., 2008). The phrase “irregularly occupied dwellings” has been proposed (Wallace and Bevan et al., 2005) to provide both a name and a definition of the concept.

The phrase “leisure-related investment” is used (Paris, 2009) to describe the spectrum of reasons that people own multiple properties, from purely leisure to purely investment. This term is useful as it discourages thinking about properties in binary terms.

Banning second homes can have indirect consequences. A study of a Swiss village (Hilber et al., 2016) found that after banning the construction of second homes, the price of primary residences fell by 12% due to the subsequently weaker local economy. However, in that study, properties used for holiday homes were distinct from properties used for primary residences, which is not the case in England and Wales.

Supply and demand in house prices

In classical economics the price of goods is controlled by supply and demand; this concept, although effectively as old as trade, was brought to the mainstream by the book The Principles of Economics (A. Marshall, 1890). Unlike some other areas such as commodities, there is no single housing market for an entire country; rather, there is a patchwork of housing markets. In the UK, the concept has been formalised using the term housing market area (HMA) (C. Jones et al., 2010). These HMAs are affected by local commuter patterns, house prices and migration patterns. This local market structure of housing means that supply and demand bottlenecks do not necessarily occur at the national level but at the local level. Because HMAs can cross local government boundaries, they can cause difficulties for strategic planning of housing, although they can also provide greater insight into local housing issues (Hincks and Baker, 2013). The local supply-and-demand effect was seen in a recent study (Sá, 2016) measuring the impact of foreign investment in housing in different parts of London. This effect is supported by earlier work (Guerrieri et al., 2013), which found that people who cannot live in certain areas due to price will prefer a cheaper adjacent area, thus bidding up prices and causing the original inhabitants to leave. A recent literature review (Diappi, 2013) found that supply and demand was included in a large number of house pricing models, including the popular hedonic model. Local markets and the idea of the HMA are important as there can be a small number of vacant properties at the national level that, if concentrated, can cause supply issues at the local level. Research has also shown that house pricing is affected by geographically local factors, including population income (Follain and Jimenez, 1985; Xiao, 2017). A relationship between earnings and house prices has been shown in areas of tight supply (Hilber et al., 2016). 
Several authors suggest that looser planning regulations would reduce prices by allowing an increased supply of housing (Gyourko and Molloy, 2015; Paciorek, 2013; Malpezzi and Wachter, 2015; Hilber et al., 2016). An increase in housing supply would in theory decrease the cost of housing, but only so long as the demand is driven only in part or not at all by a speculative element.

Research question

In this paper, we ask the following questions: Can we quantify the relationship between low-use property percentage and housing affordability? Can we identify areas where low-use property is more expensive than property occupied by a permanent resident? By answering these questions, we hope to develop the debate around low-use properties by providing a quantitative method to distinguish areas which could be more responsive to increasing supply or decreasing demand as a way to reduce house prices.

Method

In this paper, we call all domestic properties “properties”. The subset of properties without a permanent resident we call “low-use properties” (LUPs). The word “property” is used to separate LUPs from the emotional connotations of “home”, as described by Paris (2009). This separation is important as we do not know where a LUP sits on the spectrum between a home and a purely commercial investment. Properties with a permanent resident will be referred to as “homes”. These definitions imply that homes + LUPs = properties.

Two common threads in the literature on both foreign investment and second homes are the difficulty of defining the properties of interest and the lack of data about which properties are empty. We solve the problem of lack of data by utilising the Freedom of Information (FOI) Act (ICO, 2017) to request data on council tax from local authorities. These FOI responses combine to form a unique and detailed picture of housing use in England and Wales. Local authorities are a form of local government used in England and Wales, and council tax is a form of property tax used to pay for local services.

We take a pragmatic approach to defining a LUP. We use the council tax classes provided by local authorities to determine whether a given house is a LUP. Broadly, a LUP is a property that is not registered as the primary residence of any individual. However, this definition comes with caveats, as we include only data that falls under council tax “discounts” and not “exemptions”. Council tax “exemptions” cover situations such as the recent death of the resident, incarceration of the resident and so on. Discounts cover situations such as properties empty for longer than six months, second homes and properties that are under renovation. The discounts class implies a degree of agency absent from the exemptions class.

In this paper, we do not distinguish between local and foreign buyers as this is not possible given our dataset (see Section “Obtaining data”). We view both local and foreign owners as “out-of-town”, the term used by Favilukis and Van Nieuwerburgh (2017). Although local and foreign owners may have different drivers and occupy different parts of the spectrum between a long-term commitment to a property and community and a purely commercial investment, we are interested in the overall relationship between LUPs and affordability and tourism, independent of owner type.

Of note, after April 2013, councils were no longer obliged to give a discount on council tax to LUPs (National Archives, 2012), so registration of LUPs may have decreased. This change in the law means that LUP percentages recorded by the local authorities may not reflect the true LUP levels. However, this does not affect whether we class a local authority as having high-value LUPs, as our definition depends on the distribution, not the absolute level (see Section “Types of low-use property”).

We use three different geographical units: the lower super output area (LSOA), the middle super output area (MSOA) and the local authority. These three geographic units and their boundaries are defined and specified by the Office for National Statistics for census data and other demographic analysis.

An LSOA contains a minimum of 400 properties (ONS, 2018a) and is the smallest publishable geographical unit for which it is possible to obtain council tax data. LSOA are themselves constructed of output areas (OA). OAs are constructed using adjacent postcodes to be as socially and demographically homogeneous as possible (ONS, 2015). This homogeneity extends to dwelling type and tenure and prevents splitting along an urban/rural boundary. Contiguous blocks of LSOA form an MSOA. Some data such as local income statistics are only gathered at the MSOA level. The final geographic group, the local authority, is made up of one or more MSOAs. The local authority provides the FOI data.

Obtaining data

This paper uses exclusively publicly available data provided by various branches of government in England and Wales; all data is available under an open government licence (OGL, 2015). The low-use data was obtained through FOI requests (ICO, 2017); data was received between January and September 2017 from 112 local authorities. This data is available in an open data repository (Bourne, 2018). Prior to the collection of this dataset, vacancy data was only available through the annual long-term vacants dataset from the Ministry of Housing Communities and Local Government (Department for Communities and Local Government, 2017) or through the Census. The method used to create this dataset allows detailed demographic data to be collected from local authorities at much more regular intervals.

The local authorities were selected to cover the different regions of England and Wales, including both rural and urban areas, as well as areas popular with tourists and those that are not. London regions are over-represented due to the focus on London as a hub of foreign investment, although the remainder of South East England is under-represented. The East Midlands is the only region not present in the dataset. Data was requested from approximately 120 local authorities. Those few that were not included in this analysis either lacked the ability to return the data or did not return data of sufficient quality in a timely manner, necessitating their exclusion from the study before data was received.

Geographical data is from the Office for National Statistics (ONS) (ONS, 2010a; ONS, 2017a; ONS, 2017c; ONS, 2016a; ONS, 2018b). Data on price paid for houses was obtained from the Land Registry for 2003–2007 and 2013–2017 (Land Registry Open Data, 2016). Local area income estimates and population estimates are from the ONS (ONS, 2016b; ONS, 2010b). Company data and data on homes per local authority are from the Valuation Office Agency (VOA, 2018; Valuation Office Agency, 2016). Local authority council tax income data was obtained from the Ministry of Housing Communities and Local Government (Ministry of Housing Communities and Local Government, 2017) and StatsWales (StatsWales, 2018).

Types of low-use property

In England and Wales, areas with a shrinking population have typically been affected by financial issues caused by deindustrialisation and associated loss of jobs (Rieniets, 2009). These areas then enter a cycle of district decline (Accordino and Johnson, 2000; Han, 2014), further exacerbating the problem. As such, we expect to see high levels of LUPs in areas that are very affordable due to low demand for housing. However, it should be noted that the number of cities experiencing population declines has become very small since 2000 (Pike et al., 2016). As we saw previously, Sá (2016) proposed that foreign investment disproportionately affects the upper end of the market. The work of Badarinza and Ramadorai (2018) suggests that foreign buyers will cluster together in certain areas. Even if we do not take explicit account of foreign buyers, Guerrieri et al. (2013) found that wealthy prospective property owners tend to want to buy near other wealthy property owners. Taking further inspiration from the literature, we note that second homes bought as luxury items tend to be in scenic or otherwise desirable locations. We therefore hypothesise that we will also see more LUPs at the least affordable end of the market.

Considering both ends of the affordability scale, we expect the number of low-value LUPs to decrease as the affordability ratio increases and the number of high-value LUPs to do the opposite. Together, the low-value LUPs and the high-value LUPs should result in a U-shaped curve, as illustrated by Fig. 1.

Fig. 1

Hypothetical LUP vs. affordability curves. Low-value LUPs decrease as the affordability ratio increases, whilst high-value LUPs increase as the affordability ratio increases

Within a single local authority, we assume that the affordability ratio is a proxy for property quality and desirability, although this may not be the case at the national level. As such, when the median price of a LUP is higher than the median price of homes in the local authority, we call the area a high-value LUP area. This is interesting as it means that at least 50% of LUPs are more expensive than at least 50% of the homes; we can then say that the LUPs tend to be more desirable than homes.

Finding the mean value of homes and LUPs

To find the difference in price between homes and LUPs, we use a simple graphical model, with the following attributes:

  1. C: the price of each property in the local authority

  2. W: the LSOA within the local authority

  3. T: the type of property, either LUP or home

The dependency structure of these variables is shown in Fig. 2. The belief network tells us that the distribution of property prices is dependent on knowing the LSOA that the property is in. The LSOA distribution is dependent on knowing whether the property is a home or a LUP. The belief network has this structure because the property types have separate distributions and knowing the property type tells us the distribution over the LSOA, which then tells us the price distribution within each LSOA.

Fig. 2

Belief network of the variables. The nodes are price (C), LSOA (W) and property type (T)

We create the joint probability P(T, W, C) using the factorisation shown in Eq. (1) and represented in the belief network. From the data, we know the values of P(T) and P(W | T). We do not, however, know the price of each property in the local authority and so do not know P(C | W). Instead, we assume that the distribution of the value of property sales in W is approximately the same as the distribution of the property values such that \(P(C|W) \approx P(\widehat {C|W})\). This assumption allows us to calculate the joint distribution using empirical data. We can then derive the mean value of homes and LUPs in each local authority, as shown in Eqs. (1)–(5). When calculating the mean, \(C_y\) is the price of the \(y\)th property.

Within an LSOA, the mean price of a LUP and a home are the same, so to find the price difference at the local authority level, we effectively look at geographic concentrations of property types across the local authority. This crucial detail is shown in Eq. (4), where P(C | T = H) = P(C | T = LUP) is true when P(W | T = H) = P(W | T = LUP). In other words, in any local authority, the price of a home and a LUP are equal when they have the same distribution across that local authority.

$$P(T,W,C) = P(C|W)P(W|T)P(T)$$
(1)
$$\mathop {\sum}\limits_W {\kern 1pt} P(T,W,C) = \mathop {\sum}\limits_W {\kern 1pt} P(C|W)P(W|T)P(T)$$
(2)
$$P(T,C) = \mathop {\sum}\limits_W {\kern 1pt} P(C|W)P(W|T)P(T)$$
(3)
$$P(C|T) = \mathop {\sum}\limits_W {\kern 1pt} P(C|W)P(W|T)$$
(4)
$$\left\langle C \right\rangle _{T = i} = \mathop {\sum}\limits_y {\kern 1pt} C_yP(C_y|T = i)$$
(5)
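
As an illustration of Eqs. (4) and (5), the following minimal Python sketch works through a toy local authority with two LSOAs. All prices and probabilities are invented for the example, and the names `price_given_lsoa`, `lsoa_given_type`, `price_given_type` and `mean_price` are hypothetical, not part of the original analysis. The price distribution within each LSOA is shared by both property types, so the difference in means arises only from how homes and LUPs are spread across the LSOAs.

```python
# Toy local authority with two LSOAs; every number here is invented.
# P(C | W): empirical price distribution within each LSOA (price -> probability).
price_given_lsoa = {
    "W1": {100_000: 0.5, 200_000: 0.5},
    "W2": {400_000: 0.5, 600_000: 0.5},
}
# P(W | T): how each property type is spread across the LSOAs.
lsoa_given_type = {
    "home": {"W1": 0.8, "W2": 0.2},
    "LUP": {"W1": 0.25, "W2": 0.75},
}


def price_given_type(prop_type):
    """Eq. (4): P(C | T) = sum over W of P(C | W) P(W | T)."""
    dist = {}
    for lsoa, p_w in lsoa_given_type[prop_type].items():
        for price, p_c in price_given_lsoa[lsoa].items():
            dist[price] = dist.get(price, 0.0) + p_c * p_w
    return dist


def mean_price(prop_type):
    """Eq. (5): expected price for one property type."""
    return sum(price * p for price, p in price_given_type(prop_type).items())


mean_home = mean_price("home")  # about 220,000 in this toy example
mean_lup = mean_price("LUP")    # about 412,500 in this toy example
```

In this toy case the LUPs are concentrated in the more expensive LSOA, so their mean price is higher even though, within each LSOA, homes and LUPs share the same price distribution.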

While the theory provides the basis for the sampling, in practice it is more involved than the previous equations would suggest. As we wish to calculate the local authority median, we will sample each LSOA \(n_{Q_w}\) times with replacement, where \(n_{Q_w}\) is the total number of properties in the LSOA. This will create the vector of the empirical sample of property prices \(\widehat {\bf{q}}_w\) defined in Eq. (6), where \(\widehat {\bf{C}}_w\) is the observed set of sales prices in the LSOA. The total empirical vector of prices is then \(\widehat {\bf{q}} = \left\{ {\widehat {\bf{q}}_1,\widehat {\bf{q}}_2 \ldots \widehat {\bf{q}}_m} \right\}\), where m is the total number of LSOA in the local authority. The LUPs of any given area are a subset of the properties in that area, i.e., \({\mathrm{LUP}} \subseteq Q\). This means the vector of house prices that make up the empirical sample of LUPs is also a subset of the properties vector (see the definition in Eq. (7)). However, while the properties are sampled with replacement from the sales price data, the LUPs are sampled without replacement from the properties vector \(\widehat {\bf{q}}\). The homes are the complement of the LUPs, as shown in Eq. (8). We can then calculate the mean and median values for properties, LUPs and homes while ensuring that LUPs and homes are a subset of the sample of property prices. We repeat the sampling process 501 times to find the distribution of the sample mean and median. This process is similar to bootstrapping (Efron and Tibshirani, 1993).

$$\widehat {\bf{q}}_w = \left\{ {x \in \widehat {\bf{C}}_w} \right\},\;\# \widehat {\bf{q}}_w = n_{Q_w},\;0 < n_{Q_w}$$
(6)
$$\widehat {{\bf{LUP}}}_w = \left\{ {x \in \widehat {\bf{q}}_w} \right\},\;\# \widehat {{\bf{LUP}}}_w = n_{{\mathrm{LUP}}_w},\;0 \le n_{{\mathrm{LUP}}_w} \le n_{Q_w}$$
(7)
$$\widehat {\bf{h}}_w = \left\{ {x \in \widehat {\bf{q}}_w|x \notin \widehat {{\bf{LUP}}}_w} \right\}$$
(8)
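
The sampling scheme of Eqs. (6) to (8) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used for the analysis; the function name `resample_once`, the toy sales prices and the property counts are all assumptions made for the example.

```python
import random
from statistics import median


def resample_once(sales_by_lsoa, n_properties, n_lups, rng):
    """One resample of a local authority: returns (LUP prices, home prices)."""
    lup_prices, home_prices = [], []
    for lsoa, sales in sales_by_lsoa.items():
        # Eq. (6): draw n_Q_w prices WITH replacement from the observed sales.
        q_w = rng.choices(sales, k=n_properties[lsoa])
        # Eq. (7): pick which sampled properties are LUPs, WITHOUT replacement.
        lup_idx = set(rng.sample(range(len(q_w)), n_lups[lsoa]))
        # Eq. (8): homes are the sampled properties that are not LUPs.
        for i, price in enumerate(q_w):
            (lup_prices if i in lup_idx else home_prices).append(price)
    return lup_prices, home_prices


# Invented toy data: a cheap LSOA with few LUPs and an expensive one with many.
sales_by_lsoa = {
    "W1": [100_000, 120_000, 150_000],
    "W2": [400_000, 450_000, 600_000],
}
n_properties = {"W1": 400, "W2": 400}
n_lups = {"W1": 8, "W2": 40}

rng = random.Random(0)
n_resamples = 501
high_value_hits = 0
for _ in range(n_resamples):
    lups, homes = resample_once(sales_by_lsoa, n_properties, n_lups, rng)
    if median(lups) > median(homes):
        high_value_hits += 1

# Fraction of resamples in which the LUP median exceeds the home median.
p_high_value = high_value_hits / n_resamples
```

Because the LUPs are drawn from the already sampled property vector, the LUP and home samples always partition the property sample, which is the constraint described in the text.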

The Vancouver tax

We explore how much tax would be generated in the considered areas by implementing the same 1% value tax as was enacted in Vancouver, Canada (City of Vancouver, 2017). The tax will be compared to the local authorities' total income from domestic council tax.
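
The comparison amounts to a simple calculation, sketched below in Python. The function name `empty_homes_tax` and the figures in the example are hypothetical; the only element taken from the text is the 1% rate applied to the value of LUPs.

```python
def empty_homes_tax(lup_values, council_tax_income, rate=0.01):
    """Return (tax raised, tax raised as a fraction of council tax income)."""
    raised = rate * sum(lup_values)
    return raised, raised / council_tax_income


# Invented example: three LUPs in an area collecting 50,000 in council tax.
raised, share = empty_homes_tax(
    [300_000, 500_000, 1_200_000], council_tax_income=50_000
)
```

With these invented figures the tax would raise roughly 20,000, or about 40% of the area's council tax income.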

Models

The data is too noisy to create regression models, so we focus on producing a binary classification. This classification is used for three dependent variables, as follows:

  1. High-LUP percentage: The percentage of LUPs in the area is in the top 50% of areas.

  2. High-value: The LUP median price is higher than the median price of homes.

  3. High-value-high-LUP: The LUP median price is higher than the median price of homes, and the percentage of LUPs in the area is in the top 50% of areas (abbreviated to HVHL).
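
The three binary labels can be derived as in the following Python sketch. The field names (`lup_pct`, `lup_median`, `home_median`) and the toy areas are hypothetical, and we assume here that "top 50%" means strictly above the median LUP percentage, which the text does not specify.

```python
from statistics import median


def classify(areas):
    """Attach the three binary labels to each area (a dict of toy fields)."""
    # Assumption: "top 50% of areas" means strictly above the median LUP %.
    cutoff = median(a["lup_pct"] for a in areas)
    labels = []
    for a in areas:
        high_lup = a["lup_pct"] > cutoff
        high_value = a["lup_median"] > a["home_median"]
        labels.append({
            "name": a["name"],
            "high_lup": high_lup,
            "high_value": high_value,
            "hvhl": high_lup and high_value,
        })
    return labels


# Invented toy areas.
areas = [
    {"name": "A", "lup_pct": 1.0, "lup_median": 100_000, "home_median": 150_000},
    {"name": "B", "lup_pct": 2.0, "lup_median": 300_000, "home_median": 250_000},
    {"name": "C", "lup_pct": 4.0, "lup_median": 200_000, "home_median": 220_000},
    {"name": "D", "lup_pct": 6.0, "lup_median": 900_000, "home_median": 500_000},
]
labels = classify(areas)
```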

In this analysis, we use logistic regression, as shown in Eq. (9). Logistic regression is a linear method that allows easy interpretation of the relationship between the coefficients and the outcome variable.

$$Pr(Y_i = 1|X_i) = \frac{{\exp \left( {\beta _0 + \beta _1X_{1i} + \beta _2X_{2i} + \beta _3X_{3i}} \right)}}{{1 + \exp \left( {\beta _0 + \beta _1X_{1i} + \beta _2X_{2i} + \beta _3X_{3i}} \right)}}$$
(9)

As there are only 112 observations at the local authority level, the number of variables that can be used without overfitting is limited. We use three independent variables to predict the three dependent variables. The independent variables are \(A\) (affordability ratio), \(A^2\) (the centred squared affordability ratio) and \(T\) (the tourism density). The formulae that will be trialled in the logistic regression are shown in Table 1, where y is the dependent variable. The models are tested using 20 repetitions of a 5-fold cross validation for 100 models in total per formula per dependent variable.
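
To make the model form and the validation scheme concrete, the sketch below evaluates the logistic function of Eq. (9) and generates the 20 × 5 = 100 train/test splits. It is an illustration only: the function names are hypothetical, no model is fitted, and we assume that the centred squared term is computed as (A − mean(A))², which the text does not state explicitly.

```python
import math
import random


def predict_probability(a, t, a_mean, beta):
    """Eq. (9) with predictors A, centred squared A and T.

    beta = (b0, b1, b2, b3). Assumption: the centred squared term
    is (A - mean(A)) ** 2.
    """
    b0, b1, b2, b3 = beta
    z = b0 + b1 * a + b2 * (a - a_mean) ** 2 + b3 * t
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


def repeated_kfold(n_obs, k=5, repeats=20, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_obs))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


splits = list(repeated_kfold(112))  # 112 local authorities
```

Each of the 100 splits holds out 22 or 23 of the 112 local authorities, so every model is fitted on roughly 90 observations.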

Table 1 Formulae to be used in the logistic regression

The affordability ratio is similar to the price-to-earnings ratio. The higher the affordability ratio is, the more years it will take the average resident to earn the average property price. The affordability ratio takes into account findings (Hilber et al., 2016; Follain and Jimenez, 1985; Xiao, 2017) that property prices are related to income. As LUP purchasers do not usually live in the same area as their LUPs, properties should become less affordable (affordability ratio increases) with more high-value LUPs. Tourism density is the number of companies in the local authority registered as guest houses or hotels per 1000 homes.
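
Both predictors are simple ratios, as the following Python sketch shows. The function names and example figures are hypothetical, and which summary statistic of price and income is used (mean or median) is left open here, as in the text.

```python
def affordability_ratio(property_price, annual_income):
    """Years of income needed to buy a property at the given price."""
    return property_price / annual_income


def tourism_density(n_hotels_and_guest_houses, n_homes):
    """Registered hotels and guest houses per 1000 homes."""
    return 1000 * n_hotels_and_guest_houses / n_homes


# Invented example area.
ratio = affordability_ratio(250_000, 25_000)
density = tourism_density(30, 60_000)
```

In this invented area a resident would need ten years of income to buy a typical property, and there is one hotel or guest house for every 2000 homes.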

Estimating the number of people living in high-value areas

We wish to answer the following question: What fraction of the population lives in a high-value or HVHL area? As we can only infer the median difference between LUPs and properties through resampling, we can at best find an estimated range for the fraction of the population living in such areas. For the local authorities for which we have data, we take the probability of being a high-value area as the fraction of times that the local authority was high-value across the 501 resamples. We assume that the probability of an area being accurately classed as a high-value/low-value area is equal to the accuracy of the model. Thus, if an area is classed as high-value and the model has an accuracy of p, then the probability of being high-value is p, and if the area is classed as not being high-value, it is actually high-value with a probability of 1−p. Thus we will have a vector of the probabilities of high-value local authorities and another vector of the probabilities of HVHL local authorities across the whole of England and Wales. These vectors will be sampled 1000 times each to get the expected range within which the true population fraction exists.
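
The final sampling step can be sketched as a small Monte Carlo simulation. In the Python example below, each local authority is drawn as high-value or not according to its probability, and the population living in high-value areas is summed for each draw. The function name, the toy probabilities and populations, and the choice to report the minimum and maximum of the draws as the "range" are assumptions for illustration.

```python
import random


def population_fraction_range(p_high_value, population, n_draws=1000, seed=0):
    """Return (lowest, highest) simulated fraction of people in high-value areas."""
    rng = random.Random(seed)
    total = sum(population)
    fractions = []
    for _ in range(n_draws):
        # Each area is high-value in this draw with its own probability.
        in_high_value = sum(
            pop for p, pop in zip(p_high_value, population) if rng.random() < p
        )
        fractions.append(in_high_value / total)
    return min(fractions), max(fractions)


# Invented example: one certain, one impossible and one uncertain area.
low, high = population_fraction_range([1.0, 0.0, 0.5], [100, 200, 100])
```

Here the first area always counts and the second never does, so the simulated fraction moves between 0.25 and 0.5 depending on the uncertain third area.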

Results

Data was collected from 112 local authorities, representing 32% of total local authorities in England and Wales. The data covers a population of 23.2 million, which is 40% of the total population, and it includes 340,000 LUPs (see Table 2). The LUPs had a total value of £123 billion, equivalent to £363,000 per LUP, which is 18.5% more expensive than the average of £306,000 for the dataset. Data was collected from 9 of the 10 regions in England and Wales; only the East Midlands are not represented. Overall the dataset had 13,717 observations of LSOA, which aggregated to 2869 observations at the MSOA level and 112 observations at the local authority level. London had the best coverage, with data from 91% of local authorities, while South East England had the lowest coverage (after the East Midlands) with 4% coverage. By summing the local authority totals for second home data from the 2011 census (ONS, 2012), we see that the collected data covers an estimated 40% of second homes in England and Wales.

Table 2 Summary of collected dataset

We find that the distribution of LUPs across the LSOA is very skewed. The median LUP percentage by LSOA is 2% and the mean is 3.1%, while the maximum is in an LSOA in the City of London where 54% of properties are LUPs. Across the data, we see that 5, 20 and 50% of the LSOA contain 29, 56 and 82% of the LUPs, respectively. We should note that, due to the lack of incentive to register a property as a LUP, the figures are likely to be underestimated to some degree.
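
A concentration statistic of this kind can be computed as in the following Python sketch. The function name and the toy counts are hypothetical, and we assume that LSOA are ranked by their number of LUPs, which the text does not specify.

```python
def lup_share_in_top(lup_counts, top_fraction):
    """Share of all LUPs found in the top `top_fraction` of LSOA.

    Assumption: LSOA are ranked by their LUP count, highest first.
    """
    ranked = sorted(lup_counts, reverse=True)
    n_top = round(top_fraction * len(ranked))
    return sum(ranked[:n_top]) / sum(ranked)


# Invented LUP counts for ten LSOA.
counts = [50, 20, 10, 10, 5, 5, 0, 0, 0, 0]
share_top_20 = lup_share_in_top(counts, 0.2)
share_top_50 = lup_share_in_top(counts, 0.5)
```

In this toy example the top 20% of LSOA hold 70% of the LUPs and the top 50% hold 95%, a skew of the same kind as that reported for the dataset.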

Based on the current council tax base, and the assumption there would be no substantial change in low-use levels before the tax was introduced, an empty homes tax of 1% of property value would raise an additional £1.2 billion across the dataset. That is equivalent to 11% of the council tax currently collected by those local authorities. However, the empty homes tax is not evenly distributed. The local authority of the City of London would gain an additional £2100 per resident, equivalent to 260% of current council tax. Similarly, Kensington and Chelsea would raise an additional £2000 per resident, equivalent to 201% of current council tax. However, at the bottom of the scale, Barking and Dagenham would raise only £3.60 per resident, equivalent to 1% of the current total council tax. The median value across the dataset was 6%. We should note that although this tax would be unpopular with the owners of LUPs, only residents are allowed to vote in local elections (Electoral Commission, 2017).

Visual inspection of mapped data shows that in some local authorities, LUPs are concentrated in the least affordable areas while in others, they are concentrated in the most affordable areas. Figures 3 and 4 show this phenomenon using the London borough of Kensington and Chelsea and the local authority of Bradford as examples. In the overall dataset, just under 60% of local authorities had high-value LUPs.

Fig. 3

LUP percentage in Kensington and Chelsea. Kensington and Chelsea has some of the highest registered LUP rates in London. The LUPs are concentrated in the wealthy central and south-east portions of the borough, while the poorer north has lower LUP rates

Fig. 4

LUP percentage in Bradford. Bradford has low levels of LUPs but contains a few LSOA that have much higher percentages. These areas are less expensive than the local authority property median

After resampling the mean price of LUPs by local authority, we compared the distribution of LUPs relative to affordability quartiles. If LUPs are randomly distributed, each quartile will contain 25% of the LUPs. Figure 5 shows the distribution of LUPs between the affordability quartiles for Bradford, Kensington and Chelsea and for all the data. The data does not show a randomly distributed pattern; rather, LUPs are over-represented in the upper quartiles of local authorities like Kensington and Chelsea while they are under-represented in the upper quartiles of areas like Bradford. However, over the whole dataset, there are more LUPs in the upper quartile than would be expected given a random distribution. The observed pattern means that LUPs tend to be more expensive than the homes in the majority of the local authorities in the dataset. This suggests that the most desirable properties are being bought for purposes other than use as a home.

Fig. 5

LUP percentage by affordability quartile. The data shows that across England and Wales, LUPs tend to be disproportionately more expensive than homes

The middle panel of Fig. 6 shows the relationship between LUP percentage and the affordability ratio for all data. It appears to show a cubic relationship between LUP percentage and affordability. However, upon separating London from the rest of the data, two distinct behaviours emerge. In the left panel, showing England and Wales excluding London, we see that the curve is U-shaped, as would be expected from the hypothesised demand curve made up of high-value and low-value components. In the right panel is the data for London; it shows a situation that would be expected when there is always high demand for homes but variable demand for LUPs. The London panel may point to a situation where local authorities are indirectly affected by high-value LUPs as residents move from high-value LUP areas to low-value LUP areas, thus pushing up demand and consequently prices (Guerrieri et al., 2013).

Fig. 6

Observed LUP/affordability curves. The figure shows that outside of London, LUP percentage follows a U-shaped curve, but in London, demand for homes is so high that there are few low-value LUPs anywhere

We calculated growth in property values by estimating property prices a decade earlier, in the period 2003–2007, using the same resampling method, and then calculating the percentage change relative to the current price estimates. We then correlated house price growth at the local authority level with affordability. We found a correlation of 85%; however, as we do not have income estimates for that period, we can only say that in the intervening decade, the most expensive areas saw the largest price increases.
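As a minimal sketch of the two calculations in this step, the following Python computes percentage growth between two price estimates and a correlation coefficient between two series. The function names are hypothetical, and the use of the Pearson coefficient is an assumption, as the text does not state which correlation measure was used.

```python
from math import sqrt

def pct_growth(past_price, current_price):
    """Percentage change of the current price relative to the earlier price."""
    return 100.0 * (current_price - past_price) / past_price

def pearson(xs, ys):
    """Pearson correlation coefficient (assumed measure, not confirmed by the text)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical example: a price rising from 100 to 150 has grown by 50%.
growth = pct_growth(100.0, 150.0)
```

In practice, `pearson` would be called with one growth value and one affordability ratio per local authority.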

As we observe a large number of LUPs in desirable and expensive neighbourhoods, we surmise that many of these LUPs are holiday homes or lets. Thus, we use the number of hotels and guest houses per 1000 homes in each area, obtained from the Valuation Office Agency (VOA, 2018), as a proxy for tourism.
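The tourism proxy is a simple rate, which could be computed as in the sketch below. The function and argument names are hypothetical and the input values are made up for illustration.

```python
def tourism_density(n_hotels_and_guest_houses, n_homes):
    """Hotels and guest houses per 1000 homes in an area."""
    if n_homes <= 0:
        raise ValueError("n_homes must be positive")
    return 1000.0 * n_hotels_and_guest_houses / n_homes

# Hypothetical area with 25 hotels and guest houses and 50,000 homes.
density = tourism_density(25, 50_000)
```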

Figure 7 shows the separation between high-value and low-value areas at the local authority and MSOA level. Panel one shows that at the local authority level, the split is relatively clear, with the low-value LUPs generally in the bottom left corner. The lower-left part of each panel describes areas that have low tourism density and are very affordable. However, at the MSOA level, the pattern is much less clear. This difference is due to the relative lack of variation at the MSOA level compared to the local authority level. MSOAs are sufficiently small that the entire geography can have a high percentage of expensive LUPs, while this is not the case in a local authority, which will always contain less desirable areas and areas of social housing. The result is that an MSOA that is full of very unaffordable LUPs can contain homes that are even less affordable.

Fig. 7 Identifying high-value LUP areas. Log tourism density per 1000 homes; the least affordable homes have percentage rank 1 whilst the most affordable have rank 0

In Fig. 8, we see the quadratic relationship between affordability and LUP percentages, as shown previously in Fig. 6, expressed through the tourism metric.

Fig. 8 Identifying high LUP percentage areas. Log tourism density per 1000 homes; the least affordable homes have percentage rank 1 whilst the most affordable have rank 0

Models

The results of modelling the local authorities are shown in Table 3. The table shows the formula used in the ID column and the percentage of times the model was better than chance in the “Beats Null” column. The accuracy column is the probability that any classification is correct, while the columns labelled PosPred/NegPred are the probability that a positive prediction or a negative prediction is correct. The lift column is the ratio of the accuracy to the null model/base rate. All values in the table are the mean after cross-validation.
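The columns described above can be derived from a confusion matrix, as in the following sketch. The function name is hypothetical, and the base rate is assumed to be the accuracy of a null model that always predicts the majority class; the paper's null model may be defined differently.

```python
def classification_summary(actual, predicted):
    """Accuracy, PosPred, NegPred and lift for a binary classifier.

    Assumption: the null model always predicts the majority class, so its
    accuracy (the base rate) is the share of the larger class.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    n = len(pairs)
    accuracy = (tp + tn) / n
    # Probability that a positive (or negative) prediction is correct.
    pos_pred = tp / (tp + fp) if tp + fp else float("nan")
    neg_pred = tn / (tn + fn) if tn + fn else float("nan")
    base_rate = max(tp + fn, tn + fp) / n
    return {
        "accuracy": accuracy,
        "pos_pred": pos_pred,
        "neg_pred": neg_pred,
        "lift": accuracy / base_rate,
    }

# Hypothetical example: 10 areas, 3 of which are truly positive.
summary = classification_summary(
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
)
```

A lift above 1 indicates that the classifier is more accurate than the assumed null model.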

Table 3 Summary of viable local authority models

For each of the dependent variables, only a single valid model was generated. The other models failed to beat the baseline a significant number of times, had coefficients that were not significant at the 0.95 level, or provided only a minimal amount of predictive power. Formula 3 was the best choice for predicting high-value and HVHL areas, while formula 4 was best for predicting high LUP percentage areas. Predicting at the MSOA level was more challenging, and only a single model was valid (see Table 4). The MSOA model was quite good at predicting high LUP percentage. However, it was not able to provide any meaningful predictive power for identifying high-value or HVHL areas for the reasons described at the end of the previous section.

Table 4 Summary of viable MSOA models

The model coefficients, shown in Fig. 9, provide useful insight into the relationship between the independent and dependent variables. Tourism is associated with an increased likelihood of an area having high levels of LUPs, being high-value and being HVHL. The higher the affordability ratio, the more likely an area is to be high-value or HVHL. However, in the model for high levels of LUPs, the linear affordability term has a negative relationship with the outcome while the quadratic affordability term has a positive relationship. This quadratic relationship was observed earlier in Figs. 6 and 8. Generally, the coefficients are very stable. The squared affordability ratio term has the widest range of values. However, it is still quite stable, does not appear to be overfitted and, on average after cross-validation, has p-values that are significant at the 0.95 level.

Fig. 9 Model coefficients. The models show stable coefficient values from the cross-validated testing

A model based on resampled majority voting was also tested. This model used the perturbations in the affordability ratio and the difference in median price to create 501 models from the resampled data, and then took a majority vote across them to produce a single answer. However, it proved no better than the simple model based on the average of the resampled data and did not reduce the standard deviation of the error around the mean upon cross-validation.
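The voting step can be illustrated with the sketch below. It assumes that each resampled model produces one binary prediction per area; the function name is hypothetical. With an odd number of models, such as 501, a tie cannot occur.

```python
def majority_vote(predictions):
    """Combine per-model binary predictions into one answer per area.

    predictions: a list with one entry per model, each a list of booleans
    with one entry per area (assumed layout).
    """
    n_models = len(predictions)
    n_areas = len(predictions[0])
    # Count how many models predict a positive outcome for each area.
    votes = [sum(1 for model in predictions if model[i]) for i in range(n_areas)]
    # An area is positive when more than half of the models say so.
    return [2 * v > n_models for v in votes]

# Hypothetical example with three models and three areas.
combined = majority_vote([
    [True, False, True],
    [True, True, False],
    [False, False, True],
])
```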

We then retrained the successful model formulas for high LUP percentage, high-value and HVHL, at the local authority level, using all the data. These models were then used to predict the dependent variable across all local authorities in England and Wales. The results are shown in Fig. 10. The uncertainty of the percentage of the population living in an area that is high-value or HVHL was calculated using the accuracy of the models from Table 3. The results are shown in Fig. 11. The figure shows that the 95% confidence interval for the percentage of the population living in a high-value LUP local authority is 39–47%, and 19–27% for HVHL areas.
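One way such an interval could be obtained is by simulation, as sketched below. This is an assumed procedure and not necessarily the one used here: each predicted status is treated as correct with probability PosPred (for positive predictions) or NegPred (for negative predictions), statuses taken from supplied data are treated as certain, and the interval is read from the 2.5th and 97.5th percentiles of the simulated population shares. All names and the input layout are hypothetical.

```python
import random

def population_share_interval(areas, pos_pred, neg_pred, n_sims=2000, seed=0):
    """Simulated 95% interval for the population share in positive areas.

    areas: list of (population, flag, observed) tuples, where flag is the
    area status and observed is True when the status comes from supplied
    data rather than from a model prediction (assumed layout).
    """
    rng = random.Random(seed)
    total = sum(pop for pop, _, _ in areas)
    shares = []
    for _ in range(n_sims):
        positive_pop = 0
        for pop, flag, observed in areas:
            if not observed:
                p_correct = pos_pred if flag else neg_pred
                if rng.random() >= p_correct:
                    flag = not flag  # prediction treated as wrong in this draw
            if flag:
                positive_pop += pop
        shares.append(positive_pop / total)
    shares.sort()
    lower = shares[int(0.025 * n_sims)]
    upper = shares[min(n_sims - 1, int(0.975 * n_sims))]
    return lower, upper
```

With perfectly accurate models the interval collapses to a single value; lower PosPred and NegPred values widen it.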

Fig. 10 Model prediction results. Each panel shows the local authority status using the predictive models. Where the local authority provided data, that data is used

Fig. 11 Estimating population fractions. The figure shows the estimated distribution of the total percentage of the population living in a high-value LUP or HVHL area

Discussion

Our goal in this paper was to quantify the fraction of the population of England and Wales living in areas where low-use properties (LUPs) are more expensive than homes occupied by full-time residents; we also wanted to quantify the relationship between affordability, tourism and areas of high-value LUPs. We have achieved this by creating predictive models that use the affordability ratio and tourism density. Repeated cross-validation of the models shows that they effectively predict high LUP levels, high-value LUPs and high-value-high-LUP areas at the local authority level. We estimate, with 95% confidence, that in England and Wales 39–47% of the population lives in a local authority where LUPs are less affordable than homes, and 19–27% lives in a local authority where LUPs are less affordable than homes and the fraction of properties that are LUPs is above the median.

The concentration of LUPs in areas that have a high affordability ratio and a high tourism density, combined with the theory of supply and demand, suggests that house prices could be pushed up by LUPs. However, our model does not demonstrate causality, and we only interpret the results through the framework of supply and demand.

In a market where the demand for housing is partially fuelled by the demand for LUPs, building new properties as a way of decreasing prices is not necessarily a practical measure. New properties in these areas will attract a relatively large proportion of LUP purchasers, thus reducing the net supply of new homes. In addition, it may not be possible to build in the areas with the highest demand as these areas may already be developed. As for building in areas where the LUPs are more affordable than homes, this may merely displace residents from high-value LUP areas as they move to more affordable areas. This would also mean that the net number of new homes would decrease as the properties of those who moved from high-value LUP areas are converted from homes into LUPs. This may already be occurring in housing markets such as London.

Given the potential issues with increasing supply by building new homes, alternative methods for reducing demand for LUPs, such as the empty homes tax implemented by the city of Vancouver and other governmental authorities, may be more effective. This is not to say that such a tax would be a silver bullet, but it could be a useful policy tool, especially in high-value LUP areas. Also, such a method has the potential to generate a not inconsiderable income for local authorities whilst taxing people who are typically not eligible to vote in local elections.

This paper has shown that it is possible to provide a detailed analysis of housing at the local and national level using publicly available data. It has identified measurable indicators that correlate strongly with high-value LUPs and high levels of LUPs. The method is easy to use and can be implemented for the UK or other countries to help inform housing policy. This method is especially useful for policymakers in housing markets that have high property demand and that are considering measures to reduce house prices.

Further work

An analysis of the data using HMAs may provide a clearer picture of supply and demand that crosses local authority borders. It could also provide an understanding of the displacement of those who work in an area but cannot afford to live there.

Including data on short-term and long-term rental property numbers would possibly explain more of the variance in the models.

With a more complete dataset, MSOA analysis could be improved by using a custom local authority based on an ego network of all MSOAs up to two jumps away. This would allow a local authority-level analysis on individual MSOAs.
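The ego network idea can be sketched as a breadth-first search over an MSOA adjacency graph. The adjacency mapping, the area labels and the function name below are hypothetical; how neighbouring MSOAs would be defined (for example, by shared boundaries) is an assumption.

```python
from collections import deque

def ego_network(adjacency, centre, max_hops=2):
    """Return the set of areas reachable from centre within max_hops steps.

    adjacency: dict mapping each area to an iterable of neighbouring areas
    (assumed to be built from shared MSOA boundaries).
    """
    hops = {centre: 0}
    queue = deque([centre])
    while queue:
        area = queue.popleft()
        if hops[area] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(area, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[area] + 1
                queue.append(neighbour)
    return set(hops)

# Hypothetical chain of five areas: A - B - C - D - E.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
custom_area = ego_network(chain, "A")
```

The returned set, which includes the central MSOA itself, could then be aggregated and analysed in the same way as a local authority.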

Obtaining time series data on LUPs would make it possible to begin monitoring the flow of population in response to changes in LUP percentage and gentrification. It would also make it possible to create a causal model.