Investments in energy efficiency are essential for the transition to a low-carbon economy. Most, if not all, developed economies have set ambitious targets for energy retrofits in the building sector. However, energy efficiency targets are difficult to achieve1. This observation is well known—for more than 40 years, researchers have pointed out that society is lagging behind in energy efficiency investments, a phenomenon known as the ‘energy efficiency gap’2. The key question regarding this gap is whether or not systematic barriers and household preferences are impeding the adoption of seemingly cost-effective, energy-efficient technologies.

On the basis of personal narratives combined with natural language processing (NLP), we propose a method to study the energy efficiency gap. To elicit narratives related to energy efficiency investments, we conducted an extensive survey among single-family homeowners in the Canton of Zurich, Switzerland. In particular, we asked respondents to express their thoughts on specific topics in open-ended responses: What are the main barriers and determinants to energy efficiency retrofits? What policy measures would households prefer, and how are these measures related to the barriers and determinants of energy efficiency investments?

Our Article makes two main contributions. First, we add new evidence to the empirical literature on the barriers to and determinants of households’ energy efficiency investments3,4,5,6,7,8,9,10. As discussed in a review of the literature11, there are important gaps in these data: most studies focus on a predetermined set of explanatory variables because of the limitations of closed-ended questionnaires. This makes it difficult to generalize findings, and the approach is also prone to researchers’ biases. Nor does it uncover decision-makers’ own thinking about the most relevant barriers and determinants.

Second, we contribute to the emerging literature in economics that uses narratives to explain the drivers of decisions12. Our approach enables us to uncover more nuanced barriers and determinants of technology adoption than closed-ended survey questions (that is, multiple-choice questions). We provide a proof of concept that researchers and policymakers can easily implement, scale to large samples and replicate across contexts. Our method has the potential to examine the energy efficiency gap in a broader context and, more generally, to robustly identify behavioural barriers and determinants related to energy efficiency.

We find that narratives can help understand the personal factors of households, particularly the barriers and determinants to undertaking energy efficiency retrofits. Our main finding is that energy efficiency investments are highly opportunistic. Non-takers (that is, homeowners who did not invest in energy efficiency) often believe that their homes are already energy efficient enough and therefore do not seek such investments. Among takers (that is, homeowners who have invested in energy efficiency), a large share did so because a building technology no longer functioned and needed replacing. Financial considerations are a major barrier to renovation, but for those households that have invested, they are not the primary reason. Co-benefits, namely comfort gains and reducing the environmental footprint, play a similar or even more important role.

Survey data

We collaborated with the Statistical Office of the Swiss Canton of Zurich to recruit participants for our research project. We sent personalized invitation letters to a random sample of single-family homeowners, stratified along the following criteria: all owned a home that was constructed before 1990, and 50% of the letters went to homeowners with renovation permits during the 5 years leading up to the survey. We also stratified by tenant age and by the number of tenants, using buckets for both variables. In addition, we targeted homeowners with new buildings who adopted the primary certification for energy-efficient buildings in Switzerland, that is, ‘Minergie’ certification.

We sent 16,700 letters, and the response rate was high (20.8%), with 3,471 respondents starting the survey. A detailed description of the variables is given in Supplementary Tables 5–7. For a detailed overview of the survey procedure, see ‘Survey procedure’ in Methods.

We identified two groups of households based on their past and intended energy efficiency behaviours. The first group, who adopted the Swiss energy efficiency certification for new buildings (Minergie), was not included in this analysis. The second group comprised households with houses built before 1990 that had either performed energy efficiency retrofits in the past 5 years or planned to do so within the next 5 years. From these, we categorized households as either non-takers (21% of the sample) or takers (79% of the sample) based on whether they had invested in energy efficiency or planned to do so.

Table 1 shows the different household types and how they differ for key building characteristics, demographics and psychographics (variables that measure the degree of energy-related literacy, support for environmental causes and self-reported happiness). Overall, takers and non-takers are similar in the building characteristics and demographic features, except for a slightly higher age among non-takers. For this reason, we focus on the specific barriers and determinants to the energy-efficient retrofits mentioned by takers and non-takers.

Table 1 Summary statistics by household type

Barriers and determinants of energy efficiency investments

This section presents the barriers and determinants to energy-efficient retrofits from our survey. Takers, that is, respondents who had carried out a retrofit in the 5 years before the survey or who intended to do so in the next 5 years, provided their motivations for the renovation. Non-takers stated their reasons against retrofitting. For more information on the method used to analyse the text responses, see ‘Eliciting energy efficiency narratives’ and ‘Semi-manual classification’ in Methods.

We grouped all barriers and determinants into four categories that correspond to the different types of market barriers: classic ‘market’-type barriers are normal components of markets that impact decisions, such as heterogeneity in building stock; ‘non-market’-type barriers refer to co-benefits and hidden costs, for example, hassle costs or increased comfort; ‘financial’-type barriers are related to prices and costs; and ‘behavioural’-type barriers describe the psychological, cognitive and educational factors of decision-making for households. For a detailed taxonomy of barriers and determinants, see ‘Taxonomy of barriers and determinants’ in Methods.

Table 2 shows the barriers for the non-takers that were elicited from the open- and closed-ended questions. Primarily, non-takers believe that their home is already energy efficient. In terms of consistency between the open-ended and closed-ended questions, this barrier is found to be the most prevalent for both approaches. When using narratives, ~49% of respondents wrote that their home was already energy-efficient, but only ~38% chose this option as a potential barrier in the closed-ended question.

Table 2 Barriers to energy efficiency retrofits for non-takers

However, the two building characteristics that generally have the greatest impact on a dwelling’s energy efficiency potential—the year of construction and type of heating system—did not differ drastically between non-takers and takers (see Table 1). Therefore, the belief of the non-takers about the energy efficiency of their homes could either be a personal preference (a normal part of markets) or a misperception. In the latter case, information campaigns and subsidized audits may be warranted to address this barrier.

The second most important barrier is the cost of retrofitting. Although important compared with the other barriers, it was mentioned by only ~26% of respondents in the narratives and by ~22% in the closed-ended answers. Financial barriers are at the heart of energy efficiency programmes, which have led to generous subsidy programmes in Switzerland and elsewhere. Our results suggest that policymakers may shift their focus away from this financial barrier.

The third most important barrier shows a discrepancy between the open-ended and closed-ended questions. In the narratives, the age of the respondent, for advanced ages in particular, is the third most mentioned barrier. Old age is not a topic we listed a priori in the closed-ended question. With hindsight, we recognize that this can be an important barrier. Retrofits are long-term investments that older homeowners may not fully realize during their expected lifetimes. In our taxonomy of barriers, heterogeneous life expectancy is a normal component of markets that does not require policy intervention.

In the closed-ended question, the hassle costs, that is, the fact that homeowners perceive such an investment to be too complicated, are the third most important barrier, with 10% mentioning it in the closed-ended answers. The narratives show a similar but slightly lower percentage of respondents mentioning this barrier.

The respondents selected several other barriers with low frequency in the closed-ended questions but did not mention them in the open-ended responses. When respondents choose among a predefined list of elements, it is almost costless for them to select an additional option. Thus, the closed-ended question leads to a greater variety of barriers, but cheap talk could also be involved. By contrast, writing about an additional barrier in response to an open-ended question requires more effort. Therefore, open-ended questions could lead to more truthfulness in identifying the most important barrier(s) faced by each non-taker. For example, aesthetics and the difficulties associated with renovating landmarked buildings are two barriers that were highlighted as important in the closed-ended questions; however, respondents rarely mentioned them in their narratives.

Table 3 presents the determinants of energy efficiency renovations for takers. The elicitation procedure plays an even more important role here than it does for the barriers. In particular, it leads to greater variation in the share of each topic and in the ranking between open- and closed-ended questions.

Table 3 Determinants of energy efficiency retrofits for takers

From the narratives, the most important determinant is the need to replace broken building parts. The importance of technological obsolescence is consistent with the main barrier for non-takers, who perceive that they have few options for improving energy efficiency. This observation suggests that energy efficiency investments are highly opportunistic and not the result of well-planned replacements. Homeowners often do not start thinking about energy efficiency until a building technology malfunctions.

In the closed-ended question, the top-ranking determinants were reducing the environmental footprint and comfort—more than two-thirds of the respondents chose these options. In the open-ended question, these two factors ranked second and fourth, respectively; however, they were mentioned by less than one-third of the respondents. Nonetheless, both elicitation procedures show that the non-market benefits associated with energy efficiency investments are important determinants.

The relative decrease in the importance of the environmental footprint between the closed- and open-ended answers may be related to the ‘intention–behaviour’ gap in sustainable consumption, where respondents indicate that they value ecological product aspects but do not reflect this in their purchases13.

The third most important determinant relates to financial motivations. Again, there is a large discrepancy between the share of respondents who indicated this determinant in the open- and closed-ended answers. Specifically, ~29% of respondents mentioned this determinant in the narratives, but ~37% selected this topic in the closed-ended question. In addition, the impact of energy efficiency on the resale value received little attention in the open-ended question, at less than 5%. However, around 25% of respondents still selected this determinant in the closed-ended question.

Overall, the results clearly show the main reasons for investing in energy efficiency: obsolescence, non-market benefits and financial considerations. However, the respective importance of each determinant depends on the elicitation procedure. This, in turn, has important implications for targeting energy efficiency measures—depending on whether closed- or open-ended questions were used, policymakers may choose different measures.
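The comparison between elicitation procedures can be illustrated with a small sketch. It assumes that each response has already been coded into a set of topic labels (the coding step itself is described in Methods and is not reproduced here). The function names, topic labels and toy data below are hypothetical and serve only to show how shares and rankings can diverge between open- and closed-ended answers.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def topic_shares(coded: Iterable[Set[str]]) -> Dict[str, float]:
    """Share of respondents mentioning each topic.

    `coded` holds one set of topic labels per respondent (hypothetical format).
    """
    coded = list(coded)
    n = len(coded)
    if n == 0:
        return {}
    # Count each topic at most once per respondent.
    counts = Counter(topic for topics in coded for topic in set(topics))
    return {topic: count / n for topic, count in counts.items()}


def ranking(shares: Dict[str, float]) -> List[str]:
    """Topics ordered by decreasing share; ties are broken alphabetically."""
    return [t for t, _ in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy data, invented for illustration only (not survey results).
open_coded = [
    {"replacement"},
    {"replacement", "environment"},
    {"money"},
    {"replacement"},
]
closed_coded = [
    {"environment", "comfort"},
    {"environment", "comfort", "money"},
    {"money", "environment"},
    {"comfort", "replacement"},
]

print("open-ended ranking:  ", ranking(topic_shares(open_coded)))
print("closed-ended ranking:", ranking(topic_shares(closed_coded)))
```

In this toy example, the same four respondents produce different top-ranked topics depending on the question format, which is the kind of divergence reported in Tables 2 and 3.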

Policy preferences

For the design of energy efficiency policies, it is crucial to identify the level of awareness and preferences for specific policy instruments, as well as how these affect key barriers and determinants. We first show the level of general awareness of the policy landscape and the respondents’ experiences with these policies. In the second step, we analyse the responses to the open-ended question on policy preferences.

We asked respondents about their awareness of the four main energy efficiency policies in Switzerland: rebates on mortgage interest, tax exemptions or deductions, various subsidies from cantons and municipalities, and the Swiss Federal Building Program. For each policy, the respondents were asked to choose one of four options: not aware; aware; used; or intend to use. The results show that the takers were slightly more aware of the policies than the non-takers, and that they used them more frequently. This trend is also reflected in two indices that we created, that is, policy awareness and policy use. Each index would score a maximum of four points if the respondent was aware of all four policies or had used them in the past. For more information on policy awareness and use, see Supplementary Methods 3.
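As a rough illustration, the two indices can be computed as simple counts over the four policies. The sketch below makes two assumptions that are not stated in the text: a respondent who has used or intends to use a policy is counted as aware of it, and only past use (not intended use) counts towards the use index. The policy keys and the function name are hypothetical labels.

```python
from typing import Dict

# Hypothetical keys for the four policies named in the text.
POLICIES = (
    "mortgage_interest_rebate",
    "tax_exemption_or_deduction",
    "cantonal_municipal_subsidy",
    "federal_building_program",
)
VALID_ANSWERS = {"not aware", "aware", "used", "intend to use"}


def policy_indices(responses: Dict[str, str]) -> Dict[str, int]:
    """Return awareness and use indices, each ranging from 0 to 4."""
    for policy in POLICIES:
        if responses.get(policy) not in VALID_ANSWERS:
            raise ValueError(f"missing or invalid answer for {policy!r}")
    # Assumption: any answer other than 'not aware' implies awareness.
    awareness = sum(responses[p] != "not aware" for p in POLICIES)
    # Assumption: only past use counts towards the use index.
    use = sum(responses[p] == "used" for p in POLICIES)
    return {"awareness": awareness, "use": use}


example = {
    "mortgage_interest_rebate": "not aware",
    "tax_exemption_or_deduction": "aware",
    "cantonal_municipal_subsidy": "used",
    "federal_building_program": "intend to use",
}
print(policy_indices(example))
```

The exact construction used in the study is given in Supplementary Methods 3 and may differ from these assumptions.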

In the second step, we relied on narratives to determine the policy preferences. Unlike for barriers and determinants, we used only an open-ended question. Table 4 shows the respondents’ policy preferences for different instruments. A wide range of topics emerged from the narratives. When asked how policies could encourage energy efficiency investments for all households, the top suggestion was more generous subsidies, followed by policymakers providing more information.

Table 4 Policy preferences from open-ended answers

The third most common suggestion was to reduce bureaucracy (especially related to subsidies and building permits). Other suggestions included promoting standards and a greater focus on PV panels and on heating systems. Other topics, which accounted for a smaller share, also appeared in the narratives. Tax-related measures were discussed but were not a popular topic, especially compared with subsidies. Although subsidies were the most frequently mentioned topic, more than half of the respondents favoured other measures.

We grouped all of the policy options suggested by respondents into three broad categories, depending on what type of barriers an instrument may best address. The first category consists of market-based instruments: policy options related to subsidies and taxes. The second category consists of behavioural instruments, including instruments motivated by behavioural biases, most notably information provision and standards. The third category consists of non-market-based policy instruments that involve other interventions, such as reducing bureaucracy.

Heterogeneity and targeting

Targeting policies to populations where they will be effective is critical to improving energy efficiency. Numerous studies that have conducted ex-post evaluations of energy efficiency subsidies have found them to be an expensive way to reduce carbon emissions—far beyond the social cost of carbon14,15,16,17. The main reason is that these subsidies are not targeted, and many recipients are so-called free riders, that is, these households would have made the same energy efficiency investments even without the often generous subsidies. One way to address the free-rider problem and increase the cost-effectiveness of energy efficiency subsidies is to target these measures to specific types of household18.

Our results suggest that it will be difficult to encourage non-takers to invest in energy efficiency through targeted interventions that are based on observable characteristics. In Supplementary Methods 4, we present a linear probability model to analyse the differences in observable variables between takers and non-takers. Except for age, we found few observable variables that predict which households do and do not take up energy efficiency investments—hence, policymakers have few opportunities to use observable information.

Instead of targeting the entire group of non-takers, policymakers could target at a more granular level within takers and non-takers. Next, we analyse the correlation between key barriers/determinants and various variables that policymakers may observe or elicit, such as demographics, building characteristics, policy preferences and psychographics (that is, energy-related literacy, support for environmental causes and self-reported happiness).

We first examine the drivers of heterogeneity in barriers within the non-takers category. Table 5 presents three linear probability models, one for each main barrier elicited with the narratives. In these regression models, the dependent variable is a zero-one dummy variable that takes the value of 1 if the respondent mentions a particular barrier in the open-ended question.
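For readers less familiar with this specification, a linear probability model is an ordinary least-squares regression of a 0/1 outcome on covariates, so each coefficient reads as a change in the probability of mentioning the barrier. The following is a minimal sketch on synthetic data, not the specification or data behind Table 5; the helper name `fit_lpm`, the single age covariate and the use of HC1 heteroskedasticity-robust standard errors are assumptions made for illustration.

```python
import numpy as np


def fit_lpm(X, y):
    """OLS of a binary outcome on covariates (with intercept).

    Returns coefficients and HC1 heteroskedasticity-robust standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])  # add intercept column
    k = Z.shape[1]
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    bread = np.linalg.inv(Z.T @ Z)
    meat = Z.T @ (Z * (resid ** 2)[:, None])
    cov = bread @ meat @ bread * n / (n - k)  # HC1 small-sample scaling
    se = np.sqrt(np.maximum(np.diag(cov), 0.0))
    return beta, se


# Synthetic example: the probability of mentioning an 'old age' barrier
# rises by 1 percentage point per year of age (invented numbers).
rng = np.random.default_rng(0)
age = rng.uniform(40, 90, size=5000)
prob = 0.05 + 0.01 * (age - 40)
mentions_barrier = (rng.random(5000) < prob).astype(float)

beta, se = fit_lpm(age, mentions_barrier)
print("intercept, slope:", beta)
print("robust s.e.:     ", se)
```

Robust standard errors are a common choice here because the error variance of a binary outcome depends on the covariates by construction.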

Table 5 Linear probability model on the major barriers

Model column 1 shows the heterogeneity for the barrier that the building was considered to be already energy efficient. None of the coefficients for the traditional observable covariates is statistically significant.

For the financial barrier in model column 2, the respondents are less likely to be female; they have less education but report a higher rental value for their homes. In addition, these respondents were less likely to have used existing policies for retrofits.

Model column 3 examines the barrier of old age, with the tenant age being significant and positive, again consistent with the nature of the barrier in the narratives. Here, none of the policy variables showed statistical significance. Although several variables strongly correlate with each of the main barriers, the observable variables explain only a small portion of the total variance. The lack of statistically significant characteristics shows the difficulty of policy targeting for policymakers. Results suggest some heterogeneity in policy use; however, no particular policy preference is associated with any barrier, suggesting that policy preferences follow a uniform distribution over the different barriers.

Next, we analyse the heterogeneity within the takers. Table 6 shows four linear probability models, one for each major determinant of retrofitting.

Table 6 Linear probability model on the major determinants

Model column 1 shows the result for the determinant of replacement of existing parts (that is, obsolescence). Tenant age is significantly and negatively correlated with this determinant, possibly due to the shorter time that older tenants expect to live in their homes. As expected, building age positively correlates with this determinant, as older buildings are more likely to need repairs than newer ones. Respondents who renovated to replace broken parts were less likely to know about available subsidies. In addition, the replacement determinant is associated with a policy preference for market-oriented and non-market-oriented policies; however, these respondents have no preference for behavioural policies.

Model column 2 presents the heterogeneity for the financial determinant of saving money. Interestingly, a university degree and previous donations to environmental organizations are negatively associated with this determinant. Not surprisingly, respondents who renovate to save money show higher policy usage (but not a higher level of awareness) and a preference for market-based policies (that is, subsidies).

Model column 3 displays the comfort determinant. Only building age is strongly (and positively) associated with this determinant. Moreover, respondents who mention comfort show no statistically significant heterogeneity in policy awareness or policy preferences, but they do show a higher level of policy use.

Model column 4 shows heterogeneity among respondents who renovate out of environmental concerns. Both income and previous donations to environmental organizations have a strong and statistically significant association with this determinant. Similar to respondents who renovate to save money, environmental concerns are associated with higher levels of policy use but not with more awareness. Moreover, environmental motivations to renovate are strongly associated with behavioural policy preferences and are weakly negatively associated with non-market policies.

Policymakers could target their policies if barriers and determinants correlate with some well-defined household types. Therefore, we group respondents based on the previous heterogeneity analysis.

Among the non-takers, two types of homeowner emerge: those who do not renovate because they already consider their home to be energy efficient, and those who face financial constraints. The first type of homeowner is not significantly different from other non-takers; the second type of homeowner tends to be male and has a lower level of education. Overall, policy preferences play a minor role in explaining the barriers of the non-takers, which limits the scope for policy targeting.

For the takers, the main aim of policymakers is to identify homeowners who are renovating to replace broken parts (replacers) and encourage them to plan their retrofits earlier. In addition, policymakers can target respondents who do not see energy-efficient retrofits as a financial opportunity (money savers): households that do not belong to the money-savers group may not be aware of the long-term cost savings from improved energy efficiency.

Compared with other takers, replacers are relatively young and live in older buildings. In addition, replacers are less informed about existing policies and prefer policies that involve less bureaucracy and higher subsidies. This profile is consistent with a pragmatic decision-maker whose primary motivation is to replace broken parts of the house (which is often a necessity). Even though policy awareness is low for this group, respondents prefer less bureaucracy and higher subsidies rather than more information on policies. The replacers are thus looking for ways to facilitate the retrofitting process, which policymakers can achieve by providing more information and reducing bureaucracy.

Money savers are less educated and less likely to donate to environmental organizations. They are more inclined than other groups to use existing policies for retrofits and favour market-based policies (such as subsidies). It is, therefore, difficult for policymakers to target specific policies.

Aside from age and income, there are few observable variables that enable policymakers to target non-takers. Most homeowners who renovate do so to fix broken parts; this can be seen as free riding because these homeowners would have renovated regardless of receiving subsidies. By contrast, replacers are less aware of policies and favour reducing bureaucratic hurdles. Policymakers could therefore limit free riding by targeting the group of replacers, especially by improving the institutional framework for energy-efficient retrofits—through reducing bureaucracy and increasing information campaigns.

Discussion and conclusions

Using an emerging approach based on open-ended survey questions, we elicited the narratives of homeowners about their renovation behaviour. Our key finding is that energy efficiency investments are highly opportunistic. Non-takers believe, rightfully or not, that there are few opportunities for energy efficiency in their homes. Whereas financial considerations are a major barrier to renovation, we found that homeowners who do make energy-efficient retrofits do not see them as financial opportunities. Strikingly, most homeowners delay renovations until they need to replace broken building components. In such a setting, the renovations would probably have been carried out even without subsidies, suggesting that free riding plays an important role in Switzerland’s generous subsidy programmes.

Our approach consists of eliciting narratives in surveys that are based on open-ended questions. Narratives, combined with NLP methods, offer a powerful means of identifying and assessing the key barriers and determinants of households’ retrofitting decisions. Notably, narrative-based rankings differ from rankings using closed-ended questions. In the closed-ended questions, respondents indicated that the primary reason for performing a renovation was to save money and address environmental concerns. By contrast, the narratives showed a different picture, where the main reason was to replace old parts of the building. In addition, the importance of several co-benefits of energy efficiency decreased when using open-ended questions, especially regarding comfort gain and environmental concerns.

Furthermore, we found a generally low level of policy awareness and, among takers, a low policy usage. In particular, homeowners who have renovated to repair broken parts had a lower policy awareness, which provides an opportunity for information campaigns. Unsurprisingly, most homeowners supported higher subsidies; however, subsidies in Switzerland are already generous. The second and third most favoured policies were related to improving the institutions related to retrofits, that is, reducing bureaucracy and providing more information for homeowners.

Most of the characteristics that influence respondents’ decisions to carry out retrofits are challenging for policymakers to target. For example, the main difference between takers and non-takers was the older age of the non-takers. It is difficult to imagine a means-tested subsidy programme using age as a criterion. However, a programme with a ‘senior discount’ and a more practical approach for older households may be politically feasible. We found no other demographic variables that policymakers could use to target policies besides age. Moreover, there are no significant differences in policy preferences.

Since the primary determinant of energy-efficient retrofits was to repair defective parts, more generous subsidies may have a small impact on the retrofitting decision and encourage free riding. Most homeowners renovate out of necessity rather than planning their retrofits for the long term. However, takers are often unaware of existing measures and would welcome a reduction in the bureaucratic burden of retrofitting. Instead of purely monetary incentives, an effective policy should also consider institutional factors, such as bureaucratic burden and accessibility of information.

In the future, rapid improvements in artificial intelligence and chatbots will make text-based interactions more common, providing fertile ground for researchers and policymakers. Respondents’ narratives about energy efficiency offer precise explanations for their preferences and often differ from traditional multiple-choice answers. Future research can analyse these differences and uncover new insights for policymakers, which can help to close the energy efficiency gap.


Related literature

Our study relates primarily to the literature on the energy efficiency gap that has developed a taxonomy for barriers and determinants of energy efficiency investments. It also relates to the broader literature in energy economics studying technology adoption and policy preferences using stated and revealed preferences. We briefly review these two strands of literature to put our study into context.

Taxonomy of barriers and determinants

In the literature, there are several frameworks that have identified and categorized the barriers and determinants of energy efficiency investments. One particularly influential taxonomy distinguishes between three different perspectives: economic, behavioural and organizational3.

The economic perspective considers rational utility-maximizing agents as the benchmark to understand the choices of agents regarding the adoption of energy-efficient technologies. The behavioural perspective departs from this purely neo-classical framework: it considers different manifestations of bounded rationality, which have also been referred to in the literature as behavioural failures19 or internalities20. Finally, the organizational perspective considers the role of institutions with which the agents interact. These could be institutions that governments have little ability to transform, such as values and culture, or others over which they have considerable influence, such as fiscal, competition and regulatory policies.

Although this taxonomy is helpful in navigating the different explanations of the source of the energy efficiency gap, effective policy design needs a more precise categorization. Based on the economic, behavioural and organizational perspectives from the literature3, our taxonomy thus distinguishes between behavioural, financial, non-market and market barriers.

Behavioural barriers. The empirical research has focused mainly on determining whether or not households correctly perceive the energy-savings component of the net investment costs. Some of these inefficiencies may be behavioural. Hence, we consider them to be internalities20, such as inattention and biased beliefs about energy prices, to name but a few. These behavioural barriers can, however, be confounded with neo-classical market barriers such as different access to credit or time-discounting preferences.

Financial barriers. The second set of economic barriers focuses on the role of external financially related factors. Energy prices may be too low, investment costs too high, subsidies may not be generous enough, and various financial distortions could exist. Formally, in a household-investment framework, the role of financially related barriers operates through the price variables and the preference parameters that dictate the price sensitivity.

Non-market barriers. The third subcategory of economic barriers consists of the non-market components of the investment, which includes the various co-benefits and hidden costs of such investments. The literature has pointed out that specific co-benefits can be important, which researchers investigated mainly using contingent valuation methods21,22. External organizational constraint factors can affect the hassle costs, which we also consider to be a form of a non-market barrier.

Market barriers. Finally, classic market barriers are typical components of well-functioning markets. These barriers can arise due to heterogeneity in the building stock, technologies and preferences. Understanding this heterogeneity can enable policymakers to target energy efficiency policies to increase their cost-effectiveness18. Researchers have recognized this point for a long time, and several studies have investigated different dimensions of heterogeneity in the decision to adopt energy-efficient technologies, for example, in ref. 23.

Apart from these different types of barrier, there are also market failures at the source of the energy efficiency gap. Externalities associated with energy systems that are not systematically taken into account, asymmetries of information and imperfect competition are three market failures that interact with the different barriers.

Our focus is on uncovering the barriers and determinants that arise at the level of the decision-makers. The market failures listed above are features of the entire market. For instance, pollution from energy production affects all market participants and may not be reflected in market prices. However, this externality does not arise from the individual decision-making process of households, which is the focus of our elicitation approach (for example, some households may invest in energy efficiency to profit from co-benefits such as comfort).

Methods to elicit barriers and determinants

Empirical researchers in the energy sector have used two types of approach to elicit the barriers and determinants of technology adoption: stated and revealed preferences.

With revealed-preference methods, the underlying determinants of economic decision-making are inferred from observed, real-life choices (for example, in ref. 24) or from experimental data (for example, refs. 25,26,27).

With stated-preference methods, analysts will typically construct a survey with well-defined options of barriers and determinants that they think households might face (for example, refs. 28,29,30); analysts can also construct hypothetical-choice situations to have a perfectly controlled environment from which they can infer underlying preferences (for example, refs. 31,32,33,34,35). These data, together with a model that provides the micro-foundations for mapping between preferences and observed choices, are used to infer preferences and, incidentally, particular barriers and determinants.

Eliciting narratives using open-ended survey questions is a third and complementary approach. Researchers have used this question type since the 1940s, although not systematically36. With small samples, however, open-ended questions are used regularly, for example, in refs. 37,38,39,40. On a larger scale, this approach tends to yield very noisy, hard-to-interpret qualitative data. However, researchers have suggested revisiting the concept of open-ended survey questions41. Recent advances in NLP enable us to turn narratives into quantifiable metrics to elicit proxies for household preferences and market barriers. Using this approach, open-ended questions have become increasingly popular to elicit opinions on critical societal issues such as immigration42, climate change43 and macroeconomic shocks44, as well as to elicit policy preferences12,45.

Narratives offer two advantages over typical closed-ended survey questions in stated-preference studies. First, at the respondent level, narratives elicit a narrower but more specific set of barriers and determinants. In particular, respondents tend to focus on a few but more important topics to explain their decision-making. Second, at the population-wide level, compared with closed-ended questions, narratives uncover a broader set of barriers and determinants.

Most research uses NLP with already existing text data, for example, public comments46, newspaper articles47,48, congressional speech49 or Twitter texts50,51. By contrast, open-ended questions enable the collection of new text data that are specific to the research question. However, owing to the smaller sample size, shorter texts and specialized vocabulary, open-ended responses require a different methodological approach, as we will outline in Supplementary Methods 1.

Important findings from other studies

Two recent reviews11,52 aim to consolidate the empirical findings related to the barriers and determinants of energy efficiency investments. The first of these reviews is the most relevant study for us as it focuses exclusively on empirical investigations and discusses the methods used in detail11: it covers 26 empirical studies, most of which used standardized survey procedures with predetermined answers (that is, closed-ended questions) and only one of which used semi-standardized qualitative interviews.

Overall, the review identified 167 different explanatory variables across the 26 studies, although its main conclusions are humbling11. Despite extensive empirical work, only a few robust findings emerge. Our findings are consistent with some of the patterns it identified. For instance, the positive relationship between income and energy efficiency investments is present in several studies and in ours. Age also tends to predict a lower take-up rate. Higher financial and energy literacy are positively associated with such investments. Comfort, when measured, is also positively correlated with investments. Finally, policies, such as audits and energy-provision programmes, tend to be associated with higher take-up rates.

However, the authors include several caveats regarding their findings11. They conclude that the field is not mature enough to obtain a general consensus on the main barriers and determinants of energy efficiency. As they pointed out, one main problem is the need for comparable elicitation procedures that truly uncover the thought processes of decision-makers. Until now, each study has used its own structured questions and focused on easy-to-measure variables. Not only does this approach make it hard to compare studies and draw general conclusions, but it is also prone to biases arising from the authors’ own prior beliefs. Moreover, most of these studies rely on cross-sectional regression analyses that look for associations between behaviours and the households’ characteristics, beliefs and other constructed variables. This approach only indirectly infers households’ thought processes and the barriers to and determinants of their investment.

Our proposed elicitation approach based on open-ended questions combined with NLP aims to address these shortcomings by being scalable and comparable across domains and contexts, while directly uncovering the thought processes we ultimately want to learn about.

Survey procedure

The first survey module collected information on past and future energy efficiency-related behaviours: whether households had performed or intended to perform retrofits, as well as the types of retrofit. We used these different behaviours to distinguish between takers and non-takers of energy efficiency investments.

The goal of the remaining modules was to determine the components of the households’ decisions that influenced these behaviours. One of the most important modules focused on different barriers and determinants. To elicit these components, we used open-ended questions, which provided narratives about specific aspects of the decision-making process. We also used structured closed-ended questions that closely mirrored the open-ended questions. Our goal was to provide a benchmark for open-ended questions.

Another module focused on preferences for different types of energy efficiency policies. Finally, the remaining modules elicited household and building characteristics, including some related to the decision-making process, such as financial and energy-related literacy. We used these variables to investigate heterogeneity along several dimensions.

To recruit participants, we collaborated with the Statistical Office of the Canton of Zurich. We sent personalized invitation letters via postal mail to a random sample of homeowners. The letter contained a short description of our research project and a link to an online survey. Household respondents had to type the link into a web browser to complete the survey using the software SurveyMonkey. To incentivize participation, respondents could win one of 100 gift certificates, each worth about US$200, in a lottery. We obtained informed consent from all participants.

We stratified the sample according to the following rules: single-family homes, the year of construction before 1990, and 50% with renovation permits during the past 5 years; furthermore, we stratified the tenant age and the number of tenants by considering buckets for both variables. We also stratified the sample to target homeowners who adopted the primary certification for energy-efficient buildings in Switzerland, that is, Minergie certification.

During the period of our survey, there were 127,950 single-family homes in the Canton of Zurich; 10,737 of these had applied for renovation permits between 2014 and 2019. The Statistical Office of the Canton of Zurich sampled this population and sent out 16,700 letters on our behalf on 3 February 2020. A household member could complete the online survey until 13 March 2020. The response rate was high: of the 16,700 households contacted, 3,471 respondents started the survey, which corresponds to a response rate of 20.8%. Furthermore, the completion rate was 82%, with an average survey completion time of 30 min.

Although our sampling strategy targeted a population of homeowners of single-family houses, a small number of respondents did not fall into this category. There were 161 (renting) tenants and a small number of respondents living in an apartment (n = 23). We excluded those observations from our analysis.

Sample composition: classifying household types

We used the past and intended energy-efficiency-related behaviours to classify households into two broad segments. First, we distinguished homeowners who adopted the Swiss energy efficiency certification for buildings (Minergie). Our stratified sampling strategy ensured that we observed a large number of those households (n = 524). We used these households for a separate study and therefore omitted these observations for the present analysis. Second, we distinguished households depending on whether they performed an energy efficiency retrofit in the past 5 years or planned to do so within the next 5 years. Furthermore, we used only answers from respondents who answered the open-ended question on the barriers/determinants to retrofitting. Based on this criterion, 2,187 households fell into two mutually exclusive categories:

  • Non-takers: households that had not performed energy efficiency retrofits in the past and were not planning to do so in the future (461 observations, 21% of the sample).

  • Takers: households that either performed energy efficiency retrofits in the past 5 years or who planned to perform at least one in the next 5 years (1,726 observations, 79% of the sample).

Our sample is representative of homeowners in Zurich with respect to the respondent age and household size. The average age for homeowners in the Canton of Zurich is 57.2 years. The percentage of single-person households is 11.6% for the entire population (in our sample, this share is 12.5% for non-takers and 8% for takers). For the other variables, no official statistics exist, and the homeowner population differs from the rest of the population. However, our response rate was high, which supports the view that our sample is representative.

Eliciting energy efficiency narratives

This section describes how we elicit energy efficiency narratives and turn them into quantitative variables. Our survey first contained a closed-ended question on renovation behaviour, followed by an open-ended question on the same topic. Both questions asked why respondents decided for or against an energy-efficient retrofit. This setting enables a comparison between the two question types. Separately, we elicited barriers from the non-takers and determinants from the takers. The structure of the questions was similar for the question about barriers and the question about determinants. Finally, at the end of the survey, we asked a second open-ended question on the policies that the respondents would favour to increase the number of retrofits. This last open-ended question did not have a closed-ended counterpart.

In the closed-ended questions, respondents had to choose among several options in a multiple-choice question. For barriers, we listed 17 potential barriers discussed in the literature on the energy efficiency gap. Those options were presented to non-takers, whom we asked to select all the barriers that were relevant to them. We used a similar format to elicit the determinants. We established a list of eight potential determinants, and the takers selected those that were important to their retrofit decision.

Providing predefined options in the closed-ended question could lead to an elicitation bias. The goal of the open-ended follow-up question was to provide the respondents with time to think carefully and to elicit a more nuanced response. Writing the text answer about the retrofit decision forces respondents to reflect on their decision, to consider the points from the preceding multiple-choice options again and to decide which factors influenced their decision. When designing the survey, our initial hypothesis was that the open-ended questions would induce the survey participants to discuss fewer barriers or determinants, but those most important to their renovation decision. Compared with the closed-ended questions, this format enabled us to discriminate better between the different options, as well as to discover barriers and determinants we may not have thought of while designing the survey.

Our approach relates closely to deliberations in experiments. In these settings, participants make an initial choice, and subsequently reflect on and discuss their decision with other participants to revise their decision and select their true optimal choice53. In our case, asking the respondents to answer the open-ended question is intended to implement deliberation within the survey and to help identify the true barriers and determinants.

In our design, the respondents could reflect on their choice but could not discuss it with other respondents. Instead, we leverage the different cognitive processes for closed- and open-ended responses, that is, recognition and recall. From a cognitive point of view, the process induced by an open-ended question is called recall. By contrast, responses from closed-ended questions are based on recognition, where a respondent identifies the correct answer among multiple options. The underlying processes needed to answer recognition-type questions are different. They may be less complex than the more individual task in answering recall questions54,55.

Next, we describe in detail how we design the open-ended survey questions. We then discuss our NLP-based method to classify the text responses into topics. In the final subsection we also compare our method with other text-classification approaches.

Description of open-ended questions

In designing open-ended questions, it is first essential to provide context for the participants and indicate why we ask these questions. We thus structured our survey by initially presenting the following short introduction explaining the rationale for asking open-ended questions (as the survey was conducted in German, we present our own translation of the questions here):

“The reasons for energy efficiency retrofits are complex and different for each household. We would like to learn more about why you decided (or did not) to renovate. What was important to you? Were there alternatives? Your response will help us better understand how we can support energy efficiency retrofits.”

After providing some context, we then asked the following question to elicit the determinants (or barriers) of energy efficiency retrofits:

“Describe the reasons why you decided (or did not) to carry out energy efficiency retrofits. Please write a short text of about four sentences.”

At the end of the survey, in addition to barriers and determinants, we also extracted narratives about policy preferences. For that purpose, we presented the following short introduction:

“The building sector has one of the greatest potentials for energy savings in Switzerland. One of the goals of our project is to improve public programs for energy-efficient buildings and renovation.”

As for the determinants/barriers, after providing some context, we posed the open-ended question as follows:

“We would now like to ask for your opinion. What approaches do you think the public sector should promote to encourage energy-efficient construction and renovation for households living in Switzerland?”

Overall, the implementation of open-ended questions worked very well. By inspecting a large number of responses, we found that the respondents provided meaningful answers. The average length of the answers to the three open-ended questions varied between 19 and 24 words. For all questions, the standard deviation is slightly below these averages, and some respondents wrote very long and detailed answers. The median number of words was between 12 and 21 and, depending on the question, the 90th percentile is between 44 and 47 words. The mean and median number of sentences is between 1 and 2, with a standard deviation of 1.3–1.6, and the 90th percentile is between 3 and 4 sentences. Hence, most respondents wrote fewer than the four sentences requested in the open-ended question. This result indicates that the requested length of about four sentences did not limit the respondents in the length of their answers.

The questions on barriers and determinants were mandatory for all non-Minergie participants. As previously mentioned, we asked only the non-takers about the barriers to renovate, and only the takers about their determinants to renovate. An overview of the summary statistics for the open-ended questions is shown in Supplementary Table 8. For these two open-ended questions, we observed an attrition rate of only 0.6% for the non-takers and 0.9% for the takers—that is, upon having to answer one of these particular questions, only 0.6% and 0.9% decided to stop the survey altogether. At the beginning of the survey, we asked three open-ended questions, which we did not analyse in this study. These questions focused on the sentimental value of the home. They asked the respondents to describe the elements of their home that are associated with positive emotions and with negative emotions, and to describe what they would buy if they won a lottery. These initial three open-ended questions had an attrition rate of 12%, 0.03% and 1.1%, respectively. The question on policy recommendations was not mandatory and was placed at the end of the survey. Furthermore, this question was presented to all respondents, including the Minergie subsample. We observed a higher but still low attrition rate of 8.5%. The self-selection of respondents is thus not a major concern.

In the following subsection, we describe our approach to classify the answers into topics, which we based on a method called ‘keyword dictionaries’.

Semi-manual classification

The first step in the text classification consists of extracting the entire text corpus for each question. Supplementary Table 1 shows the ten most used words for each of the three questions. The table also shows the original German keywords, an English translation and their total frequency in the answers. For all three questions, the most frequent words are not very informative for explaining the text answers. Instead, the most common words refer to the question itself. For instance, the answers for the barriers most frequently contain the words ‘renovation’, ‘energetic’ and ‘house’. The high prevalence of these words is not surprising, given that the question was about energy efficiency renovations of houses.

For this reason, the most prominent words are of little help in identifying major topics in the text answers. We thus need to focus on words with a lower frequency. For example, ‘expensive’ and ‘costs’ are very informative in identifying barriers to retrofitting. In this particular case, these words indicate that financial reasons may be a barrier. One challenge is that the frequency of these words is relatively low, which makes it hard to systematically identify a specific topic. To identify topics in a robust manner, we propose a keyword approach: this consists of identifying a large number of keywords, with relatively low frequency, that map a response to a predefined topic.

The main idea of our method consists of classifying the text answers using a dictionary-based approach. Using this method, a set of keywords defines each topic: if an answer contains any of these predefined words, the algorithm will classify it into the respective topic. We included as many words as possible in the dictionaries to obtain a precise classification of the topics. For this reason, we considered all words from the open-ended questions as potential keywords for the classification.
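The dictionary rule described above can be illustrated with a minimal sketch. The topic names, keywords and example answers below are hypothetical English stand-ins, not the German dictionaries used in the study, and the simple regular-expression tokenizer is an assumption made for illustration.

```python
import re

# Hypothetical toy dictionaries: topic -> set of keywords.
# These are illustrative only and are not the study's dictionaries.
TOPIC_KEYWORDS = {
    "financial": {"expensive", "cost", "costs", "money"},
    "age": {"old", "age", "retirement"},
}


def classify_answer(answer, topic_keywords):
    """Return every topic for which at least one keyword occurs in the answer."""
    tokens = set(re.findall(r"\w+", answer.lower()))
    return {topic for topic, words in topic_keywords.items() if tokens & words}


answers = [
    "A new heating system is simply too expensive for us.",
    "We are too old and the costs would never pay off.",
    "The house was renovated by the previous owner.",
]
labels = [classify_answer(a, TOPIC_KEYWORDS) for a in answers]
```

In this sketch an answer can be assigned to several topics at once, or to none, which mirrors the rule that the presence of any keyword is sufficient for a topic label.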

The following approach enables us to create dictionaries with a large number of words and with high precision. The classification method proceeds via three steps: pre-processing, clustering and topic extraction. First, the pre-processing step reduces the dimensionality of the text and adds to each keyword its word embedding, which describes the semantic distance from other words. In the clustering step, we cluster the keywords into groups based on their semantic similarity. In the subsequent topic extraction, we manually build a dictionary based on the clusters from the second step.

Pre-processing. The goal of the pre-processing step is to reduce the number of words of the text to facilitate the subsequent clustering, followed by the topic extraction. We first extracted all the words from the respondents’ answers and transformed them into tokens. In this study, we use unigrams, that is, one word per token, which is considered sufficiently precise for NLP56. We then lemmatized all words using the spaCy open-source software library for NLP57 (for example, ‘better’ will be transformed into ‘good’).

Next, we sorted all words according to their part of speech (POS). This step is necessary because many words in our text corpus have a relatively small semantic distance when they belong to the same POS. The underlying reason for this is that word embeddings rely on the ‘distributional hypothesis’, which means that words occurring in similar contexts are assigned a small semantic distance58. Therefore, word embeddings for words of the same POS can be relatively similar if these words occur in the same context. For instance, two adjectives may be considered similar, not because of their descriptions but because both are adjectives. We used the spaCy algorithm for POS tagging, marking all lemmatized words as nouns, adjectives, verbs or adverbs.

To complete the pre-processing, we used word embeddings to map each remaining word to a semantic-distance metric whenever possible. Word embeddings are matrices with a column of values for each word that indicate the relative semantic distance between words (for example, the distance between the words ‘heating’ and ‘oil’ is smaller than the distance between ‘heating’ and ‘pencil’). For any two words in our corpus, we can calculate their semantic distance using the cosine similarity. To construct such a matrix for our corpus of unique words, we mapped all the words present in the answers to the predefined German fastText word-embedding vectors59. Words that could not be mapped to the word embeddings were omitted; most of these words occurred only once or twice in the entire text corpus and have little value for the subsequent clustering and classification steps.
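As an illustration of the semantic-distance calculation, the following sketch computes the cosine similarity between word vectors and separates mapped from unmapped words. The three-dimensional vectors are invented for readability (real fastText vectors have several hundred dimensions), and defining the distance as one minus the cosine similarity is an assumption of this example.

```python
import math

# Hypothetical low-dimensional vectors standing in for real word embeddings.
EMBEDDINGS = {
    "heating": [0.9, 0.1, 0.0],
    "oil": [0.8, 0.3, 0.1],
    "pencil": [0.0, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_distance(word_a, word_b, embeddings):
    """Assumed distance definition: one minus the cosine similarity."""
    return 1.0 - cosine_similarity(embeddings[word_a], embeddings[word_b])


# Words without an embedding are set aside, as in the pre-processing step.
corpus_words = ["heating", "oil", "pencil", "xyzzy"]
mapped = [w for w in corpus_words if w in EMBEDDINGS]
omitted = [w for w in corpus_words if w not in EMBEDDINGS]

d_close = semantic_distance("heating", "oil", EMBEDDINGS)
d_far = semantic_distance("heating", "pencil", EMBEDDINGS)
```

With these toy vectors, `d_close` is smaller than `d_far`, reproducing the ‘heating’, ‘oil’ and ‘pencil’ ordering used as an example in the text.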

Clustering. Next, we clustered all unique nouns, adjectives, verbs and adverbs based on their semantic distance from the word embeddings. This step aims to create groups of similar words to facilitate the subsequent topic allocation of the keywords. As a result, for the final step of our approach, the researcher does not have to scan through an unsorted list of keywords and decide which topic each keyword belongs to. Instead, it becomes possible to analyse groups of 10–40 similar words and decide which of these words belongs to a specific topic.

To implement the clustering, we consider that words of the same POS tend to have a small semantic distance due to the word-embeddings construction. To avoid the influence of the POS in clustering words, we cluster words separately according to their POS. We use k-means clustering, which enables us to use the cosine-similarity metric with the word-embedding data; this k-means clustering is an unsupervised clustering method that groups data into a fixed number of clusters. Finally, it should be noted that the clustering step is not sensitive to either the number of clusters or to a specific clustering algorithm, as long as the resulting groups are sufficiently manageable to be read by a researcher.

Topic extraction. In the final step, we extracted the topics. We assigned each word, when possible, to one of the existing topics from the corresponding closed-ended question. Each word could belong only to a single topic. This step was not automated but was performed manually. During the topic-extraction process, we also discovered additional topics, which we then added to the list of predefined topics for the closed-ended question. Of all the unique words, we could assign between 15% and 20% of the words to a topic.

After creating the dictionary with the lemmatized words, we searched the non-lemmatized text for words for which the lemma was contained in our dictionary, and added these words. For example, the initial dictionary contained the word ‘cost’, which is the lemma of ‘costs’, which we added in this last step. Adding the non-lemmatized words enabled the dictionary to be applied directly to the non-lemmatized text.
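The re-introduction of non-lemmatized words can be sketched as follows. The lemma lookup table is a hypothetical stand-in for a real lemmatizer, and the dictionary and answers are invented English examples.

```python
import re

# Hypothetical lemma lookup standing in for a real lemmatizer.
TOY_LEMMAS = {"costs": "cost", "prices": "price", "older": "old"}


def lemma_of(token):
    """Return the lemma of a token, or the token itself if it is unknown."""
    return TOY_LEMMAS.get(token, token)


def expand_dictionary(lemma_dictionary, raw_answers):
    """Add every surface form whose lemma is already a keyword of some topic."""
    lemma_to_topic = {
        lemma: topic
        for topic, lemmas in lemma_dictionary.items()
        for lemma in lemmas
    }
    expanded = {topic: set(lemmas) for topic, lemmas in lemma_dictionary.items()}
    for answer in raw_answers:
        for token in re.findall(r"\w+", answer.lower()):
            topic = lemma_to_topic.get(lemma_of(token))
            if topic is not None:
                expanded[topic].add(token)
    return expanded


lemma_dictionary = {"financial": {"cost", "price"}, "age": {"old"}}
raw_answers = [
    "The costs are too high.",
    "We are older now and prices keep rising.",
]
expanded = expand_dictionary(lemma_dictionary, raw_answers)
```

Because each lemma belongs to a single topic, every added surface form inherits exactly one topic, and the expanded dictionary can be matched against the raw text without lemmatizing it again.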

Finally, after assigning words to topics, we labelled the text answers by automatically searching each answer for the presence of the words that define a topic. We applied the same approach to the question with barriers but identified different keywords and topics. An overview of the most important words that define the topics can be found in Supplementary Table 2 for the two major barriers, in Supplementary Table 3 for the four major determinants and in Supplementary Table 4 for the policy preferences. All tables show the original German words, their English translations and their total frequency.

Using the above approach, we ranked the barriers and determinants to energy efficiency investments by tabulating the topic frequency. We then contrasted the rankings with those obtained from the closed-ended questions. Compared with the closed-ended questions, the answers from the open-ended questions were only sometimes consistent: a respondent could check the box for a certain topic but not mention it in the text answer and vice versa. One possible explanation is that the text classification was inaccurate. As a robustness check, we manually reviewed all inconsistent answers for the major topics. In most cases, the initial classification was correct, meaning that the topic shares changed only marginally. We present a more detailed analysis of the consistency in Supplementary Methods 2. Note that the survey did not aim to specifically elicit the underlying reasons for the inconsistencies between open- and closed-ended questions. A study regarding opinions on US trade policies observes a similar inconsistency between closed- and open-ended survey responses60.

Supplementary Table 9 provides an overview of how each pre-processing step reduced the number of words. Initially, all unique words in the corpus were considered to be potential keywords. In the first pre-processing step, words were selected based only on their POS: only nouns, verbs, adjectives and adverbs were retained. Next, we lemmatized the words and dropped words with three characters or fewer (except for ‘CO2’ and ‘old’ with the determinants and ‘PV’ in place of photovoltaic with the policy recommendations).

For the barriers and determinants, we did not have a limit on the frequency of the words, but for the policy recommendations, we only selected words that occurred at least twice. This slight change in procedure was because we did not have a predefined list of topics from a closed-ended question. Therefore, we did not aim for maximum precision and could improve the topic extraction considerably. Selecting words with a frequency equal to or higher than two substantially reduced the number of words and thus facilitated the topic clustering. Rarely occurring words are mainly important for very precise and small topics. Furthermore, because we did not compare the open policy question to a closed question, this level of precision was not necessary. Working with a corpus with a lower dimensionality also facilitated the initial discovery and definition of topics.
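The filtering rules described above (POS selection, minimum word length with exceptions, and an optional frequency threshold) can be sketched as below. The input is assumed to be a list of already lemmatized and POS-tagged tokens; the tag names, the function name and the toy data are hypothetical.

```python
from collections import Counter

# Assumed tag names for the four retained parts of speech.
KEEP_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def candidate_keywords(tagged_lemmas, min_count=1, min_length=4, exceptions=frozenset()):
    """Return lemma -> frequency for the lemmas that pass all three filters.

    tagged_lemmas: iterable of (lemma, pos) pairs.
    min_count:     1 means no frequency limit; 2 keeps words occurring at least twice.
    min_length:    4 drops words with three characters or fewer.
    exceptions:    short words that are kept regardless of their length.
    """
    counts = Counter(lemma for lemma, pos in tagged_lemmas if pos in KEEP_POS)
    return {
        lemma: n
        for lemma, n in counts.items()
        if n >= min_count and (len(lemma) >= min_length or lemma in exceptions)
    }


# Hypothetical tagged tokens from answers to the policy question.
tagged = [
    ("subsidy", "NOUN"), ("subsidy", "NOUN"),
    ("pv", "NOUN"), ("pv", "NOUN"),
    ("the", "DET"),
    ("tax", "NOUN"), ("tax", "NOUN"),
    ("loan", "NOUN"),
    ("increase", "VERB"), ("increase", "VERB"),
]

# Policy question: frequency threshold of two, with 'pv' as a length exception.
policy_keywords = candidate_keywords(tagged, min_count=2, exceptions={"pv"})
# Barriers/determinants setting: no frequency limit.
unrestricted = candidate_keywords(tagged, min_count=1, exceptions={"pv"})
```

Raising `min_count` from one to two removes the singleton ‘loan’ in this toy example, which illustrates how the threshold lowers the dimensionality of the candidate list.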

The second pre-processing step mapped the remaining keywords to the word embedding whenever possible, further reducing the number of words. Most words that could not be mapped occurred just once or twice in the text corpus, which suggests that these words would not materially influence the classification. Overall, our pre-processing reduced the number of unique words by 50–75%.

On the basis of the lists of words for each open-ended question, we constructed the initial dictionaries. The dictionaries contain between 15% and 20% of the remaining words after pre-processing. In a final step, we re-introduced the non-lemmatized words, increasing the size of the dictionaries by 7–21%.

Validation and comparison with other approaches

We compared our method with both manual classification and machine-learning-based approaches using two types of model validation: the first based on human coding and the second based on the semantic-distance measure from the word embedding.

Our first validation consisted of comparing our model against two human coders, similar to approaches in the literature56. The human coders are consistent with our model for a subsample of our data. Moreover, the consistency between the human coders was only marginally higher than their consistency with our dictionary method.

The second validation is based on the semantic distance obtained from the word embeddings, and is similar to the metrics used for unsupervised topic models to find the optimal topics61. We calculate a ‘quality’ measure that describes the semantic similarity of words within a topic compared with words of other topics. The underlying idea is that the keywords defining a topic should be more closely related to each other than to keywords from other topics. The quality measure is the ratio of the intra-topic coherence to the inter-topic similarity. For all topics, we found that the quality measure was greater than unity, which means that the intra-topic coherence is larger than the inter-topic similarity. A detailed description of both validation approaches can be found in Supplementary Methods 1.
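One way to compute such a quality measure is sketched below. The exact definitions are given in Supplementary Methods 1 and are not reproduced here; this example assumes that the intra-topic coherence is the mean pairwise cosine similarity among the keywords of a topic, and that the inter-topic similarity is the mean cosine similarity between those keywords and the keywords of all other topics. The vectors and topics are hypothetical.

```python
import math
from itertools import combinations, product


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def topic_quality(topic, dictionary, embeddings):
    """Ratio of intra-topic coherence to inter-topic similarity (assumed definitions)."""
    own = [embeddings[w] for w in dictionary[topic]]
    others = [
        embeddings[w]
        for other_topic, words in dictionary.items()
        if other_topic != topic
        for w in words
    ]
    intra = mean(cosine(u, v) for u, v in combinations(own, 2))
    inter = mean(cosine(u, v) for u, v in product(own, others))
    return intra / inter


# Hypothetical embeddings and dictionaries (each topic needs at least two keywords).
EMBEDDINGS = {
    "cost": [0.1, 0.9, 0.1], "price": [0.2, 0.8, 0.0], "expensive": [0.1, 0.9, 0.2],
    "warm": [0.9, 0.2, 0.1], "cosy": [0.8, 0.1, 0.2],
}
DICTIONARY = {
    "financial": ["cost", "price", "expensive"],
    "comfort": ["warm", "cosy"],
}
qualities = {t: topic_quality(t, DICTIONARY, EMBEDDINGS) for t in DICTIONARY}
```

A value above one for a topic indicates that, under these assumed definitions, its keywords are on average more similar to each other than to the keywords of the other topics.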

Our validation with human coding is similar to qualitative methods that researchers have relied on to classify open-ended survey responses before advances in NLP. Our semi-automated approach has, however, several advantages over manual approaches. It is scalable, which means that the additional time to construct a dictionary decreases with more answers. Future surveys that contain the same open-ended question can use the same dictionary to analyse the responses. By contrast, a manual classification must start from scratch for a new survey with the same open-ended question. The more topics a researcher must consider for manual classification, the more challenging the classification becomes. Our keyword approach is less affected by the number of topics because the researcher selects keywords from a pre-compiled list of similar words.

To classify text into topics, researchers differentiate between text-classification methods with known and unknown categories56. Topic models, which detect categories in the text that are unknown to the researcher, are the most commonly used approach to text analysis. Among these, latent Dirichlet allocation (LDA) is the most widely used62.

LDA is a topic model that assumes the following text generation process: first, each document is generated by sampling topics from a topic distribution. Conditional on the sampled topics, words are sampled from each topic-word distribution. This unsupervised method is very efficient with large texts and without any additional covariates about the text.
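The generative process assumed by LDA can be sketched in a few lines. The example follows the standard formulation, in which the topic proportions of a document are drawn from a Dirichlet distribution; the two topics, their word probabilities and the concentration parameters are hypothetical, and the sketch only simulates text rather than estimating a model.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_dists, alpha, n_words, rng):
    """Simulate one document: topic proportions first, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # document-specific topic proportions
    topics = list(topic_word_dists)
    words = []
    for _ in range(n_words):
        topic = rng.choices(topics, weights=theta)[0]  # sample a topic
        vocab = list(topic_word_dists[topic])
        weights = list(topic_word_dists[topic].values())
        words.append(rng.choices(vocab, weights=weights)[0])  # sample a word given the topic
    return theta, words


# Hypothetical topic-word distributions.
TOPIC_WORD_DISTS = {
    "financial": {"cost": 0.5, "expensive": 0.3, "subsidy": 0.2},
    "comfort": {"warm": 0.6, "cosy": 0.3, "draught": 0.1},
}
rng = random.Random(0)
theta, document = generate_document(TOPIC_WORD_DISTS, alpha=[0.5, 0.5], n_words=20, rng=rng)
```

Estimating an LDA model amounts to inverting this process, that is, recovering the topic-word distributions and the topic proportions from observed documents, which is harder when each document contains only a few words.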

However, some of the characteristics of LDA make it unsuitable for deriving the exact topics from open-ended survey responses; with short text samples and a small number of documents, LDA tends to perform poorly63. Moreover, if the researcher aims to recover many topics, LDA risks giving results that human readers cannot interpret64. Open-ended survey responses are affected by both problems because they consist of small samples and short texts. For this reason, LDA and similar topic models may not give the exact distributions of topics from open-ended survey responses65. Other topic models that are based on LDA share these characteristics, for example, the Structural Topic Model66, LDA2Vec67 or Top2Vec68.

Finally, because LDA is an unsupervised model, it does not take into account the open-ended question to which the respondents are replying. For instance, when asked why they performed a retrofit, some respondents answered that they exchanged their gas heating because it was no longer functioning. Others answered that they installed new windows because the old ones were broken. In both cases, the relevant topic that answers the question would be that these respondents renovated because they had to exchange broken parts of the building. An unsupervised model, however, cannot consider this information and will instead focus on the parts of the response that describe which building parts were exchanged, that is, the gas heating or the windows.

Classification methods with unknown categories, such as LDA, can be differentiated from methods where the categories are known to the researcher56. Our method to classify the open-ended responses is based on a dictionary approach, where the categories are known to the researcher. Dictionary approaches rely on pre-chosen keywords that can describe sentiments or topics. Consequently, a text containing these keywords will be associated with a specific sentiment or a topic. By contrast, machine-learning methods use the linguistic features of a text69. Because we use machine-learning methods to construct the dictionary, we consider our method to be a hybrid approach between the two69.
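The classification step of a dictionary approach can be sketched as follows. This is a simplified illustration under stated assumptions rather than our implementation: the topics, keywords and responses are hypothetical, tokenization is a plain lowercase word split, and the embedding-based compilation of the keyword lists is not shown.

```python
import re

# Hypothetical dictionary: each topic is defined by a set of keywords.
dictionary = {
    "defect": {"broken", "defective", "leaking"},
    "cost": {"expensive", "cost", "price"},
    "comfort": {"comfort", "warm", "cold"},
}


def classify(response, dictionary):
    """Return every topic whose keyword set overlaps the response tokens."""
    tokens = set(re.findall(r"\w+", response.lower()))
    return sorted(topic for topic, keywords in dictionary.items() if tokens & keywords)


# Hypothetical open-ended responses.
responses = [
    "We replaced the gas heating because it was broken.",
    "New windows, the old ones were broken and the rooms were cold.",
    "Too expensive for us.",
    "No particular reason.",
]

labels = [classify(r, dictionary) for r in responses]

# Share of responses mentioning each topic; a response may count for several.
shares = {
    topic: sum(topic in found for found in labels) / len(labels)
    for topic in dictionary
}

print(labels)
print(shares)
```

In this toy example, the first two responses are both assigned to the "defect" topic regardless of which building part was exchanged, and the second response is additionally assigned to "comfort". Because a response can match several topics, the shares need not sum to one.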

In contrast to machine-learning-based methods, our dictionary approach provides an exact distribution of topics. Moreover, multiple topics can be associated with text answers with the same high precision. Furthermore, the categories are transparent because they depend on an accessible list of keywords. Finally, once a dictionary has been compiled, it can be used for a future survey that relies on the same open-ended question. Overall, our approach thus provides a way to move forward and address the inconsistencies in explaining renovation decisions11. We elicit the barriers and determinants of energy efficiency investments using a scalable and easy-to-replicate questionnaire across a large set of domains and contexts.

Ethics statement

This research was approved by the Ethics Commission of ETH Zurich (EK 2019-N-130). Survey respondents gave informed consent to their participation.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.