Background & Summary

Occupants have a significant influence on their indoor environment and its energy use through their presence and interactions with the building envelope and control system1,2,3,4,5. Factors driving occupant-building interactions are linked either to the intention to adjust indoor environmental parameters (e.g. relating to thermal (dis-)comfort), or to non-environmental factors such as leaving the room6. The perception of the thermal indoor environment is one important driving factor for actions, including adjustment of heating or cooling set points, or opening windows6, which can be described by the adaptive principle: “If a change occurs that produces discomfort, people tend to act to restore their comfort”7. Hence, understanding thermal (dis-)comfort is crucial for appropriate design decisions and choosing suitable operation modes in buildings.

According to the widely-used definition by the American Society for Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE), “Thermal comfort is the condition of mind that expresses satisfaction with the thermal environment and is assessed by subjective evaluation”8. Consequently, rating scales are often used for this assessment. Whether a specific set of thermal conditions can be considered comfortable is commonly determined via simple thermal sensation ratings (“cold” to “hot”)9. In parallel, additional dimensions of thermal perception are known and applied10; e.g. affective evaluation (“comfortable” to “extremely uncomfortable”), thermal preference (“cooler” to “warmer”), or personal acceptance (“generally acceptable” to “generally unacceptable”).

The Scales Project aims to investigate how participants understand the verbal anchors of thermal sensation, thermal comfort, and thermal acceptance scales, and to review the validity of existing assumptions (see below) regarding the interpretation of responses on these scales. The dataset consists of data from a large-scale international survey applying a newly developed questionnaire, which asks survey participants to state their perceived distance between the verbal anchors. The questionnaire was applied in 21 language versions. Surveys were conducted in 57 cities in 30 countries, resulting in a dataset encompassing responses from 8225 participants (Fig. 1). Because individual inputs are available for each dimension of thermal perception, potential analyses and their statistical power benefit from the within-subject nature of this questionnaire.

Fig. 1

Applications and participants. Places of application of questionnaires (red dots) and places of origin of the 8225 participants (blue diamonds) in this study.

Following the project’s objective, this dataset can be used to analyse the conceptual relationships between verbal anchors of one scale, or between the anchors of two or more scales. For example, thermal sensation is most frequently assessed using the seven-point ASHRAE thermal sensation scale, with the verbal anchors “cold”, “cool”, “slightly cool”, “neutral”, “slightly warm”, “warm”, and “hot”. A common assumption in applying the thermal sensation scale is equidistance, meaning that the difference between “warm” and “hot” is equal to that between “warm” and “slightly warm”. However, research has questioned this assumption9,11,12,13,14. Hence, the applicability of statistical methods relying on it (e.g. linear regression) needs to be questioned. Beyond reviewing the validity of this assumption, the newly developed questionnaire also enables analysis of the influence of different contexts (e.g. language, climate, and season) and characteristics of individuals (e.g. sex). Further assumptions existing for other dimensions of thermal perception, here thermal comfort and thermal acceptance, can also be assessed.
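The equidistance assumption can be made concrete with a minimal sketch. It assumes that a response is available as seven anchor positions in millimetres on the 0 to 100 mm line; the example positions, the function names, and the 1 mm tolerance are illustrative assumptions and are not taken from the dataset or the project scripts.

```python
# Anchor order of the seven-point ASHRAE thermal sensation scale (from the text).
ANCHORS = ["cold", "cool", "slightly cool", "neutral",
           "slightly warm", "warm", "hot"]


def adjacent_distances(positions_mm):
    """Return the distances between neighbouring anchors, in mm."""
    return [b - a for a, b in zip(positions_mm, positions_mm[1:])]


def is_equidistant(positions_mm, tolerance_mm=1.0):
    """True if all neighbouring distances agree within tolerance_mm.

    The tolerance is an assumed threshold chosen for illustration only.
    """
    gaps = adjacent_distances(positions_mm)
    return max(gaps) - min(gaps) <= tolerance_mm


# Hypothetical response: positions of the seven anchors on the line.
example = [0.0, 10.0, 35.0, 50.0, 65.0, 90.0, 100.0]
for (left, right), gap in zip(zip(ANCHORS, ANCHORS[1:]), adjacent_distances(example)):
    print(f"{left} -> {right}: {gap} mm")
print("equidistant:", is_equidistant(example))
```

In this hypothetical response, "warm" lies much closer to "hot" than to "slightly warm", so treating the seven categories as equally spaced would misrepresent the participant's concept of the scale.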

Another important assumption in the field of thermal comfort to be reviewed through our dataset postulates that occupants would be satisfied with the indoor thermal conditions if they chose one of the middle three verbal anchors of the ASHRAE thermal sensation scale (“slightly cool”, “neutral”, or “slightly warm”). In other words, ‘neutrality’ is assumed to be a desired condition. Various studies have shown individual and contextual differences not supporting this assumption9,12,13,15,16,17. In particular, researchers repeatedly identify a discrepancy in users who declare satisfaction while feeling warm or cool18,19,20,21.

In addition, the data can also be used for traditional thermal comfort explorations, as 5,031 of the questionnaires include at least one measurement of indoor air temperature. Furthermore, the detailed description and availability of the questionnaires in multiple languages can serve as a benchmark for future thermal comfort studies and permit replication in other contexts, for example libraries or offices, or with other cohorts such as office workers or older people.


Questionnaire development and pilot study

Within the framework of IEA EBC Annex 69, an international and interdisciplinary group consisting of 7 independent research groups in 6 countries (Australia, China, Germany, Korea, Sweden, and United Kingdom) – the initial core group – developed the methods applied in the present paper, based on promising results from an experimental study14. This work included several rounds of face-to-face discussions and email conversations as well as an online survey. The details of these discussions together with the description of the methods were submitted and registered to the Open Science Framework (OSF) as a pre-analysis plan (PAP)22. At the time of submission of the PAP, one application of the questionnaire had been conducted, but the corresponding questionnaires were securely stored and untouched until the moment of submission of the PAP.

The initial core group also developed the questionnaire in an English version. Mandarin (with simplified Chinese), German, Korean and Swedish translations were subsequently prepared by native language experts familiar with the concepts used in the questionnaire.

Each group piloted the initial version of the questionnaire according to the following procedure: The questionnaire was applied without further explanation to at least 7 individuals, of which 2 had to be experienced in the field of human thermal comfort. After collecting the questionnaires, researchers discussed with the participants the length of the questionnaire, the clarity of instructions, and issues when filling out the questionnaire. The observations made through these applications were discussed among the core group and reflected in the revised and final versions of the questionnaire.

Expansion of research group

After agreeing on a final version of the questionnaire and submitting the PAP, the initial core group reached out to other researchers in the field through existing networks, such as the Network for Comfort and Energy Use in Buildings (NCEUB), and personal contacts. Additional researchers had to sign a co-author agreement and guarantee to follow the procedures prescribed for data collection (see below). Researchers who used languages other than those mentioned above had to translate the questionnaire into their language following the same procedure as initially applied – including piloting the new language version with at least 7 individuals beforehand. The number of 7 individuals was based on observations by the initial core group during the first pilot phase, which revealed that the number of issues raised by test participants concerning the questionnaire does not increase substantially with a higher number of initial respondents. In addition, the project leader checked the questionnaire with respect to the formalities. The final consortium consisted of 56 research groups from 30 countries with a total of 94 individual researchers.

Survey participants

Respondents were university students attending lectures, because they were expected to have only minor variations in age and activity level, supporting the focus on our targeted contextual differences. It was a requirement that the students had not participated in lectures addressing the concept of human thermal comfort. Each respondent could only participate once.


The questionnaire consists of an introductory page, the two-page main part dealing with the scales and a fourth page addressing the respondents’ background and current thermal state (see all language versions in the online repository23). The main part used a newly developed free-positioning task, where participants were asked to position the verbal anchors on a straight line (Fig. 2). In the questionnaire, scales relating to thermal sensation, thermal comfort and thermal acceptance were investigated. The first questions prompted participants to process each of these scales individually. Later questions addressed the relationship between (1) thermal sensation and thermal comfort; and (2) thermal sensation and thermal acceptance. Verbal anchors were chosen according to ISO 1055110. ISO 10551 and many thermal comfort studies also use a preference scale ranging, for example, from “prefer cooler” to “prefer warmer”. This scale was not used for this study because pilot studies suggested that it tends to be misinterpreted by respondents, as also pointed out earlier24.

Fig. 2

Exemplary response to one of the free-positioning tasks used in the main part of the questionnaire. Participants were asked to position the verbal anchors shown to the left on the empty line with verbal anchors at extreme positions only. Grey lines and letters present the drawing by one participant. The complete question together with instructions and examples for participants is available online23.

In addition to questions relating to thermal sensation, thermal comfort and thermal acceptance scales (Part 1 of the questionnaire), respondents were asked about their current thermal state and background (Part 2 of the questionnaire). Countries and cities of participants’ origin and residence were collated to identify potential adaptations to climatic conditions at the locations where the questionnaires were administered and where participants were living beforehand. See Online-only Table 1 for a full list of variables included in the dataset and their source.

Survey procedure

In each country the questionnaire had to be distributed at least twice during two distinct seasons. The requirement for two distinct seasons was lowered for places with only minor variations in outdoor weather conditions throughout the year. Data were collected from a minimum of 100 respondents per country (a minimum of 50 per season).

The following conditions had to be followed for the distribution of the questionnaires:

  • Timing: at the end of or, if necessary, during classes, when participants had been seated for at least 30 minutes

  • Form: paper-pencil

  • Language: local language (in case of large groups of foreigners in a country/class, e.g. Chinese in Korea, researchers were free to distribute more than one language version)

On a separate sheet, researchers noted the following additional information:

  • City and Country of survey

  • Date, start time of distribution and end time of collection

  • Number of questionnaires distributed

  • Number of questionnaires received back

  • Observations made during survey distribution and collection: e.g. “very high noise levels” or “at day of survey it was unnaturally warm for this time of the year”

  • Classification of season: This classification was done without any predefined categories and was based on the individual researchers’ decision. The researchers used the terms for seasons that are typical at their location. Future users of this dataset who plan to include such a variable in their analysis can decide whether to follow the classifications given in the dataset or to create their own classifications, e.g. based on prevailing outdoor conditions, the date of application, KG class, or other information.

In addition, researchers acquired data of the outdoor conditions (outdoor temperature and humidity) from close-by weather stations (either owned by the researchers, available to researchers, or using public sources) and optionally recorded the indoor conditions during the distribution period. Despite their significant influence on thermal perception, indoor conditions were made optional for the following two reasons.

First, the main purpose of the Scales Project was not that of a classic thermal comfort study aiming at the analysis of the relationship between indoor thermal conditions (and other factors) and thermal perception assessed through thermal perception scales. The main objective of this study was to reveal participants’ understanding and interpretation of verbal anchors on the scales. The assumption was that these were affected by the prevailing conditions, such as seasonal differences or immediate outdoor conditions, as well as by an individual’s actual thermal state.

Second, the methodological intention was to maintain a low level of constraints for additional researchers to join this project. Given the aim of a large number of responses from a variety of climates and geographical contexts, a decision was made that the availability of measurement equipment should not be a prerequisite for joining. In addition, classical thermal comfort analysis requires the measurement of indoor air temperature, radiant temperature, relative humidity, and air velocity together with the assessment of clothing insulation level. Because the questionnaires were applied in university classrooms, air temperature and mean radiant temperature could be expected to vary largely among individual positions in a large classroom, e.g. close to or further away from windows or air outlets. Measuring thermal conditions at each participant’s seat would have required a substantial amount of equipment, significantly limiting the number of participating researchers.

Therefore, a decision was made to have indoor thermal conditions not mandatory and to focus on the assessment of participants’ self-reported thermal state. Future users of this data set should be aware of the limitations. Those planning to use recorded indoor air temperatures can still use large parts of the dataset, as 5,031 questionnaires include at least one measurement of indoor air temperature.

Data preparation

Individual research groups prepared the data from their questionnaires and submitted two files for each application: one containing the data transferred from the questionnaire and one containing the additional information for each application. The positions of the labels drawn in the free-positioning task were quantified by using a ruler to measure the distance from the left end of each horizontal line to the positioned label.

Upon reception of a dataset, the project leader validated the dataset by means of an automated script (see section Technical validation). In addition, the project leader made the following adjustments to harmonize the data and added further variables by means of an R script available online23 (see section “Custom code used” below):


  • Researchers participating in this study were advised to print the questionnaire, so that each line representing a linear scale was exactly 100 mm long. This would result in measured distances of verbal anchors between 0 mm and 100 mm. However, there were several cases where the printouts were slightly distorted, i.e. shorter or longer. The real length of the lines in the printouts was reported with the information for each application. Based on this information, the measured values were adjusted for the ratio between the real length of the line in the printed version and the prescribed length of 100 mm.

  • Date and time formats were harmonized.

  • Season descriptions were harmonized.
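The length adjustment described in the first item above can be sketched as follows. This is a minimal illustration that assumes a simple proportional correction; the function and parameter names are hypothetical and do not come from the project's R script.

```python
PRESCRIBED_LENGTH_MM = 100.0  # prescribed line length stated in the text


def rescale_position(measured_mm, printed_length_mm):
    """Map a position measured on a distorted printout onto the 0-100 mm scale.

    Assumes the distortion is uniform along the line, so the position scales
    with the ratio of prescribed to actually printed line length.
    """
    if printed_length_mm <= 0:
        raise ValueError("printed line length must be positive")
    return measured_mm * PRESCRIBED_LENGTH_MM / printed_length_mm


# Hypothetical example: a label measured at 49 mm on a line printed 98 mm long.
print(rescale_position(49.0, 98.0))
```

A label drawn at the midpoint of a shortened line is thereby mapped back to the midpoint of the prescribed 100 mm scale.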

Additional variables:

  • KG class: Koeppen-Geiger (KG) classifications were derived for the place of survey (provided by the researcher), and the places of current residence, previous residence, and origin (as stated by the participants). To obtain the KG class for each combination of city and country, the KG world map (Version March 2017) provided for R was used. This map is based on data from 1986 to 2010 and is the re-analysed KG map with a resolution of 5 arc minutes using the downscaling algorithms25. In order to obtain the KG class automatically, the latitude and longitude were first derived based on Nominatim, the search engine for OpenStreetMap data, then converted to the pixel number of the map.

  • Language type: The verbal anchors differ in their type between languages (see also10). In some languages, e.g. English, two adjectives are used on the cool and warm side of the scale, respectively, e.g. “warm” and “hot”. Data entries from these languages were assigned the language type “2”. Other languages, e.g. Portuguese, use only one adjective on each side, e.g. “frio”. These are language type “1”. In addition, a few languages use either two adjectives on the hot side and one on the cold side (“3h”) or vice versa (“3c”).

  • Adaptation level: Depending on the answers to the places of current and previous residence, this variable had the levels “low”, “middle”, or “high”. The coding was based on the length of residency and the KG classes of the current and previous places of living. “Low” denotes that the respondent had been living for less than a year in the current KG class and that the previous place had a different KG class. “Middle” was assigned to those who had been living for 1 to 3 years in the current KG class but in a different one before. All others, i.e. those living for more than 3 years in the current KG class, are “high”.

  • Native: A binary variable (yes/no) indicating whether the respondent is a native speaker was generated. Responses were marked as “yes” if the language of the questionnaire is (one of) the language(s) spoken in the respondent’s country of origin. All other responses were marked as “no”.

  • Country of residence plausible: Participants reported their country of residence. This record was compared to the country of application noted by the researcher. In case these two countries differed (52 responses), the new variable “Country of residence plausible” was set to “no”, otherwise “yes”.
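The coding rules for the adaptation level and the plausibility of the country of residence can be sketched as below. This is an illustrative reading of the rules in the list above, not the project's R script: it assumes that the length of residency is available as a number of years, treats every case not covered by “low” or “middle” as “high”, and does not handle missing values. All function and parameter names are hypothetical.

```python
def adaptation_level(years_in_current_kg, current_kg, previous_kg):
    """Code the adaptation level as "low", "middle", or "high".

    Assumption: residency is given in years as a number; the dataset may
    encode it differently (e.g. as categories).
    """
    if previous_kg != current_kg:
        if years_in_current_kg < 1:
            return "low"      # under a year, previous KG class different
        if years_in_current_kg <= 3:
            return "middle"   # 1 to 3 years, previous KG class different
    return "high"             # all other cases


def residence_plausible(reported_country, survey_country):
    """Return "yes" if the reported country matches the country of application.

    The case- and whitespace-insensitive comparison is an assumption.
    """
    same = reported_country.strip().lower() == survey_country.strip().lower()
    return "yes" if same else "no"


# Hypothetical respondents.
print(adaptation_level(0.5, "Cfb", "Aw"))
print(adaptation_level(2, "Cfb", "Aw"))
print(adaptation_level(0.5, "Cfb", "Cfb"))
print(residence_plausible("Korea", "China"))
```

Note that under this reading a respondent who moved recently but stayed within the same KG class is coded “high”, because the text assigns “low” and “middle” only when the previous KG class differs.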

The variables available in the dataset, including their measurement scale, and levels (if applicable) are presented in Online-only Table 1.

Ethics and consent

Ethics approvals, such as approval by an Institutional Review Board, were acquired where institutional or national requirements made them necessary. Informed consent was obtained from all subjects before conducting the survey.

Data Records

All data records listed in this section are available from the project page23 on OSF and can be downloaded without an OSF account. The information regarding the cities of current residence, previous residence and origin was removed, as it can serve to identify an individual participant. The R script used for pre-processing the semi-raw data is also available. The data were licensed under a CC0 1.0 Universal license.

Data structure

All Datasets

File format: comma separated values file (.csv).

These files contain:

  • Individual raw datasets without information on cities

    • survData: file containing participants’ responses

    • survInfo: file containing researcher observations

    • summary reports: Summary report by researcher for each application

  • One changelog file: Record of changes made to raw datasets

  • Codebooks for each raw dataset

  • Combined dataset including all information from survData and survInfo file together with additional variables.


File format: Adobe pdf (.pdf).

These files contain the 21 language versions of the questionnaire used.

Data templates

File format: Microsoft Excel (.xlsx).

These files contain the data templates used for data collection.

R Scripts

File format: plain text files (.R)

These files contain the R Scripts used for processing the raw datasets, technical validation, and additional variables.

Technical Validation

Incoming datasets were rigorously checked in several steps:

  (1) Visual inspection of the datasets to check whether they comply with the prescribed formats. If not, datasets were sent back to researchers.

  (2) Semi-automated R script to validate

    a. Spelling of country names, city names, and language codes

    b. Whether the KG class can be derived automatically (otherwise the KG class was added manually)

    c. Whether data points are within the expected range (e.g. relative humidity between 0 and 100%), or one of the available categories (e.g. only one of 7 age groups)

  (3) Additional checks. After combining all datasets, combinations of data points were validated. For example, it can be expected that the verbal anchors of the sensation scale are drawn in the right order, i.e. from cold to hot. Responses in which the verbal anchors for the sensation scale were not in this order were flagged. In addition, data points were flagged when a multivariate regression analysis detected them as outliers. The project leader informed researchers of the flagged data points and requested an additional check of the original questionnaire. These consistency validations revealed that the answer patterns in questions 1a, 2a, and 3a did not meet the researchers’ expectations for 266, 307, and 84 questionnaires, respectively. The additional checks showed that one researcher had not understood parts of the data entry instructions correctly and therefore repeated the data entry. In the other cases, only 49, 34, and 5 values, respectively, had not been correctly transferred, i.e. 81.6%, 88.9%, and 94% of the flagged values had been transferred correctly. Where more than 10% of the data points of one application were flagged, the project leader checked the validity of data entries on a random basis using scans of the original questionnaires provided by the submitting researchers.
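The kinds of checks listed above can be illustrated with a short sketch. It is written in Python for illustration and is not the project's R script; the function names are hypothetical, and allowing tied anchor positions in the order check is an assumption. The last function reproduces the reported percentages from the counts given in the text.

```python
def in_range(value, low, high):
    """Range check, e.g. relative humidity between 0 and 100%."""
    return low <= value <= high


def in_categories(value, allowed):
    """Category check, e.g. exactly one of the available age groups."""
    return value in allowed


def anchors_in_order(positions_mm):
    """True if anchor positions never decrease from "cold" to "hot".

    Ties are accepted here; whether they should be flagged is an assumption.
    """
    return all(a <= b for a, b in zip(positions_mm, positions_mm[1:]))


def share_correct(flagged, not_transferred_correctly):
    """Percentage of flagged values that were in fact transferred correctly."""
    return 100.0 * (flagged - not_transferred_correctly) / flagged


# Hypothetical response with "cool" drawn to the right of "slightly cool".
print(anchors_in_order([0.0, 40.0, 35.0, 50.0, 65.0, 90.0, 100.0]))

# Counts reported in the text for questions 1a, 2a, and 3a.
for flagged, wrong in [(266, 49), (307, 34), (84, 5)]:
    print(round(share_correct(flagged, wrong), 1))
```

The loop shows how the percentages quoted above follow from the counts: for example, (266 - 49) / 266 is about 81.6%.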

Usage Notes

For further analyses, it is recommended to use the final dataset on OSF and not the individual raw datasets, because revisions by data providers have only been made to the final dataset.

In total, 9111 questionnaires were distributed and 8225 responses collected (90% response rate). Note that the dataset provided consists of all 8225 questionnaires submitted. The authors did not want to exclude any questionnaire from the dataset based on specific exclusion criteria, e.g. outlier definition or completeness of questionnaire. Future users of the dataset can make their own decisions about which questionnaires they consider valid. Any exclusion criteria imposed by the authors would limit the freedom of future users to make such decisions.
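Because no questionnaire was excluded, users need to apply their own criteria. The sketch below shows one possible way to do this, assuming the responses have been loaded as a list of dictionaries. The column names and the completeness criterion are hypothetical; the codebooks on OSF define the actual variable names.

```python
# Hypothetical column names; the codebooks on OSF define the real ones.
SENSATION_KEYS = ["sens_cold", "sens_cool", "sens_neutral"]


def is_complete(row, keys=SENSATION_KEYS):
    """True if none of the listed fields is missing (None or empty string)."""
    return all(row.get(key) not in (None, "") for key in keys)


def select_responses(rows, criteria):
    """Keep the rows for which every user-chosen criterion returns True."""
    return [row for row in rows if all(check(row) for check in criteria)]


# Two hypothetical responses, the second with a missing anchor position.
rows = [
    {"id": 1, "sens_cold": 0.0, "sens_cool": 20.0, "sens_neutral": 50.0},
    {"id": 2, "sens_cold": 0.0, "sens_cool": None, "sens_neutral": 55.0},
]
kept = select_responses(rows, [is_complete])
print([row["id"] for row in kept])
```

Keeping each criterion as a separate function makes it straightforward to report which exclusion rules were applied in a given analysis.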