The Scales Project, a cross-national dataset on the interpretation of thermal perception scales

Thermal discomfort is one of the main triggers for occupants’ interactions with components of the built environment, such as adjusting thermostats and/or opening windows, and is strongly related to the energy use in buildings. Understanding the causes of thermal (dis-)comfort is crucial for the design and operation of any type of building. The assessment of human thermal perception through rating scales, for example in post-occupancy studies, has been applied for several decades; however, long-existing assumptions related to these rating scales have been questioned by several researchers. The aim of this study was to gain deeper knowledge of contextual influences on the interpretation of thermal perception scales and their verbal anchors by survey participants. A questionnaire was designed and subsequently applied in 21 language versions. These surveys were conducted in 57 cities in 30 countries, resulting in a dataset containing responses from 8225 participants. The database offers potential for further analysis in the areas of building design and operation, psycho-physical relationships between human perception and the built environment, and linguistic analyses.

www.nature.com/scientificdata
Following the project's objective, this dataset can be used to analyse the conceptual relationships between verbal anchors of one scale, or between two or more scales. For example, thermal sensation is most frequently assessed using the seven-point ASHRAE thermal sensation scale, with the verbal anchors "cold", "cool", "slightly cool", "neutral", "slightly warm", "warm", and "hot". A common assumption related to the application of the thermal sensation scale is that of equidistance, meaning the difference between "warm" and "hot" is equal to that between "warm" and "slightly warm". However, research has questioned this assumption 9,11-14 . Hence, the applicability of statistical methods relying on it (e.g. linear regression) needs to be questioned. Beyond reviewing the validity of this assumption, the newly developed questionnaire also makes it possible to analyse the influence of different contexts (e.g. language, climate, and season) and characteristics of individuals (e.g. sex). Further assumptions existing for other dimensions of thermal perception, here thermal comfort and thermal acceptance, can also be assessed.
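One way to examine the equidistance assumption with free-positioning data is to compare the gaps between adjacent verbal anchors. The following Python sketch is illustrative only and is not part of the project's R code: the function names, the tolerance, and the example positions are hypothetical, and positions are assumed to be given in millimetres along a 100 mm line.

```python
# Verbal anchors of the seven-point thermal sensation scale, in scale order.
ANCHORS = ["cold", "cool", "slightly cool", "neutral",
           "slightly warm", "warm", "hot"]


def adjacent_gaps(positions):
    """Return the distances between neighbouring anchors, in scale order."""
    values = [positions[name] for name in ANCHORS]
    return [right - left for left, right in zip(values, values[1:])]


def is_equidistant(positions, tolerance_mm=2.0):
    """True if all adjacent gaps agree within the (assumed) tolerance."""
    gaps = adjacent_gaps(positions)
    return max(gaps) - min(gaps) <= tolerance_mm


# Hypothetical response, not taken from the dataset.
example = {"cold": 0, "cool": 20, "slightly cool": 40, "neutral": 50,
           "slightly warm": 60, "warm": 80, "hot": 100}
print(adjacent_gaps(example))   # gaps between neighbouring anchors
print(is_equidistant(example))  # False for this example
```

In this hypothetical response the "slightly" anchors sit closer to "neutral" than the outer anchors sit to each other, so the gaps are unequal and the equidistance check fails.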
Another important assumption in the field of thermal comfort that can be reviewed with our dataset postulates that occupants are satisfied with the indoor thermal conditions if they choose one of the middle three verbal anchors of the ASHRAE thermal sensation scale ("slightly cool", "neutral", or "slightly warm"). In other words, 'neutrality' is assumed to be a desired condition. Various studies have shown individual and contextual differences that do not support this assumption 9,12,13,15-17 . In particular, researchers have repeatedly identified a discrepancy in users who declare satisfaction while feeling warm or cool 18-21 .
In addition, the data can also be used for traditional thermal comfort explorations, as 5,031 of the questionnaires include at least one measurement of indoor temperature. Furthermore, the detailed description and availability of questionnaires, available in multiple languages, can serve as a benchmark for future thermal comfort studies and permit replication in other contexts, for example libraries or offices, or with other cohorts such as office workers or older people.

Methods
Questionnaire development and pilot study. Within the framework of IEA EBC Annex 69, an international and interdisciplinary group consisting of 7 independent research groups in 6 countries (Australia, China, Germany, Korea, Sweden, and United Kingdom), the initial core group, developed the methods applied in the present paper, based on promising results from an experimental study 14 . This work included several rounds of face-to-face discussions and email conversations as well as an online survey. The details of these discussions together with the description of the methods were submitted to and registered with the Open Science Framework (OSF) as a pre-analysis plan (PAP) 22 . At the time of submission of the PAP, one application of the questionnaire had been conducted, but the corresponding questionnaires were securely stored and untouched until the moment of submission of the PAP.
The initial core group also developed the questionnaire in an English version. Mandarin (with simplified Chinese), German, Korean and Swedish translations were subsequently prepared by native language experts familiar with the concepts used in the questionnaire.
Each group piloted the initial version of the questionnaire according to the following procedure: The questionnaire was applied without further explanation to at least 7 individuals, of whom 2 had to be experienced in the field of human thermal comfort. After collecting the questionnaires, researchers discussed with the participants the length of the questionnaire, the clarity of instructions, and issues encountered when filling out the questionnaire. The observations made through these applications were discussed among the core group and reflected in the revised and final versions of the questionnaire.
Expansion of research group. After agreeing on a final version of the questionnaire and submitting the PAP, the initial core group reached out to other researchers in the field through existing networks, such as the Network for Comfort and Energy Use in Buildings (NCEUB) (http://nceub.org.uk/), and personal contacts. Additional researchers had to sign a co-author agreement and guarantee to follow the procedures prescribed for data collection (see below). If researchers used languages other than those mentioned above, they had to translate the questionnaire into their language following the same procedure as initially applied, including piloting the new language version with at least 7 individuals beforehand. The number of 7 individuals was based on observations by the initial core group during the first pilot phase, which revealed that the number of issues raised by test participants concerning the questionnaire does not increase substantially with a higher number of initial respondents. In addition, the project leader checked the questionnaire with respect to the formalities. The final consortium consisted of 56 research groups from 30 countries with a total of 94 individual researchers.

Survey participants.
Respondents were university students attending lectures, because they were expected to have only minor variations in age and activity level, supporting the focus on our targeted contextual differences. It was a requirement that the students had not participated in lectures addressing the concept of human thermal comfort. Each respondent could only participate once.
Questionnaire. The questionnaire consists of an introductory page, the two-page main part dealing with the scales, and a fourth page addressing the respondents' background and current thermal state (see all language versions in the online repository sites 23 ). The main part used a newly developed free-positioning task, where participants were asked to position the verbal anchors on a straight line (Fig. 2). In the questionnaire, scales relating to thermal sensation, thermal comfort and thermal acceptance were investigated. The first questions prompted participants to process each of these scales individually. Later questions addressed the relationship between (1) thermal sensation and thermal comfort; and (2) thermal sensation and thermal acceptance. Verbal anchors were chosen according to ISO 10551 10 . ISO 10551 and many thermal comfort studies also use a preference scale ranging, for example, from "prefer cooler" to "prefer warmer". This scale was not used for this study because pilot studies suggested that it tends to be misinterpreted by respondents, as also pointed out earlier 24 .
In addition to questions relating to thermal sensation, thermal comfort and thermal acceptance scales (Part 1 of the questionnaire), respondents were asked about their current thermal state and background (Part 2 of the questionnaire). Countries and cities of participants' origin and residence were collated to identify potential adaptations to climatic conditions at the locations where the questionnaires were administered and where participants were living beforehand. See Online-only Table 1 for a full list of variables included in the dataset and their source.
Survey procedure. In each country the questionnaire had to be distributed at least twice during two distinct seasons. The requirement for two distinct seasons was lowered for places with only minor variations in outdoor weather conditions throughout the year. Data were collected from a minimum of 100 respondents per country (a minimum of 50 per season).
The following conditions had to be followed for the distribution of the questionnaires:
• Timing: at the end of or, if necessary, during classes, when participants had been seated for at least 30 minutes
• Form: paper-pencil
• Language: local language (in case of large groups of foreigners in a country/class (e.g. Chinese in Korea), researchers were free to distribute more than one language version)
On a separate sheet, researchers noted the following additional information:
• City and country of survey
• Date, start time of distribution and end time of collection
• Number of questionnaires distributed
• Number of questionnaires received back
• Observations made during survey distribution and collection: e.g. "very high noise levels" or "at day of survey it was unnaturally warm for this time of the year"
• Classification of season: This classification was done without any predefined categories and based on the individual researchers' decision. The researchers used terms for seasons according to typical terms used at their location. Future users of this dataset who plan to include such a variable in their analysis can decide whether they follow the classifications given in the dataset or create their own classifications, e.g. based on prevailing outdoor conditions, the date of application, the Koeppen-Geiger (KG) climate class, or other information.
In addition, researchers acquired data on the outdoor conditions (outdoor temperature and humidity) from close-by weather stations (either owned by the researchers, available to researchers, or using public sources) and optionally recorded the indoor conditions during the distribution period. Despite their significant influence on thermal perception, indoor conditions were made optional for the following two reasons.
First, the main purpose of the Scales Project was not that of a classic thermal comfort study aiming at the analysis of the relationship between indoor thermal conditions (and other factors) and thermal perception assessed through thermal perception scales. The main objective of this study was to reveal participants' understanding and interpretation of verbal anchors on the scales. The assumption was that these were affected by the prevailing conditions, such as seasonal differences or immediate outdoor conditions, as well as by an individual's actual thermal state.
Second, the methodological intention was to maintain a low level of constraints for additional researchers to join this project. Given the aim of a large number of responses from a variety of climates and geographical contexts, a decision was made that the availability of measurement equipment should not be a prerequisite for joining. In addition, classical thermal comfort analysis requires the measurement of indoor air temperature, radiant temperature, relative humidity, and air velocity together with the assessment of clothing insulation level. Because the questionnaires were applied in university classrooms, air temperature and mean radiant temperature could be expected to vary largely among individual positions in a large classroom, e.g. close to or further away from windows or air outlets. Measuring thermal conditions at each participant's seat would have required a substantial amount of equipment, significantly limiting the number of participating researchers.
Therefore, a decision was made to make the recording of indoor thermal conditions optional and to focus on the assessment of participants' self-reported thermal state. Future users of this dataset should be aware of these limitations. Those planning to use recorded indoor air temperatures can still use large parts of the dataset, as 5,031 questionnaires include at least one measurement of indoor air temperature.
Data preparation. Individual research groups prepared the data from their questionnaires and submitted two files for each application: one containing the data transferred from the questionnaire and one containing the additional information for each application. The positions of the labels drawn in the free-positioning task were quantified by using a ruler to measure the distance of the positioned label from the left end of each horizontal line.
Upon reception of a dataset, the project leader validated the dataset by means of an automated script (see section Technical validation). In addition, the project leader made the following adjustments to harmonize the data and added further variables by means of an R script available online 23 (see section Code availability below):
Adjustments:
• Researchers participating in this study were advised to print the questionnaire so that each line representing a linear scale was exactly 100 mm long. This would result in measured distances of verbal anchors between 0 mm and 100 mm. However, there were several cases where the printouts were slightly distorted, i.e. shorter or longer. The real length of the lines in the printouts was reported with the information for each application. Based on this information, the measured values were adjusted for the ratio between the real length of the line in the printed version and the prescribed length of 100 mm.
• Date and time formats were harmonized.
• Season descriptions were harmonized.
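The length adjustment described in the list above amounts to a simple rescaling. The following Python sketch illustrates the idea; it is not the authors' R script, and the function and parameter names are hypothetical.

```python
PRESCRIBED_LENGTH_MM = 100.0  # intended length of each printed scale line


def rescale_position(measured_mm, printed_length_mm):
    """Map a distance measured on a distorted printout onto the 0-100 mm scale.

    measured_mm: distance of the label from the left end of the printed line.
    printed_length_mm: real length of the line as reported for the application.
    """
    if printed_length_mm <= 0:
        raise ValueError("printed line length must be positive")
    return measured_mm * PRESCRIBED_LENGTH_MM / printed_length_mm


# A label measured at 49 mm on a line printed 98 mm long sits at mid-scale.
print(rescale_position(49.0, 98.0))
```

A printout with the prescribed 100 mm line leaves the measured values unchanged.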
Additional variables:
• KG class: Koeppen-Geiger (KG) classifications were derived for the place of survey (provided by the researcher), and the places of current residence, previous residence, and origin (as stated by the participants). To obtain the KG class for each combination of city and country, the KG world map (Version March 2017) provided for R (http://koeppen-geiger.vu-wien.ac.at/present.htm) was used. This map is based on data from 1986 to 2010 and is the re-analysed KG map with a resolution of 5 arc minutes obtained using downscaling algorithms 25 . In order to obtain the KG class automatically, the latitude and longitude were first derived based on Nominatim, the search engine for OpenStreetMap data (http://nominatim.openstreetmap.org), and then converted to the pixel number of the map.
• Language type: The verbal anchors differ in their type between languages (see also 10 ). In some languages, e.g. English, two adjectives are used on the cool and warm side of the scale, respectively, e.g. "warm" and "hot". Data entries from these languages were assigned the language type "2". Other languages, e.g. Portuguese, use only one adjective on each side, e.g. "frio". These are language type "1". In addition, a few languages use either two adjectives on the hot side and one on the cold side ("3h") or vice versa ("3c").
• Adaptation level: Depending on the answers to the places of current and previous residence, this variable had the levels "low", "middle", or "high". The coding was based on the length of residency and the KG classes of the current and previous place of living. "Low" denotes that the respondent had been living less than a year in the current KG class and that the previous place had a different KG class. "Middle" was assigned to those living 1 to 3 years in the current KG class, but in a different one before. All others, i.e. those living more than 3 years in the current KG class, are "high".
• Native: A binary variable (yes/no) indicating native speakers was generated. Responses are marked as "yes" if the language of the questionnaire is equal to (one of) the language(s) spoken in the country of origin of the respondent. All other responses were marked as "no".
• Country of residence plausible: Participants reported their country of residence. This record was compared to the country of application noted by the researcher. If these two countries differed (52 responses), the new variable "Country of residence plausible" was set to "no", otherwise "yes".
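The coding rules for the adaptation level can be written as a small decision function. The Python sketch below is a hedged illustration, not the authors' R implementation: the function name and arguments are hypothetical, the length of residency is assumed to be available as a number of years (the questionnaire may record it in categories), and a missing previous KG class is assumed to mean "no change of class".

```python
def adaptation_level(years_in_current_kg, current_kg, previous_kg):
    """Classify a respondent's adaptation level as 'low', 'middle' or 'high'."""
    # Assumption: an unknown previous KG class is treated as unchanged.
    changed_class = previous_kg is not None and previous_kg != current_kg
    if changed_class and years_in_current_kg < 1:
        return "low"      # under a year in the current class, different before
    if changed_class and years_in_current_kg <= 3:
        return "middle"   # 1 to 3 years in the current class, different before
    return "high"         # all others


# Hypothetical respondents (KG class codes used only as example labels).
print(adaptation_level(0.5, "Cfb", "Aw"))   # low
print(adaptation_level(2, "Cfb", "Aw"))     # middle
print(adaptation_level(5, "Cfb", "Aw"))     # high
```

Under this reading, a respondent who moved recently but stayed within the same KG class falls under "all others" and is coded "high".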
The variables available in the dataset, including their measurement scale and levels (if applicable), are presented in Online-only Table 1.
Ethics and consent. Ethics approvals were acquired where institutional or national requirements made it necessary, such as the Institutional Review Board's approval. Informed consent was obtained from all subjects before conducting the survey.

Data records
All data records listed in this section are available from the project page 23 on OSF and can be downloaded without an OSF account. The information regarding the cities of current residence, previous residence and origin was removed, as it can serve to identify an individual participant. The R script used for pre-processing the semi-raw data is also available. The data are licensed under a CC0 1.0 Universal license.
Data structure. All datasets. File format: comma separated values file (.csv). These files contain:
• Individual raw datasets without information on cities
• survData: file containing participants' responses
• survInfo: file containing researcher observations
• Summary reports: summary report by researcher for each application
• One changelog file: record of changes made to raw datasets
• Codebooks for each raw dataset
• Combined dataset including all information from the survData and survInfo files together with additional variables.
Questionnaires. File format: Adobe pdf (.pdf). These files contain the 21 language versions used.
Data templates. File format: Microsoft Excel (.xlsx). These files contain the data templates used for data collection.
R Scripts. File format: plain text files (.R). These files contain the R scripts used for processing the raw datasets, technical validation, and calculating additional variables.

Technical Validation
Incoming datasets were rigorously checked in several steps:
(1) Visual inspection of the datasets to check whether they comply with the prescribed formats. If not, datasets were sent back to researchers.
(2) Semi-automated R script to validate
a. Spelling of country names, city names, and language codes
b. Whether the KG class can be derived automatically (otherwise the KG class was added manually)
c. Whether data points are within the expected range (e.g. relative humidity between 0 and 100%), or one of the available categories (e.g. only one of 7 age groups)
(3) Additional checks. After combining all datasets, combinations of data points were validated. For example, it can be expected that the verbal anchors of the sensation scale are drawn in the right order, i.e. from cold to hot. Responses where the verbal anchors for the sensation scale were not in this order were flagged. In addition, data points were flagged if a multivariate regression analysis detected them as outliers.
The project leader informed researchers of the flagged data points and requested an additional check of the original questionnaire. These validations of data consistency revealed that the answer patterns in questions 1a, 2a, and 3a, for 266, 307, and 84 questionnaires, respectively, did not meet the researchers' expectations. The additional checks showed that one researcher had not understood parts of the data entry instructions correctly; this researcher repeated the data entry. In the other cases, only 49, 34, and 5 values, respectively, were not correctly transferred, i.e. 81.6%, 88.9%, and 94.0% of the flagged values had been transferred correctly. If more than 10% of the data points of one application were flagged, the project leader checked the validity of data entries on a random basis using scans of original questionnaires provided by the submitting researchers.
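The range check, the order check, and the percentages quoted above can be illustrated with a few lines of code. The following Python sketch is not the authors' semi-automated R script: all function names are hypothetical, the anchor list assumes a seven-anchor language version such as English, and ties between neighbouring anchors are assumed to be acceptable.

```python
SENSATION_ORDER = ["cold", "cool", "slightly cool", "neutral",
                   "slightly warm", "warm", "hot"]


def in_expected_range(value, lower, upper):
    """Range check, e.g. relative humidity between 0 and 100 %."""
    return lower <= value <= upper


def anchors_in_order(positions):
    """True if anchor positions do not decrease from 'cold' to 'hot'."""
    values = [positions[name] for name in SENSATION_ORDER]
    return all(left <= right for left, right in zip(values, values[1:]))


def percent_correct(flagged, wrong):
    """Share of flagged values that were in fact transferred correctly."""
    return round(100 * (1 - wrong / flagged), 1)


# Counts reported for questions 1a, 2a, and 3a.
for flagged, wrong in [(266, 49), (307, 34), (84, 5)]:
    print(flagged, wrong, percent_correct(flagged, wrong))
```

Responses failing `anchors_in_order` would only be flagged for a manual check against the original questionnaire, in line with the procedure described above, rather than being removed.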

Usage Notes
For further analyses, it is recommended to use the final dataset on OSF and not the individual raw datasets, because revisions by data providers have only been made to the final dataset. In total, 9111 questionnaires were distributed and 8225 responses collected (90% response rate). Note that the dataset provided consists of all 8225 questionnaires submitted. The authors did not want to exclude any questionnaire from the dataset based on specific exclusion criteria, e.g. outlier definition or completeness of questionnaire. Future users of the dataset can make their own decisions as to which questionnaires they consider valid. Any exclusion criteria imposed by the authors would limit the freedom of future users to make such decisions.
Code availability
R software 26 was used for all steps requiring data manipulation described above in the sections Data preparation and Technical validation. Custom code was developed for these steps and is made available on the project page 23 . The custom code consists of three main scripts with additional functions loaded when running the scripts.
The first script (ScalesSurv_00_DataPreparationAndChecks.r) contains the steps for initial data screening and calculating additional variables for individual datasets.
The second script (ScalesSurv_01_LoadAllDatasets_compl_final.R) combines individual datasets and harmonizes some factor levels, e.g. changing all season descriptors "fall" to "autumn" for consistency with other descriptors.
The third script (ScalesSurv_02_Prepare_Data_final.R) is used for additional data preparations, such as additional harmonisations and adjustments to individual data points based on the results from technical validation.
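The harmonisation of factor levels performed by the second script can be illustrated with a small lookup. The Python sketch below is not the authors' R code: the mapping contains only the "fall" to "autumn" example named above, and the trimming and lower-casing of labels is an added assumption.

```python
# Only the mapping mentioned in the text; further synonyms would be added here.
SEASON_SYNONYMS = {"fall": "autumn"}


def harmonise_season(label):
    """Return a consistent season descriptor for a free-text label."""
    cleaned = label.strip().lower()  # assumption: case and spacing are ignored
    return SEASON_SYNONYMS.get(cleaned, cleaned)


print(harmonise_season("Fall"))    # autumn
print(harmonise_season("winter"))  # winter
```

Labels without an entry in the lookup are passed through unchanged, so location-specific season terms chosen by the researchers are preserved.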