Combining stochastic models of air temperature and vapour pressure for the analysis of the bioclimatic comfort through the Humidex

Several studies evidenced the importance of the knowledge of the bioclimatic comfort for improving people’s quality of life. Temperature and relative humidity are the main variables related to climatic comfort/discomfort, influencing the environmental stress in the human body. In this study, a stochastic approach is proposed for characterizing the bioclimatic conditions through the Humidex values in six sites of Calabria (southern Italy), a region often hit by heat waves in summer months. The stochastic approach is essential, because the available time series of temperature and relative humidity are not long enough and present several missing values. The model allowed the characterization of sequences of extreme values of the Humidex. Results showed different behaviours between inner and coastal stations. For example, a sequence of 20 consecutive days with maximum daily Humidex values greater than 35 has a return period ranging from 10 to 20 years for the inner stations, while it exceeds 100 years for the coastal ones. The maximum yearly Humidex values for the inner stations have a larger range (40–50) than the coastal ones (38–45), reaching higher occurrence probabilities of serious danger conditions. Besides, the different influence of temperature and relative humidity on the Humidex behaviour has been evidenced.

where e sat depends on the air temperature alone, and can be calculated through the Tetens' formula 35 : The Humidex has no specific measurement unit, anyway it can be associated to the same unit of the temperature (°C), though it is not a physical variable. As a result, the temperature perceived by human body can be easily found by using the observed values of temperature and relative humidity in Eqs. (1)-(3), and detecting the discomfort level corresponding to the Humidex value (Table 1).
Stochastic models. The stochastic approach here proposed analyses the couple temperature-partial pressure of water vapour at the daily scale, whose data have been evaluated starting from an hourly dataset. In particular, given the higher influence of the temperature in the evaluation of the Humidex, for each day the maximum hourly temperature value and the corresponding humidity data have been considered as daily value.
Since the partial pressure of the water vapour is always positive, its natural logarithm has been considered, Le(t) = ln(e(t)), in order to model couples of T-ln(e) values whose elements can be more easily treated through the linear stochastic modelling. In particular, the sequences of daily temperature, T(i), and partial pressure of (1) H = T + 5 9 (e − 10), www.nature.com/scientificreports/ water vapour, Le(i), of the i-day starting from a generic point (i = 0,1,2,…), can be generally recognized as a realization of discrete parameter stochastic processes with cyclostationarity features in a period equal to 1 year.
Deseasonalisation and Gaussianization procedures. To deal with the seasonal features of the variables, each of the T(i) and the Le(i) processes can be separately reduced to a weakly stationary standardized process, respectively termed as X(i) and Y(i), through the transformation: where µ T (i) and σ T (i) are the mean and the standard deviation functions of the T(i) process, µ Le (i) and σ Le (i) are the analogous functions of the Le(i) process. In particular, the functions µ T (i) , σ 2 T (i) , µ Le (i) and σ 2 Le (i) can be obtained by means of truncated expansion Fourier series that are composed by one or more harmonics, allowing for different values of the Fourier coefficients, generally estimated through the least squares method. In order to make the better choice about the number of harmonics, also taking into account the parsimony principle, the behaviour of the different Fourier series have to be compared at annual scale to the correspondent observed mean daily values of µ T (i) , σ 2 T (i) , µ Le (i) and σ 2 Le (i) . This can be performed through a simple test, which however requires independent data. To this aim, subsamples from the X(i) and Y(i) series can be extracted for each couple of number of harmonics hypothesized for the mean and the standard deviation functions of T(i) and Le(i) (see Eqs. 4 and 5). Finally, by splitting the subsamples into separate classes with distinct mean and variance values, the hypotheses of equality of the means and the variances of each class can be tested 33 .
The sample values of the random variables X(i) and Y(i) have a null mean value and a unit variance, but generally climatic variables, once deseasonalized, show skewness and kurtosis coefficients significantly different from the theoretical values expected for a normal variable 36 . On the other side, the Gaussianization step is needed for developing a coherent linear stochastic model. To cope with such problems, the deseasonalized variables X(i) and Y(i) can be converted into standardized normal variables, U(i) and V(i), respectively, by means of the transformation functions introduced by Johnson 37 , whose general equation applied to the random variables X(i) and Y(i) provides the following relationships: where − ∞ < η u, η v < + ∞, θ u ,θ v > 0, − ∞ < α u ,α v < + ∞, and β u ,β v > 0 are the parameters of the transformations. The functions f X (x; α u , β u ) and f Y y; α v , β v can assume one of the forms known as unbounded and bounded Johnson transformations, and log-normal law with 3 parameters, depending on the sample values of the skewness (g 1,X and g 1,Y ) and the kurtosis coefficients (g 2,X and g 2,Y ). For details on the statistical procedure used for the estimation of the parameters, see Sirangelo et al. 33 .
Analysis of the correlative structure. Generally, a marked persistence can still characterize the correlative structures of the sample data series U(i) and V(i) obtained from the Johnson transformations. These structures can be explained through FARIMA (fractionally differenced autoregressive integrated moving average) models [38][39][40] , that, with reference only to the variable U, can be described by the following expression: where B is the backward operator, � u,p u (B) is the p-order polynomial of the autoregressive component of the u(i) process, � u,q u (B) is the q-order polynomial of its mean average component, ε u (i) is a sequence of i.i.d. random variables with mean zero and standard deviation σ ε u , and d u is its fractional order of differentiation. The FARIMA(p u , d u , q u ) model adopted for the u(i) process embodies a fractional filter d u and an ARMA(p u , q u ) process, which is assumed to describe the intermediate process: www.nature.com/scientificreports/ For each assigned value of the parameter d u , the sample u(i) can be transformed into a sample u * (i), which represents a realization of the ARMA(p u , q u ) process, that can be expressed as: The Eqs. (8)- (10) can be implemented also for the v(i) process by adopting analogous parameters ( A procedure for the contextual estimation of the parameters of the two processes u(i) and v(i) can now be performed. By setting a value for each fractional filter of the FARIMA models, d u and d v , two intermediate realizations u * (i) and v * (i) of the processes ARMA(p u , q u ) and ARMA(p v , q v ), respectively, can be separately obtained [e.g., Eq. (9) for the u(i) process]. Hence, the parameters ϕ u,1 ,…,ϕ u,pu ,ψ u,1 ,…,ψ u,qu and ϕ v,1 ,…,ϕ v,pv ,ψ v,1 ,…, ψ v,qv of the two ARMA processes can be distinctly evaluated by prefixing the values of the orders (p u , q u ) and (p v , q v ) of the autoregressive and mean average components [e.g., Eq. (10) for the u * (i) process]. Ultimately, varying in a joint way the parameters (p u , q u , d u ) for the u(i) process and (p v , q v , d v ) for the v(i) process, a trial and error technique allows to obtain a correlation value between the errors ε u and ε v of the two ARMA models, ρ ε U * , ε V * , reproducing the sample cross-correlation value observed between the variables U and V, r U,V .
Study area and data. Located at the toe of the Italian peninsula, Calabria has a typically Mediterranean climate. It features sharp contrasts due to both its position within the Mediterranean Sea and its orography. Specifically, warm air currents coming from Africa affect the Ionian side, leading to high temperatures, and to short and heavy precipitation. The Tyrrhenian side, instead, is affected by western air currents, which cause milder temperatures and more intense precipitations if compared to the Ionian side. Cold and snowy winters, and fresh summers with some precipitation, are typical of the inner areas of the region 41,42 .
In this paper, the original database is composed by a set of hourly time series of air temperatures, T (°C), and relative humidity, Ur (%). In particular, six stations managed by the Multi-Risk Functional Centre of the Regional Agency for Environment Protection ( Fig. 1) have been considered. Through a preliminary exploratory analysis focused on the data quality, the hourly values which lead relative humidity to jump from less than 75-100% or viceversa have been detected and consequently discarded from the original data. The main features of the databases are presented in Table 2.

Results
parameter estimation. First, the functions µ T (i) , σ 2 T (i) , µ Le (i) and σ 2 Le (i) have been defined by means of the development of truncated Fourier series. In order to verify the adaptation of these functions, characterized by different numbers of harmonics, Fig. 2 shows the comparison at annual scale of the observed mean daily values and the corresponding functions of µ T (i) and σ 2 T (i) for the Cosenza station, and of µ Le (i) and σ 2 Le (i) to for the Torano Scalo station.
Due to parsimony criteria, for each function, the minimum number of harmonics to be used in the truncated Fourier series has been evaluated as the one which allows not to reject the null hypothesis with a significance level of 5%. As an example, with reference to the Cosenza station, Table 3 shows some trials of the test used for the identification of the minimum number of harmonics of the Fourier expansions. As can be seen, for the functions σ 2 T (i) , µ Le (i) and σ 2 Le (i) the hypothesis of one harmonic cannot be rejected for all the classes. As regards the function µ T (i) , the hypothesis of one harmonic is rejected for the 4th and the 7th classes, thus a further harmonic is needed.
Similarly to the Cosenza station, the number of the harmonics has been estimated for each station. The results, indicated in Table 4, show that in order to remove the periodicity in the mean function µ T (i) , two harmonics are needed for all the stations. For the functions σ 2 T (i) , µ Le (i) and σ 2 Le (i) the number of harmonics varies between 1 and 2. In particular, 2 harmonics are required for Paola and Torano Scalo as regards σ 2 T (i) , for Castrovillari, Reggio Calabria and Torano Scalo with respect to µ Le (i) , and for Castrovillari when the function The gaussianisation procedure, performed through the Johnson transformation, has been applied to the deseasonalised data series X(i) and Y(i). The results show that the Gaussian functions U(i) and V(i) required the use of the unbounded version for all the stations, with the exception of the U function for Torano Scalo, which required the bounded version. Figure 3 shows the comparisons between theoretical and observed quantiles of the standardized Gaussian laws for the variables X and U, and Y and V evaluated for the Cosenza and Paola stations. Generally, better performances appear for the function U, while for V some discordances between theoretical and sample values have been observed, especially for the extreme ones. Figure 4 presents the autocorrelograms of U and V evaluated for each station considering a maximum lag value equals to 30. For the variable U, a similar behaviour for all the station has been detected. In particular, the autocorrelation coefficients rapidly decrease, reaching values lower than 0.1 after 10 days. For the variable V the persistency is stronger, especially for the Cosenza and Reggio Calabria stations, for which the autocorrelation coefficients are still about 0.2 after 30 days. The autocorrelation coefficients of the remaining stations decrease faster, achieving a value of about 0.1 just after 10 days.
The identification of the FARIMA(p, d, q) process allows to describe the correlative structure of the Gaussian series. The results of the process are summarised in Table 5, in which the FARIMA parameters of both U and V   www.nature.com/scientificreports/ have been generated, also taking into account leap years. The paired values of these variables have been used to evaluate a corresponding number of annual series of daily Humidex values. In Fig. 6, the average values of the maximum annual Humidex evaluated from the observed data for all the stations are superimposed to the boxplots of the maximum annual values of Humidex obtained from the synthetic series. As a result, all the sample average values fall within the range of the percentiles 25-75%, with the exception of the value observed for the Castrovillari station, which is higher than the 75% percentile. Figure 7 shows for the Cosenza station the return period of the maximum yearly values of consecutive days with maximum daily Humidex values greater than prefixed thresholds. Obviously, for a fixed return period, the sequences of consecutive days shorten for increasing threshold values. In particular, for a return period of 10 years the sequences vary between 2 and 20 days for a Humidex threshold of 45 and 35, respectively. At the same time, considering a return period of 100 years, the sequences vary from 4 to 35 days for a Humidex threshold of 45 and 35, respectively. Figure 8 is similar to Fig. 7 but the return periods have been estimated for all the stations and only for a Humidex threshold value equal to 35. Results show that, for their behaviour, the six stations can be divided into two groups: the first one includes the inner stations (Cosenza, Torano Scalo and Castrovillari), while the second one contains the stations near to the sea (Satriano Marina, Reggio Calabria and Paola), thus evidencing the possible influence of sea proximity. As an example, the occurrence of sequence of consecutive 20 days has a return www.nature.com/scientificreports/     www.nature.com/scientificreports/  www.nature.com/scientificreports/ period ranging from 10 years (Cosenza) to about 20 years (Castrovillari). Conversely, for the stations nearer to the sea this occurrence has return periods much higher than 100 years. It is interesting to estimate in which day of the year the highest occurrence probability to start a sequence of consecutive days with the daily maximum Humidex value higher than a prefixed threshold occurs. As an example, for the Cosenza and Satriano Marina stations (Fig. 9), results show that the lower the Humidex threshold is, later in the year the highest value of probability is registered. Moreover, the diagrams show also a tendency to a decrease of the highest probability: the higher the Humidex threshold is, the lower the occurrence probability. In fact, for the Cosenza station the highest probabilities during a year occur from the 205th (Humidex = 45; P max = 0.017) to the 235th day (Humidex = 30; P max = 0.019) while for the Satriano Marina station little higher probabilities than the Cosenza station have been observed, e.g. for Humidex = 30 the highest probability is 0.021.
In order to detect the influence of the temperature value in the evaluation of the Humidex, it is interesting to analyse the occurrence probability of temperatures for fixed values of the maximum daily Humidex. As an example, for the Cosenza and the Paola stations, the results of this analysis show a tendency to a decrease of the maximum probabilities when the Humidex and temperature values increase (Fig. 10). In particular, for Cosenza, Table 5. Estimated values of the parameters of the FARIMA processes applied to all the stations.

Station
Variable FARIMAdφ 1ψ1σερεU * ,εV * r U,V   Figure 11 shows diagrams analogous to those of Fig. 10, but referred to Ur, for the Cosenza and Castrovillari stations. Differently from the results obtained for the temperature, the curves are very close each other with the highest probability values falling within a short range of Ur values. Moreover, an opposite behaviour with respect to the temperature has been observed, with the maximum values of the probabilities that tend to fall when Humidex decreases. Specifically, for the Cosenza station, the highest probabilities range between 0.037 (Humidex = 45) for Ur = 35% to 0.029 (Humidex = 30) for Ur = 45%. For the Castrovillari station, higher maximum probabilities values and lower corresponding Ur values than the Cosenza stations have been observed: these data vary from 0.047 (Humidex = 40) for Ur = 25% to 0.037 (Humidex = 32.5) for Ur = 35%.
A significant analysis can be performed to evaluate the excursion of the maximum daily Humidex values from one day to another. Figure 12 shows the maximum yearly increase of this excursion in 1-7 consecutive days for different probability values (90%, 95%, and 99%), for the Cosenza and Reggio Calabria stations. For Cosenza, this increase of Humidex values ranges between 12.9 (P = 90%) and 15.3 (P = 99%) for a lag of 1 day and can reach values between 20.2 (90%) and 24.2 (99%) for a lag of 7 days. For Reggio Calabria, the increases are lower than the Cosenza station. In fact, for a lag equal to 1 day, the Humidex values can span from 10.1 (90%) to 12.7 (99%), and, for a lag of 7 days, can range between 14.6 (90%) and 18 (99%).  In order to evidence the combined influence of the variables T and Ur on the Humidex behaviour, the contour lines corresponding to different values of probability of the maximum yearly values of Humidex have been evaluated through the synthetic series. Figure 14 shows the results for three different probability values, i.e. 50%, 75%, and 95%, for the Cosenza and Reggio Calabria stations. For the Cosenza station the greatest part of the maximum yearly Humidex values ranges from 40 to 50, while for Reggio Calabria the curves are very close to theoretical 40-curve and ranges from about 38 and 45, thus evidencing higher probabilities of reaching serious danger conditions for Cosenza.
With reference only to the 95% probability value, the different behaviour of the maximum yearly Humidex values among all the stations is shown in Fig. 15. As a result, a marked difference has been detected between the

Discussion and conclusions
Human-perceived equivalent temperature under extremely warm periods depends on the humidity conditions. Specifically, the relative humidity can compromise the body's evaporative cooling mechanism, thus inducing a lower degree of wellness 43 . The occurrence of high value of an index such as the Humidex, which is based on     www.nature.com/scientificreports/ temperature and humidity, is thus paramount for its impact on human health [44][45][46][47] . In fact, extreme heat conditions are characterized by values of Humidex beyond specified thresholds, which also take into account peculiar humidity conditions increasing the impact of temperature on people's health 48 .
Usually, the quality of the time series of temperature and relative humidity is not good enough for statistical purposes, presenting missing values or too low years of observation. To overcome this problem, a stochastic model has been proposed and applied to six stations of Calabria, an Italian region often coping with summer periods characterized by very high temperature 41 . The model required the use of FARIMA processes to describe the correlative structures of temperature and relative humidity series, once duly deseasonalized and normalized. The goodness of the stochastic model has been assessed from the comparison of extreme observed Humidex values to the simulated Humidex series obtained by synthetic generation through a Monte Carlo procedure.
Focusing on the results provided by the application of the stochastic model, the maximum yearly values of consecutive days with maximum daily Humidex value greater than prefixed thresholds, for a fixed return period, shorten for increasing threshold values. In this way, a different behaviour has been recognized between inner stations and stations located near the coast. In fact, return periods corresponding to the same sequences of consecutive days with prefixed Humidex values are much higher for the coastal stations than for the interior ones. This evidences the possible influence of sea proximity, which appears to relieve uncomfortable conditions. This result confirms the outcomes of Cannistraro et al. 49 , which revealed better comfort conditions in coastal cities subject to constant ventilation than in inner areas, using different approach and index and comparing the data of various stations.
Moreover, the statistical investigation of the synthetic series evidenced the different influence of temperature and relative humidity on the Humidex behaviour. In fact, confirming what is widely stated in the literature 50 , the analyses show that the discomfort conditions are significantly related to air temperature, while the impacts of humidity are of less importance. More specifically, our results also confirm that the relative humidity contributes to the most dangerous discomfort conditions only within a narrow range of percentile values, due to a lower influence of relative humidity with higher temperature values 51 .
The consequences of the analyses confirm the importance of taking account of the Humidex, in order to detect promptly the occurrence of potential, unusual heat-related discomfort conditions. In other terms, the results obtained for specific communities could provide help to local health agencies in inferring about the possible occurrence of discomfort conditions.