The role of city size and urban form in the surface urban heat island

Urban climate is determined by a variety of factors, whose knowledge can help to attenuate heat stress in the context of ongoing urbanization and climate change. We study the influence of city size and urban form on the Urban Heat Island (UHI) phenomenon in Europe and find a complex interplay between UHI intensity and city size, fractality, and anisometry. Due to correlations among these urban factors, interactions in the multi-linear regression need to be taken into account. We find that among the largest 5,000 cities, the UHI intensity increases with the logarithm of the city size and with the fractal dimension, but decreases with the logarithm of the anisometry. Typically, the size has the strongest influence, followed by the compactness, and the smallest is the influence of the degree to which the cities stretch. Accordingly, from the point of view of UHI alleviation, small, disperse, and stretched cities are preferable. However, such recommendations need to be balanced against e.g. positive agglomeration effects of large cities. Therefore, trade-offs must be made regarding local and global aims.

Urban Heat Island (UHI) is a commonly observed phenomenon worldwide, describing an elevated temperature of urban areas compared to their surroundings. Understanding UHI is of great relevance in the current discussion on sustainable urban design. In particular, heat waves have been observed more persistent and more frequent in the last decades 1,2 , and are projected to intensify in the future 3 . Furthermore, heat waves are shown to pose an added stress on cities 4 , raising serious concerns regarding general well-being and potential threats to human health, which in turn demands effective adaptation measures to alleviate the UHI.
The UHI effect arises from the anthropogenic modification of natural landscapes and the consequent atmospheric and thermophysical changes in the urban boundary layer 5 . The formation of UHI can be mainly ascribed to an increased absorption and trapping of solar radiation in built-up urban fabrics associated with high thermal admittance of construction materials and the urban canyon structure. Anthropogenic heat release from transport and buildings in the purpose of heating and air conditioning further exacerbate the UHI. Other factors, such as population density, built-up density, and vegetation fractions can also directly or indirectly contribute to the formation of UHI. However, studies based on different spatial scales, more precisely, the vertical scales (urban screen, -canopy, -boundary levels) and the horizontal scales (mirco-, local, regional scales) may lead to varying results on the individual contributions of each factor 6 .
UHI studies can be roughly categorized in two domains regarding the number of investigated cities. On the one hand, case study work focuses on one or a few cities and assesses the UHI characteristics with a high level of detail. On the other hand, ensemble or cross-sectional studies investigate a large number of cities aiming at achieving an understanding of common characteristics or fundamental differences arising among them. The availability of remote sensing surface skin temperature with global coverage has given rise to a number of systematic empirical studies of the latter type. In the following we focus on surface skin temperature and only mention the type explicitly (i.e. surface or 2 m) when necessary.
A global UHI study across more than 400 big cities 7 revealed that the average annual intensity during daytime is higher than during nighttime and that the daytime intensity correlates negatively with the difference of vegetation cover and activity between urban and suburban areas. Similar diurnal patterns were found in an analysis of 32 Chinese cities 8 . A follow-up work 9 based on the same Chinese cities suggested an exponential decay of the UHI along urban-rural gradients, the rate and extent exhibit site-specific diurnal and seasonal variations. In Europe, the UHI intensity of urban agglomerations exhibits a size dependency, and can typically reach a maximum of approx. 3 °C in summer and 0.5 °C in winter 10 .
A study 11 based on 65 cities in North America found that the annual mean daytime and nighttime UHI are positively correlated with the precipitation and the logarithm of population, respectively. It was suggested that the enhanced aerodynamic roughness of densely vegetated rural areas in the humid climate zone (with abundant precipitation) leads to less efficient convection, which hampered the heat transfer from urban to rural areas and resulted in an intensified UHI. This outcome at first glance seems to differ from previous studies 12, 13 based on air temperature, stating that a deficit of precipitation in the summer leads to stronger rural warming than in urban areas, i.e. a diminished UHI. However, there are substantial differences between these studies besides the data type. The positive correlation in ref. 11 is regressed out of annual mean data (UHI intensity against precipitation) among scores of cities (cross-sectional), whereas the studies 12,13 are based on data of individual case study cities across time (temporal).
Another global study comprehensively assessed the dependence of UHI on various urban intrinsic factors, regardless of geographic and climatic factors 14 . Night light, urban area and vegetation are, inter alia, dominant ones accounting for the UHI or urban heat sinks, whereas population and urban structure were found to be of less relevance.
These studies have in common that land cover data is combined with remotely sensed surface skin temperature, i.e. urban land cover is used to define the physical extent of urban areas enabling to systematically extract the temperatures inside the cities and in their rural surroundings. To date, this methodology is an established standard protocol for robustly benchmarking the thermal stress across cities, and for deciphering statistical features of the UHI associated with biophysical and socio-economical indicators. These merits can scarcely be promised by the conventional case study work.
In order to gain an understanding of the UHI phenomenon and its relevance in terms of urban design, insights about the influencing factors are necessary. On the one hand, the UHI intensity of a city is subject to the empirical metrics and indicators used for quantifying the phenomenon 15 . On the other hand, while analyzing its physical essence, it is determined by a variety of factors which can roughly be categorized into (i) external and (ii) intrinsic ones 16 . External factors include location (lat./lon.) 17 , background climate (in particular wind) 10,18 , proximity to water courses (associated with sea-or lake-breeze circulation), etc., whereas intrinsic ones depict city-specific features (e.g. city size, land cover fractions, anthropogenic heat releases) which, despite being outcomes of long-run urbanization, can be regulated and reshaped.
How to alleviate the UHI effect is another issue of considerable interest. Local interventions (e.g. parks of various sizes, green and cool roofs) are shown to have a limited influence on local climate. The cooling distance, i.e. the maximum distance within which the cooling effect of such green spaces can still be detected, ranges from tens to hundreds meters 19,20 . Possibilities to influence intrinsic properties -including the overall urban formare very limited in cities of developed countries due to small growth rates or even negative ones. In contrast, dramatic urbanization is taking place in developing countries, so that insights about how the urban form affects UHI intensities could provide guidance for the large scale planning of cities, where there is a great demand of new infrastructure.
Thus, we search for traceable signatures between features of urban form and UHI intensity. We consider three features of urban form which break down the spatial shape of the urban extent into single values. First, city size, since it has been shown previously that larger cities tend to have higher UHI intensities. Second, the fractal dimension which represents an established measure to characterize the compactness of a city. Third, anisometry which we revealed as an important measure of city shape, quantifies to which extent a city's length is greater than its width. Examples include cities extending along valleys, rivers, country borders, etc. As we show below, interactions among the three indicators need to be taken into account which implies that the influence of each of them on the UHI intensity cannot be separated.

Results
Following the methodology employed in previous studies 10, 21 , we combine land cover data with remote sensing temperature data and define the surface UHI intensity ΔT as the difference between the average temperature within the considered urban cluster and the average temperature within an equal area belt around it (see Methods Section for details). In contrast to ref. 10, here we consider the 5,000 largest urban clusters in Europe and average the summer months (June, July, August) daytime observations from 2006 to 2013. In the following we investigate how the UHI intensity ΔT depends on (i) the size, (ii) the fractality, and (iii) the anisometry of the city clusters. Therefore, we need to measure the three quantities for all considered city clusters.
(i) The city size S C is simply given by the number of cells constituting the city clusters multiplied by the area of each cell, 6.25 × 10 −2 km 2 . Due to Zipf 's law for cities [22][23][24][25][26] there are many small cities and few large ones so that we use the logarithm of city size, ln S C , in order to reduce the skewness. (ii) We compute the fractal dimension using the box counting method, assuming ∼ − n r D f , where n is the number of (square) boxes of side length r necessary to cover the structure, see Methods Section. In Fig. 1 i.e. the ratio of major axis to its minor axis, see Methods Section. It is a measure for the extent to which the city deviates from an approximate circular shape (A → 1), i.e. to which extent it's length is greater than its width (A > 1). Figure 1(a-c) also illustrates the anisometry by means of ellipses. As can be seen, the stretched shape of Belgrade is reflected in a higher value of A. Again, we use the logarithm, i.e. ln A. Figure 2 consists of scatter-plots where the daytime UHI intensity is plotted separately vs. the three quantitiesbinned values and regressions are included for illustrative purposes. In Fig. 2(a), ΔT is displayed as a function of the city size. As expected and consistent with previous work 10,11,14,16,18,27 , the UHI intensity increases with city size and doubling the city size leads to approximately 0.4 °C additional UHI intensity. Studies of UHI intensities in relation to population size go back to Oke 28 , who reported both a logarithmic and a power-law (exponent ≈ 1/4) relation between UHI intensity and population. In Fig. 2 we also include quantile regressions and find that there is heteroscedasticity in the form of stronger spreading of ΔT among large cities.
In Fig. 2(b) the influence of the fractal dimension on the UHI intensity is shown. In the range where most cities are found, ΔT typically increases by roughly 2 °C with increasing D f . This finding suggests that more compact cities have more pronounced UHI intensities. The literature on UHI intensity and fractality is very limited. The fractal analysis of surface skin temperature related to vegetation abundance 29 cannot be easily compared with our results since here we study the urban cluster which leads to the UHI. In a more general sense, the influence of urban form has been studied for the example of Beijing metropolitan area 30 , and it has been reported that, compared to a compact city, a dispersed one is efficient in reducing mean urban heat island intensity, but affects the thermal feedback at the regional scale. Last, in Fig. 2(c) ΔT is plotted as a function of the anisometry. As one would expect from intuition, the UHI intensity decreases with increasing anisometry, by approximately 1.5 °C in the shown range. Thus, more circular cities seem to exhibit elevated UHI intensities. The above mentioned heteroscedasticity is also observed in Fig. 2(b) and (c), and the spreading of ΔT is wider among cities with larger D f and smaller ln A.
Correlations among the three quantities ln S C , D f , and ln A require a more complex analysis. While anisometry and cluster size are essentially uncorrelated with a Pearson correlation coefficient ρ of −0.05 [see supplementary  Fig. S2(b) and (c), respectively. On the one hand, cities with lower fractal dimension tend to exhibit higher anisometry, i.e. more compact cities also tend to be more circular. This correlation is the strongest among the three variables considered. On the other hand, larger cities tend to exhibit higher fractal dimensions, i.e. they are more compact. It has been reported previously 31-33 that city size and fractal dimension are positively correlated, i.e. larger cities in terms of population or urbanized area have higher fractal dimensions.
Thus, we employ multi-linear regression in order to characterize the complex interplay between the UHI intensity and the three factors. Linearity, however, still represents an approximation -but a reasonable one -as . The red ellipse is the equivalent ellipse of the urban cluster, determined by following 50 . The lower row, panels (d-f), depicts the number of boxes necessary to cover the clusters as a function of the box size in double-logarithmic scale for the corresponding cities. The fractal dimension is obtained from the slope of linear regressions (straight lines). As can be seen from these examples, clusters vary in size, fractal dimension, and anisometry. Esri ArcMap 10.4 (www.esri.com/software/arcgis) and MATLAB R2015b (www.mathworks.com/ products/matlab) were used to create the maps.
will be discussed with the following example. The correlations between UHI intensity and city size S C have been fitted according to a log-logistic function 10 where the parameters a, b, c determine the saturation value, the inflection point, and the steepness, respectively. However, the sigmoid-shape of Equation (1) takes only effect far from the inflection point, i.e.
] the accuracy of ΔT is limited by the resolution of land surface temperature data (≈1 × 1 km 2 ). In order to have a reasonable estimate, both cluster and belt temperature should be based at least on a few gridded values. (ii) Due to Zipf 's law for cities (see above), for large clusters [ → ∞ S ln( ) C ] the sample of cities reduces considerably. As a consequence, there are simply too few data points carrying information on whether or not ΔT(S C ) saturates. Thus, it is justified to expand Equation (1) in the mid-range. Since around the inflection point the logistic function = + − F x x ( ) 1/(1 exp( )) can be approximated 34 (1) can be approximated by the logarithmic function which corresponds to a linear polynomial of S ln C . After having motivated the linear approximation, we finally apply the multi-linear regression. In the absence of correlations among the intrinsic urban factors a simple linear combination according to ΔT = a + b , where a, …, d are parameters, would be sufficient. Due to the correlations, all interaction terms need to be taken into account. By interaction we mean the statistical correlations between two independent variables when multilinearity occurs. We performed a stepwise linear regression with interactions (see Methods Section) to all 5,000 considered city clusters and obtained  Following the above example, Fig. 3 illustrates the linear dependencies of the UHI intensity on one urban factor when the other two are kept constant. Therefore, we fix two of the factors, simplify Equation (3) to a linear form depending only on the third factor, and extract slope and intercept. Then we rasterize the two fixed factors, repeat the procedure, and display the slope and intercept as shown in Fig. 3.
In Fig. 3(a) we observe that ΔT(D f ) is steepest for large cities with small anisometry and less steep for small cities with large anisometry. The diagonal stripes are due to the interactions of D f with ln S C and ln A. In Fig. 3(b), ∆T S (ln ) C has its largest slope for compact cities, i.e. large D f , which only occurs in combination with small anisometry. In this case, the slope only changes along D f (horizontal stripes) -interactions with ln A have not been found. Lastly, the slope of ΔT(ln A) is mostly negative [Fig. 3(c)], with the steepest negative slopes observed for cities with a large fractal dimension. The vertical stripes illustrate the interactions with D f , i.e. there would be no stripes in the absence of interactions.
At this point we still do not know which of the three factors has the strongest influence. The reason is that due to different ranges (e.g. D f is roughly within 1.2 and 1.8, while A ln is roughly in the range between 0 and 2.5), the parameters obtained in Equation (3) are not comparable. Thus, we repeat the multi-linear regression, but normalize the data previously to zero mean and unit standard deviation, e.g. A ln . However, due to the above discussed interaction, the ranking is only valid for typical cities in our sample and including further small cities could affect the overall outcome. Moreover, we perform a rather basic normalization and we cannot exclude that the skewed distributions could affect the resulting parameters.
Since it has been argued that the UHI intensity based on daytime LST are overestimated 35 , we also included an analysis based on nighttime LST in the Supplementary Information. Due to overall weaker intensities, the dependencies on the city size, fractality, and anisometry are less pronounced. Nevertheless, the relative contributions are consistent with the daytime results.
Certainly, the regression Equations (3) and (4)  The three examples suggest that the predictive power of the global regression model, i.e. based on the full sample of cities, is rather limited, which could be due to regional inhomogeneities. Therefore, we adopted two different sampling strategies to assess the robustness of the results against the regional inhomogeneities. We first divided the study area into 9 zones of similar number of cities and applied the multi-linear regression in Equation (4), independently. As shown in Fig. 4(a), city size dominates in most cases, followed by fractal dimension, whereas in South Europe anisometry has a larger impact on the UHI. Second, we created subsamples of varying number of cities with and without replacement and applied stepwise regression to the subsamples. Figure 4(b) and (c) reveal that as the sample size increases, our model in Equation (4) tends to appear more frequently. We conclude that our model has a good global performance, while at local scale the model should be used with certain precaution.

Discussion
In summary, we explore the recently established methodology, which systematically combines urban land cover and remote sensing surface skin temperature, in order to characterize the UHI intensities of a vast number of cities. Studying the largest 5,000 European urban agglomerations, we find a complex interplay among the correlations with intrinsic urban factors. Among the three considered large scale urban features, typically city size has the strongest influence, followed by the fractality -and the anisometry presents the weakest influence. That is, in general, the larger, the more compact (high fractal dimension), and the less stretched (small anisometry) the cities are, the stronger their UHI intensity tends to be.
Our empirical findings on the dependence of the UHI intensity on the city size and form could be attributed to the scale effect of convection 36  0. For a fixed fractal dimension, as the urban area increases, the heat convection (quantified by the convection heat transfer coefficient h) diminishes, resulting in a higher surface temperature. Analogously, for a fixed surface area S C , an increasing fractal dimension D f weakens the convection and leads to a higher surface temperature.
Our results can be relevant for urban policy and planning in the context of global warming and local UHI adaptation.

Avoid large cities.
How the UHI intensity depends on city size is in particular relevant in world regions of ongoing urbanization. Policies could be developed for incentives to also populate medium size and small cities, i.e. thereby to control the exponent of Zipf 's law for cities [22][23][24][25][26] , which relates the relative frequency of large and small cities.

Avoid compact cities.
More compact urban clusters have larger fractal dimensions 37 . Qualitatively, it is comprehensible that urban sprawl and polycentric form lead to smaller fractal dimensions. Urban planning can influence these features of urban form.  Figure 4. Robustness of multi-linear regression under spatial and random sampling. In panel (a) we divide the study area into 9 partitions (blue rectangles) of similar size and separately apply linear regression according to Equation (4) to the normalized quantities. The pie-charts depict the resulting coefficients (for negative values the absolute value has been taken). The area of the pie-charts is proportional to the number of cities in the partition. Only statistically significant coefficients (at 95% level) are labeled. City size dominates in most cases, followed by fractal dimension, whereas in the south the anisometry becomes important. In panels (b) and (c) the results of stepwise regression on randomly sampled cities (500 repetitions) without and with replacement, respectively, are displayed. In both cases, as the sample size increases, Model A i.e. Equation (4) (6)], respectively. MATLAB R2015b was used to create the map (www.mathworks.com/products/matlab).
It is plausible that stretched cities have lower UHI intensities since the distances to the city border are shorter, in favor of enhanced atmospheric convection. Thus, from an UHI alleviation perspective, cities extending along natural or artificial topographic lines are preferable over those developing mostly around their center.
Certainly, such recommendations need to be opposed to other advantages and disadvantages. In particular, keeping cities small and the consequent ameliorated urban climate should be balanced against positive agglomeration effects of large cities such as shorter trip lengths 38 . Scattered and anisometric cities come along with more traffic, which has negative side effects, including increased anthropogenic heat and CO 2 emissions. Thus, trade-offs on the local scale need to be made, when implementing urban factors. Moreover, from a global point of view it has been argued that compact cities are preferred because of their potentials in reducing energy consumption and CO 2 emissions 39 . However, as mentioned above such recommendations should also be adjusted according to regional specificities (see Fig. 4).
Our work adds to previously gained understanding on how compact urban form increases the UHI intensity and on the problems of transferring such insights into spatial planning 40,41 . Thus, our results also contribute to the ongoing discussion on the effectiveness of urban forms -in particular, single-centric (compact city) vs. poly-centric city (dispersed city) -as a means for alleviating heat islands as a negative impact of urbanization 30 .
This study is also an example on how concepts from fractal geometry are of use in city science. For three decades, it has been argued that cities are fractal in form [42][43][44] , and the relation between fractal structures and urban areas has received widespread attention 32 . The fractal dimension of urban agglomerations is a measure of their compactness. Thus, in this study we contribute to the view on cities from a fractals perspective and postulate that the correlations between cluster size and fractal dimension are a manifestation of multi-fractality at the regional scale.
Last but not least, our work opens a perspective for future studies in various directions. First, since here we solely investigate surface skin temperature, an apparent question to be raised is to what extent similar correlations of UHI intensities with urban form also appear considering air temperature. Due to data limitations this can hardly be verified empirically, so that numerical modeling 45 could represent an alternative. Second, we focus on large scale features of urban form, i.e. intrinsic factors. It could be interesting to test whether the consideration of external factors, foremost wind, would improve the characterization of the influence of intrinsic factors on the UHI intensities. Third, we study ensemble data, i.e. quantify correlations among the sample of cities, and do not consider temporal dynamics. It is important to verify if our findings also hold for an individual city under growth scenarios reflecting the features of urban form 30 . Lastly, can numerical models reproduce our findings or lead at all to comparable results 21 ?

Methods
Datasets. CORINE urban morphological zones (UMZ) 2006 data at 250 m spatial resolution are used for delineating urban areas in Europe. For countries where raster UMZ data are not available, e.g. UK and Switzerland, we generated the UMZ data based on the CORINE Land Cover 2006 data following the method described in ref. 46. The processed UMZ data, containing binary urban/non-urban information for 38 European countries, are projected to the sinusoidal coordinate system which is consistent with that used in the LST data.
We used the MODIS Aqua 8-day composite (MYD11A2, Version 5) LST products for the summer months (June-July-August, JJA) from 2006 to 2013, in total 104 observations across 8 years. The data are at 926.6 m ≈ 1 km spatial resolution, and are measured at 13:30 (daytime) and 01:30 (nighttime) local solar time. In this study, we focus on the daytime LST, because the daytime surface UHI is more pronounced than that of the nighttime. Complementary results based on nighttime LST can be found in supplementary. According to the pixel-wise LST error flag inherent in MYD11A2, we disregard pixels with LST error >2 K. We also omitted pixels with view zenith angle >35° to minimize the anisotropy bias caused by the view angle, while guaranteeing a sufficient data quantity for further analyses 47 . Based on the processed LST data, multi-annual summer mean LST is calculated.
Urban heat island (UHI) intensity. We followed the methodology employed in previous studies 7, 10, 21 to calculate the surface UHI intensity. Cities are defined via the City Clustering Algorithm (CCA) 24, 26, 48 based on the UMZ data, with a clustering parameter l = 250 m, being in accordance with the spatial resolution. According to CCA, any pair of urban cells with a distance no larger than l are assigned to the same urban cluster. We defined an equal-area belt region around an identified city cluster as its rural or suburban reference, devoid of water courses and urban pixels of other clusters. The surface UHI intensity of an urban cluster is defined as the difference between average urban and rural temperature 10, 21 , i.e. ∆ = − T T T C B . In contrast to previous studies 10, 21 , here we base our analysis on temporally aggregated temperature data, namely multi-annual summer mean LST. Moreover, in contrast to ref. 10, here we disregard small city clusters and consider only the largest 5,000 clusters corresponding to cluster areas S C > 6.1 km 2 .
Fractal Dimension. We used the box-counting algorithm to compute the fractal dimension D f for each urban cluster 49 . Therefore, we count the number of boxes N of size r × r necessary to fully cover the considered urban cluster. Assuming ∼ − N r r ( ) D f , the linear regression to N ln( ) vs. ln(r) provided the slope which corresponds to the fractal dimension D f . The conventional method is to initialize r to the minimum cell size and stepwise double it until = N r ( ) 1. It turned out that this 2-based exponential sampling method led to a discreteness artefact and denser sampling was more robust. Thus, we adopted a denser sampling strategy by incrementing r by 1 and omitting any point [r, N(x)] if the count = − N r N r ( ) ( 1). Sampling can be seen in Fig. 1(d-f).
Scientific RepoRts | 7: 4791 | DOI:10.1038/s41598-017-04242-2 Anisometry. We computed the anisometry (A) of city clusters similar to the method 50 . We defined the anisometry of a city cluster as the ratio of the city cluster's maximum Feret's diameter to its minimum Feret's diameter. The Feret's diameter is the distance between two parallels tangent to an object along a certain direction. In order to illustrate the relative stretch of clusters, we drew the equivalent ellipse of a city cluster by assigning the maximum and minimum Feret's diameters to the axes of the ellipse (see. Figure 1(a-c). The ellipse centers at the centroid of a city cluster. Analog to the cluster size, we use the logarithms of anisometry (ln A) throughout the study to reduce the skewness.
Quantile regression. Quantile regression 51, 52 is a method for estimating the impact of observed covariates on quantiles of the response variable. In contrast to ordinary least squares regression, quantile regression is particular applicable for the model with heterogeneous variance, e.g. in the presence of heteroscedasticity, where the former approach usually misestimates the real relationship or fails to detect the nonzero changes 53 . Quantile regression finds wide application in disciplines, where data are seldom normally distributed, e.g. ecology 53 , climatology 54, 55 , etc. Assuming a regression function β ε = + Y X . The estimators for the quantile τ, i.e. β τ are obtained by minimizing the sum of asymmetrically weighted absolute residuals. The weights are given by the function ρ τ 52 .
Multi-linear regression. We employed the general multi-linear model to quantify the relation between the UHI intensity ΔT and predictive variables -the logarithm of city size S ln C , fractal dimension D f , and the logarithm of anisometry A ln . We use the general ansatz where a, …., h are eight parameters, and e.g. D f ln S C is the interaction between fractal dimension and city size. We used the forward and backward stepwise regression to determine the variables in the multi-linear model. The Bayesian Information Criterion was used to add and remove terms in the model, and to avoid data-overfitting.