An experimentally validated fading model for THz wireless systems

As the wireless world moves towards the sixth generation (6G) era, the demand for supporting bandwidth-hungry applications in ultra-dense deployments becomes increasingly imperative. Driven by this requirement, both the research and development communities have turned their attention to the terahertz (THz) band, where more than 20 GHz of contiguous bandwidth can be exploited. As a result, novel wireless system and network architectures have been reported that promise excellent reliability, massive connectivity, and high data rates. To assess their feasibility and efficiency, it is necessary to develop stochastic channel models that account for small-scale fading. However, to the best of our knowledge, only initial steps have been taken so far. Motivated by this, this contribution takes a new look at fading in THz wireless systems, based on three sets of experimental measurements. In more detail, measurements conducted in a shopping mall, an airport check-in area, and the entrance hall of a university during different time periods are used to accurately model the fading distribution. 
Interestingly, our analysis shows that conventional distributions, such as Rayleigh, Rice, and Nakagami-m, lack fitting accuracy, whereas the more general, yet tractable, α–μ distribution provides an almost excellent fit. To quantify the fitting accuracy, we used two well-defined and widely accepted tests, namely the Kolmogorov–Smirnov and the Kullback–Leibler tests. By accurately modeling the THz wireless channel, this work provides the fundamental tools for developing theoretical and optimization frameworks for such systems and networks.

Terahertz (THz) wireless communications have been identified as a promising enabler for sixth generation (6G) wireless technologies, because the THz band offers a contiguous bandwidth of more than 20 GHz 1,2 . The THz band (0.1-10 THz) is envisioned to be utilized in the deployment of both indoor and outdoor wireless systems. In recent years, both academia and industry have focused their attention on the development of outdoor and especially indoor THz wireless systems [2][3][4] . In more detail, regarding indoor THz wireless communications, major standardization bodies are in the process of publishing spectrum allocation regulations and standards, such as the Institute of Electrical and Electronics Engineers (IEEE) standard (Std.) 802.15 [5][6][7][8][9] . However, despite all this effort, the channel modeling of indoor THz wireless communications has not yet been adequately settled 3,4,10-24 . Due to the severe propagation losses in the THz band, wireless communications in this frequency range rely heavily on the line-of-sight (LoS) component of the received signal 11,12,25 . Taking this into account, the THz channel is commonly modeled by considering only the large-scale propagation phenomena, namely shadowing and deterministic pathloss [11][12][13][19][20][21][22][23][24] . The pathloss in the THz band is expressed as the product of the free-space and molecular absorption losses 11 . To acquire and model the molecular absorption coefficient, spectroscopic databases listing the different molecular absorption lines are needed 3 . To remove the need to access these databases, various simplified molecular absorption loss models were developed for the ranges of 100-450 GHz, 200-450 GHz and 275-400 GHz 11,19,20 . 
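As a rough numerical illustration of the product-form pathloss described above, the sketch below adds (in dB) the free-space spreading loss (4πfd/c)² and a Beer-Lambert molecular absorption term exp(k·d). The absorption coefficient k is a hypothetical input here, and the function names are illustrative; in practice k would come from a spectroscopic database or from one of the simplified models cited above, none of which this sketch reproduces.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_loss_db(f_hz: float, d_m: float) -> float:
    """Free-space spreading loss (4*pi*f*d/c)^2 expressed in dB."""
    return 20.0 * math.log10(4.0 * math.pi * f_hz * d_m / SPEED_OF_LIGHT)


def molecular_absorption_loss_db(k_per_m: float, d_m: float) -> float:
    """Beer-Lambert absorption loss exp(k*d) in dB; k is an assumed input."""
    # 10*log10(exp(k*d)) = 10*k*d/ln(10); written this way to avoid overflow.
    return 10.0 * k_per_m * d_m / math.log(10.0)


def thz_pathloss_db(f_hz: float, d_m: float, k_per_m: float) -> float:
    """Product of the two losses in linear scale, i.e. their sum in dB."""
    return free_space_loss_db(f_hz, d_m) + molecular_absorption_loss_db(k_per_m, d_m)


if __name__ == "__main__":
    # 143.1 GHz over 5.1 m with a hypothetical k = 1e-3 1/m.
    print(round(thz_pathloss_db(143.1e9, 5.1, 1e-3), 2))
```

Working in dB turns the product of the two losses into a sum, which is also how the shadowing term is usually appended.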
Moreover, by employing these simplified models, the THz channel was assumed to consist of a single coefficient in the LoS direction, obtained as the product of the free-space loss and the molecular absorption loss 11,19,20 . Meanwhile, LoS and non-line-of-sight (NLoS) channel measurements for various narrowband indoor wireless communication links operating at 28 GHz and 140 GHz were performed 12,13 . In these works, based on the measured received signal power of the multipath components of the different links, the respective millimeter wave (mmWave) and THz channels were modeled as the sum (in dB) of a deterministic exponential pathloss and a lognormal shadowing term. The parameters of the exponential pathloss and the variance of the shadowing were extracted from the received signal powers of the observed links. Additionally, a single-path theoretical THz channel model for nano-scale machine communications within vegetation was derived, where the receiver (RX) was assumed to detect signals only from the LoS direction 22,24 . More specifically, the channel was assumed to be composed of two coefficients: the pathloss, modeled as the product of the free-space and molecular absorption losses, and a lognormal shadowing term. Also, a new paradigm of aerially suspended nano-nodes to bridge disjoint internet-of-things (IoT) THz networks was proposed. In this work, the proposed LoS THz channel model between the TX and the RX nano-nodes was expressed in terms of the deterministic pathloss, which took into account the environmental conditions, the spreading and molecular absorption losses, the transceiver distances, the angle of arrival, and the RX antenna dimensions 26 . Moreover, two single-frequency and one multi-frequency THz pathloss models were introduced, extracted from multiple indoor wideband measurements in the range of 220-330 GHz 27 . 
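The dB-domain model of pathloss plus lognormal shadowing mentioned above can be sketched as a log-distance law with a zero-mean Gaussian term in dB. The ordinary least-squares extraction below is one common way to obtain the intercept, the pathloss exponent n and the shadowing standard deviation from measured (distance, pathloss) pairs; it is an illustrative assumption, not necessarily the procedure of refs. 12,13, and the reference distance d0 = 1 m and the function names are likewise hypothetical.

```python
import math
import random
import statistics


def log_distance_pathloss_db(d_m, pl_d0_db, n, sigma_db, d0_m=1.0, rng=None):
    """PL(d) = PL(d0) + 10*n*log10(d/d0) + X, with X ~ N(0, sigma_db^2) in dB."""
    rng = rng or random.Random()
    shadowing_db = rng.gauss(0.0, sigma_db) if sigma_db > 0 else 0.0
    return pl_d0_db + 10.0 * n * math.log10(d_m / d0_m) + shadowing_db


def fit_log_distance(distances_m, pathloss_db, d0_m=1.0):
    """Least-squares fit; returns (PL(d0), n, shadowing std in dB)."""
    x = [10.0 * math.log10(d / d0_m) for d in distances_m]
    mx, my = statistics.fmean(x), statistics.fmean(pathloss_db)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, pathloss_db))
    n = sxy / sxx                         # pathloss exponent
    pl_d0 = my - n * mx                   # intercept at the reference distance
    residuals = [yi - (pl_d0 + n * xi) for xi, yi in zip(x, pathloss_db)]
    sigma = statistics.pstdev(residuals)  # shadowing spread (population std)
    return pl_d0, n, sigma


if __name__ == "__main__":
    rng = random.Random(1)
    dists = [rng.uniform(3.0, 60.0) for _ in range(500)]
    pls = [log_distance_pathloss_db(d, 75.0, 2.1, 4.0, rng=rng) for d in dists]
    print(fit_log_distance(dists, pls))
```

The residual spread after removing the fitted trend is what such models attribute to shadowing; small-scale fading is a separate, faster effect treated later in this work.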
Furthermore, an indoor channel model for mmWave and THz frequencies at 28 and 140 GHz was developed. This model included omnidirectional and directional pathloss as well as cluster channel statistics, namely the number, delays and powers of the clusters 28 .
Despite the heavy reliance on the LoS component in THz wireless communications, there are aerosols in the atmospheric medium, as well as objects in the propagation environment, that can act as scatterers 12,13,29 . Hence, there can be THz multipath components with enough power to be detected by the RX even if they arrive from NLoS directions 4,12,21 . The existence of multipath components with different received powers, angles of arrival, angles of departure and delays means that the received signal power at the RX can exhibit deep, time-varying fast fades 30 . These phenomena are part of the stochastic small-scale fading 30 . In the technical literature, there are works that perform theoretical as well as experimental THz channel modeling by taking small-scale fading phenomena into account 3,4,10,[12][13][14][15][16][17][18]21,31 . It was observed from experimental THz channel measurements that the small-scale fading in this band can be modeled by Rice, Nakagami-m and Rayleigh distributions 3 . Considering this, the more generic α-µ distribution was employed to model the small-scale fading of a THz backhaul wireless system, and the performance was evaluated in terms of outage probability and ergodic capacity under different levels of transceiver antenna misalignment, hardware impairments and fading severity 3,10 . Meanwhile, a stochastic two-dimensional geometrical channel model for indoor THz communications was developed. By employing this model, a parametric multipath Rice fading model for THz communications was elaborated 16,17 . Furthermore, a stochastic indoor THz channel model was introduced, where the small-scale fading attenuation factor followed a Rayleigh or Nakagami-m distribution under NLoS conditions and a Rice or Nakagami-m distribution under LoS conditions. 
These claims were validated by experimental measurements that took place inside an anechoic chamber 15 . In the meantime, received signal power measurements of multiple LoS and NLoS transceiver links recorded in a shopping mall were employed to derive the most suitable small-scale fading distribution for THz systems operating at 140 GHz. Based on those measurements, it was concluded that the distributions that most accurately describe the measured channel data of the LoS and NLoS links are the Weibull and Nakagami-m distributions, respectively. To support this claim, the fit of those theoretical fading distributions to the empirical data was evaluated with the Kolmogorov-Smirnov (KS) goodness-of-fit test 18 . Another approach to modeling indoor THz wireless systems operating in the range of 240-300 GHz assumed a small-scale fading distribution expressed as a sum of individual Gamma distributions. The suitability of the Gamma distribution for THz channel modeling was verified by means of the KS test, the Kullback-Leibler (KL) divergence test and the weighted relative mean difference error metric, which tested the fit of the analytical expression to the measured data 4 . Furthermore, a measurement-based channel model for LoS and NLoS conditions, built on the extended Saleh-Valenzuela channel model, was proposed for THz transceivers operating in the range 126-156 GHz. In this model, the large-scale fading was expressed in terms of exponential pathloss and shadowing, while the small-scale fading amplitude followed a novel distribution. The accuracy of the model was evaluated by means of indoor experimental measurements 21 . Meanwhile, a small-scale fading model for holographic multiple-input-multiple-output (MIMO) systems suitable for mmWave and THz communications was theoretically proposed. 
There, the small-scale fading was modeled as a zero-mean, spatially stationary, and correlated Gaussian scalar random field 31 . Also, quite recently the fluctuating two-ray (FTR) model has been considered a promising candidate for accurately modeling the small-scale fading statistics of THz wireless channels. In more detail, a THz measurement campaign at 300.4 GHz was conducted in a train test facility, where various obstacles such as trains, tracks and lampposts were present 32 . The small-scale fading statistics of those measurements were then shown to be very accurately fitted by the FTR fading model, which performed significantly better than the Rice, Gaussian and Nakagami-m distributions 33 . Finally, there are publications in lower frequency bands, such as the mmWave band, which study the small-scale fading distributions of wireless channels by means of the α-µ and κ-µ distributions [34][35][36] . However, because the notions of scatterer and blocker depend on the wavelength, moving to higher frequencies such as those of the THz band requires these notions to be re-investigated 25,37 .
To the best of the authors' knowledge, no fading distribution for the channel of indoor THz systems has yet been documented that is based on measurements conducted not only in multiple environments, but also during different time periods. Motivated by this, in this work the measurements of LoS and NLoS links in three indoor environments were exploited. In more detail, we made use of the measurements conducted within the premises of a shopping mall, an airport check-in area and the entrance hall of Aalto University in Finland 12,13 . The shopping mall is located in Espoo, Finland, and the airport is in Helsinki. In both of these scenarios the measurements were conducted in November 2016, whereas the measurements in the university entrance hall were performed from January to March 2021. For each link, multiple channel gain measurements were recorded, which were then used in this work to derive small-scale fading statistics. As will be presented in the "Methods" section, the measurements of each link are first preprocessed to obtain the channel gain of the recorded multipath components. Then, to increase the number of channel realizations of each link, a method based on adding random phases to the path amplitudes is employed. Using the resulting channel realizations of each link, the empirical probability density function (PDF) and cumulative distribution function (CDF) are fitted by the analytical α-µ, Nakagami-m, Rayleigh, Rice and lognormal distributions 30,38,39 . The parameters of the analytical distributions are obtained by fitting them to the empirical distributions of the channel gain. This is accomplished by means of nonlinear-regression-based machine learning. The accuracy of the fit of the analytical distributions to the corresponding empirical ones is validated by means of the KS goodness-of-fit test and the KL divergence test 40,41 . 
www.nature.com/scientificreports/
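For readers less familiar with the KS goodness-of-fit test mentioned above, the following sketch computes the one-sample KS statistic, i.e. the largest gap between the empirical CDF of the channel samples and a candidate analytical CDF. The 5% significance level and the large-sample critical value 1.358/√N are assumptions made for illustration, since the exact test settings are not stated here; the Rayleigh CDF 1 − exp(−x²/Ω) and all function names are hypothetical stand-ins.

```python
import math
import random


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_N(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_passes(samples, cdf, c_alpha=1.358):
    """Large-sample KS decision at an assumed 5% significance level."""
    return ks_statistic(samples, cdf) <= c_alpha / math.sqrt(len(samples))


def rayleigh_cdf(omega):
    """Rayleigh envelope CDF with mean power omega (illustrative model)."""
    return lambda x: 1.0 - math.exp(-x * x / omega) if x > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Inverse-transform sampling of Rayleigh envelopes with omega = 1.
    data = [math.sqrt(-math.log(1.0 - rng.random())) for _ in range(2000)]
    print(ks_passes(data, rayleigh_cdf(1.0)), ks_passes(data, rayleigh_cdf(4.0)))
```

The KS test only checks the worst-case CDF gap, which is why a complementary, PDF-based measure such as the KL divergence is useful for ranking distributions that all pass it.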
All the analytical distributions except Rayleigh passed the KS test. Moreover, it is observed that the α-µ distribution yields an almost perfect fit for all the links of all the examined scenarios, which is not the case for the Rice, Nakagami-m and lognormal distributions. To validate this observation, the KL test is employed. According to this test, α-µ results in the smallest divergence from the empirical PDF. It should be noted that the aim of this work is to identify a fading distribution that is not only capable of accurately modeling the small-scale fading statistics of THz wireless channels, but also analytically tractable. The PDF of the FTR fading distribution is expressed as a series involving a Legendre polynomial and the confluent hypergeometric function [42, Eq. (13)]. Moreover, the form of the FTR PDF makes it more complex to find the distribution parameters needed to fit the THz channel data. Furthermore, THz channel modeling by means of a sum of N independent Gamma distributions has been found to yield a very accurate fit to THz channel measurements 4 . However, to fit the channel measurements one must identify suitable parameters for each of the N Gamma distributions, which considerably increases the complexity of this process [4, Eq. (8)].

Results
This section presents the results of fitting the α-µ, Nakagami-m, Rice and lognormal distributions to the empirical channel gain distributions of the links. First, the measurement setup and sites are briefly presented. This is followed by a demonstration of the superior fit of α-µ to the empirical data compared to the Rice, Rayleigh, Nakagami-m and lognormal distributions. Subsequently, indicative figures illustrating the fit of α-µ to the empirical channel gain PDF and CDF of LoS and NLoS links in the three scenarios are presented.

Measurement setup and sites.
In both the shopping mall and airport check-in measurement scenarios, the transmissions were conducted at a center frequency of 143.1 GHz with a total bandwidth of 4 GHz, and the RX antenna was rotated with an angular step of 5° 12,13 . The transmitter (TX) was equipped with an omnidirectional antenna with a gain of 0 dBi, whereas the RX was equipped with a directional horn antenna with a gain of 19 dBi. During the measurement campaign in the shopping mall, the paths of 18 independent TX-RX links were measured. More specifically, the RX was placed at the same position for all the experiments, while the TX was also static but placed at 18 different positions, each corresponding to a different TX-RX link. Additionally, the experiments were conducted at a time of day when no people were on the premises; hence, the measured paths are not impaired by human blockage. The only blockers that could interrupt the LoS path were a pillar and an escalator. Of the 18 measured links, only three were in NLoS conditions, namely links 18, 20 and 22. Figure 1a illustrates the top view of the shopping mall floor in which the measurements were conducted. Figure 1b illustrates the top view of the airport check-in hall. During this measurement campaign, 11 independent TX-RX links were measured 13 . In all the experiments, the RX was placed at the same position, while the TX was also static but placed at 11 different positions, each corresponding to a different TX-RX link. The measurements took place at a time of day when no people were on the premises; hence, the measured paths are not impaired by human blockage. The only blocker present was a check-in kiosk, which caused link 16 to be in NLoS.
The same channel sounder setup as in the shopping mall and airport was utilized in the university entrance hall, except for some key differences: the center radio frequency (RF) is 142 GHz, the RF power fed to the antenna is 5 dBm, the TX and RX antenna heights are 1.85 m, and the angular step is 10°. The entrance hall map and antenna locations are illustrated in Fig. 1c. Note that a cylindrical building pillar and plants obstruct the LoS path of links 4 and 6, respectively. Hence, among the 12 TX-RX links considered, only TX 1 and TX 2 have a LoS path to the RX. Measurements were performed while the antennas were stationary and there were no moving objects in the entire entrance hall. Furthermore, for links 1, 5, 6, 7, 9, 10 and 12, several repeated measurements were recorded in order to investigate the repeatability of the channel characteristics.
The channel gain estimates of each path, from the channel sounding of all the scenarios, have a signal-to-noise ratio (SNR) of at least 10 dB. The noise observed in the measured received power of each link is attributed to the vector network analyzer of the measurement receiver. Despite this noise, many LoS and NLoS paths of the employed measurements have an SNR greater than 30 dB. Therefore, the effect of noise on the fading statistics studied in this work is minimal, as is generally required in channel modeling studies. Moreover, the measurement method used to obtain the empirical channel data allows spatio-temporal sounding, i.e., it captures the fading of the channel coefficients over space and frequency, which is what systems obtain in realistic scenarios.
The observed advantage of α-µ fitting over well-known distributions in THz channel modeling. In wireless communications where the LoS paths are the dominant contributors to the received power, the small-scale fading is commonly modeled by a Rice, Nakagami-m, Rayleigh or lognormal distribution 15 . In this work, the channel gain measurements of the three scenarios are fitted by the Rice, Nakagami-m, α-µ, Rayleigh and lognormal fading distributions. The fit of those distributions to the empirical ones is evaluated by the KS goodness-of-fit test and the KL divergence test. In Tables 1, 2, 3, 4, 5 and 6, the KL test values of the α-µ, Rice, Nakagami-m and lognormal distributions are denoted as KL α-µ , KL R , KL N and KL L , respectively. Additionally, the TX and d columns stand for the index of the transmitter antenna and the TX-RX distance, respectively. The columns α, µ and β represent the α-µ distribution parameters. Meanwhile, the columns K and R represent the Rice distribution parameters, whereas the columns m and N are the parameters of the Nakagami-m distribution. Also, the columns µ L and σ L stand for the lognormal distribution parameters. In the KS-test columns, a check mark indicates that the link passed the KS test, whereas an x-mark indicates that it did not. Similarly, in the LoS columns, a check mark means that the corresponding link is in LoS conditions, whereas an x-mark means that it is in NLoS. Also, the number of samples of the empirical CDF used in the KS test is given in column N of Tables 1, 3 and 5. From Tables 1, 2 … for the empirical PDF, while the continuous red, green, orange and black lines represent the analytical PDFs of the α-µ, Nakagami-m, Rice and lognormal distributions, respectively. Moreover, from Fig. 2 it is observed that the tails of the analytical lognormal PDFs deviate severely from their corresponding empirical PDFs. Also, from Fig. 2b it is observed that the empirical PDF of the airport scenario is shifted to the left compared to those of the shopping mall and university entrance hall measurement scenarios, which becomes evident when comparing it with the empirical PDFs of the links presented in Fig. 2a,c. This displacement of the airport check-in hall empirical PDF is caused by the existence of stronger multipath components in this environment compared to the other two measurement sites. Hence, the probability of observing deeper fades is higher in the channel measurements performed in the airport than in those conducted in the shopping mall and university entrance hall. This finding is further supported by the greater delay spread of the multipath components in the airport check-in hall compared to the shopping mall and university entrance hall scenarios 12,13 . Finally, no parameters of the Rayleigh distribution are presented in Tables 1, 2, 3, 4, 5 and 6 for any of the three measurement scenarios, because no analytical Rayleigh PDF passed the KS test.
x is normalized with respect to parameter β.
Figure 3 depicts indicative examples of the PDF and CDF of the channel according to measurements of LoS links conducted in the shopping mall. In more detail, Fig. 3a,b present the analytical and empirical PDF and CDF, respectively. From Fig. 1a, links 1 and 24 were chosen because they have relatively short transmission distances, whereas links 7 and 15 were selected because they have relatively large transmission distances. The transmission distances of links 1, 7, 15 and 24 are 5.1, 65.2, 25.03 and 3.1 m, respectively. Moreover, in Fig. 3a the cyan, pink, black and orange vertical lines indicate the 95% confidence interval of the median for links 7, 1, 15 and 24, respectively. From Fig. 3a, it is observed that as the transmission distance increases, both the sample median and the width of its 95% confidence interval decrease. This is because, as the distance increases, the number of reflected paths that carry a measurable amount of power decreases. Meanwhile, this figure makes evident that the α-µ distribution provides an excellent fit to the experimental results. Furthermore, Fig. 3b illustrates and verifies the good fit indicated by the KS test.
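The 95% confidence intervals of the median shown in the figures can be obtained in several ways, and the procedure used is not specified here. As one common, distribution-free option (an assumption, with a hypothetical function name), the sketch below selects two order statistics whose ranks follow from the normal approximation to the Binomial(N, 1/2) distribution of the number of samples lying below the median.

```python
import math
import statistics


def median_ci95(samples, z=1.96):
    """Sample median and a distribution-free ~95% CI from order statistics.

    Ranks (1-based) use the normal approximation to Binomial(n, 1/2):
    lower = n/2 - z*sqrt(n)/2, upper = 1 + n/2 + z*sqrt(n)/2.
    """
    xs = sorted(samples)
    n = len(xs)
    half_width = z * math.sqrt(n) / 2.0
    lower_rank = min(max(round(n / 2.0 - half_width), 1), n)
    upper_rank = min(max(round(1 + n / 2.0 + half_width), 1), n)
    return statistics.median(xs), xs[lower_rank - 1], xs[upper_rank - 1]
```

For a fixed underlying distribution the interval narrows roughly as 1/√N, so its width reflects both the spread of the channel gain and the number of realizations available for a link.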
In Fig. 4, the PDF and CDF of the channel according to the measurements of the NLoS links conducted in the shopping mall are illustrated. More specifically, Fig. 4a,b present the analytical and empirical PDF and CDF, respectively. Links 20 and 22 have distances of 33.13 and 26.98 m, respectively. Additionally, the orange and black vertical lines indicate the 95% confidence interval of the median for links 22 and 20, respectively. From Fig. 4a it is observed that as the transmission distance increases, both the sample median and the width of its 95% confidence interval decrease. This is because, as the distance increases, the number of reflected paths that carry a significant amount of power decreases. Additionally, the number of reflected paths that can be detected by the RX is further reduced by the obstacles, which absorb and scatter them. Taking this into account, it should be noted that link 20 has a greater median range than link 22, because link 22 is obstructed by a pillar made of solid material, whereas link 20 is obstructed by a glass escalator. Meanwhile, Fig. 4 makes evident that the α-µ distribution provides an excellent fit for the NLoS links. Furthermore, Fig. 4b provides an illustrative example verifying the good fit indicated by the KS test.
In Fig. 5, indicative examples of the PDF and CDF of the channel according to measurements of LoS and NLoS links conducted in the airport are presented. In more detail, Fig. 5a,b present the analytical and empirical PDF and CDF, respectively. In Fig. 5a, the LoS link 1 has a transmission distance of 5.1 m, while link 16, the only NLoS link measured in this scenario, has a transmission distance of 20.09 m. The orange and black vertical lines indicate the 95% confidence interval of the median for links 16 and 1, respectively. From Fig. 5a, it is observed that as the transmission distance increases, both the sample median and the width of its 95% confidence interval decrease. This is justified by the fact that as the transmission distance increases, the number of reflected paths that carry a significant amount of power decreases. Meanwhile, Fig. 5 shows that the α-µ distribution provides an excellent fit to the experimental results. Furthermore, Fig. 5b verifies the good fit indicated by the KS test.
In Fig. 6, indicative examples of the PDF and CDF of the channel according to measurements of LoS and NLoS links conducted in the entrance hall of the Aalto University campus are shown. Figure 6a,b present the analytical and empirical PDF and CDF, respectively. In Fig. 6a, NLoS links 3 and 10 were selected because of their transmission distances of 3.3 and 47.44 m, respectively. Additionally, this figure presents links 1 and 2, the only LoS links of this scenario. The black, orange, cyan and pink vertical lines indicate the 95% confidence interval of the median for links 1, 2, 3 and 10, respectively. From Fig. 6a, it is observed that for the NLoS links, as the transmission distance increases, both the sample median and the width of its 95% confidence interval decrease. This is justified by the fact that increasing the transmission distance significantly reduces the number of reflected paths that carry a measurable amount of power. The same observation about the median also applies to LoS links 1 and 2. Meanwhile, Fig. 6 illustrates that the α-µ distribution provides an excellent fit to the experimental results. Additionally, Fig. 6b serves as an illustrative example verifying the good fit indicated by the KS test.
Tables 1, 3 and 5 show the parameters of the α-µ distribution that were found to fit the empirical channel data for the shopping mall, airport check-in area and Aalto campus entrance hall measurement sites, respectively. In all the presented scenarios, the α-µ distribution provides an adequate fit to the empirical data, passing the KS test while also yielding low KL values. The only exceptions are links 8 and 11 in Table 5, for which no paths were detected by the receiver. 
By observing the values of the parameter α in the three presented scenarios, it can be seen that for the majority of the measured links α ∈ [2, 3]. The exceptions are links 4, 5, 7 and 14 of the shopping mall measurement scenario, where α ∈ [4, 6.5], and link 4 of the Aalto entrance hall measurements, where α = 8.54867. Furthermore, the values of the parameter µ in the three scenarios lie in µ ∈ [0.23, 8.5]. Moreover, the NLoS links in all three scenarios have µ ≤ 1.
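To make the roles of α, µ and β concrete, the sketch below evaluates the α-µ PDF and CDF in their standard form (ref. 38), writing the CDF through the regularized lower incomplete gamma function, which equals one minus the upper incomplete gamma ratio. It also lets one check the special cases listed in the Methods: α = 2 gives Nakagami-m with Ω = β², and α = 2, µ = 1 gives Rayleigh. The power-series evaluation of the incomplete gamma function is a simple choice assumed here for moderate arguments, and the function names are illustrative.

```python
import math


def reg_lower_gamma(a, x, rel_tol=1e-15, max_terms=100_000):
    """Regularized lower incomplete gamma P(a, x) = gamma(a, x) / Gamma(a).

    Uses gamma(a, x) = x^a e^{-x} sum_n x^n / (a (a+1) ... (a+n)),
    adequate for the moderate arguments of normalized fading data.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > rel_tol * total and n < max_terms:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def alpha_mu_pdf(x, alpha, mu, beta):
    """alpha mu^mu x^(alpha mu - 1) / (beta^(alpha mu) Gamma(mu)) exp(-mu (x/beta)^alpha)."""
    if x <= 0.0:
        return 0.0
    log_f = (math.log(alpha) + mu * math.log(mu) + (alpha * mu - 1.0) * math.log(x)
             - alpha * mu * math.log(beta) - math.lgamma(mu) - mu * (x / beta) ** alpha)
    return math.exp(log_f)


def alpha_mu_cdf(x, alpha, mu, beta):
    """1 - Gamma(mu, mu (x/beta)^alpha) / Gamma(mu) = P(mu, mu (x/beta)^alpha)."""
    if x <= 0.0:
        return 0.0
    return reg_lower_gamma(mu, mu * (x / beta) ** alpha)
```

Because Nakagami-m is the α = 2 slice of this family, a freely fitted α-µ model is at least as flexible as Nakagami-m; fitted values of α away from 2 indicate where that extra degree of freedom is being used.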

Discussion
According to the literature, THz channel modeling has so far been performed by employing fading distributions such as Nakagami-m, Rayleigh, Rice, Weibull and mixtures of Gamma distributions 4,[15][16][17][18] . In this work, the suitability of modeling THz small-scale fading by means of the α-µ distribution is examined. To this end, despite the paramount importance of pathloss modeling, we normalized the path gain measurements of each link as in Eq. (2). This was done in order to eliminate the effect of the deterministic pathloss and retain only the small-scale fading characteristics of the measured channels. The deterministic THz pathloss and its dependency on the operating frequency, transmission distance, relative humidity, air temperature and pressure have been studied extensively in previous works 3, [10][11][12][13][19][20][21]23,25,43 .
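Assuming that the normalization to unity in Eq. (2) divides each path's linear gain by the total gain of its link, so that the normalized gains sum to one, a minimal sketch is given below; the dB input format and the function name are illustrative assumptions.

```python
def normalize_path_gains(path_gains_db):
    """Convert per-path gains from dB to linear scale and scale them to sum to one.

    Any gain offset common to all paths of a link (e.g. a constant antenna
    gain) cancels in the ratio, so only the relative path powers remain.
    """
    linear = [10.0 ** (g / 10.0) for g in path_gains_db]
    total = sum(linear)
    if total <= 0.0:
        raise ValueError("at least one path with finite gain is required")
    return [p / total for p in linear]
```

Dividing by the link total removes the overall power level of the link, which carries the deterministic pathloss, while keeping the relative powers that drive the small-scale fading.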
It is observed that α-µ achieves a good fit to the channel gain measurements of all the links in the shopping mall, airport check-in area and Aalto University entrance hall scenarios. In more detail, the goodness of fit of α-µ is compared to that of the Nakagami-m, Rice, Rayleigh and lognormal fading distributions and evaluated by means of the KS and KL tests. For the shopping mall scenario, the empirical PDF of the channel gain measurements of links 4, 5, 7, 12, 14, 16, 18, 19 and 21 is more accurately fitted by the α-µ distribution than by Nakagami-m and Rice. This is verified by the values of KL α-µ , KL N and KL R in Tables 1 and 2, where KL α-µ has the lowest value. Furthermore, the KL values of Tables 3 and 4 show that α-µ achieves a better fit to the empirical channel gain PDF of link 1 of the airport check-in area. For link 4 of the Aalto University entrance hall, the KL values in Tables 5 and 6 show that α-µ yields a better fit to the empirical PDF than Rice and Nakagami-m. Meanwhile, from the KL values of Tables 1, 2, 3, 4, 5 and 6 it can be concluded that Nakagami-m performs the worst in terms of fitting the empirical channel gain PDFs when compared to the α-µ and Rice distributions. Moreover, according to the KL values, the lognormal distribution performs the worst of all when compared to α-µ, Rice and Nakagami-m. Finally, the Rayleigh distribution did not yield an adequate fit, even in terms of the KS test, for any link in any of the measurement scenarios.
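The KL values discussed above depend on how the divergence between the empirical and analytical distributions is estimated, which is not detailed here. One simple estimator, assumed for illustration, bins the channel samples into a histogram, assigns each bin the probability mass that the model CDF places on it (renormalized over the observed range), and computes D_KL(empirical ‖ model) = Σ_k p_k ln(p_k/q_k). The Rayleigh CDF and the function names are hypothetical stand-ins for the fitted models.

```python
import math
import random


def kl_divergence_hist(samples, model_cdf, n_bins=50, eps=1e-12):
    """Histogram estimate of D_KL(empirical || model) in nats."""
    lo, hi = min(samples), max(samples)
    if hi <= lo:
        raise ValueError("samples must span a non-empty range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        counts[min(int((s - lo) / width), n_bins - 1)] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    # Model probability of each bin, floored at eps and renormalized.
    q = [max(model_cdf(edges[k + 1]) - model_cdf(edges[k]), eps) for k in range(n_bins)]
    q_total = sum(q)
    n = len(samples)
    kl = 0.0
    for c, qk in zip(counts, q):
        if c:  # empty bins contribute 0 * log(0) = 0
            p = c / n
            kl += p * math.log(p / (qk / q_total))
    return kl


def rayleigh_cdf(omega):
    """Rayleigh envelope CDF with mean power omega (illustrative model)."""
    return lambda x: 1.0 - math.exp(-x * x / omega) if x > 0 else 0.0
```

Lower values indicate a closer match; with a finite sample even the true model gives a small positive value, so comparisons between candidate distributions are more informative than absolute numbers.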

Methods
Preprocessing of the measurement data. The channel describing a wireless RF link is expressed as the product of two coefficients, one deterministic and one stochastic. The deterministic part describes the large-scale fading effects of the propagation, i.e., the pathloss. In more detail, large-scale fading describes time-invariant phenomena of the signal propagation, whose effect remains constant during the signal propagation. Meanwhile, the stochastic channel coefficient describes the small-scale fading characteristics of the channel, which are time and frequency dependent. The small-scale fading behavior is of special importance because it can lead to unpredictable deep fades in the received signal power. Therefore, in order to characterize the small-scale fading of the channel, one should eliminate the deterministic pathloss coefficient. The channel sounding performed in the shopping mall, airport and Aalto University entrance hall measurement environments provides power angular delay profiles (PADPs) for each of their TX-RX links. In more detail, the PADP of each link is given as a set of propagation paths (Eq. (1)), where φ i , P i and t i stand for the azimuth angle at the RX, the path gain and the propagation delay of the ith propagation path, respectively. The parameter G, known as the broadside angle, denotes the combined gains of the TX and RX antennas, while I and δ(·) are the total number of multipath components of a link and the Dirac delta function, respectively. Then, in order to eliminate the deterministic phenomenon of pathloss, the path gain measurements obtained by applying Eq. (1) to each link are normalized to unity (Eq. (2)).
Generation of different channel realizations from a single measurement. In the THz band, the wavelength of the transmitted electromagnetic waves is much smaller than the size of the obstacles in the transmission path. 
This reduces the ability of the signals to diffract around obstacles and can attenuate the received signal power by as much as 40 dB 25,37. Furthermore, the severe propagation losses caused by the water vapor density and the temperature of the atmospheric medium make THz wireless transmissions rely heavily on the LoS component of the channel 12,19,43. Consequently, THz channels are not rich in multipath. Nevertheless, some surfaces still act as scatterers in the THz band 12,13,17,21. They give rise to reflected multipath components that carry significant power and arrive at the RX from NLoS directions. Even so, the number of measured multipath components is not sufficient for a statistical analysis of the small-scale fading of a THz channel. To tackle this limitation, different realizations of the transfer function can be generated by changing the phases of the multipath components 18,44. These phases are assumed to be random and uniformly distributed in the interval (0, 2π). The channel coefficient of a single-input single-output (SISO) system is then obtained by summing the multipath components, each with its own random phase 44. Here ψ_i ∼ U(0, 2π) represents the random phase of the ith multipath component, and U(·, ·) denotes the uniform distribution 30. Moreover, assume that the amplitude of the channel coefficient does not change significantly across the delays t_i, i.e., that the channel can be considered flat-fading. Under this assumption, t_i = 0 44.
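The preprocessing and realization-generation steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain. The path gains are taken in dB. "Normalized to unity" is interpreted as scaling the linear path powers of a link so that they sum to one. The flat-fading simplification t_i = 0 is applied, and each realization is formed as the phasor sum of the path amplitudes with independent phases ψ_i ∼ U(0, 2π). The function names and the four-path example link are hypothetical.

```python
import cmath
import math
import random


def normalize_path_gains(path_gains_db):
    """Convert per-path gains in dB to linear power and scale them to sum to one.

    Assumption: "normalized to unity" is read as unit total power per link, which
    removes the deterministic path loss and the constant antenna gain G.
    """
    linear = [10 ** (g / 10) for g in path_gains_db]
    total = sum(linear)
    return [p / total for p in linear]


def channel_realizations(norm_gains, n_realizations, seed=None):
    """Flat-fading SISO channel coefficients with random multipath phases.

    Each realization draws an independent phase psi_i ~ U(0, 2*pi) for every path
    and sums the amplitudes sqrt(P_i) * exp(j * psi_i); all delays are set to 0.
    """
    rng = random.Random(seed)
    amplitudes = [math.sqrt(p) for p in norm_gains]
    return [
        sum(a * cmath.exp(1j * rng.uniform(0.0, 2 * math.pi)) for a in amplitudes)
        for _ in range(n_realizations)
    ]


# Hypothetical link: one strong LoS path plus three weaker reflected paths (dB).
gains = normalize_path_gains([-60.0, -70.0, -72.0, -75.0])
h = channel_realizations(gains, 20_000, seed=1)
envelope = [abs(c) for c in h]                    # channel gain samples |h|
mean_power = sum(a * a for a in envelope) / len(envelope)
print(sum(gains), mean_power)                     # both should be close to 1
```

The phases are independent and uniform, so E|h|² equals the sum of the normalized path powers. The generated envelope therefore has unit average power. Its spread around that mean is the small-scale fading that the candidate distributions are fitted to.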
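The α-µ distribution. The α-µ distribution has been widely used to describe the small-scale fading statistics of RF wireless channels. It offers mathematical tractability and also includes several important distributions as special cases 38,45. By setting α and µ to appropriate values, one obtains the Nakagami-m, Gamma, Rayleigh, Weibull, exponential, and one-sided Gaussian distributions. The PDF and CDF of the α-µ distribution are expressed as 38

$$f_X(x) = \frac{\alpha \mu^{\mu} x^{\alpha\mu - 1}}{\beta^{\alpha\mu}\,\Gamma(\mu)} \exp\left(-\mu \frac{x^{\alpha}}{\beta^{\alpha}}\right)$$

$$F_X(x) = 1 - \frac{\Gamma\left(\mu, \mu \frac{x^{\alpha}}{\beta^{\alpha}}\right)}{\Gamma(\mu)}$$

where β and µ are obtained as 38

$$\beta = \sqrt[\alpha]{E(X^{\alpha})}, \qquad \mu = \frac{E^{2}(X^{\alpha})}{E(X^{2\alpha}) - E^{2}(X^{\alpha})}.$$

The parameter α > 0 expresses the non-linearity of the received signal envelope caused by the propagation environment. The parameter µ > 0 is related to the number of multipath clusters of the received signal 38. Non-integer values of µ may be explained by non-zero correlation between the in-phase and quadrature parts of the multipath components, by non-zero correlation between different clusters of multipath components, or by non-Gaussianity of the in-phase and quadrature components of the fading signal 38. Moreover, X is a random variable (r.v.) following the α-µ distribution, while Γ(·) and Γ(·, ·) stand for the gamma function and the upper incomplete gamma function, respectively 46.
The Rice, Nakagami-m, Rayleigh and lognormal distributions. The Rice, Rayleigh, Nakagami-m, and lognormal distributions are widely used to model the fading statistics of RF wireless channels. They have also been used to model the small-scale fading of THz wireless channel measurements 15,17,18. The PDF and CDF of the Rice distribution are expressed as 30

$$f_X(x) = \frac{2(K+1)x}{R} \exp\left(-K - \frac{(K+1)x^{2}}{R}\right) I_0\left(2x\sqrt{\frac{K(K+1)}{R}}\right)$$

$$F_X(x) = 1 - Q_1\left(\sqrt{2K}, x\sqrt{\frac{2(K+1)}{R}}\right)$$

where Q_1(·, ·) is the first-order Marcum Q-function 47 and I_0(·) is the modified Bessel function of the first kind and order zero. The parameter K is the ratio of the power of the LoS signal component to that of the NLoS signal components, while R stands for the average received signal power. The PDF and CDF of the Nakagami-m distribution are obtained as 30

$$f_X(x) = \frac{2 m^{m} x^{2m-1}}{\Gamma(m)\,\Omega^{m}} \exp\left(-\frac{m x^{2}}{\Omega}\right)$$

$$F_X(x) = 1 - \frac{\Gamma\left(m, \frac{m x^{2}}{\Omega}\right)}{\Gamma(m)}$$

where Ω = E(X²) is the average received power and m = Ω²/Var(X²) ≥ 1/2 is the shape parameter. The Rayleigh distribution is the special case m = 1. Its PDF is f_X(x) = (x/σ²) exp(−x²/(2σ²)) and its CDF is F_X(x) = 1 − exp(−x²/(2σ²)), where σ² is the variance of the in-phase and quadrature components, so that Ω = 2σ². The PDF and CDF of the lognormal distribution are given in 39 (Eqs. (2.1) and (2.4) therein) as

$$f_X(x) = \frac{1}{x \sigma_L \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu_L)^{2}}{2\sigma_L^{2}}\right), \qquad F_X(x) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{\ln x - \mu_L}{\sigma_L\sqrt{2}}\right)$$

where µ_L and σ_L stand for the mean and standard deviation of ln X, respectively.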
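The α-µ expressions can be checked numerically with the following Python sketch, which uses only the standard library. It is an illustration, not the fitting code behind Tables 1 to 6, and all function names are hypothetical. Because the standard library has no incomplete gamma function, the CDF uses the power series of the regularized lower incomplete gamma function. β and µ are estimated from the moment relations β = (E[X^α])^{1/α} and µ = E²(X^α)/Var(X^α) for a fixed α that is assumed known. How α is selected is not stated here, so a search over α, e.g., minimizing the KS statistic, would be an extra assumption.

```python
import math
import random


def alpha_mu_pdf(x, alpha, mu, beta):
    """alpha-mu PDF with beta = (E[X^alpha])^(1/alpha), evaluated in log space."""
    if x <= 0:
        return 0.0
    log_f = (math.log(alpha) + mu * math.log(mu) + (alpha * mu - 1) * math.log(x)
             - alpha * mu * math.log(beta) - math.lgamma(mu)
             - mu * (x / beta) ** alpha)
    return math.exp(log_f)


def reg_lower_gamma(a, x, tol=1e-15, max_terms=100_000):
    """Regularized lower incomplete gamma P(a, x) = 1 - Gamma(a, x) / Gamma(a).

    Power series x^a e^(-x) * sum_n x^n / Gamma(a + n + 1); the number of terms
    grows roughly linearly with x, which is acceptable for normalized gains.
    """
    if x <= 0:
        return 0.0
    log_term = a * math.log(x) - x - math.lgamma(a + 1)
    if log_term < -700:   # far tail: P equals 0 (x < a) or 1 (x > a) in double precision
        return 1.0 if x > a else 0.0
    term = total = math.exp(log_term)
    n = 1
    while term > tol * total and n < max_terms:
        term *= x / (a + n)
        total += term
        n += 1
    return min(total, 1.0)


def alpha_mu_cdf(x, alpha, mu, beta):
    """alpha-mu CDF: 1 - Gamma(mu, mu x^alpha / beta^alpha) / Gamma(mu)."""
    if x <= 0:
        return 0.0
    return reg_lower_gamma(mu, mu * (x / beta) ** alpha)


def alpha_mu_moments(samples, alpha):
    """Moment-based (beta, mu) for a given alpha, which is assumed known."""
    y = [s ** alpha for s in samples]
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / len(y)
    return mean ** (1 / alpha), mean ** 2 / var


def alpha_mu_sample(alpha, mu, beta, n, seed=None):
    """Draw X = beta * (G / mu)^(1/alpha) with G ~ Gamma(mu, 1), so X is alpha-mu."""
    rng = random.Random(seed)
    return [beta * (rng.gammavariate(mu, 1.0) / mu) ** (1 / alpha) for _ in range(n)]


# Special case: alpha = 2 reduces to Nakagami-m with m = mu and Omega = beta^2.
m, omega, x0 = 1.7, 0.8, 0.9
nakagami = (2 * m ** m * x0 ** (2 * m - 1) / (math.gamma(m) * omega ** m)
            * math.exp(-m * x0 ** 2 / omega))
print(alpha_mu_pdf(x0, 2.0, m, math.sqrt(omega)), nakagami)

# Recover beta and mu from synthetic samples, with alpha fixed at its true value.
data = alpha_mu_sample(2.5, 1.5, 1.0, 50_000, seed=7)
print(alpha_mu_moments(data, 2.5))   # true values: beta = 1.0, mu = 1.5
```

Setting α = 2 and µ = 1 in the same functions gives the Rayleigh case, which is a quick way to confirm that a fitted α-µ model contains the conventional distributions as limits.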
Evaluation of the fitting. Two goodness-of-fit methods were employed to evaluate how well the α-µ, Rice, Nakagami-m, Rayleigh, and lognormal fading distributions fit the empirical channel gain distribution of each link: the Kolmogorov–Smirnov (KS) test and the Kullback–Leibler (KL) divergence test.

Kolmogorov–Smirnov goodness of fit test. The Kolmogorov–Smirnov test statistic is defined as 40

$$T = \max_{x} \left| F_{\mathrm{emp}}(x) - F(x) \right|$$

where F_emp(x) is the empirical channel gain CDF of the examined link and N is the number of discrete samples of F_emp(x). The parameter F(·) denotes the analytical CDF of the examined distribution. The hypothesis that the measured samples follow F(·) is accepted if T is below the critical value $T_{\max} = \sqrt{-\frac{1}{2N}\ln\frac{A}{2}}$, where A = 5% is the selected significance level.
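The KS decision rule can be sketched as follows. The sketch assumes the large-sample critical value √(−ln(A/2)/(2N)), which is the standard asymptotic threshold of the one-sample KS test. The synthetic Rayleigh test data, the deliberately mismatched exponential model, and the helper names are hypothetical.

```python
import math
import random


def ks_statistic(samples, cdf):
    """T = max_x |F_emp(x) - F(x)|, checked on both sides of every ECDF step."""
    xs = sorted(samples)
    n = len(xs)
    t = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        t = max(t, (i + 1) / n - f, f - i / n)
    return t


def ks_critical_value(n, significance=0.05):
    """Asymptotic threshold sqrt(-ln(A/2) / (2N)) for significance level A."""
    return math.sqrt(-math.log(significance / 2) / (2 * n))


def ks_accepts(samples, cdf, significance=0.05):
    """True when the analytical CDF is not rejected at the given level."""
    return ks_statistic(samples, cdf) < ks_critical_value(len(samples), significance)


def rayleigh_cdf(x):
    """Rayleigh CDF with unit average power, E[X^2] = 1."""
    return 1 - math.exp(-x * x) if x > 0 else 0.0


def exponential_cdf(x):
    """Exponential CDF with unit mean, used here as a deliberately wrong model."""
    return 1 - math.exp(-x) if x > 0 else 0.0


# Synthetic envelope |h| of a zero-mean complex Gaussian channel (Rayleigh fading).
rng = random.Random(3)
data = [math.hypot(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        for _ in range(2000)]
print(ks_statistic(data, rayleigh_cdf), ks_statistic(data, exponential_cdf))
print(ks_accepts(data, rayleigh_cdf), ks_accepts(data, exponential_cdf))
```

Checking both the upper and lower edge of each empirical CDF step matters: evaluating only (i + 1)/N − F(x_i) can underestimate T.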
Kullback–Leibler divergence test. The Kullback–Leibler divergence measures the discrepancy between the empirical PDF f_emp(x) and the analytical PDF f(x) of the examined distribution 41:

$$KL = \int f_{\mathrm{emp}}(x) \ln \frac{f_{\mathrm{emp}}(x)}{f(x)}\, dx \tag{18}$$

The closer the value of Eq. (18) is to 0, the better the analytical fading distribution fits the empirical channel gain distribution.
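A discretized version of Eq. (18) can be computed from a histogram estimate of f_emp(x), as in the following Python sketch. The number of bins, evaluating f(x) at the bin centres, and skipping empty bins are assumptions of this illustration, not details of the measurement campaign. The function names and synthetic data are hypothetical.

```python
import math
import random


def empirical_pdf(samples, n_bins=50):
    """Histogram estimate of the PDF: bin centres, densities, and the bin width."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("samples must not all be equal")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        k = min(int((s - lo) / width), n_bins - 1)    # the maximum joins the last bin
        counts[k] += 1
    n = len(samples)
    centres = [lo + (k + 0.5) * width for k in range(n_bins)]
    return centres, [c / (n * width) for c in counts], width


def kl_divergence(samples, pdf, n_bins=50):
    """Discretized KL: sum over non-empty bins of f_emp * ln(f_emp / f) * width."""
    centres, f_emp, width = empirical_pdf(samples, n_bins)
    kl = 0.0
    for x, p in zip(centres, f_emp):
        if p > 0:
            q = max(pdf(x), 1e-300)                   # avoid division by zero
            kl += p * math.log(p / q) * width
    return kl


def rayleigh_pdf(x):
    """Rayleigh PDF with unit average power, E[X^2] = 1."""
    return 2 * x * math.exp(-x * x)


def exponential_pdf(x):
    """Exponential PDF with unit mean, a deliberately mismatched model."""
    return math.exp(-x)


rng = random.Random(5)
data = [math.hypot(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        for _ in range(20_000)]
print(kl_divergence(data, rayleigh_pdf), kl_divergence(data, exponential_pdf))
```

KL is not symmetric, so the order of the arguments matters. The sketch follows Eq. (18) and weights the log-ratio by the empirical density.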

Data availability
The data are owned by Nokia Bell Labs and Aalto University. Any researcher affiliated with one of the ARIADNE project partners is allowed to access and use the shared data for research purposes. However, the shared data must not be made accessible to any person who is not affiliated with an ARIADNE project partner.