## Background & Summary

Observations show that the Arctic is warming at a greater rate than the global average, a feature known as Arctic Amplification1,2,3. Complex mechanisms that are not yet fully understood contribute to the enhanced warming. Water vapour is the strongest greenhouse gas and plays a major role in several processes related to the amplification (e.g., downward longwave radiation flux, clouds, precipitation), although the extent of its contribution remains uncertain3,4,5. The enhanced sea ice loss in the Arctic in summer and autumn increases evaporation, which raises the water vapour load of the warmed atmosphere, which in turn can hold more moisture2,4. Moist air frequently intrudes into the Arctic, impeding sea ice formation and driving the retreat of the sea ice edge by increased net radiative warming and mechanical forcing6,7,8. Strong moisture transports with a filamentary geometry are called Atmospheric Rivers9; capturing the water vapour variability associated with them requires observations with a high temporal resolution.

Within the past decades, a robust increase of moisture has been detected in the Arctic for certain regions and seasons10,11,12. The increase of moisture content enhances the downward longwave radiation flux and therefore contributes to warming. The autumn and winter months in the Barents Sea and Arctic Ocean are especially affected by positive moisture trends12. However, inconsistencies in the moistening trend among reanalyses call for reliable reference data to evaluate them in the data-sparse region of the central Arctic. Radiosonde and satellite data are assimilated in reanalyses and are therefore not suitable for independent evaluation. Additionally, water vapour estimates from different satellites disagree with one another, partly due to different measurement principles13. Despite the accuracy and high vertical resolution of water vapour and temperature profiles from radiosondes, low sampling rates (one to four sondes per day) and the poor spatial coverage of launch sites in the Arctic impede an adequate representation of the water vapour variability. Remote sensing in the microwave spectrum (satellite- or ground-based) is generally less accurate (lower vertical resolution) and faces several difficulties but has the potential to fill the gaps: Microwave radiometers (MWRs) on board polar orbiting satellites can sample the entire Arctic more than once per day even in cloudy conditions but suffer from uncertainties, for example, due to the lack of knowledge of the highly variable sea ice emissivity14 and coarse vertical resolution.

Robust reference water vapour data sets are required for process studies and to evaluate reanalyses and satellite products in the Arctic. The Multidisciplinary drifting Observatory for the Study of the Arctic Climate (MOSAiC) expedition15,16 from September 2019 to October 2020 offers a unique set of detailed measurements in the central Arctic. During the expedition, the research vessel (RV) Polarstern17 from the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI) drifted with the sea ice to investigate coupled atmosphere-ice-ocean processes in the central Arctic to ultimately improve climate models. In this data descriptor, we focus on measurements from the two MWRs MiRAC-P (Microwave Radiometer for Arctic Clouds - Passive), a high frequency MWR especially tailored for low water vapour conditions, and HATPRO (Humidity and Temperature Profiler), a standard MWR commonly used for monitoring of integrated water vapour (IWV). The multi-frequency HATPRO also allows for thermodynamic profiling. MWRs are the only measurement systems that can derive the total cloud liquid, i.e., the liquid water path (LWP), in all cloud conditions. From the MiRAC-P observations, we only present the IWV, but humidity profiling and LWP derivation will be explored in the future.

The data introduced in this descriptor will be the base of upcoming studies within the Transregional Collaborative Research Centre TR 172 “Arctic Amplification: Climate Relevant Atmospheric and Surface Processes, and Feedback Mechanisms (AC)3”18 to study the influence of water vapour and its variability on Arctic Amplification. They can support process studies with high quality IWV and LWP, as well as examinations of boundary layer developments with temperature and humidity profiles with a temporal resolution of one second. Furthermore, the data sets can be used as reference for the evaluation of satellite water vapour products and reanalyses.

## Methods

In this section, we describe the two MWRs HATPRO and MiRAC-P and their measuring principles. Both radiometers were manufactured by RPG-Radiometer Physics GmbH (RPG). We then describe the regression for HATPRO and the Neural Network for MiRAC-P that derive meteorological quantities from the raw sensor data.

### Microwave radiometers on board Polarstern

The University of Cologne’s MiRAC-P27 (RPG-LHUMPRO-243-340 G5) is a passive MWR that measures atmospheric radiances as brightness temperatures (TBs) at a temporal resolution of 1 s with six channels along the 183.31 GHz (G-band) water vapour absorption line and two window channels centered at 243 and 340 GHz. It was mounted next to HATPRO during the MOSAiC expedition. The six double-sided G-band channels are located at 183.31 ± 0.6, ±1.5, ±2.5, ±3.5, ±5.0, and ±7.5 GHz and, together with the window channels, can be used to derive IWV, LWP, and humidity profiles. The window channels at 243 and 340 GHz feature much higher opacities than the HATPRO window channels because the water vapour continuum absorption strength increases with frequency in the microwave spectrum24. At these frequencies, ice particles in clouds scatter atmospheric radiation, causing uncertainties in radiative transfer modelling. The G-band water vapour absorption line is significantly stronger than the 22.24 GHz line and can become saturated if the water vapour load is sufficiently high, making the atmosphere opaque. Then, the TBs in the inner G-band (close to the 183.31 GHz line) are in the range of about 240–280 K, depending on the low-tropospheric temperature and moisture distribution. As will be shown later, the different absorption line strengths can be exploited to use HATPRO and MiRAC-P in a complementary way. All MiRAC-P channels use a double side band heterodyne receiver design and have a half-power beam-width ranging from 1 to 1.3°. The off-axis parabolic mirror allows the radiometer to be pointed at elevation angles of 0–180° for sky view or at the internal ambient temperature calibration target (accuracy 0.2 K). During MOSAiC, MiRAC-P operated in zenith mode only. The measurement noise is below 0.5 K for all channels at one second integration time.

### HATPRO: Retrieval via regression

The regression uses linear or quadratic terms; Eq. (1) gives an example of the latter. Applying it requires coefficients (c0, c1, and c2) that map TBs to the desired meteorological quantities (IWV, LWP, absolute humidity and temperature profiles), and these coefficients are derived by training. The IWV of the k-th sample in the training data set (N samples in total) is computed by

$$IW{V}_{k}={c}_{0}+\mathop{\sum }\limits_{i=1}^{m}\left({c}_{1,i}\cdot T{B}_{k,i}+{c}_{2,i}\cdot T{B}_{k,i}^{2}\right)\quad {\rm{with}}\,k=1,\ldots ,{\rm{N}}$$
(1)
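As an illustration of Eq. (1), the following minimal sketch fits the coefficients by ordinary least squares and applies them to TBs. It is not the retrieval code used for HATPRO: the function names, the synthetic TBs, and the interpretation of m as the number of frequency channels are assumptions made for this example only.

```python
import numpy as np

def fit_quadratic_regression(tb, target):
    """Least-squares fit of target = c0 + sum_i (c1_i * TB_i + c2_i * TB_i**2).

    tb: array (N, m) of brightness temperatures in K; target: array (N,).
    Returns c0 (scalar) and c1, c2 (arrays of shape (m,)).
    """
    n, m = tb.shape
    # Design matrix with a constant column, the linear terms, and the quadratic terms.
    design = np.hstack([np.ones((n, 1)), tb, tb**2])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs[0], coeffs[1:m + 1], coeffs[m + 1:]

def apply_quadratic_regression(tb, c0, c1, c2):
    """Evaluate Eq. (1) for every sample (row) of tb."""
    return c0 + tb @ c1 + (tb**2) @ c2

# Synthetic training set (hypothetical values, not MOSAiC data).
rng = np.random.default_rng(0)
tb_train = rng.uniform(10.0, 60.0, size=(500, 3))
true_c1 = np.array([0.2, -0.1, 0.05])
true_c2 = np.array([1e-3, 2e-3, -1e-3])
iwv_train = apply_quadratic_regression(tb_train, 1.0, true_c1, true_c2)

c0, c1, c2 = fit_quadratic_regression(tb_train, iwv_train)
iwv_fit = apply_quadratic_regression(tb_train, c0, c1, c2)
```

In a real application the targets would come from the training data set and the TBs from radiative transfer simulations rather than from a known set of coefficients.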

### MiRAC-P: Retrieval via Neural Network

Given the saturated 183 GHz line, the retrieval problem is strongly non-linear for MiRAC-P. Therefore, we developed a Neural Network (NN), which is described following the published script39, based on Python’s tensorflow and keras modules to retrieve IWV. The idea of a NN is to process a given input (e.g., TBs) through one or more hidden layers, connected by so-called activation functions, to generate an output (e.g., IWV). We have refrained from using the Ny-Ålesund radiosonde data for the training of the MiRAC-P IWV retrieval because the dry conditions, where the sensitivity of this instrument is best, were not sufficiently represented. Instead, the training and test data consist of the ERA-Interim (ERA-I) reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF)40 and simulated TBs. The total number of samples is 24835, distributed over 8 virtual stations (certain grid points) north of 84.5° N and a period from 2001-01-01 to 2017-12-31 with data samples at 00, 06, 12, and 18 UTC41. A subset of 12 years from the entire data set, which has been provided by the instrument manufacturer RPG, has been randomly selected as training and the remaining 5 years as test data. Each double side band frequency of the simulated G-band TBs has been averaged to be comparable to the measurements of MiRAC-P.

To obtain a more robust result from the training, we performed the training and evaluation (with the test data) 20 times with different random number seeds. The 20 random seeds were obtained by producing a set of 20 random numbers that lie between 0 and 1000 (boundaries have been chosen arbitrarily). At the beginning of each loop iteration, the seeds of numpy’s and tensorflow’s random number generators were set to the respective random value. This random value affects the choice of training and test years because a permutation of an index ranging over all years (0–16) defines which ones are selected for training and testing. Of the 17 permuted indices, the first 12 mark the training years and the last 5 the test years. For example, the test data can be 2002, 2006, 2007, 2010, 2016 with the remaining years being used for training. Furthermore, the initialization of the weights in the NN is affected by the seed of tensorflow’s random number generator. As for HATPRO, we also added random Gaussian noise to the synthetic TBs with a strength of 0.75 K for the G-band channels, 4.2 K for the 243 GHz, and 4.5 K for the 340 GHz channel, as recommended by the manufacturer. The higher noise for the two window channels reduces their weights in the retrieval and therefore diminishes the impact of signals from sources other than water vapour, such as cloud liquid emission or radiation scattered at ice particles. In correspondence with RPG, we chose the input vector of the NN to consist of all MiRAC-P TBs of a time step and the cosine and sine of the day of the year as additional information. The input was scaled to a feature range of −3 to 1 using the MinMaxScaler of the sklearn.preprocessing module. The input layer is connected to the only hidden layer, which has 32 nodes, with an exponential activation function. A linear activation function then links the hidden with the output layer, which only consists of the retrieved IWV. All layers are fully connected.
The kernels of the layers are initialized with the default Glorot uniform distribution. Similar to a regression approach, the goal of the training procedure is to adapt the weights of the NN to minimize a loss function that evaluates the predicted with the target IWV. In our case, the mean squared error is used as loss function, minimized with the Adam optimizer42. The maximum number of training epochs (number of times the entire training data is cycled through) is 100 with a batch size (number of samples to estimate error gradient before weights are updated) of 64.
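The structure described above can be sketched with plain NumPy. This is a hedged illustration and not the published script: it reproduces the year split by permutation, the input vector (eight TBs plus cosine and sine of the day of the year), the scaling to the range −3 to 1, and an untrained forward pass through one hidden layer of 32 nodes with Glorot uniform kernels. The function names, the seed, the synthetic TBs, and the phase convention for the day of the year are assumptions; the training itself (Adam, mean squared error, batch size 64) is not reproduced.

```python
import numpy as np

N_HIDDEN = 32  # size of the single hidden layer stated in the text

def build_input(tb, day_of_year, days_in_year=365.25):
    """Stack TBs (N, 8) with cosine and sine of the day of the year -> (N, 10)."""
    phase = 2.0 * np.pi * np.asarray(day_of_year) / days_in_year  # assumed convention
    return np.column_stack([tb, np.cos(phase), np.sin(phase)])

def minmax_scale(x, lo=-3.0, hi=1.0):
    """Per-feature min-max scaling to [lo, hi] (same formula as sklearn's MinMaxScaler)."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return lo + (x - x_min) * (hi - lo) / (x_max - x_min)

def glorot_uniform(rng, fan_in, fan_out):
    """Glorot uniform kernel: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def forward(x, w1, b1, w2, b2):
    """Exponential activation in the hidden layer, linear output (one IWV per sample)."""
    hidden = np.exp(x @ w1 + b1)
    return (hidden @ w2 + b2).ravel()

rng = np.random.default_rng(42)  # illustrative seed, not one of the 20 seeds of the study

# Year split: permute the 17 year indices, first 12 for training, last 5 for testing.
years = np.arange(2001, 2018)
perm = rng.permutation(years.size)
train_years, test_years = years[perm[:12]], years[perm[12:]]

# Synthetic inputs (hypothetical values, not the training data set).
n = 200
tb = rng.uniform(150.0, 280.0, size=(n, 8))
doy = rng.integers(1, 366, size=n)
x = minmax_scale(build_input(tb, doy))

# Untrained network: weights are only initialized, so the output is not a valid IWV.
w1, b1 = glorot_uniform(rng, x.shape[1], N_HIDDEN), np.zeros(N_HIDDEN)
w2, b2 = glorot_uniform(rng, N_HIDDEN, 1), np.zeros(1)
iwv_untrained = forward(x, w1, b1, w2, b2)
```

Note that the scaling here is fitted and applied to the same array for brevity; in practice the scaling parameters would be fitted on the training data and reused for the test data and the observations.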

During the optimization process, the mean squared error of the test data is monitored to avoid overfitting. Once the test loss did not improve for at least 20 epochs, the training was stopped and the weights that resulted in the lowest test loss were saved. As mentioned before, the training procedure was performed with 20 different randomly chosen seeds to assess the robustness of the NN. Hence, we get the mean and spread of the retrieval performance, quantified by the standard deviation (see Eq. (2)), from the test data (0.55 ± 0.03 kg m−2). This value is also noted as a comment in each published retrieval file of the MiRAC-P43 and computed as the square root of the bias corrected test loss

$$\widetilde{\sigma }=\sqrt{\frac{1}{{\rm{N}}}\mathop{\sum }\limits_{k=1}^{{\rm{N}}}{\left(IW{V}_{{\rm{pred}},k}-Bias-\hat{IW{V}_{k}}\right)}^{2}}$$
(2)

with $$\hat{IW{V}_{k}}$$ being the test data and IWVpred,k the predicted IWV of the k-th sample. The bias is the mean difference between the target (in this case, ERA-I) and predicted IWV. After training, the model is applied to the observed TBs from MiRAC-P with the random seed that produced the lowest overall test loss (seed value: 558).
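Eq. (2) can be evaluated in a few lines. The sketch below assumes that the bias is taken as the mean of predicted minus target IWV, which is the sign convention that makes the subtraction in Eq. (2) remove the systematic part of the difference; the function name and the sample values are hypothetical.

```python
import numpy as np

def bias_corrected_std(pred, target):
    """Return (bias, sigma) following Eq. (2).

    bias  = mean(pred - target)   (assumed sign convention)
    sigma = sqrt(mean((pred - bias - target)**2))
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    bias = np.mean(pred - target)
    sigma = np.sqrt(np.mean((pred - bias - target) ** 2))
    return bias, sigma

# Small hand-checkable example in kg m-2 (illustrative numbers only).
iwv_pred = np.array([1.1, 2.3, 2.9, 4.5])
iwv_target = np.array([1.0, 2.0, 3.0, 4.0])
bias, sigma = bias_corrected_std(iwv_pred, iwv_target)
rmse = np.sqrt(np.mean((iwv_pred - iwv_target) ** 2))
```

With this definition the root mean squared error satisfies rmse² = bias² + sigma², which is a convenient consistency check when all three quantities are reported.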

## Data Records

In this section, the data for the retrieval developments, the measured TBs, and retrieved products are presented for both HATPRO and MiRAC-P. The files for the retrieval development28,30,41 have been uploaded to Zenodo, while the remaining files43,44,45,46 have been published on PANGAEA. All data files are in netCDF format and summarized in Table 1.

The retrieval training data for HATPRO30 consists of one file that contains the entire training and test data for the retrieval of temperature (variable name in the file: ta) and humidity (hua) profiles, IWV (prw), and LWP (clwvi) from TBs (tb) measured by HATPRO. The data set is composed of meteorological observations from radiosondes, and simulated TBs. Elevation angles (ele) lower than 90° are only needed for the boundary layer temperature profile. Nomokonova et al.29 created the regression coefficients28 for zenith temperature (tze) and humidity (hze) profiles, boundary layer temperature (tel) profiles, and for IWV (iwv) and LWP (lwp) with this training data set.

The retrieval training data for MiRAC-P41 has been provided by the manufacturer RPG and consists of one file that contains the entire training and test data for the retrieval of IWV (prw) from TBs (tb) measured by the MiRAC-P. The sine and cosine of the day of the year, computed from the time variable, are also included. The outline of the data set has been given in the previous section.

The HATPRO TB data set44 contains daily files of atmospheric radiance measured as TBs (tb) during zenith (file name contains mwr00) and elevation (file name contains mwrBL00) mode. The retrieved products from HATPRO TBs include daily files of IWV (prw), LWP (clwvi), temperature (ta) and humidity (hua) profiles45. Temperature profiles have been retrieved from both zenith (filename contains mwr00) and elevation (filename contains mwrBL00) modes. The uncertainties of the variables are denoted by the expected standard error (prw_err, clwvi_err, hua_err, ta_err). The measured and retrieved data cover the period 2019-10-19 to 2020-10-02. Flag values indicate the quality of the data. The latitude and longitude coordinates of both instruments have been taken from RV Polarstern track data47,48,49,50,51.

The MiRAC-P TB data set46 is likewise structured as daily files of atmospheric radiation measured as TBs (tb). The TBs of the double side band frequencies (G-band) are averaged and labeled with the upper part of the band (e.g., 190.81 GHz instead of 183.31 ± 7.5 GHz). Similar to the training data41, the sine and cosine of the day of the year are included for the NN retrieval. The retrieved IWV (prw)43 from MiRAC-P TBs is also compiled into daily files. The IWV uncertainty computed from the retrieval test data is noted as a comment to the retrieved variable and is also given in three categories (dry: [0,5), intermediate: [5,10), moist: [10,100) kg m−2).

## Technical Validation

In this section, we first discuss the accuracy of TBs and subsequently demonstrate the quality of the derived products (IWV, LWP, absolute humidity and temperature profiles) by comparing them, where possible, to radiosonde observations. Since there is no direct measurement for LWP, we refer to past studies that show the quality of LWP derived from HATPRO19,52. Additionally, we compare our LWP with that from the Atmospheric Radiation Measurement (ARM) research facility MWR. The codes to analyze the derived products and generate Figs. 1–5 are openly available39 (see also Table 3).

The retrieved temperature and humidity profiles, as well as the IWV, will be compared with radiosonde measurements that have been gathered during the MOSAiC expedition53. The radiosondes have been launched from RV Polarstern at least four times per day. We have converted the relative humidity to specific and absolute humidity by using the saturation water vapour pressure method suggested by Hyland and Wexler54. Then, we integrated the specific humidity over the pressure levels and divided by the standard gravitational acceleration to obtain IWV. For the comparison of the temperature and humidity profiles with HATPRO, we interpolated each radiosonde onto the height grid of HATPRO profiles. Radiosondes that did not reach at least 10 km altitude and that contained missing values have been rejected in the analysis (23 out of 1522). Drifts of the radiosondes with wind and uncertainties of the temperature, relative humidity, and pressure sensors, which are 0.2–0.4 K, 3–4%, and 0.6–1.0 hPa, respectively53, are error sources for the comparison with MWR data.
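The conversion chain from relative humidity to IWV can be sketched as follows. This is an illustration under stated assumptions and not the published analysis code: the saturation vapour pressure coefficients are those commonly quoted for the Hyland and Wexler formulation over liquid water and should be verified against the reference, relative humidity is assumed to be given with respect to liquid water, and the function names and physical constants other than the standard gravitational acceleration are choices made for this example.

```python
import numpy as np

G0 = 9.80665   # standard gravitational acceleration in m s-2
R_V = 461.5    # specific gas constant of water vapour in J kg-1 K-1 (assumed value)
EPS = 0.622    # ratio of the molar masses of water vapour and dry air (assumed value)

def e_sat_water(temp):
    """Saturation vapour pressure over liquid water in Pa for temp in K.

    Coefficients as commonly quoted for Hyland and Wexler; verify before use.
    """
    temp = np.asarray(temp, dtype=float)
    return np.exp(-0.58002206e4 / temp + 0.13914993e1 - 0.48640239e-1 * temp
                  + 0.41764768e-4 * temp**2 - 0.14452093e-7 * temp**3
                  + 0.65459673e1 * np.log(temp))

def humidity_from_rh(temp, pres, rh):
    """temp in K, pres in Pa, rh in %. Returns specific (kg kg-1) and absolute (kg m-3) humidity."""
    e = 0.01 * np.asarray(rh, dtype=float) * e_sat_water(temp)  # vapour pressure in Pa
    q = EPS * e / (pres - (1.0 - EPS) * e)
    rho_v = e / (R_V * np.asarray(temp, dtype=float))
    return q, rho_v

def iwv_from_profile(q, pres):
    """IWV in kg m-2 as (1 / g) times the integral of q over pressure (trapezoidal rule)."""
    q = np.asarray(q, dtype=float)
    dp = -np.diff(np.asarray(pres, dtype=float))  # positive when pressure decreases upwards
    return np.sum(0.5 * (q[1:] + q[:-1]) * dp) / G0

# Idealized profile: constant specific humidity of 1 g kg-1 between 1000 and 500 hPa.
pres = np.linspace(100000.0, 50000.0, 51)
iwv_const = iwv_from_profile(np.full(pres.size, 0.001), pres)
q_sat, rho_sat = humidity_from_rh(273.15, 100000.0, 100.0)
```

For the interpolation onto the HATPRO height grid, a one-dimensional linear interpolation such as `np.interp(height_grid, height_sonde, values_sonde)` would be one possible choice.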

### Brightness temperatures

Before the retrievals were applied, the quality of the measured TBs was checked following the procedure suggested by Löhnert et al.23. This involved the flagging of time steps when the rain flag is set, when the sun is within ±7° (elevation and azimuth) of the line of sight of the instrument, when TBs fall outside the range 2.7–330.0 K, and when a receiver sanity check fails. The receiver sanity check is based on status flags of an internal procedure implemented by the manufacturer RPG in the housekeeping files of the MWRs, which also takes the receiver stability into account. Besides automated checks, a manual inspection of the TB data was performed to flag those time steps that show obvious artifacts not related to atmospheric signals (e.g., the crane at the bow of RV Polarstern causing sudden leaps in the TBs). In the following examinations, only time steps with good quality (flag = 0 or nan) have been used. The dates when the MWRs were calibrated with liquid nitrogen19, to ensure the absolute accuracy of the TBs, are given in Fig. 1. On 2019-10-19, 06:30 UTC, the first calibration of MiRAC-P was carried out but yielded values that differed significantly from previous tests or expectations because the calibration integration time exceeded the maximum value supported by the software. Therefore, the calibration was repeated on 2019-10-22, 05:40 UTC. The MiRAC-P did not require as many calibrations as HATPRO because it showed a negligible drift of TBs over time, whereas HATPRO is a standardized instrument recommended to be calibrated every 3 months19. Slight jumps in the retrieved data can be found around calibration times. For example, the most noticeable and concurrently the highest absolute jump in IWV is 0.3 kg m−2 on 2020-03-01, 11:00 UTC in the HATPRO data.
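The automated checks listed above can be combined into a single boolean mask, as in the following sketch. It is not the processing code of the data set: the function name, the argument layout, and the way the sun position and the receiver status are passed in are assumptions, and the flag encoding of the published files is not reproduced.

```python
import numpy as np

def tb_quality_mask(tb, rain_flag, sun_elev, sun_azi, view_elev, view_azi,
                    receiver_ok, sun_tol=7.0, tb_min=2.7, tb_max=330.0):
    """Return True for time steps that should be flagged.

    tb: (n_time, n_channel) in K; angles in degrees; rain_flag and receiver_ok per time step.
    """
    tb = np.asarray(tb, dtype=float)
    # A time step is flagged if any channel is outside the valid range (NaN also counts).
    out_of_range = np.any(~((tb >= tb_min) & (tb <= tb_max)), axis=1)
    # Azimuth difference wrapped to [-180, 180) so that 358 deg and 0 deg are 2 deg apart.
    d_azi = (np.asarray(sun_azi, dtype=float) - view_azi + 180.0) % 360.0 - 180.0
    d_elev = np.asarray(sun_elev, dtype=float) - view_elev
    sun_in_beam = (np.abs(d_elev) <= sun_tol) & (np.abs(d_azi) <= sun_tol)
    return (np.asarray(rain_flag, dtype=bool) | sun_in_beam | out_of_range
            | ~np.asarray(receiver_ok, dtype=bool))

# Four illustrative time steps: good, TB out of range, rain, sun close to the line of sight.
tb = np.array([[250.0, 260.0], [1.0, 260.0], [250.0, 260.0], [250.0, 260.0]])
flagged = tb_quality_mask(
    tb,
    rain_flag=[0, 0, 1, 0],
    sun_elev=[10.0, 10.0, 10.0, 28.0],
    sun_azi=[100.0, 100.0, 100.0, 358.0],
    view_elev=30.0,
    view_azi=0.0,
    receiver_ok=[True, True, True, True],
)
```

The manual inspection mentioned in the text cannot be automated this way and would have to be merged into the mask separately.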

### Derived products

After applying the retrieval algorithms, we checked whether the meteorological quantities lie within a reasonable range. LWP must be within [−200, 3000] g m−2, IWV in [0.0, 100.0] kg m−2, temperature in [180.0, 330.0] K, and absolute humidity in [−0.5, 30] g m−3; otherwise a flag value was set. The lower ends of the thresholds for LWP and absolute humidity allow for slightly negative values that might result from the regression. For LWP, a further processing step is done. Potential offsets in LWP can be partly corrected using a clear-sky offset correction. Under clear-sky, i.e. here liquid-free, conditions, the LWP should be zero. To determine if a scene is liquid-free, the standard deviation of LWP within a 2-min time interval was analyzed. If this value is below a certain threshold, we assume that no liquid occurs. The threshold depends on the instrument and climate of the location. Based on visual inspection of the derived LWP and also cloud radar reflectivity, the best offset correction was achieved with a LWP standard deviation threshold of 1.5 g m−2 for almost the entire MOSAiC period. Only on 2020-07-10, 2020-07-11, and 2020-07-12 did we use 0.9 g m−2 because the other value resulted in highly negative LWP. If all 2-min intervals within a 20-min time window indicate liquid-free conditions, the mean value of the retrieved LWP is calculated and subtracted from the original values. For cloudy periods, the estimated offset values during clear-sky periods are linearly interpolated and subtracted from the retrieved LWP.
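The clear-sky offset correction can be sketched as follows. This is a simplified illustration and not the processing code: it assumes gap-free 1 s sampling, non-overlapping 20-min windows split into ten 2-min blocks, and constant extrapolation of the offset before the first and after the last clear window. The function name and the synthetic LWP series are hypothetical.

```python
import numpy as np

def lwp_offset_correction(time_s, lwp, std_thresh=1.5, sub_len=120, win_len=1200):
    """Subtract a clear-sky offset from LWP (g m-2) sampled once per second.

    A window of win_len samples counts as liquid-free if the standard deviation of
    every block of sub_len samples inside it is below std_thresh.
    """
    time_s = np.asarray(time_s, dtype=float)
    lwp = np.asarray(lwp, dtype=float)
    n_sub = win_len // sub_len
    centres, offsets, clear = [], [], []
    for w in range(lwp.size // win_len):
        sl = slice(w * win_len, (w + 1) * win_len)
        blocks = lwp[sl].reshape(n_sub, sub_len)
        if np.all(blocks.std(axis=1) < std_thresh):
            centres.append(time_s[sl].mean())
            offsets.append(lwp[sl].mean())
            clear.append(sl)
    if not centres:
        return lwp.copy()  # no clear-sky window found, nothing to correct
    # Cloudy periods: linear interpolation between the clear-sky offsets.
    offset = np.interp(time_s, centres, offsets)
    # Clear-sky windows: subtract the window mean itself.
    for sl, value in zip(clear, offsets):
        offset[sl] = value
    return lwp - offset

# Synthetic hour: clear (offset 10), cloudy, clear (offset 20); illustrative values only.
rng = np.random.default_rng(1)
time_s = np.arange(3600.0)
lwp = np.empty(3600)
lwp[:1200] = 10.0 + rng.normal(0.0, 0.1, 1200)
lwp[1200:2400] = 100.0 + 50.0 * np.sin(2.0 * np.pi * time_s[1200:2400] / 300.0)
lwp[2400:] = 20.0 + rng.normal(0.0, 0.1, 1200)
lwp_corrected = lwp_offset_correction(time_s, lwp)
```

Missing values and irregular time steps, which occur in real data, would need additional handling that is omitted here.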

#### Integrated water vapour

The MOSAiC expedition gave the opportunity for high quality water vapour measurements in the central Arctic for an entire year. This makes it possible to capture the vast contrasts between winter (polar night) and summer (polar day). The contrast is clearly reflected in the IWV time series over all five MOSAiC expedition legs (measurement periods with a certain scientific crew) from both MWRs and the radiosonde data (see Fig. 1). In winter, the net outgoing longwave radiation and missing energy input from the sun can cause temperatures to drop to values below −35 °C55, making the air extremely dry due to the Clausius-Clapeyron relation. IWV is frequently below 4 kg m−2 from December 2019 to mid-April 2020 and can even be as low as 0.8 kg m−2 (February and March 2020). Only during occasional storms does the IWV peak above 5 kg m−2 (e.g., mid-February 2020). As soon as the melt season commences in late spring (May 2020), the IWV shows much higher values (up to 30 kg m−2) and a greater variability on synoptic scales (few days). In general, when merely considering the time series, all three data sets capture the extreme differences between winter and summer very well, demonstrating the capability of the MWRs to capture the full range of IWV conditions. During synoptic events, such as cold air outbreaks or moist air intrusions, the benefit of the MWRs compared to the radiosondes is obvious. The MWRs capture the temporal evolution of IWV much better with their resolution of 1 s than the radiosondes, which were mainly launched four times a day during the expedition. IWV variabilities, gradients and extreme values, of which the latter might be missed by radiosondes, can be resolved at time scales of minutes or even seconds56. The extraordinarily strong moist air intrusion that occurred in mid-April 2020 is shown in greater detail in the Usage Notes as an example of the retrieved products.

To analyze the differences between radiosondes and the MWRs, the data sets are displayed against each other in Fig. 2. For the comparison, the MWR data has been averaged over 15 minutes starting from the radiosonde launch times. The standard deviations of these 15-minute periods are shown as error bars and indicate the noise but also the variability of the retrieved products. When we omit radiosondes that failed the quality check (as noted above) and MWR data where the flag value does not indicate good quality, a total of 1327 (1404) radiosonde launches are left to compare to HATPRO (MiRAC-P) data. From Fig. 2, the complementary nature of HATPRO and MiRAC-P is visible. The MiRAC-P agrees better with radiosondes in dry conditions compared to HATPRO, which indicates the superior sensitivity of the strong G-band water vapour absorption line. To point out the complementary precision of MiRAC-P and HATPRO, Table 2 summarizes the standard deviations (computed as in Eq. (2), but with $$\hat{IW{V}_{k}}$$ representing the radiosonde and IWVpred,k the MWR), biases, and root mean squared errors with respect to the radiosonde IWV for three IWV classes (dry: [0,5), intermediate: [5,10), moist: [10,100) kg m−2). On average, HATPRO shows a bias of 0.35 kg m−2 for IWV smaller than 5 kg m−2 (see Table 2). Below 3.5 kg m−2, the bias ranges from 0.25 to 0.75 kg m−2. Here, higher biases occur in the drier conditions (lower IWV). Due to the superior sensitivity of MiRAC-P in dry conditions, a bias nearly three times lower (0.12 kg m−2 instead of 0.35 kg m−2) can be seen for IWV smaller than 5 kg m−2. In the dry regime, the MiRAC-P features a considerably lower standard deviation (0.08 kg m−2) than HATPRO, which shows 0.19 kg m−2. 
Even in the range 5–10 kg m−2, the majority of the MiRAC-P data denotes differences to the radiosondes within [−0.25, +0.25) kg m−2 resulting in a bias of 0.0 kg m−2, while the standard deviations of both MWR retrievals are similar (≈0.3 kg m−2) in that IWV range. When the IWV is greater than 10 kg m−2, the retrieved IWV from MiRAC-P starts to scatter because the atmosphere becomes opaque to the G-band channels close to the absorption line. In other words, these channels become saturated and an increase in IWV does not change the TB any longer (e.g.57,58). The higher the IWV, the more channels further away from the absorption line are affected by this saturation effect. The radiative transfer simulations of the training data have shown that the 183.31 ± 7.5 and 243 GHz channels are the only frequencies that can still detect IWV increases through TB changes for IWV above 15 kg m−2. But in these frequencies and moist conditions, many TBs map to the same IWV so that no clear relation between the TBs and IWV can be inferred. This could explain the strong scattering of IWV from MiRAC-P when compared to the radiosonde measurements in moist conditions as seen in Fig. 2, resulting in a standard deviation of 1.47 kg m−2 (see Table 2). HATPRO shows the opposite behaviour for high IWV, having an uncertainty of 0.39 kg m−2, which is almost a factor of 4 lower than the uncertainty of MiRAC-P.
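The comparison procedure described above (15-minute averages starting at each launch time, then bias, standard deviation, and root mean squared error per IWV class) can be sketched as follows. This is not the published analysis code; the function names and the synthetic time series are hypothetical, and assigning each pair to a class by the radiosonde IWV is an assumption.

```python
import numpy as np

# IWV classes in kg m-2, as used in the text: lower bound inclusive, upper bound exclusive.
CLASSES = {"dry": (0.0, 5.0), "intermediate": (5.0, 10.0), "moist": (10.0, 100.0)}

def average_after_launch(mwr_time, mwr_iwv, launch_times, window_s=900.0):
    """Mean and standard deviation of MWR IWV within window_s seconds after each launch."""
    means = np.full(len(launch_times), np.nan)
    stds = np.full(len(launch_times), np.nan)
    for j, t0 in enumerate(launch_times):
        sel = (mwr_time >= t0) & (mwr_time < t0 + window_s) & np.isfinite(mwr_iwv)
        if sel.any():
            means[j] = mwr_iwv[sel].mean()
            stds[j] = mwr_iwv[sel].std()
    return means, stds

def class_statistics(mwr_iwv, sonde_iwv):
    """Bias, bias-corrected standard deviation, and RMSE of MWR minus radiosonde per class."""
    stats = {}
    for name, (lo, hi) in CLASSES.items():
        sel = (sonde_iwv >= lo) & (sonde_iwv < hi) & np.isfinite(mwr_iwv)
        if not sel.any():
            continue
        diff = mwr_iwv[sel] - sonde_iwv[sel]
        bias = diff.mean()
        stats[name] = {
            "n": int(sel.sum()),
            "bias": bias,
            "std": np.sqrt(np.mean((diff - bias) ** 2)),
            "rmse": np.sqrt(np.mean(diff ** 2)),
        }
    return stats

# Synthetic 1 s MWR series and two launches (illustrative values only).
mwr_time = np.arange(7200.0)
mwr_iwv = 2.0 + 0.001 * mwr_time
launch_times = np.array([0.0, 3600.0])
sonde_iwv = np.array([2.0, 6.0])
mwr_mean, mwr_std = average_after_launch(mwr_time, mwr_iwv, launch_times)
stats = class_statistics(mwr_mean, sonde_iwv)
```

The standard deviation returned by `average_after_launch` corresponds to the error bars described for Fig. 2, combining retrieval noise and real variability within the 15 minutes.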

When considering the entire IWV range, the bias of the MiRAC-P (HATPRO) product is −0.11 kg m−2 (0.02 kg m−2), with a standard deviation of 0.82 kg m−2 (0.46 kg m−2). Compared to the Global Navigation Satellite System (GNSS) IWV retrieval performed by Männel et al.59, who found a bias of 0.08 ± 0.04 kg m−2 and a root mean squared error of 1.47 kg m−2, the two MWRs yield more precise estimates of IWV, and HATPRO also a higher accuracy, when considering the entire range. The ARM research facility also derived IWV from their two-channel MWR, which was also located onboard RV Polarstern60. Their retrieval (MWRRET) combines a statistical and physical approach (Optimal Estimation), which also takes surface observations and radiosonde IWV into account, to generate a best estimate IWV data set (for a detailed description, see ref. 61). ARM’s MWR provides an IWV record with a lower temporal resolution (26 seconds on average) and a roughly 20-day long data gap in August 2020. Both the lower resolution and the gap have to be taken into account when comparing their product with ours from HATPRO and MiRAC-P. Reducing the radiosonde and the three MWR data sets to a common time grid, where all quality flags indicate good quality, leaves us with 813 radiosondes to compare. The MWRRET best estimate yields a bias (standard deviation) of −0.21 kg m−2 (0.44 kg m−2), while our products show −0.01 kg m−2 (0.44 kg m−2) and −0.08 kg m−2 (0.75 kg m−2) for HATPRO and MiRAC-P, respectively. Below 5 kg m−2 (10 kg m−2), the performance of our products is especially good, having a standard deviation of 0.19 kg m−2 (0.29 kg m−2) and 0.07 kg m−2 (0.28 kg m−2) for HATPRO and MiRAC-P, respectively, while it is 0.40 kg m−2 (0.48 kg m−2) for the MWRRET best estimate.

#### Liquid water path

The LWP is an important quantity for the evaluation of reanalyses and radiation balance. HATPRO, MiRAC-P and the two MWRs from the Atmospheric Radiation Measurement research facility (of which the three-channel MWR did not operate during most of the time16) are the only instruments onboard RV Polarstern capable of retrieving LWP in all cloud conditions. Throughout the MOSAiC expedition, the LWP features a distinct seasonal variability (see Fig. 3) with seasonally averaged daily mean LWP of 8, 25, 91, and 40 g m−2 for winter (December–February), spring (March–May), summer (June–August), and autumn (September–November), respectively. The variability of the daily mean within a season, computed as seasonal standard deviations of the daily mean LWP, also shows an annual cycle with 15, 38, 67, and 49 g m−2 for winter, spring, summer, and autumn, respectively. In summer, daily average LWP can exceed 250 g m−2. This seasonality was also seen at Ny-Ålesund by Nomokonova et al.29. Higher values of LWP frequently occur in conjunction with high IWV because the moister air masses tend to generate more or deeper clouds. Former studies have demonstrated the quality of the retrieved LWP, which has an uncertainty of merely 14–23 g m−2 19,52. In winter, when LWP is frequently within the uncertainty range (see Fig. 3), the LWP estimates must be considered with care. Although retrieval noise might still result in slightly negative LWP, the clear-sky offset correction reduced LWP biases. Comparing LWP derived from HATPRO with the best estimate from ARM’s two-channel MWR (MWRRET)60, we find that more than 81% of the data values agree within ±17.5 g m−2 and 93% within ±27.5 g m−2. For the comparison, both data sets have been merged onto the same time grid due to differences in temporal resolution and data availability (as for IWV, see above).

The data availability in Fig. 3 shows the fraction of non-flagged (flag = 0 or nan) values to the total number of data points of a day. During MOSAiC leg 1 (2019-09-20–2019-12-13), the internal sanity check of HATPRO frequently indicated a problem with the receiver of the V-band channels. The problem did not persist beyond the calibration on 2019-12-14, 18:30 UTC, after which the fraction of flagged values decreased significantly.

#### Humidity profiles

Humidity profiling from HATPRO data is more challenging than estimating the integrated amount because of the low information content (usually 1 to 3 independent pieces of information23). The dry conditions of the Arctic and the frequent occurrence of strong vertical gradients and moisture inversions62,63,64,65 impede it further. The retrieved absolute humidity profile may still contain slightly negative values at high altitudes because of retrieval noise, but flags are set for values below −0.5 g m−3.

As for the comparison of IWV from HATPRO and radiosondes, the HATPRO data has been averaged over 15 minutes, starting from each radiosonde launch time, to evaluate the retrieved absolute humidity profiles. Systematic differences (bias) are expressed as the mean difference of absolute humidity over time on each height level (Δρv = ρv,HATPRO − ρv,radiosonde) in absolute and relative terms (Fig. 4a). The latter has been normalized by the mean absolute humidity from radiosondes after averaging. The standard deviation of absolute humidity with radiosonde data as reference is also given in relative and absolute terms. As above, the relative term of absolute humidity standard deviation shown in Fig. 4b has been computed by normalization with the mean radiosonde absolute humidity after determining the absolute term. While this procedure (normalizing after averaging) may not capture the individual relative differences for each radiosonde (normalizing before averaging), it is sufficient to give an idea of the relative uncertainty of the retrieved humidity profiles. We computed the bias and standard deviation for each MOSAiC leg so that Fig. 4 displays the mean (standard deviation) of these quantities over the legs as black lines (shading).
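The profile statistics with normalization after averaging can be written compactly. The following sketch assumes that the collocated HATPRO and radiosonde profiles are already on the same height grid and stored as arrays of shape (number of radiosondes, number of height levels); the function name and the sample values are hypothetical, and the per-leg grouping is left out.

```python
import numpy as np

def profile_statistics(rho_mwr, rho_sonde):
    """Bias and standard deviation per height level, in absolute and relative (%) terms.

    rho_mwr, rho_sonde: absolute humidity in g m-3, shape (n_sondes, n_heights).
    The relative terms are normalized by the mean radiosonde profile after averaging.
    """
    diff = rho_mwr - rho_sonde                      # HATPRO minus radiosonde
    bias = np.nanmean(diff, axis=0)
    std = np.sqrt(np.nanmean((diff - bias) ** 2, axis=0))
    reference = np.nanmean(rho_sonde, axis=0)
    return bias, std, 100.0 * bias / reference, 100.0 * std / reference

# Two profiles on two height levels (illustrative values only).
rho_sonde = np.array([[2.0, 1.0], [4.0, 1.0]])
rho_mwr = np.array([[2.5, 0.9], [4.1, 1.1]])
bias, std, rel_bias, rel_std = profile_statistics(rho_mwr, rho_sonde)
```

Applying the function separately to the radiosondes of each leg and then taking the mean and standard deviation over the legs would give the lines and shading described for Fig. 4.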

In the lowest 1.5 km, HATPRO overestimates the absolute humidity with the highest bias (0.6 g m−3 or 25%) at the surface. Further above, the bias becomes negative, up to about −0.1 g m−3 at 2–3 km height, and approaches zero in the remaining atmospheric column (up to 10 km). This is a typical behaviour when humidity inversions or strong moisture gradients are smoothed out in the retrieved profile. The integrated humidity content (IWV) stays free of bias when a positive bias at the surface is balanced by a negative one at greater heights. In winter, when the humidity is low, the relative bias is usually higher than in summer.

At the surface, the standard deviation is 0.59 g m−3 or 25% in absolute or relative terms, respectively (see Fig. 4b). Because of the general decrease of absolute humidity with height, the standard deviation in absolute values also approaches zero with values of 0.41 g m−3 at 2 km height and 0.02 g m−3 at 8 km height. However, the relative standard deviation increases to 27% and 58% at those heights, respectively. Therefore, above 5 km altitude, where the standard deviation is near 50%, the retrieved profile from HATPRO must be considered with care.

Past studies have found standard deviations of HATPRO-retrieved profiles relative to radiosonde profiles of 0.9–0.6 g m−3 in the lowest 2 km66,67, which are slightly higher than those found here (0.59–0.41 g m−3). However, these studies were carried out with data from the mid-latitudes, where the water vapour load is much higher. Ebell et al.66 found a relative uncertainty of 12% in the lowest 2 km, while our analysis shows 22–27%. To reduce humidity profile uncertainties and improve the information content, we are thus working on a synergetic retrieval of IWV and humidity profiles combining the measurements of both HATPRO and MiRAC-P.

#### Temperature profiles

HATPRO temperature profiles from the zenith mode have been averaged in the same way as absolute humidity profiles for the comparison with radiosonde data. The measurements in boundary layer mode, performed only once every 30 minutes, were averaged over ±30 minutes around radiosonde launch times due to the lower sampling rate. Fig. 5 shows the bias and standard deviation profiles for both measurement modes. Shading indicates the variability over the MOSAiC legs as described in the previous section and also shown in Fig. 4. In the lowest 800 m, both modes show biases that change quickly with height (zenith: 1.2 to 2.0 to −2.8 K, boundary layer: −1.4 to 0.2 to −1.2 K). Radiative cooling over sea ice causes strong surface temperature inversions persisting almost throughout the entire winter10,68,69. The presence of low clouds may also generate inversions due to cloud top cooling64,70. In summer, when the solar energy is used to melt the snow and sea ice71, temperatures remain close to the freezing point at the surface despite the possible presence of warmer air masses aloft. Therefore, temperature inversions are weaker and located at greater heights in summer68. Although the rapid changes of temperature over a few hundred metres below 1 km altitude cannot be resolved by HATPRO, the boundary layer mode shows lower biases (standard deviations) than the zenith mode by up to 2 K (0.5 K). The standard deviation at 0–2 km height is 0.7–1.7 K (1.2–2.7 K) for the boundary layer (zenith) mode, which is higher than the values found by Löhnert and Maier26 (0.5–1.4 K). This is likely due to the nearly permanent presence of inversions in the low Arctic troposphere. The switching signs of the biases up to 2 km height suggest that, for the zenith mode, the inversions are smoothed out with a warm bias at their lower end and a cold bias at their upper end. Averaged over height and time, the boundary layer mode features a consistent cold bias.
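The two averaging windows used for the comparison with radiosondes (15 minutes starting at the launch time for the zenith mode, ±30 minutes around the launch time for the boundary layer mode) can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes time stamps stored as numpy `datetime64` values, profiles of shape (n_times, n_heights), and windows that include both end points.

```python
import numpy as np


def average_in_window(times, profiles, start, end):
    """Mean profile of all samples with start <= time <= end.

    Returns a NaN profile if no sample falls into the window.
    """
    mask = (times >= start) & (times <= end)
    if not mask.any():
        return np.full(profiles.shape[1], np.nan)
    return np.nanmean(profiles[mask], axis=0)


def zenith_average(times, profiles, launch):
    """Zenith mode: 15 minutes starting from the radiosonde launch time."""
    return average_in_window(times, profiles, launch,
                             launch + np.timedelta64(15, "m"))


def boundary_layer_average(times, profiles, launch):
    """Boundary layer mode: +/- 30 minutes around the launch time."""
    half_width = np.timedelta64(30, "m")
    return average_in_window(times, profiles,
                             launch - half_width, launch + half_width)
```

Whether the window end points are included has little effect for the densely sampled zenith mode, but it can decide whether one or two boundary layer scans enter the average.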

Above 2 km altitude, the bias of the zenith mode is smaller than that of the boundary layer scan (−1.0 to 0.4 K vs. −2.6 to 0.0 K). Up to about 4.5 km altitude, the standard deviations are similar, increasing from 1.4 to 1.8 K (see Fig. 5). At greater heights, the standard deviation of the zenith temperature profile increases (up to 2.5 K at 8 km) because information comes from more height levels at once (broader weighting functions), providing less distinctly resolved levels at these altitudes. At heights between 2 and 4 km, we found uncertainties similar to those of Löhnert and Maier26, who identified standard deviations of 1.4–1.7 K.

## Usage Notes

In this section, we give some recommendations on handling the MWR data and illustrate their capabilities in a case study. When importing either the TB data or the retrieved products, data where the flag is not 0 or nan must be considered with care (note: the Python library xarray converts the good quality indicator to nan while netCDF4 leaves it at the fill value 0). Importing the zenith temperature and humidity profiles from HATPRO over a long time period might result in large memory usage when using a library like numpy (one month of zenith temperature profiles results in roughly 100 million data points of type float32). Therefore, downsampling or the usage of a library like xarray, which compresses the data, is highly recommended. The comparison of the HATPRO temperature and humidity profiles with radiosonde measurements is just one example of downsampling. Regarding the IWV, MiRAC-P should be used for values lower than 5 kg m−2 and HATPRO for values greater than 10 kg m−2 to optimally exploit the data sets. A transition zone from MiRAC-P to HATPRO IWV could be established in the range 5–10 kg m−2, where both instruments work similarly well. For temperature profiles, a combination of zenith and boundary layer modes (0–2 km: boundary layer mode, 2–10 km: zenith mode) yields the best estimate.
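The recommendations above can be sketched in a few lines of numpy. The code is only an illustration under stated assumptions: all function and argument names are hypothetical, the arrays are assumed to be already read from the files, the linear weighting in the 5–10 kg m−2 transition zone is one possible choice that is not prescribed by the data set, and the mean of both IWV estimates is assumed as the quantity that decides the regime.

```python
import numpy as np


def mask_flagged(values, flag):
    """Set values to NaN unless the quality flag is 0 or NaN (good quality)."""
    flag = np.asarray(flag, dtype=float)
    good = np.isnan(flag) | (flag == 0)
    return np.where(good, values, np.nan)


def downsample_time(profiles, factor):
    """Block-average along the time axis; trailing samples that do not
    fill a complete block are dropped."""
    n_keep = (profiles.shape[0] // factor) * factor
    blocks = profiles[:n_keep].reshape(-1, factor, *profiles.shape[1:])
    return np.nanmean(blocks, axis=1)


def blend_iwv(iwv_mirac, iwv_hatpro, lower=5.0, upper=10.0):
    """MiRAC-P below `lower`, HATPRO above `upper` (both in kg m-2) and an
    assumed linear transition in between. NaN in either input yields NaN."""
    iwv_mirac = np.asarray(iwv_mirac, dtype=float)
    iwv_hatpro = np.asarray(iwv_hatpro, dtype=float)
    reference = 0.5 * (iwv_mirac + iwv_hatpro)  # assumption: regime indicator
    w_hatpro = np.clip((reference - lower) / (upper - lower), 0.0, 1.0)
    return (1.0 - w_hatpro) * iwv_mirac + w_hatpro * iwv_hatpro


def merge_temperature(height_m, temp_bl, temp_zenith, switch_height_m=2000.0):
    """Boundary layer mode up to `switch_height_m`, zenith mode above.
    Both profile arrays are assumed to share one height grid and to be
    matched in time, with height as the last axis."""
    return np.where(np.asarray(height_m) <= switch_height_m, temp_bl, temp_zenith)
```

Because the boundary layer mode is sampled far less often than the zenith mode, the two temperature data sets need to be brought to a common time grid before merging, for instance by block-averaging the zenith profiles as in `downsample_time`.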

The MOSAiC data policy requires a moratorium for the TB and retrieved data products until 2023-01-01. Only researchers who are part of the MOSAiC community will have access before that date.