Observation based climatology Martian atmospheric waves perturbation Datasets

The Martian atmospheric waves perturbation Datasets (MAWPD) version 2.0 is the first observation-based climatology dataset of Martian atmospheric waves. It contains climatology-gridded temperature, gravity waves, and tides spanning the whole Martian year. MAWPD uses the Data INterpolating Empirical Orthogonal Functions method (DINEOF) reconstruction method for data assimilation with the observational data from the Mars Global Surveyor (MGS), Mars Reconnaissance Orbiter (MRO), Mars Atmosphere and Volatile EvolutioN (MAVEN), Mars Pathfinder (MP), Mars Phoenix Lander (MPL), Mars Exploration Rover (MER) and Mars Express (MEX) temperature retrievals. The dataset includes gridded fields of temperature (Level 1 data) as well as the physical quantities of GWs (Level 2 data, amplitude, and potential energies), SPWs and tides (Level 2 data, amplitude, and phase). The MAWPD, based entirely on multiple reliable observations, provides climatological background atmospheric information of temperature and wave disturbances on Mars. The dataset is not only useful for observation-based scientific studies concerning Martian atmospheric waves, e.g., circulation, dust storms, and wave excitation mechanism, but also for cross-validating with model-based datasets or model results.


Methods
The MAWPD 29 is based on the grids established by Martian atmospheric temperature retrievals from instruments on multiple orbiters and landers, while the DINEOF reconstruction method is used to fill in the missing data of grids. The reconstructed grids are then used to calculate all the waves including tides, GWs, and SPWs.
Step1: Data acquisition and preparation of Multiple Spacecraft. To fully exploit the existing observation data, various spacecraft are used. Based on the temperature profiles of the radio occultation (RO) 45 and Thermal Emission Spectrometer (TES) [46][47][48][49] data of MGS 30,50,51 , the Mars Climate Sounder (MCS)'s nearly continuous limb viewing data of Mars Reconnaissance Orbiter (MRO) 31,32 , the Imaging Ultraviolet Spectrograph (IUVS)'s occultation data 52,53 of MAVEN 33,34 , the RO 54 and Spectroscopy for the Investigation of the Characteristics of the Atmosphere of Mars (SPICAM) 38,55 data of MEX 56 , the entry ACC data of MP 35,57 , MPL 36,58 , MER1 and MER 37,59 (in order of the number of profiles used), the climatology gridded temperature data is calculated.
In the dataset, the MGS/TES data including two hundred and thirty million TES profiles (both the nadirand limb-geometry) covering from Ls = 103.  they carry by polylines. The arrows connected to each instrument point to the data type (limb viewing data, nadir viewing data, radio occultation data, solar occultation data, stellar occultation data, and entry accelerometers data) they provide. The output data (MAWPD) is the reconstructed climatology gridded temperature data (Level 1 data) and the derived gravity waves, tides, and stationary planetary waves (Level 2 data).
www.nature.com/scientificdata www.nature.com/scientificdata/ retrieve temperature profiles. The nadir-geometry data have a relatively higher horizontal spatial resolution but its retrieval is limited to daytime regions and fails to cover the relatively cold polar cap, while the limb-geometry data has limited horizontal spatial resolution without geographic and time restrictions 60,61 . The MGS performed a 3.6-cm wavelength RO experiment with the signal transmitted by the spacecraft and received by a tracking station of the NASA Deep Space Network (DSN) 62 . The base of a typical RO profile is 500 m, the top is at the 10-Pa pressure level (40 km) with the sample spacing of 700 m, providing accurate measurement, especially during diametric occultation 30 .
Besides, nearly 5 million profiles from the MCS Derived Data Record (DDR) across a range of time from Ls = 111.3° of MY 28 to Ls = 38.9° of MY 36 continuously with time intervals less than the order of 10 −3 degree of Ls are used. The observation covers the altitude range from approximately 0 to 85 km and the local time (LT) is mainly between 1-5 and 13-17 due to the special in-track and cross-track or off-track observations of MCS. The two observation methods are performed alternately to get more LT coverage, which enables the coverage of six LTs in the lower and middle latitudes and seven or more LTs in the high latitudes 4 . Spatially, MCS is designed to acquire high vertical resolution, horizontally contiguous measurements (5 km vertical and 8.6 km horizontal at the limb 32 ). Its appropriate resolution and LT coverage are suitable for research of waves, having been conducted on tides 4,15,63 , GWs 64,65 , and SPWs 15,66 in several studies. The IUVS onboard MAVEN has also detected wave perturbations at altitudes between 20 and 140 km. We use 1193 temperature profiles in the middle atmosphere during the period from Ls = 314° MY 32 to Ls = 309.6° of MY 35 with a latitude range from 70° N to 80° S. As the geometric parameters of the tangential trajectory of the occultation were changing during the whole task, the variations obtained here include these geometrical changes as well as real atmospheric variations 53 68 . Despite the limited number of profiles, the high resolution of entry accelerometers data is beneficial to sustain the information of small-scale fluctuations like GWs. The trajectory reconstruction procedures of the entry accelerometers data of MP had enhanced its accuracy and can derive atmospheric temperature relatively consistent with the PDS archive without any other aerodynamic information 69 . The atmospheric density, pressure, and temperature of MPL 70 and MER 71 were calculated from the same reconstructed trajectory with MP.
Step2: Filtering out global dust storm activity. The entire premise of the dataset as a climatological average of all years requires that abnormal years must be excluded to ensure the universality of the results. Mars experiences regional dust storms, whose exact timing varies somewhat through the year, and interannually variable global dust storms. These storms affect temperatures by up to 60 K locally as well as gravity wave activity, as demonstrated by papers from the Mars Climate Sounder team or recently by researchers 72,73 .
In our present work to present interannual variability in a merged climatology, results are mainly affected by 25 and 28 years of global dust storms, and to a lesser degree by regional dust storms that occur in autumn and winter every year. To improve the issue, the second half (Ls 180-360) of the MY 25 and 28 are deleted to filter out global dust storm activity.
Regional dust storms are preserved due to its relatively slight impacts compared to the global dust storm and that it can happen throughout the year, so it is difficult to remove and does not make much sense to results after removal. In addition, wave activities during regional dust storms are meaningful to study so the removal of the global dust storms could also make sure that regional dust storm effects were not confounding.
Step3: Super observation and weighting of data from Multiple Spacecraft. The main issue here is to solve that how much certain datasets were weighted relative to others. For instance, the entry profiles of the MER spacecraft were used, and their ability to sample gravity wave activity was noted. However, these profiles are one per spacecraft. They would be a drop in the bucket on a single day compared to the > 1000 profiles per day provided by Mars Climate Sounder. If they are not somehow weighted, there would be no point in including them all. And indeed, the same concern would apply to radio occultation data. A thermal mapping instrument like TES or MCS makes two to three orders of magnitude more profiles than an occulting instrument. There is almost no point in using occulting instrument data if there is no weighting algorithm being applied; only the thermal mappers will matter. Therefore, the dataset needs weighting of data from Multiple Spacecraft.
This dataset is divided into two types of data (lander data and orbiter data) according to the spacecraft type, and weighted according to the data type. Lander's entry accelerometers data are given the highest weight due to their accuracy, i.e., when there are both lander and orbiter data at the same point (at specific Ls, Lt, Altitude, Latitude, Longitude), only the lander at that point is considered and not the orbiter. When there is only orbiter data at the same point, the results of the orbiter are considered. Data of the same type have the same weight here, that is, when there is both MCS and TES data at the same point, the two are averaged.
www.nature.com/scientificdata www.nature.com/scientificdata/ To accomplish the weighting of instruments according to the observation error, we create the super observations 44,74,75 for each grid. The super observation, assumed to be at the grid centre, is the weighted average of all types of inner observation. The larger the observation error, the smaller the weight of the observation. Specifically, for each grid, if the observation errors are the same, the super observation value is the mean observation value; or the observation values with smaller (larger) error would have greater (smaller) weight when averaging. The disagreements in the temperature between datasets that result from instrumental or sampling biases are handled in the process. The mean standard deviation in each grid is used as the super observation standard deviation, which is useful in determining the general quality of data and is used as a proxy for the super observation error of grid here. The super observation standard deviation of temperature in each grid is provided in the data file 'MAWPD_v2.0_T.nc' for users' reference.
The sources of the adopted observation error should be mentioned. The MCS, MGS/RO, MEX/RO, MEX/ SPICAM, MPL, and MER1/2 all provide 1-sigma uncertainty along with the temperature data, while the MP provides the 3-sigma uncertainty (easily converted to 1-sigma uncertainty). However, the TES observations in the PDS did not provide estimates of the observation uncertainties but selected best quality (estimated instrument-noise uncertainty < 1%) temperature data. As the TES nadir atmospheric temperature for pressure levels greater than 0.1 mbar must be between 100 and 300 K while the temperature below (tropospheric air temperature) would barely exceed 300 K, the instrument-noise is then < 3.0 K (300 K·1%). Considering the random error, it is reasonable to assign a TES observation error of 3.0 K to the super observations, as EMARS team did 44 .
Step4: DINEOF reconstruction and data integration. The DINEOF method is used in the MAWPD 29 to fill the missing data. This method is based on the Empirical Orthogonal Function (EOF) to track features of the Martian atmosphere with high spatial and temporal variability. While the EOFs are obtained by a Singular Value Decomposition (SVD) representation of the data matrix X (L × M × N). The L represents the length of vertical spatial dimension, M represents the length of horizontal spatial dimension and N represents the length of temporal dimension, e.g., a pixel of such data X is the data X l,m,n at m-th zone in the horizontal section and l-th altitude of X in the n-th time. As the DINEOF was designed to work with two-dimensional fields evolving in time, the reconstruction of the three-dimensional Martian atmosphere is done by calculation of the horizontal section of the three-dimensional atmosphere at each height on a section-by-section basis. These horizontal sections at the l-th altitude are recorded as X l . Consequently, the SVD of each X l is X I = UDV T . Let matrix I be the set of missing data points and m n I ( 0, 0) ∈ represents that there is no observation available or the observations are unreliable in X l,m0,n0 , then the element X l,m0,n0 corresponding to the flagged missing data at the l-th altitude of m0-th zone in the n0-th time could be replaced by the result of EOF decomposition with only the first N EOFs as the following equation: Repeat the replacement until the DINEOF reconstructions converge, which demands the variation of EOF decomposition amount obtained in 2 consecutive SVD less than 10 −8 . Since the reconstruction of the Martian atmosphere involves the reconstruction of a four-dimensional matrix of altitude, latitude, longitude, time in high resolution actually, the Lanczos method 25,26 was chosen to ensure the effectiveness of DINEOF application.
After reconstruction, the temperature data is integrated into a matrix (Ls × Lt × Altitude × Laitutde × Lon gitude) for followed calculation. It is worth mentioning that all temperature data used are reconstructed from the EOFs, but results in the grid points constrained by observation data would definitely be more reliable. Here, the number of instruments (I) and data point (D) in each grid are used to show how heavily the grid data is constrained by data, denoted by (I,D) as shown in the Fig. 2. For example, a grid detected by MCS and TES (I = 2) simultaneously with each contributing two data points (D = 2I = 4) has the (I,D) of (2,4), while a grid totally reconstructed from the EOFs without being constrained by data has the (I,D) of (0,0) (see the unobserved grids of (0,0) in the Fig. 2a,c).
The more data points in the grid point, the more heavily the grid is constrained by data. But with the same number of data points, the results in the grid with more instrument types are generally more reliable as long as its instrument error is not too large. We have saved the variable I (instrument number) and D (data point number) in the 'MAWPD_v2.0_T_I.nc' and 'MAWPD_v2.0_T_D.nc' , respectively.
Step5: Gravity waves calculation. In this paper, the relative perturbation and the potential energy of gravity waves are used to characterize the intensity of gravity wave activity. To get the value of gravity wave potential energies of each profile, it is necessary to extract the perturbations caused by gravity wave from background temperature firstly, given by ′ = − T T T where the wave-induced perturbations T′ are calculated by subtracting the background temperature T from the measured instantaneous temperature T.
The seventh-order polynomial fit, which has been widely applied to extracting the gravity wave in the terrestrial [76][77][78] and Martian atmosphere 79,80 with credibility, is used to obtain the background temperature in the dataset at first. Then, the perturbations are filtered using a 6th-order Butterworth filter with a bandpass width of 3-15 km to remove the effects of large-scale waves including tides and planetary waves. Considering that the vertical wavelength of gravity waves on Mars reported by previous studies does not exceed 15 km 10,80-82 , we then choose a vertical bandpass filter in the range of 3-15 km. In addition, data points where the potential energy is greater than 5000 J/kg or is negative [83][84][85] is deleted as outliers. The relative perturbations (δT T / ) of gravity waves are get by normalizing the filtered perturbations with the background temperature. (2023) 10:4 | https://doi.org/10.1038/s41597-022-01909-y www.nature.com/scientificdata www.nature.com/scientificdata/ The gravity wave potential energy is used to measure the magnitude of perturbations in the dataset, where N is Brunt-Väisälä frequency in Eq. (3), g is the acceleration of gravity, and c p is the specific heat at constant pressure, which is usually taken as 0.844 kJ/(kg·K) for the Martian atmosphere, Our work focuses on the extraction of low and intermediate-frequency gravity waves. The vertical wavelength scales of the gravity waves extracted from the reconstructed temperature profiles are in the range of 3-15 km due to band-pass filtering.
Step6: SPWs and Tides calculation. The SPWs and tides are calculated by 8,15,86 : where λ, Φ, p, and t LST are four dimensions of the temperature data (T), namely longitude, latitude, height, and universal time (LT at 0° longitude), respectively. While s is the zonal wavenumber and σ is the frequency (units of sol −1 ), thus T σ,s and ϕ σ,s are corresponding amplitude and phase of the wave (σ, s), the wavenumber m is calculated by m s σ = − . The modes of SPWs and tides were summarized well in charts that are worthy of reference in previous research 15,87,88 . The notation 'DW + number' or 'DE + number' is used to denote a westward-(W) or eastward-(E) propagating diurnal (D) tide with given zonal wavenumber. For semidiurnal oscillations, S replaces D. Thus, the diurnal westward propagating tides with the zonal wavenumber of 1 is recorded as DW1, while the semidiurnal eastward propagating tides with the zonal wavenumber 2 is recorded as SE2. The diurnal and semi-diurnal zonally symmetric (S) oscillations are denoted DS0 and SS0, respectively. The migrating tides are the westward propagating tides with equal zonal wavenumbers and the frequency (day −1 ) of the wave, e.g., the DW1 and SW2. In addition, we use 'SPW + number' as a substitution for the stationary planetary wave, and the number is the zonal wave number. A spectral decomposition method called FFT (fast Fourier transform) is used for each wave mode to extract the climate state wave from the Martian temperature. Due to the regularly distributed dimensions of the model results, the FFT method could obtain each wave (σ, s) 89 .

Data Records
The MAWPD 29 has been deposited in the Dryad open-access world data centre, which is publicly available and can be downloaded at https://doi.org/10.5061/dryad.59zw3r2bh. The 'MAWPD_XX.nc' file contains three kinds of variables including 'dimension variables' , 'Five dimensional variables' , and 'Three dimensional variables' (see also the Brief Data Descriptor in the supplementary file for further information).
Three dimensional variables 'amp_T_XXX' and 'phs_T_XXX' are the daily and zonally mean amplitude and phase of the specific tidal wave. For example, 'amp_T_DW1' and 'phs_T_DW1' are the daily and zonally amplitude and phase of the migrating diurnal tide (DW1). Three-dimensional variables are calculated by the Five-dimensional variable 'Temperature' . They lost two dimensions (Lt and Longitude) due to the tidal calculation described in the Eq.  The Tides in the NetCDF file include the amplitude and phase of wave number 1 to 5 westward propagating migrating diurnal tide (DW1 to DW5), wave number 1 to 5 eastward propagating migrating diurnal tide (DE1 to DE5), wave number 1 to 5 westward propagating migrating semidiurnal tide (SW1 to SW5), wave number 1 to 5 eastward propagating migrating semidiurnal tide (SE1 to SE5) and diurnal zonally symmetric tide (DS0). While the SPWs products include the amplitude and phase of wave number 1 to 5 stationary planetary waves. In  www.nature.com/scientificdata www.nature.com/scientificdata/ addition, the amplitude and phase of zonal mean temperature are calculated for reference. The GWs perturbation and potential energy representing the intensity of GWs activity are also stored as Level 2 data in the dataset.

technical Validation
It should be pointed out that the distribution of Martian atmospheric observation data is very uneven in space-time. The data is dense in some time and space but scarce in others (e.g., the MCS data is mainly between 1-5 and 13-17 due to its cross-track or off-track observation method). Fully capturing the signature of the available data, the DINEOF algorithm extracts the physical characteristics of the data in the space and time domains through the empirical orthogonal function, effectively reconstructing the missing physical observation data with the optimal number of modes. Consequently, the DINEOF could fill in the missing point when there is a large gap in the data with accuracy as long as the total observation rate exceeds 20% (29.68% in the dataset).
Effects of DINEOF reconstruction. In order to obtain the climate state of the Martian atmosphere with limited coverage of observation, the DINEOF reconstruction method was used to fill the missing part and give the results a high degree of confidence. Remarkably, the distribution of Martian atmospheric observation data is very uneven in space-time. Restricted by the observation method, the data is dense in some LT and space but scarce in other LT and space (e.g., the MCS data is mainly between 1-5 and 13-17 due to its cross-track or off-track observation method). But the general interpolation algorithm has no way to fill data when there is a large gap in the data, let alone guarantee the quality of the data after the vacancies are filled. The DINEOF algorithm extracts the physical characteristics of the data in the space and time domains through the empirical orthogonal function, and can effectively reconstruct the missing physical observation data by retaining the optimal number of modes while maintaining the temporal and spatial distribution characteristics of the physical data. Consequently, the DINEOF could fill in the missing point when there is a gap in the data with great accuracy.
Data of the edge and inner part of the Tharsis Bulge 91,92 (43°S-55°N and 60-145°W), for instance, is under severe incompleteness circumstance near the equator within 0-12 local time at 30 km altitude and 30° Ls (Fig. 2a). But after reconstruction, the data become complete and a mixture of wavenumber 2 and wavenumber 3 modes is evident in the fields prior to DINEOF (Fig. 2a) and thereafter (Fig. 2b). Such a distribution is very familiar in observations and largely reproduced in models as persisting, if shifting, through the day. In addition, the data incompleteness near the Hyblaeus Fossae (centred at 20°N and 140°E) in the southwest of Elysium Mons is also filled (Fig. 2d) without the loss of general wavenumber 2 and wavenumber 3 modes prior to DINEOF (Fig. 2c). The DIEOF method can complete the data while maintaining the temperature distribution characteristics of the www.nature.com/scientificdata www.nature.com/scientificdata/ middle and low latitudes. The information in the time domain is passed between different moments while filling in the missing data, which enables the reconstruction of data when there is a large missing gap at a certain momen.
Performance of representing seasonal scale features of Martian atmospheric waves. The climatology distributions of the reconstructed temperature (Level 1 data) represent the zonal and meridional signatures of temperature (Fig. 3). The high-temperature region in the mid and low latitude areas at 30 km, the cold area in the south (north) polar region before (after) the autumnal equinox, the relatively cold Tharsis Rise (~−100° longitude), the relatively warm Amazonis Planitia (~−150° longitude), Chryse Planitia (~−25° longitude), Hellas basin (~75° longitude) and Utopia Planitia (~120° longitude) as well as the seasonal changes in temperature due to solar radiation, circulation, and topography, can be seen in the results.
The temporal and spatial characteristics of Martian atmospheric tides are also maintained well. In Fig. 4, 5, the high-value area of DW1 and DE1 amplitude at nearly 50 km (or nearly 1 Pa) and near-surface due to the heat generated by solar absorption of dust particles 93 during dust seasons are represented (Figs. 4g,h, 5g,h). The Doppler shift effects of zonal wind on the tides sustained by the dataset can be seen in the opposite hemisphere of DW1 and DE1 since the tide with opposite propagating direction to the zonal wind will be Doppler-shifted to higher frequencies and become less susceptible to dissipation 94 . The tropospheric westerly (easterly) in the mid-latitudes of the northern (southern) hemisphere induces the relatively strong DW1 (DE1) there near the tropopause, which is readily evident during the dust storm 72 Performance of representing wave signatures at different altitudes. The climatology GWs at five altitudes (three in the troposphere: 5, 10, 30 km and two in mesosphere: 50, 70 km) covering the whole Martian year show the GWs perturbations and potential energy with similar zonal distribution (Fig. 6). In the first half of the year (Ls = 0~180°), the maximum value of GWs perturbations in the lower atmosphere below 30 km altitude appears in the equatorial tropic region, which may be related to the mountainous topography in this area. This result is consistent with the conclusion of Creasey 10 . Above 30 km, the large GWs perturbations extend to the middle to high latitudes of the southern hemisphere and even to the polar regions. The most significant feature is the appearance of a large value belt in the northern polar region in the second half of the year (Ls = 180°~360°), but it is not obvious at 70 km. Throughout the year, GWs activity is more active in autumn and winter than in spring and summer. From an altitude perspective, the GWs in the lower layers of Mars is more active than that in the upper atmosphere in the range of 5-70 km.
As shown in Fig. 7c-e, the DW1 amplitudes from 30 km to 70 km are enhanced obviously near 210° Ls due to the dust storm. If we look at the changes at each altitude on the 210° Ls axis, we see that the tides first increase www.nature.com/scientificdata www.nature.com/scientificdata/ with altitude and then weaken with altitude after 50 km in the dust storm background, but are still significantly stronger than in the absence of a dust storm background. The dense staggered vertical bars of positive and negative values after 180° Ls were caused by the enhancement of upward propagation of thermal tides in the dusty season, which is more obvious at 30 and 50 km altitude (Fig. 7h,i) corresponding to the evident strong tides in Fig. 7c,d. Observational filter effects. The instrument visibility and observation geometry would influence the parts of the gravity wave spectrum observed 96,97 , leading to the observational filter effects of the dataset. Mixed gravity wave spectrum would be sampled by mixing perturbations from background temperature taken from a mixture of limb, nadir, occultation and entry accelerometer retrievals. Indeed, it is likely that different parts of the spectrum would be mixed in different ratios, resulting in a highly heterogeneous gravity wave results of the dataset. Despite all this, the dataset has to focus on the general composite gravity wave instead of the frequency dependent gravity wave due to the severe incompleteness of data coverage of single spacecraft. A single spacecraft could provide data with coverage on the order from ten percent to one thousandth, which is too small to retain the gravity wave of its specific spectrum and thus it is difficult to gain the gravity wave in the frequency domain. In addition, it was found that the observation filtering effects would not affect data reflection of the real gravity wave state in the atmosphere 96 . Consequently, the gravity wave calculated in the dataset represents the general composite gravity wave rather than gravity wave in the frequency domain.