Abstract
We present Python Statistical Analysis of Turbulence (PSAT), a lightweight Python framework that automates the parsing and filtering of velocity records and the computation of turbulent statistics and spectra for steady flows. PSAT works on single files as well as on batches of inputs. The framework quickly despikes raw velocity signals of steady flows using several methods: velocity correlation, signal-to-noise ratio (SNR), and acceleration thresholding. For each of these methods it provides sensible default threshold values while also giving the end user the option to supply custom values. At the end of execution, the framework generates a .csv file containing the turbulent parameters mentioned above. The PSAT framework can handle velocity time series of both steady and unsteady flows; for unsteady flows, it obtains mean velocities from instantaneous velocities using a Fourier-component based averaging method. Since PSAT is developed in Python, it can be deployed and executed across all widely used operating systems. The GitHub link for the PSAT framework is: https://github.com/mayank265/flume.git.
Introduction
We have developed Python Statistical Analysis of Turbulence (PSAT), an open-source, lightweight Python framework that can despike (identify spikes and replace them) raw velocity time series data obtained from an acoustic Doppler velocimeter (ADV) using various filtering methods. The PSAT framework also computes a range of turbulent statistics such as mean velocities, variance, skewness, kurtosis, Reynolds stresses, third-order correlations, 2D as well as 3D fluxes of turbulent kinetic energy, turbulent kinetic energy dissipation, conditional statistics of quadrants and octants with their corresponding probabilities, and the spectrum of a given signal. The PSAT framework exports all the turbulent statistics into a .csv file so that they can be used for future analysis.
The authors believe that the PSAT framework will be of significant help to researchers involved in statistical analyses of turbulent flows in both steady and unsteady flow environments. The motivation behind the development of the PSAT framework was the authors' experience of working with various third-party tools and utilities that are commercially available to the scientific community. The authors found a few major issues that led to the development of the PSAT framework: (1) To the best of our knowledge, there is no single tool that can handle instantaneous velocity signals of both steady and unsteady flows and compute the majority of the turbulent statistics that the PSAT framework computes.
As working with turbulent flows involves complex computations (such as calculating mean velocities in steady/unsteady flow environments, Reynolds shear stress (RSS) calculations, turbulent kinetic energy (TKE) calculations, third-order moments of velocity fluctuations, TKE dissipation, contributions towards total RSS production from different quadrants, octant analysis and determination of octant probabilities, and spectral density functions of velocities), it becomes cumbersome to compute these quantities in Microsoft Excel or to write custom scripts for the purpose. (2) Support for batch processing of files: Experiments performed in a laboratory flume often consist of multiple readings at multiple points, thus requiring a tool that works on batches of input files. (3) Commercial tools are expensive and often closed source; the flexibility provided to end users is also limited.
The PSAT framework addresses all of the above issues. It provides a rich set of turbulent statistics for a given set of input files. It is an open-source tool developed in Python, with the source code available on GitHub for the scientific community to use and modify. We chose Python for developing PSAT because it has been widely adopted in recent years owing to the availability of various libraries and extensive community support. Many modules for varying applications, such as modeling, analysis, and optimization of electric power systems^{1}, SciPy^{2}, Gene Ontology analyses^{3}, ice sheet models^{4}, machine learning^{5}, and data loggers for coupled fluid-structure simulations^{6}, have been developed exclusively in Python and made available to the scientific community.
The authors have been studying fluvial geomorphology and the transport behavior of sediments (erosion/deposition) in alluvial channels, where turbulence plays a key role in various fluvial environments such as flow in a channel with curvilinear cross-section^{7}, flows over bedforms^{8}, flows with vegetation^{9}, flows around bridge piers^{10}, and flows in sand-mined channels^{11}. For the present study, we carried out extensive experiments in a laboratory flume setup at the Department of Civil Engineering, Indian Institute of Technology Guwahati. The experimental setup is shown in Fig. 1, and more details regarding the experiments and measurements are given in Sect. 3. Instantaneous velocity readings along the vertical plane were taken using a four-beam, down-looking Vectrino+ acoustic Doppler velocimeter (ADV) probe manufactured by Nortek. Measurements were carried out at multiple heights, and at each height, 30,000 samples were collected over 5 min (i.e., a sampling rate of 100 Hz).
ADVs are a proven technology for capturing 3D velocity signals in environmental flows, e.g.,^{12,13,14,15}. However, the raw velocity signals captured by ADVs may contain spikes caused by aliasing of the Doppler signal. Another issue with the raw signal may be the Doppler noise floor. To clean the velocity record of these issues, post-processing of the raw velocity signal may be necessary. Spikes have an adverse effect on the computed turbulent statistics and hence must be removed from the time series data collected by the ADV. The process of removing spikes is known as despiking, and there are various approaches for it^{16}. The PSAT framework implements three despiking approaches: velocity correlation, SNR, and the acceleration thresholding method.
The paper is organized as follows: Sect. 2 defines various statistical parameters of turbulence for steady flow and provides insight into the computation of the mean velocity component from instantaneous velocities for highly unsteady flows. Section 3 describes the experimental setup in detail along with the measurements involved. The PSAT framework is described in Sect. 4, which details the requirements of the framework, the data format, the execution steps, and the parameters computed. Section 5 discusses the PSAT framework. Section 6 concludes the paper and outlines a possible future course of work for the PSAT framework. We also include an appendix describing the individual Python files used and the files created by the PSAT framework; this will be useful for developers who wish to enhance the framework and add features to it.
Theoretical foundations
This section has been divided into two subsections: In Sect. 2.1, definitions of various statistical parameters of turbulence for the steady flow are presented. In Sect. 2.2, calculation of the mean velocity component from the instantaneous velocities for highly unsteady flow is presented.
Definitions of various statistical parameters of turbulence for steady flow
Since the tool calculates several statistical parameters from raw data, these statistical parameters are defined first. All the notation relating to the statistical parameters is depicted in Table 1. Time-averaged streamwise (U), lateral (V), and vertical (W) velocities are calculated as:
where, \(U_i, V_i,\) and \(W_i\) are the instantaneous velocities in the streamwise, lateral and vertical directions, respectively, and n is the number of samples taken. Velocity variance for all three components of the velocity can be given by:
where \(u', v'\) and \(w'\) are the fluctuating components of velocities in the streamwise, lateral, and vertical directions, respectively. The square root of the velocity variance is the standard deviation. The shape of the probability density function is illustrated by its skewness. The skewness also depicts the relative contribution of positive and negative velocity fluctuations to the formation of the velocity pattern. Skewness for the velocities in all three directions can be given by:
Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values. Kurtosis of the probability distribution of a realvalued random variable (3D velocity in the present case) is given by:
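As a minimal illustration (our own sketch, not the PSAT source; the function and variable names are hypothetical), the moment statistics defined above can be computed for one velocity component as:

```python
# Sketch of the moment statistics for a single velocity component:
# time average, variance, skewness, and kurtosis as defined in the text.
import numpy as np
from scipy.stats import skew, kurtosis

def moment_stats(u):
    """Time-averaged velocity, variance, skewness, and kurtosis of a record."""
    u = np.asarray(u, dtype=float)
    u_mean = u.mean()               # time-averaged velocity
    u_prime = u - u_mean            # fluctuating component u'
    return {
        "mean": u_mean,
        "variance": np.mean(u_prime**2),        # velocity variance
        "skewness": skew(u),                    # third standardized moment
        "kurtosis": kurtosis(u, fisher=False),  # fourth standardized moment
    }
```

The same function would be applied to each of the three velocity components in turn.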
Reynolds stresses furnish very important information about the transfer of momentum in turbulent flows. Reynolds stresses are the components of a symmetric second order tensor where the diagonal components are called the Reynolds normal stresses (RNS) and the off diagonal components are called the Reynolds shear stresses (RSS)^{17}. Reynolds shear stresses (\({{\tau }_{uw}}, {{\tau }_{uv}},\) and \({{\tau }_{vw}}\)) can be calculated as:
where, \({{\rho }_{water}}\) is the density of water. The degree of flow anisotropy is measured by the ratio, \({\sigma }_{w}/{\sigma }_{u}\) and can be given by following expression:
where, \({\sigma }_{w}\) and \({\sigma }_{u}\) are the standard deviations of the vertical and streamwise velocities, respectively.
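The stresses and the anisotropy ratio above can be sketched as follows (an illustrative implementation with our own names, assuming the usual sign convention \(\tau = -\rho\,\overline{u'w'}\); not taken from the PSAT code):

```python
# Sketch: Reynolds shear stresses and the anisotropy ratio sigma_w / sigma_u
# from the three fluctuating velocity components.
import numpy as np

RHO_WATER = 1000.0  # density of water, kg/m^3

def reynolds_shear_stresses(u, v, w, rho=RHO_WATER):
    u_p = u - u.mean()
    v_p = v - v.mean()
    w_p = w - w.mean()
    tau_uw = -rho * np.mean(u_p * w_p)
    tau_uv = -rho * np.mean(u_p * v_p)
    tau_vw = -rho * np.mean(v_p * w_p)
    anisotropy = w_p.std() / u_p.std()  # sigma_w / sigma_u
    return tau_uw, tau_uv, tau_vw, anisotropy
```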
The third statistical moments, or skewnesses, indicate non-symmetric distributions. Zero skewness indicates a distribution symmetric about the mean (e.g., the Gaussian distribution), while negative and positive values of skewness show that the distribution is skewed to the left and to the right of the mean, respectively. Following Raupach^{18}, the third moments of velocity fluctuations, \(M_{jk}=\overline{{\widehat{u}}^{j}{\widehat{w}}^{k}}\), where \(j+k=3\), \({\widehat{u}}=\frac{u'}{{\left( \overline{u'u'} \right)}^{0.5}}\), and \({\widehat{w}}=\frac{w'}{{\left( \overline{w'w'} \right)}^{0.5}}\), can be expressed as:
The total Reynolds shear stress (\(\overline{u'w'}\)) at any given point is the sum of different types of bursting events. Thus, depending upon the relative signs of the instantaneous velocity fluctuations \(u'\) and \(w'\), the bursting events can be plotted in four different quadrants \((i = 1, 2, 3, 4)\) of the \((u', w')\) plane^{19}, i.e., outward interactions \((i = 1: u'> 0, w' > 0)\), ejections \((i = 2: u' < 0, w' > 0)\), inward interactions \((i = 3: u'< 0, w' < 0)\), and sweeps \((i = 4: u' > 0, w' < 0)\). At any point in the flow field, the contribution to the total Reynolds shear stress through these different modes of momentum transfer can be calculated as:
where angle brackets correspond to conditional averaging, T is the sampling time, and \({\delta }_{i,H}\) is the indicator function. The definition of the indicator function can be given as:
where H is the parameter defined by the hyperbolic hole region^{20} which allows the investigation of larger contribution to the total Reynolds shear stress from various quadrants. Fractional contribution to the total Reynolds shear stress from different quadrants can be defined as:
\(S_{i,H}\) is negative for outward and inward interactions \((i = 1, 3)\) and positive for ejections and sweeps \((i = 2, 4)\). Eq. (26) implies that for \(H = 0\), when the hole size vanishes, the sum of the fractional contributions from all quadrants equals unity (\(\sum \limits _{i=1}^{4}{{{S}_{i,0}}}=1\)).
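A hedged sketch of the quadrant decomposition (assuming the common Lu-Willmarth hole definition \(|u'w'| \ge H\,\sigma_u\sigma_w\) for the indicator function; function names and details are ours, not PSAT's):

```python
# Sketch: fractional contribution S_{i,H} of each quadrant to the total
# Reynolds shear stress, with a hyperbolic hole of size H.
import numpy as np

def quadrant_fractions(u_p, w_p, H=0.0):
    """Return {quadrant i: S_{i,H}} for zero-mean fluctuations u', w'."""
    uw = u_p * w_p
    total = uw.mean()
    # indicator: keep points outside the hole |u'w'| >= H * sigma_u * sigma_w
    outside_hole = np.abs(uw) >= H * u_p.std() * w_p.std()
    quads = {
        1: (u_p > 0) & (w_p > 0),  # outward interactions
        2: (u_p < 0) & (w_p > 0),  # ejections
        3: (u_p < 0) & (w_p < 0),  # inward interactions
        4: (u_p > 0) & (w_p < 0),  # sweeps
    }
    return {i: uw[m & outside_hole].sum() / (uw.size * total)
            for i, m in quads.items()}
```

For \(H = 0\) the four fractions sum to one, as Eq. (26) requires.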
Octant analysis is advantageous when the user wishes to analyze turbulent structures with strong three-dimensionality. It is carried out by considering all three components of the velocity fluctuations (Table 2).
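A compact way to classify samples into octants is a sign-based encoding of the three fluctuations (an illustrative scheme; PSAT's own numbering follows Table 2 and may differ):

```python
# Sketch: map each sample to one of eight octants from the signs of
# (u', v', w'). The 0..7 encoding here is illustrative only.
import numpy as np

def octant_of(u_p, v_p, w_p):
    """Octant index 0..7 for each sample, from the signs of (u', v', w')."""
    return ((u_p > 0).astype(int) * 4
            + (v_p > 0).astype(int) * 2
            + (w_p > 0).astype(int))
```

Octant probabilities then follow from the relative frequency of each index in the record.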
2D fluxes of the turbulent kinetic energy in the streamwise (\(2D{{f}_{TKEu}}\)) and vertical (\(2D{{f}_{TKEw}}\)) directions^{18} can be calculated as:
Further, these 2D turbulent kinetic energy fluxes have been made nondimensional by dividing them by the shear velocity (\(U_*\)):
3D fluxes of the turbulent kinetic energy in the streamwise (\({3D{f}_{TKEu}}\)) and vertical (\(3D{{f}_{TKEw}}\)) directions can be calculated as:
Similarly, these 3D turbulent kinetic energy fluxes can be made nondimensional by dividing them by the shear velocity (\(U_*\)):
The turbulent kinetic energy (TKE) is defined as half the sum of the variances of the velocity components and can be calculated as:
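The TKE and its 3D fluxes can be sketched as below (our illustration, not the PSAT source; the 2D flux forms with their constants are omitted here and should be taken from the equations above):

```python
# Sketch: turbulent kinetic energy and its 3D fluxes in the streamwise
# and vertical directions, from zero-mean-adjusted velocity records.
import numpy as np

def tke_quantities(u, v, w):
    u_p, v_p, w_p = u - u.mean(), v - v.mean(), w - w.mean()
    # TKE: half the sum of the three velocity variances
    tke = 0.5 * (np.mean(u_p**2) + np.mean(v_p**2) + np.mean(w_p**2))
    # 3D fluxes: transport of TKE by u' and by w'
    q2 = u_p**2 + v_p**2 + w_p**2
    f3d_u = 0.5 * np.mean(u_p * q2)
    f3d_w = 0.5 * np.mean(w_p * q2)
    return tke, f3d_u, f3d_w
```

The non-dimensional forms follow by dividing the fluxes by the appropriate power of the shear velocity \(U_*\).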
Dissipation of the turbulent kinetic energy (e) and its nondimensional form (ED) can be calculated by the following expressions:
The PSAT framework also allows the user to compute the power spectra of filtered/unfiltered three-dimensional velocities by converting a time-domain signal into the frequency domain using a discrete fast Fourier transform (FFT). The spectrum of a filtered signal represents the mean square amplitude of that signal; in other words, the spectrum shows the energy of a signal at any given frequency^{21}. Energy in turbulence is received at large scales, and its dissipation occurs at small scales. Spectra allow us to reason about the way in which energy is exchanged among eddies of different sizes.
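An illustrative spectrum computation is shown below (one possible approach, using Welch's averaged-periodogram method on a synthetic signal; the exact PSAT routine may differ):

```python
# Sketch: power spectral density of one velocity component sampled at the
# ADV rate of 100 Hz. A 5 Hz sine plus weak noise stands in for real data.
import numpy as np
from scipy.signal import welch

fs = 100.0                      # sampling rate, Hz
t = np.arange(0, 30, 1 / fs)    # 30 s synthetic record
rng = np.random.default_rng(0)
u = np.sin(2 * np.pi * 5.0 * t) + 0.1 * rng.standard_normal(t.size)

# PSD of the fluctuating part; the peak should sit near 5 Hz
f, Suu = welch(u - u.mean(), fs=fs, nperseg=1024)
peak = f[np.argmax(Suu)]
```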
Calculation of the mean velocity component from the instantaneous velocities for highly unsteady flow
In order to analyze the structure of unsteady flows, determination of the mean velocity component (\(\overline{U_{uf}}\)) from the instantaneous velocities (\({{U}_{uf}}\)) is an essential step. By definition, the instantaneous velocities of an unsteady flow can be decomposed into mean velocities and fluctuating components in the following way:
Several methods are available to obtain the mean velocity component from the instantaneous velocities^{22}. However, the Fourier-component method has been found most suitable for the determination of the mean velocity component in unsteady flows^{23}. It proceeds as follows: the time-dependent instantaneous velocities \({{U}_{ufi}}\) (where i = 1, 2, ..., n) are transformed into the frequency domain using a discrete Fourier transform, and only the frequency components lower than a cutoff frequency (\({{f}_{cutoff}}\)) are taken as the representative values of the mean velocities (\(\overline{U_{ufi}}\)) as follows:
where,
and
for \(j = 0, 1, 2, \ldots, (k-1)/2\). Here, n is the number of samples collected in the time period (T) of a measurement. Nezu et al.^{22} suggested adopting a cutoff frequency (\({{f}_{cutoff}}\)) smaller than the burst frequency of turbulence, and selected the number of Fourier components (k) as seven.
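The Fourier-component averaging described above can be sketched as a frequency-domain low-pass operation (an assumed implementation; the cutoff value is the user's choice, as in the text):

```python
# Sketch: Fourier-component averaging for unsteady flow. Transform the
# record to the frequency domain, keep only components below f_cutoff,
# and invert to obtain the time-varying mean velocity.
import numpy as np

def fourier_mean(u, fs, f_cutoff):
    """Low-frequency part of u, taken as the time-varying mean velocity."""
    U = np.fft.rfft(u)
    freqs = np.fft.rfftfreq(len(u), d=1.0 / fs)
    U[freqs > f_cutoff] = 0.0  # discard turbulence-scale components
    return np.fft.irfft(U, n=len(u))
```

On a signal composed of a slow oscillation plus fast turbulence-like content, the result recovers the slow part.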
Experimental setup and measurements
The dataset used in the present study to test the PSAT framework has been obtained from the experimental study carried out by Deshpande and Kumar^{24}. Experiments were performed in a glass-walled tilting flume of dimensions 20 m × 1 m × 0.72 m (length × width × depth). A schematic diagram of the flume is provided in Fig. 1.
An upstream collection tank of dimensions 2.8 m × 1.5 m × 1.5 m (length × width × depth) was provided, with a couple of wooden baffles installed in it to quieten the flow before it entered the channel. Uniform river sand of median diameter \(d_{50}\) = 1.1 mm was used as bed material in the experiments. Table 3 provides information on the various physical characteristics of the bed material and the experimental parameters.
Velocity measurements
Instantaneous velocity readings along the vertical plane were taken using a fourbeam, downlooking, Vectrino+ acoustic Doppler velocimeter (ADV) probe manufactured by Nortek. The instrument collects data in a cylindrical remote sampling volume located at 5 cm below the central transmitter. The height of the sampling volume was set at 1 mm when measurements were taken very near the bed such that the sampling volume did not touch the particles on the bed surface, and at 4 mm when measurements were taken away from the bed. Data were collected at the center line of the channel crosssection at a distance of 8 m from the downstream end of the flume to minimize the effects of flow entrance and exit conditions on the measurement location. Measurements were carried out on multiple heights, and at each height, 30,000 samples were collected for 5 min (i.e., sampling rate of 100 Hz).
PSAT configuration, usage, features, performance and limitations
Flow diagram of the PSAT framework
The PSAT framework has been designed for working with steady as well as unsteady flows. Figure 2 shows the flow diagram of the PSAT framework. The complete flow can be divided into the following main components.

Steady flow analysis

1.
Spike detection and removal methods It consists of three spike detection and removal methods: (a) velocity correlation filter, (b) signal-to-noise ratio (SNR) filter, and (c) acceleration thresholding method^{16}. The PSAT framework despikes the contaminated data points obtained from the ADV and replaces them via interpolation between the ends of each spike. Each of the above filtering methods requires an input threshold from the user for filtering purposes (a default value is also stored in the program in case the user does not wish to specify one or is unsure; for example, the default value for the correlation filter is set to 70). Once the input threshold is set, the PSAT framework scans for the .dat files and processes them one by one. For each filtering method, it identifies the spikes and replaces them by interpolation between the ends of the spikes. The algorithms for the three filtering methods, along with the replacement algorithm, are explained in Sect. 4.4. After running each filtering method, the relevant output files are saved in .csv format by the PSAT framework.

2.
Turbulent statistics The filtering methods mentioned earlier despike the noisy raw signal and convert it into a clean signal. The filtered signal is then used for the calculation of various turbulent statistical parameters such as time-averaged 3D velocities, velocity variances, skewness, kurtosis, Reynolds shear stresses (RSS), third-order correlations, 2D as well as 3D fluxes of the turbulent kinetic energy, turbulent kinetic energy dissipation, conditional statistics of quadrants, and computation of octants^{25} with their corresponding probabilities of occurrence. These are saved for each input file, and upon every execution the parameters are appended to the file so that previous computations remain intact for the user.

3.
Computation of the spectral density functions of velocities This component computes the spectral density functions for the given input velocities. Since the ADV provides 3D velocities, the PSAT framework computes the spectra for the velocities in the streamwise, lateral, and vertical directions. A final spectrum that combines all the above spectra is also plotted for user convenience. The PSAT framework saves both the data and the visualization in individual files for future reference.

4.
File organization The PSAT framework creates 14 files for every input .dat file it reads (for steady flow data). To ensure that it does not clutter the user's workspace, the PSAT framework creates a folder, organizes all the saved files within it, and appends a timestamp to each file. This is convenient from the user's point of view, as the user can perform further computation on the saved files. The framework also deletes any temporary files created during program execution.


Unsteady flows

1.
Determination of the mean velocity component As discussed in Sect. 2.2, several methods are available for the calculation of the mean velocity component from instantaneous velocities in unsteady flow environments^{22}. The PSAT framework computes mean velocities using the Fourier-component based method, which has been found most suitable for unsteady flows^{23}. The PSAT framework can work with multiple files of unsteady flows specified in "input_ensemble_files.txt". This file contains a list of all files that contain unsteady flow data. For each file, the PSAT framework asks for the number of Fourier components and computes the mean velocities via the Fourier-component based averaging method. The user has to provide a .csv file having four columns (t, u, v, w) representing time and the 3D velocities.

Importing data from raw files

Input files (*.dat): The PSAT framework requires the input to be in .dat format with comma-separated data. This restriction keeps the codebase simple and manageable. Various tools are available to convert space/tab-separated files into comma-separated format. Table 4 shows a sample .dat file obtained from the Vectrino+ software.
When raw velocity data are collected in an environmental flow experiment using a Nortek Vectrino+ ADV and the Vectrino+ software, a .vno file is generated. This .vno file is further processed by the Vectrino+ software, and five more files are created with .adv, .dat, .hdr, .pck, and .ssl extensions. The .dat files are standard ASCII files that contain the velocity time series data obtained from the Vectrino+ ADV. Typically, the .dat files obtained from this software consist of the data shown in Table 4. The columns are explained below:

Time Contains the time information, depending on the sampling frequency. In this case the sampling frequency was 100 Hz and so the time interval changes every \(\frac{1}{100}\) s.

SL A routine serial counter from 1 to n.

Counter A value provided by Vectrino software. It is not required by the PSAT framework.

\(U_i\), \(V_i\), \(W_i\), \(W1_i\) : \(U_i\), \(V_i\), \(W_i\) represent the three-dimensional velocity data collected by the ADV. \(W1_i\) is a redundant component of W and is not required for the current study. These velocities are raw signals (containing spikes) that require filtering methods to despike the noisy data.

AMP\(U_i\), AMP\(V_i\), AMP\(W_i\), AMP\(W1_i\) : The signal amplitude of the \(U_i\), \(V_i\), \(W_i\), \(W1_i\) as measured by the Vectrino+ ADV instrument. These values are not used by the PSAT framework.

SNR\(U_i\), SNR\(V_i\), SNR\(W_i\), SNR\(W1_i\) : The SNR values of the \(U_i\), \(V_i\), \(W_i\), \(W1_i\) as measured by the Vectrino+ ADV instrument. These values are used for the SNR filtering method.

Corr\(U_i\), Corr\(V_i\), Corr\(W_i\), Corr\(W1_i\) : The correlation values of the \(U_i\), \(V_i\), \(W_i\), \(W1_i\) as measured by the Vectrino+ ADV instrument. These values are used for the correlation filtering method.
For successful execution of the PSAT framework, the columns of the input .dat files must appear exactly in this order.

Dataset processing
A key component of the PSAT framework is that it uses NumPy arrays to load the .dat files efficiently into memory. The NumPy module makes it easy to import specific columns from the input .dat files and process them efficiently. After the raw .dat files are filtered, the PSAT framework also converts them into a usable Microsoft Excel file format, in case the user wants to analyze the filtered data in Microsoft Excel. We used the .xls format to maintain compatibility with older versions of Microsoft Excel. The final turbulent statistics are exported in .csv format, which can be opened in any text editor or in Microsoft Excel for plotting various visualizations. Keeping the .csv format also enables the scientific community to write their own codes for further manipulation.
Despiking and replacement algorithms
The PSAT framework implements three despiking methods: (a) correlation filter (shown in Algorithm 1), (b) SNR filter (shown in Algorithm 2), and (c) acceleration thresholding filter (shown in Algorithm 3).
We explain the algorithm for detecting spikes using the correlation method; the other filtering methods work similarly. The inputs for the correlation method algorithm are the 3D velocities measured using the ADV (\(U_i\), \(V_i\), and \(W_i\), the instantaneous velocities in the streamwise, lateral, and vertical directions, respectively) and their corresponding correlation values (Corr\(U_i\), Corr\(V_i\), Corr\(W_i\)). For every row read by the PSAT framework (Line 1, Algorithm 1; a sample row consists of the data shown in Table 4), it checks whether any of the correlation values for \(U_i\), \(V_i\), and \(W_i\) is less than the correlation threshold (Line 2, Algorithm 1); if so, the corresponding velocity point is marked as a spike (Line 3, Algorithm 1). The PSAT framework provides flexibility: it supplies a default correlation threshold while also allowing the end user to enter a custom value. The spike is replaced by interpolation between the ends of the spike (Line 4, Algorithm 1); the replacement algorithm is depicted in Algorithm 4. If the correlations of all the velocities are above the threshold, the velocities are left untouched (as they are not spikes) and the next row is processed (Line 6, Algorithm 1).
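The spike-detection test of Algorithm 1 can be sketched in vectorized form (our illustration; the function name is ours, while the default threshold of 70 comes from the text):

```python
# Sketch of the correlation-filter test: a sample is a spike when any
# component's correlation value drops below the threshold.
import numpy as np

DEFAULT_CORR_THRESHOLD = 70.0  # default noted in the text

def mark_correlation_spikes(corr_u, corr_v, corr_w,
                            threshold=DEFAULT_CORR_THRESHOLD):
    """Boolean mask: True where any component's correlation < threshold."""
    return (corr_u < threshold) | (corr_v < threshold) | (corr_w < threshold)
```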
The other two filtering methods are described below.
SNR filter Similar to the correlation filter, but uses the SNR (signal-to-noise ratio) values (SNR\(U_i\), SNR\(V_i\), SNR\(W_i\), as explained earlier).
Acceleration thresholding method We have used the method proposed by Goring and Nikora^{16}. The acceleration thresholding method detects and replaces spikes in two phases. The first phase detects negative accelerations (Lines 1–4, Algorithm 3), while the second phase detects positive accelerations (Lines 6–8, Algorithm 3). In each phase, multiple passes are made through the raw data until all data points of the given sample conform to the acceleration criterion. The acceleration is given by \(a_i = \frac{U_i - U_{i-1}}{\Delta t}\). In the negative phase, all points where \(a_i < -\lambda_a g\) are marked as spikes and replaced by linear interpolation between the ends of the spike; the process is repeated until no more negative spikes are left. The same procedure is repeated for positive accelerations, with the check for positive spikes being \(a_i > \lambda_a g\) and the same replacement strategy (linear interpolation between the ends of the spikes). In the criterion \(\lambda_a g\), \(\lambda_a\) is a user-defined value and g is the acceleration due to gravity. Goring and Nikora^{16} suggest that \(\lambda_a\) should be kept between 1 and 1.5.
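One detection pass of the acceleration criterion can be sketched as below (the iterative replace-and-redetect loop of the full method is omitted; names are ours, not PSAT's):

```python
# Sketch: mark samples whose acceleration magnitude exceeds lambda_a * g,
# covering both the negative and positive phases in one pass.
import numpy as np

G = 9.81  # acceleration due to gravity, m/s^2

def acceleration_spikes(u, dt, lambda_a=1.5):
    """Boolean mask of points with |a_i| > lambda_a * g."""
    a = np.empty_like(u)
    a[0] = 0.0
    a[1:] = np.diff(u) / dt        # a_i = (U_i - U_{i-1}) / dt
    spikes = np.zeros(u.shape, dtype=bool)
    spikes[a < -lambda_a * G] = True  # negative-acceleration phase
    spikes[a > lambda_a * G] = True   # positive-acceleration phase
    return spikes
```

In the full method this detection is alternated with replacement until no points violate the criterion.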
Finally, the spike replacement used is linear interpolation between the ends of the spike, as shown below. Given a spike data point, the algorithm takes the last good point that was not identified as a spike and the next good point that was not identified as a spike, and replaces the spike via linear interpolation between these two points.
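The replacement step can be sketched with NumPy's interpolation routine (an illustrative implementation, not PSAT's own code):

```python
# Sketch: replace spike samples by linear interpolation between the
# nearest good (non-spike) samples on either side.
import numpy as np

def replace_spikes(u, spikes):
    """Return a copy of u with spike samples linearly interpolated."""
    u = u.astype(float).copy()
    good = ~spikes
    idx = np.arange(len(u))
    # interpolate each spike index from the surrounding good indices
    u[spikes] = np.interp(idx[spikes], idx[good], u[good])
    return u
```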
Figure 3 shows the raw velocity signal captured by the ADV and the signals filtered by the PSAT framework in a stepwise manner. The raw velocity signal at a particular measurement location is depicted in Fig. 3A, whereas the raw signal filtered using the SNR method, the velocity correlation method, the acceleration thresholding method, and the combination of all of these methods is shown in Fig. 3B–E, respectively.
Sample code execution
For testing the PSAT framework, we took the raw .dat files generated by the Vectrino+ software. A sample row from a raw file is displayed in Table 4. For user convenience, we have presented sample code that is ready for execution. After downloading the entire GitHub repository (https://github.com/mayank265/flume.git), the user just needs to extract the zip archive (password: "PoTs_Turbulence", without the quotes) and execute python3 pots_module.py. We have taken two sample .dat files and shown the execution of the PSAT framework on them. All .dat files that need to be processed must be in the same folder as pots_module.py. The input_files.txt and input_files_corresponding_depths.txt files should also be in the same folder and should be correctly mapped (see Fig. 4). The PSAT framework creates a Logs folder that contains all errors recorded during execution, which helps with debugging.
External libraries
The PSAT framework requires the following libraries for the successful execution.

csv^{26} Used for writing files in .csv format. The final analysis is written into a .csv file.

datetime Computes dates and times for generating various time formats. To prevent conflicting filenames, we append a timestamp to every file that is created.

glob Organizes files and folders using wildcard (glob) patterns.

numpy^{27} The core library for manipulating the .dat files and applying various operations on them. Numpy processes the .dat files efficiently by converting them to numpy arrays.

scipy Provides libraries for various statistical parameters like kurtosis, skewness etc.

os All the filtered files are stored in a “Filtered_Timestamp” folder which is generated by the os module.

timeit For computing the code execution time.

xlwt^{28} For converting .dat files into Microsoft Excel format .xls.
The user can run pip install package_name to install the libraries. The authors recommend installing the Anaconda^{29} environment, a cross-platform distribution whose built-in installer provides all the above packages.
Performance
The PSAT framework is an open-source module written in Python 3+ and has been tested on both Linux and Windows platforms. We have used a few common libraries that are easily available. Python packages are easy to install using the standard Python package installer, pip.
We tested the performance on the following system: OS Name: Microsoft Windows 10 Pro, OS Version: 10.0.17763 Build 17763, System Type: x64-based PC, Processor: Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz, RAM: 12 GB DDR4. We executed the PSAT framework on a set of 33 .dat files with their respective depths. The statistical parameters of turbulence obtained via PSAT for these 33 files are shown in Tables 5, 6, 7, 8, 9 and 10. The table headers are kept self-explanatory for easier understanding. Due to space constraints, the statistical parameters of turbulence obtained via PSAT are split across multiple tables. However, the PSAT framework stores all the statistical parameters of turbulence in a single file.
The experimental conditions are as follows: the velocities were measured in an open channel flow experiment at a sampling rate of 100 Hz using the Nortek Vectrino+ ADV. The ADV was suspended at a position in the flow in order to obtain the 3D velocities. \(U_i, V_i,\) and \(W_i\) represent the instantaneous velocities in the streamwise, lateral, and vertical directions, respectively. Each experiment was run for a period of 300 s, so in a given run the total number of sampled velocities is \(100 \times 300 = 30{,}000\). For 33 measurements, the total number of sample readings obtained is \(30{,}000 \times 33 = 990{,}000\). The PSAT framework takes each .dat file as input, performs the preprocessing, applies the three filters to denoise the velocity time series data, and then calculates the various turbulent parameters. During this process, the PSAT framework generates .xls files for all input files, files obtained after each filtering step, and a final analysis file (Parameters.csv) containing the turbulent characteristics of all the input files.
The execution time for the code was 12 min 22 s, and the memory usage varied between 30 MB and 160 MB in a single execution. Considering the various parameters the PSAT framework calculates and the conversions it performs, the time taken is justified. Across different runs the time taken may decrease or increase depending on the quality of the raw file: as seen in the acceleration thresholding method, the process must be repeated until all the spikes are removed, so a noisy velocity time series with a large number of spikes takes more time, while a sample with few spikes is processed quickly. We can therefore conclude that the PSAT framework does not hog system resources and can be run even on a basic desktop machine.
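The iterate-until-clean acceleration thresholding described above can be sketched as follows. This is a minimal illustration of the idea, not PSAT's exact implementation: the threshold \(k \cdot g\), the two-sided spike test, and the linear-interpolation replacement are our assumptions.

```python
import numpy as np

def despike_acceleration(u, dt=0.01, k=1.5, g=9.81, max_iter=50):
    """Repeat: flag samples bracketed by accelerations exceeding k*g,
    replace them by linear interpolation, stop when no spikes remain."""
    u = np.asarray(u, dtype=float).copy()
    for _ in range(max_iter):
        a = np.diff(u) / dt                       # finite-difference acceleration
        spikes = np.zeros(u.size, dtype=bool)
        # a spike shows a large acceleration both into and out of the sample
        spikes[1:-1] = (np.abs(a[:-1]) > k * g) & (np.abs(a[1:]) > k * g)
        if not spikes.any():
            break                                 # signal is clean
        good = np.flatnonzero(~spikes)
        u[spikes] = np.interp(np.flatnonzero(spikes), good, u[good])
    return u
```

A clean record passes through untouched in a single iteration, which is why execution time scales with the noisiness of the raw file.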
Tables and visualizations from the PSAT framework
As mentioned already, the PSAT framework generates a rich set of turbulent statistics: mean velocities, variance, skewness, kurtosis, Reynolds stresses, third-order correlations, 2D and 3D fluxes of turbulent kinetic energy, turbulent kinetic energy dissipation, and conditional statistics of quadrants and octants with their corresponding probabilities. Based on these statistics, we plotted a few visualizations, shown below.
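A few of these single-point statistics can be sketched with NumPy/SciPy. This is an illustrative subset only; the function and key names are ours, and the fluid density factor in the Reynolds shear stress is omitted.

```python
import numpy as np
from scipy import stats

def basic_turbulence_stats(u, w):
    """Time-averaged statistics at one measurement point from the
    streamwise (u) and vertical (w) velocity series."""
    u, w = np.asarray(u, float), np.asarray(w, float)
    up, wp = u - u.mean(), w - w.mean()            # fluctuating components
    return {
        "U_mean": u.mean(),                        # time-averaged velocity
        "u_var": up.var(),                         # velocity variance
        "u_skew": stats.skew(u),                   # velocity skewness
        "u_kurt": stats.kurtosis(u, fisher=False), # velocity kurtosis
        "uw_stress": -np.mean(up * wp),            # Reynolds shear stress, -u'w'
    }
```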
Tables 5, 6, 7, 8, 9 and 10 show all the turbulent statistics generated by the PSAT framework. We have split the tables across multiple pages, as all the turbulent statistics generated by the PSAT framework cannot be displayed in a single table.
The visualization of the turbulent statistics shown in Tables 5, 6, 7, 8, 9 and 10 is depicted in Figs. 5a, 6, 7 and 8b. Time-averaged velocities in all three directions (U, V, and W) are plotted against flow depth in Fig. 5a. Vertical distributions of the three components of velocity variance are shown in Fig. 5b. Figure 5c and d depict the vertical distributions of velocity skewness and velocity kurtosis, respectively. Reynolds shear stresses (off-diagonal components of a symmetric second-order tensor) are plotted against flow depth in Fig. 5e. The vertical distribution of the flow anisotropy is shown in Fig. 5f. Figure 6a shows the distribution of the third statistical moments (\(M_{30}, M_{03}, M_{12},\) and \(M_{21}\)) plotted against flow depth. Vertical distributions of the turbulent kinetic energy dissipation (\(\varepsilon\)) and its non-dimensional form (ED) are shown in Fig. 6b. 2D and 3D fluxes of the turbulent kinetic energy are plotted in Fig. 6c and d, respectively. Fractional contributions to the total Reynolds shear stress from the different quadrants (quadrant analysis) for hole sizes H = 0 and 2 are presented in Fig. 6e and f, respectively. The 3D view of the octants is shown in Fig. 7a, and vertical distributions of the octant probabilities are presented in Fig. 7b. Power spectral density functions for all three components of velocity are depicted in Fig. 8b.
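The quadrant analysis with a hole size H mentioned above can be sketched as follows. This is a Lu and Willmarth style sketch under our own naming conventions, with the hole defined as |u'w'| ≤ H·|mean(u'w')|; PSAT's exact conventions may differ.

```python
import numpy as np

def quadrant_fractions(u, w, hole=0.0):
    """Fractional contribution of each quadrant to the total Reynolds
    shear stress, counting only events outside the hole."""
    up = np.asarray(u, float) - np.mean(u)
    wp = np.asarray(w, float) - np.mean(w)
    uw = up * wp
    total = uw.mean()
    outside = np.abs(uw) > hole * abs(total)  # events outside the hole
    masks = {
        "Q1": (up > 0) & (wp > 0),            # outward interaction
        "Q2": (up < 0) & (wp > 0),            # ejection
        "Q3": (up < 0) & (wp < 0),            # inward interaction
        "Q4": (up > 0) & (wp < 0),            # sweep
    }
    return {q: uw[m & outside].sum() / (uw.size * total)
            for q, m in masks.items()}
```

For H = 0 the four fractions sum to one; raising H isolates the strong events, as in Fig. 6e and f.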
The PSAT framework is also able to compute the mean velocities \(\overline{U_{uf}}\) of unsteady flows from the instantaneous velocities \(U_{uf}\) by using the Fourier-component method described in the theoretical development section. Figure 8a shows an example of the time series of instantaneous velocities \(U_{uf}\left( t \right)\) and the mean velocity component \(\overline{U_{uf}}\left( t \right)\). The PSAT framework allows the user to select an arbitrary number of Fourier components. It can be observed from Fig. 8a that the mean velocity component closely follows the instantaneous velocities when the number of Fourier components is k = 91.
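A minimal sketch of such Fourier-component averaging (our own illustration of the idea, not PSAT's exact routine): keep the first k components of the discrete Fourier transform of the series and discard the rest.

```python
import numpy as np

def fourier_mean(u, k):
    """Reconstruct the slowly varying mean of an unsteady velocity
    series from its first k Fourier components."""
    U = np.fft.rfft(np.asarray(u, dtype=float))
    U[k:] = 0.0                      # discard components beyond the first k
    return np.fft.irfft(U, n=len(u))
```

With k = 1 only the record mean survives; increasing k lets the reconstructed mean track progressively faster variations of the flow, which is why the reconstruction tightens as k grows.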
Limitations and precautions to be taken while using the PSAT module
The following points need to be taken care of by a user executing the PSAT framework.

1. The PSAT framework requires a strict file naming convention for the input .dat files. It is preferred that the input file names do not contain spaces or any special character except “_”.

2. Ensure that all source files are strictly comma separated, i.e., values are separated by ‘,’.

3. Ensure that input_files.txt and input_files_corresponding_depths.txt are correctly mapped (see Fig. 4).

4. The libraries mentioned in Sect. 4.6 must be installed before running the PSAT framework.

5. Ensure that no file or file name is changed while the PSAT framework is executing; doing so may result in erroneous output.
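The precautions above can be checked programmatically before a run. The sketch below is an illustrative pre-flight helper of our own; it is not part of PSAT itself.

```python
import re
from pathlib import Path

def check_inputs(list_file="input_files.txt",
                 depth_file="input_files_corresponding_depths.txt"):
    """Verify the naming convention, comma separation, and the one-to-one
    mapping between the input file list and the depth list."""
    names = Path(list_file).read_text().split()
    depths = Path(depth_file).read_text().split()
    assert len(names) == len(depths), "file list and depth list differ in length"
    for name in names:
        # only letters, digits, and '_' in the file name stem
        stem = Path(name).stem
        assert re.fullmatch(r"[A-Za-z0-9_]+", stem), f"bad file name: {name}"
        # a quick comma-separation check on the first line
        first_line = Path(name).read_text().splitlines()[0]
        assert "," in first_line, f"{name} is not comma separated"
    return True
```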
Discussion
There are various commercial as well as free tools available for the calculation of turbulent parameters. Commercial software provides users with a simple graphical user interface (GUI) and allows them to filter the raw velocity time series data, but it is usually expensive. Looking at the formulas provided in Sect. 2, a few of them can be computed directly in Microsoft Excel, while formulas can be built for the others. In fact, the authors initially began the turbulent analysis in Microsoft Excel, as it provided considerable flexibility in plotting graphs and was helpful for analysis. However, the authors faced the following issues while working with Microsoft Excel: (a) For quadrant and octant analyses, the formulation in Microsoft Excel was tedious. (b) An oversight in one formula can corrupt the results of all subsequent formulas. (c) With the data size growing (in our case, nearly 1 million values over 33 files), the computations became heavy, resulting in Microsoft Excel occasionally freezing. (d) Merging all the statistical data after analysis had to be done manually, making the whole process cumbersome. (e) Working with unsteady flows requires Fourier averaging, which is challenging to compute in Microsoft Excel, and to the best of the authors' knowledge no such tool exists that provides the computation for unsteady flows with a user-chosen value of k (the number of components). All this took a lot of time, and there was always room for errors.
We wanted to provide the community with a free and open-source module that can perform all these tasks hassle-free. The PSAT framework is completely open source, which allows developers and the scientific community to extend it to meet their needs and purposes.
Conclusion and future work
We provide the end user with an open-source PSAT framework that enables the user to filter the raw velocity time series data obtained from the Nortek Vectrino+ ADV and compute various turbulent parameters. We believe that the PSAT framework is a first-of-its-kind framework that completely automates the process of parsing, filtering, computation of various turbulent statistics, and spectra computation for steady flows. For unsteady flows, the PSAT framework obtains mean velocities from instantaneous velocities by using the Fourier-component-based averaging method. The PSAT framework also saves all the processed data files so that the end user can reuse them later without needing to run the PSAT framework again on the same set of files.
The authors have put the source code on GitHub and welcome suggestions and improvements. We believe that the PSAT framework will help the scientific community working in environmental hydraulics by providing a means to easily filter velocity data and compute turbulent parameters. As the core code is developed in Python, it runs smoothly on Windows, Linux, and macOS.
The PSAT framework currently works in a command-line environment; in the future, we would like to build a GUI module for it. The GUI module would allow for a richer user experience; however, for this paper, we restrict ourselves to the command-line environment.
Methods
We have uploaded the required dataset and the source code for the PSAT module on Zenodo. The URL for accessing the data and code is https://doi.org/10.5281/zenodo.4097839. After downloading the zip file from the Zenodo repository, the user just needs to extract the zip archive and execute python3 pots_module.py.
References
Thurner, L. et al. Pandapower: an open-source Python tool for convenient modeling, analysis, and optimization of electric power systems. IEEE Trans. Power Syst. 33, 6510–6521 (2018).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Klopfenstein, D. et al. GOATOOLS: a python library for gene ontology analyses. Sci. Rep. 8, 1–17 (2018).
Kennedy, J. H. et al. LIVVkit: an extensible, Python-based, land ice verification and validation toolkit for ice sheet models. J. Adv. Model. Earth Syst. 9, 854–869, https://doi.org/10.1002/2017MS000916 (2017).
Ukkonen, P. & Mäkelä, A. Evaluation of machine learning classifiers for predicting deep convection. J. Adv. Model. Earth Syst. 11, 1784–1802, https://doi.org/10.1029/2018MS001561 (2019).
Thomas, D. et al. CUPyDO: an integrated Python environment for coupled fluid–structure simulations. Adv. Eng. Softw. 128, 69–85, https://doi.org/10.1016/j.advengsoft.2018.05.007 (2019).
Deshpande, V. & Kumar, B. Effect of downward seepage on the shape of an alluvial channel. In Proceedings of the Institution of Civil Engineers – Water Management Vol. 170, 3–14 (Thomas Telford Ltd, 2017).
Patel, M., Majumder, S. & Kumar, B. Effect of seepage on flow and bedforms dynamics. Earth Surf. Process. Landf. 42, 1807–1819 (2017).
Devi, T. B. & Kumar, B. Channel hydrodynamics of submerged, flexible vegetation with seepage. J. Hydraul. Eng. 142, 04016053 (2016).
Lade, A. D., Deshpande, V., Kumar, B. & Oliveto, G. On the morphodynamic alterations around bridge piers under the influence of instream mining. Water 11, 1676 (2019).
Barman, B., Kumar, B. & Sarma, A. K. Impact of sand mining on alluvial channel flow characteristics. Ecol. Eng. 135, 36–44 (2019).
Guala, M., Singh, A., Bad Heart Bull, N. & Foufoula-Georgiou, E. Spectral description of migrating bed forms and sediment transport. J. Geophys. Res. Earth Surf. 119, 123–137 (2014).
Parsheh, M., Sotiropoulos, F. & Porté-Agel, F. Estimation of power spectra of acoustic Doppler velocimetry data contaminated with intermittent spikes. J. Hydraul. Eng. 136, 368–378 (2010).
Lacey, R. J. & Roy, A. G. A comparative study of the turbulent flow field with and without a pebble cluster in a gravel bed river. Water Resour. Res. 43 (2007).
Nikora, V. & Goring, D. Flow turbulence over fixed and weakly mobile gravel beds. J. Hydraul. Eng. 126, 679–690 (2000).
Goring, D. G. & Nikora, V. I. Despiking acoustic doppler velocimeter data. J. Hydraul. Eng. 128, 117–126 (2002).
Pope, S. B. Turbulent flows (Cambridge University Press, Cambridge, 2000).
Raupach, M. Conditional statistics of Reynolds stress in rough-wall and smooth-wall turbulent boundary layers. J. Fluid Mech. 108, 363–382 (1981).
Lu, S. & Willmarth, W. Measurements of the structure of the Reynolds stress in a turbulent boundary layer. J. Fluid Mech. 60, 481–511 (1973).
Nezu, I., Tominaga, A. & Nakagawa, H. Field measurements of secondary currents in straight rivers. J. Hydraul. Eng. 119, 598–614 (1993).
Hinze, J. O. Turbulence: an introduction to its mechanism and theory (McGraw-Hill, 1959).
Nezu, I., Kadota, A. & Nakagawa, H. Turbulent structure in unsteady depth-varying open-channel flows. J. Hydraul. Eng. 123, 752–763 (1997).
Nezu, I. Turbulent structure over dunes and its role on suspended sediments in steady and unsteady open-channel flows. In Proceedings of the International Symposium on Transport of Suspended Sediments and its Mathematical Modeling, 165–189 (IAHR, 1991).
Deshpande, V. & Kumar, B. Turbulent flow structures in alluvial channels with curved cross-sections under conditions of downward seepage. Earth Surf. Process. Landf. 41, 1073–1087 (2016).
Keylock, C. J., Lane, S. N. & Richards, K. S. Quadrant/octant sequencing and the role of coherent structures in bed load sediment entrainment. J. Geophys. Res. Earth Surf. 119, 264–286 (2014).
jasontrigg0. python-csv 0.0.11. https://pypi.org/project/python-csv/ (2018).
Oliphant, T. NumPy: a guide to NumPy. USA: Trelgol Publishing (2006–). Accessed 15 Dec 2019 (Online).
Newman, M. E. J. Network data. http://www-personal.umich.edu/~mejn/netdata/ (2013).
Anaconda: the world's most popular data science platform. https://www.anaconda.com (2020).
Acknowledgements
This research was undertaken with the assistance of resources from the Water Resources Engineering Laboratory, Department of Civil Engineering, Indian Institute of Technology Guwahati and the Unit of Environmental Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel. We are thankful for the assistance provided by both research labs in conducting the experiments.
Funding
The authors have no external funding agency to mention.
Author information
Contributions
M.A.: Writing, Data Curation, Software. The entire Python coding work for computing turbulent intensities has been done by M.A. V.D.: Conceptualization, Data Curation, Writing. Carried out all the experiments under various conditions and obtained the results. D.K.: Technical advice and instruments for capturing the parameters relating to unsteady flows. B.K.: Experiments relating to the steady flows were performed at the laboratory flume setup at the Department of Civil Engineering, Indian Institute of Technology Guwahati. B.K. helped us set up the entire flume.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Agarwal, M., Deshpande, V., Katoshevski, D. et al. A novel Python module for statistical analysis of turbulence (PSAT) in geophysical flows. Sci. Rep. 11, 3998 (2021). https://doi.org/10.1038/s41598-021-83212-1