Storage method of multi-channel lidar data based on tree structure

The multi-channel lidar has fast acquisition speed, large data volume, high dimension, and vital real-time storage, which makes it challenging to be met using the traditional lidar data storage methods. This paper presents a novel approach to storing the multi-channel lidar data based on the principle of the tree structure, the adjacency linked list, the binary data storage. In the proposed system, a tree structure is constructed by the four-dimensional structure of the multi-channel lidar data, and a data retrieval method of the multi-channel lidar data file is given. The results show that the proposed tree structure approach can save the storage capacity and improve the retrieval speed, which can meet the needs for efficient storage and retrieval of multi-channel lidar data, and improve the data storage utilization and the practicality of multi-channel lidar system.

www.nature.com/scientificreports/ definition, and internal relationship structure, the file storage space is large, which leads to the low efficiency in data retrieval and application of multi channel lidar systems. To solve the above problem of multi-channel lidar data storage, this paper analyzes the multi-dimensional characteristics of multi-channel lidar data, and the data output format of multi-channel lidar system, the hierarchy structure in terms of recording time, channel number, signal intensity, detection distance. Then, we proposal a data storage structure for the multi-channel lidar based on the principle of the tree structure, the adjacency linked list, the binary data storage, and the similar hierarchy between the multi-channel lidar data and tree structure. The practical application result shows that this method can meet the performance requirements of multi-channel lidar data storage in terms of speed and retrieval speed. It improve the data storage utilization and the practicality of multi-channel lidar system.

Methods
Characteristics of multi-channel lidar data. At the operation of lidar, a narrow pulse laser beam is emitted from the laser to the atmosphere to interact with the measured parameter target in the atmosphere. Then after the scattered light is received by the telescope with splitting and filtering, the laser echo signal is converted into an electrical signal for subsequent processing. The lidar equation of single scattering is expressed as follows 17 .
where r is the detection distance (m); P(r) is the power of echo signal (W), P 0 is initial laser power (W); Y(r) is a constant between 0 and 1, and it is the geometric overlap coefficient of the optical path between the transmitter and the receiver in lidar system; c is the light speed (3 • 10 8 m/s); t p is laser pulse width(nm); A 0 is the aperture area of a telescope(cm 2 ); β(r) and α(r) are the atmospheric backscatter coefficient (km −1 sr −1 ) and extinction coefficient(km -1 ) respectively, which are related with atmospheric conditions. The intensity of lidar data represents the state information of atmospheric parameters at different detection distances along the lidar direction, which refers to the data structure of atmospheric parameter profiles corresponding to the distance point r i (i = 1,2,…,n, n is the total point number along the detection direction) and the intensity value of laser echo signal p i . Then the data value of the atmospheric parameter at t j can be expressed as where j = 1,2,…,m, j is the index of lidar data, and m is the maximum index number. v j is called a lidar data unit (LDU), and each LDU is a group of lidar profile data.
In multi-channel lidar system, the specie spectral signals are separated and extracted by a hyperspectral discriminator, it is synchronously recorded in each channel 18,19 . So, the multi-channel lidar data includes the data information such as echo signal intensity, detection range, recording time, channel number, etc.
So, at t j (j = 1,2,…,m) within the k th channel (k = 1,2,…,q, q is the maximum number of data acquisition channels in the multi-channel lidar system), the laser echo signal data at the distance point r i can be expressed as the multi-channel lidar data V can be presented as follows then Each column in data V corresponds to the channel unit of the multi-channel lidar data. The four-dimensional structure of the multi-channel lidar data is shown in Fig. 1.
From the above analyses, we know that the multi-channel lidar data consist of several single-channel lidar data. The multi-channel lidar data have adds the channel dimension information to the single-channel lidar data. It can get an LDU in each channel of the multi-channel lidar data, and each LDU has a channel-time relationship. Therefore, the multi-channel lidar data have a large amount of data and a complex structure, the storage methods of single-channel lidar systems do not apply to the multi-channel lidar data.
Tree structure of lidar data storage. Tree structure. The tree structure is a typical nonlinear data structure with a multi-level nested relationship, which is often used to represent the data set with the characteristics of a "one to many" relationship 20 .
As shown in Fig. 2, a tree is a limited data set composed of h (h > 0) nodes. The first node of the tree is a root, and node without children is leaf. The intermediate node between the root and the leaf is an internal node. When h > 1, the remaining nodes of the tree can be regarded as multiple disjoint finite sets, and each set can be (2) v j = r 1 , p 1 j , r 2 , p 2 j ,..., r n , p n j , www.nature.com/scientificreports/ considered as a subtree of the root. The tree structure can classify and sort data effectively, and each node has a unique address. The subtrees are independent of each other, and the operations of the subtree do not affect each other. The tree structure can clearly express the relationship between data with multi-level and multi-category attributes. It can organize data with the complex relationship efficiently.
Tree structure of multi-channel lidar data (TSMLD). There is a four-dimensional relationship (channel, time, range, intensity) between multi-channel lidar data, and its arbitrary data vk i,j shows the different hierarchical distributions in other dimensions.
In general, the multi-channel lidar collects data synchronously from all channels in time. However, the spectral information of laser echo signals and the data properties in different channels are different.
As shown in Fig. 1, at t 1 , t 2 , …, t m , q channels obtain q sets of data synchronously, and each collection of data can draw a profile with laser echo intensities. The four-dimensional structure of lidar is similar to the tree structure with the origin of the coordinate system as the root node is shown in Fig. 2, also including the root node Root, branch nodes Node, and leaf nodes Leaf.
To show the hierarchical relationship of multi-channel lidar data, the virtual node sets such as root node, detection time node-set, and channel node-set are introduced, as shown in Fig. 3. The nodes, from first to third layer in Fig. 3 are virtual notes (the gray node). The nodes, from fourth to last layer are the multi-channel lidar data set, which represents the signal data from r 1 to r n (the white node). Let the first-layer node of multi-channel lidar data be the root node of the tree, denoted as D, it corresponding to the Root node, and the second-layer nodes are the detection time node sets, indicated as T, the third-layer nodes are the channel node sets, indicated as C, and the other layer nodes are the echo data node sets, marked as v. The second-layer and the third-layer are branch nodes of tree, it corresponding to the Node nodes. The detection data nodes from r 1 to r n have direct internal relationships and consistent data meanings, and they are regarded to form the leaf nodes. It corresponding to the Leaf nodes.
The TSMLD shown in Fig. 3 is a connected acyclic undirected graph, denoted as G, G = {G 1 , G 2 , G 3 ,…, G q }, G j ∈ G, 1 ≤ j ≤ m, G j is the subtree of detection time in G, representing the data of all channels at the jth detection time. The subtree G j is shown in the dotted box in Fig. 3.  Figure 1. Four-dimensional structure of the multi-channel lidar data. www.nature.com/scientificreports/ The adjacency list of TSMLD. The adjacency list is the shared storage method for a graph. Based on the hierarchical relationship of the tree structure G of the multi-channel lidar data and the structure of the linked table and the array, the paper uses adjacency list structure to represent the storage structure of the multi-channel lidar data. An array requires contiguous memory, it is used to store a small number of nodes, such as the root node, time node, and channel node. A linked list requires distributed storage space, it is used to store many detection data nodes. The node of the tree structure can be retrieved quickly by the subscript of the array and the address of the linked list 21,22 . Therefore, the adjacency list structure of the subtree G j is constructed as shown in Fig. 4. The root node of the adjacency list of G j is expressed as the detection time node t j , and the sub-node of node t j is the channel node-set. The sub-node of the channel node c k is the multi-channel lidar echo data set, representing the intensity value of detection data. For the adjacency list of tree structure G of multi-channel lidar data, the root node D is created, and the subtrees G 1 , G 2 , G 3 , …, G m are added to the sub-nodes of node D. The continuous spatial storage is used to deal with the nodes t 1 , Therefore, the generation procedure for the adjacency list of the tree structure G of the multi-channel lidar data is described as follows. To save memory space, we represent the detection data by binary code. The algorithm is shown in Table 1. The total number of nodes for each multi-channel lidar data is len = q*m*n. The time complexity of the generation algorithm for the adjacency list of TSMLD is O(len), that is, each data node needs to be accessed once.
Next, the tree structure of the multi-channel lidar data is stored by binary code in the adjacency list. Then, it is converted into a data file, and coded by binary using the traversal method of the tree structure.  www.nature.com/scientificreports/ Binary format of TSMLD storage files. Binary formats offer advantages in terms of speed of access. While the basic unit of information is very straightforward in a data file stored in characters (one byte equals one character), finding the actual data values is often harder. This means it is usually necessary to read the entire file to find any value [23][24][25] . For binary files, a format description, or a mapping, is required to find the location of any value in the file. However, the advantage of this map is that any value can be found without reading the entire data file.
In addition, in terms of memory, the binary file stored data by numeric format instead of character, and it often requires less memory.
To save the storage space and improve the retrieval efficiency of TSMLD, we present a binary coding file structure of TSMLD. The binary coding file can store many tree structures, so, its structure includes some header information, both for the overall file, and subsections within the file. This header information contains information such as follows: 1. The number of the tree structure G. 2. The beginning tag for G. 3. The file size. 4. The number of bytes used for each data value (the data value size). 5. The byte location within the file where a set of TSMLD values begins (a pointer).
After the header information, some TSMLDs are stored in the tree segment. Each tree segment is a tree structure G, and has a header that includes some information such as the number of detection time nodes, the data size (the length of data segment), and the tree segment number. The back part of the tree segment is the data segment, which contains the data node set of the tree structure G.
Each data segment is a subtree G j , and has a header that includes the number of channel nodes q, the data size (the length of sub segment), and the data segment number. Behind the header of the data segment are some sub segments.
Each Leaf node is stored as a sub segment, and each sub segment consists of data value and header. The data value includes n echo data values. Similarly, the sub segment header contains the length of the data value, the record number, etc. www.nature.com/scientificreports/ The binary file structure of TSMLD is shown in Fig. 5.

Storage of TSMLD.
In each detection experiment, the data collected from all channels in the multi-channel lidar system constitute a tree structure object of detection time sequence. A tree data storage method (TDSM) of TSMLD is given as follows.
In a TSMLD, the tree structure G must be traversed first. According to the structural characteristics of the tree structure G, the traversal methods can be divided into two ways: time-first storage (TFS) and channel-first storage (CFS) [26][27][28] . The TFS method preferentially stores the data collected by each channel at the same detection time. The CFS method preferentially stores the data collected by each detection time at the same detection channel. The data sequence of the TFS method and the CFS method is shown in Fig. 6.
The detection time information of multi-channel lidar data is given priority in the TFS method. Therefore, the multi-channel lidar data is stored in the order of detection time. The data acquired at a particular detection time is appended to the data received at the previous detection time, and the final data storage sequence is {v 1 1 The detection channel information of multi-channel lidar data is given priority in the CFS method, and the multi-channel lidar data is stored in the order of detection channel. Therefore, the data acquired in each detection channel is appended to the data obtained in the previous detection channel, and the final data storage sequence is {v 1 1 ,…v m q …}. The set of multi-channel lidar data {v 1 j , v 2 j , . . . , v q j } (j = 1,2,…,m) on detection time t j is consistent with the minimum data storage unit obtained by a single-time multi-channel lidar. In the CFS method, both channel and node tags need to be added to the stored data for data splitting, while only node tags need to be added in the TFS method. So, in data storing and reading, both above methods require additional operation tags to address or split the data, which leads to many redundant operations and reduces storage efficiency. In this paper, a cache storage mechanism is introduced by combining TFS and CFS method, the TSMLD is converted and stored to a data file and the data is coded by binary. The process of conversion and storage is shown in Fig. 7.
The main steps of conversion and storage method are described as follows:    [l] to the data file File coded by binary, and add some header information to the data file, then let a = a + 1, and go back to step (4). 6. When a > N, the TFS and CFS are integrated to store the data cache space. 7. Close the File and clear all cache space.
The main process of conversion and storage method is shown in Fig. 8. By reading the multi-channel lidar data storage files, any data can be retrieved according to the number of channels and time sequence.
Due to the multi-channel lidar data storage files being encoded in binary, so that we can get the details of data by the fix-length byte and the structure of the data file that shown in Fig. 6. Then, any data value can be read by the definition of header information for the data segments, and the data file can be scanned by tree structure of a data file.
The main steps of the data retrieval are described as follows: 1. Load the binary storage file of the multi-channel lidar system, let bfile is the file pointer, get the file header length fhLength form the file definition.; 2. Read the file header, get the number of tree structure n, the file size sTree, the beginning pointer bPointer, etc. 3. Read tree segment by index; let i = 0; 4. Read the i-th tree segment, that is the i-th tree, donated as T i ; 5. Read the header information of T i ., get the number of detection time nodes m, the length of data segment in i-th tree segment,etc. 6. Read data segment by index; 7. Let j = 0, get the j-th data segment, that is the sub tree G j of T i . 8. Read the header information of G j , get the number of detection channel nodes q, the length of sub segment in j-th data segment, etc. 9. Read sub segment by index, let k = 0, 10. Get the k-th sub segment, that is the data list in detection channel c k of G j . 11. Traverse the data value of channel c k according to the storage method of the adjacency list; 12. If k < q, let k = k + 1, go back to (10); 13. If k = q, j < m, let j = j + 1, go back to (8); 14. If j = m, i < n, let i = i + 1, go back to (5); 15. Close the bfile.
The data retrieval process is shown in Fig. 9.

Experimental
The experimental data comes from the ultraviolet Raman lidar system in the Center for Lidar Remote Sensing Research of Xi'an University of Technology 29 . In the experiment, an industrial control cabinet Pxie-1071 and a data acquisition card Pxi-5105, developed by NI company, are used as data acquisition equipment. their main parameters are shown in Table 2. The storage and retrieval experiment for lidar echo data is performed under the multi-channel mode. The hardware system and the user interface of the software system for data acquisition are shown in Fig. 10.

Results
In the storage capacity test, we consider four storage methods in Sect. 1. There are two method in character file, the text sequence storage method (TSM) 30,31 and the table structure storage method (TSSM) 32,33 . The data file contains only character or text in the TSM, and text with delimiters in the TSSM. The text with delimiter can be www.nature.com/scientificreports/ divided into table structure by delimiter. The database storage method (DSM) 34 , and the tree data storage method (TDSM) given in this paper also to be considered. A detailed comparison of these four storage methods is conducted regarding the storage capacity and retrieval speed of the multi-channel lidar data. The data in the TSSM, TSM, DSM, and TDSM are stored in table format files, text files, MySQL database and binary files, respectively. Figure 11 presents the variation of the file storage capacity of the multi-channel lidar echo data with the four storage methods. Table 3 presents the test data in the file storage capacity of the multi-channel lidar echo data with the four storage methods.
As Fig. 11 and Table 3 shows, with the increase of multi-channel lidar data, the storage capacities of TSM and TSSM are almost the same and increases linearly, since both are character-based storage, and the storage capacity of each character is fixed. The DSM has the largest storage capacity because the structured approach is utilized to improve the retrieval speed in the MySQL database system Still the building of data indexes in the relational model results in the multiplied increase of storage space. With the same data volume, the TDSM has the minimum storage capacity owing to the compressibility of the binary storage method in comparison to the text character method. The text character method focuses on the distribution of storage space for each character, while the binary data aims to compress and store all the data into a more compact file with more space saved in the meanwhile. Figure 12 and Table 4 shows the storage capacity reduction rate of the TDSM compared with the TSSM, TSM and DSM. The TSSM and TSM have similar trends in the reduction rate of storage capacity, ranging from 60 to 64%. However, the DSM with the maximum storage capacity has a significant reduction rate of about 92%.
In the retrieval speed test, we mainly test the multi-channel lidar data retrieval speed of four methods for 1000 random retrieval visits under different data volumes. The test result is shown in Fig. 13 and Table 5.
From Fig. 13 and Table 5, we can find that the TSSM is the most time-consuming method, followed by the TSM, and the time consumption of DSM and TSDM methods is kept at a low level with the least time of fewer than 10 s. With the increase of multi-channel lidar data, there is a linear increase in the data retrieval time of TSSM and TSM methods since a linear increase is also shown in the data storage capacity, and the data retrieval www.nature.com/scientificreports/ is linearly correlated with the data volume. Similarly, the TSSM, with the rise of multi-channel lidar data, is affected by the reading and writing speed of I/O and the retrieval speed of characters. It leads to an apparent reduction in retrieval speed and an increase in time consumption. Based on the professional database management system, the DSM uses the structured approach to deal with field data and create indexes for field data. Despite the increase in data storage space, a noticeable optimization effect is shown in the improvement of data retrieval efficiency. The time of data retrieval of the DSM is less than 10 s, and the TDSM takes less than 5 s in the experimental test. Due to a combination of the tree structure traversal method and binary coding, the data at any position in the data file can be quickly read based on the detection time and the channels. The process is less affected by the amount of data, and it saves the time of data retrieval. In other words, this method reduces the time consumption of multi-channel lidar data storage. In addition, the large amount of multi-channel lidar data needs less memory to be the buffer during storage. www.nature.com/scientificreports/    Table 6 shows the reduction rate of data retrieval time based on TDSM. It turns out that the reduction rate of TDSM reaches 98% because of the apparent improvement of retrieval efficiency compared to the TSSM and TSM.
In addition, the data retrieval time of the TDSM and DSM remained at a low level, with a decrease of about 70%, which fluctuated between 65 and 72% compared to the DSM.
There is a multi-channel lidar data set, the length of the multi-channel lidar data is len = q*m*n, q is the number of the channel, m is the number of the detection time, n is the number of the data value. In the TDSM, TSSM, TSM, each data is accessed at least for once, their time complexity is O(len). In the DSM, the multi-channel lidar data is stored in three tables at least, channel table, time table, and data table, the number of rows is q, m, n. Normally, each data should be traversed, and the time complexity is O(len). However, if the data index is not created in database, the time complexity is O(q*log(q) + m*log(m) + n*log(n)).

Discussion
The software system for data acquisition of multi-channel lidar system integrates the TSSM, and the programming language is C++. Due to technical limitations, it can only run-on the windows series operating system. The operating system used in the experiment is Windows 10. However, a complete multi-channel lidar system contains multiple functional subsystems. The control subsystem controls the hardware devices in the multi-channel lidar   www.nature.com/scientificreports/ system, and it usually runs on Linux systems such as Ubuntu and Debian. If the data acquisition system and the control software system can be integrated and run across platforms, the work efficiency of the multi-channel lidar system can be further improved. The cross-platform operation of the data acquisition system requires drivers for different operating systems to connect to the data acquisition card and the other programming language or software framework is used to program the data acquisition system. But the replacement of operating systems and programming languages will inevitably affect the performance of the data acquisition system. How to be involved or to be affected by what factors, that will study in the next research work.

Conclusion
Through the analysis of relational characteristics and storage requirements for lidar data, the present paper develops a storage method for the multi-channel lidar data based on the tree structure for the multi-channel lidar system. Drawing on the hierarchical relationship structure of the channel, time, and range of multi-channel   www.nature.com/scientificreports/ lidar data, this method combines the linked list and the adjacency list with an array structure to construct the storage method, and the multi-channel lidar data is encoded by binary code in the adjacency list. Finally, the multi-channel lidar data is stored in binary format files. This study can be used to build data processing and storage systems for the multi-channel lidar system or similar systems. In addition, it can be an example of a solution to a similar lidar system when a selection from a list of alternatives is required. The experimental results show that this method, compared with the traditional list structure and the text character storage method, can save at least 60% of the storage capacity and increase the retrieval speed by about 98%. The superior advantages of the technique lay a solid foundation for the effective use of multi-channel lidar data.

Data availability
All materials and data used should be available at Xi'an University of Technology/China. The data used to support the findings of this study are available from the corresponding author upon request.