Dataset of three-dimensional traces of roads

We present a dataset consisting of three-dimensional traces, captured by Global Navigation Satellite System techniques with three-dimensional coordinates. It offers 138 traces (69 going and 69 returning), in addition to the actual mean axis of the road determined by precise surveying techniques to be used as ground truth for research activities. These data may serve as a test bed for research on data mining applications related to Global Navigation Satellite System multitraces, particularly the development and testing of algorithms intended for mining mean axis data from road multitraces. The data are suitable for the statistical analysis of both single-trace and multitrace datasets (e.g., outliers and biases).

The 3D GNSS-road trace dataset has been created under a controlled design within a research project. This dataset offers 138 traces (69 going and 69 returning). The trajectories were surveyed on a set of roads that define a circular circuit with high altimetric differences, slopes and sharp curves. The actual mean axis of the road, which was determined by precise survey techniques, is supplied to be used as ground truth for research activities. Statistics about the multitraces and the axis dataset are included.
This 3D GNSS-road trace dataset could be of great help to researchers in data mining related to GNSS multitraces (e.g., algorithm development). The main application of this dataset is the development and testing of algorithms intended for mining mean axis data from road multitraces. Additionally, the dataset is suitable for the statistical analysis of single-trace and multitrace datasets, including the determination of outliers, biases and so on. The inclusion of the actual mean axis of the road determined by precise techniques allows the development of quality controls for the results. This dataset facilitates the abovementioned work because it is not necessary to invest time and money to obtain such expensive data. In addition, this dataset will facilitate comparisons with future studies and their results, which is considered extremely important for the advancement of research in this field.

Methods
In this section, we provide a definition of the data, an explanation of the design and a description of the production methods.
Multitraces and control axis. A trace is a recorded GNSS path, and a multitrace set is a set of such recorded paths, as can be observed in Fig. 1.
We must define some terms to provide a better explanation of the content within the 3D GNSS-road trace dataset. The terms to be defined are GNSS trace, multitrace, axis and control axis.
Given a geographical phenomenon such as the centreline of a road, railway, shoreline, border, river or stream, a line string L or polygonal representation is determined by a set $\{P_1, \dots, P_n\}$ of n ordered points (vertexes) that determine an ordered set S of n − 1 segments ($S_{i,i+1}$) formed by two consecutive vertexes ($P_i$, $P_{i+1}$) (see Fig. 2). Each vertex is represented by a 3D point with coordinates {X, Y, Z} in a specific coordinate reference system. In an analytical way:

$$L = \{P_1, \dots, P_n\}, \qquad P_i = (X_i, Y_i, Z_i), \qquad S = \{\, S_{i,i+1} = (P_i, P_{i+1}) : i = 1, \dots, n-1 \,\}$$

where n is the number of vertexes.

If a line string L represents the path through a road section of a vehicle (e.g., a car or a motorcycle) captured by a GNSS device, L is called a GNSS trace (simply called a trace) and is denoted by T. If several traces T are available for the same road section, a multitrace MT set is established.
If a line string L represents the mean axis of a phenomenon (e.g., the central axis of a paved road), L is called the axis and is denoted by A. If an axis A is determined with sufficient accuracy (i.e., at least three times more accurate than an MT set), this axis can be used as a control axis for the MT set and is denoted $CA_{MT}$.
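The definitions above map directly onto simple data structures. The following is a minimal sketch, assuming projected coordinates in metres; the names `Point3D`, `segments` and `length_3d`, as well as all coordinate values, are hypothetical and are not part of the dataset.

```python
import math
from typing import List, Tuple

# A vertex P_i = (X, Y, Z); a projected CRS in metres is assumed.
Point3D = Tuple[float, float, float]


def segments(line: List[Point3D]) -> List[Tuple[Point3D, Point3D]]:
    """Return the n - 1 segments S_{i,i+1} of a line string with n vertexes."""
    return list(zip(line[:-1], line[1:]))


def length_3d(line: List[Point3D]) -> float:
    """Sum of the 3D Euclidean lengths of all segments of the line string."""
    return sum(math.dist(p, q) for p, q in segments(line))


# Hypothetical trace T with four vertexes (made-up coordinates).
trace = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0), (6.0, 8.0, 12.0)]

# A multitrace MT set is simply a collection of traces of the same road section.
multitrace = [trace, [(0.0, 1.0, 0.0), (3.0, 5.0, 0.0)]]

print(len(segments(trace)))  # 3 segments for 4 vertexes
print(length_3d(trace))      # 5 + 12 + 5 = 22.0
```

Keeping a trace as an ordered list preserves the vertex order that the definition of L requires, and the segment set S is derived on demand rather than stored.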
Design. The purpose of this dataset is two-fold: on the one hand, the dataset exists to offer a set of actual GNSS multitraces to work with and to examine all the problems corresponding to their use (omissions, outliers, bias, axis mining, etc.); on the other hand, the dataset is intended to offer an accurate mean axis for a road that can be used to control the result of mining an estimated axis from the multitrace dataset. The design covers both the area and device selection and the data processing method.
Area selection and description. To offer an MT dataset characterized by sufficient complexity and variability, an appropriate study area was sought. The criteria for the selection of the study area were as follows:
• Proximity to the continuously operating GNSS reference station at the University of Jaén to obtain differential corrections for the precise GNSS survey of the accurate axis.
• Circular travel route and design to facilitate the logistics of the survey.
• Roads with little traffic to facilitate field work.
• Roads exhibiting a considerably variable slope with areas of both curved and straight segments.
Finally, a circular path was found on the outskirts of two small villages (Cárchel and Carchelejo) near Jaén (Spain). The total length of the path is 12.2 km, and it has a mean slope of 6%. This path is composed of three different road sections from primary and tertiary roads (Table 1). Figure 3 shows a general view of the area, and Fig. 4 presents a profile of the circular path.
Multitrace production. The MT dataset was captured using a Columbus V990 device (http://cbgps.com/v990/index_en.htm), which is a GNSS data logger that allows the use of a memory card to record a vast number of points and is designed for navigation and in-car applications. The accuracy specifications of the device indicate a 5.0 m circular error probable (CEP) (95%) for non-differential GNSS applications using the Global Positioning System (GPS) and a 2.5 m CEP (95%) for differential GNSS applications using either EGNOS (European Geostationary Navigation Overlay Service) or WAAS (Wide Area Augmentation System) as supplementary systems. However, the device reports only the dilution of precision (DOP) of each point (no per-point precision is offered). The device was placed on the front dashboard of a car, and the survey was performed using the non-differential configuration.
The car was driven in a normal way while taking into account all traffic signals and road conditions (e.g., slopes). A total of 69 traces were obtained in each direction (going and returning). Each trace was assigned an identifier and a label indicating whether it was a going trace or a returning trace.
Control axis production. The centreline of a road section is the objective sought through the MT sets; for this goal, we need ground truth to assess the positional accuracy. This ground truth is what we call the control axis. The production of the control axis involves two main steps: first, a precise differential GNSS survey is conducted following the white roadside lines; then, the mean axis to be used as the ground truth (control axis) is calculated.
The GNSS survey was performed using a Leica 1200, which has a horizontal accuracy of 10 mm + 1 ppm and a vertical accuracy of 20 mm + 1 ppm for kinematic surveying 28. Post-processing was carried out using the corrections provided by the continuously operating GNSS reference station at the University of Jaén. This survey was executed on foot using the device 29 shown in Fig. 5, which allowed the GNSS antenna to be kept vertical. The post-processed points had a planimetric precision better than 40 mm (1 sigma) and an altimetric precision better than 50 mm (1 sigma).
Once both white roadside lines had been obtained, the mean road axis was derived using the Fréchet distance 30 (see Gil de la Vega 31 for more details). After this, a visual inspection of the mean axis was performed to check for the absence of artefacts. Figure 6 shows an example of the two white roadside lines (in blue) and the derived control axis (mean axis, in red). A precision value was obtained for each point along the mean axis by composing the precision of the post-processed GNSS data from the survey of the two white roadside lines. The points selected for each composition were determined through an interpolation process.
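To make the role of the Fréchet distance more concrete, the sketch below computes the discrete Fréchet distance between two polylines with the usual dynamic-programming recurrence. It illustrates only the distance measure, not the authors' procedure for deriving the mean axis (which is described in the cited reference); the function name `discrete_frechet` and the two roadside lines are hypothetical.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between polylines p and q (lists of coordinate tuples)."""
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance for the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]


# Two hypothetical, parallel roadside lines 7 m apart (made-up coordinates).
left = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.5), (20.0, 0.0, 1.0)]
right = [(0.0, 7.0, 0.0), (10.0, 7.0, 0.5), (20.0, 7.0, 1.0)]

print(discrete_frechet(left, right))  # 7.0
```

The iterative table avoids the recursion depth limits that a memoized recursive version would hit on lines with thousands of vertexes.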

Data records
The 3D GNSS-road trace dataset is composed of three files: two contain the spatial data, while the third file contains the metadata. The spatial data files are provided in SHP (shapefile) format 32. The data are offered in two different coordinate reference systems (LatLon + UTM projected) to facilitate their use in research applications (the coordinate reference systems are EPSG:4979 and EPSG:25830, respectively). The metadata are provided following the ISO 19115-1 standard 33 and include the purpose, lineage and usage, in addition to many other technical characteristics. The dataset is available from Figshare 34 as a compressed (.zip) file that contains the geospatial data in shapefile format. The fields of the SHP files and their explanations are provided in Tables 2 through 5.
To describe some characteristics of the dataset, some statistical information about both the MT set and the control axes can be seen in Table 6. Please note that the MT data are raw data, and for this reason, values such as the minimum distance are 0 while the slope variation rises to 1136%. These artefacts in the MT data have not been removed for the reason indicated in the Introduction and will be noted in the Technical Validation.
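Artefacts of this kind are easy to reproduce on raw data. The sketch below is a simple illustration, assuming projected coordinates in metres and a slope defined as rise over horizontal run; the exact definitions behind the statistics in Table 6 are not restated here, and the function name `segment_slopes` and the sample trace are hypothetical.

```python
import math


def segment_slopes(trace):
    """Return (horizontal_distance, slope_percent) for each segment of a trace.

    slope_percent is None when consecutive vertexes share the same X and Y,
    i.e. when the horizontal distance is 0 and the slope is undefined.
    """
    result = []
    for (x1, y1, z1), (x2, y2, z2) in zip(trace[:-1], trace[1:]):
        d_h = math.hypot(x2 - x1, y2 - y1)
        slope = None if d_h == 0 else 100.0 * (z2 - z1) / d_h
        result.append((d_h, slope))
    return result


# Hypothetical raw trace: a normal segment, a repeated position, and a short
# segment with a large height jump (made-up coordinates).
trace = [(0.0, 0.0, 100.0), (10.0, 0.0, 100.5), (10.0, 0.0, 101.0), (10.5, 0.0, 106.0)]

for d_h, slope in segment_slopes(trace):
    print(d_h, slope)
# 10.0 5.0
# 0.0 None
# 0.5 1000.0
```

A repeated horizontal position yields a zero distance, and a short segment combined with a noisy height yields an extreme slope, which is why such values appear in unfiltered multitraces.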

Technical Validation
This section presents relevant information on the 3D GNSS-road trace MT dataset and the control axis data and processing method to assure the readability and quality of the dataset. Following the quality description framework of the ISO 19157 standard 35, there are five categories of quality elements employed to describe the quality of geospatial data: completeness, positional accuracy, thematic accuracy, temporal quality, and logical consistency. Neither the MT dataset nor the control axis data can be evaluated with respect to temporal quality because the complete dataset was surveyed over a very short period of time, and thus, no temporal quality aspect is involved or relevant to the whole dataset. However, we can provide a discussion about the other four quality elements for the MT dataset and for the control axis:
1. Completeness. Raw GNSS data points from the data logger have been provided. These data are offered without processing to ensure the presence of all possible artefacts that may occur in any survey. This is of great interest for future research (e.g., the treatment of outliers and errors). However, the MT points were inspected visually, as can be seen from the general view in Fig. 7a and the detailed view in Fig. 7b. In addition, the control axis has been visually inspected by three operators and is complete (Fig. 8).
2. Positional accuracy. The accuracy of the raw GNSS data points has been provided as registered by the data logger. With respect to the control axis, the surveying device and processing method guarantee a positional accuracy that is at least three times better than the positional accuracy of the raw GNSS data; hence, the axis can be used as a control axis for the raw data. Moreover, the per-point accuracy has been provided as the RMSE compositions of the points used to define each vertex of the mean axis. In addition, a visual control has been executed, and a complete comparison of the mean axis with an independent official source (at a 1:25000 scale) (Fig. 8) clearly shows that the provided control axis performs better.
3. Thematic accuracy. The only thematic attribute that appears in this dataset is the 3DGRT_DI field. The assignment of all the values has been reviewed and found to be correct in 100% of the cases.
4. Logical consistency. Among all the logical consistency aspects of geospatial data, only the format consistency can be checked in this dataset. To test the Shapefile format, all files were loaded using ArcMap from ESRI™ and GDAL (http://gdal.org). Both pieces of software loaded the Shapefiles, including the height values from the geometry, thereby validating the file structure and compatibility. However, we have included the geometry tuple in the attributes of each feature to ensure the ability to analyse this third coordinate.

Fig. 6 The results of the survey of the two white roadside lines (in blue) and the mean axis or control axis (in red).

3DGRT_ID: Integer. Unique identifier.
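The three-times criterion from the positional accuracy item can be checked with a few lines of arithmetic. The sketch below rests on two labelled assumptions that are not taken from the dataset documentation: each mean-axis vertex is treated as the midpoint of two independently surveyed roadside points, so that standard error propagation applies, and the 5.0 m device specification is compared directly with a 1-sigma value although the two figures are different statistics. The function names are hypothetical.

```python
import math


def midpoint_sigma(sigma_a, sigma_b):
    """1-sigma precision of the midpoint of two independent points.

    Assumption: M = (A + B) / 2 with uncorrelated errors, hence
    sigma_M = 0.5 * sqrt(sigma_A**2 + sigma_B**2).
    """
    return 0.5 * math.sqrt(sigma_a ** 2 + sigma_b ** 2)


def accuracy_ratio(trace_accuracy, control_accuracy):
    """How many times more accurate the control axis is than the traces."""
    return trace_accuracy / control_accuracy


# Values quoted in the text: planimetric precision better than 40 mm (1 sigma)
# for the post-processed roadside points, 5.0 m specification for the logger.
sigma_axis = midpoint_sigma(0.040, 0.040)
ratio = accuracy_ratio(5.0, sigma_axis)

print(round(sigma_axis, 4))  # 0.0283
print(ratio >= 3.0)          # True
```

Even with the most pessimistic per-point value, the margin over the factor of three is large, so the conclusion does not hinge on the exact way in which the two accuracy figures are compared.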

Usage Notes
The dataset is distributed as Shapefiles that contain the data organized as described in Tables 2 through 5. Shapefiles are a widely used de facto standard for exchanging and storing spatial data. Many Geographic Information System tools (e.g., ArcGIS®, QGIS®, and GRASS®) are able to load such files. The use of scripting languages (e.g., Python) within these software tools can be of great help for the processing of multitrace datasets because there are no standard capabilities for such types of data. The GDAL library for R (https://www.R-project.org/) allows access to the data and the use of many other R packages with powerful capabilities for dealing with trace and multitrace datasets.
The 3D GNSS-road trace dataset is available for free use/reuse. No restrictions are imposed, in order to support the widest possible use. The main application of this dataset is the development and testing of algorithms intended for mining mean axis data from road multitraces 4,7,11-13,27,36. Multitraces are provided as raw data, and thus, this dataset is suitable for the statistical analysis of both single-trace and multitrace datasets, including the determination of outliers and biases and the development and testing of filtering algorithms focused on problems pertaining to traces 31,37. Finally, the number of traces allows the use of simulation procedures (Monte Carlo, Bootstrap, etc.) to derive and estimate the distributions of some characteristics; for instance, these simulation techniques can be applied to determine the adequate sample size.

Table 6. Statistics of the dataset.
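The simulation idea mentioned in the Usage Notes can be sketched as follows: a bootstrap estimate of how the variability of a mean value shrinks as more traces are used. The data here are synthetic, not read from the dataset; the per-trace lateral offsets, their 2.5 m spread, and the name `bootstrap_se_of_mean` are assumptions made purely for illustration.

```python
import random
import statistics


def bootstrap_se_of_mean(values, sample_size, n_resamples=2000, seed=0):
    """Bootstrap standard error of the mean for resamples of a given size.

    Each resample draws sample_size values with replacement from values.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(values, k=sample_size))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)


# Synthetic stand-in for one lateral offset (in metres) per trace in one
# direction: 69 values, matching the number of traces per direction.
rng = random.Random(42)
offsets = [rng.gauss(0.0, 2.5) for _ in range(69)]

# Standard error of the mean offset when only k traces are available.
for k in (5, 20, 69):
    print(k, round(bootstrap_se_of_mean(offsets, k), 2))
```

Plotting the standard error against k and locating where the curve flattens is one simple way to judge how many traces are enough for a given target precision.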