OutFin, a multi-device and multi-modal dataset for outdoor localization based on the fingerprinting approach

In recent years, fingerprint-based positioning has gained researchers’ attention since it is a promising alternative to the Global Navigation Satellite System and cellular network-based localization in urban areas. Despite this, the lack of publicly available datasets that researchers can use to develop, evaluate, and compare fingerprint-based positioning solutions constitutes a high entry barrier for studies. As an effort to overcome this barrier and foster new research efforts, this paper presents OutFin, a novel dataset of outdoor location fingerprints that were collected using two different smartphones. OutFin is comprised of diverse data types such as WiFi, Bluetooth, and cellular signal strengths, in addition to measurements from various sensors including the magnetometer, accelerometer, gyroscope, barometer, and ambient light sensor. The collection area spanned four dispersed sites with a total of 122 reference points. Each site is different in terms of its visibility to the Global Navigation Satellite System and reference points’ number, arrangement, and spacing. Before OutFin was made available to the public, several experiments were conducted to validate its technical quality.

Despite its low complexity and ability to produce accurate location estimates, the main drawback of fingerprinting is the laborious and time-consuming site surveying task. This drawback has led many studies to resort to either simulated 16 or crowdsourced data 17, where the former never fully reflects the real world and the latter may suffer from integrity and consistency problems. The proposal of OutFin aims at addressing these drawbacks by making real-world measurements and reliable ground truth coordinates publicly available. Table 1 summarizes the main aspects of publicly available fingerprinting datasets published since 2014. Compared to these datasets, OutFin combines several features that place it in a unique position:
• To the best of our knowledge, OutFin is the first multi-modal, outdoor fingerprints dataset to be publicly available.
• The data was collected using two contemporary smartphones rather than outdated smartphones or custom-built platforms.
• The data was collected at highly granular RPs with 61 to 183 centimeters (cm) spacing.
• OutFin not only provides location fingerprints, but it also provides information about the devices that generated them (e.g., the service set identifier of an access point, the communication protocol of a Bluetooth device, and the number of neighboring cells of a serving cell).
• OutFin is accompanied by an interactive map that provides various information about the collection environment, such as RP coordinates (both ground truth and Global Positioning System (GPS) estimates) and building ground elevations and heights.
In addition to facilitating the research and development of outdoor positioning solutions that are based on the fingerprinting approach, OutFin might spur innovation in other research realms, including but not limited to: machine learning 18 , Bayesian optimization 19 , simultaneous localization and mapping 20 , and map-matching 21 .

Methods
Data acquisition platform. OutFin was created using two smartphones for data acquisition: Samsung's Galaxy S10+ (Phone 1) and Google's Pixel 4 (Phone 2). The former was released in the U.S. market on March 8, 2019, while the latter was released on October 24, 2019. Both smartphones ran Android 10, released on September 3, 2019. The motivation behind choosing Android-powered smartphones was twofold. First, Android provides application programming interfaces (APIs) that allow for acquiring raw data at the hardware level. Second, Android-powered smartphones account for over 74% of the market share worldwide 22 . The two smartphones were attached to a tripod head using a dual mount that horizontally separated them by 10 cm (see Fig. 2 (Site 1)). Both smartphones were in portrait mode. The tripod kept them at a fixed height of 132 cm. The tripod head was adjusted to tilt the smartphones at a ∼40 degree (°) angle to the vertical plane. The same set of third-party apps was installed on both smartphones for data collection. These apps, which can be downloaded from the Google Play Store, were WiFi Analyzer Pro (App 1) 23 , Bluetooth Scanner Extreme Edition (App 2) 24 , NetMonitor Pro (App 3) 25 , and Physics Toolbox Sensor Suite Pro (App 4) 26 . The apps allowed for conveniently collecting and exporting WiFi, Bluetooth, cellular, and sensor data, respectively.

Data collection environment. Data collection was performed at the University of Denver's campus where four separate sites were considered. The motivation behind collecting data at separate sites was to offer diversity. For instance, each site is different in terms of its reference points' number, arrangement, and spacing. Also, due to different ground elevations and heights of surrounding buildings, each site has different visibility to the GNSS. This is reflected by the GPS errors produced at a given site. The mean GPS error was 12.1 meters (m), 11.4 m, 4.3 m, and 12.7 m for the first, second, third, and fourth site, respectively. GPS estimates are provided in OutFin to help researchers compare their system's performance to that obtained by GPS. A description of the data collection sites is provided below:
Site 1: Site 1 represents a portion of a covered sidewalk next to the east side of the 11.8 m high Boettcher Auditorium (see Fig. 2). Site 1 contained 31 RPs arranged in three north-to-south lines (see Fig. 3). The spacing between RPs in each line was fixed at 152.5 cm and the distance between lines was fixed at 76.25 cm.
Site 2: Site 2 is ∼245 m north of Site 1 and represents a portion of a covered sidewalk next to the north side of the 11.5 m high Sie International Relations Complex (see Fig. 2). Site 2 contained 23 RPs arranged in a single east-to-west line (see Fig. 3). The spacing between RPs was fixed at 101.5 cm.
[...] Fig. 2). Site 4 contained 33 RPs arranged in a three-column and eleven-row grid (see Fig. 3). The spacing between column RPs was fixed at 183 cm, while the spacing between row RPs was fixed at 146.5 cm.

Fig. 1
A graphical representation of the fingerprinting approach for positioning.
Each RP is uniquely identified by an integer (an ID number) that indicates its order in the collection campaign. For example, data collection started with RP 1 on November 3, 2019, and ended with RP 122 on November 9, 2019. The ground truth locations of RPs belonging to a site are expressed with respect to a local frame of reference. Additionally, the easting and northing (X,Y) coordinates of all RPs are provided with respect to a global coordinate system (i.e., NAD83(2011)/Colorado Central). This was accomplished with help from the university's Department of Geography & the Environment and by using a geographic information system software 27 .

Data collection procedure. Data collection spanned six days (3-5/11/2019 and 7-9/11/2019) and involved four sites with a total of 122 RPs. Because rain could severely affect wireless signal measurements, we did not collect any data on rainy days. The RPs surveyed each day are indicated in Fig. 3. The sequence of steps performed during a day of data collection is described below:
Step 1: Before mounting the smartphones to the tripod, App 4 was launched to collect magnetic field measurements by rotating the smartphones around their X, Y, and Z axes multiple times (see Fig. 4). This process was performed for at least two minutes at a sampling rate of 1 Hertz (Hz). The resultant data was exported as a comma-separated values (CSV) file, named with the smartphone's name and date (e.g., Phone1_051119.csv). Such data can be used to offset the hard-iron distortion caused by placing the smartphones close to each other. After this process, the smartphones were mounted to the tripod and placed at the RP where data was to be collected.
Step 2: App 1 was launched to collect WiFi data, ensuring that at least two WiFi scans were performed along each of the four cardinal directions by rotating the tripod head counterclockwise, ∼90° at a time. A WiFi scan recorded the received signal strength (RSS) from all access points (APs) in range in addition to information about the APs themselves. Android only supports passive scanning, and the duration of a scan varies depending on the smartphone's WiFi hardware and firmware. However, Google recently introduced a restriction that limits an app to four scans in a 2-minute period 28 . This restriction applies to Android [...] Collecting data along four directions mitigates the shadowing effect caused by the body of the data collector, who is constantly facing the smartphone screens. Scan outcomes were exported as a CSV file, named with the smartphone's model as a prefix and the RP's ID as a suffix (e.g., Phone2_WiFi_73.csv).
Step 3: App 2 was launched to collect Bluetooth data. Android allows active Bluetooth scanning; thus, scans can be triggered by a user-level app. A Bluetooth scan involves an inquiry scan of approximately 12 seconds, followed by a page scan for each discovered device to retrieve its information and the RSS 29 . On both smartphones, a scan took between 15 and 30 seconds, depending primarily on the number of discoverable devices in the area. As in Step 2, the shadowing effect was accounted for by performing two scans along each cardinal direction. Scan results were exported as a CSV file with a naming convention like that described in Step 2 (e.g., Phone1_Bluetooth_29.csv).
Step 4: App 3 was launched to collect cellular data. A smartphone's cellular modem constantly scans the cellular network for cell selection/reselection and handover purposes. Android provides APIs to extract information associated with scans such as Reference Signal Received Power (RSRP) and cell identity information 30 . The sampling frequency can be set manually and was fixed to 1 Hz. As in Step 2, the shadowing effect was accounted for, here by collecting at least fifteen samples along each cardinal direction. Collected data was exported as a CSV file with a naming convention like that described previously (e.g., Phone2_Cellular_14.csv). Moreover, App 3 allowed for collecting GPS data as part of the data record. The GPS readings corresponding to RPs belonging to the same site were extracted and stored under a CSV file named with the site's name as a prefix and the smartphone's model and app name as a suffix (e.g., Site1_GPS_Phone1_App3.csv).
Step 5: App 4 was launched to collect sensor data. A smartphone's built-in sensors can be classified as either hardware-based, such as the magnetometer and gyroscope, or software-based, such as the gravity and linear acceleration sensors. Android provides APIs for accessing and acquiring raw sensor data at defined rates 31 . The sampling frequency was set to 1 Hz. Although sensor measurements are not subject to the shadowing effect, data was collected along the four cardinal directions both to conform with the survey pattern established above and to diversify the dataset, since magnetic field strength can vary greatly even within a small area (on the order of a few centimeters or less) 32 . At least fifteen samples were collected along each direction, following the same directions described in Step 2. Sensor data was exported as a CSV file with a naming convention like that described previously (e.g., Phone1_Sensors_58.csv). App 4 also allowed for collecting GPS data as part of the data record. As in Step 4, the GPS readings corresponding to RPs belonging to the same site were extracted and stored under a CSV file with a naming convention like that described in Step 4 (e.g., Site3_GPS_Phone2_App4.csv).
Step 6: The tripod was moved to the next RP and Steps 2-5 were repeated. This process continued until all RPs designated for a given day were surveyed.
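The rotation recordings from Step 1 can be turned into a hard-iron correction in several ways. The following is a minimal sketch, assuming the simple min/max midpoint method; it is not necessarily the method used by the calibration script distributed with OutFin. The column names Bx, By, and Bz follow the sensor field labels given under Data records, and the sample values are made up for illustration.

```python
import csv


def hard_iron_offset(samples):
    """Estimate the hard-iron offset as the midpoint of the min and max reading per axis.

    samples: iterable of (Bx, By, Bz) tuples in microtesla, recorded while the
    phone is rotated around all three axes (as in Step 1).
    """
    samples = list(samples)
    if not samples:
        raise ValueError("no magnetometer samples given")
    offset = []
    for axis in range(3):
        values = [sample[axis] for sample in samples]
        offset.append((max(values) + min(values)) / 2.0)
    return tuple(offset)


def apply_offset(sample, offset):
    """Subtract the estimated offset from one (Bx, By, Bz) reading."""
    return tuple(value - shift for value, shift in zip(sample, offset))


def read_magnetometer(path):
    """Read (Bx, By, Bz) rows from a CSV file; the column names are an assumption."""
    with open(path, newline="") as handle:
        return [(float(row["Bx"]), float(row["By"]), float(row["Bz"]))
                for row in csv.DictReader(handle)]


# Made-up readings: a 40 uT field shifted by a constant (20, 5, -30) uT offset.
raw = [(60.0, 5.0, -30.0), (-20.0, 5.0, -30.0), (20.0, 45.0, -30.0),
       (20.0, -35.0, -30.0), (20.0, 5.0, 10.0), (20.0, 5.0, -70.0)]
offset = hard_iron_offset(raw)
calibrated = [apply_offset(sample, offset) for sample in raw]
```

With real data, the offset would be estimated once per phone and day from a file such as Phone1_051119.csv (via read_magnetometer) and then subtracted from the magnetometer readings collected at the RPs on that day. Soft-iron distortion is not handled by this sketch.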

Data records
On April 2, 2020, the OutFin dataset was made publicly available on figshare 33 . Figure 5 shows the dataset's file structure and presents an overview of all CSV file types, their field labels, and a data record example. A description of the CSV file types and their field labels is provided below:
[...]
7. Protocol: The Bluetooth protocol that the device uses for communication; can be CLASSIC (Basic Rate/Enhanced Data Rate (BR/EDR)), BLE (Bluetooth Low Energy), or DUAL (BR/EDR + BLE).
8, 9. Minor_Device_Class, Major_Device_Class: Indicate the device's minor and major classes, respectively, as specified by the Bluetooth Special Interest Group (SIG) 34 .
IV. <phone>_Sensors_<RP>.csv contains sensor data collected by a smartphone via App 4:
1. Time: The time the sample was captured. The time format is as described above.
2-4. ax, ay, az: The linear acceleration, in meters per second squared (m/s^2), along the smartphone's X, Y, and Z axes, respectively.
5-7. wx, wy, wz: The angular velocity, in radians per second (rad/s), around the smartphone's X, Y, and Z axes, respectively.
8-10. Bx, By, Bz: The magnetic field strength, in microtesla (μT), along the smartphone's X, Y, and Z axes, respectively.
11-13. gFx, gFy, gFz: The g-force measured as the ratio of normal force to gravitational force (FN/Fg), along the smartphone's X, Y, and Z axes, respectively.
14-16. Yaw, Pitch, Roll: The angle of rotation, in degrees (°), around the smartphone's X, Y, and Z axes, respectively.
17. Pressure: The atmospheric pressure in hectopascal (hPa).
18. Illuminance: The illuminance in lux (lx).
V. <site>_Local.csv contains the local coordinates of RPs belonging to a site. Each site has its own frame of reference and the origins are at RPs 10, 122, 60, and 99 for Sites 1, 2, 3, and 4, respectively.

Table 3. Results of the correlation analysis between the measurements obtained from Phone 1 and Phone 2 for three different days. Spearman's ρ varies between −1 and +1 with 0 implying no correlation, while values of −1 or +1 imply an exact monotonic relationship. Kendall's τ varies between −1 and +1. Values close to +1 indicate strong agreement, while values close to −1 indicate strong disagreement. For WiFi, the results were generated using the averaged RSS readings of fifty randomly selected APs that were observed by both phones for a given day. For Bluetooth, the results were generated using the averaged RSS readings of fifteen randomly selected devices that were observed by both phones for a given day. For Cellular, the results were generated using averaged readings of UMTS neighbors, LTE neighbors, RSRP strongest, frequency, EARFCN, RSRP, and RSRQ of a cellular base station that both phones connected to for a given day. For Sensors, the results were generated using the averaged readings of linear acceleration, angular velocity, magnetic field strength, g-force, angle of rotation, atmospheric pressure, and illuminance for a given day. The p-value of all results ranged between 0.0 and 0.01.
IX. <phone>_<date>.csv contains sensor data collected by a smartphone via App 4 before the smartphone is mounted to the tripod. Field labels are identical to those described in IV (<phone>_Sensors_<RP>.csv).
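The file naming conventions make it straightforward to index the per-RP fingerprint files programmatically. The following is a minimal sketch, assuming the phone tokens (Phone1, Phone2) and modality tokens (WiFi, Bluetooth, Cellular, Sensors) take exactly the forms seen in the example file names quoted in the text; the function names are hypothetical and no particular directory layout is assumed.

```python
import re
from pathlib import Path

# Pattern inferred from example names such as Phone2_WiFi_73.csv (an assumption).
FINGERPRINT_RE = re.compile(
    r"^(?P<phone>Phone\d+)_(?P<modality>WiFi|Bluetooth|Cellular|Sensors)_(?P<rp>\d+)\.csv$"
)


def parse_fingerprint_name(name):
    """Return (phone, modality, rp_id) for a per-RP file name, or None if it does not match."""
    match = FINGERPRINT_RE.match(name)
    if match is None:
        return None
    return match.group("phone"), match.group("modality"), int(match.group("rp"))


def index_files(paths):
    """Map (phone, modality, rp_id) to the path of the corresponding CSV file."""
    index = {}
    for path in paths:
        key = parse_fingerprint_name(Path(path).name)
        if key is not None:
            index[key] = str(path)
    return index
```

Files that follow other conventions, such as the calibration recordings (e.g., Phone1_051119.csv) or the per-site GPS files (e.g., Site1_GPS_Phone1_App3.csv), are deliberately skipped by this pattern and would need their own parsers.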

Technical Validation
The technical quality of the OutFin dataset was evaluated using experiments that consider two basic requirements that any high-quality dataset should satisfy, i.e., reliability and validity. Additionally, as a demonstration of the dataset's potential for positioning applications, a number of practical usage examples are presented.

Measurement reliability. A data acquisition platform is said to be reliable if it provides consistent measurements at different points in time. To this end, before the collection campaign, WiFi, Bluetooth, cellular, and sensor data was captured over three different days at the same location. Spearman's and Kendall's correlation coefficients were then used to quantify the degree of consistency between temporal measurements for a given phone. Table 2 shows Spearman's and Kendall's correlation coefficients for the two smartphones for all possible pairs of days. Given that the correlation coefficients are high (i.e., close to the maximum value of 1.0), it can be concluded that the dataset possesses a high degree of reliability.
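To make the two consistency measures concrete, the following is a minimal pure-Python sketch of Spearman's ρ (Pearson correlation of average ranks) and Kendall's τ in its tie-corrected τ-b form. Whether the reported results use τ-b is an assumption, and this is not the authors' script; in practice, scipy.stats.spearmanr and scipy.stats.kendalltau compute the same quantities together with p-values.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant/discordant pair counts, O(n^2)."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted nowhere
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


# Made-up averaged RSS values (dBm) of the same five APs on two days.
day_a = [-71.0, -64.5, -80.2, -55.1, -90.3]
day_b = [-70.2, -66.0, -79.5, -57.3, -88.9]
rho, tau = spearman_rho(day_a, day_b), kendall_tau_b(day_a, day_b)
```

Both coefficients depend only on the ordering of the values, so they remain at their maximum as long as the APs keep the same relative order from one day to the next, even if absolute RSS levels shift.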

Table 4. Descriptive statistics of the OutFin dataset. These include the minimum, maximum, mean, and standard deviation of the most important variables. Reference values are provided where applicable. Small variations in results between the phones are mainly attributed to device heterogeneity 63 (e.g., the sensitivity of the radio receiver or sensor). The reference value for the magnitude of the magnetic field represents the Earth's magnetic field around Denver, Colorado. The reference values for atmospheric pressure represent, respectively, the minimum, maximum, and mean recorded atmospheric pressure in Denver, Colorado, during the data collection period. The reference values for illuminance represent the light intensity for sunlight, daylight, and twilight, respectively. An hour-by-hour description of other weather conditions, such as temperature, humidity, and visibility at the time of data collection can be retrieved from 64 .

Fig. 7
Interpolated magnetic field magnitude of Site 3 using linear interpolation (left) and cubic interpolation (right). The maps were generated using calibrated magnetic field measurements from Phone 1 and Phone 2.

Measurement validity. A data acquisition platform is said to be valid if it accurately measures what it is intended to measure. In some cases, this requires the presence of theoretically-derived data to compare experimental data against. For example, WiFi RSS values can be computed using a path loss model. An input to the model is the distance between the transmitter and receiver. However, obtaining such inputs is not feasible since the exact location of all APs in the environment needs to be known. In the absence of theoretically-derived data, validity can be assessed by comparing data generated by different sources and checking for consistency. Accordingly, for a given day, Spearman's and Kendall's correlation coefficients were used to quantify the degree of consistency between the measurements obtained by the phones. The correlation results for the foregoing three days are shown in Table 3. These results demonstrate high levels of consistency, which attests to the validity of the dataset.
As graphical evidence of measurement validity, Fig. 6 compares some of the data generated by the smartphones at randomly selected RPs side-by-side. Plots of the same data type exhibit the same profile despite corresponding to two different smartphones. Table 4 reports descriptive statistics of the data collected by each phone with respect to various variables. These statistics are compared against previously reported reference values, where applicable. The statistics displayed in Table 4 further support the validity of the dataset by ruling out the possibility that the dataset contains unrealistic, erratic, or random data.
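As an illustration of the kind of summary reported in Table 4, the following minimal sketch computes the magnetic field magnitude from the Bx, By, and Bz fields and derives its minimum, maximum, mean, and standard deviation. The use of the population standard deviation is an assumption (the text does not state which estimator was used), and the sample readings are made up.

```python
from math import sqrt
from statistics import mean, pstdev


def field_magnitude(bx, by, bz):
    """Euclidean norm of a three-axis magnetometer reading, in microtesla."""
    return sqrt(bx * bx + by * by + bz * bz)


def describe(values):
    """Minimum, maximum, mean, and population standard deviation of a sequence."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": pstdev(values),  # assumption: population rather than sample estimator
    }


# Made-up (Bx, By, Bz) readings in microtesla.
readings = [(3.0, 4.0, 12.0), (6.0, 8.0, 0.0), (0.0, 0.0, 7.0)]
summary = describe(field_magnitude(*reading) for reading in readings)
```

Comparing such a summary with an independent reference value, as done in Table 4 for the Earth's magnetic field around Denver, gives a quick plausibility check on the measurements.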

Usage Examples
This subsection provides a brief demonstration of some of the application domains that OutFin can be used for. These include fingerprint interpolation, feature extraction, performance evaluation, and signal denoising.

Fingerprint interpolation.
Building a fingerprint map is usually required to provide positioning in a continuous fashion. The resolution of a map depends highly on the RP granularity (the higher the RP granularity, the better the map resolution). However, collecting fingerprints at highly granular RPs is time-consuming and labor-intensive. Thus, interpolation methods are often employed to calculate the fingerprints between the locations of known fingerprints 36 . The choice of an interpolation technique is pivotal to the resulting map. For example, Fig. 7 compares the magnetic field maps created for Site 3 by two different interpolation techniques, namely linear and cubic interpolation. The resulting maps are clearly not identical, which suggests that a positioning algorithm could perform differently depending on the map employed.
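The difference between linear and cubic interpolation can be illustrated in one dimension, along a single line of equally spaced RPs. The following is a minimal sketch under several assumptions: a Catmull-Rom spline stands in for cubic interpolation, the magnitudes are made up, and the 101.5 cm spacing is borrowed from Site 2 purely as an example. It does not reproduce the two-dimensional maps of Fig. 7, for which a routine such as scipy.interpolate.griddata (with method "linear" or "cubic") would be a more natural fit.

```python
def _segment(values, spacing, x):
    """Return (index of the RP to the left of x, fractional position within the segment)."""
    last = len(values) - 1
    if last < 1:
        raise ValueError("at least two RPs are required")
    if not 0.0 <= x <= last * spacing:
        raise ValueError("x lies outside the surveyed line")
    i = min(int(x // spacing), last - 1)
    return i, (x - i * spacing) / spacing


def linear_interp(values, spacing, x):
    """Piecewise-linear interpolation between the two neighboring RPs."""
    i, t = _segment(values, spacing, x)
    return (1.0 - t) * values[i] + t * values[i + 1]


def catmull_rom_interp(values, spacing, x):
    """Cubic (Catmull-Rom) interpolation using up to four neighboring RPs."""
    i, t = _segment(values, spacing, x)
    p1, p2 = values[i], values[i + 1]
    # At the ends of the line, extrapolate the missing neighbor linearly.
    p0 = values[i - 1] if i > 0 else 2 * p1 - p2
    p3 = values[i + 2] if i + 2 < len(values) else 2 * p2 - p1
    return 0.5 * (
        2 * p1
        + (p2 - p0) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
        + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3
    )


# Made-up field magnitudes (microtesla) at four consecutive RPs, 101.5 cm apart.
magnitudes = [48.0, 52.0, 47.0, 50.0]
midpoint = 1.5 * 101.5  # halfway between the second and third RP
linear_value = linear_interp(magnitudes, 101.5, midpoint)
cubic_value = catmull_rom_interp(magnitudes, 101.5, midpoint)
```

Both methods reproduce the measured values at the RPs themselves and differ only in between, which is where a positioning algorithm relying on the interpolated map would be affected.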

Feature extraction.
A WiFi fingerprint has entries for all APs detected in an entire environment, but only a subset of these APs is observed at different locations. This is especially true for large-scale environments. For example, OutFin contains measurements from 1,379 unique APs; however, on average, only 10 of these APs are observed at any given RP. Consequently, feature extraction techniques are often utilized to reduce the dimensionality of the fingerprint space in order to achieve efficient and robust positioning 37 . Figure 8 compares two dimensionality reduction methods, i.e., the autoencoder and principal component analysis (PCA). The reconstruction cost obtained by the autoencoder is lower than that obtained by PCA. This suggests that the autoencoder is better at compressing the fingerprint space into a lower dimensional representation that comprises the informative content of the fingerprint space.

Performance evaluation. When proposing a new positioning method, the performance of the proposed method is often evaluated against the performance of previously proposed methods. Many of the benchmarked methods have a machine learning algorithm at their core, such as k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree, or Naive Bayes 38 . Therefore, with the purpose of comparing the performance of such algorithms, the positioning problem was cast as a classification task where each RP is treated as a unique class. Various performance metrics were considered, including classification metrics, positioning error, and computational complexity. For the sake of fair comparison, the parameters of each algorithm were fine-tuned using grid search and cross-validation. Evaluation results, shown in Table 5, are reported on the Bluetooth measurements collected from Site 4. The results demonstrate that different algorithms can be ranked differently depending on the chosen performance metric. For example, the best classification accuracy was achieved by RBF SVM, while the lowest mean positioning error was achieved by k-NN.
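The classification view of positioning, and the reason accuracy and positioning error can rank algorithms differently, can be sketched as follows. This is a minimal illustration, not the evaluation code behind Table 5: the −100 dBm placeholder for devices not heard in a scan, the device names, the RP IDs, the coordinates, and all RSS values are assumptions made up for the example.

```python
from collections import Counter
from math import sqrt

MISSING_RSS = -100.0  # assumed placeholder (dBm) for devices absent from a scan


def to_vector(scan, device_ids):
    """Turn a sparse {device: RSS} scan into a fixed-length feature vector."""
    return [scan.get(device, MISSING_RSS) for device in device_ids]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query.

    train: list of (vector, rp_id). Ties are broken in favor of the closer neighbor.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(rp_id for _, rp_id in nearest)
    return votes.most_common(1)[0][0]


def mean_positioning_error(predicted, actual, coords):
    """Mean distance between the predicted RP and the true RP, in coordinate units."""
    errors = [euclidean(coords[p], coords[a]) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


devices = ["d1", "d2", "d3"]                   # hypothetical Bluetooth devices
coords = {90: (0.0, 0.0), 91: (183.0, 0.0)}    # hypothetical local coordinates (cm)
train_scans = [({"d1": -60, "d2": -75}, 90), ({"d1": -62, "d2": -74}, 90),
               ({"d2": -58, "d3": -70}, 91), ({"d2": -60, "d3": -72}, 91)]
test_scans = [({"d1": -61, "d2": -76}, 90), ({"d2": -59, "d3": -71}, 91),
              ({"d1": -80, "d2": -62, "d3": -78}, 90)]

train = [(to_vector(scan, devices), rp_id) for scan, rp_id in train_scans]
actual = [rp_id for _, rp_id in test_scans]
predicted = [knn_predict(train, to_vector(scan, devices)) for scan, _ in test_scans]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
error_cm = mean_positioning_error(predicted, actual, coords)
```

Accuracy only counts whether the predicted RP is exactly right, whereas the positioning error also reflects how far a wrong prediction lies from the true RP. A classifier that is wrong slightly more often, but mostly confuses adjacent RPs, can therefore achieve a lower mean positioning error.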
Signal denoising. Signal loss can negatively impact the performance of a positioning system. Thus, denoising techniques are often integrated as a preprocessing step to enhance positioning 39 . As an example, a denoising autoencoder was utilized as a denoising agent where the feature vector of a cellular fingerprint is corrupted to emulate randomized loss of data. The degree of corruption is controlled by a predefined probability (p_loss) where, for example, a p_loss of 0.03 indicates a 3% chance of setting a feature to zero. Figure 9 demonstrates the differences in performance between using noisy cellular features and their denoised versions for positioning in Site 2. On average, the use of the denoising step resulted in a 1.43% improvement in accuracy and a 13.25% reduction in positioning error.
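The corruption step can be sketched in a few lines. This is a minimal illustration, assuming each feature is zeroed independently with the given loss probability; the function names and the feature values are hypothetical, and the denoising autoencoder itself is not shown.

```python
import random


def corrupt(features, p_loss, rng):
    """Set each feature to zero independently with probability p_loss."""
    return [0.0 if rng.random() < p_loss else value for value in features]


def make_training_pairs(fingerprints, p_loss, seed=0):
    """Build (noisy input, clean target) pairs for training a denoising model."""
    rng = random.Random(seed)
    return [(corrupt(features, p_loss, rng), list(features)) for features in fingerprints]


# Made-up cellular feature vectors (already scaled to the range 0..1).
fingerprints = [[0.42, 0.77, 0.13, 0.90], [0.38, 0.81, 0.20, 0.85]]
pairs = make_training_pairs(fingerprints, p_loss=0.03)
```

A denoising autoencoder would then be trained to map each noisy input back to its clean target, and applied to incoming fingerprints before they are passed to the positioning algorithm.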

Code availability
Well-documented scripts, written in Python 3.6.4 40 , are present alongside the dataset (also available on GitHub 41 ). These include the scripts used to generate the results described in the Technical Validation section as well as a script to calibrate magnetic field measurements against hard/soft-iron distortions. The data required to replicate the experiments reside in OutFin/Code/temporal_data. Depending on the script, some of the following libraries may be required: os, pandas, scipy, random, sklearn, matplotlib, numpy, statistics, keras, math. Additionally, a thorough description of the collection environment in the form of an interactive map (developed using QGIS 3.10 27 ) is provided. The map is composed of several layers that display information such as RP coordinates (both ground truth and smartphone estimated), pictures of the collection sites, and building height and ground elevation (as provided by the City and County of Denver 42 ). High-resolution aerial imagery (3-inch), provided by the Denver Regional Council of Governments 43 [...]

Table 5. Performance evaluation of commonly used algorithms for positioning with respect to various metrics. The results were generated using 530 Bluetooth samples (60% training and 40% testing) collected by both phones from Site 4. RBF: radial basis function; n: number of training samples; p: number of features; n_sv: number of support vectors.

Fig. 9
Noisy vs. denoised features for positioning. For a given p_loss value, the results were generated using 3,111 cellular samples collected by both phones from Site 2. A k-NN algorithm is used for comparison where ∼60% of the samples were used for training and the remaining ∼40% for testing.