The MetroPT dataset for predictive maintenance

The paper describes the MetroPT data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to develop machine learning methods for online anomaly detection and failure prediction. Several analog sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed) provide a framework that is easy to use and supports the development of new machine learning methods. The dataset has several interesting characteristics and can serve as a good benchmark for predictive maintenance models.


Background & Summary
Faults in public transport vehicles during regular operation cause considerable damage, mainly when they interrupt a trip. The negative impacts affect not only the operating company but also its clients, whose trust in the transport service is undermined. In this context, the early detection of such faults can avoid the cancellation of trips and the withdrawal of the affected vehicle from service, and is thus of enormous value. In 2017 alone, more than 170 trips were cancelled for this reason.
The Air Production Unit (APU) installed on the roof of Metro vehicles feeds units that perform different functions. One of these units is the secondary suspension, responsible for keeping the vehicle level at a constant height regardless of the number of passengers on board. The APU is in high demand on the vehicle throughout the day. Because there is no redundancy, its failure results in the immediate removal of the train for repair. The failures are typically undetectable according to traditional condition-based maintenance criteria (predefined thresholds).
From the operational point of view, the objective of Predictive Maintenance is to reduce operational problems, reduce the number of unforeseen stops and the stopping time, and change the maintenance paradigm: from reactive to predictive.
In the last few years, many works have been published about Predictive Maintenance (PdM) with the development of machine and deep learning techniques. Recent publications include a survey in Predictive Maintenance 1 that covers the main issues in data-driven PdM; another survey 2 describing advances using machine learning and deep learning techniques for handling PdM in the railway industry; and a manuscript 3 that identifies three key open research lines for the PdM domain: failure prediction, remaining useful life (RUL), and root cause analyses (RCA).
The final goal of PdM is the timely prediction of developing and unexpected failures based on the continuously monitored condition of the equipment. The maintenance plan is dynamically scheduled to reduce unplanned downtime and associated costs. Additionally, identifying the components involved and the severity of the failure ultimately yields more effective recovery plans.
The MetroPT dataset is a real-world dataset where the ground truth of anomalies is known from the company's maintenance reports. It is intended as a benchmark dataset for Predictive Maintenance, allowing fair comparisons between Machine Learning algorithms developed to detect anomalies based on sensor data collected as a continuous data flow.

Methods
A signal acquisition system was installed in the Air Production Unit (APU) of a train. The acquisition system follows a rigorous set of protocols and norms required for use on railway vehicles. Figure 1 depicts the components of an APU. The data acquisition rate is 1 Hz, and the information is sent to the remote server every five minutes using the GSM network.
The data collection from the unit began on 12 March 2020 and has been continuously operational since (as of July 2022). Every day, a report is generated with the information on the sensor signals.
The system installed in the vehicle's APU collects data from eight analog sensors and eight digital signals. The selection of the sensors was based on an FMEA (Failure Mode and Effects Analysis) and FMECA (Failure Mode, Effects and Criticality Analysis) of the APU. These two analyses were developed by the maintenance teams of Metro of Porto.
Analog sensors. The analog sensors measure pressure, temperature and electric current consumed at different components of the APU, as detailed below.
• H1 4 - Valve that is activated when the pressure read by the pressure switch of the command is above the operating pressure of 10.2 bar (bar).
• DV_pressure 4 - Pressure exerted due to the pressure drop generated when the air dryer towers discharge the water. When it is equal to zero, the compressor is working under load (bar).
The digital sensors installed in the APU assume only two different values: zero when inactive or one when a specific event activates them. The considered digital sensors are the following.
• COMP - Electrical signal of the air intake valve on the compressor. It is active when there is no admission of air on the compressor, meaning that the compressor is turned off or working offloaded.
• DV_electric - Electrical signal that commands the compressor outlet valve. When it is active, the compressor is working under load; when it is not active, the compressor is off or offloaded.
• TOWERS - Signal that defines which tower is drying the air and which tower is draining the humidity removed from the air. When it is not active, tower one is working; when it is active, tower two is working.
• MPG - Responsible for activating the intake valve to start the compressor under load when the pressure in the APU is below 8.2 bar. Consequently, it will activate the sensor COMP, which assumes the same behaviour as the MPG sensor.
• LPS - Signal activated when the pressure is lower than 7 bar.
• Pressure_switch - Signal activated when pressure is detected on the pilot control valve.
Regarding the GPS information, the train was equipped with a secondary GPS antenna to collect the following:
• gpsLong - Longitude position (°).
When the train is inside a tunnel and loses the satellite information, the acquisition system sets the GPS signal to 0.
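Because a lost GPS fix is encoded as 0 rather than as a missing value, downstream code may want to avoid treating those zeros as real positions. The following is a minimal sketch of one way to do that. Only `gpsLong` is named in the text above; the field names `gpsLat` and `gpsQuality`, the helper `mask_gps_loss`, and the sample values are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def mask_gps_loss(
    samples: List[Dict[str, Optional[float]]],
    quality_key: str = "gpsQuality",                      # assumed field name
    coord_keys: Sequence[str] = ("gpsLat", "gpsLong"),    # gpsLat is assumed
) -> List[Dict[str, Optional[float]]]:
    """Return copies of the samples with coordinates set to None
    wherever the GPS quality field is 0 (signal lost, e.g. in a tunnel)."""
    masked = []
    for sample in samples:
        row = dict(sample)  # copy so the input is left untouched
        if row[quality_key] == 0:
            for key in coord_keys:
                row[key] = None
        masked.append(row)
    return masked


# Illustrative values only, not taken from the dataset.
samples = [
    {"gpsLat": 41.15, "gpsLong": -8.61, "gpsQuality": 1},
    {"gpsLat": 0.0, "gpsLong": 0.0, "gpsQuality": 0},
]
masked = mask_gps_loss(samples)
```

Masking rather than dropping the rows keeps the 1 Hz series aligned with the sensor signals recorded at the same instants.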

Data Records
The MetroPT dataset (available at Zenodo 8 ) is provided as a single file and reports data collected from the APU of an operating train from January to June 2022, which performs, on average, 26 trips per day. With a data acquisition rate of 1 Hz, the dataset is composed of 10979547 data points described by the 20 variables introduced above, derived from the analog and digital sensors installed in the train's APU and its GPS coordinates, with no missing values (no pre-processing technique was applied to the dataset). Figure 2 depicts a snapshot of the data collected by the eight analog sensors on a normal operating day (Jan 1, 2022) from 8:00 to 10:30. Figure 3 depicts a snapshot of the data collected by the digital sensors of the APU for the same period as Fig. 2.
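As a quick sanity check on these figures, the sample count can be converted into recording time. The sketch below assumes exactly one sample per second and that the period runs from 1 January to 30 June 2022 inclusive; both are assumptions, since the text only states "January to June 2022".

```python
from datetime import date

SAMPLES = 10_979_547          # data points reported for the dataset
RATE_HZ = 1                   # acquisition rate stated in the text
SECONDS_PER_DAY = 86_400

# Assumed calendar bounds of the collection period.
start, end = date(2022, 1, 1), date(2022, 6, 30)

recorded_seconds = SAMPLES / RATE_HZ
recorded_days = recorded_seconds / SECONDS_PER_DAY
calendar_days = (end - start).days + 1
coverage = recorded_seconds / (calendar_days * SECONDS_PER_DAY)

print(f"{recorded_days:.1f} days of samples over "
      f"{calendar_days} calendar days ({coverage:.0%})")
```

Under these assumptions the samples amount to roughly 127 days of 1 Hz data within a 181-day span, so it seems safer to index the series by timestamp than to assume consecutive rows are always one second apart.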
In Fig. 4, we show the data collected from the GPS module, which includes latitude, longitude, speed and GPS signal quality, again for the same period reported in Fig. 2. The positional data is important to determine whether the train is parked or in operation (cf. Table 1). The parking zones are typically located at the end of each line or in some underground parks. There are no missing values in this data. When the satellite information is lost on entering a tunnel (cf. Figure 5), the GPS signal is set to 0.

Technical Validation
Reported failures. The ground truth was provided by the company using maintenance reports. According to the reported information, the dataset contains three catastrophic failures (cf. Table 2) over the six months. Two failures are related to air leaks in the system, and the third is an oil leak. This technical information can be used to annotate the dataset.
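Annotating the dataset from the maintenance reports amounts to marking every sample whose timestamp falls inside a reported failure interval. The following is a minimal sketch of that idea; the helper `label_failures` is hypothetical, and the interval shown is a placeholder rather than one of the reported failures, whose actual start and end times should be taken from Table 2.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), both inclusive


def label_failures(timestamps: List[datetime],
                   failures: List[Interval]) -> List[int]:
    """Return 1 for samples inside any failure interval, else 0."""
    return [
        int(any(start <= t <= end for start, end in failures))
        for t in timestamps
    ]


# Placeholder interval for illustration only (not from Table 2).
failures = [(datetime(2022, 1, 1, 10, 0, 0), datetime(2022, 1, 1, 11, 0, 0))]
timestamps = [
    datetime(2022, 1, 1, 9, 59, 59),
    datetime(2022, 1, 1, 10, 0, 0),
    datetime(2022, 1, 1, 11, 0, 1),
]
labels = label_failures(timestamps, failures)
```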
Failure 1 - Air leak on clients. This failure was an air leak on a pipe that feeds several clients of the system, such as brakes, suspension, etc. The report provided by the maintenance teams showed a picture of a pipe that had burst. Unlike the second failure, from which the train recovered, in this case the train had to be moved to the maintenance building. Figure 6 shows a catastrophic drop in air pressure near 23:00 due to a broken air pipe. This problem was classified as a severe malfunction, and the train needed to be removed from operation.
Failure 2 - Air leak on air dryer. The second failure was caused by a malfunction of the pneumatic pilot valve that opens the drain pipes during the operation of the compressor. Figure 7 shows some anomalies in the regular filling of the air tanks and in the consumption by the train clients. Between 12:00 and 14:00, we can observe large drops in air pressure, which trigger an alarm for the train driver (LPS warning variable); the compressor tries to compensate for the lack of air pressure, and the train continues in operation. After 15:00, the APU behaviour stabilises as the pneumatic pilot valve returns to its normal pattern.
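Since LPS is a binary signal sampled at 1 Hz, alarm episodes like the ones described for this failure can be located by grouping consecutive active samples. The following is a minimal sketch; the helper `active_runs` and the sample values are illustrative and not part of the dataset or of the authors' code.

```python
from typing import List, Tuple


def active_runs(signal: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each
    contiguous run of samples equal to 1."""
    runs = []
    start = None
    for i, value in enumerate(signal):
        if value == 1 and start is None:
            start = i                   # a new run begins
        elif value != 1 and start is not None:
            runs.append((start, i))     # the current run ends
            start = None
    if start is not None:               # run still open at the end
        runs.append((start, len(signal)))
    return runs


# Illustrative LPS trace: three separate alarm episodes.
lps = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
runs = active_runs(lps)
durations = [end - start for start, end in runs]  # seconds at 1 Hz
```

At a 1 Hz rate, the length of each run in samples is also its duration in seconds, provided the samples are contiguous in time.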

Failure 3 - Oil leak on compressor. Due to the hardware design, there was no oil-related signal to warn the train driver of the leak. The oil leak severely damaged the engine of the compressor and, with the compressor inoperable, a drop in air pressure was subsequently observed and the train needed to be removed from the tracks. Figure 8 shows irregular patterns in the oil temperature from 12:00 onwards, indicating an issue with the oil system. Unusual patterns can also be observed in the air system, suggesting that oil may be escaping into the air system or that the compressor is losing efficiency.
Evaluation protocol. The dataset can be used for two primary purposes: 1. predicting the occurrence of failures; 2. identifying the components involved in the failure.
For the first task, predicting failures, the goal is to predict when a failure starts and how long it lasts. The company established the need to detect the failure at least two hours before the train becomes non-operational, so that it can be safely removed from the tracks. In this scenario, and for validation purposes, a failure is a time interval: start-end. The company also suggests the following evaluation protocol, also illustrated in Fig. 9:
• True Positive (TP) - when the predicted failure interval overlaps with the observed failure interval.
The overall goal is to minimize the number of false alarms (FP) and missed failures (FN), to avoid failures during operation and unnecessary maintenance actions.
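The interval-based protocol can be sketched as a simple counting routine. The TP rule follows the definition above. The FP and FN rules are assumptions inferred from the terms "false alarms" and "missed failures": a predicted interval that overlaps no observed failure counts as FP, and an observed failure that no prediction overlaps counts as FN. Counting TP once per observed failure is also a design choice of this sketch, and the names `overlaps` and `score` are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two closed intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def score(predicted: List[Interval],
          observed: List[Interval]) -> Dict[str, int]:
    """Count TP, FP and FN under the assumed interval-overlap rules."""
    tp = sum(1 for o in observed if any(overlaps(p, o) for p in predicted))
    fn = len(observed) - tp
    fp = sum(1 for p in predicted
             if not any(overlaps(p, o) for o in observed))
    return {"TP": tp, "FP": fp, "FN": fn}


def at(hour: int) -> datetime:
    """Helper for illustrative timestamps on a single placeholder day."""
    return datetime(2022, 1, 1, hour)


# Illustrative intervals only, not taken from the maintenance reports.
observed = [(at(10), at(12)), (at(20), at(22))]
predicted = [(at(9), at(11)), (at(15), at(16))]
result = score(predicted, observed)
```

In this example the first prediction overlaps the first failure, the second prediction is a false alarm, and the second failure is missed.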
For the first task of predicting failures, the objective is to discover a problem as early as possible after it manifests, i.e., to increase the overlap between the prediction and the ground truth. The second task is to identify the type of failure and the component in which it occurs. Finally, it is crucial to compute the remaining useful life of the components to help the management team when they need to remove the train without provoking disruptions to the service.
Two recent works used the MetroPT dataset to propose methods for the failure prediction problem. In the first work 9 , the authors constructed a rule-based system to produce alerts about the state of the compressor. The second work 10 explores the use of deep learning autoencoders to produce alerts. In both cases, the results are satisfactory, but there is considerable room to improve accuracy and explanation.
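One way to quantify the overlap between a prediction and the ground truth is the fraction of the observed failure interval covered by the predicted one. The sketch below shows this together with a check of the company's two-hour requirement. Both helpers, `overlap_fraction` and `meets_lead_time`, are hypothetical, the times are placeholders, and the observed interval is assumed to have non-zero duration.

```python
from datetime import datetime, timedelta
from typing import Tuple

Interval = Tuple[datetime, datetime]  # (start, end)


def overlap_fraction(predicted: Interval, observed: Interval) -> float:
    """Fraction of the observed interval covered by the prediction.
    Assumes the observed interval has non-zero duration."""
    start = max(predicted[0], observed[0])
    end = min(predicted[1], observed[1])
    overlap = max(end - start, timedelta(0))  # no overlap gives zero
    return overlap / (observed[1] - observed[0])


def meets_lead_time(alert_time: datetime,
                    non_operational_time: datetime,
                    required: timedelta = timedelta(hours=2)) -> bool:
    """True when the alert precedes the moment the train becomes
    non-operational by at least the required lead time."""
    return non_operational_time - alert_time >= required


# Placeholder times for illustration only.
observed = (datetime(2022, 1, 1, 10), datetime(2022, 1, 1, 14))
predicted = (datetime(2022, 1, 1, 11), datetime(2022, 1, 1, 15))
fraction = overlap_fraction(predicted, observed)
early_enough = meets_lead_time(predicted[0], observed[1])
```

Here the prediction covers three of the four observed hours, and the alert at 11:00 comes three hours before the assumed non-operational time of 14:00.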

Code availability
The raw data, collected and stored in a MySQL database, was converted to CSV format using custom scripts in Python 3.8 with the pandas library for data manipulation. All plots supporting the fault descriptions were produced using the Plotly library.