Remote drain inspection framework using a convolutional neural network and re-configurable robot Raptor

Drain blockage is a crucial problem in the urban environment. It heavily affects the ecosystem and human health. Hence, routine drain inspection is essential for the urban environment. Manual drain inspection is a tedious task, prone to accidents and water-borne diseases. This work presents a drain inspection framework using a convolutional neural network (CNN) based object detection algorithm and an in-house developed reconfigurable teleoperated robot called 'Raptor'. The CNN based object detection model was trained using a transfer learning scheme with our custom drain-blocking-objects data-set. The efficiency of the trained CNN algorithm and the drain inspection robot Raptor was evaluated through various real-time drain inspection field trials. The experimental results indicate that our trained object detection algorithm detects and classifies the drain-blocking objects with 91.42% accuracy for both offline and online test images and is able to process 18 frames per second (FPS). Further, the maneuverability of the robot was evaluated in various open and closed drain environments. The field trial results ensure that the robot's maneuverability was stable, and its mapping and localization are also accurate in a complex drain environment.


Overview of proposed system
Figure 1 shows the overview of the drain inspection framework. Here, a deep learning-based object detection framework was used to perform the visual drain inspection task on images collected by our in-house developed drain inspection robot Raptor. The details of the drain inspection algorithm and robot architecture are described as follows.
www.nature.com/scientificreports/
SSD Inception object detection framework. SSD Inception is the object detection framework used for the drain inspection task. It is a lightweight framework, widely used in mobile applications and for onboard object detection in robotics. In our drain inspection application, SSD automatically detects bushes, trash, and silt in the images transferred by Raptor. Typically, VGG-16 or MobileNet is used as the feature extractor for SSD. In our implementation, VGG-16 is replaced by the Inception feature extractor to achieve optimal detection accuracy and faster detection. The details of the Inception and SSD functionality are described as follows.
Inception. In our proposed framework, Inception v3 was utilised after evaluating its performance and computational complexity. A brief discussion of Inception v3, Inception v4 and Inception-ResNet v2 follows. As seen in Table 1, a roughly two-fold jump in parameters is observed when comparing Inception v3 with Inception v4 35 and Inception-ResNet v2 35 (see also Table 4); further elaboration is given in the experimental section. Figure 2a shows the neural network architecture of the Inception framework. Inception v3 36 is an updated version of the Inception v1 37 algorithm: it uses batch normalization, drops dropout, and removes the local response normalization function. In contrast with other feature extractors, the Inception algorithms use wider networks with filters of different kernel sizes in each layer, making them translation- and scale-invariant. In Inception v3, each 5 × 5 convolution is factorized into two 3 × 3 convolutions, and 7 × 7 convolutions are replaced by a series of 3 × 3 convolutions. This increases the performance of the architecture and reduces the computational time (a 5 × 5 convolution is typically 2.78 times more expensive than a 3 × 3 convolution). Further, to reduce the representational bottleneck issue, the feature banks of the module were expanded instead of making it deeper. Here, the algorithm factorizes the convolution filter size, where a 3 × 3 convolution is converted into a 1 × 3 convolution followed by a 3 × 1 convolution. This factorization takes 33% less computation than a 3 × 3 convolution and prevents the loss of information that occurs when a deeper architecture is used. A 17 × 17 × 768 feature map is extracted from the Inception v3 framework and used as the input to the SSD framework to perform the object detection task. It uses the outputs of 'Mixed_2d', 'Mixed_7' and 'Mixed_8' of the Inception v3 framework.
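The 33% saving from the asymmetric factorization can be checked with a quick parameter count. The sketch below (a helper of our own; the channel width of 768 is chosen to match the late Inception v3 stage mentioned above) compares a plain 3 × 3 convolution with the 1 × 3 followed by 3 × 1 pair:

```python
def conv_weight_count(in_ch, out_ch, kh, kw):
    """Weights in a bias-free conv layer: out_ch filters, each in_ch*kh*kw."""
    return out_ch * in_ch * kh * kw

c = 768  # channel width of a late Inception v3 stage

full = conv_weight_count(c, c, 3, 3)             # plain 3x3 convolution
factorized = (conv_weight_count(c, c, 1, 3)      # 1x3 convolution ...
              + conv_weight_count(c, c, 3, 1))   # ... followed by 3x1

# 6 kernel taps instead of 9: one third fewer weights and multiply-adds
print(full, factorized, round(1 - factorized / full, 2))
```

The same arithmetic explains the 5 × 5 case: two stacked 3 × 3 kernels cost 18 taps against 25, roughly the 2.78× ratio quoted above.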
SSD. Figure 2b shows the functional block diagram of the SSD 38 based object detection framework. It runs on top of the Inception v3 feature extractor. SSD can execute object detection tasks with a single neural network structure by converting the classification and localization steps of object detection into a regression problem. SSD comprises numerous pyramid-structured auxiliary convolutional layers on top of the Inception CNN layers, which extract feature maps at different resolutions and enable the network to recognize objects of varied sizes. SSD detects and locates objects using six feature maps. Here, the first two sets are obtained from the Inception module and the remaining feature maps (10 × 10, 5 × 5, 3 × 3, 1 × 1) are produced by the SSD auxiliary convolution layers. SSD also outputs the bounding box coordinates and class type, as well as the classification confidence level, using bounding box predictors. The bounding box predictors use default anchor boxes (bounding boxes) with different aspect ratios and scales. These bounding boxes have a fixed size, implying that their dimensions are chosen from a list of predefined values. The fixed-size anchor boxes are tiled over the generated feature maps in a convolutional manner, and the network performs one prediction per default box, together with per-class scores that indicate the presence of a class instance in each of those boxes. In the end, Non-Maximum Suppression (NMS) is applied to remove the overlapping boxes and keep the highest-rated bounding boxes.
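The NMS step can be sketched as a generic greedy pass (an illustrative implementation, not the exact code of the SSD pipeline): boxes are visited in descending score order, and any box that overlaps an already-kept box beyond an IoU threshold is discarded.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps
    an already-kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of one object collapse to the higher-scored one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```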
Overview of drain inspection robot Raptor. Structural overview of the Raptor robot is shown in Fig. 3.
The structural frame and cover of Raptor are 3D printed with nylon for greater tensile strength. The overall weight of the robot is 2.45 kg, and it can climb a vertical gradient of 20°-25°. The detailed technical specifications of the platform are given in Table 2. With a compact dimension of 390 × 350 × 200 mm, the platform can navigate through constrained drain sizes. The robot's locomotion is accomplished by two adjustable dual-wheel forks assembled under the front and back sides of the chassis. These forks hold the DC motors that connect to the wheels. As shown in Fig. 3a,b, the Raptor design is centered around the payload holder to maintain the center of gravity in the middle of the robot. Most of the electronic components are mounted on the center body of the robot, where a mounting structure is provided. It also allows the assembly of electronic components and batteries. Furthermore, additional mounting structures are added to fix the Lidar, camera, and obstacle-avoidance sensors at the front of the robot.
System architecture. The high-level system architecture for Raptor is built using the robot operating system (ROS) framework. The system consists of the mobile platform Raptor, a master control station, and a mobile control unit for operators. Here, all communication happens through a WiFi router using the TCP/IP protocol. Figure 4 shows the high-level and low-level control block diagram with all the associated modules.
Locomotion module. A four-wheel-drive system configuration powers the locomotion of the robot. Each wheel is driven by a 12 V DC metal gear motor, which ensures that the wheel pulls the load on all terrains. In addition, the forks holding the wheels to the chassis are provided with extra reinforcement to withstand direct impulse forces from the ground. These DC motors have a stall torque of 7.8 kg cm. With air-filled rubber tires, the wheels ensure maximum friction, a cushioning effect, and ride height for the platform.
Re-configurable module. Raptor has three modes of manual re-configuration, as shown in Fig. 5. Here, Mode 1 is fully folded and used to carry a heavy payload of around 1.6 kg with a high ground grip, traversing all types of terrain. Mode 2 is halfway open, also for carrying a heavy payload across all types of terrain, specifically for locomotion over obstacles and stagnant water areas, and Mode 3 is used for high-speed operation and surveillance purposes. The reconfiguration function is controlled by manually triggering a push button.
Control module and path tracking. The on-board control unit for Raptor was built using a Raspberry Pi, which is capable of processing heavy data streams from the camera, Lidar, inertial measurement unit (IMU) and beacon modules.
The control system takes linear and angular velocity commands directly from the operator and outputs individual motor speed signals using the inverse kinematics method. Each motor is programmed to operate independently based on the specific input commands. For instance, if the input serial command is to turn the robot right, the two wheels on the inside radius spin in the opposite direction, at a speed proportional to that of the wheels on the outer side. Each Roboclaw motor driver controls two DC brushless motors at 7.5 A continuous to 15 A peak through pulse width modulation (PWM) signals from the Raspberry Pi. A 2D Lidar sensor, RP-Lidar, with a rotation speed of 10 Hz and a sampling rate of 8000 points per second, is used for perceiving the environment for mapping and localization. This high sampling rate improves the mapping accuracy and map density. To map an unknown environment, the Hector SLAM algorithm is used to perform simultaneous localization and mapping (SLAM).
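The inverse kinematics step can be sketched as the standard skid-steer mapping from a commanded body twist to per-side wheel speeds. The track width and wheel radius below are illustrative placeholders, not Raptor's measured dimensions, and the function name is ours:

```python
def wheel_speeds(v, omega, track=0.35, wheel_radius=0.10):
    """Map linear velocity v (m/s) and angular velocity omega (rad/s)
    to (left, right) wheel angular speeds in rad/s for a skid-steer base."""
    v_left = v - omega * track / 2.0   # inside wheels slow down (or reverse)
    v_right = v + omega * track / 2.0  # outside wheels speed up
    return v_left / wheel_radius, v_right / wheel_radius

print(wheel_speeds(0.5, 0.0))  # straight ahead: both sides equal
print(wheel_speeds(0.0, 1.0))  # pivot rotation: sides spin in opposite directions
```

A pure rotation command gives equal and opposite wheel speeds, which matches the pivot-turn behaviour described above.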
Power distribution module. The power distribution module includes a 4-cell Lithium-ion battery (14.4 V, 2800 mAh). Through a toggle switch connected to the main battery, two voltage regulators provide a steady supply of 12 V for the motor drivers and 5 V for the Raspberry Pi embedded computing device. The four motors are individually powered through the two motor drivers, with all other sensors powered through the Raspberry Pi.
Tele-operation with collision avoidance safety layer. Raptor is designed to be teleoperated by users through mobile or laptop applications. This multi-device compatible graphical user interface (GUI) application is developed using Unity. The communication between the robot and control devices happens via a message queuing telemetry transport (MQTT) bridge to ROS. ROS messages are serialized to JSON for MQTT communication and deserialized back into ROS messages on the receiving side. Figure 6 shows the GUI console, which has provisions for monitoring the video feeds and robot control buttons for basic operations such as linear forward and backward movement and pivot rotation.
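The serialization step of the MQTT bridge can be illustrated with a minimal round trip. The field layout mirrors a geometry_msgs/Twist velocity command; the helper names are ours, and a real deployment would publish the payload through an MQTT client rather than return it:

```python
import json

def twist_to_json(linear_x, angular_z):
    """Serialize a minimal Twist-like velocity command to a JSON string
    suitable for an MQTT payload."""
    msg = {
        "linear": {"x": linear_x, "y": 0.0, "z": 0.0},
        "angular": {"x": 0.0, "y": 0.0, "z": angular_z},
    }
    return json.dumps(msg)

def json_to_twist(payload):
    """Deserialize the JSON payload back into (linear_x, angular_z)."""
    msg = json.loads(payload)
    return msg["linear"]["x"], msg["angular"]["z"]

# Round trip: a forward-and-turn command survives serialization unchanged.
payload = twist_to_json(0.4, -0.2)
print(json_to_twist(payload))  # → (0.4, -0.2)
```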
With Lidar sensor data as input, an additional safety layer is added to avoid unintentional obstacle collisions due to operator carelessness. In order to tackle this scenario, the ROS architecture is configured with the dynamic window approach (DWA) local planner.
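The idea behind a DWA-style safety layer can be sketched as follows: sample candidate velocity pairs, forward-simulate each for a short horizon, reject arcs that pass within the robot's radius of a Lidar-detected obstacle, and score the rest by speed and clearance. This is a simplified illustration with made-up weights and dimensions, not the tuned ROS DWA planner configuration:

```python
import math

def simulate(v, w, steps=10, dt=0.1):
    """Forward-simulate a short arc from the origin for a candidate (v, w)."""
    x = y = theta = 0.0
    traj = []
    for _ in range(steps):
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += w * dt
        traj.append((x, y))
    return traj

def choose_velocity(obstacles, v_samples, w_samples, robot_radius=0.2):
    """Pick the admissible (v, w) pair that maximises a simple score of
    speed plus clearance, discarding arcs that would hit an obstacle."""
    best, best_score = (0.0, 0.0), -1.0
    for v in v_samples:
        for w in w_samples:
            clearance = min(
                math.hypot(px - ox, py - oy)
                for (px, py) in simulate(v, w)
                for (ox, oy) in obstacles
            )
            if clearance < robot_radius:
                continue  # trajectory would collide: inadmissible
            score = v + 0.5 * clearance  # favour fast, well-cleared arcs
            if score > best_score:
                best, best_score = (v, w), score
    return best
```

With an obstacle directly in the robot's path, every forward arc becomes inadmissible and the planner falls back to zero linear velocity regardless of what the operator commands, which is exactly the safety behaviour described above.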

Experimental setup and results
This section describes the experimental results of the drain inspection framework. The experiment was conducted in four phases: collecting training images, training the drain inspection framework, evaluating the trained model in both offline and real-time field tests, and comparing the trained model with other object detection frameworks and existing works.
Training images collection and labelling. The training image collection process involves gathering images of drain-blocking objects from different sources. The data-set is composed of common drain-blocking objects and is categorised into three main classes: trash (dry leaves, plastic bottles, metal cans, paper, and other litter), bushes, and silt accumulation. Our data set consists of 4500 images gathered from open and closed drainages in Singapore. Before labeling, the collected images are resized to 640 × 480 and fed into the "LabelImg" GUI tool, an open-source bounding box annotation tool used to mark the objects' bounding regions. Thereafter, a bounding box data augmentation function is applied to the labeled images 39 . Figures 7 and 8 show the bounding box annotation method and the bounding box data augmentation results for one input image. The data augmentation function converts a single image into multiple images with bounding box markings, using contrast adjustment, scaling, rotation, and flipping functions on the input images. The bounding box data augmentation function enlarges the data-set size, helps control over-fitting during training, and improves the detection framework's learning.
Parameter configuration. The transfer learning method was used to train the drain inspection model. It is more adaptable, allowing pre-trained models to be used directly for feature extraction. In our experiment, ImageNet-pretrained weight files were used to train the algorithm under the transfer learning scheme. Under this scheme, the detection model was trained for 100,000 epochs with a batch size of 16, the RMSProp algorithm with 0.9 momentum, and an initial learning rate of 0.004. The image data set is evaluated using the k-fold cross-validation process. Images are divided into k groups in this technique, with k − 1 groups being used to train the network.
The one group that remains is used for testing. A 10-fold cross-validation approach is used in the proposed study. The visual results derived from the model are very accurate.
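One of the augmentation operations mentioned above, horizontal flipping, must remap the box coordinates as well as the pixels. A minimal sketch of the coordinate remapping (a helper of our own, with boxes as (x_min, y_min, x_max, y_max) in an image of the stated width):

```python
def hflip_boxes(width, boxes):
    """Remap bounding boxes (x_min, y_min, x_max, y_max) for a horizontal
    flip of an image that is `width` pixels wide; y values are unchanged."""
    return [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]

# A box near the left edge of a 640-pixel-wide image moves to the
# mirrored position near the right edge.
print(hflip_boxes(640, [(10, 20, 110, 220)]))  # → [(530, 20, 630, 220)]
```

Rotation and scaling transform the corner coordinates analogously, which is why the augmentation is applied to the labelled boxes together with the image.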
Evaluation metrics. The efficiency of our trained model was evaluated in both offline and real-time test scenarios. Standard metrics were used to evaluate both the detection and classification performance of the model. Here, accuracy, precision, recall and F-measure (1)-(4) were used to evaluate the model, and a confusion matrix was constructed to find the variables tp (true positives), fp (false positives), tn (true negatives) and fn (false negatives).
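Metrics (1)-(4) follow directly from the confusion-matrix counts; a compact sketch (the function name is ours):

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # (1)
    precision = tp / (tp + fp)                   # (2)
    recall = tp / (tp + fn)                      # (3)
    f_measure = 2 * precision * recall / (precision + recall)  # (4)
    return accuracy, precision, recall, f_measure

# Balanced example: every metric works out to 0.8.
print(classification_metrics(40, 10, 40, 10))
```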
Offline test. The offline experiment was performed using web-collected images and standard data-set images, including trashnet 40 , deep-sea waste 41 and taco 42 . These three data-sets are composed of various classes of litter collected from diverse environments. From each dataset, 50 images were selected for the offline test, comprising metal and plastic cans, polythene covers and paper trash. Figures 9, 10, 11 and 12 show the offline experimental results of the drain inspection framework. Here, the algorithm detects and classifies the trash category with an average confidence level of 85-94%, bushes with 88-96%, and accumulated silt with 80-89%, respectively. In this analysis, we observe that the algorithm detects most of the visible objects with a high confidence level. Miss-detections, false classifications, and detections with low confidence levels happened only when inference was done on decomposed or partially occluded objects. The offline experimental results show that the trained model is able to detect drain-blocking objects in diverse environments with good accuracy. Hence, the model can be used for detecting drain-blocking objects in diverse drain environments.

Experiments with Raptor in real-time drain inspection field trial.
Three different drain environments were selected for the real-time field test: a closed drain, a semi-closed drain, and an open drain located near the SUTD campus. These drains are used for rainwater collection and are 1-3 ft deep. The experiment was carried out in two lighting conditions (day and night) and on two different climate days (after heavy rainfall and on summer days). In the first field trial, the Raptor robot's operational efficiency was tested in terms of maneuverability. In the next step, trash, bush and silt detection was evaluated on field images captured by Raptor. Figure 13 shows the Raptor robot maneuverability test in various drain environments. In this experiment, the robot is deployed in an unknown drain environment with no prior preparation of a drain map. Here, the drains are of different structures and sizes. There are areas inside the drains with long open tunnels of uneven concrete terrain, with many L-shaped turns, narrow curves, and level changes. The Hector SLAM algorithm was used to map the drain environment, simultaneously localizing the robot's position with respect to the generated map. Figure 14 shows the mapping and path tracking results, where path tracking is represented by a green marker. From the experimental results, it can be observed that the morphology of the robot accommodated various drain structures and handled the various terrain constraints inside the drains (Fig. 13). Furthermore, the mapping and localization results (Fig. 14) ensure that the robot's position can be tracked in deep and unexplored drain environments with the help of the SLAM-generated map.
Real-time field trial. Evaluating the drain inspection algorithm on a real-time video feed is the second and final phase of the field trial. The experiment was conducted in the above-mentioned drain environments. The robot was operated in teleoperated mode and controlled via an RF wireless communication module. Images captured by the robot are remotely analyzed by the drain inspection algorithm running on a GPU-enabled high-speed notebook. A total of 300 image frames with various trash, bush and silt objects in frame were captured during the real-time field trial and used for evaluation. Figures 15, 16 and 17 show the drain inspection algorithm field test results, where the open drain has running water, bushes sprouting through cracks, and trash floating on the water. Similarly, the closed drain has accumulated silt and trash scattered and deposited after a heavy rainfall. The field test was also conducted during the night time, as seen in Fig. 18.
The detection algorithm detected the majority of the trash, bushes, and accumulated silt in the images captured by the Raptor robot with a high confidence level. In this two-drain experiment, the algorithm detects the bushes, trash, and silt with an average confidence level of 88-92%. The bounding regions are also accurate with respect to the ground truth. Furthermore, statistical measures have been computed to estimate the robustness of the drain inspection algorithm. For each individual class, 100 images were chosen and used for the statistical measures. The images were chosen as a mix of both unseen offline and real-time field-collected images. Table 3 shows the statistical measures of the SSD Inception v3 framework.
Accuracy (Acc) = (tp + tn)/(tp + fp + tn + fn)   (1)
The statistical measures indicate that the algorithm detected the drain-blocking objects with 91.42% accuracy for the combined test images (offline and real-time collected drain inspection images) in the day time, while an accuracy of 90.18% was achieved during the night time (with a light source to illuminate the path and objects). A small accuracy drop-off of 1.24% between day time and night time is thus reported. Further, the model's miss rate is 4% for the offline test and 7% for the real-time test images. The higher miss-detection rate in real time is attributed to object occlusion, blurring due to jerks in robot navigation when moving on uneven surfaces, shadows, and varying lighting conditions.
Comparison analysis. This section examines the performance of the SSD Inception framework against the baseline methods SSD MobileNet v2 and SSD VGG16, and also performs a comparison analysis with other state-of-the-art object detection methods, including Yolo v3 and the Inception variants. The comparison analysis was performed on a mix of real-time field images and web-collected images, with a total of 100 images from each class. The detection frameworks were trained using the transfer learning scheme on our custom drain inspection data set. Table 4 shows the comparison between the proposed framework, the baseline methods and the other object detection frameworks, along with computation time.
The results indicate that the proposed framework and the various SSD Inception versions have good detection accuracy compared with Yolo v3 and the two baseline methods. The key issue of Yolo v3 and the two baseline methods is the miss-detection and false classification of objects; these two factors affect their overall performance. In the computation cost analysis, Yolo v3 (31 FPS) and SSD MobileNet v2 (23 FPS) required less execution time than SSD Inception v3 (18 FPS), SSD VGG16 (14 FPS), SSD Inception v4 (9 FPS) and SSD Inception ResNet v2 (7 FPS). Due to their dense CNN layers, the computation times of the various SSD Inception versions and SSD VGG16 are comparatively high. However, this does not heavily affect the performance of the real-time drain inspection task, and it can be overcome by upgrading the computing hardware. In this analysis, we observe that the SSD Inception v3 model strikes a balance between detection accuracy and computation time. Hence, the SSD Inception v3 model is chosen as the optimal framework for the drain inspection task.
Comparison with existing work. This section elaborates the comparative analysis of the proposed algorithm against existing drain inspection frameworks reported in the literature, based on the inspection tool and inspection algorithms. Table 5 states the accuracy of various inspection models and algorithms on some similar classes.
As shown in Table 5, the above schemes' classification and detection scores are quite low in contrast with our implementation; however, these implementations cannot be compared directly with our work, since their datasets, CNN topologies and training parameters are totally different. Further, the deployment and maintenance of CCTV for long-range drain networks is a challenging task: more cameras are required to cover a more extensive area of the drainage, which increases the cost of a CCTV drain inspection system. The proposed framework performs real-time inspection, in comparison to the above schemes, which perform offline inspection; this difference is reflected in the choice of algorithm used. Further, compared to Cheng et al.'s utilisation of Faster RCNN, the proposed framework opted for a faster and lighter object detection framework, SSD. Table 6 shows the comparison analysis of the current robotic platform with existing drain inspection platforms. Here, the robots KURT, PIRAT, and MAKRO have adopted fixed morphologies and are designed only for the inspection of small-diameter sewer pipes. Tarantula, in contrast, is a reconfigurable drain inspection robot. However, that robot has a high cost, a complex mechanical design and a heavy weight due to its many actuators. Moreover, its maneuverability is not stable in some drain segments 16 . On the other hand, our in-house developed robot Raptor is lighter and follows a simpler design approach. Also, its three modes of reconfiguration provide good maneuverability in varied terrain conditions. This design consideration is the core contribution of the proposed design with respect to the state-of-the-art.

Conclusion
A remote drain inspection framework was proposed using a deep learning based object detection framework and our in-house developed self-reconfigurable robot Raptor. The SSD Inception v3 object detection module was trained for drain inspection tasks with common drain-blocking objects. The model was trained by a transfer learning scheme using pre-trained weights. The efficiency of the remote drain inspection framework was assessed over 100 m to 300 m stretches of three different drains: a closed drain, a semi-closed drain, and an open drain. The experimental results proved that the maneuverability of our in-house developed robot was stable and that its mapping and localization are precise in complex, rugged drain terrain. The efficiency of the drain inspection algorithm was examined in two phases: an offline test and a real-time field trial using drain images captured by Raptor. Standard performance metrics, including accuracy, precision, recall, and F1 measure, were used to assess the drain inspection algorithm. The performance metrics results show that the drain inspection algorithm detected the drain-blocking objects, including bushes, trash, and silt, with 91% detection accuracy. Further, a comparison analysis was performed with other object detection frameworks and existing drain inspection platforms. The analysis results show that the various SSD Inception versions and the SSD VGG16 model have higher detection accuracy than the SSD MobileNet and Yolo v3 frameworks. However, the SSD MobileNet and Yolo v3 frameworks are capable of lower inference times. In this comparison, the SSD Inception v3 model strikes the balance between detection accuracy and computation time; its detection accuracy and inference speed outperform SSD VGG16.
All the analysis results ensure that our proposed framework can perform drain inspection tasks and has the capability to identify drain-blocking objects in closed, semi-closed, and fully open extended drains under varying lighting and weather conditions.