Deep learning-based approach for high spatial resolution fibre shape sensing

Fiber optic shape sensing is an innovative technology that has enabled remarkable advances in various navigation and tracking applications. Although state-of-the-art fiber optic shape sensing mechanisms can provide sub-millimeter spatial resolution for off-axis strain measurement and reconstruct the sensor's shape with high tip accuracy, their overall cost is very high. The major challenge for more cost-effective fiber sensor alternatives that provide accurate shape measurement is their limited sensing resolution in detecting shape deformations. Here, we present a data-driven technique that overcomes this limitation by removing the strain measurement, curvature estimation, and shape reconstruction steps. We designed an end-to-end convolutional neural network that is trained to directly predict the sensor's shape from its spectrum. Our fiber sensor is based on easy-to-fabricate eccentric fiber Bragg gratings and can be interrogated with a simple and cost-effective readout unit in the spectral domain. We demonstrate that our deep-learning model benefits from undesired bending-induced effects (e.g., cladding mode coupling and polarization), which contain high-resolution shape deformation information. These findings are the first steps toward a low-cost yet accurate fiber shape sensing solution for detecting complex multi-bend deformations.

Fiber optic shape sensing has proven to have great potential, specifically in medical applications such as catheter navigation, surgical needle tracking, and continuum robot navigation. Compared to other common navigation technologies [1][2][3], fiber shape sensing has many advantages, such as immunity to electromagnetic fields, bio-compatibility, and high flexibility. Fiber shape sensors are small in diameter, easily integrable into flexible instruments, and require no line-of-sight. Distributed sensors based on multicore fibers can also provide high-resolution shape measurements [4][5][6].
Fiber shape sensors measure off-axis strain, which is then used to calculate the directional curvature and reconstruct the sensor's shape [7]. Various fiber sensor configurations have been investigated for off-axis strain measurement, including multicore fibers with [8][9][10] or without [11][12][13] FBGs in their cores, fibers with cladding waveguide FBGs [14], and fiber bundles made from multiple single-mode fibers that contain FBG arrays [15][16][17][18][19]. For an accurate shape reconstruction, high spatial resolution for off-axis strain measurement is essential. In some fiber shape sensor configurations (e.g., distributed multicore fiber sensors), sub-millimeter spatial resolution can be achieved [4]. However, these sensors require complex and expensive readout units that analyze the output signal in the time or frequency domain for strain measurement [20][21][22][23]. Although fiber sensors interrogated with spectral-domain readout systems are cheaper, their spatial resolution is limited by their lower sensing plane density [8,24], making them inapplicable for tracking complex shape deformations. Therefore, a cost-effective, high-resolution, and accurate fiber shape sensing technique is desirable.
We present in this paper a data-driven modeling technique based on deep learning (DL) that can find a meaningful pattern in the eFBG signal even though the signal is affected by uncontrolled bending-induced effects. These additional sources of information considerably improve shape prediction accuracy. Our technique provides high spatial resolution shape estimation directly from the eFBG sensor's signal, without requiring any strain measurement, curvature calculation, or shape reconstruction steps.

Concept
In this section, we explain the design and training process of a deep neural network for our eFBG shape sensor. The 30 cm long eFBG fiber sensor used in this work features five sensing planes separated by 5 cm from each other. At each sensing plane, three off-axis FBGs are inscribed at a radial distance of ∼2 µm to the top, left, and right side of the fiber's core.
The dataset used for developing the deep-learning-based model is collected using a setup similar to the one reported in our previous work [44] (see Methods for more details). We use three normalized, consecutively measured spectral scans as input data to the proposed DL model. Each scan is recorded from 800 nm to 890 nm, comprising 190 wavelength components. The target data are the relative coordinates of 20 discrete points (reflective markers of the tracking system) measured over the length of the shape sensor (see Ref. [45] for more detail on data preprocessing). For this dataset, around 58000 samples are collected during 30 minutes of random movement of the fiber sensor. To evaluate the predictive performance of the trained model in an unbiased way, samples are first shuffled and then split into train, validation, and test subsets: 80% for training, 10% for validation, and 10% for testing. We refer to this testing dataset as Test1 for the remainder of this paper. The second set of data (Test2), with a size of ∼5800 samples, is recorded separately to evaluate the performance of the trained model on unseen shapes from a continuous movement. We also collected 320 samples, as Test3, in which only certain sensor regions are bent (see Methods for more detail).
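The shuffle-and-split step can be sketched as follows. The array shapes and the fixed random seed here are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def make_splits(spectra, coords, seed=0):
    """Shuffle samples, then split into 80/10/10 train/val/test subsets.

    spectra: (N, 3, 190) array -- three consecutive scans per sample
    coords:  (N, 20, 3)  array -- relative coordinates of the 20 markers
    (both shapes are assumptions based on the description in the text)
    """
    n = len(spectra)
    idx = np.random.default_rng(seed).permutation(n)  # shuffle first
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    parts = np.split(idx, [n_train, n_train + n_val])
    return [(spectra[p], coords[p]) for p in parts]   # train, val, test

# toy usage with random data standing in for the ~58000 real samples
X = np.random.rand(100, 3, 190)
Y = np.random.rand(100, 20, 3)
train, val, test = make_splits(X, Y)
```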
A DL model needs a specially designed network architecture to extract essential features from the sensor's spectra and predict its corresponding shape. To do so, we ran an optimization algorithm similar to the Hyperband optimizer [46]. This optimization algorithm searches for the best set of essential parameters, such as the number of layers, whose values cannot be estimated from the data during training (also known as hyperparameters). Figure 2 shows the architecture of the best-performing configuration after hyperparameter tuning.
To find out which part of the spectra is relevant for feature extraction, we calculate the forward finite difference of the network's output with respect to the input spectral components. This difference provides an influence evaluation for each wavelength component of the input spectra to decode the model's predictions (see Methods).

Fig. 2 Architecture of the best-performing configuration after hyperparameter tuning. The architecture includes five 1D convolutional layers (Conv1D), six fully connected layers, five max pooling layers, four batch normalization steps, and two dropout steps. The designed network receives three consecutive spectral scans as the input and predicts the relative coordinates of 20 discrete points over the sensor's curve. More detail on the channel, kernel, and pooling sizes is available under Methods.
As an evaluation baseline, we compared the shape prediction accuracy of the proposed DL approach with the mode-field dislocation (MFD) method on the same test sets. Following the process explained in our previous work [25], we calibrate our shape sensor to determine the exact angular and radial position of each eFBG. Then, we estimate the mode-field centroid at each sensing plane and calculate the curvature and the bending direction [25]. Finally, we reconstruct the 3D shape of the eFBG sensor using the interpolated values of the calculated directional curvatures at small arc elements. It should be noted that the density of the sensing planes in our eFBG shape sensor is not sufficient for the MFD method to estimate complex deformations. Nevertheless, we performed the test to show the superiority of the proposed data-driven (DL) technique.

Results and Discussion
Shape prediction evaluation. We evaluated the performance of the DL approach using the three testing datasets and compared the results with the MFD method. Table 1 shows the shape error metrics, including the tip error, that is, the Euclidean distance between the true and the predicted coordinate of the sensor's tip, and the root-mean-square of the Euclidean distance (RMSE) between the true and the predicted coordinates of the discrete points along the sensor's length. The MFD approach, when using the Test1 dataset, shows median and interquartile range (IQR) tip error values of 111.3 mm and 121.5 mm, respectively. These error values reduce to 98.5 mm and 46 mm when using the Test2 dataset. The reason for this performance difference is that the Test1 dataset contains more diverse shapes, as its samples are randomly selected from a larger dataset, whereas Test2 is a continuous sensor movement over a short period. As expected, the error values are considerably high in all testing datasets, since there is too little information available for the MFD approach to estimate the complex shape deformations in these datasets.
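The two error metrics can be computed as in this short sketch. Treating the tip as the last of the 20 markers is an assumption made here for illustration:

```python
import numpy as np

def shape_errors(true_pts, pred_pts):
    """Tip error and RMSE between true and predicted curve points.

    true_pts, pred_pts: (20, 3) arrays of relative marker coordinates,
    ordered along the sensor with the tip as the last point (assumed).
    """
    # per-marker Euclidean distances
    d = np.linalg.norm(true_pts - pred_pts, axis=1)
    tip_error = d[-1]              # distance at the sensor's tip
    rmse = np.sqrt(np.mean(d**2))  # RMSE over all markers
    return tip_error, rmse
```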
The DL method, on the other hand, significantly improves the shape prediction accuracy on Test1 samples, with median and IQR tip error values of 2.1 mm and 2.6 mm. These values increase to 17.1 mm and 12.6 mm on the less diverse Test2 samples. This is because a DL model can only learn to extract the most general and relevant features from the input signal if the training dataset is representative of the expected signals from the sensor. However, in the Test2 dataset, less than 2% of the samples have at least 100 similar examples in the training data (a maximum RMSE of 5 mm is chosen as the similarity measure after evaluating several thresholds). This shows that 30 minutes of manual shape manipulation is insufficient to cover the sensor's working space and create a representative training dataset for the model to generalize properly. In the Test1 dataset, by contrast, almost 20% of the samples have at least 100 similar examples in the training dataset, which means the DL method is being tested on samples that the model has already learned how to handle. Therefore, Test1 can mimic the situation where the training dataset represents the expected shapes of the sensor.
Shape evaluation results on the Test1 dataset thus represent the lower limit of our model's error. This performance difference also suggests that the DL model is best trained application-specifically, since it can better focus on relevant features when trained on the expected shape distribution of the sensor. On the other hand, when the training data covers most of the expected behaviors of the sensor, the DL model might only "memorize" the corresponding shape for each signal without searching for relevant features in the sensor's spectrum. To investigate this, we compare the performance of our DL method with a dictionary-based algorithm. In this approach, all the training and validation samples form a pre-defined dictionary. The shape prediction is made by looking up the closest spectrum to the test sample and returning its corresponding shape. The median tip errors on datasets Test1 and Test2 are 5.9 mm and 50.0 mm, with IQR values of 3.9 mm and 43.3 mm, respectively, which are higher than the error values of our DL technique. This shows that our DL model generalizes and is indeed beneficial for predicting more accurate shapes.
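A minimal sketch of such a dictionary-based baseline, assuming spectra are compared with a plain Euclidean distance (the matching metric is our assumption; the paper does not specify it):

```python
import numpy as np

def dictionary_predict(test_spectrum, dict_spectra, dict_shapes):
    """Return the stored shape whose spectrum is closest (L2) to the query.

    dict_spectra: (N, D) array of stored spectra (training + validation)
    dict_shapes:  (N, ...) array of their corresponding shapes
    """
    dists = np.linalg.norm(dict_spectra - test_spectrum, axis=1)
    return dict_shapes[np.argmin(dists)]  # nearest-neighbor lookup
```

Note the trade-off discussed below: every prediction scans the whole dictionary, so lookup time grows linearly with the number of stored samples.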
Two essential factors have to be considered when working with dictionaries: the size of the dictionary and the execution time required to find the best matching example. To get an accurate shape estimation for a given sample, the number of stored samples in the dictionary should be large enough to cover all possible examples, which leads to a long execution time. Therefore, this approach has a trade-off between accuracy and execution time. Extensive training data, however, do not negatively affect these two factors in the DL method, as the resulting model size is independent of the training data size.
Our observations show that the designed DL model can recognize deformations even between the sensing planes. To further investigate this interesting finding, we evaluated the shape predictions using the Test3 dataset, in which the deformations are only applied between the sensing planes. The Test3 dataset contains four deformation examples, each repeated twice and measured 40 times. As expected, the classical MFD method is not able to accurately predict the sensor's shape for such deformations, as the deformed area is not at any of the sensing planes. However, when using the DL method, we achieve a median tip error of 6 mm, which is ∼6 times smaller than the median tip error using MFD on this dataset. The precision of the predicted tip position in the Test3 dataset is 1.9 mm on average.
An example from the Test3 samples, where the sensor is bent in the region between the sensing planes 3 and 4, is depicted in Fig. 3a. It should be noted that the intensity ratio of the eFBG Bragg peaks in each sensing plane can also be influenced by the previously mentioned effects other than fundamental mode-field dislocations. The MFD approach, however, does not consider such effects and is thus incapable of correctly interpreting the signal variations. The DL model, on the other hand, manages to accurately predict the sensor's shape, as it looks at the full spectral profile, including the minute changes at wavelengths outside the Bragg resonances. Figure 3b shows the finite difference analysis of the loss value with respect to the 190 wavelength components of the input spectra. The higher the difference, the more important the corresponding wavelength component is for shape prediction in this example. Figure 3c gives a deeper insight into this investigation. For all 190 wavelength components, the Euclidean distance between the predicted relative coordinates of each marker before and after the spectral modification is depicted using a color map. The contribution of each wavelength component to the relative coordinate prediction of all 20 markers can be read from the color map in Fig. 3c.
Fig. 4 Decoding the DL model decision for deformations after the last sensing plane. An example from the Test3 samples in which a 3 cm long segment, 1 cm after the last sensing plane, is deformed. Refer to the caption of Fig. 3 for more details.
Another important finding is that the DL model can also detect deformations after the last sensing plane. Figure 4 shows an example in which a 3 cm long segment, 1 cm after the last sensing plane, is deformed. Similar to the example in Fig. 3, the MFD method is not able to predict the sensor's shape for such deformations. The DL model, in contrast, learned to employ relevant features in the side slopes of the eFBG spectra to predict the correct shape (see Figs. 4b and c). A possible explanation for this intriguing performance is that in the area after the last sensing plane, wavelength-dependent interference occurs between the back-reflected light from the air-glass interface at the fiber's end tip (Fresnel reflection) and the downstream incident light. Deformations in this region affect the interference in two ways: first, the spectral profile of the downstream light changes due to the bending; second, the coupling conditions between the back-reflected and the downstream light change. Consequently, the measured spectra from the fiber sensor show small variations, as the deformations affect the interference pattern.

Optimum number of sensing planes. A key factor in eFBG sensors, when using the MFD method, is the number of sensing planes for detecting shape deformations. Similar to any other quasi-distributed shape sensor, the distance between the sensing planes determines the sensor's spatial resolution in shape measurements. Depending on the complexity of the shape deformations, a limited number of sensing planes in the sensor (low spatial resolution) can lead to large tip errors in methods that include shape reconstruction (e.g., the MFD method). In this section, we present a theoretical analysis of the minimum number of sensing planes required in eFBG sensors when using the MFD method to reach the same shape prediction accuracy as our DL method achieves here with five sensing planes.
We simulated the shape reconstruction error for different spatial resolutions. To do so, we first interpolate the discrete curve points over the sensor's true shape, measured by the motion capture system, using a spline with a resolution of 0.1 mm (this value was selected empirically). We then calculate the curvature and the torsion (the curve's deviation from the osculating plane) at the query points. Finally, we use the calculated curvatures and bending directions at the sensing planes to reconstruct the spatial curve and compare it with the true shape. For a 25 cm long sensor with 50 mm spatial resolution (five sensing planes), the median tip error of the reconstructed shapes, tested on the Test1 and Test2 datasets, is ∼50 mm, which is almost 16 times higher than what the DL approach achieved (see Table 1). To obtain a median tip error of 3 mm, the spatial resolution of the sensor would have to be in a similar range, meaning that the MFD method would need around 84 sensing planes consisting of 252 eFBGs.
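The reconstruction step, building a spatial curve from curvature and bending direction at small arc elements, can be sketched as a piecewise constant-curvature integration. This is a simplified stand-in for the interpolation-based procedure described above, with the initial frame and the chord-step update chosen by us:

```python
import numpy as np

def reconstruct_curve(kappas, phis, ds):
    """Piecewise constant-curvature shape reconstruction (simplified sketch).

    kappas: curvature at each small arc element (1/m)
    phis:   bending direction angle in the local cross-section plane (rad)
    ds:     arc element length (m)
    Returns an (N+1, 3) array of curve points, starting at the origin.
    """
    p = np.zeros(3)
    t = np.array([0.0, 0.0, 1.0])  # tangent (fiber initially along z, assumed)
    n = np.array([1.0, 0.0, 0.0])  # reference normal
    pts = [p.copy()]
    for k, phi in zip(kappas, phis):
        b = np.cross(t, n)
        bend_dir = np.cos(phi) * n + np.sin(phi) * b  # in-plane bending direction
        ang = k * ds                                  # rotation over this element
        t_new = np.cos(ang) * t + np.sin(ang) * bend_dir
        n = np.cos(ang) * bend_dir - np.sin(ang) * t  # keep the frame orthonormal
        t = t_new / np.linalg.norm(t_new)
        p = p + t * ds                                # chord approximation
        pts.append(p.copy())
    return np.array(pts)
```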

Conclusion
In this paper, we developed a novel fiber shape sensing mechanism with a data-driven technique that, unlike conventional fiber shape sensors, does not include off-axis strain measurement and curvature calculation at discrete points along the fiber sensor's length to estimate its 3D shape. We used an easy-to-fabricate eFBG sensor with a simple and cost-effective readout unit. We designed a deep learning algorithm that can directly learn from our sensor's signal to predict its corresponding shape. We then evaluated the shape prediction accuracy of our designed model (the DL method) in various testing conditions and compared it with a baseline approach, the mode-field dislocation (MFD) method. Furthermore, we showed that the spatial resolution of off-axis strain measurement in FBG-based (quasi-distributed) shape sensors is the main limitation, as deformations between the sensing planes are not detected in complex shapes. The deep learning technique, on the other hand, uses the full spectrum of our eFBG sensor, including the Bragg resonances' side slopes, as the model's input to compensate for the low density of sensing planes. We believe that the deep learning model uses the impact of undesired bending-induced phenomena, including cladding mode coupling, bending-loss oscillations, and polarization-dependent losses, as additional sources of information to overcome the spatial resolution limitation for detecting complex deformations. Therefore, there is no need to adapt the fiber sensor design and its interrogation system to minimize the impact of such bending-induced phenomena. The shape prediction error of our developed DL method for 3D curves in a curvature range of 0.58 m⁻¹ to 33.5 m⁻¹ is reduced by a factor of ∼50 compared to the MFD method. We also showed that the designed deep learning model generalizes well, as its performance is twice as good as that of a dictionary-based algorithm. The proposed shape sensing solution is 30 times less expensive than the commercially available distributed fiber shape sensor with a similar level of accuracy.

Methods
Working Principle of the eFBG Sensor. When the eFBG sensor is bent, the field distribution of the fundamental mode moves away from the core center [25][26][27] (see Fig. 1b). Dislocations in the mode-field's centroid cause intensity changes in the reflected signal from the eFBGs [25]. From the intensity ratio between the eFBGs at each sensing plane, the curvature and bending direction can be calculated [25]. For simplification, this approach assumes that no other physical phenomena inside a bent optical fiber affect the intensity ratio between the eFBGs of the same sensing plane.
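A toy illustration of this intensity-ratio idea: assuming a simple linear relation between the centroid shift and the eFBG intensity deviations, and illustrative angular positions for the three gratings (the real positions come from the calibration step described later), the bending direction and a curvature-proportional magnitude can be fitted from one sensing plane:

```python
import numpy as np

def mfd_bend_estimate(intensities, angles_deg=(90.0, 180.0, 0.0)):
    """Toy mode-field-dislocation estimate from one sensing plane.

    intensities: reflected Bragg-peak intensities of the three eFBGs
                 (top, left, right), normalized to their straight-fiber values.
    angles_deg:  angular positions of the gratings around the core
                 (illustrative values; not taken from the paper).
    The intensity deviation of each eFBG is modeled as the projection of
    the centroid-shift vector onto the grating's position (an assumed
    linear model, not the calibrated relation used in the paper).
    """
    ang = np.deg2rad(np.asarray(angles_deg))
    dI = np.asarray(intensities) - 1.0        # deviation from straight fiber
    # least-squares fit of dI ~ a*cos(theta) + b*sin(theta)
    A = np.column_stack([np.cos(ang), np.sin(ang)])
    (a, b), *_ = np.linalg.lstsq(A, dI, rcond=None)
    direction = np.arctan2(b, a)              # centroid-shift direction
    magnitude = np.hypot(a, b)                # proportional to curvature
    return direction, magnitude
```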
However, positioning FBGs away from the core axis breaks the cylindrical symmetry of the fiber, which increases coupling from the core mode to the cladding modes [28,29]. The strength of such mode coupling varies when the fiber is bent, as bending affects the overlap integral between the interacting modes [28,30]. Bending an optical fiber causes strain-induced refractive index changes and dislocates the intensity distribution of the propagating light [26,47], which directly influences the coupling efficiency. Therefore, the intensity of the cladding modes changes when the fiber is bent. In eFBGs, the formation of cladding-mode resonances in fiber gratings provides a highly sensitive, fully directional bending response with a simple light intensity measurement [31]. Although cladding modes are often stronger in stripped fibers or in fibers whose coatings have a lower refractive index than the cladding layer [28,29], they have also been observed in standard fibers coated with higher refractive index materials [48]. Any recoupling between the excited cladding resonances and the fundamental mode affects the relative intensity values between the eFBGs.
FBG interrogators for quasi-distributed sensors typically consist of a broadband light source, such as a superluminescent diode (SLED), and a grating-based spectrometer. The emitted light from SLEDs is partially polarized, meaning that it undergoes wavelength-dependent polarization changes [32] in a birefringent medium (e.g., a bent fiber) [33][34][35][36]. On the other hand, the efficiency of the spectrometer grating is polarization-dependent, and therefore the spectral profile will be affected by polarization-dependent losses. This effect further modifies the measured intensity ratio between the Bragg peaks. The polarization effect in intensity-based fiber sensors is often kept to a minimum by using a polarization scrambler to change the polarization state randomly or by using polarization-insensitive spectroscopy instruments.
As is well known, light power loss increases when optical fibers bend [37]. Macro bending loss usually manifests itself in spectral modulations due to coherent coupling between the core mode and the radiated field reflected at the cladding-coating and coating-air interfaces (also known as whispering gallery modes) [38,49]. The field reflected at the coating-air boundary causes short-period modulations, as the re-injection path is longer [38,49], whereas reflections at the closer cladding-coating interface cause long-period resonances [39][40][41][49]. These bending attenuation losses are also temperature-dependent: thermal variations affect the refractive index of the coating layer and consequently influence the coupling between the core and the cladding whispering gallery modes [42]. Many models have been proposed to evaluate bending loss peak positions and shapes [39,40,43]. The strong wavelength dependence of bending losses is an additional complicating factor in designing intensity-based sensors [49], as it modulates the spectral profile and affects the intensity ratio at the Bragg peaks of the eFBGs in the same sensing plane.
Setup. The data acquisition setup used for developing the deep-learning-based model is shown in Fig. 5. We used a low-cost FBG interrogator (MIOPAS GmbH, Goslar, Germany) consisting of an uncooled transmit optical subassembly (TOSA) SLED module and a NIR micro-spectrometer with 0.5 nm resolution to cover all 15 Bragg wavelengths from 813 nm to 869 nm. We recorded the sensor's spectra at random curvatures and orientations (in a curvature range of 0.58 m⁻¹ to 33.5 m⁻¹) while monitoring the reflective markers attached to the 30 cm long sensor using a motion capture system (Oqus 7+, Qualisys AB, Sweden). The data acquisition period was 30 minutes for the Test1 and 3 minutes for the Test2 dataset. The acquisition rates of the FBG interrogator and the motion capture system were 75 Hz and 200 Hz, respectively. The sensor's spectra and the coordinate values corresponding to its shape were synchronized with a tolerance of less than 3 ms.
We also used a laser-cut curvature template (Fig. 5) to collect 320 samples for the Test3 dataset, in which only certain sensor regions are bent. The curvature template has four grooves, allowing the sensor to be bent at the middle 30 mm area between the sensing planes 2 and 3, 3 and 4, and 4 and 5, as well as 10 mm after the last sensing plane, with a bending radius of 50 mm.
Training Setup. The search space we defined for tuning the network's hyperparameters consists of the number of 1D convolutional layers (Conv1D), the number of fully connected layers (FC), the layer settings, the choice of batch normalization and downsampling, training settings, and loss function parameters. Search criteria are presented in Table 2.
In the designed network, input samples with a batch size of 256 are first batch normalized and then fed into a Conv1D layer with 16 channels, followed by a max pooling layer with a kernel size of 3 and a stride of 2. The second Conv1D layer also has 16 channels, followed by a max pooling layer with a kernel size of 2. The third Conv1D layer has 32 channels, followed by a max pooling layer with a kernel size of 3 and a stride of 2. The fourth Conv1D layer also has 32 channels with a stride of 2, followed by a max pooling layer with a kernel size of 3. The last Conv1D layer has 256 channels, followed by batch normalization and a max pooling layer with a kernel size of 2. The last layer is an FC layer that maps the output of the fourth FC layer into the target values, the relative coordinates. In all layers of this network architecture, the rectified linear unit (ReLU) serves as the activation function, and the kernel size for the Conv1D layers is 3. In this model, the Adam optimizer with a learning rate of 0.0001 minimizes the SmoothL1 loss function with a threshold of 4.04.

Fig. 5 The data acquisition experimental setup. The motion capture system includes five tracking cameras (Oqus 7+, Qualisys AB, Sweden). For protection purposes, the fiber sensor is inserted in a Hytrel furcation tubing with an inner diameter of 425 µm and an outer diameter of 900 µm. Two v-clamps are used to hold the protection tubing and to fix the optical fiber before the insertion. The reflective markers are 6.4 mm in diameter.
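A PyTorch sketch of the described architecture follows. The text does not specify every stride and padding, so this version assumes padding of 1 for the convolutions and a stride of 1 for the fourth pooling layer, and uses an adaptive pooling stage to guarantee the 2048-long flattened vector; the FC head (five 2000-unit layers plus a final mapping layer) approximates the description given with Table 2:

```python
import torch
import torch.nn as nn

def fc(i, o):
    """Fully connected layer with ReLU activation."""
    return nn.Sequential(nn.Linear(i, o), nn.ReLU())

class EFBGShapeNet(nn.Module):
    def __init__(self, n_markers=20):
        super().__init__()
        self.n_markers = n_markers
        self.features = nn.Sequential(
            nn.BatchNorm1d(3),                # three consecutive scans as channels
            nn.Conv1d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool1d(3, stride=2),
            nn.Conv1d(16, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool1d(3, stride=2),
            nn.Conv1d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.MaxPool1d(3, stride=1),        # stride not given in the text (assumed)
            nn.Conv1d(32, 256, 3, padding=1), nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.AdaptiveMaxPool1d(8),          # ensures 256 * 8 = 2048 flat features
        )
        self.head = nn.Sequential(
            fc(2048, 2000), nn.BatchNorm1d(2000), nn.Dropout(0.37),
            fc(2000, 2000), fc(2000, 2000),
            nn.BatchNorm1d(2000), fc(2000, 2000), nn.Dropout(0.16),
            fc(2000, 2000),
            nn.Linear(2000, n_markers * 3),   # relative x, y, z of each marker
        )

    def forward(self, x):                     # x: (batch, 3, 190)
        z = self.features(x).flatten(1)
        return self.head(z).view(-1, self.n_markers, 3)
```

Training would pair this with the Adam optimizer (learning rate 0.0001) and `nn.SmoothL1Loss(beta=4.04)`, as stated in the text.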
Decoding The Model's Decisions. Inspired by Gradient-weighted Class Activation Mapping (Grad-CAM), we decode the decisions made by our CNN (convolutional neural network)-based model. Decoding our model's decisions helps us understand which parts of the input spectra contribute to the coordinate predictions. Grad-CAM is a commonly used technique in image classification problems that generates visual explanations from any CNN-based model without requiring re-training or architectural changes. The gradient measures how strongly changes in the input affect the output; in other words, we are looking for the parts of the input with the highest effect on the model's output. However, due to the small output dimension in each channel of the last Conv1D layer, its gradient heat map highlights the input's important parts only with a low resolution. Therefore, instead of the gradient of the Conv1D layers, we calculate the forward finite difference of the model's loss with respect to the input spectral elements. The spacing constant is chosen as 0.1, higher than the spectral intensity noise level. In this method, we modify the intensity value of one spectral element and monitor the change in the model's loss value. We repeat this process for all 190 spectral elements. The resultant color maps (shown in Fig. 3(b) and Fig. 4(b)) indicate the impact of the changes in each spectral element on the model's SmoothL1 loss value. To investigate the contribution of each spectral element to the coordinate prediction of each individual marker, we calculated the Euclidean distance between the predicted coordinates of each marker before and after the spectral modification. This way, we were able to highlight all the spectral elements contributing to the relative coordinate prediction of each marker.
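The forward finite-difference procedure can be sketched as follows, where `loss_fn` is a stand-in for evaluating the model's SmoothL1 loss on one sample with the given spectrum:

```python
import numpy as np

def finite_difference_saliency(loss_fn, spectrum, eps=0.1):
    """Forward finite difference of the loss w.r.t. each spectral element.

    loss_fn:  callable mapping a spectrum (1D array) to a scalar loss
    eps:      spacing constant, chosen above the intensity noise level
    Returns one influence value per wavelength component.
    """
    base = loss_fn(spectrum)
    saliency = np.empty(len(spectrum))
    for i in range(len(spectrum)):
        perturbed = spectrum.copy()
        perturbed[i] += eps                 # modify one spectral element
        saliency[i] = (loss_fn(perturbed) - base) / eps
    return saliency
```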
Supplementary Information. We provide three videos in the supplementary material, visualizing the sensor's predicted shapes using the DL and the MFD methods on all three datasets.

Fig. 1 FBG
Fig. 1 FBG configuration and working principle of the eFBG sensor. a Sketch of the cross-section view of the eFBG sensor. Each sensing plane of the eFBG sensor consists of three FBGs inscribed off-axis with ∼90° angular separation (also known as an edge-FBG triplet). b Mode-field distribution of a straight single-mode fiber and the expected signal from the eFBGs of the same sensing plane. c When the fiber is curved, the mode-field distribution moves in the opposite direction of the bending, which affects the relative intensity between the eFBGs.

Fig. 3
Fig. 3 Decoding the DL model decision for deformations between sensing planes. a An example from the Test3 samples in which the sensor is bent in the region between the sensing planes 3 and 4. The true shape (ground truth) is shown with green circles. The five sensing planes of the sensor are shown with × signs. The predicted shapes using the mode-field dislocation method (MFD) and the deep learning method (DL) are shown with orange and purple solid lines, respectively. b The finite difference of the loss value with respect to the input spectral elements. Wavelength components shown in colors closer to yellow contribute more to the model's decision on this particular example. c Highlighting the importance of the input spectral elements in the relative coordinate prediction of all 20 markers, based on the magnitude of the Euclidean distance between the predicted relative coordinates of each marker before and after the spectral modification. The positions of the sensing planes with respect to the markers are indicated with dashed blue lines. SPi: i-th Sensing Plane.

Table 1
Shape evaluation errors in the MFD and DL methods using test sets Test1, Test2, and Test3. The lowest achieved error values are indicated in bold.

Table 2
Search criteria for hyperparameter optimization.

The extracted features are flattened to a 2048-long vector and fed into five FC layers, each with 2000 units. The first FC layer is followed by batch normalization and a dropout layer with a probability of 0.37, and two more FC layers. A batch normalization, an FC layer, a dropout layer with a probability of 0.16, and a fourth FC layer are the remaining layers before the final layer.