Directional intensified feature description using tertiary filtering for augmented reality tracking

Augmented Reality (AR) is applied in almost every field, including, but not limited to, engineering, medicine, gaming and the Internet of Things. Image tracking is common to all of these applications: AR uses it to localize and register the position of the user or AR device so that virtual imagery can be superimposed on the real world. In general terms, image tracking enhances the user's experience. However, establishing the interface between the virtual realm and the physical world still has many shortcomings, and the available tracking systems often lack robustness and efficiency. Achieving robustness is the main challenge in implementing a tracking algorithm. This study aims to enhance the user's AR experience by describing an image using Directional Intensified Features with Tertiary Filtering. Describing the features this way improves robustness, which is desirable in image tracking: a feature descriptor is robust in the sense that its output is not compromised when the image undergoes various transformations. This article describes features based on Directional Intensification using Tertiary Filtering (DITF). The robustness of the algorithm is improved by the inherent design of the Tri-ocular, Bi-ocular and Dia-ocular filters, which intensify the features in all required directions. The algorithm's robustness is verified with respect to various image transformations, using the Oxford dataset for performance analysis and validation. The DITF model achieves repeatability scores of 100%, 100% and 99% for illumination variation, blur changes and view-point variation, respectively. A comparative analysis in terms of precision and recall shows that DITF outperforms the state-of-the-art descriptors, namely BEBLID, BOOST, HOG, LBP, BRISK and AKAZE.
An implementation of DITF is available in the following GitHub repository: github.com/Johnchristopherclement/Directional-Intensified-Feature-Descriptor.

The performance of an AR model is measured by two parameters: (i) robustness and (ii) efficiency. Robustness is the ability of a descriptor to describe features effectively even when the image undergoes transformations. Efficiency is measured as the time taken to process the image into an AR overlay. A feature descriptor is robust if it is:
• Compact in size/dimension.
• Consistent, so that the features remain the same irrespective of the nature of the image and scene.
• Invariant to rotation, scale and lighting changes.
Many feature descriptor models are found in the literature, namely Scale Invariant Feature Transform (SIFT), KAZE, Speeded Up Robust Features (SURF), Binary Robust Invariant Scalable Keypoints (BRISK), Oriented FAST and Rotated BRIEF (ORB), Fast Retina Keypoint (FREAK) and Histogram of Oriented Gradients (HoG). Based on our survey, each model uses a different method to detect features, which leads to different trade-offs between robustness and efficiency.
SIFT uses multiple scales of an image and extracts features using the Difference of Gaussians method. Several modified versions of SIFT 6 have been released recently to improve image matching. The features detected by SIFT are invariant to the size and orientation of the image. Although SIFT features can retrieve an image from a large database, SIFT is limited by its computation speed.
The SURF descriptor uses the Hessian matrix to locate objects and images. It applies a box filter to the gray-scale image to speed up computation in image detection and tracking 7. A newer SURF model uses multimodal images for feature matching, leading to high-quality image registration 8. However, SURF feature extraction is not stable under illumination and rotation variation.
ORB uses the FAST detector for feature extraction and, as its name implies, it is fast 9. The multi-scale pyramid model of ORB provides scale invariance, although the accuracy of this property is lower than SIFT's 10,11. The intensity centroid technique provides rotation invariance, and ORB is highly immune to Gaussian noise. In outdoor environments, ORB detects more features than SIFT and SURF, but it lacks robustness.
The FREAK descriptor selects features based on a retinal sampling pattern. In terms of computation speed, FREAK is faster than ORB and BRISK, taking 29.1 ms for mobile AR tracking 12; for AR augmentation, the computation time should generally be less than 100 ms. FREAK achieves this speed by reducing the dimension of the descriptor, but at the cost of tracking accuracy. KAZE descriptors describe 2D features in a nonlinear scale space built by means of nonlinear diffusion filtering. The Gaussian scale space used in SIFT and SURF does not obey the natural boundaries of objects; in contrast, a nonlinear scale space preserves image details for as long as necessary. Modified KAZE features have been employed to detect Diabetic Retinopathy (DR) in the medical field 13; this method can detect DR in its early stages with a minimal amount of processing time 14.
HoG 15 is used in many computer vision (CV) applications, from pedestrian detection in static images to human detection in video. However, HoG lacks light and scale invariance. Recently, many modified versions of HoG have been published for object detection and matching 16. In Ref. 17, the authors proposed a Local Binary Pattern (LBP) combined with a HoG model to improve the accuracy of feature extraction; this model is used in the early-stage detection of congenital heart defects to prevent adverse consequences. HoG provides good image matching, although it is limited by its lack of rotation invariance.
Applications such as autonomous vehicles and augmented reality require image matching that is highly resistant to light and rotation variation. Adversarial-learning-based feature extraction uses day and night features to improve image matching under illumination variation 18; however, it needs prior information about the scene for reconstruction. A new GAN model has been published that performs 3D reconstruction without any prior data 19; in contrast, it needs more images to train and is time-consuming.
Image matching in AR tracking can engage a spherical localization algorithm 20 to identify the position of stationary users. The spherical localization algorithm adopts global localization to extract 1000 features for further processing. However, tracking is consistently poor because of weak feature matching, which limits successful tracking over long periods of time. In Ref. 21, the researchers proposed a multi-attention model to improve feature matching, which enhances tracking efficiency.
Real-time AR tracking requires robust features for effective localization.
In recent years, many feature descriptor models have been published to improve feature matching 22,23. However, each model satisfies only one of the descriptor properties mentioned in the Introduction.
The aim of the proposed model is to close this gap, namely the mismatches and weaknesses of existing descriptors under various image transformations. To improve the robustness of the feature descriptor, we implement the following steps in our proposed work.
In summary, the main contributions of our work with respect to our preliminary study are as follows:
1. The DITF method computes the features of an image using three inherent filters, namely Tri-ocular, Bi-ocular, and Dia-ocular.
2. We introduce a directional intensification algorithm to describe the features in all required directions.
3. The DITF model is designed to be invariant to affine transformations.
4. The performance of DITF is measured in terms of the repeatability score for the region of interest.
Hence, the DITF model enhances robustness in AR tracking.
The remainder of this paper is organised as follows. The proposed model presents the methodology and the significance of the filter design. Results and discussion investigates and evaluates the performance of the feature descriptor. Finally, the conclusion and future scope are given in Conclusion and future scope.

The proposed model
In this section we discuss the implementation of the DITF model. We propose a feature extraction model for feature matching that is computed from the direction of pixels. Our method differs from existing descriptors in that it uses three filters to compute the direction of pixels; the resulting performance is analysed using the repeatability score. We describe feature extraction for an image of size M*N whose matrix is represented as A in Eq. (1). From the image matrix we segment a sample plane A ij, whose size must match the filter size. The filters are designed to analyse three directions of pixels for feature validation, and the filter size is taken as k by l. Each filter is applied to the sample plane as illustrated in Figs. 1 and 2. Each filter consists of the values -1, 1 and 0 at the spatial locations required. The filters T(z, y), B(z, y) and D(z, y) are designed according to Eqs. (2), (3) and (4), respectively.
The Tri-ocular filter acts as a parallel filter to measure the parallel plane of the image, the Bi-ocular filter obtains the perpendicular values, and the Dia-ocular filter evaluates the plane that connects non-adjacent vertices of the image. A ij is transformed to a vector as given in Eq. (5).
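The filter layouts defined in Eqs. (2)-(4) are not reproduced in this text, so the 3×3 layouts below are illustrative assumptions; the sketch only shows the mechanics of applying a filter with entries -1, 0 and 1 to a k*l sample plane by element-wise product and sum:

```python
import numpy as np

# Hypothetical 3x3 layouts with entries in {-1, 0, 1}; the exact layouts
# come from Eqs. (2)-(4) of the paper and are assumed here.
T = np.array([[-1, -1, -1],
              [ 0,  0,  0],
              [ 1,  1,  1]])   # Tri-ocular: parallel (horizontal) plane
B = T.T                        # Bi-ocular: perpendicular (vertical) plane
D = np.array([[-1, -1,  0],
              [-1,  0,  1],
              [ 0,  1,  1]])   # Dia-ocular: diagonal plane

def filter_response(sample_plane, filt):
    """Element-wise product and sum of a k*l sample plane with one filter."""
    return float(np.sum(sample_plane * filt))

A_ij = np.arange(9, dtype=float).reshape(3, 3)  # example k*l sample plane
responses = [filter_response(A_ij, f) for f in (T, B, D)]
```

Applying all three filters to the same sample plane yields one response per direction, which the subsequent intensification step combines into magnitude and angle information.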
The filters compute the directional intensification with the help of Eqs. (6) and (7) to provide the corner and edge details of the sample image matrix A ij. The algorithm used to acquire the feature vector of the whole image plane is shown in Algorithm 1.
The details in the given two-dimensional image plane A may vary in intensity in any particular direction. For a better description, these features must be highlighted with their intensities along with the direction they point to. Let θ denote an angle ranging from 0° up to the maximum angle, divided linearly into R equally spaced points with step size Δθ. Given H ij and θ, the feature describing algorithm proceeds as follows. I is plotted as per Algorithm 1 to evaluate the feature distribution in the image. With respect to I, we find the orientation θ along with the variation in intensity. Each θ developed from Algorithm 2 is a feature vector of the descriptor, and every region of interest in the image generates one feature vector. The sample plane, filter size and number of classes decide the dimension of the feature descriptor, which for DITF is ((k*l)*R). The DITF model flow diagram is given in Fig. 3.
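As a minimal sketch of the describing step, assuming a magnitude-weighted histogram over R equally spaced angle bins (the function name and the weighting scheme are our own, not taken from Algorithm 2):

```python
import numpy as np

def directional_histogram(G, H, R=8, theta_max=360.0):
    """Accumulate magnitudes G into R equally spaced angle bins.

    G: magnitudes, H: angles in degrees (same shape). The step size is
    delta_theta = theta_max / R; magnitude weighting is an assumption.
    """
    delta_theta = theta_max / R
    bins = np.clip((np.asarray(H) // delta_theta).astype(int), 0, R - 1)
    hist = np.zeros(R)
    for b, g in zip(bins.ravel(), np.ravel(G)):
        hist[b] += g
    return hist

G = np.array([1.0, 2.0, 3.0])
H = np.array([10.0, 100.0, 350.0])   # angles in degrees
hist = directional_histogram(G, H, R=4)   # bin width 90 degrees
```

One such R-bin histogram per filter position gives the stated descriptor dimension of (k*l)*R.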
The entire two-dimensional gray-scale image can be written in matrix form, where M and N denote the number of rows and columns of the matrix A, respectively. To perform the filtering, we define three filters, namely 'Tri-ocular' T, 'Dia-ocular' D, and 'Bi-ocular' B.
For operational convenience, the sliced array A ij is transformed into a vector a. Given the filters, and for fixed i and j, the magnitude metric is computed, where b, t and d are the corresponding vector transformations of the filters T, B and D, as mentioned in Eq. (5).
Similarly, the angles for each sample at i and j are computed.

Input image transformation
The input is taken as a mandrill image, and the robustness of the DITF model is tested on the Oxford dataset (https://www.robots.ox.ac.uk/). The dataset covers five types of transformation (blur, view-point, zoom with rotation, light, and JPEG compression). Based on our analysis, we include three transformations with different types of images, because these three transformations occur most frequently in AR tracking. Each transformation has six variations of an image; we analysed all six and include the most significant images for validation. The mandrill image is taken from the USC-SIPI database of the University of Southern California (http://sipi.usc.edu/database/). An effective feature descriptor 12 should be robust to scale, rotation and lighting changes of the image. We applied an affine transformation to the original image using a homography matrix. The Oxford dataset transformations used in our algorithm are as follows:
• View-point variation
• Blur changes
• Illumination variation
All images used in our analysis are of medium resolution, approximately 800*640 pixels. Features of all affine-transformed images are extracted using DITF; the relevant figures are illustrated in section Input image transformation.

Homography
A homography matrix establishes a relation between two images of the same plane under different transformations. The homography between the reference and test image is computed in two steps:
• Manually select four corresponding points between the two images.
• Compute the homography matrix from these correspondences using Eq. (8).
Eq. (8) captures the homography principle and gives the relation of the h-matrix between the two images. Given reference image plane coordinates (x, y, 1), we obtain the warped/transformed coordinates (x′, y′, 1) with the help of the h-matrix. Hence, the affine coordinates of any point can be computed from the homography matrix. DITF uses the homography matrix to measure the transformation and establish the robustness of our feature descriptor model. The homography is independent of the scene/environment 24, which makes it very useful in AR.
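The two steps above can be sketched with the standard four-point direct linear transform; the point coordinates are illustrative, and the SVD-based solver is a generic technique rather than the paper's specific implementation:

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for the 3x3 homography H mapping src -> dst from four
    correspondences using the direct linear transform (DLT)."""
    A = []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        A.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    # The null space of A holds the 9 entries of H (up to scale).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, x, y):
    """Map (x, y, 1) through H and dehomogenize to (x', y')."""
    q = H @ np.array([x, y, 1.0])
    return q[0] / q[2], q[1] / q[2]

# Four illustrative manually selected correspondences (reference -> test).
src = [(0, 0), (100, 0), (100, 100), (0, 100)]
dst = [(10, 5), (110, 10), (105, 115), (5, 110)]
H = homography_from_4_points(src, dst)
```

With H in hand, any reference coordinate can be warped into the test image, which is how the ellipse regions of interest are aligned for correspondence checking.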

Results and discussion
This section discusses the simulation results and the analysis of the feature descriptor. The comparative analysis considers two combinations of filters, namely:
• Tri-ocular and Bi-ocular filters (TB)
• Tri-ocular, Bi-ocular and Dia-ocular filters (TBD)
1. We plot an ellipse region on the test image using the homography matrix; features are extracted in the region of interest for correspondence matching.
2. The repeatability score is obtained from the feature matching between the two images.
3. The performance of our DITF model is analysed and compared with state-of-the-art descriptors.
Our model was simulated using Anaconda3 with Python 3.9.12 in a Jupyter notebook on a CPU i7-11390H @ 3.40 GHz. This section covers "Correspondence and repeatability measurements" and "Simulation results and discussion".

Correspondence and repeatability measurements
A region is a set of pixels with similar properties. For accurate interpretation of the image, we plot an ellipse pair in the reference and test images. The homography matrix of the graffiti image is given in Eq. (11). Feature extraction in the ellipse region uses the DITF model. Figure 4 shows the graffiti view-point variation at an angle of 30°. All the features overlap and match between the two images in the overlap region, which shows the accuracy of our proposed model. The overlap error (ε) is measured to find the true correspondences in the image. In Eq. (10), 'H' is the homography, 'A' is the ellipse region of the ground-truth image and 'B' is the ellipse region of the transformed image. For an effective feature descriptor, the overlap error should not exceed 50%.
The key evaluation metric of image matching is repeatability: the features in the reference image and the query image need to be repeated for image matching. Matching is obtained from the overlap between the two images' features, which provides the correspondences, and from the correspondence count we calculate the repeatability. The repeatability score reflects the feature descriptor's ability to measure the same feature points irrespective of imaging conditions over N trials; a high repeatability score means maximal overlap between the images. The repeatability score is the ratio of the number of correspondences to the total number of features, as given in Eq. (9).
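The overlap-error and repeatability measurements can be sketched as follows, approximating the ellipse regions by rasterized boolean masks (an assumption about the implementation; the 50% overlap-error threshold and Eq. (9) are from the text):

```python
import numpy as np

def overlap_error(mask_a, mask_b):
    """Overlap error between two boolean region masks:
    1 - (intersection / union). An error below 0.5 counts as a
    true correspondence per the 50% criterion."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return 1.0 - inter / union

def repeatability(num_correspondences, total_features):
    """Eq. (9): ratio of correspondences to total features."""
    return num_correspondences / total_features

# Two overlapping 10x10 square regions on a 20x20 grid as stand-ins
# for the warped ellipse regions A and H(B).
a = np.zeros((20, 20), bool); a[0:10, 0:10] = True
b = np.zeros((20, 20), bool); b[0:10, 5:15] = True
err = overlap_error(a, b)          # intersection 50, union 150
score = repeatability(99, 100)
```

In practice the second mask would be the test-image ellipse mapped through the homography H before the overlap is measured.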
Correspondence matching is an essential factor in CV applications such as camera-pose retrieval, 3D reconstruction, image classification, image stitching, tracking and image registration. Therefore, the DITF model evaluates correspondences, which is necessary for AR tracking.

Simulation results and discussion
This section analyses the repeatability scores for view-point variation, blur and illumination variation on the Oxford dataset. The feature vector is measured in the two ways mentioned in section Results and discussion.
(1) The first measurement of the feature vector uses both the Tri-ocular and Bi-ocular filters (TB). (2) The second measurement combines the Tri-ocular, Bi-ocular and Dia-ocular filters (TBD). In the ideal case the repeatability score is 100%; in practice the score decreases because of mismatches in feature matching between the reference and query images.
Figure 5 shows the standard reference image and the test image for view-point variation. The DITF features of the reference image are compared with the 40° variation, as shown in Fig. 6. In the output, the maximum number of features is extracted in both images, which shows that the feature shapes are recovered under view-point variation. Figure 7 displays the light-intensity variation of the car image: the light intensity is decreased and the illumination invariance of the DITF model is tested. Figure 8 illustrates the DITF features under light variation. Feature descriptors usually struggle to extract features under light variation; however, our model owns three filters, which improves feature extraction under varying illumination. Figures 9 and 10 are two examples of blurred images; two different types of image are included in the blur analysis. The bike image has many edges, while the tree image consists of smooth surfaces with repeated scenery. Blur occurs because of relative motion between the scene and the camera. Figures 11 and 12 show the feature extraction for the blurred images. Even though the two images differ in nature, our model extracts the features correctly. This is concluded from the visual feature representation and validated by the repeatability scores of the DITF model illustrated in Figs. 13b and 14b. Figure 15 is another example of view-point variation; it contains many edges with homogeneous regions, and the output is shown in Fig. 16. Almost all the features are extracted and replicated between the reference and query images. These results prove that the robustness of DITF is high across the three transformations and different images.

View-point variation
The mandrill image simulation curve is plotted as repeatability score versus angle. The repeatability score of the TB filter is 99.5%, while the TBD filter achieves 99.75%. Figure 17a shows the repeatability curve of the mandrill image under view-point variation; this result shows that TBD achieves a better result than the TB filter. Per Eq. (9), Repeatability = Correspondences / Total features. TBD also performs well for illumination variation in both images: although the light intensity is decreased, DITF matched all the features between the reference and test images, which proves that our descriptor is highly robust to light variation.

Blur image
For the blur variation, three different input images are taken for measurement: the bike, car and tree images. From the results, we can compare the TB and TBD measurements for view-point variation, blur and lighting variation. The view-point variation results show that TBD is better than the TB filter, while for blur changes and light variation both filter combinations obtain a 100% score across different images. These results show that TBD records higher values than the TB filter and that the DITF repeatability score is above 99% for all of the above transformations, demonstrating high accuracy. The computational complexity of the DITF model is measured from the arithmetic operations involved in the feature extraction process. The number of multiplications needed to compute G ij in Eq. (6) is 6k − 8 + 3, and the number of additions needed is 6l − 8.
Assuming that arctan values are available in a look-up table and the above computed values are stored in memory (so that they can be reused), computing H ij in Eq. (7) requires only 2 multiplications and 1 addition. As these computations are very small for H ij, they can be ignored. Therefore, when the entire image is considered, the total numbers of additions and multiplications required, ignoring the least significant terms, follow from the per-sample counts above.
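Under the stated per-sample counts, the whole-image total can be sketched as below, assuming one k×l sample plane per valid sliding position (the sliding arrangement is our assumption):

```python
def ditf_op_count(M, N, k, l):
    """Multiplications and additions for computing G_ij over an M x N
    image, using the stated per-sample counts (6k - 8 + 3 multiplications
    and 6l - 8 additions) at every valid k x l sliding position."""
    positions = (M - k + 1) * (N - l + 1)
    mults = positions * (6 * k - 8 + 3)
    adds = positions * (6 * l - 8)
    return mults, adds

# Example: the paper's ~800*640 medium-resolution images with 3x3 filters.
mults, adds = ditf_op_count(800, 640, 3, 3)
```

The negligible 2 multiplications and 1 addition per H ij would add only a lower-order term to these totals.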

Conclusion and future scope
In this article, we proposed a DITF-based feature descriptor that improves robustness against view-point variation, illumination changes and blur variation using the Tri-ocular, Bi-ocular and Dia-ocular filters. The DITF algorithm performs well even when the images are of low resolution: despite the low resolution, it extracted features with repeatability scores of 100%, 100% and 99% for light variation, blur variation and view-point variation, respectively. We compared the DITF model with existing descriptors and verified that it achieves a significant improvement in precision and recall when a subset of images from the database is considered for analysis. Directional intensification with the three filters increases robustness. However, the algorithm may not perform well when the image undergoes occlusion and scaling; these analyses open up a scope to address these issues in future research.

Figure 4. Region of interest feature extraction for the graffiti image.

Figure 7. (a) Original car image. (b) Light-intensity variation of the car image.

Figure 8. (a) Feature extraction of the car image. (b) Feature extraction under light-intensity variation.

Figure 9. (a) Original bike image. (b) Blur variation of the bike image.

Figure 10. (a) Original tree image. (b) Blur variation of the tree image.

Figure 11. (a) Feature extraction of the bike image. (b) Feature extraction under blur variation.

Figure 12. (a) Feature extraction of the tree image. (b) Feature extraction under blur variation.

Figure 13. Repeatability scores of the car and bike images.

Figure 14. Repeatability score of the blurred car and tree images.
Figure 15.

Figure 16. (a) Feature extraction of the wall image. (b) Feature extraction under view-point variation.
Figure 13b illustrates the bike image score for the TBD and TB filter measurements; it achieves a 100% score. Similarly, the car and tree images also obtain a 100% repeatability score, as shown in Fig. 14. This proves that our model outperforms the state-of-the-art descriptors in comparison.

Figure 17. Repeatability scores of the mandrill and graffiti images.