Content-based image retrieval of Indian traditional textile motifs using deep feature fusion

In the fast-paced fashion world, unique designs are like early birds, grabbing attention as online shopping surges. Fabric texture plays an immense role in selecting the perfect design. Indian traditional textile motifs are pivotal, showing rich cultural origins and attracting art enthusiasts worldwide. Yet technology-driven abstract forms pose a challenge to them, and the decline of handmade artistic ability due to computerization is concerning. Crafting new designs that follow the latest trends is time-consuming and requires diligence. In this work, an interactive content-based image retrieval (CBIR) system is presented. It uses deep features from the InceptionV3 and InceptionResNetV2 models to match query designs with a database of traditional Indian textiles. Its performance is tested on the Caltech-101 and Corel-1K benchmark datasets and on an Indian textiles dataset, and the results are shown to be better than those of existing approaches. The similarity-based fine-grained saliency maps (SBFGSM) approach is employed to visualize the importance of features. Our approach combines deep feature fusion with PCA dimensionality reduction and speeds up search using a clustering approach. Relevance feedback is employed to refine the retrievals. This tool is expected to benefit designers by accelerating design cycles and bridging the gap between human creativity and AI assistance.


Content
India has a rich cultural and artistic heritage with diverse roots. Traditional Indian textile styles are admired worldwide. These include the "Madhubani style" from Bihar, "Kalamkari style" from Andhra Pradesh, "Ajrakh style" from Gujarat, "Bagh style" from Madhya Pradesh, "Kashida embroidery style" from Kashmir, "Chikankari embroidery style" from Uttar Pradesh, etc. Innovative new creations are necessary for preserving these diverse art forms, as competition has increased the variety in the market and shortened the expected time to market. However, the methodologies adopted in traditional design suffer from low productivity, resulting in a long time to market. Fashion designers combine traditional designs with their modern ideas to boost acceptance among the next generation. Technological intervention is a must to alleviate this problem.
Very few works have addressed classifying and identifying textile images using visual features [1]. An instantly searchable database of existing patterns can go a long way in helping designers produce new designs rapidly. It will also benefit the e-commerce industry and shoppers seeking newer patterns. Hence, it becomes essential to create database systems for retrieving desired textile patterns from image databases accurately and conveniently. Commonly, two types of approaches are used for textile image retrieval. The first is keyword/text-based image retrieval (KBIR). In this approach, designs are manually annotated with keywords reflecting the contents of the images [2]. The keywords are used as keys to build indexes, and users search for related images by specifying appropriate keywords. Despite its high retrieval speed, KBIR makes searching for specific designs challenging because of the limited expressive ability of keywords. The attributes of traditional Indian art form fabrics are tough to describe, and keywords cannot capture details and features that are difficult to put into words. Moreover, manual labeling of fabric images is substantially subjective, which makes retrieval results uncertain and KBIR inaccurate and inefficient. Manual annotation is also expensive and time-consuming, limiting the efficacy of such attempts. These shortcomings of KBIR led to the development of retrieval approaches based on content, dubbed content-based image retrieval (CBIR), the second approach for textile image retrieval.
Compared with KBIR, CBIR is more objective [3-5]: it uses image content to retrieve images, avoiding the influence of human subjectivity on the result. It has attracted the attention of researchers across several disciplines, such as fabric and fashion design [6], art galleries, remote sensing, and medical imaging. The first commercial CBIR system, named query by image content (QBIC), was created by IBM [7]. This system utilizes

Feature extraction using handcrafted methods
Handcrafted methods commonly apply pixel-level descriptors to fabric images, such as MPEG-7, the image color histogram, the histogram of oriented gradients (HoG) descriptor, color moments (CM), the scale-invariant feature transform (SIFT) key point descriptor, Gabor filters, grey level co-occurrence matrices (GLCM), and local binary patterns (LBP). These methods depend heavily on feature engineering. Arora et al. [19] use a support vector machine classifier for retrieving textile images, and Xiang et al. [20] utilize a non-subsampled contourlet transform (NSCT) feature descriptor with a relevance feedback approach for patterned fabric image retrieval.
However, most researchers use a blend of two or more feature descriptors to represent fabric images and attain better retrieval accuracy than with individual descriptors [21-27]. These methods are limited to small datasets. Slight jitters in scale or detail significantly affect the retrieval results, which demonstrates the need for more robustness in these methods. Color features are susceptible to illumination changes, while shape and texture features are susceptible to geometrical shifts. For this reason, low-level features such as pixel values are not enough, and high-level features are also needed.

Feature extraction using automatic learning-based methods
In the last decade, feature representation has shifted from hand-engineering to deep learning. Deep learning is a hierarchical feature representation technique that learns abstract features from data that are essential for the dataset and application at hand. This section discusses automatic feature learning-based methods. The CNN feature representation pipeline is depicted in Fig. 1. CNNs require large amounts of data; training them on large datasets provides the requisite knowledge base to identify objects. A deep learning network achieved outstanding results in the ImageNet challenge [13]. The basic CNN model motivated other deep learning-based approaches in the image retrieval domain, such as AlexNet, VGGNet, GoogLeNet, Microsoft ResNet, etc.
Previous studies [28,29] have trained CNN models for image retrieval systems of wool fabric using classified search, demonstrating the ability of CNNs to learn binary codes and features from labeled data. Sun et al. [30] integrate CNNs and hash encoding to reduce feature dimensions and computation time for fabric image retrieval. Zhang et al. [31] presented aggregated convolutional descriptors and an approximate nearest neighbour search approach that combine texture and colour features for wool fabric retrieval on a dataset of 82,073 wool images. Prasetyo and Akardihas [32] used a CNN for retrieving Batik images on a small dataset, and Deng et al. [33] proposed a focus ranking approach integrated with a CNN for fine-grained fabric image retrieval. They produced a dataset of 25,000 fabric images from 4300 original images. Tena et al. [34] proposed a modified CNN model for a more accurate search of ikat woven fabrics on a dataset of 4800 images. Cui and Wong [35] introduced a joint local PCA-based 2D color and 2D orientation feature descriptor for textile image retrieval, surpassing histogram features on a dataset of 1000 stripe, plaid, and pattern images. Maji and Bose [36] proposed a pre-clustering approach in CBIR using deep learning features on datasets such as Caltech-101, Corel-1K, and DB2000 without a human in the loop.
Limited effort has been devoted to visualizing feature explainability in interactive CBIR. Rui et al. [37] introduced the relevance feedback method for enhancing the explainability of the retrieval process. Imo et al. [38] presented a visualization of color histograms and texture features to give users an idea of what they have specified.

Methodology
Pre-trained networks are models trained on large datasets that can be used as a starting point for specific tasks. They save time and resources, as they can be adjusted for better accuracy and speed in computer vision and natural language processing. This paper presents a feature fusion approach that obtains features from images by fusing the strengths of multiple pre-trained deep learning models. Each model has learned distinctive features and representations from various datasets and tasks. By combining their abilities, we can tap into the unique information extracted by each network, resulting in a more comprehensive and distinct set of features. This approach allows us to capture a broader range of patterns and structures in the input data, thus enhancing the richness of our analysis.
To lay the foundation, we provide a concise overview of essential concepts like convolutional neural networks and pre-trained models in this section.

Convolutional neural network
The convolutional neural network (ConvNet/CNN) is a frequently used deep learning algorithm. It can take input images, assign importance (learnable weights and biases) to aspects and objects in the image, and distinguish between them. The layers of a CNN have neurons arranged in three dimensions: height, width, and depth. Here, depth refers to the third dimension of the layer's activation volume. A layer's neurons are linked to a small region of the preceding layer rather than to all of its neurons, unlike in a fully connected neural network. Thus, a CNN comprises multiple layers, and each layer transforms one activation volume into another via differentiable functions. Its essential components (convolution, pooling, fully connected layers, and activation layers such as ReLU and softmax) operate on local input regions and depend only on relative spatial coordinates, which is impossible with conventional neural networks. CNNs are recognized for their weight-sharing and local connectivity characteristics [11]. These two characteristics permit CNNs to act like local filters and to detect the same pattern in more than one part of the image with fewer trainable parameters, which reduces the model's memory requirement and improves its statistical efficiency.
... is a common practice. Canziani et al. [42] conducted a comprehensive performance analysis of pre-trained models using ImageNet data in computer vision applications. Transfer learning, a widely used application of pre-trained models in computer vision [43,44], leverages prior learning through these weights, leading to substantial time savings compared to starting training from scratch. Moreover, it often yields significantly better results. This study's motivation lies in using the transferred knowledge, represented by the layer weights of pre-trained CNN models, as feature extractors. All convolution and pooling layers are frozen, requiring no further training. To determine the output class or value, fully connected (FC) layers are removed, and softmax classifier layers are added above these features. Fine-tuning the FC layers means utilizing these layers and the knowledge learnt from the source domain dataset (ImageNet) to fit the target domain dataset (TIAD). As a result, the FC layers serve as the classifier, initialized with pre-trained weights, which expedites training and facilitates quicker convergence.

Proposed method: deep feature fusion for feature extraction
As the model architecture becomes deeper, it learns high-level (abstract) features from low-level features. To represent images in a CBIR system, we use higher-level features fused from multiple models. The performance of retrieval systems depends immensely on the quality and discriminative power of the features extracted from images. Our research aims to retrieve images correctly, so we merge information from multiple models to boost retrieval accuracy. In addressing our research questions, we design a fused deep learning approach to automatically retrieve images in our proposed CBIR system, leveraging the strengths of two pre-trained CNN models, InceptionResNetV2 and InceptionV3.
Our selection of InceptionResNetV2 and InceptionV3 as foundational CNN models for our CBIR architecture is a crucial starting point. This choice is not arbitrary; it stems from prior research and from experiments, explained in a subsequent section, that have demonstrated their effectiveness. By concatenating the features from both networks, we achieve enhanced model performance, improved representation capabilities, robustness, and the ability to leverage distinct viewpoints for better understanding and generalization. This fusion produces a complementary feature representation. Moreover, it helps minimize biases and limitations intrinsic to a single network architecture, resulting in more discriminative features and improved accuracy and generalization ability. We employ the concatenation method for feature fusion, directly merging the features from the networks, so the dimension of each image's resulting feature vector is the sum of the dimensions of the fused features. To obtain the most informative representation, we discard the softmax activation layer and select the preceding fully connected layer as our feature vector for CBIR. This vector carries the learned high-level features of the models. We encode the images in the CBIR database by fusing the higher-level features of the pre-trained models and obtain an (n+m)-dimensional feature vector for each image. The values of n and m vary with the choice of deep learning network architecture. Here, the output features are generated from the InceptionResNetV2 and InceptionV3 network models with dimensions of 1536 and 2048, respectively. The benefit of this method is that it extracts higher-level features without relying on class information from our database. To avoid the laborious task of manually classifying images in our dataset, we use pre-trained neural network models, trained on an independent dataset (ImageNet), for feature extraction. Figure 2 illustrates the flowchart of our feature fusion process.
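The concatenation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two input lists stand in for the 1536-dimensional InceptionResNetV2 and 2048-dimensional InceptionV3 feature vectors, the function names are hypothetical, and L2 normalization is one plausible choice of normalization that the text does not specify.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(irv2_features, iv3_features):
    """Concatenate two feature vectors into one (n + m)-dimensional vector."""
    fused = list(irv2_features) + list(iv3_features)
    return l2_normalize(fused)


# Placeholder vectors with the dimensions stated in the text.
irv2 = [1.0] * 1536   # stand-in for InceptionResNetV2 features
iv3 = [2.0] * 2048    # stand-in for InceptionV3 features
fused = fuse_features(irv2, iv3)
print(len(fused))     # 1536 + 2048 = 3584
```

Because fusion is plain concatenation, neither network needs to be retrained, and either extractor could be swapped for another model with a different output dimension.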

Dataset description
This dataset is an extended version of an earlier work [17,18]. Significant effort has been put into enlarging the datasets of Indian traditional art forms and their subclasses by gathering information from various connoisseurs at Taj Mahotsav (Agra) and Delhi Haat (Delhi), and from websites such as FABCURATE, Matkatus, Pinterest-India, Sanskruti Yards of Tradition, Mandir, DEEPAM, and iMithila.
To further confirm the outcome of the recommended approaches, experiments are also performed on standard datasets available in the literature.Publicly available benchmark CBIR datasets taken in this work are as follows.

Similarity measures
Once the feature vectors for all images in the database are computed and normalized, the task is to find the relevance of each database image to a given query image. The most pertinent images are then retrieved as the final query result. The similarity (or dissimilarity) between a query image Q provided by the user and an existing image from the system database is measured by a distance metric. This section lists the similarity and dissimilarity measures considered in our research. Let Q denote the vector $(Q_1, Q_2, \ldots, Q_n)$ representing the query image and R the vector $(R_1, R_2, \ldots, R_n)$ representing another image. Further, let $\bar{Q}$ represent the mean of the values in the Q vector and $\bar{R}$ the mean of R, and let q and r represent, respectively, the cumulative distributions of Q and R when they are considered as probability distributions ... the images, and i is the ith feature value of the database and query image. Finally, $\mu = (\mu_1, \ldots, \mu_n)$ is the mean vector such that $\mu = \frac{Q + R}{2}$.
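As an illustration of the kinds of measures compared in this work, the sketch below implements four of them using their common textbook definitions. These forms are assumptions, since the exact equations used by the authors are not reproduced here; in particular, the Jeffrey divergence is written with the mean vector $\mu = (Q + R)/2$ and expects non-negative inputs that each sum to 1.

```python
import math


def manhattan(q, r):
    """City block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(q, r))


def euclidean(q, r):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))


def tanimoto_distance(q, r):
    """1 minus the Tanimoto coefficient Q.R / (|Q|^2 + |R|^2 - Q.R)."""
    dot = sum(a * b for a, b in zip(q, r))
    denom = sum(a * a for a in q) + sum(b * b for b in r) - dot
    return 1.0 - dot / denom if denom != 0 else 0.0


def jeffrey_divergence(q, r):
    """Jeffrey divergence of two probability vectors, with mu = (q + r) / 2."""
    total = 0.0
    for a, b in zip(q, r):
        mu = (a + b) / 2.0
        if a > 0:
            total += a * math.log(a / mu)
        if b > 0:
            total += b * math.log(b / mu)
    return total


print(manhattan([1, 2, 3], [2, 4, 6]))   # 1 + 2 + 3 = 6
print(euclidean([0, 0], [3, 4]))         # 5.0
```

All four functions return 0 for identical inputs and grow as the vectors diverge, so any of them can be plugged into the same ranking step.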

Evaluation of performance
We measure the performance of the CBIR system using precision and recall, which are defined as follows.
$$\text{Precision} = \frac{\text{number of relevant images retrieved}}{\text{total number of images retrieved}} \qquad (1)$$
Precision is the ratio of true positives to the total number of retrieved images. It represents the accuracy of the CBIR system in retrieving relevant images. Normally, the number of images retrieved by any CBIR method is a pre-specified positive integer, termed the scope of the system. The precision value is computed for each image in the database, and these values are averaged over all images. Usually, a greater scope retrieves more relevant images but also more irrelevant ones, which decreases precision.
Recall is another performance measure in CBIR systems that evaluates the ability of the system to retrieve relevant images for a given query. It is the ratio of the number of relevant images retrieved to the total number of relevant images in the database. Higher recall values indicate better system performance in retrieving relevant images.
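The two measures can be computed directly from the labels of the retrieved images. The following is a minimal sketch; the function names and the example labels and counts are illustrative assumptions rather than values from the experiments.

```python
def precision_recall(retrieved_labels, query_label, total_relevant):
    """Return (precision, recall) for one query.

    retrieved_labels: class labels of the retrieved images (length = scope)
    query_label:      class label of the query image
    total_relevant:   number of relevant images in the whole database
    """
    hits = sum(1 for label in retrieved_labels if label == query_label)
    return hits / len(retrieved_labels), hits / total_relevant


def mean(values):
    """Average per-query scores, e.g. to obtain a class-wise average."""
    values = list(values)
    return sum(values) / len(values)


# Illustrative query: scope of 20, 15 relevant results, 60 relevant in database.
labels = ["bagh"] * 15 + ["ajrakh"] * 5
p, r = precision_recall(labels, "bagh", total_relevant=60)
print(p, r)  # 0.75 0.25
```

Averaging the per-query precision values with `mean` gives the class-wise or dataset-wide figure used to compare methods at a fixed scope.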

Results
This section describes the choice of the preeminent pre-trained models employed for fusion, the selection of the most effective similarity measure for our fusion architecture utilizing deep learning network features, the retrieval results of our CBIR system for the selected query images extracted from our datasets, and the precision and recall of image retrieval organized by categories within our dataset.

Model selection
We experimented with various pre-trained model architectures and found that the InceptionResNetV2 and InceptionV3 models perform exceptionally well on our TIAD dataset, as shown in Table 1. Fusing these two models improves retrieval accuracy and efficiency and saves time for image retrieval in our database.

Selection of best similarity measure
Various distance measures are employed to determine the similarity or dissimilarity between images in the CBIR system. We took 1500 random images from each class, applied these images one by one as queries, and retrieved the top 20 images. We then determined the average precision for every class. The results in Table 2 show that the Manhattan (city block) distance measure performs best, with a 92.46% average precision value; the best-retrieved category is Kalamkari (precision: 95.0%), and the worst category is Chikankari (precision: 89.12%). The results also show that the Manhattan city block distance and the Tanimoto coefficient distance measure provide better results than the Euclidean and Jeffrey distance measures. Therefore, we use the Manhattan city block distance measure for all succeeding experiments.

Sample query image retrieval
For a scope of 20, using the deep feature fusion architecture on the TIAD dataset, the sample query image retrieves the 20 results depicted in Fig. 3a. Human experts have manually evaluated and annotated these images based on the defined relevance criteria, which serve as our ground truth.
From the retrieval results, we find that the query image is from the "Kalamkari" category, and all 20 results are related to the query image. Hence, the precision for this query image is 1. The total number of relevant images in the dataset is 40, so the recall for this query image is 20/40 = 0.5.
For one more query image from the TIAD dataset, the retrieved results are depicted in Fig. 3b. Here, we observe that the query image falls in the "Chikankari" category; out of 20 retrieved results, 16 are related to the query image. The total number of relevant images in the dataset is 50. Hence, for this specific image, the precision value is 16/20 = 0.80, and the recall value is 16/50 = 0.32.

Class-wise average precision and recall calculation
In this subsection, we report the class-wise average precision and recall on the TIAD and Corel-1K datasets for a scope of 20, using the Manhattan city block distance as the similarity metric. Table 3a shows that for the Kalamkari class

Comparing results with other authors' proposed algorithms
For the Corel-1K and Caltech-101 datasets, we select the paper by Maji and Bose [36] as the baseline. That paper extracted deep features from the images using the InceptionResNetV2 CNN without relevance feedback, and we compare its precision and recall with ours. Many works [36,51-60] have used the Corel-1K dataset, extracting various features and applying various similarity distances. Figure 4a indicates that our recommended method is more accurate than the methods discussed in Maji and Bose [36].
For the Caltech-101 dataset, we took the average precision and recall of the best methods reported in [36,53,54,61,62]. The results are depicted in Table 4b. The recommended method outperformed the other methods.

Simulated visualization using similarity-based fine-grained saliency maps
This section discusses the role of fusion features and introduces similarity-based fine-grained saliency maps (SBFGSM) in our content-based image retrieval (CBIR) system. This technique visualizes the essential parts of an image and shows the advantage of the fused features over individual models.
Fusion features play a critical role in addressing the limitations of single-model-based CBIR systems. Some key reasons why fusion features are crucial:
1. Enhanced discriminative power: fusion features integrate complementary information from different sources, thereby increasing the discriminative power of the retrieval system. By combining diverse aspects of image content, we can capture a broader range of visual cues and semantic information.
2. Robustness to variability: single models may exhibit limitations in handling variations in image content, such as lighting conditions, viewpoints, and occlusions. Fusion techniques help mitigate these limitations by aggregating information from multiple models, making the system more robust to diverse scenarios.
3. Improved retrieval accuracy: fusion features enable more effective matching between query and database images. By incorporating different modalities or representations, the retrieval system can better align with the user's intent, improving accuracy in retrieving relevant images.
To demonstrate the advantages of fusion features, we utilize similarity-based fine-grained saliency maps (SBFGSM). This technique leverages the following principles:
1. Saliency-driven fusion: our method computes SBFGSM for each image, highlighting regions of interest based on their relevance to the query. By integrating these maps with the features from InceptionV3 and InceptionResNetV2, we create fusion features driven by the images' visual saliency.
2. Enhanced retrieval relevance: the fusion features generated by our approach exhibit improved retrieval relevance compared to using individual models alone. The visual saliency maps guide the fusion process, emphasizing semantically significant regions, leading to more accurate retrieval results.
Our proposed similarity-based fine-grained saliency map (SBFGSM) approach can explain why a black-box CNN feature-fused model (here, the IRV2 and IV3 fusion) makes its retrieval decisions by generating region-perturbation saliency maps for each decision [63-66]. A fine-grained saliency map is a detailed and localized representation of the most significant and visually distinct features within an image. It allows precise analysis of specific regions or objects of interest and indicates how a particular region of the retrieved image impacts the similarity. A classification-based saliency map explains why a particular class label was assigned to an image, whereas a CBIR-based saliency map explains why specific images were considered similar to a query image during the retrieval process. Our SBFGSM measures how regions of the result contribute to the CBIR distance metric when computing similarity. In simple terms, the SBFGSM can be considered a heatmap in which brighter regions signify a higher contribution to the match score with the query, whereas darker areas have less impact. We measure the significance of a retrieved image region by applying a binary mask that blocks out (perturbs) the region of concern and observing how much this affects the black-box decision. Inside the binary mask, the region of concern is 0; all other pixels are 1. We use a square block. By sliding the square block over the retrieved image with a given stride, we can show the importance of the blocked areas in impacting the similarity. Given a query image Q and a retrieved image R, where $\odot$ denotes element-wise multiplication, I is a matrix whose entries are all 1 and that has the same shape as ... two vectors, and $m_i \in M$ is a square binary mask, the importance of the region blocked out by $m_i$ is given in Eq. (7). Here, N is the top N retrieval image returned by the CBIR, and M is the set of binary masks of the retrieved image, with N binary masks. The succinct pseudocode of this approach is as follows.
Algorithm 1.
Our objective in using the SBFGSM approach is to enhance the interpretability and visualization of the features extracted by feature fusion CNN architectures, thereby improving the transparency and user understanding of the CBIR system. Figure 4 demonstrates the effectiveness of our fusion features, visualized with the similarity-based fine-grained saliency maps, in comparison to using InceptionV3 and InceptionResNetV2 as standalone models. Figure 4 shows that our fusion approach captures more feature information, as visualized by the proposed SBFGSM approach, and yields brighter saliency regions in the retrieved images than the standalone models.
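The occlusion procedure described above can be sketched as follows. This is not the authors' Algorithm 1 or Eq. (7): it assumes that the importance of a blocked region is the increase in Manhattan distance to the query caused by masking it (clipped at zero), and it averages that importance over all blocks covering a pixel. `toy_features` is a hypothetical stand-in for the fused CNN feature extractor, and images are plain 2D lists.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def toy_features(image):
    """Hypothetical feature extractor: row sums followed by column sums."""
    return [sum(row) for row in image] + [sum(col) for col in zip(*image)]


def saliency_map(query, retrieved, block=2, stride=1, extract=toy_features):
    """Slide a square mask over `retrieved` and score each pixel."""
    h, w = len(retrieved), len(retrieved[0])
    q_feat = extract(query)
    base = manhattan(q_feat, extract(retrieved))   # unmasked distance
    score = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for top in range(0, h - block + 1, stride):
        for left in range(0, w - block + 1, stride):
            # R (element-wise) m: pixels inside the block become 0.
            masked = [
                [0 if top <= y < top + block and left <= x < left + block
                 else retrieved[y][x] for x in range(w)]
                for y in range(h)
            ]
            # Assumed importance: how much masking worsens the match.
            importance = max(0.0, manhattan(q_feat, extract(masked)) - base)
            for y in range(top, top + block):
                for x in range(left, left + block):
                    score[y][x] += importance
                    count[y][x] += 1
    return [[score[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(w)] for y in range(h)]


image = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
heat = saliency_map(image, image)
print(heat[0][0], heat[3][3])  # the bright patch matters, the empty corner does not
```

In practice the extractor would be the fused network and the block size and stride would trade resolution against the number of forward passes, one per mask position.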

Quick response CBIR system
Combining different pre-trained models, namely InceptionResNetV2 and InceptionV3, has improved our results. However, we now need to examine how quickly images can be retrieved, as we have a substantial dataset with many features. This section therefore concerns the speed of our CBIR system on the TIAD and Caltech-101 datasets. We use principal component analysis [67] and clustering to make retrieval faster without sacrificing accuracy. We do not measure the image retrieval time for Corel-1K because the dataset is small and the result would not be meaningful.

Principal component analysis (PCA)
PCA is a dimensionality-reduction technique that transforms a large set of variables into a smaller one that retains most of the information in the original data. It aims to reduce the number of variables in a dataset while preserving as much information as possible. Dimensionality reduction entails sacrificing some information for simplicity, as smaller datasets are easier to handle and visualize and are faster for machine learning algorithms to process.

PCA on concatenated deep features
The concatenated feature vector of the fused CNNs (IRV2 with 1536 features and IV3 with 2048 features) has 3584 dimensions, which is large. We therefore applied PCA to the 3584-dimensional feature vector to lower its dimension and chose the number of principal components (M) that gives the maximum average precision. For the TIAD dataset we take 1024 PCs, and for the Caltech-101 dataset we take 100 PCs. The first few PCs give approximately the same, or sometimes better, average precision than the full 3584 features. For the TIAD dataset, the average precision with PCA is 92.95%, and without PCA it is 92.46%. For the Caltech-101 dataset, the average precision with PCA is 83.24%, and without PCA it is 83.24%.
We also analyzed the average query retrieval time for the top 20 images on the TIAD and Caltech-101 datasets. To apply PCA in this work, we first pass all database images through the CBIR fusion model (without the last softmax layer) and then through PCA. After that, we store the extracted features of each image, of dimension 1024 for TIAD and 100 for Caltech-101, in memory as a feature bank. When a query image arrives, it goes through the CBIR fusion model and then PCA. The features extracted from the query image are then matched against each feature vector in the feature bank, and the system retrieves the images whose features are closest to the query features according to the Manhattan distance. The time between supplying the query image and retrieving the similar images is termed the image retrieval time. Table 5 shows the average image retrieval time; it can be seen that using PCA lowers the retrieval time slightly.
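To make the projection step concrete, here is a minimal, dependency-free PCA sketch. It is an illustration only: it finds the leading principal components by power iteration with deflation on the covariance matrix, which is practical for toy dimensions but not for 3584-dimensional features, where a library implementation would normally be used. The function names and the toy data are assumptions.

```python
import math
import random


def fit_pca(data, n_components, iters=200, seed=0):
    """Return (mean, components) using power iteration with deflation."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:          # no variance left in this direction
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
        components.append(v)
        # Deflation: remove the variance explained by this component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components


def transform(vec, mean, components):
    """Project one feature vector onto the principal components."""
    return [sum((vec[j] - mean[j]) * c[j] for j in range(len(vec)))
            for c in components]


# Toy data lying on the line y = 2x: one component captures everything.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean, comps = fit_pca(data, n_components=1)
print(transform([3.0, 6.0], mean, comps))
```

The same `mean` and `components` fitted on the database features must be reused to project each query, so that query and database vectors live in the same reduced space.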
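The feature-bank lookup and the way retrieval time is measured can be sketched as below. The in-memory bank, the image identifiers, and the tiny vectors are illustrative assumptions; in the described system the stored vectors would be the PCA-reduced fused features.

```python
import time


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query_vec, feature_bank, top_k=20):
    """Rank all stored vectors by Manhattan distance to the query."""
    ranked = sorted(feature_bank.items(),
                    key=lambda item: manhattan(query_vec, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]


def timed_retrieve(query_vec, feature_bank, top_k=20):
    """Return the results together with the elapsed retrieval time in seconds."""
    start = time.perf_counter()
    results = retrieve(query_vec, feature_bank, top_k)
    return results, time.perf_counter() - start


# Hypothetical feature bank: image id -> reduced feature vector.
bank = {"img_a": [0.0, 0.0], "img_b": [1.0, 1.0], "img_c": [5.0, 5.0]}
results, seconds = timed_retrieve([0.9, 0.9], bank, top_k=2)
print(results)  # ['img_b', 'img_a']
```

Because each comparison costs time proportional to the vector length, shortening the vectors with PCA reduces the per-query cost of this exhaustive scan, while the number of comparisons still grows with the database size.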

Indian style Navratan clustering approach
Based on our previous discussion, image retrieval time increases as the database size increases. To address this issue, we present an approach for speeding up the retrieval of images. Our approach involves clustering the images in the database and searching for images within specific clusters. We form one cluster per class because each class has a unique texture that distinguishes it from the others. For example, Chikankari uses white threads in its style, whereas Kalamkari involves natural color block printing and the use of a pen for creating designs. To reduce model confusion, we restrict the feature space to a specific class. In this study, we use nine unique traditional styles that are popular worldwide, which is why we name the approach "NAVRATTAN STYLE CLUSTERING". To implement this method, we utilized the pre-trained CNN models InceptionResNetV2 and InceptionV3. Initially, these models were trained on the ImageNet dataset to predict 1000 classes. Here, we fuse these models' last convolution layer features, select the last dense layer for feature extraction, and remove the last softmax layer. This method has been effectively applied to the TIAD dataset. Figure 5 shows the schematic flow diagram of the recommended NAVRATTAN STYLE clustering-based image retrieval.
The approach is illustrated below, step by step.
1. First, we compute the last fully connected (fc) layer features, obtained from the concatenation of the last convolution layer features of both CNNs, and train the fused model by transfer learning using 2048 neurons and a dropout layer with rate 0.3.
2. The newly trained fused model predicts the class $C_q$ of the query image $I_q$.
3. The final layer output of the trained fused model is used to extract the deep features of the dataset images $I_j \in Z_n$ and of the query image $I_q$.
4. The fused model is used to construct a feature space $FS_n$ of the image dataset $Z_n$. The dataset contains n different images of 9 classes and is represented as $Z_n = \{\{C_1\}, \{C_2\}, \{C_3\}, \ldots, \{C_9\}\}$, where $C_i$ is the set of images belonging to the ith class of $Z_n$. The feature space is represented as $FS_n = \{FS_{C_1}, FS_{C_2}, FS_{C_3}, \ldots, FS_{C_9}\}$, where $FS_{C_i}$ is the set of feature vectors of all images belonging to the ith class of $Z_n$.
5. In the next step, the predicted class $C_q$ of the query image $I_q$ is used to reduce the size of the feature space $FS_n$.
6. Check the condition:
• if (query_class_label == prediction_value):
- The specific class cluster ($C_q$) is selected as the reduced feature space $FS_N$. This reduced feature space contains the deep feature vectors of only those images that belong to the predicted class, that is, the feature vectors of $\forall I_j \in C_q$, where $C_q \subset Z_n$. Thus, the reduced feature space is defined as $FS_N = FS_{C_q}$, where $q \in \{1, 2, 3, \ldots, 9\}$. As a result, $FS_N$ contains drastically fewer feature vectors than $FS_n$.
- Retrieve the top 20 most similar images from $FS_N$ using the Manhattan (city block) distance measure.
• else:
- No cluster is selected.
- Retrieve the top 20 most similar images from the full feature space $FS_n$ using the Manhattan (city block) distance measure.
7. As a result, the classification-based clustering drastically reduces the image search space according to the semantic nature of the clusters.
8. Retrieval time is further reduced by applying PCA to the reduced query vectors in the 9 classes.
9. This approach saves a little retrieval time and also returns more semantically similar images in the retrieved output.
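The cluster-restricted search in steps 4 to 6 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names (`manhattan`, `build_feature_space`, `retrieve`) and the toy two-dimensional feature vectors are hypothetical, and the fallback rule (search every cluster when the predicted class has no cluster) is a simplifying assumption standing in for the condition in step 6.

```python
from collections import defaultdict


def manhattan(u, v):
    """City block (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def build_feature_space(features, labels):
    """Group (image index, feature vector) pairs by class label.

    Mirrors FS_n = {FS_C1, ..., FS_C9}: one cluster of vectors per class.
    """
    space = defaultdict(list)
    for idx, (vec, label) in enumerate(zip(features, labels)):
        space[label].append((idx, vec))
    return dict(space)


def retrieve(query_vec, predicted_class, space, top_k=20):
    """Return indices of the top_k nearest images by L1 distance.

    Assumption: if the predicted class has a cluster, only that cluster
    (the reduced space FS_N) is searched; otherwise all clusters are searched.
    """
    if predicted_class in space:
        candidates = space[predicted_class]
    else:
        candidates = [item for cluster in space.values() for item in cluster]
    ranked = sorted(candidates, key=lambda item: manhattan(query_vec, item[1]))
    return [idx for idx, _ in ranked[:top_k]]


# Toy data (hypothetical): four images, two classes, 2-D features.
features = [[0, 0], [1, 1], [5, 5], [6, 5]]
labels = ["Bagh", "Bagh", "Batik", "Batik"]
space = build_feature_space(features, labels)
print(retrieve([0.9, 1.2], "Bagh", space, top_k=2))  # → [1, 0]
```

Because only one cluster is ranked, the number of distance computations per query drops from the size of the whole dataset to the size of a single class, which is where the reported time saving would come from.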

Approach
The outcome of this work is assessed on the TIAD dataset. Figure 6 compares the retrieval outputs of the proposed clustering and the previous method for a query image from the "Bagh" category. Human experts manually evaluated and annotated these images against the defined relevance criteria, and their annotations serve as our ground truth. The suggested clustering retrieves more semantically similar images than the previous approach. The retrieval efficiency for this specific query is 8/20 = 0.40 with the prior approach, whereas it is 1 with the clustering approach, as displayed in Fig. 6a,b. Table 6 shows the faster retrieval time. This work has been evaluated both with and without PCA. The retrieval time of an image is the time from feeding the query to the system until the retrieved images are returned. The process is repeated, treating every database image as a query image. We recorded the retrieval time for every image and took the mean to determine the mean image retrieval time. Precision for the NAVRATTAN clustering retrieval method is 95.18%, improved from the earlier method's precision of 92.46%. The decrease in image retrieval time, however, is significant: retrieval is almost 1.47 times faster. The fused model predicts similar images within the same classes, forming clusters of similar images. Therefore, we still obtain enhanced results even when searching within a smaller subset.
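The timing protocol described above (every database image used once as a query, then the mean taken) can be sketched as below. This is an illustrative harness only: `mean_retrieval_time` and `speedup` are hypothetical helper names, `retrieve_fn` stands for whatever retrieval routine is being measured, and the numbers in the usage lines are made up rather than taken from the paper's tables.

```python
import time


def mean_retrieval_time(retrieve_fn, queries):
    """Average wall-clock seconds per query.

    Each query is timed from submission until retrieve_fn returns.
    """
    if not queries:
        raise ValueError("at least one query is required")
    elapsed = []
    for query in queries:
        start = time.perf_counter()
        retrieve_fn(query)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


def speedup(baseline_seconds, proposed_seconds):
    """How many times faster the proposed method is than the baseline."""
    return baseline_seconds / proposed_seconds


# Illustrative use with a dummy retrieval function and made-up timings.
dummy_queries = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
avg = mean_retrieval_time(lambda q: sorted(q), dummy_queries)
print(f"mean retrieval time: {avg:.6f} s")
print(speedup(3.0, 2.0))  # → 1.5
```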

Relevance feedback using Manhattan City block distance measure
The relevance feedback (R.F.) concept originated in document information retrieval 68,69 . It has received much attention in the CBIR field in the past few years, e.g., Zhou and Huang 70 . A relevance feedback mechanism is an additional tool for narrowing the gap between user relevance and system relevance: it gives a clearer picture of the user's expectations and adjusts the internal system behavior to bridge the semantic gap. In our research, we used the Manhattan (city block) distance measure to rank the images displayed to a user. The user records feedback by marking interesting images as relevant, and the remaining images are automatically considered irrelevant. This process is carried out for several iterations and stops when the user is satisfied with the displayed results. The succinct pseudo code of this approach is as follows.
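One possible shape of such a feedback loop is sketched here in plain Python. The text does not state how the query is updated between iterations, so the update rule used below (moving the query toward the centroid of the images marked relevant, controlled by a hypothetical weight `beta`) is an assumption, as are all function names; the `is_relevant` callback simulates the user's marking.

```python
def manhattan(u, v):
    """City block (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def rank(query, features, top_k):
    """Indices of the top_k database vectors closest to the query."""
    order = sorted(range(len(features)),
                   key=lambda i: manhattan(query, features[i]))
    return order[:top_k]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def feedback_loop(query, features, is_relevant, top_k=20, iterations=4, beta=0.5):
    """Run relevance feedback and return the precision seen at each iteration.

    Assumed update rule: the query moves toward the centroid of the images
    the user marked relevant; unmarked images count as irrelevant.
    Assumes a non-empty feature database.
    """
    history = []
    for _ in range(iterations):
        shown = rank(query, features, top_k)
        marked = [i for i in shown if is_relevant(i)]
        history.append(len(marked) / len(shown))
        if not marked or len(marked) == len(shown):
            break  # nothing to learn from, or the user is satisfied
        centre = mean_vector([features[i] for i in marked])
        query = [(1 - beta) * q + beta * c for q, c in zip(query, centre)]
    return history


# Toy data (hypothetical): images 0-2 are relevant to the user, 3-5 are not.
features = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
history = feedback_loop([1.6, 1.6], features, lambda i: i < 3, top_k=3)
print(history)
```

In this toy run the first ranking contains one irrelevant image, and after a single query update all three displayed images are relevant, so the loop stops early.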

Approach
The CBIR system suggested by Maji and Bose 36 , which uses InceptionResNetV2 (IRV2) features, does not use relevance feedback. As a baseline, we applied IRV2 and IV3 features with relevance feedback to our TIAD dataset and two publicly available datasets. We then compared this baseline with our proposed NAVRATTAN clustering approach. Precision, also termed retrieval efficiency, is used to evaluate retrieval performance. Table 7 shows the steady increase in retrieval efficiency on the TIAD dataset over 4 R.F. iterations.
Table 8 details the steady increase in retrieval performance over 3 R.F. iterations on the two public databases listed above, using the baseline and the proposed methods with the Manhattan distance. Figure 7 shows that our recommended method outperforms the methods discussed in Maji and Bose 36 on the two publicly available datasets. The retrieval efficiency corresponding to the best retrieval performance for each database is highlighted in bold in the respective table.

Conclusion
Using CBIR for Indian traditional textile motifs is crucial for preserving and promoting the rich background and cultural significance of Indian art forms. CBIR allows for the precise analysis and identification of intricate and detailed motifs, which is essential in Indian art form design.
We have created an expanded dataset of Indian Traditional Art forms, and our proposed interactive CBIR approach is tested on this expanded dataset as well as on standard benchmark datasets. This study finds that using the last-layer features of a pre-trained fused model yields higher precision and recall for CBIR than traditional methods such as C.C.M. and wavelet. The similarity-based fine-grained saliency maps (SBFGSM) algorithm has been proposed to display the significance of fusion features compared to single-model features.
Additionally, we integrate several strategies to optimize CBIR retrieval efficiency and speed, among which relevance feedback plays an important role. It facilitates more effective retrieval scores by permitting the user to provide feedback. This transparency supports users by fostering user-friendly conditions for selecting relevant images in CBIR. With the help of Navrattan clustering, our interactive CBIR system has also been tested with a reduced image search space and reduced features. This technique reduces retrieval time and improves efficiency in Indian art form design. Our proposed methods have broader applicability: they can be employed with other datasets and models to examine similar issues using saliency maps for image similarity analysis.
We are working on retrieving the same images when the query image is presented at different orientation angles. Different datasets may have varying characteristics, such as image resolution and quality, and these variations could influence the effectiveness of the content-based image retrieval (CBIR) methods proposed in this study. In the future, we will introduce this feature in our interactive CBIR system.
Content-based image retrieval of Indian traditional textile motifs using deep feature fusion. Seema Varshney, Sarika Singh, C. Vasantha Lakshmi & C. Patvardhan
$$\text{Precision} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false positives}} = \frac{\text{Number of relevant images retrieved}}{\text{Number of retrieved images}} \tag{5}$$

Figure 2. CBIR image feature representation using deep feature fusion between pre-trained learning models.
$$\text{Recall} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false negatives}} = \frac{\text{Number of relevant images retrieved}}{\text{Total number of relevant images in the dataset}}$$
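As a worked illustration of the precision and recall definitions, the short sketch below computes both for a single query. The function name `precision_recall` and the image identifiers are hypothetical; the example uses 8 relevant images among 20 retrieved, as in the "Bagh" query discussed in the text, and assumes for illustration that the class contains 100 relevant images in total.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.

    retrieved: identifiers of the images returned by the system.
    relevant:  identifiers of all images in the dataset relevant to the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant images retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical identifiers: images 0..99 are relevant to the query.
relevant = range(100)
# 20 retrieved images, of which 8 (ids 0..7) are relevant.
retrieved = list(range(8)) + list(range(200, 212))
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # → 0.4 0.08
```

With a fixed scope of 20 retrieved images and 100 relevant images per class, recall cannot exceed 20/100 = 0.20 even at perfect precision.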

3. The final layer output of the trained fused model is used to extract the deep features of the dataset images $I_j \in Z_n$ and of the query image $I_q$. 4. The fused model is used to construct a feature space $FS_n$ of the image dataset $Z_n$. The image dataset contains n different images of 9 classes. It is represented as $Z_n = \{\{C_1\}, \{C_2\}, \{C_3\}, \ldots, \{C_9\}\}$, where $C_i$ represents the set of images belonging to the ith class of $Z_n$. The feature space $FS_n$ is represented as $FS_n = \{FS_{C_1}, FS_{C_2}, FS_{C_3}, \ldots, FS_{C_9}\}$.

Table 1. Comparison of the pre-trained models' performance for a scope value of 20 on the TIAD dataset, using Euclidean distance as the similarity measure.

Table 3.
In the TIAD dataset, the proposed fusion architecture achieves its highest precision, 95.00%, and its highest recall, 26.46%. However, the Chikankari class has the lowest precision, 89.12%, and the Batik class has the lowest recall, 15.12%. The performance of the Kalamkari class indicates that the proposed architecture is very effective at accurately identifying and retrieving images relevant to that class. The overall mean precision for this dataset is 92.46%, and the mean recall is 19.51%. Table 3b shows that in the Corel-1K dataset, the proposed fusion architecture achieves the highest precision, 100.00%, for the Bus, Dinosaurs, Elephant, and Horse classes, while the African People class has the lowest precision, 83.65%. The highest average recall is 20.84% for the Mountain class, and the lowest is 18.60% for the African People class. This dataset's overall mean precision and recall are 96.99% and 19.79%, respectively.

Table 5. Comparison of retrieval time with and without PCA on (a) TIAD and (b) Caltech-101 datasets. The smallest average retrieval time among the methods compared is in bold.

Table 7. Retrieval efficiency of the proposed relevance feedback approach on the TIAD dataset. The highest average retrieval efficiency among the methods compared is in bold.

Table 8. Retrieval efficiency of the proposed relevance feedback approach on the Corel-1K and Caltech-101 datasets. The highest average retrieval efficiency among the methods compared is in bold.
Figure 7. Comparison of CBIR performance between the recommended approach and other methods.