An effective AI model for automatically detecting arteriovenous fistula stenosis

In this study, a novel artificial intelligence (AI) model is proposed to detect stenosis in arteriovenous fistulas (AVFs) using inexpensive, non-invasive audio recordings. The proposed model combines two new input features, based on the short-time Fourier transform (STFT) and sample entropy, with two associated classification models (ResNet50 and ANN). The model's hyper-parameters were optimized through design of experiments (DOE). The proposed AI model demonstrates high performance, with all essential metrics (sensitivity, specificity, accuracy, precision, and F1-score) exceeding 0.90 when detecting stenosis greater than 50%. These promising results suggest that our approach can lead to new insights and knowledge in this field. Moreover, the robust performance of our model, combined with the affordability of the audio recording device, makes it a valuable tool for detecting AVF stenosis in home-care settings.


Literature review
Numerous research studies have endeavored to create an automated mechanism that can detect stenosis in AVFs through the use of non-invasive tools that capture vascular sound. The earliest works in this field date back to references [3][4][5][6]. More recent studies since 2014, comprising fourteen published papers on automated detection of AVF stenosis, are outlined in Table 1.
To ensure comprehensiveness, the present article is included in Table 1, appearing at the bottom of the table with the new suggestion highlighted in bold. In instances where the corresponding information is not explicitly presented in the cited paper, Table 1 represents this with a hyphen ("−"). Table 1 contains several columns, where C1 to C5 represent the paper number, referenced index, publication date, number of patients, and the total number of AVF-stenosis (S) and normal (N) cardiac cycles. The degree of AVF stenosis is categorized differently across studies: reference 16 categorized it into six types (T1–T6), while reference 17 classified the collected vascular sounds into five types (normal, hard, high, intermittent, and whistle), denoted V1–V5 respectively.
C6 and C7 of Table 1 represent the adopted filtering method and the input features used in each study, respectively. Traditional filtering (TF) and empirical mode decomposition (EMD) were the two filtering methods used. The input features were categorized into three types: image (1), texture (2), and both (3). Our study stands out from the fourteen publications listed in Table 1 in that we combine two types of input features, potentially leading to new insights and knowledge in this field.
C8 of Table 1 displays the data transformation methods used in the studies. The listed abbreviations include the S-transform, MEM (maximum entropy method) 20, PCA (principal component analysis) 21, FT (Fourier transform) 22, FFT (fast Fourier transform) 22, STFT (short-time Fourier transform) 22, WT (wavelet transform) 22, CWT (continuous wavelet transform), and LPC (linear predictive coding). It is important to note that the STFT used in this work was presented as an image, whereas in reference 9 it was presented in the form of a texture.
C10 of Table 1 displays the use of the DOE method. It is important to note that none of the listed fourteen publications mentioned using the DOE method to optimize the hyper-parameters of their adopted models. C11 to C15 represent the five performance measures adopted in this study, along with their associated standard errors (SE): sensitivity, specificity, accuracy, precision, and F1-score (F1-S), respectively. It is worth noting that the SE is utilized as a quality measure for assessing the accuracy of these performance estimates. A different technique for conveying the significance of digits in these estimated performance values is the leading digit rule (LDR), as explained in references 30,31. The information provided in Table 1 demonstrates that, with the exception of this article, none of the other fourteen references include SE in their reporting of these performance measures.
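As a point of reference, the SE of a proportion-type performance measure (sensitivity, specificity, etc.) can be computed under a binomial assumption. The sketch below uses that standard formula; the sensitivity value and sample size are illustrative only, not figures from Table 1.

```python
import math

def binomial_se(p, n):
    """Standard error of a proportion-type performance estimate
    (e.g. sensitivity), assuming a binomial model: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative values only: a sensitivity of 0.93 estimated from
# n = 200 stenotic cardiac cycles.
se = binomial_se(0.93, 200)
```

Reporting the estimate together with this SE (e.g. 0.93 ± 0.018) tells the reader how many digits of the estimate are actually meaningful.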
Although study 19 reported a high sensitivity of over 0.95 for venous outflow stenosis detection, the absence of a reported specificity and the potential selection bias (a single interventional radiologist performed interventions on all studied data) may weaken the reliability of the results.
The performance of the published works is influenced by the analysis methods used, as well as by the type of non-invasive audio recording employed. Study 18, for instance, utilized a pulse radar sensor, which differs from our approach using an inexpensive audio recording device. However, the purpose of this paper is to evaluate the effectiveness of the adopted analysis methods rather than to examine the impact of different non-invasive audio recordings.

Proposed methodology
The process of automatically classifying AVF stenosis typically involves three steps: (A) pre-processing, (B) feature extraction, and (C) classification using deep learning models. Pre-processing involves preparing the data for analysis. In Step (B), relevant features are identified and extracted from the cardiac cycles obtained in Step (A). These features can be divided into two categories: (1) visual features represented as images and (2) textual features represented as real numbers. Table 1 shows that most of the published research on this topic has focused on textual features, with only one study, 19, utilizing visual features. In Step (C), classification models matched to the two types of features were selected. The final binary results are designated as stenosis (S) or normal (N). Figure 4 illustrates the general three-step process in black, with the specific methods adopted in our study highlighted in red. We used Python version 3.6.10 in our analysis.

The proposed data pre-processing method consists of four tasks: normalization, EMD filtering, segmentation, and data augmentation. In detail:
• Normalization: This task adjusts the range and distribution of the data to lie within [0, 1]. The normalized data, denoted s'_i, is written as a function of the original data s_i as s'_i = (s_i − a)/(b − a), where a = min(s_i, i = 1, 2, ..., I), b = max(s_i, i = 1, 2, ..., I), and I = 22050 × 60.
• EMD filtering: The goal of this task is to remove noise. It uses the empirical mode decomposition (EMD) 32 technique to extract useful information from the data. The filtering process, whose output is denoted s^(EMD), includes two major steps: (1) decompose s into a finite number of intrinsic mode functions (IMFs, denoted c_i(j), j = 1, 2, 3, ...) and a residual; (2) remove unimportant IMFs. Our study shows that removing the first IMF provides the optimal performance.
• Segmentation: This task divides the data into smaller sections or segments for easier analysis. In this study, we extracted J = 10 cardiac cycles, x(j), j = 1, 2, ..., J, from each filtered signal.
• Data augmentation: This task enlarges the set of cardiac-cycle samples available for training.
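A minimal sketch of the normalization and segmentation tasks is given below. EMD filtering (dropping the first IMF) would sit between the two steps and is omitted here, since it requires a third-party EMD library; the equal-length split is likewise a stand-in for true cardiac-cycle extraction, which requires beat detection. The random signal is only a placeholder for a real recording.

```python
import numpy as np

def preprocess(s, n_cycles=10):
    """Normalization and segmentation tasks of the pre-processing
    pipeline (EMD filtering omitted; see lead-in)."""
    # Normalization: min-max scaling of the raw signal into [0, 1].
    a, b = s.min(), s.max()
    s_norm = (s - a) / (b - a)
    # Segmentation: split into J equal-length pieces as a stand-in for
    # cardiac-cycle extraction (the paper extracts true cardiac cycles).
    length = len(s_norm) // n_cycles
    return [s_norm[i * length:(i + 1) * length] for i in range(n_cycles)]

# A 60-second recording sampled at 22,050 Hz, so I = 22050 * 60 samples.
rng = np.random.default_rng(0)
s = rng.normal(size=22050 * 60)
cycles = preprocess(s)
```

After this step, each of the J = 10 segments can be fed independently into the feature-extraction stage.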

Proposed input features
Two input features are discussed: the short-time Fourier transform (STFT) and a modified version of sample entropy with optimized parameters (referred to as the "proposed sample entropy").

Short-time Fourier transform (STFT)
Before delving into the topic of the STFT, let us briefly visit its foundation, the Fourier transform (FT), which transforms a signal from the time domain x_i(j) to the frequency domain FT(k).
As a university teacher with 30 years of experience, I have noticed that although students can write the equation of the FT, they often struggle to understand its true purpose and insight in signal analysis. This is due to the complexity of the mathematical equations associated with the FT. The metaphor of "juice vs. recipe", illustrated in Fig. 5, can be useful in explaining the purpose and insight of the FT. Figure 5a represents juice vs. ingredients (kiwi and strawberry, 2 ounces of each), and Fig. 5b represents a time-domain signal vs. a frequency-domain signal. Just as it is simpler to understand the components of a juice by analyzing each ingredient separately rather than the blended juice as a whole, the same principle applies to signals: examining each frequency component individually is easier than trying to comprehend the signal in its time-domain form.

Now we focus on the STFT. The STFT is a sequence of FTs of a windowed signal, where the FT converts data from the time domain to the frequency domain. Specifically, the STFT first divides a longer time signal into shorter segments of equal length (the window width) and then computes the FT separately on each segment. The STFT with a window function w and a window width m is defined in Eq. (3),

STFT(t, k) = Σ_{j=0}^{m−1} x(t + j) w(j) e^{−i2πkj/m}.    (3)

The advantage of the STFT is that it allows time and frequency analysis in the same plot, in which the x-axis represents time and the y-axis represents frequency.
To provide readers with a quick view of the STFT, we define a signal X_t, t ∈ [0, 1.5], as shown in Eq. (4). The plots of X_t and its corresponding STFTs are depicted in Fig. 6a and b.
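Since Eq. (4) is not reproduced here, the sketch below uses a hypothetical two-tone signal (50 Hz for the first half of [0, 1.5] s, 120 Hz for the second half, sampled at an assumed 1 kHz) to show how an STFT resolves frequency content over time; the Hann window, window width, and hop size are likewise illustrative choices, not values from the paper.

```python
import numpy as np

fs = 1000                       # assumed sampling rate (Hz)
t = np.arange(0, 1.5, 1 / fs)
# Hypothetical two-tone stand-in for Eq. (4): 50 Hz, then 120 Hz.
x = np.where(t < 0.75, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))

m, hop = 256, 128               # window width and hop size (illustrative)
w = np.hanning(m)               # window function w

# STFT: FT of each windowed, equal-length segment of the signal.
frames = [x[i:i + m] * w for i in range(0, len(x) - m + 1, hop)]
S = np.abs(np.fft.rfft(frames, axis=1))   # rows: time, columns: frequency
freqs = np.fft.rfftfreq(m, 1 / fs)

first_peak = freqs[np.argmax(S[0])]   # dominant frequency, first frame
last_peak = freqs[np.argmax(S[-1])]   # dominant frequency, last frame
```

The magnitude matrix S is exactly what is rendered as the time-frequency image: the dominant bin of the first frame lies near 50 Hz and that of the last frame near 120 Hz, so the frequency switch is visible along the time axis.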

Proposed sample entropy with optimal parameters
Sample entropy (denoted S_E), which has not previously been investigated for detecting AVF stenosis, is defined in Eq. (5),
which gives S_E as the negative natural logarithm of the conditional probability B/A, i.e., the probability that two sequences matching for m points continue to match for m + 1 points. The parameters A and B, the three arguments (z, m, r), and the related notation used in Eq. (5) are defined below.
• z contains the time domain data x that satisfies the transformed frequencies f ∈ (F L , F H ) , where F L and F H are pre-determined lower and upper parameters for the frequency domain signal, respectively.
• z^(m)(i) = {z_i, z_{i+1}, ..., z_{i+m−1}} is a template vector of length m drawn from z.
• d[z^(m)(i), z^(m)(j)] = max(|g − h|), where g, h are coordinated elements of z^(m)(i) and z^(m)(j), respectively. That is, d[z^(m)(i), z^(m)(j)] is the greatest difference along any coordinate dimension (the Chebyshev distance).
• sd(z) is the standard deviation of all elements in z.
• A is the number of template-vector pairs having d[z^(m)(i), z^(m)(j)] < r · sd(z), i ≠ j, where r is a tolerance parameter.
• B is the number of template-vector pairs having d[z^(m+1)(i), z^(m+1)(j)] < r · sd(z), i ≠ j, where z^(m+1)(i) = {z_i, z_{i+1}, ..., z_{i+m}}.

The proposed sample entropy S_E, defined in Eq. (5), extends the traditional sample entropy, which measures the complexity of time-series signals such as vascular sounds. Large values of sample entropy indicate irregular signals, whereas smaller values characterize more regular signals. For example, consider two series: (a) 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 and (b) 1, 1, 2, 1, 2, 2, 2, 1, 1, 2. Both series have the same mean and variance but different sample entropies. Specifically, the sample entropy (m = 2) for series (a) and (b) is 0 and 1.386294, respectively, so series (a) has a more consistent pattern than (b).
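The toy series above can be checked with a minimal pure-Python sample-entropy implementation. It uses the common convention of drawing both the length-m and length-(m + 1) templates from the first n − m positions; r = 0.2 is an assumed tolerance (any r with 0 < r · sd < 1 gives the same result for these integer series).

```python
import math

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy S_E = -ln(B/A), where A counts matching template
    pairs of length m and B those of length m + 1. The tolerance is
    r times the population standard deviation of the series."""
    n = len(x)
    mean = sum(x) / n
    tol = r * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def matches(k):
        # Templates of length k taken from the first n - m positions,
        # compared with the Chebyshev (max-coordinate) distance.
        templates = [x[i:i + k] for i in range(n - m)]
        hits = 0
        for i in range(len(templates)):
            for j in range(len(templates)):
                if i != j and max(abs(g - h) for g, h in
                                  zip(templates[i], templates[j])) < tol:
                    hits += 1
        return hits

    return -math.log(matches(m + 1) / matches(m))
```

Running it on series (a) and (b) reproduces the values quoted in the text: 0 for the perfectly periodic series and ln 4 ≈ 1.386294 for the irregular one.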
In the following discussion, we explore the process through which we determined the optimal parameters for S_E, defined in Eq. (5). A comprehensive design of experiments (DOE) was carried out using four factors, each with varying levels.

Proposed AI model
This section proposes a concatenated AI model (combining ResNet50 and ANN) that incorporates optimal hyper-parameters determined through DOE. ResNet50 is the dominant component of the proposed model and is further discussed in the "Insight of ResNet50" section. The optimal hyper-parameters are discussed in the "Proposed AI model with optimal hyper-parameters" section.

Insight of ResNet50
ResNet50 33 is a type of deep learning model called a residual neural network, which consists of 50 convolutional layers. The innovation of ResNet lies in addressing the vanishing-gradients problem, which can make training deep networks difficult. ResNet addresses it through residual connections, which allow the gradients to flow more easily through the model during training.
• Notation ⊗: indicates the dimension of the kernel function. For example, "⊗3×3" indicates that the kernel dimension is 3×3, which has 9 parameters.
• Notation #: indicates the stride used in the convolution. For example, "#2" indicates that each convolution moves two steps at a time.
• Notation @: indicates the number of images (feature maps). For example, "@25" indicates that there are 25 images.
• Notation ⊕: indicates the element-wise addition of two pixels at the same location. For example, "5×5@25 ⊕ 5×5@25 = 5×5@25" indicates that the total number of images (25) remains the same, but the value of each pixel changes after ⊕.
The number of convolutional layers in ResNet50 can be determined using Eq. (6):

50 = 1 + 1 + [3 + (3 × 2)] + [3 + (3 × 3)] + [3 + (3 × 5)] + [3 + (3 × 2)],    (6)

where the first 1 on the right-hand side denotes the single convolutional layer used in CNR1 and the second 1 denotes the one layer counted for MaxPool. The remaining terms, [3 + (3 × 2)] = 9, [3 + (3 × 3)] = 12, [3 + (3 × 5)] = 18, and [3 + (3 × 2)] = 9, denote the convolutional layers used in CNR2 to CNR5, respectively.
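The tally of Eq. (6) can be checked in a few lines. The 3-4-6-3 block counts are the standard ResNet50 bottleneck configuration (each block holding 3 convolutions), and the second "1" follows the paper's convention of counting the MaxPool stage toward the total.

```python
# Tally Eq. (6): 1 conv in CNR1, 1 counted for the MaxPool stage, then
# the four bottleneck stages CNR2-CNR5 with 3, 4, 6 and 3 blocks of 3
# convolutions each, written as 3 + 3*(blocks - 1) to mirror the
# equation's bracketed terms.
blocks_per_stage = [3, 4, 6, 3]                            # CNR2..CNR5
stage_layers = [3 + 3 * (n - 1) for n in blocks_per_stage]  # 9, 12, 18, 9
total = 1 + 1 + sum(stage_layers)                           # = 50
```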

Proposed AI model with optimal hyper-parameters
The input-feature part of the proposed AI model incorporates the STFT and the proposed sample entropy. The classification part of the model is formed by combining ResNet50 and ANN into one structure. The optimal hyper-parameters for the proposed AI model are derived in terms of accuracy through a 2^(5−1) fractional-factorial design on five chosen factors, named A to E.
In Table 3, the p-values associated with the main factors (A to E) and the interaction BD are displayed in the rightmost column. The p-values for all of these factors are extremely small (reported as 0.000), implying that they are significant and play a crucial role in optimizing the performance of our proposed model.
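A 2^(5−1) design on factors A to E can be generated in a few lines. The defining relation E = ABCD used below is the standard resolution-V generator and is an assumption; the paper does not state which generator it adopted.

```python
from itertools import product

# Build the 16-run 2**(5-1) fractional-factorial design for factors
# A-E: enumerate the full 2**4 design on A-D, then set E = ABCD.
runs = []
for a, b, c, d in product((-1, 1), repeat=4):
    runs.append({'A': a, 'B': b, 'C': c, 'D': d, 'E': a * b * c * d})
```

This halves the experimental effort (16 runs instead of the 32 of a full 2^5 design) while keeping all main effects and two-factor interactions, such as BD, estimable.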

Ethical approval
All experimental protocols were approved by the IRB/ethics committee of Taiwan University Hospital Hsin-Chu Branch.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Figure 7 .
Figure 7. Visual representation of the ResNet50 framework.

Table 1 .
Literature Review (Two proposed AI Models are marked in bold at bottom).

Table 2 .
Top five combinations for S E .

Table 4 .
Top five combinations for the concatenated model.