The prototype device for non-invasive diagnosis of arteriovenous fistula condition using machine learning methods

Pattern recognition and automatic decision support methods provide significant advantages in the area of health protection. The aim of this work is to develop a low-cost tool for monitoring arteriovenous fistula (AVF) with the use of phono-angiography method. This article presents a developed and diagnostic device that implements classification algorithms to identify 38 patients with end stage renal disease, chronically hemodialysed using an AVF, at risk of vascular access stenosis. We report on the design, fabrication, and preliminary testing of a prototype device for non-invasive diagnosis which is very important for hemodialysed patients. The system includes three sub-modules: AVF signal acquisition, information processing and classification and a unit for presenting results. This is a non-invasive and inexpensive procedure for evaluating the sound pattern of bruit produced by AVF. With a special kind of head which has a greater sensitivity than conventional stethoscope, a sound signal from fistula was recorded. The proces of signal acquisition was performed by a dedicated software, written specifically for the purpose of our study. From the obtained phono-angiogram, 23 features were isolated for vectors used in a decision-making algorithm, including 6 features based on the waveform of time domain, and 17 features based on the frequency spectrum. Final definition of the feature vector composition was obtained by using several selection methods: the feature-class correlation, forward search, Principal Component Analysis and Joined-Pairs method. The supervised machine learning technique was then applied to develop the best classification model.


Material and methods
Materials. The research material was collected from 38 patients on dialysis at the Clinical Dialysis Centre of the St. Queen Jadwiga Provincial Hospital No. 2 in Rzeszow. The study was approved by the local Bioethical Committee of the Regional Medical Chamber in Rzeszow (No. 17/B/2016). The data collection and all experiments were performed in accordance with the relevant guidelines and regulations. All patients included in the study have been informed about its course and purpose, and they gave written informed consent to participate in the study. Selected clinical and demographic data of the study group are presented in Table 1.
The registration process was carried out just before subjecting patients to dialysis, which allowed to ensure a high degree of repeatability of the physiological state of patients, in particular the level of hydration of the body affecting the blood density. A single room, isolated from other rooms of the ward by a double door, ensured separation from external acoustic interference. The sampling rate was set at 8 kS/s with a resolution of 16 bit/ sample. In total, while recording in several series, 156 recordings were obtained, from which 2670 feature vectors were extracted. Patients were divided due to their physiological state, based on the opinion of medical staff, ranked from the best to the worst.
All cases were therefore divided into six sub-classes from 'patent' (A) to 'failed' (F), proposed by the medical staff of dialyze centre, as 6 recognizable and successive states of deteriorating of AVF functions. The best cases were labelled A, while the worst cases were labelled F. The assignment of individual patients to specific classes was made based on the the results of ultrasound imaging tests. The number of patients assigned to each class and the number of feature vectors obtained from each patient is presented in Table 2.
Classes were determined on the basis of mathematical analysis of the location of samples in the space of cases. Each class is a cluster of closely located samples. The order of the classes was determined on the basis of the mutual location of the classes in the case space, i.e. the distance of classes A and B is smaller than the distance Table 1. Selected clinical and demographic data of the study group.

Variable
Mean ± SD (min-max) www.nature.com/scientificreports/ of classes A and C, which in turn is smaller than A and D, etc. The order of classes determined in this way was confirmed by the assessment of the clinical state of the fistulas by medical staff. The number of six classes was the largest for which total compliance of the mathematical model with clinical assessment was obtained. The Fig. 1 shows sample images of an ultrasound of fistulas in progressive pathologization. The technique used to determine the flow parameters based on the Doppler effect was used. The upper part of the image shows a map of colours velocity and the direction of blood flow through the vessel fistula marked with selected measuring points. The lower part shows a time diagram of instantaneous values of the speed and direction of the flow measured at the point marked with a marker. The quality of the AVF was assessed by Dialysis Center doctors on the basis of a Doppler ultrasound image, the quality of blood supply from the AVF to the dialyzer during the hemodialysis and degree of excretion from urea during the hemodialysis.

Features extraction.
A 30-second signal, containing not less than 20 heart cycles, was recorded each time.
The recorded signal was divided into fragments corresponding to individual heart cycles (Fig. 2). The division www.nature.com/scientificreports/ points were indicated by the local envelope minima, and each fragment obtained in this way was used to create one vector of features. Frequency domain analysis was performed using Fast Fourier Transform (FFT). Because the input signal was sampled at 8 kS/s, an FFT of 8192 points was used, resulting in a spectrum resolution of 1Hz. The length of a single segment signal depends on the pulse. While analysis using FFT requires the packets of predetermined length, the packets longer than 8192 points were cut into 8192 points and shorter packets supplemented to the required length with zeros.
In the course of research, it was found that the type of window used did not affect the final classification results, therefore a rectangular window was used. For further analysis, only the FFT module was standardized according to the formula: while the values of the features were calculated according to the formula: where T f -features value, f dt , f gt -lower and upper third border frequency f.

Features selection.
The final set of diagnostic features to be implemented was selected based on an indepth analysis of classification quality indicators and the assignment of the examined vectors to individual classes. Preliminary estimations of the classification value of each feature were made using the following selection methods: feature-class Correlation, Forward Search, PCA and Joined-Pairs. The last method was used to assess the classification quality of pairs of features and was created in the course of work on developing a classification system for arteriovenous fistula 17 .
Selection methods from the wrapper group (Forward Search and Joined-Pairs) require the use in the ranking process of the classifier which will be ultimately used in the system under construction. The order of the features in the ranking is determined by the quality of the classification achieved using subsets of the features considered. Assessment of classification quality, in turn, requires the division of the data set into the teaching and test part. Typically, n-fold cross-validation is used in such cases. In our case 10-fold validation was used for feature ranking. Very high indices of quality classification (accuracy up to 0.95) were achieved. However, in the process of verifying results using feature vectors from new patients whose samples were not part of the original data set, significant deviations in the operation of the system were obtained. It was found that the classification system learned the individual characteristics of patients rather than the generalized state of their AVFs. For this reason, modifications were made to the method of splitting the data set into training and test subsets, and leave-one-out cross-validation was used-a subset of one patient was excluded as a test set in each iterative course of validation. This allowed the study to avoid undesirable impactsfrom the tested and training set correlations, which could overestimate the classification quality indicators. Examples of feature rankings for the selection methods used are presented in Table 3. The names of T f refer to features calculated using formula (2).
The Table 3 contains the ranking of features obtainedusing the Correlation method and the Forward search and Joined-Pairs methods in two variants-using 10-fold cross-validation and leave-one-out cross-validation for the k Nearest Neighbours (k-NN) classifier.
The selection of the final set of features was based on the assessment of the classification ability of the subsets that were built by successively adding features in the order of ranking. In this way, each subset was checked for each ranking and for each of the classifiers used. Figure 3 shows the value of the average F-score indicator as a function of the number of features in the set for subsets defined by the Correlation method. Thus, the subsequent points of the graph from the left illustrate the value of F-score for the sets of features {T 400 , T 500 } , {T 400 , T 500 , T 100 } , {T 400 , T 500 , T 100 , T 630 } etc. It is worth noting that the first 8 features indicated by the Correlation method coincide with the first eight features indicated by the Forward Search method, though in a different order (Table 3). In addition, a subset of 8 features maximizes the average value of the F-score at around 0.76. This is the highest value achieved when using leave-one-out cross-validation.
Analysis of the selection results showed features describing the energy content in two frequency bands: 60-140 Hz and 280-700 Hz. With reference to specific features calculated on the basis of thirds, it indicated Classification system. The selection of the classification algorithm was based on empirical studies of the classification quality, obtained for k-NN, Support Vector Machine (SVM) and Random Forest algorithm classifiers 18 . The work parameters of classifiers were selected in the iterative process of searching for optimal values 19 . The best quality indicators for each classifier were calculated for each class separately based on confusion matrixes. and presented in the Tables 4, 5 and 6. The values in the confusion matrixes were expressed as a percentage.
The data presented in the tables show that the best quality parameters were obtained using the k-NN algorithm with the Manhattan metric and distance-weighted voting, with k = 7 . The highest quality indicators were achieved both for the classifier in general and for each class separately. For this reason, the k-NN classifier has been selected for the implementation in the target environment.

Implementation
System concept. The studies mentioned above have shown that the phono-angiogram taken from AVF can be used to evaluate occluded blood vessels. Many patients who would be at-risk for stenosis progression definitely need an easy to operate tool for checking the AVF condition. For this purpose, we have created a system called 'NefDiag' for detecting vascular access stenosis by means of a simple phono-angiography procedure implemented on a microprocessor device and equipped with a dedicated application for predictive stenosis progression with 80% accuracy. The idea was summarized in the Fig. 4. Hardware. The basis for choosing the hardware platform was the possibility to run solutions developed during the research process and the implementation of the program layer indicated in the assumptions, as well as the provision of mobility and the ergonomics of use by the potential user. All these requirements were met  www.nature.com/scientificreports/ by a microcomputer designed for embedded applications from the Raspberry Pi series. A version III based on a 64-bit 4-core ARM processor equipped with a floating-point unit was chosen. The block diagram of the hardware layer is shown in Fig. 5.
To build the head recording the acoustic signal from the arteriovenous fistula, an external USB sound card and integrated mic preamplifier were used. The card is based on the C-media CM-108 system and the preamplifier is max9841. As the GUI basis, a 7" LCD display with a touch panel was used.  Table 5. Confusion matrix and classification quality indicators-SVM classifier.  Table 6. Confusion matrix and classification quality indicators-Random Forest classifier. www.nature.com/scientificreports/ Software. The choice of technology for the implementation of the programming layer of the designed system was performed in two stages. First, the ease of use of the operating system and its type were considered. Next, the technology for the implementation of algorithms of data processing and user interaction were selected. It was initially assumed that the diagnostic system would be created without the support of the operating system. However, it was found that building the application directly on the hardware layer will shift the centre of gravity of the implementation to software-hardware communication and will build the middle layer between the target application and the microprocessor and its peripherals. Available hardware platforms can run operating systems that provide hardware support tools through convenient interfaces, allowing developers to focus on constructive coding. The possibilities of using Linux, Windows and RTOS were considered. RTOS was rejected due to the lack of hardware layer drivers and the need to re-implement any developed processing algorithms and data classification. Windows, despite very convenient development tools, does not allow easy connection and transfer of data between modules that use different programming technologies, in particular different languages. In addition, it is a commercial licensed system, which leads to financial consequences. As a result, Linux was chosen, primarily due to being well known by members of the research team, and because of a number of possible programming techniques that are impossible or difficult to obtain via other systems. The selected operating system indicated a way to implement the diagnostic application. Since the entire research process was carried out using tools working under the control of the Linux operating system, it was possible to reuse them, this time in the target system. Therefore, a modular system structure was adopted based on the master module responsible for data flow and control, and executive modules responsible for individual elementary operations, not necessarily made using the same techniques and programming languages as the main module (Fig. 6).
The master module, responsible for the control flow, data distribution and performing the role of the user interface, was written in the C++ language using the Qt library. Modules performing audio acquisition and data analysis tasks used ready-made solutions available in the form of programming libraries or computing environments. The whole was connected through glue logic scripts.
Audio acquisition was done using the ALSA (Advanced Linux Sound Architecture) library that provides communication with the hardware layer, data buffering, and error handling. Due to the hardware limitations of the sound card used (44 100 or 48 000 S/s sampling rate), it was necessary to decimate the signal to the required sampling rate of 8 kHz.
The signal analysis module was designed in the form of a script executed under the control of the Octave package. It successively divides into fragments corresponding to the heart rhythm and extracts diagnostic features based on the FFT. The next step is classification using the WEKA package and visualization the results on the LCD.
The diagnosis is presented in the form of a bar chart on the LCD display.
The diagnostic device. The user interface consists of four sections (Fig. 7): • graph of the time history of the recorded sound signal, • control panel with buttons,  The time chart makes it possible to observe the graphical representation of the signal during recording, which allows the operator to choose where the head should be applied to the patient's body in such a way that the signal is of the best possible quality with a clear heart rhythm and without any distortion.
In the control panel, one of the five operating options for the device can be chosen: Monitor continuous signal registration with visualization in the field of the time graph, enabling selection of the optimal head application site for the fistula, Record registering a 30-s recording for analysis, Analyze starting the process of analysing the recorded signal, Clear flush buffers and reset user interface before performing next test, Stop stopping the Monitor or the Record function.
The test result is presented in the form of a bar chart. The height of each bar represents the percentage probability of assigning the test result to one of the six classes describing the state of the fistula. The device was enclosed in a dedicated cover made with rapid prototyping with the use of a 3D printer (Fig. 8). The part responsible for A/C processing (including the microphone, preamplifier and sound card) has been made as a  www.nature.com/scientificreports/ separate mechanism and connected to the device via a USB cable. Preliminary tests with patients have confirmed the correct operation of the device. Figure 9 presents examples of device indications in response to signals from fistulas in various states. The highest bar of the graph indicates the class in which the fistula was classified. The lateral, lower bars are the result of noise and interference in the signal causing erroneous recognition of some samples and assigning them to neighboring classes. The cases whose results are depicted in Fig. 9 show the classification results for the cases whose ultrasound images are presented in Fig. 1.

Discussion
In our study, we have described an AVF condition assessment system, based on the results of acoustic signal measurements acquired with an electronic head specifically designed for this task. The proposed low-cost device was created using a machine-learning approach. The general concept was to recognize patients with significant stenosis based on patterns developed and assign them to appropriate classes. This activity was performed in a classical way, as a sequence of successive operations: feature extraction, selection and reduction of features, classification, and interpretation of the classification results. As a final achievement we have constructed a prototype of microprocessor device with the intelligent assessment system which based on the six-step patient classification (letters A-F, from the best condition to the worst).
Based on available literature, up to date, the need for fully automated devices offering the AVF condition assessment has not been satisfied. Both patients and doctors need an easy-to-use and reliable support system in assessing the condition of the fistula. To the best of the authors' knowledge, there is no such tool yet. The proposed device based on machine learning technique can be such support.
Due to the limited number of collected cases forming the classification system, diagnostic indications have been discretized in the form of six classes. The sequence of disease states, from 'patent' (A) to 'failed' (F), was determined on the basis of mathematical analysis of the location of samples in the space of cases and confirmed  www.nature.com/scientificreports/ by the medical staff of dialyze cente, as 6 recognizable and successive states of deteriorating of AVF functions. The number of six classes was the largest for which an unambiguously consistent assessment of the AVF status by medical personnel was obtained. Experiment results show that the acustic signal from a single AVF is classified in the correct, expected class with very good accuracy of 81%. Furthermore, it has been observed that classification by these methods correlates with the results of the classic diagnostic tests used to evaluate AVF functioning, such as Doppler USG. An important result of this study is the confirmation of the possibility of obtaining a quick, non-invasive and reliable diagnosis of the AVF condition, as well as the solution of technological problems toward a fully automated assessment. Finally, it can be stated that the low-cost device created can help doctors and patients with renal disease to assess/predict stenosis progression due to cell growth by providing an easy to use and high accuracy tool.
The strength of this research is finding a trade-off between how much information we can extract from a single measurement vs. how reliable that information is. The 6th grade, medically justified scale used (see Fig. 1) seems to be accurate enough for diagnostic purposes, while giving a high probability of a correct, non-subjective assessment of the state of the AVF. The limitations of our study are a small number of patients studied, the use of AVF division into functional classes based on the experience of dialysis station staff.
The performance of the described method can be greatly degraded in noisy environments. By only using spectral information it can be hard sometimes to directly distinguish the dynamic characteristics of bruit directly from a noisy phono-angiography. However, the doctors with experience can recognize the stenosis sound pattern even in the noisy environments. Therefore, we are going to expand the device menu and add some further options with possibility to assess AVF condition with the use of different methods, like wavelet transform 20,21 or others proposed in the literature for detecting hemodynamically significant vascular access stenosis 9,22,23 .

Conclusion
The acquisition of a large amount of data and the choice of an effective classifier enable automation of decisionmaking processes in medical diagnostics. In this paper, we have described an AVF condition assessment system, based on the results of acoustic signal measurements acquired with an electronic head specifically designed for this task. The proposed low-cost device was created using a machine-learning approach. The general concept was to recognize patients with significant stenosis based on patterns developed and assign them to appropriate classes. This activity was performed in a classical way, as a sequence of successive operations: feature extraction, selection and reduction of features, classification, and interpretation of the classification results. Taking into consideration that the proper selection of a set of features can significantly improve the quality of the classification process, we compared several methods of feature selection. We also compared the quality of classifications using SVM, k-NN and Random Forest classifiers and demonstrated the possibility and relevance of the multi-class approach. All operations mentioned above were performed on the recorded acoustic signals obtained from 38 patients on chronic hemodialysis, without any adjustments for the prior medical conditions, age, gender, etc.
The key element of this process was the classification that transforms the vector of features describing a given sample into a value representing one of the possible object classes. It is worth emphasizing that the sequence of disease states (from 'patent' (A) to 'failed' (F)) was proposed by the medical staff of dialysis centre as 6 recognizable and successive states of deteriorating of AVF functions.
Finally, it can be stated that the low-cost device created can help doctors and patients with renal disease to predict AVF fistula stenosis progression by providing an easy to use and high accuracy tool. Up to date, the need for fully automated devices offering the AVF condition assessment has not been satisfied.