Acoustic Sensing as a Novel Wearable Approach for Cardiac Monitoring at the Wrist

This paper introduces the concept of using acoustic sensing over the radial artery to extract cardiac parameters for continuous vital sign monitoring. It proposes a novel measurement principle that allows detection of the heart sounds together with the pulse wave, an attribute not possible with existing photoplethysmography (PPG)-based methods for monitoring at the wrist. The validity of the proposed principle is demonstrated using a new miniature, battery-operated wearable device to sense the acoustic signals and a novel algorithm to extract the heart rate from these signals. The algorithm utilizes the power spectral analysis of the acoustic pulse signal to detect the S1 sounds and additionally, the K-means method to remove motion artifacts for an accurate heartbeat detection. It has been validated on a dataset consisting of 12 subjects with a data length of 6 hours. The results demonstrate an accuracy of 98.78%, mean absolute error of 0.28 bpm, limits of agreement between −1.68 and 1.69 bpm, and a correlation coefficient of 0.998 with reference to a state-of-the-art PPG-based commercial device. The results in this proof of concept study demonstrate the potential of this new sensing modality to be used as an alternative, or to complement existing methods, for continuous monitoring of heart rate at the wrist.


Results
Acoustic signal characteristics. Periodic contractions and expansions of the heart muscles generate pressure waves that flow through the arterial system, in the systolic and diastolic phases respectively. It is these pressure waves that cause the pulse in the arterial system. In the systolic phase, the flow of blood in the vessel expands the arterial diameter (vasodilatation), whereas a reduction in the arterial diameter (vasoconstriction) is observed in the diastolic phase. These periodic changes in the arterial diameter are transferred through a thin layer of soft tissues and muscles to produce vibrations at the surface of the skin, which can be sensed to understand the dynamics of the heart and of the blood vessel wall itself 21. The radial artery is an ideal site for pulse assessment because its vascular properties are less affected by ageing and blood pressure than those of other arteries in the central region 22. An example of a pulse waveform recorded by placing a microphone on the radial artery is shown in Fig. 1a(I). The signal mainly consists of two peaks with some intermediate ripples. These ripples are caused, among other factors, by noise of the measuring electronics, electromagnetic interference, and environmental noise. In order to characterize this acoustic signal, the PPG waveform was simultaneously recorded by placing a pulse oximeter 23 on the index finger. As anticipated, a slight time delay between the onset of the pulse at the radial artery and at the index finger was observed. This time delay is a function of the pulse wave velocity and the arterial length. The time delay was empirically found to be nearly constant over the length of the recordings. The acoustic and PPG pulse waveforms were synchronized by removing the time delay so that the nearest peaks overlap, as shown in Fig. 1a(II). It can be observed that the systolic and the diastolic peaks of the PPG signal and the acoustic signal are temporally correlated.
Therefore, to align with heart sound terminology, we term the two peaks in the acoustic signal the S1 and S2 sounds respectively. The frequency response of the acoustic signal, sampled at 2100 Hz, was obtained using the fast Fourier transform (FFT). It can be observed that the frequency content of the signal in Fig. 1a(IV) mainly lies below 25 Hz, whereas the bandwidth of the heart sounds for a normal subject lies between 20 and 150 Hz 3. This is because the high frequencies are attenuated as the pulse wave travels from the source of the sounds (i.e. the heart) to the measurement site (i.e. the radial artery) 24. In order to understand the power distribution among the different components of the signal, joint time-frequency analysis using the short-time Fourier transform (STFT) was performed. The STFT of the signal was obtained using a Blackman window of 256 samples and 50% overlap between consecutive frames. The Blackman window was chosen because it allows a steeper roll-off around the boundaries. The power density of the time-frequency grids in Fig. 1a(III) demonstrates, as expected, that the signal power is mainly concentrated in the S1 and S2 sounds, with the S2 sounds carrying relatively lower energy. However, the signal power and its signal-to-noise ratio (SNR) also depend on the sensor location at the wrist.
To determine the optimal auscultation site on the radial artery, the pulse was located at three distinct positions: distal, middle and proximal. The middle position can be easily located in front of the radial styloid process (the protruding bone near the wrist crease) 21. The proximal and distal positions are 1-2 cm on either side of the middle position, towards the elbow and the wrist crease respectively, as shown in Fig. 1b. Nine acoustic recordings per subject, three from each location, were collected from a total of 10 subjects to analyze the power spectrum at the different auscultation sites. Note that, although these recordings would be affected by the characteristics of the recording setup, the environmental noise and motion artifacts were kept to a minimum in the experiment. The PSDs of the three recordings for every location and every subject were averaged to compare the SNR at the different auscultation sites. For illustration, the PSD of the signal obtained by completely blocking the microphone port is also plotted in Fig. 1c. The latter is an indication of the noise inherent to the sensing system itself in the absence of any other sounds. A close correlation between the power spectra of the signals at the different locations can be observed. The anatomy of the radial artery suggests that the vessel depth at the middle position is relatively lower than at the other two sites 25. Therefore, vasoconstriction and vasodilatation produce skin surface vibrations with higher amplitudes at the middle location, due to lower attenuation by the surrounding tissues and muscles. This, in turn, results in a higher SNR. The same reasoning can be followed to compare the PSDs of the distal and proximal positions. Due to the ease of locating the middle position, and the insignificant difference between the PSDs, the remainder of the study recorded the acoustic signal with the microphone port facing the middle position of the radial artery.
HR algorithm's performance analysis. In addition to proving the feasibility of obtaining the cardiac signal from the wrist, this study also investigated the possibility of automatically extracting the most fundamental biomarker, namely HR, from the acoustic signal. Since this is the first time such a signal has been sensed by means of wearable acoustic sensing, a novel algorithm had to be created for this purpose. In order to assess the performance of the proposed method, the algorithm results were compared with other state-of-the-art PPG-based devices, for a total of 12 subjects. The ground truth HR values (HR-PPG) were obtained using the FDA-approved and clinically used SOMNOscreen system 23. The novel algorithm, specifically designed to process the sensed acoustic pulse signal (APS), provided the estimated HR values (HR-APS). As an illustration, the estimated and ground truth HR values corresponding to 6 recordings, each of 5 minutes duration, for one of the subjects are plotted together with upper and lower bounds of ±5% with respect to HR-PPG in Fig. 2a(I). The first computed performance metric, shown in Fig. 2a(II), was the Bland-Altman plot 26. This served to compare the difference between the estimated and ground truth HR values with respect to their corresponding mean. The circled data points in Fig. 2a(II) indicate the HR differences at different HR averages, and their diameter corresponds to the number of points coinciding at the same location. The bias μ was calculated by averaging all the HR differences, whereas the limits of agreement (LOA) were obtained by computing $\mu \pm (2 \times \sigma)$, where σ is the standard deviation of the HR differences. The bias for this comparison was found to be nearly zero, and the LOA indicated a variation of less than 1 bpm for more than 95% of the data points.
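The Bland-Altman quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `bland_altman` and the toy readings are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def bland_altman(hr_est, hr_ref):
    """Return (bias, lower LOA, upper LOA) in bpm for paired HR readings."""
    if len(hr_est) != len(hr_ref) or len(hr_est) < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    diffs = [e - r for e, r in zip(hr_est, hr_ref)]
    bias = mean(diffs)        # mu: average of the HR differences
    sigma = stdev(diffs)      # assumption: sample standard deviation
    return bias, bias - 2 * sigma, bias + 2 * sigma

# Hypothetical toy readings (bpm), for illustration only.
hr_aps = [61.0, 63.0, 59.0, 64.0]
hr_ppg = [60.0, 62.0, 60.0, 65.0]
bias, loa_low, loa_high = bland_altman(hr_aps, hr_ppg)
```

Each point of the Bland-Altman plot would then be the pair (mean of the two readings, their difference), with horizontal lines drawn at the bias and at the two limits.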
As a second performance metric, the line of best fit between the estimated and ground truth HR values was also determined, and the degree of similarity was quantified using the Pearson correlation. The R² and root-mean-square error (RMSE) values measure how well the line fits the data: a higher value of R² and a lower value of RMSE represent a better fit. For the scatter plot in Fig. 2a(III), the fitted line y = 0.9958x + 0.2512 was obtained, where x indicates the ground truth HR value and y indicates the associated estimate. The Pearson correlation was found to be 0.996, with corresponding R² and RMSE values of 0.992 and 0.397 respectively.
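As an illustration of how such fit statistics are typically obtained, the sketch below computes an ordinary least-squares line, the Pearson correlation, R² and RMSE for paired HR values. The function name `fit_and_correlate` is hypothetical, and normalising the RMSE by the number of points (rather than by the residual degrees of freedom) is an assumption. For a simple linear fit R² equals the squared Pearson correlation, which is consistent with the reported 0.996 and 0.992.

```python
from math import sqrt

def fit_and_correlate(x, y):
    """Least-squares line y = slope*x + intercept plus Pearson r, R^2 and RMSE."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)                      # Pearson correlation
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    rmse = sqrt(sum(e * e for e in residuals) / n)  # assumption: divide by n
    return slope, intercept, r, r * r, rmse
```

Here `x` would hold the ground truth HR values and `y` the estimates, matching the roles of x and y in the fitted line quoted in the text.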
A similar analysis was repeated for the complete dataset of 12 subjects, where a total of 6 recordings, each of 5 minutes duration, were collected from every subject. The Bland-Altman comparison and the line of best fit thus obtained are plotted in Fig. 2b. A near zero bias and LOA of [−1.68, 1.69] bpm suggest a narrow difference between the estimated and ground truth HR values over the whole database. The Pearson correlation was 0.998. A further evaluation of the proposed method was obtained by computing the mean absolute error (MAE) and the mean absolute error percentage (MAEP), as defined in Eqs. (1) and (2) respectively:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\mathrm{HR}_{est}(i) - \mathrm{HR}_{true}(i)\right| \tag{1}$$
$$\mathrm{MAEP} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\mathrm{HR}_{est}(i) - \mathrm{HR}_{true}(i)\right|}{\mathrm{HR}_{true}(i)} \times 100 \tag{2}$$
where $\mathrm{HR}_{est}(i)$ is the estimated HR from the acoustic pulse signal and $\mathrm{HR}_{true}(i)$ is the ground truth HR from the SOMNOscreen monitor at the i-th index in a total of N values. MAE as an evaluation index provides an estimate of the deviation across the whole dataset, whereas MAEP indicates the percentage of error in the HR estimation. Along with these performance metrics, the standard deviation (σ) and Pearson correlation (PC) were also determined to understand the degree of agreement between the corresponding HR outputs. The accuracy of the method was evaluated by calculating the percentage of HR values obtained from the acoustic pulse signal that lie within ±5% of the SOMNOscreen output. Table 1 lists the performance metrics of the proposed method for all 12 subjects. An overall accuracy of 98.78%, with a mean absolute error and a standard deviation of 0.28 and 0.86 bpm respectively, was obtained. Figure 3 plots the HR variations in individual subjects, including the standard deviation (HR-STD), minimum (HR-MIN), mean (HR-MEAN), maximum (HR-MAX) and root-mean-square (HR-RMS) of the corresponding HR range. The HR in the complete dataset varied from 42 to 121 bpm.
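The MAE, MAEP and ±5% accuracy described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `hr_error_metrics` is hypothetical, and treating a value exactly on the ±5% boundary as within tolerance is an assumption, as the text does not say how boundary cases were counted.

```python
def hr_error_metrics(hr_est, hr_true, tolerance=0.05):
    """Return (MAE in bpm, MAEP in %, accuracy in %) for paired HR values."""
    n = len(hr_true)
    if n == 0 or n != len(hr_est):
        raise ValueError("need two equally long, non-empty series")
    abs_err = [abs(e - t) for e, t in zip(hr_est, hr_true)]
    mae = sum(abs_err) / n
    maep = 100 * sum(a / t for a, t in zip(abs_err, hr_true)) / n
    # Accuracy: share of estimates lying within +/- tolerance of the reference.
    within = sum(1 for a, t in zip(abs_err, hr_true) if a <= tolerance * t)
    return mae, maep, 100 * within / n
```

For example, with hypothetical estimates `[60, 66, 80, 100]` against references `[60, 60, 80, 100]`, one of the four estimates is off by 10% and falls outside the tolerance band, so the accuracy is 75%.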
The proposed method was also tested using acoustic signals recorded in a noisy environment. Signals of 5 minutes duration were collected from 5 subjects. During the experiment, the subjects were asked to read a page of text while loud music was played in the background. The results in Table 2 indicate that the effect of environmental noise on the acoustic pulse recordings for the HR determination is insignificant. Table 3 compares the results of the proposed method with other studies which analyzed the accuracy and reliability of different state-of-the-art PPG-based wrist devices on the commercial market by comparing them with the synchronous ECG signal. Although these devices were tested under different experimental conditions, such as sitting at rest, walking, and running at different speeds and slopes, Table 3 compares only the results corresponding to the data recorded at the rest position, to provide an indicative comparison with the proposed method. Note that, to the best of the authors' knowledge, there is no database or other study which has published results on HR monitoring using an acoustic pulse signal, and therefore a direct comparison could not be established. Also, the devices in these studies were tested on different numbers of subjects, but the total data length was quite similar to that of this study. The table follows the same abbreviations for the comparison parameters as used in the literature. The mean error (ME) and standard deviation (SD) of the HR differences have the same definitions as μ and σ respectively. These parameters take values of 0.01 bpm and 0.86 bpm for the proposed method and are significantly lower than those of the other devices. The MAE and MAEP in this work are found to be 0.28 bpm and 0.39%, and demonstrate better performance in comparison to the devices analyzed by Stahl et al. 8 and Parak et al. 27.
A higher PC of 0.99, compared with 0.96 for the Basis Peak and 0.83 for the Fitbit Charge HR as studied by Jo et al. 10, also indicates a higher agreement between the estimated and ground truth HR for the proposed method. The standard error (SE) of the mean measures the deviation in the mean HR of all the subjects and attains a higher value of 4.55 bpm in this study. This is mainly because the SE is inversely proportional to the square root of the sample size 28. Since the other studies tested a larger number of subjects, the inverse proportionality results in a lower estimate of the SE. The comparison over these parameters shows that, considering that PPG is a widely accepted technique, the proposed method utilizing acoustic sensing can provide accurate results for HR monitoring at the wrist under equivalent conditions.

Discussion
The feasibility of acoustic sensing of the radial pulse using a wearable device has been investigated in this paper. While ECG has always been used as the gold standard method to record cardiac signals from the chest, measuring it continuously with a wearable device presents many limitations, ranging from reliability to usability. An alternative to ECG, which improves on the usability aspects, is to use PPG-based devices instead. This approach is very popular because it allows monitoring with the sensor attached to the wrist. But methods based on wrist PPG are not free of limitations either. The requirement of an active input signal limits the size of the system and/or the battery lifetime. In addition, the systems are very sensitive to motion and other artefacts. Hence, an alternative lower-power sensing approach would be desirable, either to complement PPG to increase the sensing accuracy or to replace it altogether, depending on the clinical target. The passive sensing mechanism of state-of-the-art acoustic sensors (MEMS microphones) imposes significantly fewer constraints in terms of power, making it more suitable from the size and maintenance perspective for a wearable device.
Table 3. Performance comparison of the proposed method with results obtained from different PPG-based wrist devices used in the commercial market. The table only compares the results of the data collected at the rest position and provides an illustrative comparison because the experimental conditions varied between different works. + The data length is for all the subjects combined together. *SD was calculated from the results of 95% equivalence testing given in this paper. † The results provided in the paper were obtained by averaging the data to 5 seconds epochs.
In this work, the optimal auscultation site on the radial artery was also studied, since this is a factor to consider when comparing the ease of sensor attachment with respect to ECG- and PPG-based approaches. It was shown that acoustic sensing allows for a relatively wide region of sensor placement, with an insignificant difference between the SNR of the signals recorded from different locations over the radial artery.
The characteristics of the pulse wave, which originates from the heart as a result of the opening and closing of the heart valves and propagates as a mechanical wave along the arterial branches, were also investigated by comparing the acoustic and PPG pulse waveforms. Although negligible, the heart sounds also transmit an acoustic wave through the body 14. Since these acoustic features are superimposed on the vessel vibrations caused by the mechanical constriction and dilation of the radial artery, a similar type of skin surface modulation is obtained. While PPG only measures the pulse wave component, it was shown that acoustic-based sensing allows the detection of both cardiophysiological characteristics of the radial pulse. This was done by observing the bandwidth of the acoustic pulse waveform, which contained energy in the audible range, compared with a bandwidth of less than 10 Hz for the PPG waveform 29. Consequently, with the proposed approach it is possible to monitor both the heart sounds and the pulse wave using just one wearable system. These findings could be used in future work to, for example, study the different phases of the Korotkoff sounds at the wrist in order to measure blood pressure using a wearable device.
Furthermore, by comparing the HR obtained from acoustic sensing with other state-of-the-art PPG-based devices, it was shown that the presence of fundamental heart sounds in the acoustic pulse waveform improved the heartbeat detection, an important variable in continuous vital sign monitoring. Heartbeat detection based on extraction of the S1 sounds using the proposed method further reduced the error between the estimated and ground truth HR and achieved a high accuracy of 98.78%, with a PC of 0.99 and narrower LOA of [−1.68, 1.69] bpm. These results indicate that the proposed method could be used as an alternative to, or to complement, PPG for continuous monitoring of HR at the wrist. It is worth noting, however, that although the proposed method for HR has been tested experimentally, this paper presents just the proof of concept. To be used as part of a medical device, full clinical validation would require testing on a larger cohort wearing a device based on this principle in an ambulatory setting. This would make it possible not only to investigate a wider range of cardiac signals, but also to test with real-life artifacts.
In summary, this work showed for the first time that the acoustic signal sensed from the radial artery at the wrist can be used as a novel physiological signal to extract biomarkers indicative of cardiac performance. Furthermore, this signal provides advantages with respect to other conventionally used ones, which make it especially suitable for wearable devices. The concept and feasibility have been proven with the automatic extraction of HR. In future work, automatic extraction of other cardiac biomarkers could be investigated, such as HR variability using the inter-beat intervals, pulse transit time, pulse wave velocity, etc.

Methods
Acoustic sensing. The periodic pumping of the blood through the cardiovascular system generates dilation and constriction cycles in the radial artery. As a result, periodic variations in the arterial diameter occur, which produce corresponding vibrations at the surface of the skin. These vibrations introduce changes in the surrounding air pressure which can be transferred to the diaphragm of a suitable microphone. Long-term monitoring of these vibrations using a miniaturized device requires a sensor with a small form factor, operating with very low currents so that the whole system can run on a small battery over a suitably long period of time. For this study, we designed a miniature, battery-operated wireless device, as shown in Fig. 4, using an ultra-low noise, omnidirectional MEMS microphone sensor (InvenSense INMP411). The MEMS microphone was chosen because MEMS technology offers excellent acoustic characteristics with very small form factors. This is achieved through a fabrication process which involves creating a moveable membrane and a fixed backplate over a cavity in the base silicon wafer 30-32. While the perforations in the fixed backplate allow air to flow easily through it, the moveable membrane flexes in response to the change in surrounding air pressure caused by sound waves. These movements change the capacitance between the backplate and the membrane, which can be sensed by an application-specific integrated circuit to convert the vibro-acoustic effects into an electrical signal. The chosen microphone has a high SNR of 62 dBA, a uniform sensitivity of −46 dBV between 28 Hz and 20 kHz, and a low power consumption of 210 µA at a 3.3 V supply 33. However, any microphone of similar size and specifications could be used. The analogue output of the microphone, after appropriate filtering and amplification, was digitised using the inbuilt analogue-to-digital converter of a Nordic Semiconductor nRF52 Series chip.
This chip also contained a Bluetooth low energy transceiver for the wireless transmission of the data using a 2.4 GHz chip antenna (Johanson Technology Inc.). The overall weight of the final wireless prototype was 8 grams, although this could be further reduced by using more sophisticated manufacturing processes. In addition, its size and shape were designed so that it could be easily attached to the wrist using double-sided medical adhesive tape, keeping the sensor affixed to the measuring site for long-term usage 34.
Algorithmic blocks. An overview of the novel algorithm proposed to automatically determine the HR by extracting the S1 sounds from the acoustic pulse signal is shown in Fig. 5. It comprises the following stages: 1-The pre-processing blocks reduce contamination of the signal caused by noisy artifacts, in order to improve the SNR for further analysis; 2-The PSD of the signal is calculated in the following stage using STFT to extract the S1 sounds; 3-Finally, a squared energy envelope is constructed and the peaks corresponding to these sounds are detected to provide the time indexes for HR determination. A pseudo-code for the proposed algorithm is also provided in Table 4. The following sections explain the details of the different blocks.
Acoustic data pre-processing. The acoustic signal sensed at the wrist contains not just the signal of interest but also other signals that are picked up by the electronic system, such as motion artifacts and sounds from the surrounding environment. In order to achieve a better SNR by reducing the effects of the latter, the acoustic signal, denoted by the time series y, is segmented into rectangular windows of 5 seconds duration with 1 second of overlap between successive segments. The window length is chosen to include a sufficient number of heartbeats for an HR in the range of 40 to 200 beats per minute (bpm).
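The segmentation described above can be illustrated with a short sketch. The function name `segment_bounds` is hypothetical, and discarding a trailing partial window is an assumption, since the text does not describe how the end of a recording is handled.

```python
def segment_bounds(n_samples, fs=2100, window_s=5.0, overlap_s=1.0):
    """Return (start, stop) sample indices of overlapping rectangular windows."""
    win = int(round(window_s * fs))                # 5 s -> 10500 samples
    hop = int(round((window_s - overlap_s) * fs))  # 4 s step -> 8400 samples
    # Assumption: a trailing partial window is dropped.
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, hop)]

# A 30 second stretch sampled at 2100 Hz yields 7 windows of 5 seconds,
# with consecutive windows sharing 2100 samples (1 second).
bounds = segment_bounds(30 * 2100)
```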
Most of the frequency content of the acoustic signal is contained below 25 Hz. Because of this, undesired higher frequency interference/noise is reduced by using a fifth-order Butterworth low-pass filter with a cut-off frequency of 25 Hz. The acoustic signal originally sampled at 2100 Hz (f s ) possesses frequencies well below the corresponding Nyquist frequency after the filtering process. This redundant information is therefore removed by downsampling the signal by a factor of 10 reducing the sampling frequency to 210 Hz (f d ), without introducing any aliasing in the signal.
Since the signals were continuously recorded in a session of 30 minutes duration, the subjects were allowed to move their wrist or fingers during the data acquisition. These movements could introduce acoustic vibrations which can be sensed by the microphone and produce large amplitudes in the signal. The frequencies corresponding to these artifacts can lie within the bandwidth of the acoustic pulse signal, and so would not be eliminated by simple low-pass filtering. However, the effect of such movements is usually confined to short time frames. Because of this, the K-means clustering method 35 with two classes, C1 and C2, is used in the algorithm to identify the parts of the signal which are significantly corrupted by them. The method initially divides each signal block y of 5 seconds duration into five equal parts, each of 1 second duration, denoted by $y_n$, $n \in [1, 5]$. For every part, the maximum amplitude ($A_{max}$) and the standard deviation (σ) are determined to reflect the signal characteristics as the x- and y-coordinates respectively. These feature coordinates are fed to the K-means block to cluster the five signal parts into two different classes based on the similarity of the features. The method proceeds by choosing two cluster centroids, O1 and O2, and groups the features into two classes by iteratively updating the centroid coordinates (x, y) to minimize the distances between the feature points and the cluster centroids. Once the iterative process converges, the horizontal change, Δx, between the centroids is determined, and the class C with the lower standard deviation is found. A change of less than 50% in Δx reflects a close correspondence between the maximum amplitudes of the different signal parts, and indicates no significant corruption by motion artifacts.
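A minimal sketch of this two-class screening is given below. Several details are assumptions rather than the authors' implementation: the function names are hypothetical, the centroids are seeded with the parts of smallest and largest $A_{max}$, the "change in Δx" is taken relative to the larger centroid x-coordinate, and, following the scoring convention stated in the text, a score of 1 marks a part to be ignored. The exact rule of Eq. (3) is not reproduced in this text, so the decision logic here is only one plausible reading.

```python
from statistics import pstdev

def features(part):
    """(maximum absolute amplitude, standard deviation) of one 1 s part."""
    return max(abs(v) for v in part), pstdev(part)

def _dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def two_means(points, max_iter=100):
    """Plain K-means with two clusters on 2-D feature points."""
    # Assumption: seed with the points of smallest and largest A_max.
    centroids = [min(points), max(points)]
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if _dist2(p, centroids[0]) <= _dist2(p, centroids[1])
                      else 1 for p in points]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids

def score_parts(parts, max_change=0.5):
    """Return one score per part: 1 = treated as corrupted, 0 = kept."""
    points = [features(p) for p in parts]
    labels, (c0, c1) = two_means(points)
    larger_x = max(c0[0], c1[0])
    # Assumption: relative horizontal distance between the two centroids.
    change = abs(c0[0] - c1[0]) / larger_x if larger_x > 0 else 0.0
    if change < max_change or len(set(labels)) == 1:
        return [0] * len(parts)       # amplitudes agree: keep everything
    clean = 0 if c0[1] <= c1[1] else 1  # class with the lower standard deviation
    return [0 if lab == clean else 1 for lab in labels]
```

Given five 1 second parts where the first has ten times the amplitude of the others, `score_parts` would flag only the first part, which mirrors the situation shown in Fig. 6a.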
Since the artifacts exhibit a higher standard deviation than the acoustic pulse signal, the class with the lower y-coordinate is chosen in cases where the change in Δx is more than 50%. Depending on the comparison between these parameters, in Eq. (3), the signal parts $y_n$ are scored by assigning $S_n$, $n \in [1, 5]$, a value of either 1 or 0. The signal parts with a score of 1 are excluded from further processing. Figure 6a shows the different pre-processing stages for a 5 seconds block of signal, a part of which is significantly corrupted by motion artifacts. It can be seen how the processing successfully excludes the first part of the signal from further processing.
S1 sound extraction. An HR in the range of 40 to 200 bpm corresponds to a beat-to-beat interval of 1500 to 300 milliseconds respectively. The number of S1 sounds in a 5 seconds window can therefore vary from 4 to 17. The measured PSD of the acoustic pulse signal showed that the frequencies corresponding to the S1 sounds, in the joint time-frequency analysis, carried higher power than other parts of the signal. This property of the signal is utilized to extract these sounds in the time domain and process them further to find the HR. It was also important to select a proper window length for calculating the PSD of the signal, as a better time resolution allows the extraction of the S1 waveform without interfering much with the nearby signal transitions.
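The bounds of 4 and 17 S1 sounds per window follow from the HR range by simple arithmetic, sketched below. The function name `s1_count_bounds` is hypothetical, and rounding up is an assumption chosen because it reproduces the figures quoted in the text.

```python
from math import ceil

def s1_count_bounds(window_s=5.0, hr_min_bpm=40, hr_max_bpm=200):
    """Smallest and largest number of beats expected in one analysis window."""
    longest_interval_s = 60 / hr_min_bpm    # 1.5 s at 40 bpm
    shortest_interval_s = 60 / hr_max_bpm   # 0.3 s at 200 bpm
    # Assumption: round up, which matches the 4 and 17 stated in the text.
    return (ceil(window_s / longest_interval_s),
            ceil(window_s / shortest_interval_s))
```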
The power spectrum of the acoustic pulse signal, downsampled to 210 Hz, is calculated in the algorithm using a Blackman window of 32 samples (approximately 150 milliseconds) with an overlap of 50% between successive frames. The chosen time window, as shown in Fig. 6b(II), provides the required time resolution to extract the S1 waveform by segmenting the time axis into a relatively higher number of grids. The colour intensity of these grids in the time-frequency space indicates their corresponding contribution to the overall power of the signal. The grid with the maximum power, $P_{max}$, is found, and all the grids whose power differs by no more than 5 dB from $P_{max}$ are also selected. It is understood that the beat-to-beat interval cannot be lower than 300 milliseconds 3; therefore, all the grids with a mutual separation within this time period are assumed to belong to a single S1 sound, and hence they are all grouped together, as shown by the rectangular windows in Fig. 6b(III). For m such groupings, the start and end time points, $t_{sa}$ and $t_{ea}$, where $a \in [1, m]$, are noted. The threshold difference of 5 dB ($P_t$) is increased in steps of 1 dB, up to a maximum of 10 dB, to limit the number of groupings m for a 5 seconds window to between 4 and 17. A tolerance window of 150 milliseconds, determined empirically, is added to $t_{sa}$ and $t_{ea}$ to enlarge the region of interest in the time domain and ensure that the S1 waveform within $[t_{sa} - 0.15,\ t_{ea} + 0.15]$ seconds is retained, whereas the other parts of the signal are zeroed for the further processing, as shown in Fig. 6b(IV).
Table 4. Pseudo-code for the proposed algorithm (partial; only the following entries are recoverable).
1. Initial pre-processing of the signal.
• Acoustic pulse signal: y, sampled at $f_s$ = 2100 Hz.
• K-means method: Form two clusters by scoring the signal parts $y_n$ using $S_n \in \{0, 1\}$ for $n \in [1, 5]$.
3. Peak detection from extracted S1 sounds.
4. Find the continuous average HR.
• Find the time indexes for the maximum of the energy peaks: $T_m = \max(E_m)$.
• Averaging filter: ∫ y …
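The S1 extraction steps described in this section (Blackman-windowed STFT, selection of grids within a threshold of $P_{max}$, grouping within 300 milliseconds, and a 150 millisecond tolerance) can be sketched as follows. This is an illustrative reading, not the authors' implementation: all function names are hypothetical, each frame is summarised by its strongest frequency bin (which selects the same frames as thresholding individual grids), grouping uses frame start times, and the loop stops at the first threshold that yields between 4 and 17 groups.

```python
import cmath
import math

FS = 210         # sampling rate after downsampling (Hz), from the text
WIN = 32         # Blackman window length in samples, from the text
HOP = WIN // 2   # 50% overlap between successive frames

def blackman(n):
    """Standard Blackman window coefficients."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * k / (n - 1))
            + 0.08 * math.cos(4 * math.pi * k / (n - 1)) for k in range(n)]

def frame_peak_power_db(y):
    """Frame start times (s) and strongest-bin power (dB) of each STFT frame."""
    window = blackman(WIN)
    times, power_db = [], []
    for start in range(0, len(y) - WIN + 1, HOP):
        frame = [y[start + k] * window[k] for k in range(WIN)]
        strongest = max(
            abs(sum(v * cmath.exp(-2j * math.pi * b * k / WIN)
                    for k, v in enumerate(frame))) ** 2
            for b in range(WIN // 2 + 1))
        times.append(start / FS)
        power_db.append(10 * math.log10(strongest + 1e-20))
    return times, power_db

def group_times(times, min_gap_s=0.3):
    """Merge selected frame times closer than the minimum beat interval."""
    groups = []
    for t in times:
        if groups and t - groups[-1][1] < min_gap_s:
            groups[-1][1] = t
        else:
            groups.append([t, t])
    return groups

def extract_s1(y, tol_s=0.15):
    """Zero everything outside the high-power regions; return (signal, groups)."""
    times, power_db = frame_peak_power_db(y)
    p_max = max(power_db)
    groups = []
    for threshold_db in range(5, 11):   # 5 dB, raised in 1 dB steps to 10 dB
        selected = [t for t, p in zip(times, power_db)
                    if p_max - p <= threshold_db]
        groups = group_times(selected)
        if 4 <= len(groups) <= 17:      # plausible beat count for 5 s
            break
    out = [0.0] * len(y)
    for t_start, t_end in groups:
        lo = max(0, int((t_start - tol_s) * FS))
        # The group ends when its last frame ends, hence the extra WIN / FS.
        hi = min(len(y), int((t_end + WIN / FS + tol_s) * FS))
        out[lo:hi] = y[lo:hi]
    return out, groups
```

A pure-Python DFT is used here only to keep the sketch dependency-free; for a 5 second block at 210 Hz this amounts to about 64 frames of 32 samples, so the cost is small. In practice a library STFT routine would be the natural choice.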
Peak detection. Constructing energy envelope of the extracted S1 sounds. Although a number of peak detection methods using the joint time-frequency analysis exist 36,37 , the power spectrum of the acoustic signal obtained using STFT provides an easy way to detect the S1 sounds as the peaks. However, it is important to determine a single time-index for every S1 sound in the signal, so that their mutual time differences can be utilized to calculate the HR. To obtain the peak-indexes, every sample of the signal is first squared so that the positive and the negative waveform of the S1 sounds can be transformed to only positive amplitudes above the baseline as shown in Fig. 7a(I). The squaring process provides a nonlinear amplification of the signal by emphasizing the higher frequencies corresponding to the S1 sounds, whilst attenuating the nearby transitions with lower energies.
A moving average filter is subsequently used to integrate the squared energy waveform. The width of the integration window is an important parameter to consider and should ideally be equal to the maximum time duration of the S1 sound in the signal. A window with a larger width can combine the energy of the S1 sound with the energy of nearby signal transitions, whereas a narrower window can produce multiple energy envelopes for the same sound 38 . For a signal with a sampling frequency of 210 samples/second, the filter averages the squared energy waveform over a window of 32 samples. The squared energy followed by an averaging process therefore produces an energy peak corresponding to the S1 sound, as shown in Fig. 7a(II) which can be easily processed to find the corresponding time index.
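The squaring, moving-average integration and peak indexing described above can be sketched as follows, together with a simple conversion of the peak times into an average HR. The function names are hypothetical, and two details are assumptions: the averaging window is roughly centred on each sample, and each contiguous non-zero stretch of the envelope is treated as one S1 sound, which relies on the signal having been zeroed outside the regions of interest.

```python
def energy_envelope(y, width=32):
    """Square each sample, then average over a sliding window of `width` samples."""
    squared = [v * v for v in y]
    half = width // 2
    # Assumption: window roughly centred on the current sample.
    return [sum(squared[max(0, i - half): i + half]) / width
            for i in range(len(squared))]

def peak_times(envelope, fs=210):
    """Time (s) of the maximum of every contiguous non-zero envelope region."""
    peaks, start = [], None
    for i, e in enumerate(envelope + [0.0]):   # sentinel closes a final region
        if e > 0 and start is None:
            start = i
        elif e == 0 and start is not None:
            region = envelope[start:i]
            peaks.append((start + region.index(max(region))) / fs)
            start = None
    return peaks

def heart_rate_bpm(peaks):
    """Average HR from the mean interval between successive peak times."""
    if len(peaks) < 2:
        raise ValueError("at least two peaks are needed")
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(intervals) / len(intervals))
```

With a 32-sample window at 210 samples/second, the integration spans about 150 milliseconds, matching the window length used for the STFT in the previous stage.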
Artifact identification and elimination. In the pre-processing stage of the proposed algorithm, there were some instances in which artifacts introduced by wrist or finger movements significantly corrupted sections of the acoustic signal but were not detected by the K-means method. This happened when the maximum amplitude and standard deviation of the corrupted sections were close to the features of the cleaner sections in a 5 seconds window. Since these artifacts may have significant power in comparison to the S1 sounds, the STFT analysis allows such signal transitions to pass into the subsequent analysis as well. The energy envelopes of such corrupted sections could introduce misleading energy peaks, affecting the accurate determination of the time indexes. To avoid misclassifying an artifact as an S1 sound, features such as the time width and amplitude of every energy peak are determined in the algorithm. For the acoustic signal, the total number of 5 seconds blocks is defined as L, where $y_z[n]$, $z \in [1, L]$, represents each signal block. Assuming that the parameter $l_z$ provides the total count of energy peaks in $y_z[n]$, the width and amplitude features of every energy peak $E_m$ are denoted by $w_m$ and $a_m$ respectively, where $1 \le m \le l_z$. The thresholds $W_z$ and $A_z$ to process the segment under …
Figure 6. (a) … (II) Low-pass filtered and downsampled signal, to remove higher frequency components and redundant information respectively. (III) Clustering using the K-means method to identify signal segments corrupted with motion artifacts. Symbols + and □ represent the features and cluster centroids respectively. (IV) Signal segment corrupted with a motion artifact (due to wrist/finger movement) removed from the downsampled signal; (b) S1 sound extraction from a different pre-processed signal with no corrupted segment: (I) Acoustic signal after initial low-pass filtering, downsampling and K-means application. (II) PSD of the signal obtained using STFT to extract the S1 sounds. (III) Rectangular windows representing the regions of interest. (IV) S1 sounds extracted by adding a tolerance of 150 milliseconds on both sides of the rectangular windows.
Subjects and experimental protocol. Signals were recorded from 12 healthy subjects aged 19-48 by placing the new miniature, battery-operated wearable device over the radial artery. The sensor attachment, over an area equal to the size of the sensor (27 × 20 millimetres), did not require any cleaning process. The data were recorded only through contact sensing, without applying any external pressure on the device. The signals were sampled at a frequency of 2100 Hz and wirelessly transmitted to a nearby base station. The PPG signals from the index finger were simultaneously recorded using a commercially available SOMNOscreen pulse oximeter 23. The SOMNOscreen monitor also provided an estimate of the HR every quarter of a second. The details of the methodology the monitor uses to determine the HR are not publicly available. A total of 6 recordings, each of 5 minutes duration, were collected from every subject. All the recordings were collected in an uncontrolled environment, but the subjects were asked to sit and relax on a chair. Since the recordings were performed over a long duration, the subjects could move their wrist and fingers as and when required. The synchronization of the data from the two sensors, which is critical for evaluating the performance of the proposed system, was carried out by matching the nearest systolic peaks.
Human subjects. The study was approved by the local ethics committee of Imperial College London (ICREC reference number: 18IC4358). All research was performed in accordance with relevant guidelines and regulations. Informed consent was obtained from all the subjects in the human trials.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.