Non-invasive diagnosis of deep vein thrombosis from ultrasound imaging with machine learning

Deep vein thrombosis (DVT) is a blood clot most commonly found in the leg, which can lead to fatal pulmonary embolism (PE). Compression ultrasound of the legs is the diagnostic gold standard, leading to a definitive diagnosis. However, many patients with possible symptoms are not found to have a DVT, resulting in long referral waiting times for patients and a large clinical burden for specialists. Thus, diagnosis at the point of care by non-specialists is desired. We collect images in a pre-clinical study and investigate a deep learning approach for the automatic interpretation of compression ultrasound images. Our method provides guidance for free-hand ultrasound and aids non-specialists in detecting DVT. We train a deep learning algorithm on ultrasound videos from 255 volunteers and evaluate on a sample size of 53 prospectively enrolled patients from an NHS DVT diagnostic clinic and 30 prospectively enrolled patients from a German DVT clinic. Algorithmic DVT diagnosis performance results in a sensitivity within a 95% CI range of (0.82, 0.94), specificity of (0.70, 0.82), a positive predictive value of (0.65, 0.89), and a negative predictive value of (0.99, 1.00) when compared to the clinical gold standard. To assess the potential benefits of this technology in healthcare we evaluate the entire clinical DVT decision algorithm and provide cost analysis when integrating our approach into diagnostic pathways for DVT. Our approach is estimated to generate a positive net monetary benefit at costs up to £72 to £175 per software-supported examination, assuming a willingness to pay of £20,000/QALY.


INTRODUCTION
Venous thromboembolism (VTE) is associated with a major global burden of disease. Worldwide, the incidence of VTE is 1-3 per 1000 individuals, rising to 2-7 per 1000 in individuals aged over 70 years, and 3-12 per 1000 in those over 80 years 1. VTE, which comprises deep vein thrombosis (DVT) and pulmonary embolism (PE), is the leading cause of hospital-related disability-adjusted life years lost 2.
Using these estimates and the most conservative incidence figure, at least 7.7 million people worldwide will require investigation for VTE every year. An ageing population across many countries will lead to a greater health burden, particularly in middle- and low-income countries where early death from infection is decreasing. Mortality from VTE is common: a European study estimated 534,000 deaths per year 3 and a similar study in the US reported 300,000 deaths per year 4. DVT also has a high level of morbidity; 30-50% of the surviving patients develop long-term symptoms in their affected leg (post-thrombotic syndrome) 5.
In high-income countries, the routine practice to diagnose patients after a positive D-dimer blood test and an indicative evaluation using the Wells score 6 is to confirm or rule out a suspected DVT with a two- or three-point ultrasound scan. Ultrasound scans are most commonly performed in a radiology or cardiovascular department of a hospital by a highly trained radiographer/radiologist.
Currently, no reliable test is available that can be used in a general healthcare setting (GP practice, community hospital, on a hospital ward) or be used remotely at the point of care (nursing home, patient's home). Between 85 and 90% of patients presenting to their GP in high-income countries with a suspected DVT will be investigated only to find no evidence of a thrombus 5. Many patients will receive unnecessary anticoagulants with numerous potential side-effects through an often-painful subcutaneous injection whilst waiting more than the recommended four hours for their scan. Safely negating this wait would improve patient satisfaction, reduce the burden of high-risk treatment (anticoagulants confer haemorrhagic complication risks) and reduce healthcare costs. Rapid diagnosis is known to improve compliance with regulatory guidelines that state DVT should be diagnosed within 24 h [6][7][8]. Clinical evidence that DVT examinations using ultrasound can be performed by nurses has been shown [9][10][11]. However, confidence in acquiring ultrasound images is generally low because of the required image interpretation skills and liability concerns, which inhibits wide-scale adoption of such approaches. In this study we evaluate whether Machine Learning (ML) technology can provide anatomical image acquisition guidance and point of care diagnostic support. Such ML technology is currently often summarised as Artificial Intelligence (AI) support systems.
ML technology has previously been explored in the context of VTE, with several studies having shown the potential for ML clinical decision support systems (CDSS) to add incremental value in improving VTE risk stratification of patients. Most of these proposed CDSS are predominantly based upon the Wells criteria 12 , whilst others are more complex, taking into consideration a broader range of clinical risk factors for VTE as identified in the Caprini model (35 discrete clinical risk factors) 13,14 . However, to the best of our knowledge, no prior study has shown the potential benefit of ML to aid in the image-based diagnosis of DVT using ultrasound. Our hypothesis is that ML technology can complement the clinical pathway and provide non-specialists with the necessary confidence and skills to perform ultrasound DVT screening autonomously. Early modelling has been undertaken to assess the potential cost-effectiveness of such an approach.

Study participation
External Validation Set 1 (EVS1). 124 patients who presented to the Oxford Haemophilia and Thrombosis Centre, Oxford, UK, with symptoms suggestive of DVT were approached for inclusion into this study. Compression ultrasound has been performed according to the standard practice, without software guidance. Patients have first been scanned as part of the standard pathway with various scanners, followed by another scan using a provided Philips Lumify probe with screen recording software.
The recorded screen capture videos have been curated to a data set that is similar in nature to one as it would have been acquired with AutoDVT software guidance.
Thirty-six patients have been excluded during the enrolment phase for various reasons as summarised in the CONSORT diagram in Fig. 1. Two patients with confirmed DVT have been excluded due to imaging conditions that are not covered by the standard compression ultrasound DVT protocol (non-echogenic thrombus and superior thrombosis in the iliac vein). Control participants had no DVT based on comprehensive clinical and laboratory testing performed under the supervision of and interpreted by a haematologist. This results in a data set comprising 88 eligible patients. An overview of patient characteristics in this clinic's database is given in Table 1.
It was specified that all examinations that were not performed according to the standard implemented in our study design should

Algorithm performance on the internal validation set
Figure 2 shows qualitative examples for the segmentation output of our method. Table 2 shows quantitative results for the anatomical landmark detection task, Table 3 for the vessel compression task, and Table 4 for segmentation performance. Common image evaluation metrics, the Sørensen-Dice coefficient (Eq. (6)) for segmentation results and the F1-score (Eq. (5)) for anatomical landmark discrimination and categorical vessel compression analysis, are used for quantitative evaluation.

Algorithm performance on the external validation sets
EVS1. Quantitative results on EVS1 are summarised in Table 5.
Receiver operating characteristic (ROC) curves are shown together with confusion matrices in Fig. 3 on patient level and Fig. 4 on sequence/anatomical landmark level. Note that these results are based on retrospective analysis of prospectively acquired ultrasound videos without using software guidance. In a perfect prospective setting, AutoDVT guides the operator to acquire images that are well suited for algorithmic evaluation. A setup more akin to the latter paradigm has been tested in the pilot study in EVS2 (Fig. 5).
EVS2. Results from 30 DVT-suspected cases (four DVT positive) that have been acquired prospectively using the AutoDVT software in the Clinic of Angiology in Potsdam, Germany, are presented in Table 5 and Fig. 5. We test the same fourfold cross-validation models as used for EVS1 data from Oxford.

Cost effectiveness
D-dimer plus an ultrasound confirmatory scan for DVT diagnosis (Fig. 11a) is currently costed at £92-£97 16 (Table 14) in the UK NHS. Using the sensitivity and the specificity ranges from Table 5 (EVS1 + EVS2), a maximum positive net monetary benefit (NMB) of between £71 and £139 per ML-guided examination can be achieved when AutoDVT is integrated into clinical pathways according to Fig. 11. We assume a willingness to pay of £20,000 per QALY 6,17. Figure 6 shows how the NMB changes with different prices for an ML-guided examination, considering the different diagnostic algorithm variants in Fig. 11. Accuracy versus costs is compared in Table 6.

DISCUSSION
This study provides a proof of concept that ML-based analysis can distinguish patients with and without DVT while providing image acquisition guidance for non-experts according to the clinical standard. Evaluation was performed on a sample size of n = 53 enrolled patients from the same clinic, 34 DVT-positive patients and 30 additional patients from another clinic, n = 4 DVT positive. Algorithmic DVT diagnosis results in a sensitivity within a 95% CI range of (0.82, 0.94), a specificity of (0.70, 0.82), a positive predictive value (PPV) of (0.65, 0.89), and a negative predictive value (NPV) of (0.99, 1.00). Our method suggests a diagnosis based on robust segmentation, in contrast to a direct image discriminator model. Consequently, our method does not rely on discrimination in the conventional ML sense. Our model learns predominantly from healthy volunteer data what a healthy vessel looks like and uses this knowledge to identify DVT-suspected patients in the test data. This is different from traditional decision boundary modelling with fully supervised learning from a balanced dataset. Thus, our model is not noticeably affected by class imbalance issues in the training data; if the vein closes, the compression sequence is not DVT-suspected, otherwise it is. Identifying the correct vessel and interpreting the state of the vessel is the challenging part, which is addressed by our ML model. Data variability is relevant for the representation of the vessel itself. To improve this learnable variability, we use data augmentation as commonly used for image data 18. All images are resampled to 150 × 150 pixels to facilitate real-time inference capabilities. We use image augmentation during training: random left/right flipping, ±15 pixel random translation, ±15° rotation, random zoom at a maximum factor of 0.05, and intensity re-scaling with a maximum range of ±0.3.
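The augmentation ranges listed above can be illustrated with a small parameter sampler. This is only a sketch under stated assumptions, not the implementation used in the study: the function name `sample_augmentation`, the flip probability of 0.5, the symmetric interpretation of the zoom factor, and the multiplicative form of the intensity re-scaling are all hypothetical choices.

```python
import random


def sample_augmentation(rng):
    """Draw one set of augmentation parameters within the ranges stated in the text."""
    return {
        # Random left/right flip; the probability of 0.5 is an assumption.
        "flip_lr": rng.random() < 0.5,
        # Random translation of up to 15 pixels along each image axis.
        "shift_px": (rng.uniform(-15, 15), rng.uniform(-15, 15)),
        # Random rotation of up to 15 degrees in either direction.
        "rotation_deg": rng.uniform(-15, 15),
        # Zoom by at most a factor of 0.05; zooming both in and out is an assumption.
        "zoom": 1.0 + rng.uniform(-0.05, 0.05),
        # Intensity re-scaling within +/-0.3; the multiplicative form is an assumption.
        "intensity_scale": 1.0 + rng.uniform(-0.3, 0.3),
    }


params = sample_augmentation(random.Random(0))
```

The sampled dictionary would then be passed to whatever image transformation routine is in use; applying the transforms to the 150 × 150 frames is deliberately left out of this sketch.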
ML has previously been studied for a variety of diagnostic approaches [19][20][21]. Several studies have applied ML in the context of VTE, although these ML applications have focused on developing CDSS that aid clinicians in VTE risk stratification of patients rather than diagnose VTE 12,22. To the best of our knowledge, our work is a pioneering study that shows the potential benefits of ML for the diagnosis of DVT through imaging. Our work evaluates all implications for the implementation of an ML model in a challenging clinical workflow like DVT diagnosis with ultrasound imaging, a pathway that requires direct human-machine interaction. This contrasts with the currently dominant ML methods for retrospective image analysis of tomographic data like CT or MRI, which usually present themselves to an algorithm clearly, without imaging artefacts and often in a canonical orientation. Free-hand ultrasound poses additional challenges compared to these settings.
First, a user needs to be directed and guided to acquire images that are suitable to make a prediction through an ML model. This requires algorithmic provisions to discriminate useful images from images that do not adhere to a clinical standard. We solve this problem through training a discriminator ML model, which can identify predefined anatomical locations along the femoral vein.
Second, compression ultrasound requires the analysis of continuous image sequences which is challenging in a setup that requires real-time feedback. We solve this problem through a sliding window, multi-channel input approach, which enforces spatio-temporal consistency for a combined vein-segmentation with learned decision boundaries for identifying a vessel as fully closed. Furthermore, mobile ultrasound probes are used and connected to a GPU-accelerated laptop to provide sufficient computational power.
Third, image domain shift is a serious limitation of ML applications in healthcare. Domain shift occurs when a model is trained on images that have been acquired on one device while testing is performed on images from other, previously unseen devices. Commonly, a noticeable drop in performance is observed in such situations. We mitigate this problem through integrating image data from a diverse set of devices, covering almost the entire market for mobile ultrasound devices. Still, there is no established method for robust domain adaptation 23. Hence, a risk of reduced performance remains when applying the presented algorithms to images from a new device. This risk must be avoided by deploying these algorithms exclusively with thoroughly tested, specific devices.
ML-supported devices such as described here are often summarised as clinical AI 21 . A critical element of any AI-based support tool is its clinical relevance.
DVT has a relatively low prevalence: 7.1% for a selected population who present to a DVT clinic as in our work and <0.003% in the general population. We factor this in when calculating PPV and NPV, thus providing values that are most informative for patients. In our case, an NPV of around 99% means that if the software-supported imaging test does not provide evidence for the presence of a blood clot, there is an extremely low chance that this prediction is wrong and that the patient might still have DVT. Conversely, the PPV of our method is about 77% with a large 95% confidence interval of 12 percentage points. This means that if the automated imaging test gives a DVT-positive result, there is still a 20-30% chance that this diagnosis is wrong. This is addressed in possible clinical pathway integration strategies in Fig. 11. A positive test with AutoDVT will always lead to a confirmatory scan with an expert, who will also make treatment decisions, which may include secondary criteria such as the age of the thrombus. However, within the group who tested positive with AutoDVT, the expert's chance of seeing an actual DVT-positive patient is more than 80%, which is notably higher than the current 7.1%. Increasing the pre-test probability for DVT will likely reciprocally increase the diagnostic utility and discriminatory power of the expert examination as well.
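To make the dependence of PPV and NPV on prevalence concrete, the textbook relationship via Bayes' rule can be sketched as follows. This is a generic illustration, not the procedure used to obtain the values reported in this study, which is not detailed here; the function name `predictive_values` and the input values are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only (not the study's measurements).
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.80, prevalence=0.10)
```

In this formulation a low prevalence pushes the NPV up and the PPV down, which is the qualitative pattern discussed above.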
Literature and our own experiments show strong evidence that a DVT examination in primary care performed by non-experts is feasible. We would expect that rapid point of care diagnostics and wide availability of testing, which is conceivably enabled by our approach, would lead to timely treatment, decreased stress, and increased patient satisfaction. Furthermore, a cost analysis simulation model has been evaluated for integrating the proposed algorithm into clinical practice. Assuming a willingness to pay of £20,000/QALY 17, a maximum NMB of between £71 and £139 per examination could be attained when ML guidance is used by non-specialist workers for DVT diagnosis. This assumes zero costs for the use of the software; thus, it is the maximum achievable NMB. A DVT examination software tool could cost up to £72-£175 at the sensitivity and specificity levels measured in Table 5 before the NMB falls below £0. If the examination costs go above £72, then the conclusion that AutoDVT is cost-effective becomes more uncertain.
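For readers unfamiliar with the quantity, net monetary benefit is conventionally defined as the willingness to pay multiplied by the incremental QALYs, minus the incremental cost. The sketch below shows only this generic definition; it does not reproduce the deterministic pathway model of this study, and the function name and all numeric inputs are hypothetical.

```python
def net_monetary_benefit(delta_qaly, delta_cost, willingness_to_pay=20_000.0):
    """Generic NMB: willingness to pay times incremental QALYs minus incremental cost (GBP)."""
    return willingness_to_pay * delta_qaly - delta_cost


# Hypothetical example: a pathway change that gains 0.004 QALYs per examination
# and adds 30 GBP of cost (for instance, a software fee minus downstream savings).
nmb = net_monetary_benefit(delta_qaly=0.004, delta_cost=30.0)
```

Under this definition, any software price enters through the incremental cost, so a higher price per examination lowers the NMB.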
Evaluation according to Eqs. (6) and (7) on the internal validation set. (⋅, ⋅) is the 95% confidence interval range.
Our study has several limitations. First, in EVS1, we evaluate a prospectively enrolled patient cohort retrospectively, on video sequences that have not necessarily been acquired at an optimal standard. Therefore, we had to curate the data and automatically extract clips from entire exam video recordings that would be most similar to clips as they would be acquired by the AutoDVT software guidance method. Furthermore, free-hand ultrasound examinations are highly operator-dependent, and every operator has a unique style of examination. Our proposed approach aims to standardise these styles to provide optimal input for subsequent image analysis parts and to aid clinical audits. Second, standardised acquisition has been demonstrated in EVS2, but for both external validation sets our patient cohorts are small, and we compare across populations with findings from literature. This limits the types of statistical techniques that can be employed in this study to evaluate statistical significance. We will soon start a multi-centre prospective trial that will address these issues to give further insights into the practical implications of employing AI support for DVT diagnosis. As suggested in the proposed diagnostic DVT decision trees (Fig. 11), the ultimate goal of employing AI support for DVT diagnosis would be to develop an ML-powered system using free-hand ultrasound that enables healthcare generalists at the point of care to exclude the presence of DVT in negative cases. If sufficient accuracy is achievable, this could obviate the need for a diagnostic scan performed by an expert user for DVT-negative cases, leading to quicker diagnoses and further cost benefits. Achieving this goal will require a number of clinical acceptance issues to be overcome. Perhaps the most important of these is the notion of clinical responsibility. When an expert user performs a scan, the presence or absence of DVT is determined by the expert user, who bears the clinical responsibility for the outcome of the test.
By obviating the need for an expert user, the clinical responsibility and any associated liability must lie with the AI/ML-powered system and hypothetically associated teleradiology workers, since it is the system that determines the outcome of the test and not the non-specialist user holding the ultrasound probe. The implications of this are particularly significant in the context of a false negative outcome given the possibility of DVT progression to PE and even death. With this in mind, clinical acceptance is realistically only attainable if the AI/ML-powered system can achieve exceptionally high NPV, as shown in our work.
Hence, this study describes the first step of a larger clinical trial programme which we will use to ultimately evaluate the clinical efficacy of the AutoDVT software for the diagnosis of proximal DVT. The study we describe confirms that the AutoDVT software can diagnose DVT accurately. However, for the device to be accepted within the clinical community, a large-scale efficacy study is required to confirm non-inferiority to expert-led compression ultrasound for proximal DVT diagnosis. Once this has been conducted, the device can be offered as a diagnostic alternative to hospital clinic-based DVT diagnosis. In conclusion, our study shows the potential of an ML-powered system using free-hand ultrasound to identify DVT in clinical populations with high-throughput requirements and at the primary care level. Since access to ultrasound imaging is increasing and amplified through cost-effective mobile ultrasound devices, an ML-supported examination by less specialised front line care workers has the potential to be adopted for proximal DVT screening before confirmatory tests.

Study design
This study is a primary analysis of compression ultrasound scan recordings performed on prospectively enrolled patients at the Oxford Haemophilia and Thrombosis Centre adult DVT clinic. The University of Oxford, UK, approved the study (Ethics: 18/SC/0220, IRAS 234007). All participants provided written informed consent. Eligible participants were consecutively recruited between January 2019 and December 2019. Patients were approached about participation in the study after their routine ultrasound DVT examination. After receiving study information and giving consent, they were scanned for a second time by an expert radiographer. During the second scan a mobile ultrasound device was used (Philips Lumify L7). The examinations were recorded as mp4 videos. Patient identifying information has not been recorded in the videos but separately in a spreadsheet, where it was tagged with a unique identifier (UID) by co-author Ch.D. Only the UID was used during downstream analysis.
Fig. 3 Evaluation results for EVS1 on patient level. Receiver operator characteristics on EVS1 resulting from fourfold cross validation (a). Confusion matrices are shown in (b) for the optimal threshold (* in (a)) in each fold. Frame colours in (b) correspond to ROC fold colours in (a). Vessel status is extracted automatically from 53 patients in EVS1 through the combination of fold-specific groin and knee model pairs.
A second pilot evaluation has been conducted in another clinic, the Ernst von Bergmann Klinikum Potsdam, Germany (Ethics: S7(a)/2020). Eligible participants were recruited between November 2020 and April 2021. Patients were approached about participation and consented to the study after their routine ultrasound DVT examination. The examination was conducted with a Clarius L7 HD (2020) by a clinical expert. In contrast to the first data collection in Oxford, the AutoDVT software has been used by the operator in Potsdam for guidance and video acquisition.
In this work, we call the data set from the Oxford Haemophilia and Thrombosis Centre EVS1 and the data from Potsdam EVS2. Since the analysed prototype device is based on an ML computer algorithm, training data and preliminary testing data are required. Thus, preliminary data acquisition was performed on healthy volunteers (n = 246) and nine consenting patient volunteers who were examined for DVT (n = 4 DVT positive). The acquisition has been performed by two radiologists and three trained engineers. We call the data that is used for training of the model the training set (Table 7). The volunteers and patients that have been left out from training to monitor the algorithm's performance during development are collected in the internal validation set (Table 8).
The ML model's task is to annotate vessels, find anatomical landmarks, and analyse vessel compression state automatically. DVT diagnosis is done by automating the standard clinical ultrasound compression algorithm in a heuristic computer programme, based on the biometrics acquired from the ML model during the scan. Thus, the ML model has been trained mainly on data from healthy volunteers (n = 246, age range 18-84, BMI < 30) and compression sequences from consented patients with confirmed DVT (n = 9). An overview of the inclusion criteria is given in Fig. 7 and of the training data population in Table 7. An overview of the internal validation set is shown in Table 8. All compression sequences have been manually annotated (marking pixels that belong to vein or artery by different colour labels) by a trained workforce (n = 23 trained labellers) including medical students and employees of ThinkSono Ltd to (a) train the algorithm and (b) evaluate its performance quantitatively.
Image quality control has been performed by a medical student according to a specialist-defined scheme.
Quality control scoring system. We use a 10-point expert image quality scoring as it is outlined below to curate video data that has not been acquired under AutoDVT guidance and real-time quality control. The quality cut-off, i.e., the minimum required quality, has been less than or equal to a total score of 20 in this study.
Fig. 6 Costs of the guidance tool vs. net monetary benefit (NMB) per examination when implementing ML-guided DVT diagnostics into clinical diagnostic pathways. The NMB has been simulated with a deterministic model for each of the diagnostic algorithm variants in Fig. 11 at the mean (solid line) and the 95% confidence interval (shaded area) from Table 5 to show possible optimistic and pessimistic scenarios. The red lines on the y-axis mark the maximal attainable NMB range when examination costs are zero.
Subjects may contain more than one landmark; thus, subject IDs may be present in the training and internal validation set. Individual sequences are either in one or the other set. Landmarks used for the groin model in this study are LM0-LM4 and those for the knee area are LM8-LM10.
Fig. 7 Consort diagram for inclusion of volunteer scans into the training set and internal validation set. Dataset curation for the training and internal validation data. Our approach can be trained from image data that originates predominantly from healthy volunteers.

Ultrasound protocol
Non-enhanced ultrasound imaging was performed by a research physician or radiologist (at least one year of hands-on ultrasound DVT imaging training) using Clarius L7 (2017), Clarius L7 HD (2020), Philips Lumify L7, or GE VScan Extend (scanned with linear probe, only for training data) ultrasound devices. Example images for these scanners are shown in Fig. 8. Two-point compression ultrasound was used for this study. Clinically, a compression is deemed adequate when the vein is compressed fully. A vein that does not compress at the pressure at which a healthy vein would collapse indicates DVT. The femoral vessels were examined from 2 cm distal to the saphenofemoral junction to 2 cm proximal from the inguinal band. The superficial femoral vessels were examined in the adductor canal. The examination of the popliteal vein starts from the distal 2 cm of the popliteal vein and its trifurcation into the anterior tibial vein, posterior tibial vein, and the peroneal vein. The entire examination has been recorded as screen capture videos, cropped to the ultrasound image area without user-interface content and resampled with bilinear interpolation to 150 × 150 pixels. Participants were positioned in a supine position, with the hip rotated outwards by about 60-80° and the knee flexed at about 60°. The knee area was examined either supine with neutral hip and knee flexed at 80-90° or sitting upright with the knee hanging loose over the gurney edge at 90°.

Statistical analysis
Ultrasound has a sensitivity of 94% and a specificity of 97% for DVT detection 24. This allows us to provide ~95% confidence intervals for the core algorithm's performance for the vessel segmentation and anatomical guidance tasks. The power of this study is above 0.8 at a significance level of 0.05, with a Cohen's d effect size of 0.5, when assuming an effect between 0.9 (without software support n 9 = 697, n 10 = 1107) and 0.95 (with software support, this study n = 53, n = 30) with a standard deviation of 0.1. For this setting, 51 patients are required as a minimum to reach a power of 0.8. We achieve this for EVS1 alone (n = 53) and for the combined analysis of EVS1 with EVS2 (n = 83). The R software package (©The R Project for Statistical Computing) has been used for numerical power analysis.
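The effect size above follows from the stated assumptions as d = (0.95 − 0.9) / 0.1 = 0.5. As a rough plausibility check of the required sample size, the sketch below uses the normal approximation for a one-sided, two-sample comparison. Both the test type and the approximation are assumptions made for illustration; the study's analysis was carried out in R and its exact settings are not given here. A t-distribution-based calculation generally yields a slightly larger number than this approximation.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a one-sided two-sample test."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha)  # one-sided critical value
    z_power = standard_normal.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


cohens_d = (0.95 - 0.90) / 0.1  # effect size implied by the stated assumptions
n_required = approx_n_per_group(cohens_d)
```

With these inputs the approximation gives a value close to the minimum of 51 patients stated above.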
Algorithms are evaluated at the participant level. To evaluate classifier performance, we calculate sensitivity, specificity, PPV, NPV, and overall diagnostic accuracy for DVT identification for the internal and external validation sets.
We also generate the ROC of the DVT classification score for the external validation sets and calculate the area under the ROC curve (AUC). We show confusion matrices at the optimal algorithm threshold.
table and Tables 7 and 8 for a description for the location of these landmarks. These example images have been manually cropped and contrast normalised for better readability.
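The participant-level metrics named in this section follow directly from the entries of a confusion matrix, and the AUC can be computed from classification scores with the rank-based (Mann-Whitney) formulation. The following is a minimal sketch with hypothetical function names and toy inputs; it is not the evaluation code of the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive case scores higher than a negative one."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Toy inputs for illustration only.
metrics = diagnostic_metrics(tp=8, fp=2, tn=9, fn=1)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

Note that PPV and NPV computed this way reflect the prevalence in the evaluated sample.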

Algorithm design
This study aims to validate the effectiveness of an ML-powered device (AutoDVT) for the diagnosis of proximal DVT. AutoDVT is a CE-marked software product (93/42/EEC 40873) that is coupled to a handheld CE-marked ultrasound machine. The AutoDVT software has two functions: (1) directing the user to correctly position the ultrasound to complete a thorough scan, and (2) analysing the scan results to confirm the presence/absence of a thrombus.
Fig. 9 Overview over the AutoDVT prototype core algorithm. a whole overview and b overview over the individual branches. A U-Net 41 serves as a backbone for automatic delineation of vein and arteries (b). The prediction of the anatomical location of the image is based on our previous work 15. Network branches predict the anatomical location and whether the vessel is open or closed under pressure. Landmark predictions are performed from the learned numeric representation in the bottleneck layer; vessel compression state is predicted from the output segmentation mask. The network components are connected and can be trained through back-propagation 42 in an end-to-end manner. The input is a stack of nine images (individual video frame images resampled to 150 × 150 pixels) from an ultrasound video stream that moves by one in a sliding window fashion. A single segmentation mask is produced for the last-most image within approximately 25 ms. Two separate models with identical architecture are trained, one for the groin area (LM0-LM5) and one for the knee area (LM8-LM10). Each model holds 31,475,527 parameters. (OC = open/close).
The software uses a fully automated ML vessel segmentation network with auxiliary branches that predict the anatomical location of the ultrasound image relative to the deep veins in the leg and the compression status of the vein (open or closed). Veins have been labelled by a radiologist to be either open or closed and fully compressed. Two networks with identical design/architecture have been trained: one for the groin/thigh area and one for the knee area. The subject IDs overlap between the training set and internal validation set because a sequence can have multiple landmarks but belong to either a healthy patient or a patient with confirmed DVT. See Table 7 for an overview over the algorithm training data and Table 8 for the internal validation data. Annotations include manual delineations of vein and artery cross sections in the images as well as discrete image-level labels for eleven anatomical locations. To facilitate algorithmic evaluation, we have defined anatomically salient landmarks (LM0-LM10) on the common femoral vein, superficial femoral vein, and popliteal vein. Example images for these landmarks, acquired with the different ultrasound probes that are used for algorithm training in this study, are shown in Fig. 8.
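The caption of Fig. 9 describes the network input as a stack of nine consecutive video frames that advances by one frame at a time. A minimal sketch of such a sliding window over a frame stream is given below; the generator name `sliding_windows` is hypothetical, and the frames are stand-ins for the resampled 150 × 150 pixel images.

```python
from collections import deque


def sliding_windows(frames, window=9):
    """Yield stacks of `window` consecutive frames, advancing by one frame each step."""
    buffer = deque(maxlen=window)
    for frame in frames:
        buffer.append(frame)
        # Emit a stack only once enough frames have arrived to fill the window.
        if len(buffer) == window:
            yield list(buffer)


# Integers stand in for image frames in this illustration.
stacks = list(sliding_windows(range(12)))
```

Each emitted stack ends with the most recent frame, which corresponds to the frame for which a segmentation mask would be produced.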
To exclude DVT, an operator must follow a protocol as instructed by the software. This protocol resembles the clinical practice of three-point or two-point examinations [33][34][35], which means doing compression ultrasound in two to three regions where the greatest risk of developing thrombosis occurs. For three-point compression protocols, these regions include: (1) the common femoral vein at the level of the inguinal crease (LM0-LM4), (2) the superficial femoral vein superior in the adductor canal (LM5-LM7), and (3) the popliteal vein and its trifurcation in the popliteal fossa (LM8-LM10).
For two-point compression protocols the same regions are examined except (2), i.e., LM0-LM5 in the groin and LM8-LM10 in the knee. To maximise the overlap between common procedures in the clinics from where our external validation sets originate, we investigate in this study the effectiveness of algorithmically evaluated two-point compression DVT examinations.
Thus, using the training set, the discriminator parts in the ML models are trained on consolidated groups of landmarks: two groups for (1), LM0-LM1 and LM2-LM3-LM4, and one group for (3), LM8-LM9-LM10. This means three successful vein compressions, two in the groin area and one in the knee area, are required in total to exclude DVT. All identified anatomical locations must show fully compressible veins, otherwise the participant is categorised as a suspected DVT case.
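The exclusion rule described above can be sketched in a few lines. This is a minimal illustration, not the AutoDVT implementation: the group names, the `classify_examination` function and the choice to treat a missing group as not compressed are assumptions made for this example.

```python
from typing import Dict

# Consolidated landmark groups as described in the text.
# The dictionary keys are hypothetical labels chosen for this sketch.
LANDMARK_GROUPS = {
    "groin_LM0_LM1": ("LM0", "LM1"),
    "groin_LM2_LM4": ("LM2", "LM3", "LM4"),
    "knee_LM8_LM10": ("LM8", "LM9", "LM10"),
}


def classify_examination(fully_compressed: Dict[str, bool]) -> str:
    """Return 'DVT excluded' only if every landmark group was fully compressed.

    Assumption: a group that is missing from the input counts as
    not compressed, so the examination falls back to 'suspected DVT'.
    """
    for group in LANDMARK_GROUPS:
        if not fully_compressed.get(group, False):
            return "suspected DVT"
    return "DVT excluded"


if __name__ == "__main__":
    result = classify_examination(
        {"groin_LM0_LM1": True, "groin_LM2_LM4": True, "knee_LM8_LM10": False}
    )
    print(result)
```

A single non-compressible group is enough to flag the participant, which mirrors the requirement that all three compressions succeed before DVT is excluded.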
Two deep ML networks with identical architecture, as shown in Fig. 9, were trained on a GPU server (Nvidia Tesla K80) using the Adam optimizer with momentum 0.9 to optimise the parameters of the network. Binary cross entropy (BCE, Eqs. (1) and (2)) is used for the segmentation task (one-hot encoded) and the vein open/closed task. Cross entropy (CE, Eq. (3)) is used as the error metric for the anatomical landmark detection task.
$$L_{\text{segmentation}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log(p(y_i)) + (1 - y_i)\log(1 - p(y_i)) \,\right] \quad (1)$$

$$L_{\text{vein open or closed}} = -\left[\, y \log(p(y)) + (1 - y)\log(1 - p(y)) \,\right] \quad (2)$$

$$L_{\text{landmark}} = -\sum_{c} y_c \log(p(y_c)) \quad (3)$$

where y is the real label, p(y) is the predicted probability for the image belonging to this label, i indexes the N pixels of the segmentation map, and c indexes the landmark classes.

The total error metric (loss function) for our network results as a weighted sum of the three task losses, where α, β and γ are adjustable hyperparameters. We use α = 100 and β = γ = 1. The PyTorch deep learning framework 36 has been used for our implementation. A series of manually tuned temporal quality control functions ensures robust communication with the user regarding vessel location in the image, quality of compressions, imaging parameters and placement of the probe (Fig. 9).

The internal validation set (n = 26 healthy subjects, held out from training) has been used to test the models' performance during development by comparing segmentations to manual delineations of the vessels and manual, categorical image labels with respect to the anatomical locations (LM0-LM10) and the vessel compression status (open or fully closed). For categorical labels, the F1-score is used,

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \quad (5)$$

and for segmentation masks the Sørensen-Dice coefficient is applied per label (background, artery, vein),

$$\text{DICE} = \frac{2\,\text{true positive pixels}}{2\,\text{true positive pixels} + \text{false positive pixels} + \text{false negative pixels}} \quad (6)$$

Fig. 10 Prototype implementation user interface. The AutoDVT software instructs users to locate a given landmark, instructs to perform a correct compression and evaluates the result automatically.
In addition, the bounding boxes for the individual segmentation masks are generated and the intersection over union (IoU = Jaccard index) is computed, which is a common performance metric for object detection tasks,

$$\text{IoU} = \frac{\text{Area of overlap with true bounding box}}{\text{Area of union with true bounding box}}$$

In an end-user scenario a non-expert operator would have three to five attempts to complete a compression, otherwise referral is recommended. A screenshot of the AutoDVT software during use is shown in Fig. 10. During our experiments, all compressions have been completed in under five attempts.
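The per-label Dice coefficient and the bounding-box IoU used for internal validation can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the evaluation code of the study: masks are nested lists of integer labels, the label encoding (for instance 0 = background, 1 = artery, 2 = vein) is hypothetical, bounding boxes use inclusive pixel coordinates, and a label that is absent from both masks is scored as a Dice of 1.0.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # 2D label map, one integer label per pixel
Box = Tuple[int, int, int, int]   # (row_min, col_min, row_max, col_max), inclusive


def dice(pred: Mask, truth: Mask, label: int) -> float:
    """Sorensen-Dice coefficient for one label, counted over pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    denom = 2 * tp + fp + fn
    # Assumption: if the label occurs in neither mask, report perfect agreement.
    return 2 * tp / denom if denom else 1.0


def bounding_box(mask: Mask, label: int) -> Optional[Box]:
    """Tight bounding box around all pixels of a label, or None if absent."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value == label]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union (Jaccard index) of two inclusive pixel boxes."""
    inter_h = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_w = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(0, inter_h) * max(0, inter_w)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


if __name__ == "__main__":
    truth = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    pred = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
    print(dice(pred, truth, label=1))
    print(box_iou(bounding_box(pred, 1), bounding_box(truth, 1)))
```

Dice rewards pixel-level overlap, whereas the box IoU only checks whether the vessel was localised in roughly the right place, which is why the two metrics are reported side by side.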
Technical uniqueness of the proposed framework. We propose a triple-task convolutional neural network (CNN) fully integrated into a clinical prototype device that jointly classifies the anatomical landmark plane in the current field-of-view, scores vein compressibility and provides semantic segmentation masks for arteries and veins. The proposed network architecture can intrinsically learn to interpret video data to perform localisation, segmentation, local deformation estimation and classification from weak discrete labels that characterise whole images, e.g., anatomical landmark locations. Furthermore, it is designed to require a reasonably low number of floating-point operations to facilitate real-time performance.

Cost effectiveness
We simulated the potential cost-effectiveness of an ML-enabled approach at the front line of care, where non-specialists may perform the examination independently. A decision tree analytic model was designed and implemented in Microsoft Excel (©Microsoft Corporation) to estimate the lifetime costs and benefits, measured in terms of quality-adjusted life years (QALYs), for different proximal DVT testing algorithms. The current clinically used diagnostic DVT algorithm is shown in Fig. 11a and possible integration strategies for our method are shown in Fig. 11b-f. The cost analysis model adheres to guidelines issued by the National Institute for Health and Care Excellence (NICE) 6 . It uses a UK NHS and personal social services perspective with costs at 2018/19 prices and with discounting for both costs and QALYs being undertaken at 3.5% per annum. Note that costs associated with tangible and intangible expenses that families can incur in the event of disability or even death due to misdiagnosis would not be considered by either NHS or personal social services expenditure and are commonly excluded from a NICE appraisal 6 .

Fig. 11 Possible integration strategies for our approach into DVT diagnostics pathways. a current clinical algorithm to diagnose DVT without software support according to UK NICE guidelines 6 and b-f possible variants to integrate ML software support into the clinical pathway. Algorithms 1-3 shown in (b-d) generate a positive net monetary benefit (cf. Fig. 6). The examined modifications have been suggested by health economics and clinical experts. Note that treatment options may further depend on the age of the clot, which might be manually estimated during confirmatory ultrasound scans 43 .
The model uses sensitivity (the ability of a test to correctly identify a patient with a true proximal DVT) and specificity (the ability of a test to correctly identify a patient without a true proximal DVT) as measured on the external validation sets in this study. We also include clinical tests (Wells Score, D-dimer, and proximal ultrasound) that form part of the diagnostic algorithm. Our cost analysis model splits patients into two subgroups at the start of each algorithm: a subgroup in which patients have a proximal DVT and a subgroup in which patients do not have a proximal DVT. Measured sensitivity and specificity values are used alongside an estimate of the prevalence of proximal DVT of 14.7%, taken from Kilroy et al. 37 , to estimate the number of patients (from a cohort of user-specified size) that receive each clinical test and their ultimate diagnoses (proximal DVT or not). Patients with a diagnosed proximal DVT will receive treatment.
We generate four possible outcomes for patients based on their DVT status and the results of each diagnostic algorithm: treated patients with a true DVT (true positive patients), treated patients without a true DVT (false positive patients), untreated patients without a true DVT (true negative patients) and untreated patients with a true DVT (false negative patients). Each of the four diagnostic accuracy outcomes has estimated associated costs incurred and utility accrued for the patients. These numbers are multiplied by the proportion of patients in each outcome and are combined with the costs of each test to obtain estimates of the total costs and QALYs for the diagnostic algorithm. When costs and QALYs are obtained for diagnostic algorithms with and without the ML model, the estimated incremental cost-effectiveness ratio for AutoDVT can be calculated.
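The 3.5% annual discounting of costs and QALYs mentioned above can be illustrated with a short sketch. The function name `discounted_total` and the convention that the first year is left undiscounted are assumptions for this example; the input streams are placeholders, not values from the study.

```python
from typing import Iterable


def discounted_total(annual_values: Iterable[float], rate: float = 0.035) -> float:
    """Present value of a yearly stream of costs or QALYs.

    Assumption: year 0 is not discounted and year t is divided by (1 + rate) ** t.
    """
    return sum(value / (1.0 + rate) ** year
               for year, value in enumerate(annual_values))


if __name__ == "__main__":
    # Placeholder stream: one quality-adjusted life year per year for ten years.
    print(round(discounted_total([1.0] * 10), 3))
```

The same function applies to cost streams, since the model discounts costs and QALYs at the same rate.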
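The decision-tree arithmetic described above can be sketched as follows. This is a simplified illustration, not the Excel model used in the study: it collapses a whole diagnostic algorithm into a single overall sensitivity and specificity, and every cost and QALY value in the usage example is a placeholder. Only the prevalence of 14.7% and the willingness to pay of £20,000/QALY are taken from the text; all function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Outcome:
    cost: float    # discounted lifetime cost per patient in this outcome
    qalys: float   # discounted lifetime QALYs per patient in this outcome


def outcome_proportions(prevalence: float, sensitivity: float,
                        specificity: float) -> Dict[str, float]:
    """Split a cohort into the four diagnostic accuracy outcomes."""
    return {
        "TP": prevalence * sensitivity,
        "FN": prevalence * (1.0 - sensitivity),
        "TN": (1.0 - prevalence) * specificity,
        "FP": (1.0 - prevalence) * (1.0 - specificity),
    }


def expected_cost_and_qalys(proportions: Dict[str, float],
                            outcomes: Dict[str, Outcome],
                            test_cost: float) -> Tuple[float, float]:
    """Per-patient expected cost (tests plus outcomes) and expected QALYs."""
    cost = test_cost + sum(p * outcomes[k].cost for k, p in proportions.items())
    qalys = sum(p * outcomes[k].qalys for k, p in proportions.items())
    return cost, qalys


def icer(cost_new: float, qalys_new: float,
         cost_ref: float, qalys_ref: float) -> float:
    """Incremental cost-effectiveness ratio (cost per QALY gained)."""
    return (cost_new - cost_ref) / (qalys_new - qalys_ref)


def net_monetary_benefit(delta_cost: float, delta_qalys: float,
                         willingness_to_pay: float = 20000.0) -> float:
    """Net monetary benefit of the new pathway relative to the reference."""
    return willingness_to_pay * delta_qalys - delta_cost


if __name__ == "__main__":
    # All outcome costs, QALYs, test costs and accuracies below are placeholders.
    outcomes = {
        "TP": Outcome(cost=1000.0, qalys=11.0),
        "FP": Outcome(cost=900.0, qalys=11.5),
        "TN": Outcome(cost=0.0, qalys=11.58),
        "FN": Outcome(cost=1500.0, qalys=10.0),
    }
    ref = expected_cost_and_qalys(outcome_proportions(0.147, 0.90, 0.90), outcomes, 150.0)
    new = expected_cost_and_qalys(outcome_proportions(0.147, 0.92, 0.90), outcomes, 120.0)
    print(net_monetary_benefit(new[0] - ref[0], new[1] - ref[1]))
```

A positive net monetary benefit means the QALY gain, valued at the willingness-to-pay threshold, outweighs the additional cost; the price ceiling per software-supported examination is the test cost at which this value reaches zero.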
Parameters for the cost-effectiveness model. Test characteristics have been taken from Goodacre et al. 24 and are presented in Table 9 with the statistical distributions used in stochastic analysis presented in Tables 10-13.
Treatment reduces the probability of a patient with a DVT experiencing a fatal or non-fatal pulmonary embolism (PE) or post-thrombotic syndrome (PTS). However, treatment is associated with risks of fatal haemorrhage, non-fatal intracranial haemorrhage, and non-fatal non-intracranial haemorrhage.
According to Goodacre et al. 24 , patients who do not experience any of a PE, PTS or a haemorrhage accrue a mean of 11.58 discounted lifetime QALYs. Mean quality of life multipliers for PTS, non-fatal PE and non-fatal intracranial haemorrhage of 0.977, 0.94, and 0.29, respectively, were also presented by Goodacre et al. 24 , with statistical distributions used in the stochastic analysis presented in Tables 10-13. These data were used to estimate total QALYs for the four diagnostic accuracy outcomes.
The lifetime, discounted, QALYs accrued by patients in each classification differ based on their true DVT status and their results from each diagnostic algorithm. Untreated patients with a DVT remain at high risk of PE and PTS but do not have the risks of haemorrhage associated with treatment. Treated patients with a DVT have reduced risks of PE and PTS but have the risk of haemorrhage associated with treatment. Treated patients without a DVT have the same risk of PE and PTS as the general population but are subject to the risks of haemorrhage associated with treatment. Untreated patients without a true DVT will accrue the same discounted lifetime QALYs as the general population. The QALYs accrued in each of the four diagnostic accuracy outcomes are shown in Table 14.
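One way to combine the baseline QALYs with the quality of life multipliers is sketched below. This is an assumption-laden illustration rather than the method of the study: it treats the adverse events as mutually exclusive, applies each multiplier to the full baseline of 11.58 QALYs, and assigns zero QALYs to fatal events. The event probabilities in the usage example are placeholders, and the function name and event keys are hypothetical.

```python
from typing import Dict

BASELINE_QALYS = 11.58  # mean discounted lifetime QALYs without any event (from the text)

# Multipliers for non-fatal events are taken from the text.
# Assumption: fatal events are given a multiplier of 0 in this sketch.
MULTIPLIERS = {
    "PTS": 0.977,
    "non_fatal_PE": 0.94,
    "non_fatal_intracranial_haemorrhage": 0.29,
    "fatal_PE": 0.0,
    "fatal_haemorrhage": 0.0,
}


def expected_qalys(event_probs: Dict[str, float],
                   multipliers: Dict[str, float] = MULTIPLIERS,
                   baseline: float = BASELINE_QALYS) -> float:
    """Expected QALYs for one outcome group.

    Assumption: events are mutually exclusive, so their probabilities
    sum to at most 1 and the remainder experiences no event.
    """
    total_p = sum(event_probs.values())
    if total_p > 1.0 + 1e-12:
        raise ValueError("event probabilities must sum to at most 1")
    qalys = (1.0 - total_p) * baseline
    for event, p in event_probs.items():
        qalys += p * baseline * multipliers[event]
    return qalys


if __name__ == "__main__":
    # Placeholder probabilities for an untreated patient with a DVT.
    print(expected_qalys({"PTS": 0.2, "non_fatal_PE": 0.1, "fatal_PE": 0.02}))
```

Calling the function once per outcome group, each with its own event probabilities, would give four QALY values of the kind summarised in Table 14.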
The discounted lifetime costs associated with patient outcomes were taken from 24,38 . Where appropriate, costs were uplifted to 2018/19 values using inflation indices presented in Curtis et al. 38 .
The lifetime costs associated with PTS and non-fatal intracranial haemorrhage were both composite costs: for PTS, the cost of a first attendance at a vascular surgery outpatient clinic and the cost of subsequent vascular surgery outpatient clinic visits; for non-fatal intracranial haemorrhage, the cost of care in the first year and in subsequent years. The total cost associated with PTS and the method used to calculate it were included in Goodacre et al. 24 together with the costs of the components of the total cost. From this, it was estimated that the expected lifetime of patients with PTS was 11.67 years. No such information was provided for patients experiencing a non-fatal, non-intracranial haemorrhage and thus it was assumed that the same expected lifetime applied when calculating costs. Treatment for DVT consists of approximately eight days of low molecular weight (LMW) heparin followed by ninety days of warfarin.
The total cost of DVT treatment of £845 is estimated using the same derivation as that used in 24 with one change: the current version of the British National Formulary 39 indicates that the initial dose of LMW heparin in the treatment of DVT is a large loading dose with subsequent smaller maintenance doses, thus the initial loading dose will be associated with a greater cost than subsequent maintenance doses. The costs of LMW heparin and warfarin are taken from the current version of the British National Formulary 39 . Additional resource use such as GP visits and anticoagulant clinic visits and their unit costs 24,38 and NHS Reference Patients without a DVT who are treated 11.

[A]
[A]: The variance on these parameters is based on the variance of the parameters that make up these values.