Abstract
Antenatal hydronephrosis (HN) affects up to 5% of pregnancies and requires close, frequent follow-up to determine who may benefit from surgical intervention. We aimed to create an automated HN Severity Index (HSI) that helps guide clinical decision-making directly from renal ultrasound images. We applied a deep learning model to paediatric renal ultrasound images to predict the need for surgical intervention based on the HSI. The model was developed and studied at four large quaternary free-standing paediatric hospitals in North America. We evaluated the degree to which the HSI corresponded with surgical intervention at each hospital using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), sensitivity, and specificity. The HSI predicted subsequent surgical intervention with > 90% AUROC, > 90% sensitivity, and > 70% specificity in a test set of 202 patients from the same institution. At three external institutions, the HSI achieved AUROCs ≥ 90%, sensitivities ≥ 80%, and specificities > 50%. It is possible to automatically and reliably assess HN severity directly from a single ultrasound. The HSI stratifies low- and high-risk HN patients, thereby helping to triage low-risk patients while maintaining very high sensitivity to surgical cases. HN severity can be predicted from a single patient ultrasound using a novel image-based artificial intelligence system.
Introduction
Antenatal hydronephrosis (HN) is a common prenatal ultrasound finding, detected in 2–5% of fetuses1. After birth, the condition is closely monitored, and up to 80% of cases resolve without intervention. In the remaining patients, HN may be secondary to a pathologic process, such as ureteropelvic junction obstruction (UPJO), ureterovesical junction obstruction (UVJO), or vesicoureteral reflux (VUR), which may benefit from surgical intervention. The challenge is to risk-stratify patients early in life. Because this is not currently possible, infants with HN are monitored with serial ultrasounds, and many undergo invasive testing that requires urethral catheterization, intravenous access, and exposure to radioisotopes and radiation. In addition to the anxiety, discomfort, and morbidity related to these tests, there is growing concern about a potential link between radiation exposure and future malignancies2. Risk stratification using ultrasound images alone could streamline care for low-risk patients, reduce the number of patients investigated with invasive tests, and help providers comply with the as low as reasonably achievable (ALARA) radiation principle, while expediting interventions for those who may benefit.
Machine learning (ML) models have shown tremendous promise in healthcare, including for patients with HN. Clinical variables have been used to predict which patients are most likely to progress to surgery or to develop urinary tract infection (UTI)3,4. Standardized assessment of anatomical regions of the kidney in ultrasound images has been explored in multiple works, including the parenchyma to hydronephrosis area ratio5, the hydronephrosis index comparing the total kidney area with the renal pelvis area6, automatic segmentation of kidney regions in the ultrasound to predict obstruction7, and morphometric feature extraction from kidney ultrasound8. Others have developed a convolutional neural network model to broadly classify HN as Society for Fetal Urology (SFU) low vs. high grade based on the full ultrasound image of the kidney9; however, the clinical utility of such a distinction is unclear. Providers use this grading system to communicate the severity of HN, but the HN grade alone does not inform clinical decision-making. In addition, assigning HN grades relies on the subjective assessment of repeated patient imaging, which has been shown to be highly variable, with poor reliability among raters10,11,12. Models trained on these grades therefore inherit a critical bias.
Objective
In an attempt to remove the subjectivity of HN kidney ultrasound interpretation and to broaden access to a reliable assessment tool, we built on our published proof of concept13 to develop a model that estimates the risk of requiring surgery for patients with HN directly from ultrasound images. We refer to the output of this model as the HN Severity Index (HSI). Herein, we test this score at 4 large paediatric quaternary care institutions in North America for its ability to discriminate between surgical and non-surgical HN patients, with the goal of moving follow-up and assessment for this condition from the current standard of care (Fig. 1A) to a more streamlined approach, particularly with fewer follow-ups and scans for low-risk patients (Fig. 1B).
Materials and methods
The aim of this study was to evaluate the HSI score for paediatric patients with HN. Treating surgical cases (i.e. obstructive HN) as the ground-truth label, we tested the model on a prospectively collected set of 202 consecutive patients from the same institution using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), sensitivity (true positives/all positive cases), and specificity (true negatives/all negative cases). The HSI was then tested on data from 3 additional paediatric hospitals. A power analysis based on the SickKids training data (Supplementary Methods) was used to assess the power of each HSI test; the prospective sample size at the development institution (SickKids) targeted 80% power at a 0.2 null hypothesis margin. The present study is reported in compliance with the Standardized Reporting of Machine Learning Applications in Urology (STREAM-URO) framework (Supplementary Table 8)14.
Ethical review
Each site received approval for this work from its respective Institutional Review Board (IRB) or Research Ethics Board (REB). De-identified data were collected via retrospective chart review; therefore, a waiver of consent was applied. Specifically, approval for data collection and analysis was granted by the Hospital for Sick Children REB, the Children’s Hospital of Philadelphia IRB, the Lucile Packard Children’s Hospital IRB, and the University of Iowa Stead Children’s Hospital IRB. All research was performed in accordance with relevant guidelines and regulations.
Retrospective data collection
Data collection included all samples from our original study13, extended with additional SickKids samples for training and new, prospectively collected samples for testing. We included HN patients who were seen in the paediatric urology clinic between 2015 and 2019, were less than 24 months of age at baseline, and had ultrasound findings of isolated hydronephrosis or hydroureteronephrosis. Patients with vesicoureteral reflux (VUR) were also included if the reflux was diagnosed during the workup of HN and was associated with HN on ultrasound. Patients with VUR detected after a urinary tract infection (UTI) without evidence of HN were excluded, as were those with known congenital anomalies of the urinary tract, such as duplication anomalies, posterior urethral valves and neurogenic bladder. Including children with VUR diagnosed during the work-up of HN allowed a fair comparison with children who have HN of unknown VUR status, while excluding patients without HN or with more complex anomalies ensured the a priori consistency of the condition being assessed. De-identified kidney ultrasound images, along with a linked set of clinical characteristics, were retrospectively collected at each study site. Captured variables included patient age, sex, kidney laterality and any surgical intervention. Ultrasound images and the surgical intervention variable were used to develop and test the model, whereas the remaining variables were used to stratify model performance and assess bias. HN was graded according to the SFU grading system15 by a paediatric radiologist and by experienced paediatric urology clinicians. One representative sagittal view and one representative transverse view were collected as PNG screenshots centered on the kidney.
This was done by reviewing the full ultrasound sequence and selecting the sagittal and transverse kidney images that appeared clearest to the person making the selection. Images were selected by urologists, trainees, and research assistants who had been trained in image selection, most often choosing the images that had been used to measure sagittal kidney size and anteroposterior diameter. All kidney images were then resized to 256 × 256 pixels, converted to greyscale, contrast-adjusted by histogram equalization, and saved as PNG files. In some cases the images were first saved in a different image format before conversion (image preparation 1); in other cases they were saved as PNG files throughout (image preparation 2). This procedure was applied to all images from every machine and institution.
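The image preparation steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the text does not state the interpolation used for resizing or the exact equalization implementation, so nearest-neighbour resizing, BT.601 luma weights for greyscale conversion, and standard CDF-based global histogram equalization are assumptions, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of 8-bit greyscale intensities (0-255)


def rgb_to_grey(r: int, g: int, b: int) -> int:
    """Greyscale conversion with ITU-R BT.601 luma weights (assumed choice)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def resize_nearest(img: Image, size: int = 256) -> Image:
    """Nearest-neighbour resize to size x size (interpolation method assumed)."""
    h, w = len(img), len(img[0])
    return [[img[(i * h) // size][(j * w) // size] for j in range(size)]
            for i in range(size)]


def equalize(img: Image, levels: int = 256) -> Image:
    """Global histogram equalization: map each intensity through the normalized CDF."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest occupied level
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def prepare(img_rgb: Sequence[Sequence[Tuple[int, int, int]]]) -> Image:
    """Hypothetical pipeline: greyscale, resize to 256 x 256, then equalize.
    Greyscale conversion and nearest-neighbour resizing commute, so their order here is immaterial."""
    grey = [[rgb_to_grey(*px) for px in row] for row in img_rgb]
    return equalize(resize_nearest(grey, 256))
```

Equalization spreads the occupied intensity levels across the full 0 to 255 range, which is one way to reduce brightness and gain differences between ultrasound machines before the images reach the network.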
Hydronephrosis Severity Index
We defined the HN Severity Index (HSI) as the likelihood that HN is secondary to an obstruction. The HSI ranges from 0 to 1, with 1 indicating obstruction with certainty and 0 indicating no probability of obstruction. The HSI threshold was set to target 90% sensitivity in HN patients from SickKids. We propose that different HSI thresholds can be used to support different clinical management decisions at individual institutions or in different clinical settings. In this work, the SickKids-derived model and threshold were applied unchanged at the other institutions to assess the transferability of this single-institution management strategy and model to independent institutions and clinical-management teams.
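A minimal sketch of how a threshold targeting 90% sensitivity could be chosen, following the rule stated under “Model performance calculation” (the highest value that still achieves the target sensitivity on validation data). The function names are hypothetical, and treating a score equal to the threshold as predicted obstructed (score ≥ threshold) is an assumption.

```python
import math
from typing import Sequence


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float = 0.90) -> float:
    """Highest threshold t such that calling score >= t 'obstructed' captures
    at least `target` of the obstructed (label 1) cases."""
    if not 0 < target <= 1:
        raise ValueError("target sensitivity must be in (0, 1]")
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one obstructed (positive) case")
    # Number of positives that must lie at or above t; the small epsilon guards
    # against floating-point error pushing target * n just above an integer.
    k = math.ceil(target * len(pos) - 1e-9)
    return pos[k - 1]


def predict_obstructed(hsi: float, threshold: float) -> bool:
    """Apply a fixed (e.g. SickKids-derived) threshold to a new patient's HSI."""
    return hsi >= threshold
```

The threshold is estimated once and then reused unchanged on new data; this is what makes the external-site results a test of transferability rather than of per-site recalibration.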
Surgical indications and confirmation of diagnosis
Obstruction was defined as decreased differential function (< 40%) at baseline, a decrease of ≥ 5% in function between serial nuclear scans, prolonged drainage time (T1/2), progression of HN, and/or development of UTI, pain or calculi in the setting of HN (Supplementary Fig. 4). UTIs were diagnosed from catheter specimens in febrile children with a positive urinalysis (leukocyte esterase ± nitrites) and a positive urine culture of > 50,000 CFU/mL of a single organism. For patients deemed to be obstructed, surgical management included pyeloplasty for ureteropelvic junction obstruction16 and ureterostomy17, ureteral reimplantation18 or ureterovesicostomy for ureterovesical junction obstruction19. In all surgical patients in this study, the diagnosis of obstruction (UPJO or UVJO) was confirmed intraoperatively and supported by pathology when applicable; the decision to send specimens for pathological assessment was at the discretion of the surgeon. We did not include surgeries performed solely to address VUR, such as endoscopic injection or ureteral reimplantation. Resolution of HN was defined as SFU ≤ 1 or APD ≤ 10 mm on at least 2 consecutive ultrasounds.
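The resolution criterion can be made explicit with a small helper. This is a hypothetical sketch: it assumes ultrasounds are supplied in chronological order as (SFU grade, APD in mm) pairs, and that a missing measurement simply does not satisfy its half of the criterion.

```python
from typing import Optional, Sequence, Tuple

# One ultrasound = (SFU grade, APD in mm); either value may be missing (None).
Scan = Tuple[Optional[int], Optional[float]]


def is_resolved_scan(sfu: Optional[int], apd_mm: Optional[float]) -> bool:
    """A single ultrasound meets the criterion if SFU <= 1 or APD <= 10 mm."""
    return (sfu is not None and sfu <= 1) or (apd_mm is not None and apd_mm <= 10.0)


def hn_resolved(scans: Sequence[Scan]) -> bool:
    """True once at least 2 consecutive (chronological) ultrasounds meet the criterion."""
    run = 0
    for sfu, apd in scans:
        run = run + 1 if is_resolved_scan(sfu, apd) else 0
        if run >= 2:
            return True
    return False
```

Requiring two consecutive qualifying scans means a single reassuring ultrasound followed by a worse one does not count as resolution.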
Machine learning model
A Siamese convolutional neural network (CNN) was trained from randomly initialized weights using only 2 kidney ultrasound images (sagittal and transverse) and surgery labels (Fig. 1C).
Model architecture
The original model used in this study is a seven-layer convolutional Siamese neural network, described in detail in Erdman et al.13, trained to discriminate between obstructed and non-obstructed HN cases. The model takes two 256 × 256 pixel images and passes each through the same (i.e. Siamese) convolutional layers. Here the Siamese architecture regularizes the model weights while using images from two different ultrasound views. Each single-channel (greyscale) image is replicated to three channels to mimic an RGB input. The first convolution applies a kernel of 11 pixels, with a stride of 2 pixels and 0 padding. The second layer has a kernel size of 5, stride of 1, and padding of 2. The following 3 convolutions have a kernel size of 3 with padding of 1; the sixth convolution has a kernel size of 2, stride of 1, and padding of 1; and the final convolution has a kernel size of 3, stride of 2, and padding of 0. We then flatten the output and pass it through a fully connected layer, concatenate the outputs from the sagittal and transverse views, and pass the concatenated feature vector through 3 additional fully connected layers. We did not test alternative convolutional architectures, as our previous work13 showed no significant difference in performance between the custom architecture described here and DenseNet-121 (ref. 20), ResNet-18 (ref. 21), or VGG-16 (ref. 22).
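To make the layer description concrete, the sketch below traces the spatial size of each 256 × 256 input through the seven convolutions using the standard convolution output-size formula. It is an illustration under assumptions: the stride of convolutions 3 to 5 is not stated and is taken as 1, and any pooling or normalization layers in the original network (none are described here) are ignored, so the real feature-map sizes may differ.

```python
from typing import List


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution along one spatial axis (dilation 1):
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for the seven convolutions described in the text.
# Assumption: convolutions 3-5 use stride 1 (not stated in the source).
LAYERS = [(11, 2, 0), (5, 1, 2), (3, 1, 1), (3, 1, 1), (3, 1, 1), (2, 1, 1), (3, 2, 0)]


def trace_shapes(size: int = 256) -> List[int]:
    """Spatial size after each convolution, starting from the input size."""
    sizes = [size]
    for kernel, stride, padding in LAYERS:
        sizes.append(conv_out(sizes[-1], kernel, stride, padding))
    return sizes
```

Under these assumptions, each view leaves the convolutional stack as a 61 × 61 map per channel; both views go through identical weights, so the two flattened vectors are directly comparable before concatenation.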
Model training
Our model was trained and tested using Python 3.8 and PyTorch v1.7.0. We trained the model for 50 epochs using a stochastic gradient descent (SGD) optimizer with a learning rate of 0.001, momentum of 0.9, and weight decay of 5e−4. These hyperparameters were selected in our previous work13, in which we retrospectively validated the model using fivefold cross-validation and a grid search on validation-set performance to identify the best parameter set. Model training and evaluation were performed on a laptop with an NVIDIA GeForce RTX 2070 Max-Q GPU.
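For readers unfamiliar with how these optimizer settings interact, the sketch below writes out one SGD step with momentum and L2 weight decay on plain Python floats, following the update rule documented for PyTorch's `torch.optim.SGD` with its defaults (no dampening, no Nesterov momentum). It is illustrative only; the function name is hypothetical, and the original training used PyTorch directly.

```python
from typing import List, Tuple


def sgd_step(params: List[float], grads: List[float], bufs: List[float],
             lr: float = 1e-3, momentum: float = 0.9,
             weight_decay: float = 5e-4) -> Tuple[List[float], List[float]]:
    """One SGD update per parameter:
    g = grad + weight_decay * p      (L2 penalty folded into the gradient)
    buf = momentum * buf + g         (momentum buffer; starts at 0)
    p = p - lr * buf"""
    new_params, new_bufs = [], []
    for p, g, b in zip(params, grads, bufs):
        g = g + weight_decay * p
        b = momentum * b + g
        new_params.append(p - lr * b)
        new_bufs.append(b)
    return new_params, new_bufs
```

With momentum 0.9, the buffer accumulates roughly the last ten gradients, so the effective step size in a consistent gradient direction approaches ten times the nominal learning rate of 0.001.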
Model selection
We held out a random 20% of our data as unseen test data and trained our model for 50 epochs using fivefold cross-validation on the remainder. For each epoch in each fold, we set the threshold at which the validation set achieved 90% sensitivity (Supplementary Fig. 3). We then computed, for each epoch, the average test-set sensitivity of the five fold models and selected the epoch in which the test sensitivity was > 90% and the specificity was maximized. This epoch was then used as the stopping point for a model trained with a single training/validation split.
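The epoch-selection rule can be sketched as follows. This is a hypothetical illustration: it assumes per-fold test sensitivities and specificities (each computed at that fold's 90%-sensitivity validation threshold) have already been collected, and that specificity, like sensitivity, is averaged across the five folds, which the text does not state explicitly.

```python
from statistics import mean
from typing import Dict, List, Tuple


def select_epoch(fold_metrics: Dict[int, List[Tuple[float, float]]],
                 min_sensitivity: float = 0.90) -> int:
    """fold_metrics[epoch] = [(test_sensitivity, test_specificity), ...], one per fold.
    Returns the epoch whose mean sensitivity exceeds the target and whose mean
    specificity is highest (earliest epoch wins ties)."""
    best_epoch, best_spec = None, -1.0
    for epoch in sorted(fold_metrics):
        sens = mean(m[0] for m in fold_metrics[epoch])
        spec = mean(m[1] for m in fold_metrics[epoch])
        if sens > min_sensitivity and spec > best_spec:
            best_epoch, best_spec = epoch, spec
    if best_epoch is None:
        raise ValueError("no epoch exceeded the sensitivity target")
    return best_epoch
```

Sensitivity acts as a hard constraint and specificity as the quantity being optimized, which mirrors the clinical priority of not missing surgical cases.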
Model performance calculation
Model performance was assessed using 4 statistics: AUROC, AUPRC, sensitivity, and specificity. AUROC was computed using the pROC v1.17.0.1 package23 and AUPRC using the EGAD v1.18.0 package24 in R v4.0.2. For sensitivity and specificity, we first found the highest threshold value in the validation set that achieves 90% sensitivity. This threshold was then used to split observations in the unseen data into predicted obstructed (above the threshold) and predicted non-obstructed (below the threshold). Sensitivity was then the share of obstructed cases predicted to be obstructed, and specificity the share of non-obstructed cases predicted to be non-obstructed. 95% confidence intervals (α = 0.05) were computed by bootstrapping: observations were drawn from the dataset with replacement to create 500 simulated datasets of the same size as the dataset for which the confidence interval was being computed, and AUROC, AUPRC, sensitivity and specificity were computed for each simulated dataset. Each statistic was then ordered over the full set of simulated datasets, and the values at the 2.5th and 97.5th percentiles were used as the lower and upper confidence bounds, respectively.
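The percentile bootstrap described above can be sketched in plain Python. This is not the authors' R code: AUROC is computed here as the Mann-Whitney probability that an obstructed case outscores a non-obstructed one (the same quantity as the empirical ROC area), AUPRC is omitted for brevity, resamples containing only one class are redrawn (an assumption the text does not address), and the percentile is taken as the nearest order statistic without interpolation. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random obstructed score > random non-obstructed score); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold means 'obstructed'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    return tp / n_pos, tn / (len(labels) - n_pos)


def percentile(sorted_vals: List[float], q: float) -> float:
    """Nearest-order-statistic percentile of an already sorted list."""
    idx = min(len(sorted_vals) - 1, max(0, int(round(q * (len(sorted_vals) - 1)))))
    return sorted_vals[idx]


def bootstrap_ci(scores: Sequence[float], labels: Sequence[int], threshold: float,
                 n_boot: int = 500, alpha: float = 0.05,
                 seed: int = 0) -> Dict[str, Tuple[float, float]]:
    rng = random.Random(seed)
    n = len(scores)
    stats: Dict[str, List[float]] = {"auroc": [], "sensitivity": [], "specificity": []}
    while len(stats["auroc"]) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        s = [scores[i] for i in idx]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # both classes needed for every statistic
            stats["auroc"].append(auroc(s, y))
            se, sp = sens_spec(s, y, threshold)
            stats["sensitivity"].append(se)
            stats["specificity"].append(sp)
    return {k: (percentile(sorted(v), alpha / 2), percentile(sorted(v), 1 - alpha / 2))
            for k, v in stats.items()}
```

Note that the threshold is held fixed across resamples, so the intervals reflect sampling variability in the evaluation data, not uncertainty in the threshold itself.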
Model evaluation data
Following development and retrospective testing of our model, data for the prospective SickKids sample were collected from patients after they had been evaluated and their treatment decision made. Patients were selected using the same inclusion criteria described in the “Retrospective data collection” section. Model performance was assessed overall and stratified by patient features (age, sex, and postal code) and patient visit features (ultrasound machine and date) to assess bias. Postal codes, which are equivalent to American ZIP codes but were available only in our Canadian data, were used to assess model performance bias across patients from systematically different geographic regions of the province. We next evaluated the correlation between the HSI and surgical indication at three independent institutions, using the same patient selection criteria: Stanford Children’s Health (Stanford), University of Iowa Children’s Hospital (UIowa), and Children’s Hospital of Philadelphia (CHOP). These data were then passed through the SickKids-trained model to evaluate its ability to generalize to different settings.
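Stratified evaluation amounts to recomputing a metric within each subgroup. The sketch below does this for sensitivity at a fixed threshold. The grouping variable (e.g. sex, machine, or postal-code region), the function name, and the rule of reporting only groups with more than 10 patients (mirroring the sample-size cut-off used in the Results) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def sensitivity_by_group(scores: Sequence[float], labels: Sequence[int],
                         groups: Sequence[Hashable], threshold: float,
                         min_size: int = 10) -> Dict[Hashable, Optional[float]]:
    """Sensitivity within each subgroup; None if the group has <= min_size
    patients or contains no obstructed cases."""
    size: Dict[Hashable, int] = defaultdict(int)
    pos: Dict[Hashable, int] = defaultdict(int)
    hit: Dict[Hashable, int] = defaultdict(int)
    for s, y, g in zip(scores, labels, groups):
        size[g] += 1
        if y == 1:
            pos[g] += 1
            if s >= threshold:
                hit[g] += 1
    return {g: (hit[g] / pos[g] if size[g] > min_size and pos[g] > 0 else None)
            for g in size}
```

Returning None for small groups makes it explicit that an estimate was withheld rather than reported as 0% or 100% from a handful of patients.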
Results
The HSI model was trained using a retrospectively collected dataset of 1938 ultrasound images from 403 patients, with linked health records, from SickKids, of which a random 80% was used for training and 20% for testing (Supplementary Table 1). Of the 403 patients, 96 (24%) underwent surgical intervention: pyeloplasty was the most common procedure (74/96; 77%), followed by ureterovesicostomy (19/96; 20%) and ureterostomy/reimplantation (3/96; 3%).
Model generalization was tested prospectively in 202 consecutive patients evaluated at the SickKids Urology clinic. Of these, 28 (14%) underwent surgical intervention. As in the training set, pyeloplasty was the most common procedure (20/28; 71%), followed by ureterovesicostomy (7/28; 25%) and ureterostomy/reimplantation (1/28; 4%).
Overall, the model scores produced an area under the receiver operating characteristic curve (AUROC) of 93%, indicating strong discrimination between obstructed and non-obstructed patients across model thresholds, and an area under the precision-recall curve (AUPRC) of 58% (Table 2); for comparison, an uninformative score would be expected to reach an AUPRC roughly equal to the proportion of surgical cases (14% of patients in this cohort). The HSI ultrasound model therefore performs well, providing a score from only 2 images without the reliability issues inherent in grading HN.
Model performance was next evaluated at a threshold targeting 90% sensitivity, yielding 93% (95% CI 91%, 100%) sensitivity and 58% (47%, 70%) specificity in the prospective test set (Table 2), in line with our prior findings from retrospective testing13. The HSI model showed a negative predictive value (NPV) of 99% (95% CI 99%, 100%) and a positive predictive value (PPV) of 28% (23%, 34%). NPV is prioritized over PPV here because the HSI is intended first to triage patients whose HN is unrelated to obstruction, so that resources can be concentrated on patients more likely to be obstructed.
The HSI model was further evaluated for robust prediction across patient sex, age, number of previous ultrasounds, ultrasound machine, APD, affected side, postal code, and preprocessing batch to identify groups with low model performance. We found consistently high sensitivity in all patient groups with a sample size > 10 (Table 2, Fig. 2, Supplementary Table 5). Specificity fell for patients with higher APD: 9–14 mm (54%) and > 14 mm (16%), giving a PPV of 10% and NPV of 100% for patients with APD 9–14 mm, and a PPV of 51% and NPV of 92% for patients with APD > 14 mm (Fig. 2E). Group-specific findings remained consistent when only the most recent patient visits were considered (Supplementary Table 6). Among these patients, the HSI model was 100% sensitive and > 80% specific using only images from their first ultrasound, suggesting that the model may allow patient monitoring to be streamlined earlier than SFU grading allows.
Subsequently, we tested the model in three convenience samples from external institutions with different characteristics, including different proportions of surgical cases (non-obstructed vs. obstructed patients: Stanford 98 vs. 12, UIowa 16 vs. 53, and CHOP 29 vs. 57) and different age distributions (Table 1). Despite these differences, the model generalized effectively, with AUROCs of 90% for Stanford, 93% for UIowa, and 92% for CHOP (Table 2). Overall, the model was 89% sensitive and 73% specific on the Stanford dataset (99% NPV, 15% PPV), 96% sensitive and 54% specific on the UIowa dataset (92% NPV, 74% PPV), and 82% sensitive and 79% specific on the CHOP dataset (68% NPV, 89% PPV) (Fig. 1D, Table 2).
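Part of the variation in PPV and NPV across sites follows directly from the different proportions of obstructed patients. By Bayes' rule, at fixed sensitivity and specificity, PPV rises and NPV falls as prevalence increases, which matches the direction of the differences above (UIowa and CHOP, with majority-obstructed samples, show higher PPV and lower NPV than Stanford). The sketch below illustrates this using patient-level prevalences from the counts above; it is a general illustration with a hypothetical function name, not a reconstruction of the reported values, which depend on each site's actual sensitivity, specificity and unit of analysis.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """PPV and NPV implied by Bayes' rule for a given proportion of obstructed cases."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical operating point at a low (Stanford-like, 12/110) and a
# high (UIowa-like, 53/69) obstructed prevalence.
low_ppv, low_npv = predictive_values(0.9, 0.7, 12 / 110)
high_ppv, high_npv = predictive_values(0.9, 0.7, 53 / 69)
```

This is why NPV, rather than PPV, is the more stable summary for a triage tool applied where most referred patients are not obstructed.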
Finally, we broke down the model’s performance by patient sex, age and HN side (Table 2). At every institution, the model performed as well as or better in female than in male patients; however, none of these differences were significant, and sensitivity was not significantly below 90% for either sex. We also found no trend in performance across age groups. Sensitivity was not significantly below 90% for either HN side at any institution, except in small subgroups: right-sided HN at CHOP (61% sensitivity; 18 patients) and at Stanford (50% sensitivity; 6 patients).
Discussion
Artificial intelligence-driven evaluation of HN patients based on a single set of ultrasound images represents a novel and important opportunity to streamline evaluation, improve access to care and ensure safety for children through standardized clinical management. For years, the value of this technology has been an area of great interest in pediatric urology, and a clear next step in advancing care beyond the use of current classification systems. However, large datasets to build and evaluate models for HN are challenging to collect. In addition, concerns have been raised regarding the consistency of ultrasound images as well as the variability in clinical interpretation and management between different providers and institutions.
The present work tackles this challenge, providing a model that requires users to upload only a small number of images and reliably outputs a score that can easily be incorporated into the decision-making process. We tested the generalization of our model and found that the HSI can distinguish surgical (i.e. obstructed) HN cases from non-obstructed HN cases. Our power analysis shows that our datasets are sufficiently powered to establish clinically significant sensitivity and specificity. Combined with results that do not depend on reader experience or on the bias introduced by subjective image assessment, this offers an opportunity to standardize results and to compare care effectively between providers or centers. Moreover, HSI thresholds can be adjusted in different settings, allowing the output to be customized and refined as new patients are evaluated.
When evaluating patients with HN, ultrasound imaging is often supplemented with nuclear scans to determine differential function and drainage time. SickKids samples were used to explore the proportion of nuclear scans that could be avoided if this model were used in clinical practice. Under this strategy, if a patient's HSI was below the 90%-sensitivity threshold, no nuclear scan would be ordered and the follow-up interval would be lengthened; if the HSI was at or above the threshold, the clinical standard of care would be followed (Fig. 1B). Using this strategy, 18 of 36 (50%) non-obstructed patients would have avoided a nuclear scan at their first ultrasound, and 5 of 9 (56%) non-obstructed patients would have avoided a nuclear scan following their second ultrasound, with no obstructed patients seeing a deviation in their care (Supplementary Table 3.1). This is consistent with our original finding in a retrospective test set that our model could reduce nuclear scan usage in 58% of non-obstructed patients13.
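The triage rule above reduces to a simple count. In this hypothetical sketch, each patient contributes an HSI and a final obstructed/non-obstructed label; patients below the threshold are those who would skip a nuclear scan, and any obstructed patient below the threshold would represent a deviation from standard care. With the SickKids counts reported above, the proportions are 18/36 = 50% at the first ultrasound and 5/9 ≈ 56% after the second.

```python
from typing import Sequence, Tuple


def avoided_nuclear_scans(hsi: Sequence[float], obstructed: Sequence[bool],
                          threshold: float) -> Tuple[int, int, int]:
    """Returns (non-obstructed patients below threshold, all non-obstructed
    patients, obstructed patients below threshold, i.e. care deviations)."""
    below = [h < threshold for h in hsi]  # HSI below threshold: no nuclear scan
    non_obs_total = sum(1 for o in obstructed if not o)
    non_obs_avoided = sum(1 for b, o in zip(below, obstructed) if b and not o)
    obs_deviated = sum(1 for b, o in zip(below, obstructed) if b and o)
    return non_obs_avoided, non_obs_total, obs_deviated
```

The third count is the safety check: a useful threshold keeps it at zero while making the first count as large as possible.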
An important utility of our tool is therefore the opportunity to decrease the burden of HN monitoring and management on institutions and families. Our model can help identify patients who are unlikely to have HN related to a surgical condition; these children could potentially be evaluated by their primary care provider, with fewer additional ultrasounds, and avoid nuclear scans. Decreasing the frequency and type of monitoring for this population also benefits families. Many tertiary care facilities are located in large city centers and have wide catchment areas, so some families must travel long distances for their appointments, as access to tests and expertise may be limited outside these institutions25. The hidden costs of hospital visits include lost income from missing work, travel, lodging and food, which may be a challenge for some families26. In addition, providing this information to parents and caregivers after the first ultrasound could reduce anxiety and offer reassurance.
While there is value in the present work, there are important limitations to address. We acknowledge that there is no agreed-upon definition, or “gold standard”, for the diagnosis of obstruction. Thus, the decision to intervene surgically depends on the individual provider, patient and family, and their assessment of the available information. Providers use proxies to determine the presence of clinically significant obstruction, such as a decrease in differential function, worsening HN, delayed drainage time or the development of symptoms, and most surgeons rely on the same variables when recommending surgery across paediatric urology clinics27,28,29. In an attempt to objectively confirm the diagnosis of obstruction in our study, we reviewed the documented intraoperative findings as well as pathology when available. For non-surgical cases, non-obstructive HN was confirmed by resolution without intervention. We believe that these additional measures add validity to our study and partially explain the consistency of our model’s performance across different centers. In addition, we did not have access to consistent data on patient race or socioeconomic status across the datasets used in this study. While we include data from highly different geographic regions, and therefore different racial make-ups, a systematic assessment of the model’s performance by these factors would be valuable. Future work will therefore disentangle the potentially complex interplay between this algorithm and patient race and socioeconomic status.
Future work for assessing the HSI will include a cost–benefit analysis of the potential savings this tool could deliver by reducing low-risk patient follow-up and imaging in different clinical settings, including different clinical divisions (e.g. Urology, Nephrology, Emergency Department) and levels of care (primary to quaternary). In addition, this work uses manual selection and cropping of specific views of the kidney. While this selection and cropping can be performed by non-specialist clinicians and technicians, automating these steps would be beneficial and will be addressed in future adaptations of the model. We will also run a pilot to assess optimal deployment strategies for this tool in different settings, for example comparing its use when integrated into a Picture Archiving and Communication System (PACS), within the ultrasound machine, or as a desktop application on clinic computers. Finally, as noted above, we will examine how this model may alter care in settings with different racial make-ups of care providers and patients. Past studies have shown that patient race affects the timeline to surgery for UPJO30; therefore, even if patient race does not bias model performance, it may affect the cost–benefit of using this model for risk stratification. In addition, patient-physician racial concordance has been linked to patient outcomes31,32,33, so this association will also be included as a potential factor for model bias and for differences in cost–benefit in future assessments of this model.
Significance
We have demonstrated that our novel algorithm can accurately and reliably distinguish obstructive from non-obstructive HN directly from ultrasound images alone, using independent data from 4 different populations. This carries a significant potential impact on the care and management of children with HN. The model presented here can be used to develop multiple institution-specific models with far fewer training samples than were required for the original model34. This has important implications for smaller medical centers and settings where data collection and storage are challenging.
Conclusion
The HSI score, an artificial intelligence-generated prediction of HN severity based on ultrasound images alone, is accurate and generalizable, and avoids issues related to subjective interpretation. The use of this technology may help reduce invasive testing for children whose HN may resolve without intervention and expedite care for those who may benefit from it. In addition, this model could reduce the financial burden on families and institutions and offers the potential for standardization across health care settings and for risk stratification by those practicing in remote areas.
Data availability
Code for the models and tables of this work can be found at https://github.com/larunerdman/HN_Replicate, and power scripts can be found at https://github.com/ErikinBC/power_roc. Because the data for this work contain de-identified private health information approved for ethical use at each institution, they can be accessed via reasonable request to the authors, subject to further institution-level ethical and data-transfer approval.
References
Nguyen, H. T. et al. The Society for Fetal Urology consensus statement on the evaluation and management of antenatal hydronephrosis. J. Pediatr. Urol. 6, 212–231 (2010).
Little, M. P. et al. Leukaemia and myeloid malignancy among people exposed to low doses (<100 mSv) of ionising radiation during childhood: a pooled analysis of nine historical cohort studies. Lancet Haematol. 5, e346–e358 (2018).
Bertsimas, D., Li, M., Estrada, C., Nelson, C. & Scott Wang, H.-H. Selecting children with vesicoureteral reflux who are most likely to benefit from antibiotic prophylaxis: Application of machine learning to RIVUR. J. Urol. 205, 1170–1179 (2021).
Lorenzo, A. J., Rickard, M., Braga, L. H., Guo, Y. & Oliveria, J.-P. Predictive analytics and modeling employing machine learning technology: The next step in data sharing, analysis, and individualized counseling explored with a large, Prospective Prenatal Hydronephrosis Database. Urology 123, 204–209 (2019).
Rickard, M., Lorenzo, A. J. & Braga, L. H. Renal Parenchyma to hydronephrosis area ratio (PHAR) as a predictor of future surgical intervention for infants with high-grade prenatal hydronephrosis. Urology 101, 85–89 (2017).
Shapiro, S. R., Wahl, E. F., Silberstein, M. J. & Steinhardt, G. Hydronephrosis index: A new method to track patients with hydronephrosis quantitatively. Urology 72, 536–538 (2008) (discussion 538–9).
Roshanitabrizi, P. et al. Standardized analysis of kidney ultrasound images for the prediction of pediatric hydronephrosis severity. In Machine Learning in Medical Imaging 366–375 (Springer International Publishing, 2021).
Cerrolaza, J. J. et al. Quantitative ultrasound for measuring obstructive severity in children with hydronephrosis. J. Urol. 195, 1093–1099 (2016).
Smail, L. C., Dhindsa, K., Braga, L. H., Becker, S. & Sonnadara, R. R. Using deep learning algorithms to grade hydronephrosis severity: Toward a clinical adjunct. Front. Pediatr. 8, 1 (2020).
Rickard, M. et al. Six of one, half a dozen of the other: A measure of multidisciplinary inter/intra-rater reliability of the society for fetal urology and urinary tract dilation grading systems for hydronephrosis. J. Pediatr. Urol. 13, 80.e1–80.e5 (2017).
Keays, M. A. et al. Reliability assessment of Society for Fetal Urology ultrasound grading system for hydronephrosis. J. Urol. 180, 1680–1682 (2008) (discussion 1682–3).
Kim, S.-Y. et al. Comparison of the reliability of two hydronephrosis grading systems: The Society for Foetal Urology grading system vs. the Onen grading system. Clin. Radiol. 68, e484–e490 (2013).
Erdman, L. et al. Predicting obstructive hydronephrosis based on ultrasound alone. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020 493–503 (Springer International Publishing, 2020).
Kwong, J. C. C. et al. Standardized reporting of machine learning applications in urology: The STREAM-URO framework. Eur. Urol. Focus. https://doi.org/10.1016/j.euf.2021.07.004 (2021).
Palmer, L. S., Maizels, M., Cartwright, P. C., Fernbach, S. K. & Conway, J. J. Surgery versus observation for managing obstructive grade 3 to 4 unilateral hydronephrosis: A report from the Society for Fetal Urology. J. Urol. 159, 222–228 (1998).
Rickard, M. et al. Evolving trends in peri-operative management of pediatric ureteropelvic junction obstruction: Working towards quicker recovery and day surgery pyeloplasty. World J. Urol. https://doi.org/10.1007/s00345-021-03621-9 (2021).
Shrestha, A. L., Bal, H. S., Kisku, S. M. C. & Sen, S. Outcome of end cutaneous ureterostomy (ECU) as a non conservative option in the management of primary obstructive megaureters (POM). J. Pediatr. Urol. 14, 541.e1–541.e5 (2018).
Farrugia, M.-K. et al. British Association of Paediatric Urologists consensus statement on the management of the primary obstructive megaureter. J. Pediatr. Urol. 10, 26–33 (2014).
Alyami, F. A. et al. Side-to-side refluxing nondismembered ureterocystotomy: A novel strategy to address obstructed megaureters in children. J. Urol. 198, 1159–1167 (2017).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 4700–4708 (2017).
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. Preprint at http://arxiv.org/abs/1512.03385 (2015).
Simonyan, K. & Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Preprint at http://arxiv.org/abs/1409.1556 (2014).
Robin, X. et al. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 12, 77 (2011).
Ballouz, S., Weber, M., Pavlidis, P. & Gillis, J. EGAD: Ultra-fast functional analysis of gene networks. Bioinformatics 33, 612–614 (2017).
Whitley, J. A. et al. Availability of common pediatric radiology studies: Are rural patients at a disadvantage? J. Surg. Res. 234, 26–32 (2019).
Otis-Chapados, S., Coderre, K., Bolduc, S. & Moore, K. Evaluating the distance travelled for urological pediatric appointments. Can. Urol. Assoc. J. 13, 391–394 (2019).
Chandrasekharam, V. V. S., Babu, R., Arlikar, J., Satyanarayana, R. & Murali, K. N. Functional outcomes of pediatric laparoscopic pyeloplasty: Post-operative functional recovery is superior in infants compared to older children. Pediatr. Surg. Int. 37, 1135–1139 (2021).
Blanc, T. et al. Retroperitoneal laparoscopic pyeloplasty in children: long-term outcome and critical analysis of 10-year experience in a teaching center. Eur. Urol. 63, 565–572 (2013).
Cost, N. G., Prieto, J. C. & Wilcox, D. T. Screening ultrasound in follow-up after pediatric pyeloplasty. Urology 76, 175–179 (2010).
Nelson, C. P. Evidence of variation by race in the timing of surgery for correction of pediatric ureteropelvic junction obstruction. J. Urol. 178, 1463–1468 (2007).
Jetty, A. et al. Patient-physician racial concordance associated with improved healthcare use and lower healthcare expenditures in minority populations. J. Racial Ethn. Health Dispar. 9, 68–81 (2022).
Street, R. L. Jr., O’Malley, K. J., Cooper, L. A. & Haidet, P. Understanding concordance in patient-physician relationships: Personal and ethnic dimensions of shared identity. Ann. Fam. Med. 6, 198–205 (2008).
Shen, M. J. et al. The effects of race and racial concordance on patient-physician communication: A systematic review of the literature. J. Racial Ethn. Health Dispar. 5, 117–140 (2018).
Curth, A., Thoral, P., van den Wildenberg, W., Bijlstra, P., de Bruin, D., Elbers, P. W. G., et al. Transferring clinical prediction models across hospitals and electronic health record systems. In PKDD/ECML Workshops (1) 605–621 (2019).
Acknowledgements
We would like to thank the Bitove Family and the Hospital for Sick Children’s Women’s Auxiliary Volunteers for their generous financial support for this project.
Funding
Canadian Institutes of Health Research (L.E., A.G.).
Author information
Contributions
Conceptualization: L.E., M.R., A.G., A.J.L. Methodology: L.E., M.R., E.D., M.S., A.G., A.J.L. Investigation: M.R., K.S., D.A., K.N.V., M.E.C., J.DS., D.K., M.A.B., C.S.C., G.E.T., J.W., A.X., Y.F., B.V. Formal analysis: L.E., E.D., S.B.H. Visualization: L.E., E.D., S.B.H. Project administration: L.E., M.R., K.S., D.A., K.N.V., M.A.B., C.S.C., G.E.T., J.W., A.X., Y.F. Supervision: L.E., M.R., K.S., N.D.R., C.S.C., G.E.T., Y.F., B.V., A.G., A.J.L. Writing—original draft: L.E., M.R., A.G., A.J.L. Writing—review and editing: L.E., M.R., E.D., M.S., N.D.R., C.S.C., G.E.T., Y.F., B.V., A.G., A.J.L.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Erdman, L., Rickard, M., Drysdale, E. et al. The Hydronephrosis Severity Index guides paediatric antenatal hydronephrosis management based on artificial intelligence applied to ultrasound images alone. Sci Rep 14, 22748 (2024). https://doi.org/10.1038/s41598-024-72271-9
DOI: https://doi.org/10.1038/s41598-024-72271-9