An automated COVID-19 triage pipeline using artificial intelligence based on chest radiographs and clinical data

While COVID-19 diagnosis and prognosis artificial intelligence models exist, very few can be implemented for practical use given their high risk of bias. We aimed to develop a diagnosis model that addresses notable shortcomings of prior studies, integrating it into a fully automated triage pipeline that examines chest radiographs for the presence, severity, and progression of COVID-19 pneumonia. Scans were collected using the DICOM Image Analysis and Archive, a system that communicates with a hospital’s image repository. The authors collected over 6,500 non-public chest X-rays comprising diverse COVID-19 severities, along with radiology reports and RT-PCR data. The authors provisioned one internally held-out and two external test sets to assess model generalizability and compare performance to traditional radiologist interpretation. The pipeline was evaluated on a prospective cohort of 80 radiographs, reporting a 95% diagnostic accuracy. The study mitigates bias in AI model development and demonstrates the value of an end-to-end COVID-19 triage platform.


Supplemental Discussion
The proposed study differs from prior studies in several notable ways: the sheer size of its training and validation sample (Supplementary Tables 1 and 2), including the COVID-19 images, and the procedures used to mitigate sources of bias, including:

Diversity of disease origins, severities, and complexities
Original images were collected from an actual influx of ED patients, yielding a diverse dataset of clinical findings that represents a real-life distribution of disease origins, severities, and complexities. Most prior studies (Supplementary Tables 1 and 2), however, relied on public repositories, which are often pieced together into an unrealistic sample and often overrepresent severe disease, as unusual or severe presentations are more likely to be uploaded online.

RT-PCR timing with CXR
Radiology reports were leveraged in conjunction with RT-PCR results to detect COVID-19 pneumonia, increasing the confidence that images within the positive class indeed demonstrate COVID-19 pneumonia-related findings. Additionally, the authors considered the timing between CXR acquisition and RT-PCR administration to confirm the validity of ground truth labels. Almost 90% of original CXR scans within the training set were collected within one day of RT-PCR administration, while all CXRs within the test sets were collected within 24 hours of RT-PCR administration. Most prior studies relied on public repositories, which often assign binary labels with no supporting documentation, such as RT-PCR data, radiology reports, or patient charts, to validate them. As there are often no restrictions on who may contribute COVID-19 CXRs to public datasets, there is no guarantee that positive cases indeed represent COVID-19 disease findings [15].
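
The timing check described above can be illustrated with a minimal sketch. The function names, the 24-hour default, and the timestamps below are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
from datetime import datetime, timedelta


def within_window(cxr_time, pcr_time, window=timedelta(hours=24)):
    """Return True if the CXR and RT-PCR timestamps are at most `window` apart."""
    return abs(cxr_time - pcr_time) <= window


def fraction_within_window(records, window=timedelta(hours=24)):
    """Fraction of (cxr_time, pcr_time) pairs that fall inside the window."""
    if not records:
        raise ValueError("records must not be empty")
    hits = sum(within_window(cxr, pcr, window) for cxr, pcr in records)
    return hits / len(records)


# Hypothetical timestamps, for illustration only (not study data).
records = [
    (datetime(2020, 4, 1, 8, 0), datetime(2020, 4, 1, 15, 30)),   # 7.5 h apart
    (datetime(2020, 4, 2, 22, 0), datetime(2020, 4, 3, 21, 0)),   # 23 h apart
    (datetime(2020, 4, 5, 9, 0), datetime(2020, 4, 7, 10, 0)),    # 49 h apart
    (datetime(2020, 4, 10, 12, 0), datetime(2020, 4, 9, 12, 0)),  # exactly 24 h
]

print(fraction_within_window(records))
```

Taking the absolute difference treats a CXR acquired before the RT-PCR test the same as one acquired after it. Whether the study applied the window symmetrically is an assumption of this sketch.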

Selection bias mitigation via cross-validation
The study employed 5-fold cross-validation to minimize selection bias and to increase the effective sample size available for training and validation. The resulting model is an ensemble of five models, each evaluated on its own held-out fold, which reduces the likelihood of selecting a fortuitously favorable internal validation set. As shown in Supplementary Table 1, prior studies did not use this technique, exposing them to additional sources of potential bias.
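
As a rough illustration of the idea, the sketch below builds five train/validation splits and combines five fold-specific models into an ensemble. It is a toy example under stated assumptions: the threshold classifier stands in for the study's deep learning models, averaging the fold outputs is an assumed combination rule (the text does not state how outputs are combined), and all names and data are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and return k (train, validation) index splits."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


def fit_toy_model(features, labels):
    """Toy stand-in for a trained network: threshold midway between class means."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda x: 1.0 if x >= threshold else 0.0


def ensemble_predict(models, x):
    """Average the fold models' outputs (assumed combination rule)."""
    return mean(model(x) for model in models)


# Hypothetical 1-D data, for illustration only: label is 1 when feature >= 10.
features = [float(i) for i in range(20)]
labels = [1 if f >= 10 else 0 for f in features]

models = []
for train_idx, val_idx in k_fold_indices(len(features), k=5):
    model = fit_toy_model([features[i] for i in train_idx],
                          [labels[i] for i in train_idx])
    models.append(model)  # val_idx would be used to validate this fold's model

print(ensemble_predict(models, 15.0), ensemble_predict(models, 3.0))
```

Because every sample serves as validation data for exactly one fold, no single, possibly favorable, validation split determines the reported internal performance.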

Multiple external test sets
The model's ability to generalize was evaluated on multiple external test sets for the diagnosis and prognosis components of the triage pipeline. The study demonstrates that the pipeline can output accurate predictions on unseen data and is not overfitted to the training data. As prior studies often do not employ external testing to assess model generalizability, they are likely subject to overfitting and overly optimistic results for two notable reasons. First, most prior studies were trained on public repositories that are likely to contain more severe cases of COVID-19, impairing their models' ability to detect early-stage disease findings. Second, prior studies mostly relied on the COVID-19 Image Data Collection, which has been shown to exhibit distinct image artifacts [16]. Even with preprocessing techniques, such as lung segmentation, and visualization techniques, such as Grad-CAM, it is impossible to fully discern, without external validation, whether a given model is basing its predictions on actual COVID-19 findings or on inherent image artifacts.

Performance comparison to radiologists
The authors compared the performance of the models to that of radiologists. They demonstrate that the diagnosis model outperformed the average radiologist by a statistically significant margin and correctly detected COVID-19 in 17 of 38 CXRs that had been marked as normal by the original interpreting radiologist, as well as by most radiologists in the study. The study thus addresses the increasing evidence that early-stage COVID-19 can be difficult to discern, exemplifying the value of an AI solution in actual clinical workflows.

Public code and model sharing
The authors have published their code (Supplementary Tables 1 and 2), as well as the trained models they have developed. By publicly sharing the code and model files, as well as deploying the prediction models as web applications, the authors invite other researchers to replicate the study's findings and share their advancements in medical image analysis.

While the study leverages several well-known deep learning methodologies to develop an automated pipeline for rapid triage of COVID-19 patients, the authors have designed a study that addresses notable risks of bias in prior studies, risks that compromise the data integrity and clinical viability of their proposed models. Combined with the value of its associated dataset, the proposed AI and informatics pipeline has considerable value as a clinical tool that can streamline COVID-19 triage and improve patient outcomes.