An AI-based algorithm for the automatic evaluation of image quality in canine thoracic radiographs

The aim of this study was to develop and test an artificial intelligence (AI)-based algorithm for detecting common technical errors in canine thoracic radiography. The algorithm was trained using a database of thoracic radiographs from three veterinary clinics in Italy, which were evaluated for image quality by three experienced veterinary diagnostic imagers. The algorithm was designed to classify the images as correct or as having one or more of the following errors: rotation, underexposure, overexposure, incorrect limb positioning, incorrect neck positioning, blurriness, cut-off, or the presence of foreign objects or medical devices. The algorithm correctly identified errors in thoracic radiographs with an overall accuracy of 81.5% in latero-lateral and 75.7% in sagittal images. The most accurately identified errors were limb mispositioning and underexposure, in both latero-lateral and sagittal images. The accuracy of the developed model in the classification of technically correct radiographs was fair in latero-lateral and good in sagittal images. The authors conclude that their AI-based algorithm is a promising tool for improving the accuracy of radiographic interpretation by identifying technical errors in canine thoracic radiographs.


Database creation
The archives of three veterinary clinics, namely the Veterinary Teaching Hospital of the University of Padua (Legnaro, Padova, Italy), the Pedrani Veterinary Clinic (Zuliano, Vicenza, Italy) and the Strada Ovest Veterinary Clinic (Treviso, Italy), were used in this project. Three different X-ray systems were used: (1) an FDR D-EVO 1200 G43 (Fujifilm Corporation) digital radiology (DR) system at the Veterinary Teaching Hospital of the University of Padua, (2) an Isomedic RT 800 MA (Isomedic S.r.l.) at the Pedrani Veterinary Clinic, and (3) an FCR PRIMA T2 (Fujifilm Corporation) at the Strada Ovest Veterinary Clinic. Canine thoracic radiographs acquired in latero-lateral (both left and right) and sagittal (both ventro-dorsal and dorso-ventral) projections were collected from the databases of the three institutions.

Image analysis
The images were assessed simultaneously by three of the authors (TB, SB, and EV, with 13, 5 and 1 years of experience in veterinary diagnostic imaging, respectively) in Digital Imaging and Communication in Medicine (DICOM) format using freely available image visualization and analysis software (Horos, Nimble). The tags were assigned following a consensus discussion. The tags used for the evaluation of image quality were: (a) correct; (b) rotated (rotation was evaluated by checking for superimposition of opposite ribs in latero-lateral images, and of the sternum and vertebral column in sagittal images); (c) underexposed (if quantum mottle was evident or if the pulmonary structures were not clearly visible due to an overall lack of detail); (d) overexposed (if some portions of the image were completely black); (e) limbs (if the limbs were incorrectly positioned); (f) neck (if the neck was too flexed or too extended); (g) blurred (if motion artifacts were seen, with evident distortion of the anatomical structures); (h) cut (if a portion of the thorax was excluded from the radiograph); (i) foreign object (if any foreign objects or medical devices were present). With the exception of "correct", the tags were not mutually exclusive, and a multi-label deep-learning approach was therefore used.
The evaluation of exposure is, to a certain extent, subjective. To make it more objective, a radiograph was rated as underexposed if quantum mottle was evident within the entire image, especially affecting the bony trabecular pattern [17]. A radiograph was rated as overexposed if some areas remained completely black despite adjustment of brightness and contrast. The position of the limbs was rated as incorrect if superimposition of the limbs on the thoracic structures was evident. The position of the neck was rated as incorrect in the case of abnormalities in the position of the trachea (over-extension or over-flexion) in latero-lateral radiographs. Neck mispositioning was not considered in sagittal radiographs.
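The tagging scheme described above can be illustrated with a minimal Python sketch. It is not the authors' code: the tag identifiers, their order, and the `encode_tags` helper are assumptions introduced here only to show how tags that may co-occur (with "correct" as the sole exclusive tag) map onto a multi-hot target vector for a multi-label classifier.

```python
# Tag identifiers and their order are illustrative assumptions, not from the paper.
TAGS = ["correct", "rotated", "underexposed", "overexposed",
        "limbs", "neck", "blurred", "cut", "foreign_object"]


def encode_tags(assigned):
    """Return a multi-hot vector (one 0/1 entry per tag) for one radiograph."""
    assigned = set(assigned)
    if not assigned:
        raise ValueError("at least one tag is required")
    unknown = assigned - set(TAGS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    # "correct" is the only tag that excludes all the others.
    if "correct" in assigned and len(assigned) > 1:
        raise ValueError("'correct' cannot be combined with an error tag")
    return [1 if tag in assigned else 0 for tag in TAGS]


# A radiograph that is both rotated and has mispositioned limbs.
print(encode_tags(["rotated", "limbs"]))  # [0, 1, 0, 0, 1, 0, 0, 0, 0]
```

Because each position in the vector is an independent yes/no target, a single radiograph can carry several error tags at once, which is the property the multi-label setting relies on.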

Deep learning
The DICOM files were initially converted to the MetaImage Medical Format (MHA), resampled to 224 × 224 pixels and normalized using the Z-normalization specific to the ResNet-50 network. A ResNet-50 pre-trained on ImageNet was used, since previous research has indicated that it provides the most accurate results for X-ray classification with datasets of limited size [11][12][13][14]. The architecture was then fine-tuned on the aforementioned database in a multi-label setting, as the quality classes were not mutually exclusive. Binary cross-entropy was employed as the objective function and the Adam algorithm as the stochastic optimizer, and an exponential scheduler was used to reduce the learning rate after each epoch. The image set was randomly split into training, validation and test sets comprising 80%, 10% and 10% of the images, respectively. The training set was augmented online through standard transformations, including affine transformations, random cropping, flips, and contrast changes. The training was conducted on a workstation (Linux operating system; Ubuntu 18.04, Canonical) devoted to deep learning, equipped with four GPUs (4x Tesla V100; NVIDIA and Canonical), a 2.2 GHz processor (Intel Xeon E5-2698 v4; Intel) and 256 GB of random access memory. The evaluation metrics were not directly optimized or utilized during training, nor was the metadata related to the source institution used to guide the training process.
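Three of the training ingredients named above (the binary cross-entropy objective, the exponential learning-rate decay, and the random 80/10/10 split) can be sketched in plain Python. This is a hedged illustration rather than the authors' implementation: the function names, the initial learning rate, the decay factor `gamma`, and the random seed are all hypothetical, since the text does not report them.

```python
import math
import random


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the labels of one image.

    Each label is scored as its own yes/no problem, which is what makes
    the loss suitable for tags that can co-occur.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


def exponential_lr(initial_lr, gamma, epoch):
    """Learning rate after `epoch` decay steps: initial_lr * gamma**epoch."""
    return initial_lr * gamma ** epoch


def split_indices(n_images, seed=0):
    """Shuffle image indices and cut them into 80/10/10 train/val/test parts."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = n_images * 80 // 100
    n_val = n_images * 10 // 100
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


# Hypothetical predicted probabilities and targets for three tags of one image.
print(binary_cross_entropy([0.9, 0.2, 0.6], [1, 0, 1]))
# Hypothetical schedule: start at 1e-3 and multiply by 0.9 after each epoch.
print([exponential_lr(1e-3, 0.9, epoch) for epoch in range(3)])
train, val, test = split_indices(6028)
print(len(train), len(val), len(test))
```

In a deep-learning framework these pieces would normally come from built-in loss, optimizer and scheduler classes; writing them out only makes the arithmetic explicit.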

Statistical analysis
All the statistical analyses were performed using a custom-built Python script (Python Software Foundation; the Python Language Reference, version 3.6; available at http://www.python.org). The performance of ResNet-50 was evaluated by means of the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC); the sensitivity, the specificity, and the positive and negative likelihood ratios (PLR and NLR, respectively), along with their 95% confidence intervals, were also calculated. The performance of ResNet-50 for each quality parameter was rated as excellent (AUC ≥ 0.9), high (0.8 ≤ AUC < 0.9), fair (0.7 ≤ AUC < 0.8), or poor (AUC < 0.7) [17]. All P-values were assessed at an alpha of 0.05.
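The metrics listed above can be computed as in the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' script: the function names are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) formulation, the rating bands are read as excellent (AUC of at least 0.9), high (0.8 up to 0.9), fair (0.7 up to 0.8) and poor (below 0.7), and the 95% confidence intervals are omitted because the method used to obtain them is not stated.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and likelihood ratios from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # PLR = sensitivity / (1 - specificity); NLR = (1 - sensitivity) / specificity
    plr = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    nlr = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "plr": plr, "nlr": nlr}


def rate_auc(auc):
    """Map an AUC value onto the four qualitative bands."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "high"
    if auc >= 0.7:
        return "fair"
    return "poor"


# Hypothetical scores for one tag on four test images.
auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(auc, rate_auc(auc))  # 0.75 fair
# Hypothetical confusion counts for one tag.
print(diagnostic_metrics(tp=40, fp=10, tn=90, fn=10))
```

In a multi-label setting these functions would be applied once per tag, treating each tag as a separate binary problem.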

Ethics approval
This study was conducted in accordance with Italian law 26/2014 (which transposes EU directive 2010/63/EU). As the data used in this study were part of routine clinical activity, no ethical committee approval was required. Informed consent regarding personal data processing was obtained from the owners.

Database
Overall, 6028 latero-lateral and 4053 sagittal radiographs were included in the database. Left and right latero-lateral projections were grouped together. In the same way, ventro-dorsal and dorso-ventral (sagittal) radiographs were also grouped together. The number of radiographs for each tag is listed in Tables 1 and 2. As multiple quality

Classification results
The complete classification results for the radiographic quality indices are reported in Tables 3 and 4. On the latero-lateral radiographs, the performance of the proposed AI-based tool varied across the quality indices: accuracy was excellent only for limb mispositioning, and high for blurriness, foreign object, underexposure, overexposure, rotation and neck mispositioning. The accuracy in classifying normal radiographs was only fair. The overall accuracy was 81.5%. On the sagittal radiographs, only 8 images were classified as blurred, and this quality index was therefore not included in the model. The performance of the proposed AI tool on sagittal radiographs was high for all the considered quality indices except underexposure, for which it was excellent (AUC = 0.92). The overall accuracy was 75.7%.

Discussion
The present study suggests that deep learning may be a valuable tool for automatically evaluating the quality of both sagittal and latero-lateral canine thoracic radiographs. This option would be highly beneficial in situations where an expert veterinary radiologist is not readily available, such as when centres rely on external consultation services or when an expert radiologist is only occasionally present. Overall, the ability to automatically evaluate image quality has the potential to improve efficiency and effectiveness in the veterinary medical imaging field.
In this prospective quality-improvement study, the quality criteria for chest radiographs were derived from the indications given in textbooks [2], while also incorporating elements from prior works on the automatic evaluation of chest radiographs in human medicine [15,16]. Radiographic abnormalities were evaluated by the authors based on their expertise in veterinary diagnostic imaging, which involved some degree of subjectivity. To overcome this subjectivity at least partially, the radiographs were evaluated simultaneously by three different experienced operators.
Not surprisingly, one of the most common quality issues in our database was a lack of parallelism (in 840 latero-lateral radiographs) or perpendicularity (in 1018 sagittal radiographs) between the animal and the detector, labelled as "rotated" in this paper. This quality issue is also frequently reported in human medicine: Nousiainen et al. [16] proposed an automated methodology for chest radiograph quality control using convolutional neural networks (CNNs). Rotation was evaluated subjectively in that study, and the deep learning-based approach had an AUC of 0.72 for detecting it. The model presented here demonstrated a higher accuracy for rotation (AUC of 0.84), likely due to the larger size of our training database. Another study, by Meng et al. [15], also examined the automatic evaluation of human chest X-rays, including the assessment of rotation. However, it is difficult to compare our results directly with those of Meng et al. [15] because the methods were quite different: Meng et al. [15] developed a complex method to measure the degree of rotation automatically, and the accuracy of that method for detecting rotation was limited.
In the present study, the accuracy for classifying both underexposed and overexposed radiographs was high, with AUCs between 0.84 and 0.92 in the different datasets. This result was rather unexpected because the radiographs included in the study were obtained using both computed radiology (CR) devices and direct radiology (DR) systems. It is known that underexposure appears slightly different in CR than in DR [16]. Nonetheless, the high accuracy achieved in this study suggests that the developed algorithm was able to identify common features of underexposure in both modalities. To the best of our knowledge, this is the first study proposing a deep learning-based algorithm to evaluate such quality indices and, therefore, a comparison with similar studies is not possible.
The presence of any foreign object on the radiograph was recorded and included in the quality indices. While these foreign objects are not a quality issue in and of themselves, they can sometimes obscure important areas of the image, making it difficult to detect certain lesions. Most of the time, these objects are medical devices that are vital to the patient (e.g. metallic clips, tracheal or oesophageal tubes, chest drains). To the best of our knowledge, the influence of foreign bodies on the accuracy of AI-powered diagnostic tools has not yet been investigated. However, it can be postulated that their presence might interfere with the interpretation of the images by the algorithms, as these objects are superimposed on thoracic structures.
Mispositioning of the limbs is a common issue in latero-lateral radiographs, and this can hinder interpretability due to the superimposition of the shoulder and forelimb muscles and bones on the cranial portion of the thorax, potentially obscuring lesions in that region [15]. The developed network had a high accuracy (AUC = 0.93 on latero-lateral, and AUC = 0.92 on sagittal) in detecting this technical error, suggesting that it was readily identified by ResNet-50. In our opinion, this quality index is less prone to subjectivity, and the evaluation by the three experienced radiologists may have been more consistent, leading to the high accuracy of the network.
One limitation of this study is that the respiratory phase was not considered among the quality indices. Other similar studies in human medicine have included this quality index in their analysis [16]. We elected not to include inspiration among the quality indices because there are no objective criteria in the literature for evaluating the appropriateness of the respiratory phase, and such an assessment would therefore be very subjective and prone to high inter- and intra-rater variability.
The overall accuracy of the system was slightly higher on latero-lateral images (total accuracy 81.5%) than on sagittal images (total accuracy 75.7%). It is the authors' opinion that this discrepancy is largely due to the smaller size of the sagittal image database in comparison to the latero-lateral
Scientific Reports (2023) 13:17024, https://doi.org/10.1038/s41598-023-44089-4, www.nature.com/scientificreports/
issues were present in several radiographs, the total number of tags exceeded the total number of radiographs. A total of 1252 latero-lateral and 854 sagittal radiographs were discarded as belonging to skeletally immature dogs. All the radiographic tags were included in the training, validation and test sets. Example images of some of the included tags for latero-lateral and sagittal radiographs are reported in Figs. 1 and 2, respectively.

Table 1. Summary of the radiographic abnormalities detected on the training, validation and test sets of the latero-lateral radiographs.

Table 2. Summary of the radiographic abnormalities detected on the training, validation and test sets of the sagittal radiographs.

Table 3. Performance of ResNet-50 on the test set of the latero-lateral radiographs. Values in parentheses are 95% confidence intervals.

Table 4. Performance of ResNet-50 on the test set of the sagittal radiographs. Values in parentheses are 95% confidence intervals.