Needle-based storage-phosphor detector radiography is superior to a conventional powder-based storage phosphor detector and a high-resolution screen-film system in small patients (budgerigars and mice)

This method comparison study used radiographs of 20 mice and 20 budgerigars to investigate comparability between computed radiography (CR) and high-resolution screen-film systems and to study the effects of reduced radiation doses on image quality of digital radiographs of small patients. Exposure settings used with the mammography screen-film system (SF) were taken as baseline settings. A powder-based storage-phosphor system (CRP) and a needle-based storage-phosphor system (CRN) were used with the same settings (D/100%) and half the detector dose (D/50%). Using a scoring system, four reviewers assessed five criteria per species covering soft tissue and bone structures. Results were evaluated for differences between reviewers (interobserver variability), systems and settings (intersystem variability, using visual grading characteristic analysis). Correlations were significant (p ≤ 0.05) for interobserver variability in 86.7% of the cases. Correlation coefficients ranged from 0.206 to 0.772. For mice and budgerigars, the CRN system was rated as superior to the SF and CRP systems for most criteria, being significant in two cases each. Comparing the SF and CRP systems, the conventional method scored higher for all criteria, in one case significantly. For both species and both digital systems, dose reduction to 50% resulted in significantly worse scores for most criteria. In summary, the needle-based storage-phosphor technique proved to be superior to the conventional storage-phosphor and mammography screen-film systems. Needle-based detector systems are suitable substitutes for high-resolution screen-film systems when performing diagnostic imaging of small patients. Dose reduction to 50% of the corresponding dose needed in high-resolution screen-film systems cannot be recommended.

screen-film systems are contrast limited 3 . In digital radiography two different approaches have been developed. On the one hand, there are computed radiography (CR) systems with a storage plate and a separate read-out process. On the other hand, there are direct digital radiography systems, where x-ray photons are directly converted into electrical charges 4 . One established type of CR detector is the conventional powder-based storage phosphor detector (PIP), which consists of small phosphor particles dispersed in a binding agent 5 . A more recently developed CR detector is the needle-based storage phosphor detector (NIP). Here, the phosphor particles form a crystalline needle structure that is oriented perpendicular to the detector surface 5 . Comparing the technical aspects, NIPs have a higher conversion efficiency than PIPs, resulting in a higher signal-to-noise ratio at identical exposure settings 6 . Smans et al. 7 , using a computer model, showed that the threshold-contrast detectability of their simulated NIP system was superior to that of the likewise simulated PIP system. In preclinical trials, NIPs and PIPs have been used on phantoms, where NIPs depicted lower contrast levels better than PIPs 8 . When tested on phantoms for chest radiology, NIPs were significantly superior to PIPs regarding image quality and the potential for dose reduction 9,10 . In one phantom study, dose reduction of up to 68% of the initial dose was possible 11 . In clinical trials, a dose reduction of 50% on NIP systems produced images that showed no significant differences in image quality compared to PIP images at 100% of the dose 12 . In neonatal chest radiology a NIP system was preferred by reviewers in comparison to a PIP system; here, a dose reduction of 20% was possible without detectable loss of image quality 13 .
In veterinary medicine, various digital detector systems have been tested for dogs, cats and large animals such as horses [14][15][16] . Data concerning the use of digital detector systems for birds, snakes and lizards, with body masses ranging from 123 to 847 g, have also been published [17][18][19] . In general practice, veterinarians are consistently confronted with even smaller patients. Animals like budgerigars and mice, with body masses ranging from 30 to 50 g, make high demands on x-ray technique due to their delicately structured anatomy and their high respiratory rate, which demands a shorter exposure time.
To the authors' knowledge, no studies have been conducted to evaluate the use of computed radiography in patients with body masses lower than 100 g. We wanted to explore the implementation of these methods, since radiography represents an affordable and reliable diagnostic means in standard veterinary practice. The objective of this method comparison study was to investigate whether or not image information generated with CR systems is at least equivalent to that acquired by high-resolution screen-film systems. Furthermore, we wanted to study the effects of a reduced radiation dose in computed radiography on the visibility of structures in these very small animals. As model species, budgerigars and mice were used as representatives for small rodent and pet bird species regularly seen in small animal and specialized practices. Additionally, mice are a commonly used species in laboratory animal science. We especially wanted to include an avian species in our study to take into account some species-specific features of avian anatomy, such as air sacs, pneumatized bones and the typical position of the intracoelomic organs.

Materials and Methods
In this study, two different CR storage systems, namely AGFA DX-S (Agfa Healthcare, Bonn, Germany), a needle-based detector technology (referred to as system "CR N " for needle-based CR detector), and the powder-based storage phosphor detector system Fuji HR/Philips AC 500 (Philips Healthcare, Hamburg, Germany; system "CR P " for powder-based CR detector) were used in comparison to the high-resolution mammography film-screen system KODAK MIN-R S Film (Eastman Kodak Company, Rochester, United States) (referred to as System "SF" for screen-film system). For information on technical equipment and exposure settings see Table 1.
The radiographs were made using a Bucky-table unit (Philips Bucky Diagnost TH, Philips Healthcare, Hamburg, Germany), with the detector placed on the table and the animals positioned directly on it. Exposure settings were adjusted on the basis of the dose requirements of the film-screen system. The settings identified to generate images of adequate brightness were subsequently used as "100%-dose" in the digital CR systems (D/100%). The halving of the detector dose was achieved by halving the mAs value (D/50%). In the digital systems the displayed values of the dose indicator were used as additional tools for dose control. Dose-Area Product (DAP) measurements were performed for all systems to monitor uniformity of exposure. For the digital CR systems, system-specific processing algorithms were used. In pre-studies, the parameters of these processing algorithms were evaluated with regard to detail visibility. For the Philips Healthcare system (CR P) the parameters were set to unsharp mask filtration with gamma type (GT) E, density shift (GS) of 0.27, rotation center (GC) of 1.80, rotation amount (GA) of 1.03, frequency rank (RN) of 9, frequency type (RT) T, frequency enhancement (RE) of 1.00 and kernel size of 7 with the help of the Workstation Easy Vision Rad Release 4.2. L5 (Philips Healthcare, Hamburg, Germany). For the Agfa Healthcare system (CR N) the self-adaptive MUSICA 2 software (Agfa Healthcare) and automated algorithms based on default values were used. The settings were consistently used for each detector system, so the process of detection and conversion has to be evaluated as a whole.
Procedure. In this prospective study, twenty mice with a mean body mass of 38 g, ranging from 32 g to 43 g, and twenty budgerigars with a mean body mass of 41 g, ranging from 33 g to 57 g, were included. The animals did not show any signs of illness regarding their individual history and gross examination that could lead to an expectation of radiologically detectable abnormalities. All radiographs were taken only in right lateral recumbency within the first two minutes of isoflurane induced general anesthesia (isoflurane 2.0-2.5%, 100% oxygen induced via mask). The budgerigars were positioned using a plexiglass avian restraint board. To allow best comparability of the radiographs, all five radiographs of each animal were taken in immediate succession using the same technical settings. The study was approved by the local animal welfare authorities (Landesdirektion Sachsen, No. A01/11) and conducted according to the German animal welfare regulations.
Four reviewers received soft copies of the images taken with two different storage systems and two different doses (100% and 50% each), as well as the radiographs on film. To prevent biased interpretation, observers were unaware of the animal identification, due to the removal of metadata from the digital images and blinding of all images using a DICOM Anonymizer (https://sourceforge.net/projects/dicomanonymizer/) with randomly chosen, unconnected three-digit numbers. The workstation was equipped with two medical gray-scale monitors (EIZO MX240W, matrix: 1920 × 1200 pixels, dot pitch: 0.27 mm; luminance: 320 cd/m², contrast ratio: 850:1; Avnet Technology, Nettetal, Germany). A commercial medical image analysis software was used (GOP-View XR2-T, Contextvision, Stockholm, Sweden).
To be consistent with the practical routine of digital image reading, the radiologists were encouraged to apply the entire workstation functionality to record as much information as possible. Films were analyzed using a light box (Planilux DX, luminance: 4,700 cd/m², Planilux, Warstein, Germany) and a focal spot light (Planilux Irisleuchte 70; luminance: 25,000 cd/m², Planilux, Warstein, Germany). A magnifying glass (×4) and brightness adjustment could be used if and as necessary. Evaluation time per image was unlimited. The ambient light and other conditions of the viewing environment fulfilled the requirements for medical image interpretation 20 .
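The blinding step described above can be illustrated with a minimal sketch that draws unique, randomly chosen three-digit numbers for a set of images. This is not the DICOM Anonymizer tool used in the study; the function name `assign_blinded_ids`, the file-naming scheme and the fixed seed are hypothetical and serve only to make the example reproducible.

```python
import random


def assign_blinded_ids(image_names, seed=None):
    """Map each image name to a unique, randomly drawn three-digit number."""
    if len(image_names) > 900:
        raise ValueError("only 900 three-digit numbers (100-999) are available")
    rng = random.Random(seed)
    # sample() draws without replacement, so no number is used twice
    codes = rng.sample(range(100, 1000), k=len(image_names))
    return dict(zip(image_names, codes))


# Hypothetical image names: 2 species x 20 animals x 5 radiographs = 200 images
images = [
    f"{species}_{animal:02d}_{setting}"
    for species in ("mouse", "budgerigar")
    for animal in range(1, 21)
    for setting in ("SF", "CRN_100", "CRN_50", "CRP_100", "CRP_50")
]
blinded = assign_blinded_ids(images, seed=1)
print(len(blinded), "images received unique three-digit codes")
```

Because the codes are drawn without replacement and independently of the image names, a code reveals neither the animal nor the system or dose setting.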
Scoring system. To determine differences between the systems, visual grading characteristics (VGC) analysis, a method previously used in comparable studies, was conducted [17][18][19] . In VGC studies, predetermined image quality criteria are assessed by multiple reviewers, subjectively rated, here in accordance to an absolute visual grading analysis (VGA), and the scores then compared using a method similar to receiver operating characteristics (ROC) analysis 21,22 .
Criteria were selected with regard to their lack of obviousness, as an overall good rating would prevent the generation of meaningful results. Some criteria were suitable for evaluation of detail rendition (Mice: trachea and spine, Budgerigar: tracheal rings, spine and humerus), others for evaluation of contrast resolution (Mice: heart, lung, abdomen, Budgerigar: lung, kidney).
A four-step scoring system was implemented for each of the five criteria, with 1 being the best and 4 being the worst score. The observers were trained for their task using a separate set of images. The images were evaluated independently by four observers with a minimum of two years of experience with digital radiography.
For details on the criterion and scoring definition see Tables 2 and 3, and Figs 1 and 2.
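To make the VGC comparison introduced in this section more concrete, the following sketch builds a VGC curve from two sets of four-step scores (1 = best, 4 = worst) and integrates it with the trapezoidal rule. It is an illustration under assumptions, not the SigmaPlot ROC analysis used in the study: the function name `vgc_auc` and the example scores are hypothetical, and the orientation (values above 0.5 favor the first system) is chosen to match the interpretation given for Table 5.

```python
def vgc_auc(first, second, levels=(1, 2, 3, 4)):
    """Area under the VGC curve for two lists of ordinal scores (1 = best).

    For each score threshold, the cumulative proportion of ratings at or
    better than the threshold is computed for both systems. The points
    (second system, first system) form the VGC curve.
    """
    points = [(0.0, 0.0)]
    for threshold in levels:
        x = sum(score <= threshold for score in second) / len(second)
        y = sum(score <= threshold for score in first) / len(first)
        points.append((x, y))
    # Trapezoidal integration along the curve from (0, 0) to (1, 1)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores for one criterion, NOT study data
system_a = [1, 2, 2, 3]
system_b = [2, 3, 3, 4]
print(vgc_auc(system_a, system_b))  # 0.8125, system_a is rated better
print(vgc_auc(system_a, system_a))  # 0.5, equal rating gives the diagonal
```

With this construction the area equals the probability that a randomly chosen rating of the first system is better than one of the second system, counting ties as one half.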
Statistical analysis. In total, 800 assessments of radiographic images were taken into consideration for this study. Spearman's rank correlation coefficient was applied (IBM SPSS Statistics 20, IBM, Armonk, NY) for the evaluation of the interobserver variability. Correlation was considered to be significant with p < 0.05 and highly significant with p ≤ 0.001. To evaluate the effect size of correlation, we referred to Cohen 23 . Correlations under 0.1 were considered negligible, correlations between 0.1 and 0.3 were considered low, correlations between 0.3 and 0.5 moderate and correlations over 0.5 were considered high. Additionally, interobserver agreement was calculated using Cohen's kappa test. Significance levels and effect sizes were considered the same as described above. Mean values, scoring frequencies and 95% confidence intervals were calculated to facilitate comparison between systems and reviewers. A receiver operating characteristics (ROC) analysis was applied for intersystem variability (Sigma Plot 11, Systat Software Inc., San José, CA). The obtained VGC curve graphically demonstrates the comparison of two systems. In case of an equal rating, the curve would be a diagonal resulting in an area under the curve (AUC) of 0.5 18,19 . The more one system is rated superior, the more the curve moves to the respective axis, therefore changing the area under the curve value towards 0.0 or 1.0.
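The interobserver part of this analysis can be sketched as follows. The example computes Spearman's rank correlation with average ranks for tied scores and labels the effect size with the thresholds stated above. It is a simplified illustration, not the SPSS procedure used in the study: the helper names and example scores are hypothetical, p-values are not computed, and the assignment of the exact boundary values (0.1, 0.3, 0.5) to the higher class is an assumption.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined if one reviewer gave a constant score
    return cov / sqrt(var_x * var_y)


def cohen_label(r):
    """Effect size label; boundary handling is an assumption of this sketch."""
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "low"
    if size < 0.5:
        return "moderate"
    return "high"


# Four reviewers give six reviewer pairs per criterion;
# 6 pairs x 5 criteria x 2 species = 60 correlation values
reviewer_pairs = list(combinations([1, 2, 3, 4], 2))
print(len(reviewer_pairs) * 5 * 2)

# Made-up scores of two reviewers for one criterion, NOT study data
reviewer_a = [1, 2, 2, 3, 4, 3, 2, 1]
reviewer_b = [2, 2, 3, 3, 4, 4, 3, 1]
r = spearman_r(reviewer_a, reviewer_b)
print(round(r, 3), cohen_label(r))
```

A rank correlation of this kind captures whether two reviewers order the images similarly even if one of them scores systematically stricter, which is a different question from exact agreement as measured by Cohen's kappa.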

Results
Interobserver variability. For criteria in mice, the average scores given by the individual reviewers were 2.82 (reviewer 1), 2.32 (reviewer 2), 3.02 (reviewer 3) and 2.13 (reviewer 4). For criteria in budgerigars, the respective average scores were 2.36, 2.23, 2.90 and 2.28. Interobserver correlations were calculated for all criteria in both species, adding up to 60 rank correlation values, hereafter referred to as cases. The correlations were significant in 86.7% (52/60) of the cases and highly significant in 75.0% (45/60) of the cases. Regarding the evaluation of criteria for mice, the reviewers' scores correlated significantly in all but three cases, with a Spearman's r ranging from 0.206 to 0.582 and a mean value of 0.354. Those three cases occurred for different criteria and different reviewers. For the budgerigars, the reviewers' scores correlated significantly for all criteria except for the humerus. For this criterion only one out of six correlations was significant. For the budgerigars, the Spearman's r ranged from 0.222 to 0.772 with a mean value of 0.503. The interobserver agreement over all criteria was significant for mice and budgerigars between all reviewers. The only exception was a non-significant agreement between reviewers 3 and 4 regarding mice.
Budgerigars. Mean scores for the different criteria ranged from 1.99 for the humerus through 2.32 for the tracheal rings, 2.48 for the kidney and 2.70 for the cervical spine to 2.72 for the lung. Mean values separated by criterion and system/technique can be found in Table 4.
At 100% of the dose, the CR N system received a mean score of 2.11, the CR P system a score of 2.31 and the SF system a score of 2.27. The CR N system was evaluated as superior to the mammography screen-film system SF for five out of five criteria, with the score differing significantly in one case (p = 0.024). The CR N system was also rated better than the CR P system for four out of five criteria, one of which was highly significant (p ≤ 0.001). When comparing the SF system with the CR P system, the conventional method was rated better for three criteria and the CR P system for two criteria. The differences were not significant.
At 50% of the dose, the CR N system received a mean score of 2.64 and the CR P system a score of 2.88. When comparing the CR N system at 100% and 50%, the 100% dose always received a better score, the difference being significant to highly significant for four out of five criteria (p ≤ 0.001, p = 0.002, p ≤ 0.001, p = 0.002). When comparing the CR P system at 100% and 50%, the 100% dose always received better scores as well. Here the differences were significant to highly significant for five out of five criteria (p = 0.01, p ≤ 0.001, p ≤ 0.001, p = 0.04, p = 0.03).

[Table fragment, criterion definitions] Abdomen: assessment of the size of abdominal organs (e.g. kidneys).
All in all, the systems scored higher at full dose than at half dose, for both mice and budgerigars. Further details are shown in Table 5.

Discussion
Evaluation methods. Visual grading characteristic (VGC) analysis is a method to evaluate the performance of different radiographic systems and has previously been applied in similar studies [17][18][19] . Evaluating physical parameters is important to standardize examination procedures but it does not necessarily allow predictions about the clinical performance of radiographs 24 . Visual grading, in contrast, uses anatomical criteria for a visibility assessment 25 , therefore offering an objective link to clinical interpretations. Although we used a standardized study setup, there is still potential bias arising from individual differences, in this case of the reviewers rating the images 26 . The reviewers were chosen with regard to their experience in radiology, as well as experience in the interpretation of avian radiographs. A training session was held beforehand to reduce divergence in scoring. However, the mean scores varied by about one point for mice and about half a point for budgerigars, leading to a low to moderate agreement. While the reviewers were not necessarily expected to give the same scores, which is reflected in the level of agreement, it was still interesting whether the tendencies of scoring were the same. Therefore, even though one reviewer marked the radiographs significantly worse than the other three, correlations were significant for all criteria, except for the criterion humerus in budgerigars. This criterion always received good scores regardless of the system or dose, displaying a limited discriminatory power when trying to draw comparisons between different settings. Therefore it can be assumed that the criterion did not fully meet the expectations for VGC analysis. In contrast, the budgerigars' cervical spine, which consists of more delicately structured bones, scored lower. Here the high number and small size of the single vertebral bodies could influence interpretability.
The other criteria in budgerigars ranged between these extremes. Due to the air sacs, radiographic images have good coelomic contrast and superimpositions in the coelomic cavity are less extensive than in the mammalian abdomen, possibly explaining why the mean scores for mice differed more than the mean scores for budgerigars. In mice, the criterion spine received an exceptionally good score, while the criterion abdomen scored lowest. A probable explanation is that discerning different soft tissue organs in the abdomen is more challenging. In total, the choice of criteria enabled the provision of a broad distribution of scores.

Comparison of radiography systems. The systems tested in this study are of great relevance for the use in veterinary practice, as they represent both a conventional technique that has long been recommended for small patients as well as an established computed radiography system. The needle-based storage phosphor system is a promising technique that is commonly used in human medicine, but is not yet common in veterinary practice. Relating to the mean scores at 100% dosage, the CR N system ranked first, the mammography SF system second and the CR P system third. Concerning the powder-based (CR P) and needle-based (CR N) storage phosphor detectors, this corresponds with the findings of studies on human chest radiography 8,9,11,12 . Due to the production process, powder-based storage phosphor detectors like the CR P show a weakness concerning image sharpness, as light scatters on the powder particles. In the mice radiographs, for example, the CR P system was evaluated significantly worse than the other two systems regarding the criterion trachea. The trachea is a small-sized structure which is often overlaid by surrounding tissue. This demonstrates the limitations of this specific digital storage system.
When it comes to needle-based storage phosphor detectors, the needles work like optical fibers, allowing higher image sharpness due to less scattering of photons during exposure and improved conduction of photons to the detector resulting in little extinction on the way 5 . As mentioned above, the criterion trachea in the mice radiographs, for example, scored significantly higher for the CR N system than for both the SF and CR P systems, leading to the conclusion that needle-based detectors have a higher potential to depict small-sized soft tissue structures. Regarding the budgerigar radiographs, significant differences occurred for the criterion lung, where the CR N system scored better than the SF system and the CR P system, indicating that delicate soft tissue structures like the honeycomb structure of the lung are also best depicted with needle-based detectors. The mammography film system ranked in the middle. This shows parallels to an early study by Bacher et al. 27 , where the mammography system also managed to score better than a powder-based storage phosphor detector. For instance, the criterion spine in the mice radiographs proved to be evaluated best on the conventional mammography system. Even though mammography films are optimized for discerning soft tissues, their spatial resolution is superior in comparison to digital radiographic systems, so they have an advantage in depicting minuscule bone structures. However, studies have shown that detail detection is sometimes scored better using the digital system 28 . Due to their superior dynamic range, the digital systems score better in terms of soft tissue structures. Similar experiences have been made in preceding studies 17 .
When talking about digital radiography, direct digital radiography systems such as flat panel detectors, in contrast to the computed radiography systems used in this study, also need to be included in the discussion as they are widely used in practice. Studies have shown that mammography films performed worse than a flat panel detector system 27 . However, when comparing NIPs and the more modern flat panel detectors, reports vary. Tests on human cadavers suggested that flat panel detector performance and dose reduction potential are inferior when compared to NIPs 4 . Another source stated that calcification detection is significantly reduced when using NIPs in comparison to a flat panel system 29 .
Effects of dose reduction. Dose reduction was predominantly compared within each system. Data showed that in both species and both systems dose reduction led to significantly worse scores for the majority of the criteria. Exceptions from the image deterioration were the criteria kidney for budgerigars and heart for mice using the CR N system and the criterion abdomen for mice in both systems. While the kidneys, which are surrounded by air sacs, seemed to be contrasted enough to always score in the same range regardless of the radiation dose, the criterion abdomen generally scored poorly because of a lack of contrast between the abdominal organs.
Table 5. Summary of the statistical analyses stating significant occurrences in intersystem variability through statistically calculated AUC values. *Significant (p ≤ 0.05), **highly significant (p ≤ 0.001). Interpretation: an equal assessment of the criterion would result in a value of 0.5. The more one system is superior, the more the value tends to 1.0 (first system) or 0.0 (second system). "CR N" refers to AGFA DX-S (Agfa Healthcare, Bonn, Germany) standing for digital needle-based detector, "CR P" refers to Fuji HR / Philips AC 500 (Philips, Hamburg, Germany) standing for digital detector and "SF" refers to KODAK MIN-R S (Kodak, Stuttgart, Germany) for a conventional mammography screen-film. The numbers 100% and 50% refer to the percentage of dosage used in the trial. Significance levels refer to the deviation from the AUC value 0.5 when applying the ROC-analysis.
In comparison to the SF system at 100% of the dose, dose reduction within the CR N as well as the CR P system led to significantly lower scores for nearly all the criteria. We based the exposure settings on the dose requirements of the film-screen system and analogously used the same settings for the digital systems. Mammography films traditionally require higher doses due to their sensitivity class and higher resolution than standard film-screens 30 , so it is surprising that the same high doses are needed for digital imaging in small patients to receive similar scores in the assessment.
Based on these results, dose reduction cannot be recommended for digital radiographic imaging in small patients. However, reduction was assessed in only two steps (100%, 50%), which gives only a rough impression of the effect of dose reduction. Further studies that focus on the gradual effect of dose reduction on image quality are needed. A transition from conventional to digital radiographic imaging does not automatically entail dose reduction for small patients.
Other studies considered dose reduction possible for animals with a body mass of around 400 to 500 g 17,18 , while dose reduction in animals with 200 g body mass already led to detectable loss of quality 19 . An adequate radiation dose, implementing the ALARA principle, is therefore especially essential in very small animals to achieve radiographic images that allow for sufficient clinical evaluation.
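As a rough, idealized illustration of why halving the dose can visibly degrade image quality (an assumption added here, not an analysis from this study): if image noise is dominated by quantum (Poisson) noise, the signal-to-noise ratio grows with the square root of the number of detected quanta $N$, which at fixed beam quality and geometry is proportional to the detector dose $D$. Detector-specific noise sources and image processing are ignored in this sketch.

```latex
\mathrm{SNR} \propto \sqrt{N} \propto \sqrt{D}
\quad\Longrightarrow\quad
\frac{\mathrm{SNR}_{D/50\%}}{\mathrm{SNR}_{D/100\%}}
  = \sqrt{\tfrac{1}{2}} \approx 0.71
```

Under these assumptions, the D/50% setting would retain only about 71% of the signal-to-noise ratio of the D/100% setting, which is consistent in direction with the lower scores observed at half dose.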
Limitations of the study. The study results were very similar for two species from very diverging classes, indicating that the conclusions might also be relevant for other small species, even belonging to other classes, such as reptiles. Although we used a limited number of patients (n = 20) and only two species were assessed separately, the obtained results provide valid recommendations for practitioners in small mammal and avian medicine.

Conclusion
The digital needle-based storage phosphor technique (CR N) proved to be superior in depiction of different delicate structures of small animals with a mean body weight of 30 to 50 g, in comparison with the conventional storage-phosphor system (CR P) and, except for the detection of delicate bone structures, also in comparison to the high-resolution screen-film system (SF). Needle-based detector systems can therefore be recommended as a substitute for conventional systems when performing diagnostic imaging in very small patients. As in conventional radiography, an appropriate radiation dose is also essential within the digital systems to achieve images that allow optimal interpretation.