Main

Symptomatic breast masses and masses identified at mammographic screening are routinely investigated with ultrasound, often followed by ultrasound-guided core biopsy (Liston and Wilson, 2010; Willett et al, 2010). Despite the accuracy of greyscale ultrasound in differentiating benign from malignant solid breast masses, such masses usually undergo either image-guided core biopsy or short-term follow-up (Stavros et al, 1995).

Static ultrasound elastography, which has been available for many years, provides a colour map of tissue elasticity superimposed on the real-time greyscale ultrasound image. Invasive breast cancers are stiff compared with normal and benign tissues (Fleury et al, 2009) and often show areas of stiffness that are larger than the greyscale abnormality (Itoh et al, 2006; Schaefer et al, 2011). To overcome the lack of quantitative data generated by static elastography, scoring systems based on the size and distribution of areas of elasticity within the greyscale ultrasound abnormality have been developed (Itoh et al, 2006; Fleury et al, 2009). Static elastography has been shown to have similar diagnostic performance to conventional greyscale ultrasound imaging, but poor interobserver agreement has prevented its widespread use (Regner et al, 2006; Burnside et al, 2007).

Shear wave elastography provides objective measurements of lesion stiffness in kilopascals, unlike static elastography, which does not give quantitative results (Athanasiou et al, 2010). Shear wave elastography has been shown to yield accurate benign/malignant differentiation of solid breast masses in two previous small studies (Athanasiou et al, 2010; Evans et al, 2010). The limited data available suggest good shear wave reproducibility, with an intraclass correlation coefficient of 0.80. This contrasts with the poor reproducibility seen with static elastography (Regner et al, 2006; Burnside et al, 2007; Evans et al, 2010).

Only two large published studies have assessed the diagnostic performance of shear wave elastography combined with greyscale ultrasound to differentiate between benign and malignant solid breast masses (Chang et al, 2011; Berg et al, 2012), and no previous study has assessed the reproducibility of shear wave elastography when four images rather than two are analysed. The BE1 study addressed the reproducibility of the interpretation of shear wave images but not the reproducibility of shear wave images of the same lesion acquired by different operators (Cosgrove et al, 2011). Most shear wave studies have found mean stiffness the most useful measurement; however, the BE1 study found the maximum stiffness value most helpful in distinguishing benign from malignant breast masses.

The aim of the study was to assess the performance of shear wave elastography combined with BI-RADS classification of greyscale images for benign/malignant differentiation in a large group of patients.

Patients and methods

Shear wave elastography has been part of the routine breast ultrasound examination of solid breast masses at our institution since November 2009. In accordance with the applicable National Research Ethics Service guidance, ethical approval for the study was not required (National Research Ethics Service, 2008). However, written informed consent to use images was obtained, according to routine practice in our institution.

All patients at our institution with solid breast masses identified during routine breast scans who were scanned using the Aixplorer ultrasound system (SuperSonic Imagine, Aix en Provence, France) between 19 April 2010 and 20 December 2010, and who underwent needle core biopsy and/or surgical biopsy according to standard clinical protocols, were included in this study.

The study population included women with symptoms and women with screen-detected abnormalities. Approximately 30 women under 25 years of age with clinically and sonographically benign lesions did not undergo biopsy and were therefore excluded (Smith and Burrows, 2008; Maxwell and Pearson, 2010). There were no other exclusions. All women were scanned and biopsied by one of three breast radiologists or an advanced radiography practitioner trained to perform and interpret breast ultrasonography. These practitioners had between 5 and 20 years of breast ultrasound experience and at least 3 months’ experience of performing shear wave elastography of solid breast lesions.

Greyscale and elastography images were obtained during the standard ultrasound appointment. The combined elastography and greyscale ultrasound examination time was between 3 and 10 min, 1–2 min of which were spent on acquisition of the elastography images. The elastography colour map findings were taken into account in the diagnostic management of the patients but the quantitative measurements were produced and analysed later to minimise impact on workflow. Extracting the quantitative data at the end of the clinic took 1–2 min for each lesion.

At least two orthogonal greyscale images of each solid lesion were obtained; these static images underwent retrospective BI-RADS classification by a breast radiologist, blinded to the elastography and pathology findings. To avoid bias, this reader did not participate in the acquisition of elastography images in the study patients. The BI-RADS categories 1–3 were taken as negative since the American College of Radiology (ACR) guidelines state that such lesions can be managed without immediate biopsy. The BI-RADS scores of 4 or 5 were taken as positive (ACR, 2011).

Four elastography images, two in each of two orthogonal planes, were taken of each lesion. The probe was held still over the lesion for about 10 s to allow the shear wave image to build up. Patients who were breathing heavily were asked to hold their breath during acquisition. No pressure was applied through the transducer during acquisition, to avoid recording artefactual stiffness. There are several basic pitfalls when performing shear wave examinations: exerting excess pressure with the probe, not holding the probe still for at least 10 s to allow the elastography image to build up, and not placing and sizing the region of interest (ROI) appropriately. However, it takes only a few weeks to learn to avoid these pitfalls, and in our experience shear wave elastography is not a difficult technique for the experienced radiologist or sonographer. The quantitative elasticity values were obtained by moving a delineated ROI over the colour map. The smallest possible ROI (diameter 2 mm) was used in all cases. As the ROI moves, the displayed values change in real time, so the ROI can be moved to the stiffest part of the image. The measurements from the four images were averaged, and these averaged values were used for analysis. In a subset of 30 patients, a second observer acquired a further set of four elastography images. In these 30 cases, values from the first observer were used for the main analyses, but the averaged mean elasticity values from the first observer were also compared with those from the second observer, and the intraclass correlation coefficient was calculated.
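The manual search for the stiffest region can be illustrated, purely as a hypothetical sketch, by sliding a circular 2 mm ROI over a stiffness map and keeping the position with the highest mean. The list-of-lists map, the `pixel_mm` spacing and the `stiffest_roi` function are assumptions made for illustration; the scanner's internal representation of the colour map and its ROI calculation are not described here.

```python
import math


def stiffest_roi(stiffness_kpa, pixel_mm, roi_diameter_mm=2.0):
    """Hypothetical sketch: find the circular ROI position with the highest mean stiffness.

    stiffness_kpa: 2D list of per-pixel stiffness values in kPa (assumed representation).
    pixel_mm: assumed pixel spacing in millimetres.
    Returns (row, col, roi_mean_kpa, roi_max_kpa) for the stiffest ROI centre.
    """
    radius_px = (roi_diameter_mm / 2.0) / pixel_mm
    r = math.floor(radius_px)
    # Pixel offsets whose centres fall inside the circular ROI.
    offsets = [
        (dy, dx)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if dy * dy + dx * dx <= radius_px * radius_px
    ]
    rows, cols = len(stiffness_kpa), len(stiffness_kpa[0])
    best = None
    # Only consider centres where the whole ROI lies inside the map.
    for y in range(r, rows - r):
        for x in range(r, cols - r):
            values = [stiffness_kpa[y + dy][x + dx] for dy, dx in offsets]
            roi_mean = sum(values) / len(values)
            if best is None or roi_mean > best[2]:
                best = (y, x, roi_mean, max(values))
    if best is None:
        raise ValueError("stiffness map is smaller than the ROI")
    return best
```

With a coarse 1 mm pixel spacing the 2 mm ROI covers only a five-pixel cross; a finer spacing approximates the circular ROI more closely.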

A cutoff value for mean elasticity of 50 kilopascals (kPa) was used for benign/malignant differentiation on shear wave elastography, as this level had been validated in a previous study (Evans et al, 2010). Maximum elasticity value was also recorded.
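The lesion-level shear wave rule can be sketched as follows, assuming the four per-image ROI means are available as numbers in kPa. The function names are hypothetical; following the Methods, a lesion is called positive only when the averaged mean is over (not equal to) 50 kPa.

```python
from statistics import fmean

CUTOFF_KPA = 50.0  # mean-elasticity cutoff taken from Evans et al (2010)


def averaged_mean_elasticity(per_image_means_kpa):
    """Average the ROI mean stiffness (kPa) over the four images of one lesion."""
    values = list(per_image_means_kpa)
    if len(values) != 4:
        raise ValueError("expected four images: two in each of two orthogonal planes")
    return fmean(values)


def swe_positive(per_image_means_kpa, cutoff_kpa=CUTOFF_KPA):
    """Hypothetical helper: suspicious if the averaged mean exceeds the cutoff."""
    return averaged_mean_elasticity(per_image_means_kpa) > cutoff_kpa
```

For example, a lesion whose four images give 40, 44, 60 and 52 kPa averages 49 kPa and is called negative, even though two individual images exceed the cutoff.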

Core biopsies were performed with a 14-gauge automated biopsy gun, with at least two core passes. When repeat biopsy, vacuum-assisted biopsy or excision, or surgery was performed, the final diagnosis was used for analysis.

Benign/malignant classification by BI-RADS scoring of greyscale ultrasound scans and by shear wave elastography using the defined cutoff value was compared with histology to give figures for sensitivity, specificity, positive and negative predictive values (PPV and NPV) and accuracy. These performance measures were then compared between greyscale BI-RADS and shear wave elastography. The performance of shear wave elastography combined with BI-RADS classification, whereby a lesion with either a BI-RADS 4–5 greyscale score or an averaged mean elasticity of over 50 kPa was classed as malignant, was also assessed. Histological findings from surgery, if performed, or otherwise from core biopsy were used as the gold standard. Only invasive cancer and ductal carcinoma in situ (DCIS) were classed as malignant.
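The classification rules described above can be written out as a short sketch. The `Lesion` record, the function names and the example lesions are hypothetical; the rules themselves (BI-RADS 1–3 negative and 4–5 positive, shear wave positive over 50 kPa, combined positive if either is positive, malignant meaning invasive cancer or DCIS) follow the Methods.

```python
from dataclasses import dataclass

CUTOFF_KPA = 50.0


@dataclass(frozen=True)
class Lesion:
    birads: int          # retrospective greyscale BI-RADS category, 1-5
    swe_mean_kpa: float  # averaged mean elasticity over four images, kPa
    malignant: bool      # reference standard: invasive cancer or DCIS


def birads_positive(category: int) -> bool:
    """BI-RADS 1-3 are negative (no immediate biopsy); 4-5 are positive."""
    if not 1 <= category <= 5:
        raise ValueError(f"unexpected BI-RADS category: {category}")
    return category >= 4


def swe_positive(mean_kpa: float) -> bool:
    return mean_kpa > CUTOFF_KPA


def combined_positive(lesion: Lesion) -> bool:
    """Classed as malignant if either greyscale BI-RADS or elastography is positive."""
    return birads_positive(lesion.birads) or swe_positive(lesion.swe_mean_kpa)


def tally(lesions, rule):
    """Count (tp, fp, tn, fn) for a rule against the histological reference standard."""
    tp = fp = tn = fn = 0
    for lesion in lesions:
        positive = rule(lesion)
        if positive and lesion.malignant:
            tp += 1
        elif positive:
            fp += 1
        elif lesion.malignant:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Illustration only: the first lesion mirrors Figure 4 (BI-RADS 3, 148 kPa,
# invasive ductal carcinoma); the other two are hypothetical.
examples = [
    Lesion(birads=3, swe_mean_kpa=148.0, malignant=True),
    Lesion(birads=3, swe_mean_kpa=30.0, malignant=False),
    Lesion(birads=4, swe_mean_kpa=42.0, malignant=False),
]
print(tally(examples, combined_positive))  # (1, 1, 1, 0)
```

The OR rule can only add positives, so in this design the combination's sensitivity can never fall below that of either test alone, while its specificity can never exceed either.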

In all, 173 women with 175 breast lesions were included. The age range was 18–94 years, and the mean age was 56 years. In all, 130 (74%) lesions were in the symptomatic population, while 45 (26%) had been detected at mammographic screening. The mean ultrasound size of the symptomatic lesions was 20 mm (range 3–70 mm) and of the screen-detected lesions was 17 mm (range 4–80 mm). BI-RADS scores were as follows: BI-RADS 2, eight lesions (5%); BI-RADS 3, 41 lesions (23%); BI-RADS 4, 41 lesions (23%); and BI-RADS 5, 85 lesions (49%). The eight lesions reclassified as BI-RADS 2 by the retrospective classifier were diagnosed by biopsy as fibroadenoma (five cases), fat necrosis (one case), and fibrocystic change (one case). Histology of the study group showed 64 benign lesions and 111 cancers (108 invasive cancers and 3 DCIS).

Statistical analyses

Intraclass correlation coefficients for absolute agreement were calculated using PASW 18 software (IBM Corporation, Somers, NY, USA). Fisher’s exact tests, with associated measures, including sensitivity, specificity and accuracy, were used to compare greyscale BI-RADS with shear wave elastography. The null hypothesis was rejected at an α level of 5% (P⩽0.05).
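A two-sided Fisher's exact test on a 2×2 table can be sketched directly from the hypergeometric distribution. This is a generic illustration, not the PASW procedure: the text does not state whether one- or two-sided P-values were reported or exactly how each 2×2 table was constructed, so the example uses a textbook table rather than study data, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the hypergeometric
    probabilities of all tables no more probable than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so exact ties are not lost to floating-point error.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


print(fisher_exact_two_sided([[3, 1], [1, 3]]))  # 17/35, about 0.486
```

Summing all tables no more probable than the observed one is the usual two-sided convention (used, for example, by R's `fisher.test`); a one-sided test would instead sum only the tail in the hypothesised direction.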

Results

Reproducibility

Figure 1 shows the agreement between two independent operators in a subset of 30 lesions, where each operator’s measurement is the mean stiffness averaged over four elastography images per lesion. The intraclass correlation coefficient was 0.87. This subgroup did not differ significantly from the entire study population in origin (symptomatic or screening), lesion size, or the proportion of benign and malignant lesions.
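An intraclass correlation coefficient for absolute agreement can be computed from a two-way analysis of variance. The sketch below implements the single-measures, absolute-agreement form (often written ICC(A,1)); whether the single- or average-measures coefficient was reported here is not stated, so that choice is an assumption, and the function name is hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Single-measures ICC for absolute agreement, ICC(A,1), from a two-way ANOVA.

    ratings: one row per lesion, one column per observer (e.g. averaged mean
    stiffness in kPa). Assumed form:
    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 lesions and 2 observers")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between lesions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between observers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because the observer (column) variance appears in the denominator, a systematic offset between observers lowers this coefficient: with hypothetical readings of 10, 20 and 30 kPa from one observer and 15, 25 and 35 kPa from the other, the observers rank the lesions identically, yet the absolute-agreement ICC is 8/9 (about 0.89) rather than 1.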

Figure 1

Scatter plot showing agreement between measurements of the mean stiffness on four elastography images per lesion acquired by two independent operators in a subset of 30 lesions. The intraclass correlation coefficient for absolute agreement is 0.87 (95% confidence interval: 0.75–0.94).

Benign lesions

The histopathology of the 64 benign lesions in the study group and their elastography findings are documented in Table 1. The mean ultrasound size of these benign lesions was 16 mm (range 3–70 mm). Of the 37 fibroadenomas in the study group, the numbers correctly classified as benign by shear wave elastography, BI-RADS, and the combination of elastography and BI-RADS were 31 (84%), 29 (78%), and 26 (70%), respectively. The mean value of the mean stiffness of fibroadenomas was 40 kPa (see Figure 2). Fifteen benign lesions had mean stiffness values above the 50-kPa threshold: six fibroadenomas, two papillomas, two with fibrocystic change, two with fat necrosis, and one each of surgical scar, dense fibrous tissue and phyllodes tumour. Ten of these lesions also had suspicious greyscale images. The benign lesions most frequently misclassified as suspicious by BI-RADS were fibroadenoma (n=8), fibrocystic change (n=3), fat necrosis (n=3), and papilloma (n=3). Five benign lesions classified as BI-RADS 2 or 3 had suspicious elastography: three fibroadenomas, one phyllodes tumour, and one with fibrocystic change.

Table 1 Benign lesion histopathology and elastography findings
Figure 2

Shear wave elastography and greyscale ultrasound image of a fibroadenoma. The shear wave image shows the lesion to be blue (soft). The mean stiffness of the lesion was 30 kPa.

Benign lesions of uncertain malignant potential

Nine such lesions were included in the study: five papillomas, three phyllodes tumours (two benign and one intermediate), and one lobular carcinoma in situ (LCIS). Six of these lesions had mean stiffness values below 50 kPa, of which five were also classified as BI-RADS 3.

Ductal carcinoma in situ

Three DCIS lesions (two high grade and one intermediate grade) were identified in the study group. The average ultrasound size was 16 mm (range 11–22 mm). Two were correctly classified as positive by BI-RADS greyscale and one by shear wave elastography. The average mean stiffness was 48 kPa (range 25–107 kPa).

Invasive cancer

The 108 invasive cancers had a mean ultrasound size of 22 mm (range 4–80 mm). These cancers consisted of 91 ductal cancers, 9 lobular cancers, 5 tubular cancers, and 3 cancers of other types (pathological details are shown in Table 2). In all, 105 (97%) of the 108 invasive cancers were correctly classified by shear wave elastography (see Figure 3) and 104 (96%) by BI-RADS. All of the invasive cancers misclassified as negative by one modality were correctly classified as positive by the other (see Figure 4A and B). The histopathological characteristics of the cancers missed by each modality are shown in Table 3. The average mean stiffness value of invasive cancers was 153 kPa (range 14–288 kPa).

Table 2 Pathological details of the 108 invasive cancers
Figure 3

Shear wave elastography and greyscale ultrasound image of a grade 3 invasive ductal carcinoma. The shear wave image shows the mass and the stroma around the mass to be yellow and red (stiff). The mean stiffness of the lesion was 155 kPa.

Figure 4

(A) Greyscale ultrasound image of a solid lesion classified as BI-RADS 3. (B) Elastography image showing high peritumoural stiffness (mean stiffness 148 kPa). Percutaneous core biopsy and subsequent surgery confirmed an invasive ductal carcinoma.

Table 3 Characteristics of malignant lesions missed by BI-RADS (1–3) classification of greyscale imaging or shear wave elastography (stiffness <50 kPa)

Diagnostic performance in lesion differentiation

The number of true positive, false positive, true negative, and false negative results for each parameter and for the combination of BI-RADS and shear wave elastography are shown in Table 4.

Table 4 Performance of BI-RADS classification of greyscale images, shear wave values and combined results for 175 solid breast masses (95% confidence intervals in brackets)

For mean elasticity vs greyscale BI-RADS, the performance results were sensitivity: 95% vs 95%, specificity: 77% vs 69%, PPV: 88% vs 84%, NPV: 91% vs 90%, and accuracy: 89% vs 86%. None of these differences in performance parameters were statistically significant. The results for maximum elasticity were sensitivity 97%, specificity 68%, PPV 84%, NPV 92%, and accuracy 86%.

The combination of mean shear wave elastography and BI-RADS greyscale ultrasonography, when a positive result from either was counted as malignant, yielded similar specificity, PPV and accuracy compared with BI-RADS and shear wave separately (all P-values >0.05). The results for the combination were specificity 61%, PPV 82%, and accuracy 86%.

The combination of BI-RADS greyscale and mean shear wave elastography gave a sensitivity of 100%, statistically significantly superior to either BI-RADS or shear wave alone (P=0.03 and P=0.03, respectively). The combination also gave statistically significantly superior NPV (100%) compared with either BI-RADS or shear wave alone (P=0.01 and P=0.02, respectively).
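The reported percentages can be reproduced from true/false positive and negative counts. Table 4 is not reproduced in this text, so the counts below are back-calculated from figures stated elsewhere: true positives from the invasive cancers and DCIS reported as correctly classified (104 + 2 for BI-RADS, 105 + 1 for shear wave), BI-RADS false positives as the 126 BI-RADS 4–5 lesions minus 106 cancers, shear wave false positives as the fifteen benign lesions above 50 kPa, and combined false positives as the 20 BI-RADS false positives plus the five BI-RADS 2–3 benign lesions with suspicious elastography. Treat them as a consistency check; Table 4 remains the authoritative source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # cancers called positive
    fp: int  # benign lesions called positive
    tn: int  # benign lesions called negative
    fn: int  # cancers called negative


def performance(c: Counts) -> dict:
    """Sensitivity, specificity, predictive values and accuracy as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
    }


# Back-calculated from figures quoted in the text (111 cancers, 64 benign lesions).
COUNTS = {
    "greyscale BI-RADS": Counts(tp=106, fp=20, tn=44, fn=5),
    "shear wave (mean > 50 kPa)": Counts(tp=106, fp=15, tn=49, fn=5),
    "combined (either positive)": Counts(tp=111, fp=25, tn=39, fn=0),
}

for name, counts in COUNTS.items():
    rounded = {k: round(100 * v) for k, v in performance(counts).items()}
    print(name, rounded)
```

These counts give sensitivity, specificity, PPV, NPV and accuracy of 95%, 69%, 84%, 90% and 86% for BI-RADS, 95%, 77%, 88%, 91% and 89% for shear wave elastography, and 100%, 61%, 82%, 100% and 86% for the combination.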

Discussion

Two previous small studies, each with around 50 patients, showed that shear wave elastography had similar or better diagnostic performance compared with BI-RADS classification of greyscale ultrasound images (Athanasiou et al, 2010; Evans et al, 2010). The first of these studies used a prototype shear wave elastography machine that yielded a single elasticity measurement; this gave statistically significantly superior performance compared with BI-RADS classification of greyscale images acquired on a different ultrasound machine. All lesions in the study were mammographically occult, and the authors did not address the reproducibility of shear wave elastography measurements (Athanasiou et al, 2010).

The second study indicated that shear wave elastography showed similar performance to BI-RADS classification of greyscale images in differentiating benign from malignant solid breast masses in a group consisting of both symptomatic and asymptomatic women. Both the greyscale and shear wave elastography images were obtained using the same commercially available machine. This study was based on the average mean stiffness measured from two elastography images, one image taken in each of two orthogonal planes. A subset of 15 patients had images taken by two operators and the intraclass correlation coefficient of the average mean stiffness results was 0.80 (Evans et al, 2010).

The large multi-centre BE1 study has recently reported its results (Berg et al, 2012). This study showed that by allowing reclassification of BI-RADS 3 and 4a lesions on the basis of shear wave elastography results, specificity can be improved without adversely affecting sensitivity. The BE1 study found maximum elasticity to have superior performance compared with mean elasticity in benign/malignant differentiation. In contrast, we found the results for mean elasticity to be superior to those obtained using maximum elasticity.

If similar results can be replicated by other investigators, the use of shear wave imaging in combination with greyscale ultrasound may improve the current management of benign-appearing solid masses (BI-RADS 3) by removing the need for core biopsy or short-term follow-up. Currently, such lesions in women aged over 25 years usually undergo either ultrasound-guided core biopsy or ultrasound surveillance at 6-monthly intervals.

Eight lesions classified as BI-RADS 2 were included in this study because they had been classified as BI-RADS 3 by the original operator and biopsied, but were reclassified as BI-RADS 2 by the radiologist who reviewed the greyscale images retrospectively.

Reproducibility of diagnostic tests across observers is an important consideration. A previous study of shear wave elastography (Evans et al, 2010) gave an intraclass correlation coefficient of 0.80 for agreement between two operators, based on the averaged values from two images acquired by each. In the current study, the number of images for each shear wave examination was increased from two to four, and the intraclass correlation coefficient for agreement between two operators in a subset of 30 patients improved to 0.87 (95% CI 0.75–0.94). Ideally, the entire cohort would have been scanned by two independent operators to provide a larger reproducibility data set. However, the numbers in this subset were sufficient to give an intraclass correlation coefficient with a narrow confidence interval, so further duplicate scanning was not justified. The two operators in this subset analysis had both undergone applications training spread over 1 week, followed by 3 months’ experience of performing shear wave examinations.

Because many cancers are not uniformly stiff but have a halo of peritumoural stromal stiffness, the ROI should be kept as small as possible and placed over the stiffest tissue anywhere within or adjacent to the lesion.

Extracting the quantitative data is not time-consuming, so generating each patient’s elastography data during the scan appointment and using the information to influence management (e.g., whether or not to perform a biopsy) is a practical possibility.

In the United States, the move from performing biopsy on all solid breast lesions to placing low-risk lesions on short-term surveillance began after Thomas Stavros’s landmark study in 1995 (Stavros et al, 1995). In this study of 750 lesions, strict criteria for the greyscale ultrasound diagnosis of benignity were applied, resulting in the impressive results of 98.4% sensitivity for malignancy and 99.5% NPV. The high sensitivity resulting from application of these strict criteria was achieved at the cost of a poor PPV of 38%. In the current study, the sensitivity and NPV of BI-RADS (95% and 90%) were poorer than in Stavros’s study, but the PPV was superior at 84%. Such comparisons are complicated by the very different proportions of benign and malignant lesions in the two study populations: a large proportion of cancers increases the pretest probability and promotes a high PPV, whereas a large proportion of benign cases promotes a high NPV. The higher proportion of cancers in the current study compared with that of Stavros et al is probably due to reduced breast awareness in young Scottish women and less recall of probably benign masses at screening in our population. To overcome this problem, receiver operating characteristic (ROC) analysis could be used, but it has recently been argued that ROC analysis of BI-RADS data is not appropriate because BI-RADS categories do not represent an ordinal scale of risk (Jiang and Metz, 2010). As both greyscale ultrasound and shear wave elastography had sensitivities of 95% in our study, the combination of the two would theoretically miss only one cancer in 400, assuming that no tumour characteristics increase the chance of both greyscale and shear wave ultrasound misclassifying the lesion as benign. The number of cancers classified as benign by each modality in this study is too small to draw firm conclusions regarding which subtypes of cancers may be misclassified by each modality.
However, both modalities miss a range of lesions of different histological grade and type. It has previously been shown that small cancers are less stiff than large ones (Evans et al, 2010), so shear wave elastography may be less sensitive in a population with a higher proportion of small cancers. The exact sensitivity of a combined greyscale ultrasound and shear wave study is thus unknown. In a clinical setting, the ultrasound and shear wave findings would be interpreted in the context of mammographic features, clinical findings, and the patient’s age. Clearly, the pre-test probability of malignancy in symptomatic women is strongly age related. This suggests that even women older than the 25-year age threshold currently applied in most UK centres, who have benign ultrasound and shear wave findings in the context of a clinically benign mass, may not always require biopsy or follow-up.
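The two quantitative arguments in this paragraph can be written out explicitly. The first expression assumes, as the text does, that greyscale and shear wave errors are independent and takes both sensitivities as 0.95; the second gives the standard dependence of predictive values on the pre-test probability (prevalence) p, with sensitivity Se and specificity Sp. As a check, taking the BI-RADS figures Se = 106/111 and Sp = 44/64 (back-calculated from the reported results) and p = 111/175, the second line gives PPV = 106/126 (about 84%) and NPV = 44/49 (about 90%), matching the values reported for greyscale BI-RADS.

```latex
\[
P(\text{both negative} \mid \text{cancer})
  = (1 - \mathrm{Se}_{\mathrm{US}})\,(1 - \mathrm{Se}_{\mathrm{SWE}})
  \approx 0.05 \times 0.05 = 0.0025 = \tfrac{1}{400}
\]

\[
\mathrm{PPV} = \frac{\mathrm{Se}\,p}{\mathrm{Se}\,p + (1 - \mathrm{Sp})(1 - p)},
\qquad
\mathrm{NPV} = \frac{\mathrm{Sp}\,(1 - p)}{\mathrm{Sp}\,(1 - p) + (1 - \mathrm{Se})\,p}
\]
```

Holding Se and Sp fixed, lowering p (as in a population with more benign lesions) lowers the PPV and raises the NPV, which is why the predictive values of the two studies cannot be compared directly.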

One downside of such an approach would be that a number of benign lesions of uncertain malignant potential would remain undiagnosed and would not be removed. In this study, five of the nine such lesions (56%) were classified as BI-RADS 3 and had mean stiffness values below 50 kPa.

The combination of BI-RADS assessment of greyscale images and shear wave elastography is extremely sensitive for the detection of malignancy. This study of 175 lesions confirms that the shear wave elastography parameter of mean stiffness is highly accurate in differentiating between benign and malignant solid breast masses and has performance parameters at least as good as BI-RADS evaluation of greyscale images. None of the cancers was negative on both shear wave and greyscale imaging, that is, for the combined modalities the sensitivity and NPV were 100%. If these findings can be replicated, then it may be possible to reduce the number of women subjected to biopsy or short-term follow-up for benign-appearing solid breast masses.