Barrett’s esophagus (BE) is a pre-neoplastic condition that is associated with an increased risk of developing dysplasia and esophageal adenocarcinoma (EAC)1,2,3. It is defined as endoscopically visible columnar epithelium containing goblet cells (intestinal metaplasia). Although the American Gastroenterological Association has not specified a length requirement1, the American College of Gastroenterology requires extension at least 1 cm proximal to the gastroesophageal junction4. BE is a genetically unstable metaplastic epithelium that accumulates an increasing number of genetic and chromosomal abnormalities as it progresses to low-grade dysplasia (LGD), high-grade dysplasia (HGD), and eventually EAC5,6,7,8,9,10. Because dysplasia is currently the primary clinical biomarker used to identify patients who are at an increased risk for EAC, clinical guidelines recommend that BE patients undergo periodic endoscopic surveillance with biopsies to detect dysplasia4. This allows for risk stratification and management of BE patients based upon the presence and grade of dysplasia prior to the development of EAC. The detection of HGD usually prompts endoscopic therapy, generally in the form of radiofrequency ablation (RFA) with or without endoscopic mucosal resection (EMR), due to its more frequent association with EAC compared with LGD11,12,13,14,15,16. However, a significant number of BE patients with dysplasia (~20%) are resistant to endoscopic therapy, and recurrences or progression to EAC during endoscopic therapy are not uncommon12,17,18,19,20,21,22,23. As for LGD, although continued surveillance every 12 months is an acceptable approach, there has been a shift toward endoscopic ablation therapy in recent years4,14,15,16,24,25,26. Subsequently, there is greater emphasis on optimization of the diagnosis of dysplasia as well as identification of patients who are more likely to progress to HGD/EAC and/or have a poor response to endoscopic therapy. 
However, considering the annual cancer risk for patients with non-dysplastic BE (NDBE) is low (0.1–0.5% per year)27,28,29, identification of reliable biomarkers of low risk to allow prolongation of surveillance intervals compared with the current recommendations of repeating endoscopy every 3–5 years in those with NDBE4 remains an important goal of biomarker research.

Even though endoscopic therapy has revolutionized the treatment of BE patients, the current surveillance protocols based on the histologic classification of BE have several shortcomings, including limited sampling of the affected BE segment (leading to false negative biopsy results), sampling error of potentially neoplastic lesions, and interobserver variability among pathologists in the diagnosis and grading of dysplasia, particularly for LGD30,31,32,33,34,35. In fact, there is evidence that the current surveillance protocols may not be effective in reducing mortality from EAC, with one study demonstrating that patients with fatal disease were nearly as likely to have received surveillance (55%) as were controls (60%)36. Furthermore, the rate of missed HGD/EAC (i.e., diagnosed within 1 year of negative endoscopy) is high (19–24%)37, suggesting that early repeat endoscopy, ideally within 1 year of an initial BE diagnosis, may be crucial, although the cost-effectiveness of this approach remains to be determined. Notably, in a recent meta-analysis of 24 cohort studies of NDBE or LGD patients followed for at least 3 years after index endoscopy, Visrodia et al. reported that ~25% of EACs and 27% of HGD/EACs were classified as missed37. When only NDBE patients were considered, the rates of missed EACs and HGD/EACs were 24% and 19%, respectively37.

Consequently, there is increased interest in ancillary tests that could (1) improve the diagnostic accuracy of dysplasia (and its grading) in challenging situations to avoid a repeat endoscopic examination with biopsies (potentially more expensive than most ancillary tests); (2) predict which NDBE or LGD patients are at a higher risk for developing HGD/EAC (including missed lesions) so that such patients can be identified early and successfully treated with endoscopic therapy to prevent progression to EAC; (3) identify patients who are less likely to develop HGD/EAC so that the surveillance of low-risk patients can be reduced; and/or (4) predict those more likely to have a poor response to endoscopic therapy. In this regard, a variety of biomarkers and assays, such as p53 immunostaining38,39,40,41,42, Wide Area Transepithelial Sampling with Three-Dimensional Computer-Assisted Analysis (WATS3D)43,44,45,46,47,48,49,50, TissueCypher51,52,53,54,55,56,57,58, mutational load analysis (BarreGEN)59,60,61,62, fluorescence in situ hybridization (FISH)7,63,64,65,66,67, and DNA content abnormalities as detected by DNA flow cytometry68,69,70,71,72,73,74,75,76,77 have been extensively evaluated. Although none of these studies have comprehensively evaluated the potential utility of these biomarkers in reducing mortality from EAC compared with the current surveillance standards, they have demonstrated a potential benefit when used in combination with histologic findings to assist in the diagnosis and/or risk stratification of BE and dysplasia. As such, this review provides an overview of these biomarkers and tests that appear most promising based on the availability of multiple published results and/or on their commercial availability.

Dysplasia as a biomarker for risk stratification

Currently, dysplasia is the primary clinical biomarker used for risk assessment in the surveillance and management of BE patients. Morphologically, dysplasia is defined as unequivocal neoplastic epithelium that remains confined within the basement membrane of the epithelium from which it developed, and it is classified as (1) negative for dysplasia, (2) indefinite for dysplasia (IND), (3) LGD, or (4) HGD3,30,31,78. The rationale for its use as the primary clinical biomarker is based on the premise that EAC in BE patients develops through a sequence of molecular (i.e., loss of CDKN2A followed by TP53 inactivation and aneuploidy) and morphologic changes that begin with intestinal metaplasia and then progress from LGD to HGD, and ultimately to EAC3,5,68,71,78,79,80,81,82. This is also supported by multiple outcome studies demonstrating a strong correlation of higher EAC rates with increasing levels of dysplasia. While the annual cancer risk for NDBE patients is low (0.1–0.5% per year)27,28,29, HGD is considered a key premalignant step that is associated with a greater risk of either already having EAC or developing it on follow-up (16–100%)83,84,85,86,87,88. The natural history of LGD is more controversial, with variable progression rates ranging from 0.4 to 13.4% per year89,90,91. It is worth emphasizing that it is often not possible to distinguish true progression from missed lesions in these outcome studies. In other words, a patient diagnosed with LGD may already harbor an unsuspected HGD at the same site or elsewhere in the esophagus. In such a case, HGD/EAC detected on follow-up may represent either true progression or a delayed/missed diagnosis.

Unfortunately, dysplasia has a number of limitations as a biomarker. First, dysplasia is often focal and may not be endoscopically visible, so sampling error is a major issue as most surveillance techniques sample only a minority of the BE segment. Although Reid et al. reported that four-quadrant biopsies taken every 1 cm in the BE segment (also known as the “Seattle protocol”) can consistently detect early cancers arising in HGD92, most endoscopists do not adhere to this protocol and take too few biopsies, compounding the problem of sampling error. Second, consistent diagnosis and grading of dysplasia by histology is challenging, as exemplified by a relatively high degree of interobserver variability in the histologic classification of BE among pathologists, particularly toward the lower end of the spectrum30,31,32,34,93. The most pronounced variability is linked to the diagnosis of LGD, with a recent study illustrating sub-optimal interobserver agreement for LGD (kappa = 0.11) even among gastrointestinal (GI) pathologists32. In another study, up to 85% of LGD cases were downgraded to NDBE or IND following expert pathology review91. Even though an excellent interobserver agreement for HGD among GI pathologists has been reported in earlier studies30,31, a more recent study demonstrated that upon review of 485 HGD samples from both academic and private centers by experienced GI pathologists, up to 40% of these cases were reinterpreted as LGD, IND, NDBE, or no BE93. Consequently, both the American College of Gastroenterology and the American Gastroenterological Association strongly recommend that all potential dysplasia cases be confirmed by at least one experienced GI pathologist before embarking on a management plan4,94. This recommendation is further supported by several studies demonstrating a strong correlation between the number of pathologists who agree with a diagnosis of dysplasia and the rate of neoplastic progression. For instance, Skacel et al. showed that the rate of progression was 80% when three GI pathologists agreed on a diagnosis of LGD, while the rate was 41% when two GI pathologists agreed95. Finally, even if the issues stated above could be resolved, there are no observable histologic features in NDBE or LGD on hematoxylin and eosin (H&E) staining that can accurately identify those patients most likely to develop HGD/EAC versus remain stable for years. Indeed, recent studies have suggested that many EACs develop through a more direct, accelerated pathway in which TP53 mutation is followed by doubling of the whole genome, rapidly resulting in genomic instability, oncogenic amplifications, and EAC, rather than through the stepwise accumulation of tumor suppressor alterations96,97. This accelerated pathway to EAC might explain in part why endoscopic surveillance is sometimes unsuccessful in detecting dysplasia before the development of EAC in some BE patients36. Overall, these results suggest that additional or alternative biomarker(s) may be useful to better risk stratify BE patients.

p53 immunostaining as a diagnostic and risk stratification biomarker

Immunohistochemistry (IHC) for p53 to confirm a dysplasia diagnosis or predict likelihood of progression to EAC is of interest but has limitations, as summarized by others38,98. The TP53 gene encodes p53, a tumor suppressor protein that guards against the accumulation of mutations. Normal cells have low levels of this protein in their nuclei, but the gene and protein are upregulated in the presence of DNA damage or stress, resulting in DNA repair, growth attenuation, and apoptosis. In dysplastic cells and EAC, mutations in TP53 lead to aberrant nuclear accumulation of abnormal p53 protein (which has a long half-life) that can be detected on immunostaining (Fig. 1A, B). Alternatively, truncating mutations/bi-allelic inactivation of TP53 lead to complete loss of nuclear expression of the protein, termed the “null” pattern (Fig. 1C, D). Light and patchy staining using p53 IHC reflects normal physiologic activity of the protein to maintain cell health and is the pattern of cells that are TP53 wild-type (Fig. 1E, F). Aberrant staining does not map cleanly onto histologic grade, however: in one study of p53 staining, aberrant expression was detected in ~10% of cases regarded as non-dysplastic, ~40% of LGD, ~85% of HGD, and all EACs39. Strong nuclear staining generally aligns with TP53 mutations but can still be detected in cases of LGD lacking TP53 mutations. Bian et al. reported that although 95% of cases interpreted as LGD had p53 expression on IHC, TP53 mutations were only detected in about a third40.

Fig. 1: Different p53 expression patterns.

A, B Strong and diffuse p53 overexpression is seen in a case of HGD. C, D This example shows complete absence of p53 staining (null pattern). The base of the squamous epithelium shows normal positive staining (internal control). E, F Wild-type pattern of p53 staining in NDBE shows scattered, faintly positive nuclei.

Pathologists in many institutions, particularly in the UK and Europe, have advocated for the use of universal p53 IHC in BE cases to detect dysplasia that might be otherwise overlooked, to the point that the British Society of Gastroenterology endorsed adding it reflexively in routine practice99. The recommendation seemed to reflect studies directed at predicting progression of NDBE to HGD/EAC rather than establishing an initial diagnosis. In one study, scoring p53 immunostaining as “significant” in the presence of strong or absent staining versus “not significant” resulted in kappa scores on the order of 0.6 (strong reproducibility), whereas scoring morphologic features as “negative for dysplasia” versus “IND” versus “LGD” versus “HGD” (4 categories) resulted in kappa scores of 0.3, an unsurprising result since grouping cases into 2 categories versus 4 produces less observer variability ab initio41. In fact, when the authors grouped the morphologic interpretation into only two categories as they had done for p53, namely “definite dysplasia” versus “no dysplasia” on H&E, they achieved a comparable kappa score of 0.55 for morphology alone, diminishing their conclusions concerning p53 considerably. Nonetheless, the results of many studies have supported the use of p53 IHC as a marker of the likelihood of progression to HGD/EAC in patients whose biopsies show H&E findings of negative for dysplasia, IND, or LGD. The latter studies are summarized by Srivastava et al.38.
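The dependence of kappa on the number of diagnostic categories can be illustrated with a minimal sketch of Cohen's kappa for two raters. The paired ratings below are entirely hypothetical and were chosen only for illustration; they are not taken from the cited study, and the helper names (`cohen_kappa`, `collapse`) are likewise our own.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# HYPOTHETICAL paired diagnoses (rater A, rater B) for 20 biopsies; illustration only.
PAIRS = (
    [("NEG", "NEG")] * 6 + [("NEG", "IND")] * 2 + [("IND", "NEG")] * 2
    + [("IND", "IND")] * 1 + [("IND", "LGD")] * 1 + [("LGD", "IND")] * 1
    + [("LGD", "LGD")] * 2 + [("LGD", "HGD")] * 1 + [("HGD", "LGD")] * 1
    + [("HGD", "HGD")] * 3
)
rater_a = [a for a, _ in PAIRS]
rater_b = [b for _, b in PAIRS]

def collapse(grade):
    """Collapse the 4-tier grade into 'definite dysplasia' versus 'no dysplasia'."""
    return "dysplasia" if grade in {"LGD", "HGD"} else "no dysplasia"

kappa_4 = cohen_kappa(rater_a, rater_b)
kappa_2 = cohen_kappa([collapse(g) for g in rater_a], [collapse(g) for g in rater_b])
print(f"4 categories: kappa = {kappa_4:.2f}; 2 categories: kappa = {kappa_2:.2f}")
```

With these made-up ratings, most disagreements fall between adjacent grades (e.g., LGD versus HGD), so they disappear once the grades are collapsed and kappa rises accordingly. A fair comparison of p53 with morphology therefore requires the same number of categories on both sides.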

More recently, Redston et al. studied “progressors” versus “non-progressors” gleaned from a large commercial laboratory system42. The authors used a retrospective set of over 500 BE patients with or without known progression from negative for dysplasia, IND, or LGD to HGD/EAC. To establish their IHC scoring system (Table 1), the authors obtained DNA for sequencing from 92 BE samples derived from 28 progressors and 6 non-progressors. TP53 mutations were identified from 50 of the samples, specifically from 21 patients who progressed and 3 who did not. In ~90% of cases, the TP53 mutational status correlated with p53 immunostaining results. The authors validated their p53 staining criteria using 50 NDBE and 50 HGD biopsies. They found abnormal p53 staining in 4% of NDBE and 96% of HGD, thereby confirming their scoring criteria. In the testing phase, amongst 646 NDBE patients, 20 progressed to LGD, and 10 to HGD/EAC. Abnormal p53 immunostaining was detected in half of the progressors, resulting in good specificity but poor sensitivity. Essentially, amongst 646 NDBE patients, adding p53 staining offered additional information for only 15, and arguably, progression to LGD is not truly progression. The authors suggested that patients with abnormal p53 expression in NDBE have comparable rates of progression to those who have LGD. They further suggested annual endoscopy for such persons. However, the study was limited by the lack of uniformity of the screening and surveillance methods of the gastroenterologists submitting their materials to the commercial laboratory. Also, it is worth noting that although the risk of progression to HGD/EAC is reported to decline with an increased number of endoscopies showing NDBE100,101, most studies on ancillary tests, including those of p53, do not clarify the number of negative endoscopies prior to the development of HGD/EAC, complicating the interpretation of their outcome data.

Table 1 Scoring method for p53 used by Redston et al.42.

In a 2018 study, Ten Kate et al. reported that simply refining histologic criteria for diagnosis of LGD identified BE patients likely to progress102. Similarly, other refined histologic criteria allowed another group that included one of us (EAM) to essentially eliminate the IND category with excellent prediction of outcome103. Ten Kate et al. also used p53 staining alone and achieved similar results to those afforded by use of H&E alone, with some synergy for the two combined but probably not enough to support reflex testing102. Years ago, two of us (GYL and EAM) were part of a group that also achieved excellent prediction of outcome using H&E alone despite imperfect interobserver agreement31,104. We would also point out that the Kaplan–Meier curves for progression of NDBE with and without abnormal p53 staining from the study by Redston et al. do not differ dramatically because so few patients without histologic dysplasia progress regardless of p53 immunostaining status42.

Incorporating reflex IHC for p53 is not terribly expensive in the individual patient, and reimbursement is readily obtained. The 2021 Medicare fee schedule listed $99.82 for the global immunostaining code 88342 (technical only $67.41), subsequently updated to $106.07 (technical only $70.82), whereas the H&E global code (88305) paid $66.76 (technical only $32.06), updated to $71.52 (technical only $33.84). Adding a p53 stain therefore raises the cost per biopsy roughly two and a half fold (from $71.52 to $177.59 at the updated rates). This might be prohibitively expensive if p53 staining is added to every single esophageal biopsy demonstrating intestinal metaplasia. No cost analysis was provided by Redston et al.42, although pathologists might be motivated by payments to add p53 staining to all BE samples that lack dysplasia or show LGD. Overinterpretation of normal staining, however, might result in unnecessary surveillance or ablation procedures.
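The fold change in cost per biopsy can be checked directly from the updated global fees quoted above. The short sketch below assumes one H&E section and one p53 stain per biopsy and ignores any additional professional or technical charges; the constant and function names are ours.

```python
# Updated global fees quoted in the text (US dollars per biopsy).
HE_GLOBAL_88305 = 71.52    # H&E, CPT 88305
P53_GLOBAL_88342 = 106.07  # single immunostain, CPT 88342

def cost_with_p53(he_fee: float, ihc_fee: float) -> tuple[float, float]:
    """Return (total cost per biopsy, fold change relative to H&E alone)."""
    total = he_fee + ihc_fee
    return total, total / he_fee

total, fold = cost_with_p53(HE_GLOBAL_88305, P53_GLOBAL_88342)
print(f"H&E + p53: ${total:.2f} per biopsy, {fold:.2f} times the cost of H&E alone")
```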

Most laboratories have used p53 immunostaining for years in evaluation of samples from several organ systems. In esophageal biopsies, however, in expert hands, p53 staining is not necessary to diagnose HGD in BE, which is itself an excellent marker for high risk for progression to EAC104, and many gastroenterologists request second opinions for diagnoses of LGD and HGD since either is currently an indication for ablation of the affected segment4,99. The updated 2021 Medicare reimbursement fee for the code for outside consultation (88321) is $102.49, which is slightly cheaper than a p53 stain. Adding p53 may have some value in assessing LGD or adding diagnostic precision for cases regarded as IND103. However, using positive p53 immunostaining to justify endoscopic therapy in IND or LGD patients, when the concordance between IHC and mutation analysis is less than perfect (~90%), may mean overtreatment in ~10% of patients.

WATS3D as a diagnostic biomarker

WATS3D or Wide Area Transepithelial Sampling with Three-Dimensional Computer-Assisted Analysis (CDx Diagnostics, Suffern, NY) is an adjunct test to targeted and random four-quadrant esophageal biopsies that uses three-dimensional computer-assisted tissue analysis. As discussed previously, the current screening and surveillance guidelines for BE and associated dysplasia require sampling of any visible mucosal abnormality followed by systematic random four-quadrant forceps biopsies obtained at 1–2 cm intervals (Seattle protocol). However, this recommended protocol is time-consuming, labor intensive, and subject to sampling error. As such, the rationale for using WATS3D is to overcome these inherent problems associated with extensive blind sampling43. In WATS3D, abrasive brushes are used to circumferentially sample the esophageal mucosa. The sampling consists of individual cells as well as mucosal strips reported to measure up to 150 μm in thickness. The material is first analyzed by an imaging system using a neural network optimized for evaluation of the esophageal mucosa. The computer system scans, analyzes, and integrates up to fifty 3-μm optical slices. Ultimately, the system builds three-dimensional images of esophageal glands, and flags goblet cells and dysplastic cells that are displayed for confirmation by a pathologist (Fig. 2). The 2019 guidelines of the American Society for Gastrointestinal Endoscopy conditionally endorsed the use of WATS3D based on low quality evidence for screening and surveillance of BE, in addition to Seattle protocol biopsy sampling for patients with known or suspected BE44.

Fig. 2: Representative images of WATS3D.

The images show (A) NDBE, (B) LGD, (C) HGD, and (D) EAC. The images were reproduced with permission from Elsevier47.

Several studies have demonstrated a significant increase in the detection rates of BE and dysplasia when WATS3D was used adjunctively to the combination of both targeted and random four-quadrant biopsies. For instance, in a large multicenter prospective study of 12,899 patients undergoing BE screening and surveillance and evaluated by 58 community endoscopists, WATS3D was reported to detect an additional 213 patients with dysplasia (versus 88 cases detected on biopsy alone), increasing the overall detection of dysplasia by 242%46. WATS3D also identified 2570 additional BE cases (versus 1684 BE cases by the combination of targeted and random biopsies alone), increasing the rate of BE detection from 13.1 to 33%. However, among the purported 213 “new” dysplasia cases, 128 (60%) were in fact classified as IND rather than as dysplasia by WATS3D, significantly reducing the reported increased detection rate of dysplasia. Furthermore, the increased detection rate of BE was based on the diagnosis of intestinal metaplasia, but the possibility that the cardia was sampled could not be excluded, further weakening the validity of the results.
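The effect of the IND cases on the reported yield can be made explicit with a few lines of arithmetic. The sketch below uses only the counts quoted above; treating all 128 IND cases as non-dysplastic is our simplifying assumption for illustration, not an analysis performed by the study authors.

```python
def pct_increase(added: int, baseline: int) -> float:
    """Percentage increase in detection relative to the biopsy-only baseline."""
    return 100.0 * added / baseline

PATIENTS = 12_899
BIOPSY_DYSPLASIA = 88        # dysplasia detected on forceps biopsy alone
WATS_ADDED_DYSPLASIA = 213   # additional "dysplasia" cases reported with WATS3D
WATS_ADDED_IND = 128         # of those, classified as indefinite for dysplasia
BIOPSY_BE = 1_684            # BE detected on biopsy alone
WATS_ADDED_BE = 2_570        # additional BE cases reported with WATS3D

reported = pct_increase(WATS_ADDED_DYSPLASIA, BIOPSY_DYSPLASIA)
# Assumption: count only cases WATS3D graded as definite dysplasia.
definite_only = pct_increase(WATS_ADDED_DYSPLASIA - WATS_ADDED_IND, BIOPSY_DYSPLASIA)

be_rate_biopsy = 100.0 * BIOPSY_BE / PATIENTS
be_rate_combined = 100.0 * (BIOPSY_BE + WATS_ADDED_BE) / PATIENTS

print(f"Reported increase in dysplasia detection: {reported:.0f}%")
print(f"Increase excluding IND cases: {definite_only:.0f}%")
print(f"BE detection rate: {be_rate_biopsy:.1f}% -> {be_rate_combined:.1f}%")
```

Under this assumption the incremental yield for definite dysplasia is 85 cases, a little under a doubling of the biopsy-only figure rather than the reported 242% increase.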

Only a small series involving 160 BE patients tackled the issue of HGD/EAC detection by WATS3D47. In a multicenter, prospective, randomized trial of referred BE patients at 16 medical centers, Vennalaganti et al. reported that the addition of WATS3D to biopsy sampling yielded an additional 23 cases of HGD/EAC. Among these 23 patients, 11 were classified by biopsy as NDBE and 12 as LGD/IND. However, the vast majority of these patients (91.3%) had been previously diagnosed with HGD/EAC on prior biopsies, and most (78%) were confirmed to have HGD/EAC on follow-up biopsies. Also, this study was performed in a high-risk BE population at referral centers, and thus it is not representative of community GI practices and of the BE population at large. Nonetheless, this series is important in that it demonstrates an increased detection rate of high-grade lesions by WATS3D, and the biopsy diagnoses had been confirmed by GI pathologists.

Long-term outcome studies of dysplasia diagnosed solely on WATS3D are limited. In a study of over 4000 BE patients who had two WATS3D examinations separated by ≥12 months, Shaheen et al. reported that individuals without dysplasia on WATS3D had a very low risk of progression to HGD/EAC (0.08% per patient-year), while those with LGD had a higher rate of 5.79% per patient-year50. However, as noted by the authors, no comparison could be made to the progression rate of LGD detected by microscopy alone. Interestingly, the authors also evaluated the category of crypt dysplasia and reported that its risk of progression (1.42% per patient-year) was higher than that of patients with no dysplasia but lower than that of patients with LGD. The heterogeneous nature of this diagnostic category likely explains the results.

There are limitations with WATS3D. First, although a study examining the interobserver agreement among pathologists using WATS3D found substantial agreement for LGD, HGD, and no dysplasia48, the diagnostic criteria used in WATS3D have not been independently tested. Also, the data used to construct the algorithm that creates three-dimensional images of esophageal glands and the adequacy criteria used for this test are not well delineated. Furthermore, regenerative epithelial changes in deep glands could be easily misinterpreted as HGD on a single plane of analysis as in WATS3D, an issue that merits additional evaluation49. Another hindrance to the full validation of this technology is that this commercial test is interpreted by a limited group of pathologists. Independent reproduction and evaluation of WATS3D diagnoses in academic settings with expert reviews of all forceps biopsies by specialized GI pathologists (who are by no means infallible) would go a long way to address some of these criticisms. Finally, as noted above, whether the progression rate of dysplasia detected by WATS3D differs from that of dysplasia identified by forceps biopsy and microscopy alone remains to be established.

The issue of additional cost of performing WATS3D has not been extensively evaluated. The cost of WATS3D has been reported to be in the range of $700–800 by our endoscopist colleagues, but whether using this commercial test as an adjunct to traditional endoscopic surveillance is cost effective in the long-term management of BE patients remains to be thoroughly evaluated. Also, the potential value of WATS3D in the era of ever improving advanced endoscopic imaging techniques has not been examined. As endoscopists improve their ability to detect ever more subtle lesions previously described as ‘invisible’, it may lessen the need for broad blind sampling, such as WATS3D.

TissueCypher as a diagnostic and risk stratification biomarker

The objective of TissueCypher is to evaluate samples from BE patients diagnosed as negative for dysplasia, IND, or LGD on routine histologic evaluation to identify those patients most likely to progress to HGD/EAC so that intensified screening or ablation can be offered to them. Similarly, the technique is intended to identify patients who are unlikely to progress such that their surveillance can be reduced.

TissueCypher uses immunofluorescent labeling of sections from formalin-fixed paraffin-embedded (FFPE) samples for p16, AMACR, p53, HER2, CK20, CD68, COX-2, HIF-1α, and CD45RO, together with Hoechst staining dye (Fig. 3)51,52,53,54,55,56,57,58. Hoechst dye allows fluorescent detection of DNA105, thereby permitting image analysis software to identify nuclei as discrete objects in tissue. It also allows the software to assess nuclear area, solidity, and DNA content. Some of the markers are combined on the same slide51,52. The slides are then analyzed by an image analysis algorithm that quantifies 15 different “image features” (Table 2), and the quantified image features are combined into a risk score. Samples are still reviewed in the typical manner (routine diagnosis by local pathologists), after which sections are prepared and subjected to the TissueCypher staining and algorithm.

Fig. 3: Representative images of TissueCypher.

A–D show a NDBE biopsy from a 66-year-old man with 8-cm BE segment and who was diagnosed with EAC at a surveillance endoscopy 3.9 years later (progressor). TissueCypher scored this specimen high risk. E–H show a NDBE biopsy from a 69-year-old man with 11-cm BE segment with 5.6 years surveillance data showing no progression (non-progressor). TissueCypher scored this specimen low risk. A and E show p16-green, AMACR-red, and p53-yellow; B and F show CD68-green and COX-2-red; C and G show HIF-1α-green and CD45RO-red; D and H show HER2-green and CK20-red. Nuclei labeled by Hoechst are shown in blue in all panels.

Table 2 Core features assessed by TissueCypher52.

This method offers the advantage of using a variety of markers with a consistent interpretation, thus eliminating interobserver variability, although this does not necessarily mean an accurate interpretation. Using the company’s platform, a risk score for progression is stratified as low, intermediate, or high, but there is some advantage to combining the intermediate and high risk scores. Although similar data are reported in all studies from the TissueCypher team51,52,54,55,56, the initial and some recent studies were performed in Europe, and in 2020, a US-based study from two institutions was added53. The latter study was a case-control study from patients with biopsy diagnoses of negative for dysplasia (n = 227), IND (n = 23), and LGD (n = 18). The samples were from 58 patients who progressed to HGD/EAC (median time to progression of 2.7 years; 7/58 progressed after 5 years), and from 210 patients who did not progress (median surveillance time of 7 years). In this study, the prevalence-adjusted proportions of patients scoring low, intermediate, and high risk using the TissueCypher method were 84.2%, 9.4%, and 6.4%, respectively. The sensitivity and specificity of the test at 5 years for the 3-tier TissueCypher classification (low, intermediate, and high risk) were 29% and 86%, respectively, and 40% and 86%, respectively, for the 2-tier classification (low and intermediate/high risk combined). By comparison, the sensitivity and specificity of an expert diagnosis of LGD were 19% and 88%, respectively, and the sensitivity and specificity of the initial community diagnoses of LGD (i.e., diagnosis recorded in the health records) were 26% and 66%, respectively. Of 51 patients who progressed within 5 years, 14 scored high risk, 6 scored intermediate risk, and 31 scored low risk. Among 210 patients who did not progress, 13 scored high risk, 18 scored intermediate risk, and 179 scored low risk. 
Using the TissueCypher test, the prevalence-adjusted positive predictive value (PPV) was 23%; i.e., 23% of patients who score high risk would progress to HGD/EAC within 5 years. The prevalence-adjusted negative predictive value (NPV) was 96.4%. The risk prediction test also showed improved risk stratification when compared to p53 alone using the automated scoring.
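For orientation, the crude proportions implied by the case counts quoted above can be tabulated as follows. This is only a sketch on the raw counts: the published sensitivity, specificity, PPV, and NPV are described as 5-year and prevalence-adjusted estimates, so they need not coincide with these unadjusted figures. The class and function names are ours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskCounts:
    high: int
    intermediate: int
    low: int

    @property
    def total(self) -> int:
        return self.high + self.intermediate + self.low

    def positives(self, two_tier: bool) -> int:
        """Test-positive patients: high risk only, or intermediate and high combined."""
        return self.high + (self.intermediate if two_tier else 0)

# Counts quoted in the text.
PROGRESSORS_5Y = RiskCounts(high=14, intermediate=6, low=31)     # progressed within 5 years
NON_PROGRESSORS = RiskCounts(high=13, intermediate=18, low=179)  # did not progress

def raw_sensitivity(two_tier: bool) -> float:
    return PROGRESSORS_5Y.positives(two_tier) / PROGRESSORS_5Y.total

def raw_specificity(two_tier: bool) -> float:
    negatives = NON_PROGRESSORS.total - NON_PROGRESSORS.positives(two_tier)
    return negatives / NON_PROGRESSORS.total

for label, two_tier in (("high risk only", False), ("intermediate/high combined", True)):
    print(f"{label}: raw sensitivity {raw_sensitivity(two_tier):.1%}, "
          f"raw specificity {raw_specificity(two_tier):.1%}")
```

The two thresholds trade sensitivity against specificity in the expected direction: counting intermediate-risk scores as positive captures more progressors at the cost of more false positives among non-progressors.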

Overall, expert pathologists’ diagnosis of LGD outperformed TissueCypher in specificity and PPV, but TissueCypher was more sensitive. However, this was not the case for samples from patients with no dysplasia. Patients without dysplasia as confirmed by expert pathologists who scored high risk were at about 5-fold increased risk of progression as compared to patients without dysplasia who scored low risk using TissueCypher. The adjusted PPV for the test in expert pathologist-confirmed NDBE was 26%, indicating that 26% of patients without dysplasia but with a high risk score using TissueCypher will progress within 5 years, a rate similar to that associated with an expert diagnosis of LGD.

TissueCypher is a send-out test (Cernostics, Pittsburgh, PA) and costs about $5,000. A cost analysis sponsored by the company claimed that it would be cost-effective after 5 years by reducing the number of patients requiring surveillance and reducing EAC-associated deaths57. It does not change initial evaluation of patients by routine histology. It has been assigned a CPT code by Medicare (0108U). However, because of the high “up front” costs of TissueCypher, most insurers do not cover the testing. Also, the roughly similar performance characteristics of TissueCypher to histologic evaluation may not justify changing surveillance intervals based on the results, and thus the suggested cost benefit may not materialize, especially given its high cost. As noted above, based on the company’s data, a diagnosis of LGD by an expert pathologist offers more specificity than the test, and obtaining an expert opinion is certainly substantially cheaper. Regardless, the test is consistent, not subject to human observer variation, and outperforms pathologists in identifying patients without dysplasia who are likely to progress to HGD/EAC.

Mutational load (BarreGEN) as a diagnostic and risk stratification biomarker

Mutational load (ML) analysis provides a measure of cumulative genetic aberrations and instability at 10 key genomic loci by assessing DNA damage around tumor suppressor genes associated with progression to HGD/EAC59,60,61,62. It can be assessed using a commercially available test (BarreGEN, Interpace Diagnostics, Pittsburgh, PA), with its main objective being to detect dysplasia and risk stratify BE patients. To perform this assay, H&E-stained slides are first evaluated to identify relevant histologic targets (e.g., LGD) that are micro-dissected from 1 to 3 unstained FFPE sections (4 μm in thickness)60,61,62,106. Greater than 90% of each micro-dissected target should contain epithelial cells, from which purified DNA is prepared. Polymerase chain reaction (PCR) and quantitative capillary electrophoresis methods are performed on all micro-dissected areas. ML specifically assesses the presence and extent (clonality) of loss of heterozygosity (LOH) and new alleles consistent with microsatellite instability (MSI) for each micro-dissected target. The following 10 loci are examined, with associated tumor suppressor genes in parentheses: 1p (CMM1, L-myc), 3p (VHL, HoGG1), 5q (MCC, APC), 9p (CDKN2A), 10q (PTEN, MXI1), 17p (TP53), 17q (RNF43, NME1), 18q (SMAD4, DCC), 21q (TFF1, PSEN2), and 22q (NF2). LOH is categorized as either high (> 75% of the DNA has LOH) or low clonality (50–75% of the DNA has LOH) and assigned values of 1 and 0.5, respectively. The value of the first MSI at a genomic locus is 0.75, and each additional MSI is assigned a value of 0.5. The highest weighted value at each locus is determined based on the values for low and high clonality LOH and MSI at that locus (e.g., the weighted value of high clonality LOH is 1, which is the highest possible weighted value at each of the 10 loci). The sum of all weighted values for all 10 genomic loci is defined as the ML for that micro-dissected target (range: 0–10).
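The weighting scheme described above can be summarized in a short sketch. Two points are our interpretation rather than a statement of the commercial assay's algorithm: (1) the "first MSI" is taken to be the first locus in the panel showing MSI (value 0.75), with every further MSI-positive locus valued at 0.5; and (2) each locus contributes the larger of its LOH and MSI values. The data structure and function names are hypothetical.

```python
from dataclasses import dataclass

# The 10 loci listed in the text, in panel order.
LOCI = ("1p", "3p", "5q", "9p", "10q", "17p", "17q", "18q", "21q", "22q")

@dataclass(frozen=True)
class LocusResult:
    loh_fraction: float = 0.0  # fraction of DNA showing LOH (0.0 to 1.0)
    msi: bool = False          # new allele consistent with MSI detected

def loh_value(fraction: float) -> float:
    """High clonality (>75%) scores 1, low clonality (50-75%) scores 0.5, otherwise 0."""
    if fraction > 0.75:
        return 1.0
    if fraction >= 0.50:
        return 0.5
    return 0.0

def mutational_load(results: dict[str, LocusResult]) -> float:
    """Sum the highest weighted value at each locus (range 0-10)."""
    total = 0.0
    msi_seen = False
    for locus in LOCI:
        result = results.get(locus, LocusResult())
        msi_value = 0.0
        if result.msi:
            # Assumption: 0.75 for the first MSI-positive locus, 0.5 for each additional one.
            msi_value = 0.5 if msi_seen else 0.75
            msi_seen = True
        total += max(loh_value(result.loh_fraction), msi_value)
    return total

# Hypothetical micro-dissected target, for illustration only.
example = {
    "17p": LocusResult(loh_fraction=0.80),  # high-clonality LOH -> 1.0
    "9p": LocusResult(loh_fraction=0.60),   # low-clonality LOH  -> 0.5
    "5q": LocusResult(msi=True),            # first MSI          -> 0.75
    "18q": LocusResult(msi=True),           # additional MSI     -> 0.5
}
print(mutational_load(example))
```

Under these assumptions no locus can contribute more than 1, which is consistent with the stated ML range of 0–10.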

In a case-control study involving 69 BE patients (23 progressors and 46 non-progressors), ML assessment was able to risk stratify patients with NDBE or LGD at baseline with respect to progression to HGD/EAC within a mean follow-up time of 4 years61. The mean ML score was significantly higher in progressors (ML = 2.2) than in non-progressors (ML = 0.4) (p < 0.001)61. No progressor had an ML of 0 at baseline compared with 54% of non-progressors. Sensitivity was 100% at ML ≥ 0.5, and specificity was 96% at ML ≥ 1.5. Similarly, in a retrospective study of 28 IND patients, patients who progressed to HGD had higher levels of genomic instability (ML ≥ 1.5)62. At this threshold, the risk of progression to HGD was 33% (versus 0% in those with an ML < 1.5; p = 0.005), with a sensitivity of 100% and a specificity of 85%. Overall, these results indicate that genomic alterations as measured by ML often predate the development of HGD/EAC, potentially allowing ML to be a useful biomarker for predicting disease progression. In addition, Ellsworth et al. demonstrated a significant correlation of higher ML with increasingly severe histologic grade of BE-associated lesions: ML = 1.1 for IND, 2.2 for LGD, and 3.3 for HGD (p < 0.001)59. These results suggest that ML may serve as an adjunctive test in patients with equivocal histology.
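To make the threshold-dependent sensitivity and specificity figures concrete, the sketch below shows how such values are computed from baseline ML scores. The scores are invented toy values, not data from the cited studies, and the function name is hypothetical.

```python
# Toy illustration only: these scores are invented, not study data.
def sensitivity_specificity(progressors, non_progressors, threshold):
    """Treat ML >= threshold as a positive test for future progression."""
    true_pos = sum(score >= threshold for score in progressors)
    true_neg = sum(score < threshold for score in non_progressors)
    return true_pos / len(progressors), true_neg / len(non_progressors)


toy_progressors = [0.5, 1.5, 2.0, 3.5]
toy_non_progressors = [0.0, 0.0, 0.5, 1.0, 2.0]

for cutoff in (0.5, 1.5):
    sens, spec = sensitivity_specificity(toy_progressors, toy_non_progressors, cutoff)
    print(f"ML >= {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Raising the cutoff trades sensitivity for specificity, which is why the two figures above are quoted at different ML thresholds.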

The biggest appeal of ML assessment is that it allows a direct correlation with morphology and provides an objective quantitative measure of the presence and extent of molecular alterations associated with development of dysplasia and EAC, eliminating human interobserver variability. However, similar to other PCR-based tests, ML assessment may be hampered by insufficient amounts and/or poor quality of DNA, as is often the case with FFPE mucosal biopsies. Also, micro-dissection of tiny histologic targets seen on H&E slides is susceptible to sampling error. Another limitation is the dependence on purified DNA: the ML signal in crude lysate could be “muted”106, so purified DNA should be used whenever possible. In fact, using crude lysate, there was no difference in mean ML between progressors (ML = 0.73) and non-progressors (ML = 0.74) (p = 0.93) in a nested case-control study (involving 48 progressors and 101 non-progressors), which failed to validate the previous finding that ML could be useful in risk stratifying BE patients. Finally, as noted above, BarreGEN is not fully validated for commercial use at this time, and it is unclear when this test will be available for clinical use.

FISH as a diagnostic and risk stratification biomarker

FISH (fluorescence in situ hybridization) is a technique that utilizes fluorescently labeled DNA probes to detect chromosomal abnormalities. To explore its potential utility in the diagnosis and risk stratification of dysplasia in BE patients, several studies, all conducted at Mayo Clinic, utilized a set of 4 locus-specific probes targeting 8q24 (MYC), 9p21 (CDKN2A; alias P16), 17q12 (ERBB2; alias Her-2/neu), and 20q13 (ZNF217)7,63,64,65,66,67. Each cell is categorized as either normal (i.e., having 2 signals per probe) or abnormal (i.e., having more or fewer than 2 signals for at least one probe) (Fig. 4). Detectable chromosomal alterations include polysomy (≥3 signals for ≥2 loci), single locus gain (3–9 signals of a single locus and two signals of other loci), amplification of a single locus (≥10 signals of a single locus and two signals of other loci), and single locus loss (0–1 signal of a single locus and two signals of other loci). The test can be performed on either FFPE tissue7,66 or endoscopic brushing specimens63,64,65,67.
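The per-cell categories defined above can be expressed as a short decision rule. The sketch below is illustrative only: the function name, the input layout, and the catch-all "other abnormal" label for patterns that the stated definitions do not cover (e.g., loss at two loci) are assumptions.

```python
# Illustrative per-cell classification based on the definitions in the text.
PROBES = ("8q24", "9p21", "17q12", "20q13")


def classify_cell(signals):
    """Classify one cell from its signal counts, e.g. {"8q24": 2, "9p21": 0, ...}."""
    counts = [signals[probe] for probe in PROBES]
    if all(count == 2 for count in counts):
        return "normal"
    if sum(count >= 3 for count in counts) >= 2:
        return "polysomy"  # >=3 signals for >=2 loci
    abnormal = [count for count in counts if count != 2]
    if len(abnormal) == 1:  # one locus altered, two signals at all other loci
        count = abnormal[0]
        if count >= 10:
            return "amplification"
        if count >= 3:
            return "single locus gain"  # 3-9 signals
        return "single locus loss"      # 0-1 signal
    return "other abnormal"  # assumption: pattern not covered by the stated categories


# Homozygous 9p21 loss, as in Fig. 4B
print(classify_cell({"8q24": 2, "9p21": 0, "17q12": 2, "20q13": 2}))
```

Polysomy is checked before the single-locus categories because a cell with gains at two or more loci cannot satisfy the "two signals of other loci" condition.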

Fig. 4: Representative images of FISH signal patterns.

A Normal FISH result shows 2 signals of each of the 4 probes. B Homozygous loss of 9p21 shows no red signal. C Polysomic FISH result shows ≥ 3 signals of ≥2 probes. The FISH probes are labeled with Spectrum Aqua (8q24), Red (9p21), Green (17q12), or Gold (20q13) fluorophores. The images were reproduced with permission from Elsevier63.

In a FISH analysis of a range of histologic lesions from 10 esophagectomy specimens from BE patients, polysomy was found to be more prevalent in HGD (88%) and EAC (100%) than in NDBE (<10%) and LGD (57%) (p < 0.001), whereas single locus gain was most commonly observed in LGD (28% versus 12% of NDBE versus 8% of HGD)7. Also, in a recent multicenter study, when ≥10% of cells in the specimen had polysomy, FISH was able to differentiate between HGD/EAC and the remaining histologic diagnoses with a sensitivity of 80% and a specificity of 88%66. Furthermore, in a retrospective analysis of 245 BE patients with a history of HGD, using a cutoff of more than 4 of 100 cells demonstrating polysomy, the risk of EAC within 2 years was significantly higher among patients with a polysomic FISH result (14.2%) than among those without a polysomic result (1.4%) (p < 0.001)65. In addition, Timmer et al. demonstrated that in BE patients with HGD or intramucosal adenocarcinoma (IMC) treated with ablation with or without preceding EMR, polysomy was associated with a lower probability of achieving complete eradication of HGD/IMC (HR = 0.57, p = 0.002) in a univariate analysis67. Given its high diagnostic accuracy for identifying HGD/EAC and its potential to detect HGD/EAC on follow-up, polysomy in combination with histology may be able to serve as a confirmatory marker of HGD/EAC and as a screening tool to identify patients at higher risk for subsequent detection of HGD/EAC than those without polysomy.
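The two specimen-level polysomy cutoffs quoted above differ in both magnitude and inclusiveness (≥10% of cells versus more than 4 of 100 cells). A minimal sketch of the two rules follows; the function names are hypothetical, and applying the second rule proportionally when the number of scored cells is not exactly 100 is an assumption.

```python
# Illustrative specimen-level cutoffs; integer arithmetic avoids rounding issues.
def meets_diagnostic_cutoff(n_polysomic, n_scored):
    """At least 10% of scored cells show polysomy (cutoff from the multicenter study)."""
    return 10 * n_polysomic >= n_scored


def meets_risk_cutoff(n_polysomic, n_scored=100):
    """More than 4 of 100 scored cells show polysomy (cutoff from the HGD cohort).

    Assumption: the rule is scaled proportionally if n_scored differs from 100.
    """
    return 100 * n_polysomic > 4 * n_scored


# A hypothetical specimen with 5 polysomic cells among 100 scored cells
print(meets_risk_cutoff(5), meets_diagnostic_cutoff(5, 100))
```

A specimen can therefore be positive by the lower risk cutoff while remaining below the 10% diagnostic cutoff.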

Despite these promising results, no reference laboratory currently offers this test (apparently due to the lack of demand), although in the past it was available at Mayo Clinic and Neogenomics. However, many academic centers and commercial laboratories routinely run FISH, which can be completed in a few days, and the commercial availability of these probes (~$800 per probe, Abbott Molecular Inc., Des Plaines, IL) allows these laboratories to validate and implement the same assay if needed. Also, FISH may be more sensitive than other tests such as DNA flow cytometry (described below) by virtue of its low threshold for a positive polysomy result (e.g., 4 polysomic cells)63,65,67. Yet, genetic alterations detected by FISH are limited to gains or losses of the genes targeted by the probe set, and thus other non-targeted chromosomal alterations would not be detected, potentially missing some high-risk individuals who could be identified by DNA flow cytometry. Furthermore, FISH often identifies cells with minimal DNA alterations, such as 9p21 (CDKN2A) loss, which often do not cause noticeable morphologic abnormalities. Thus, a positive (especially non-polysomy) FISH result does not always indicate the presence of dysplasia.

DNA content abnormalities as detected by DNA flow cytometry as a diagnostic and risk stratification biomarker

Since the 1980s, a number of studies have consistently demonstrated the potential utility of DNA flow cytometry in the diagnosis and risk stratification of dysplasia in BE patients68,69,70,71,72,73,74,75,76,77. Although its availability has been limited to a few medical centers due to perceived technical demands and the use of fresh tissue in earlier studies68,69,70,71, subsequent studies have successfully employed FFPE tissue for DNA flow cytometric analysis to generate high-quality DNA content histograms, demonstrating the feasibility of this methodology72,73,74,75,76,77. For optimal results, the computer program Multicycle (De Novo software, Glendale, CA) should be used to analyze DNA content histograms68,69,70,71,76,77. The published consensus guidelines for clinical DNA flow cytometry should be followed107,108. Most epithelial cells are normally in the G0/G1 phase of the cell cycle and have diploid (2N) DNA content, while less than 6% of cells have tetraploid (4N) DNA content (G2) (Fig. 5A, B). Aneuploidy is defined as an extra G0/G1 peak that is bimodally separated from the normal diploid G0/G1 peak (Fig. 5C, D). The presence of a G2/tetraploid (4N) fraction greater than 6% (with DNA index of 1.9–2.1) is also classified as abnormal due to its strong association with neoplasia (Fig. 5E, F)5,69,71,76,77.
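The interpretive criteria above reduce to a simple rule once the histogram has been analyzed. The sketch below assumes that peak detection has already been performed by the histogram analysis software; the function name and its inputs are hypothetical and serve only to illustrate the decision logic.

```python
# Illustrative decision rule; inputs are assumed to come from prior histogram analysis.
def classify_dna_content(has_separate_aneuploid_peak, g2_4n_percent, g2_dna_index=2.0):
    """Return 'aneuploid', 'elevated 4N fraction', or 'normal'.

    has_separate_aneuploid_peak: an extra G0/G1 peak bimodally separated
        from the normal diploid G0/G1 peak was identified.
    g2_4n_percent: percentage of cells in the G2/tetraploid (4N) peak.
    g2_dna_index: DNA index of that peak (expected 1.9-2.1 for 4N).
    """
    if has_separate_aneuploid_peak:
        return "aneuploid"
    if g2_4n_percent > 6.0 and 1.9 <= g2_dna_index <= 2.1:
        return "elevated 4N fraction"
    return "normal"


print(classify_dna_content(False, 3.2))   # normal
print(classify_dna_content(False, 9.5))   # elevated 4N fraction
print(classify_dna_content(True, 4.0))    # aneuploid
```

Either abnormal category counts as a DNA content abnormality in the studies discussed in this section.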

Fig. 5: DNA content histograms of NDBE and HGD.

A, B NDBE is characterized by the presence of intestinal metaplasia, but there is no dysplasia. The DNA histogram shows a normal diploid population (green). C, D HGD is characterized by severe cytologic atypia with enlarged, hyperchromatic, rounder nuclei. The DNA histogram demonstrates a discrete aneuploid peak (red) that is bimodally distinguishable from the normal diploid peak (green). E, F Another example of HGD shows atypical glands lined by highly pleomorphic cells with enlarged nuclei. There is an elevated 4N fraction greater than 6% (with DNA index of 1.9–2.1) in the DNA histogram. No distinct aneuploid population is present.

A recent retrospective study analyzed FFPE BE samples comprising 80 HGD, 38 LGD, 21 IND, and 14 NDBE and reported that the frequency of DNA content abnormalities (aneuploidy or elevated 4N fraction) increased with increasing histologic grade of dysplasia: 0% of NDBE, 9.5% of IND, 21.1% of LGD, and 95% of HGD77. As a diagnostic marker of HGD, the estimated sensitivity and specificity of abnormal DNA content were 95% and 85%, respectively. Interestingly, DNA flow cytometry also identified a subset of LGD and IND patients who are at higher risk for subsequent detection of HGD/EAC, with univariate hazard ratios (HRs) of 7.0 and 20.0, respectively (p < 0.001)77. Considering that endoscopic therapy is increasingly being recommended for LGD patients26, abnormal flow cytometric results at baseline LGD or IND could potentially enable clinicians to recommend endoscopic therapy, whereas continued surveillance may be an acceptable approach in the setting of normal flow cytometric results. Furthermore, Bowman et al. recently demonstrated that abnormal DNA content in baseline HGD/IMC can serve as a predictive marker of persistent/recurrent neoplasia following endoscopic therapy, with univariate and multivariate HRs of 3.8 (p = 0.007) and 6.0 (p = 0.003), respectively76. This suggests that the detection of DNA content abnormalities in baseline HGD/IMC may help to identify high-risk BE patients who may benefit from alternative therapeutic strategies (e.g., different ablation technique, combined endoscopic modalities, or endoscopic submucosal dissection) as well as long-term follow-up with shorter surveillance intervals following endoscopic therapy.

DNA flow cytometry has several advantages. First, it is an inexpensive send-out test ($350 at ARUP Laboratories; CPT code: 88182) that can be completed within 2–3 days. Second, DNA flow cytometric markers of dysplasia or progression (aneuploidy or elevated 4N fraction) are usually absent in NDBE72,75,76,77,109,110, and features that can alter the histologic interpretation (i.e., increased acute inflammation or ulceration) do not cause aneuploidy or elevated 4N fraction, which can be very helpful in evaluating IND cases69,111. In contrast, many genetic and chromosomal abnormalities detected in BE (including 9p LOH [site of CDKN2A], 17p LOH [site of TP53], and mutations of TP53 and CDKN2A) tend to occur early and frequently throughout large areas of BE5,6,7,8,9,10,112,113,114, even before the first histologic sign of dysplasia, limiting their utility as diagnostic or prognostic markers of dysplasia in BE patients.

In conclusion, as the current surveillance methods based on the histologic diagnosis and classification of dysplasia imperfectly assess the risk of BE patients, especially those with IND or NDBE histology, there is an increasing demand for ancillary tests to aid in the diagnosis/grading of dysplasia and risk stratification of BE patients. In cases with equivocal histology, one may argue that a repeat endoscopic examination with biopsies may provide the answer without the need for an ancillary test. However, this approach is likely to be more expensive than most ancillary tests. In this regard, several biomarkers and assays, including p53 IHC, WATS3D, TissueCypher, mutational load assessment (BarreGEN), FISH, and DNA content abnormalities as detected by DNA flow cytometry, have been shown to support a dysplasia diagnosis and aid in risk assessment for the development of HGD/EAC (Table 3). More importantly, many of these tests are currently available in academic centers and commercial laboratories, and often utilize FFPE tissue, obviating the need to obtain separate samples. Although none of these tools is widely used in practice, there is increasing interest among gastroenterologists in pursuing ancillary tests on BE surveillance biopsies, as they have shown promising results in identifying early neoplasia and could potentially serve as adjuncts to histologic evaluation. By providing information that cannot be assessed by morphology alone, especially if the cost is reasonable (i.e., cheaper than repeat endoscopy with additional pathologic evaluation), these tests may become attractive tools, especially for patients with inconsistent diagnoses, IND, or LGD histology.
Like many molecular tests (e.g., next-generation sequencing) currently used in the diagnosis and management of many diseases, incorporating these tools in the management of BE patients, in conjunction with histologic evaluation, may allow for more precise surveillance and/or earlier treatment in patients at higher risk of progression, while avoiding unnecessary interventions or surveillance in those at lower risk. However, prospective studies on these biomarkers (including assessment of their potential utility in reducing mortality from EAC) as well as cost-effectiveness analyses comparing them with the current surveillance methods are still lacking. Until these comprehensive data exist, it is impossible to fully evaluate their potential impact and better tailor their potential roles in the care of BE patients.

Table 3 Summary of biomarkers or ancillary tests.