The need for specialist review of pathology in paediatric cancer.

A retrospective histopathological review of 2104 cases of solid tumour was carried out to assess the variability in diagnosis of childhood cancer. Cases were subject to three independent, concurrent opinions from a national panel of specialist pathologists. The conformity between them was analysed using the percentage of agreement and the kappa statistic (K), a measure of the level of agreement beyond that which could occur by chance alone, and weighted kappa (wK), which quantifies the degree of variation between differing opinions. Tumours were assigned to the major groupings of the Birch-Marsden classification, within which categories for kappa analysis were defined according to the clinical significance of the differential diagnoses. The mean agreement for all tumours together was 90% (K = 0.82, wK = 0.82). Retinoblastoma achieved the highest kappa value (1.0) and lymphoma the lowest (0.66). Of the cases, 16.5% had their original diagnoses amended, and the panel confirmed the original diagnosis of paediatric pathologists in 89% of cases compared with 78% for general pathologists. The varying levels of agreement between experts confirm the difficulty of diagnosis in some tumour types and justify specialist review for most diagnoses. Specialist training in paediatric pathology is also recommended.

We analysed the level of disagreement between the reviewers to assess the scale of any variation. We also compared the original diagnoses with those arrived at by the panel.
We recognize that in this analysis we are not comparing 'like with like' in that the current reviewers will have had access to stains and techniques that were not available to the original pathologists.
However, we chose to undertake this investigation to gain some indication of whether there have been changes in the classification of childhood tumours.

MATERIALS AND METHODS
Cases reviewed comprised all solid tumours diagnosed between 1957 and 1992; leukaemia cases were excluded because of the poor preservation of bone marrow specimens over time. The review was co-ordinated by Dr A Hugh Cameron (AHC), Consultant Histopathologist at the Children's Hospital, Birmingham (BCH), from 1957 to 1984. Each case was subject to three independent, concurrent opinions, i.e. those of AHC and two further pathologists; the 13 referees were recruited on the basis of their professional experience and specialist interest in particular tumour groups.
The review diagnoses were based on at least three sections of the material, one stained with haematoxylin and eosin, this being the major routine diagnostic method (Triche, 1992), and the other two unstained to enable special stains as chosen by the referee if required. No other diagnostic aids were supplied, the pathologists being obliged to treat it as a 'blind' exercise. The review opinions were stored on a computer database and collated when the exercise was completed. When at least two of the three opinions coincided, this was accepted as agreement and confirmed as the final review diagnosis ('consensus').

The Birch-Marsden classification of childhood tumours (Birch and Marsden, 1987) was used, which subdivides the cases into ten major solid tumour groups. Within these, the categories chosen for comparison in the statistical analysis were assigned by three consultant paediatric oncologists (JRM, BJM, MCGS). Distinction was first made between malignant and benign tumours; opinions were then grouped broadly on the basis of differential diagnoses that would involve major treatment variations. Thus, a difference in classification does not merely represent an academic histopathological difference but could have implications for clinical care.

Variability of opinion, even between experts, can be summarized statistically in two ways, the first being a simple measure of unanimity (i.e. the proportion of cases in which there was agreement). However, as this does not take account of the role of chance (i.e. a 'best guess' diagnosis in the event of doubt) or of subjectivity in the process, a further statistical assessment is desirable.
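The two-of-three consensus rule described above can be sketched as follows; the function name and the example diagnoses are illustrative assumptions, not taken from the study's database.

```python
# Minimal sketch of the 'consensus' rule: a final review diagnosis is
# accepted when at least two of the three independent opinions coincide.
from collections import Counter


def consensus(opinions):
    """Return the majority diagnosis among three independent opinions,
    or None when all three reviewers disagree (no consensus)."""
    diagnosis, count = Counter(opinions).most_common(1)[0]
    return diagnosis if count >= 2 else None
```

Cases returning None would be those requiring further discussion before a final review diagnosis could be recorded.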
In order to evaluate the consistency between the reviewers, we used the kappa (K) statistic (Cohen, 1960, 1968; Maxwell, 1977; Altman, 1990), which is based on a nominal scale of categories for analysis and assesses the agreement between independent observers beyond that which would occur by chance (a value of 1.0 indicates perfect agreement). As interpretation of the kappa statistic is subjective, requiring ad hoc assignment (Bland and Altman, 1986; Maclure and Willett, 1987; Altman, 1990), we chose the following scale: <0.50, 0.50-0.74, 0.75-0.89 and >0.89, representing poor, fair, good and very good agreement respectively. Weighted kappa (wK) analysis (Cohen, 1968; Altman, 1990) was also included to assess the degree of variation when the opinions differed. This process creates an ordinal scale by assigning graded 'weights' (or penalties) to each category outside the diagonal line that links the agreed cases, according to the number of categories by which it differs. Again, a value of 1.0 denotes no variation between opinions. In this setting, wK demonstrates the clinical significance of the disagreements.
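The statistics described above can be computed from a k x k grid of counts of paired opinions. The sketch below is a minimal illustration, with assumed function names; the disagreement penalty grows linearly with the number of categories separating the two opinions, which matches the weighting scheme described here.

```python
# Sketch of percentage agreement, Cohen's kappa and linearly weighted kappa
# for a k x k grid of counts (rows: first referee, columns: second referee).
def kappa_statistics(grid):
    """Return (proportion agreement, kappa, weighted kappa)."""
    k = len(grid)
    n = sum(sum(row) for row in grid)
    row_tot = [sum(row) for row in grid]
    col_tot = [sum(grid[i][j] for i in range(k)) for j in range(k)]

    # Observed and chance-expected proportions of exact (diagonal) agreement.
    p_o = sum(grid[i][i] for i in range(k)) / n
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)

    # Linear penalties: 0 on the diagonal, rising with the number of
    # categories by which the two opinions differ.
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    d_o = sum(w[i][j] * grid[i][j]
              for i in range(k) for j in range(k)) / n
    d_e = sum(w[i][j] * row_tot[i] * col_tot[j]
              for i in range(k) for j in range(k)) / n ** 2
    w_kappa = 1 - d_o / d_e

    return p_o, kappa, w_kappa
```

With only two categories the linear weighting collapses to the unweighted case, so K and wK coincide; with more categories, wK rewards disagreements that fall in clinically adjacent groups.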
British Journal of Cancer (1997) 75(8), 1156-1159 © Cancer Research Campaign 1997

RESULTS

The results of the analyses are summarized below, with one category, soft tissue sarcoma (STS), described in detail. Table 1 displays the opinions of two reviewers in the grid formed from the five main categories, the horizontal axis showing those of the first referee, the vertical those of the second; the diagonal reveals those cases in which there was agreement (213/243, 88%). The kappa statistic (0.80) shows good agreement between these two reviewers beyond that which could occur randomly, and wK (0.89) implies that, when disagreement did occur, the clinical significance of the variation was not major.

Table 2 illustrates the results of the review for the whole STS group, in which the large number of disagreements resulted in only fair agreement (K = 0.72), although the high wK (0.86) suggests that, overall, these disagreements were again not major in terms of clinical significance.

Table 3 summarizes the results for the whole series of tumours using the means of the three separate analyses and shows variations in agreement from 100% in retinoblastoma to 78% in lymphoma. The overall level of agreement was very good (90%), with high K (0.82) and wK (0.82) values. Of the ten tumour groupings, only soft tissue sarcoma and lymphoma showed 'fair' agreement (K < 0.75). The effect of wK on the analyses is seen in CNS tumours, which did not have the highest percentage of agreement (85%) but had the highest wK (0.93), showing that, in the 15% of cases in which disagreements did occur, these were of minor clinical significance. This result could also reflect the number of categories in the analysis, which in turn is influenced by the treatment available for these tumours.
We found that 348 (16.5%) of the 2104 original diagnoses were amended in some way by the review panel. In addition, in 23 cases that had been diagnosed as malignant, the final consensus was that they were not neoplasms, and they were deleted from the Register (Table 4). Table 5 illustrates the results of the comparison between the original and the review diagnoses, in which the difference between the basic percentage of agreement and the kappa values is quite marked. For example, in renal tumours, there was very good agreement (92%) with the original diagnosis, but the relatively low K (0.68) indicates that the agreement beyond chance was only moderate.
To identify any differences between cases originally diagnosed by paediatric pathologists and those diagnosed by general pathologists, Table 6 compares the cases originating from BCH (diagnosed by a paediatric pathologist) with those from other hospitals (general pathologists). The panel agreed with the former diagnoses more often than with the latter (K = 0.76 and 0.59 respectively). Only in epithelial tumours (carcinoma) was there better agreement with the panel for those diagnosed outside the paediatric centre (80% vs 73%, K = 0.73 vs 0.62).

DISCUSSION
Expert pathology review has been shown to be important in the diagnosis of cancer at any age (Presant et al, 1986; Segelov et al, 1993). For example, Segelov et al (1993), in their review of adult testicular tumours, found that in 28 out of 87 (32%) patients the diagnosis made on referral to the specialist centre differed from the original. As paediatric tumours are less common and treatment effects potentially more damaging, specialist diagnostic expertise in referral centres is vital. It has been suggested that the results of variation studies of the pathology diagnostic process could and should have an effect on practice (Machin and Parmar, 1994).
The present large study is the first of its kind in paediatric disease, covering the whole spectrum of childhood solid tumours. In addition to comparing initial and final diagnoses, as has often been done in pathology review reports from clinical trials, we have assessed the level of disagreement between reviewers. Freedman and Machin (1993) identified two main issues in the design of observer agreement studies in pathology review, the first being the number and selection of referees. Ours were selected on the basis of their professional experience and specialist interest in specific tumour types, and each case was subject to three opinions, this being deemed appropriate to allow a consensus to be reached. A second requirement is replicate assessment of slides to quantify how much of the observed non-uniformity is due to intra- rather than inter-observer variability, but this aspect was not assessed in the current study. However, as AHC was both the original pathologist for many of the BCH cases and a member of the panel, we assessed changes in his opinion as a way of testing for intra-observer differences, and found that his diagnosis differed in only 8% of cases (57/664).

The 90% overall level of agreement is good. The kappa and weighted kappa values (0.82 and 0.82 respectively) further illustrate the levels of agreement beyond chance and of variation between reviewers, and show that, for most categories, these were also good. This 'league table' of results could be seen as a guideline as to which tumour types might benefit from a second opinion before treatment is instituted. It is increasingly common practice for collaborative clinical trials in paediatric oncology to demand specialist review, and our results confirm the justification for this as part of good clinical practice, as even small levels of disagreement could have clinical significance for the patient. The degree of inter-observer variability reported underlines the inherent subjectivity and possible limitations of a single diagnosing pathologist in difficult cases. Consistency in terminology and nomenclature is essential and could best be achieved through uniformity of specialist training in paediatric pathology.
The comparison between initial and review diagnoses is limited in retrospective reports by advances in knowledge over the intervening period. We have therefore attempted to allow for changes in nomenclature by grouping 'new' diagnoses under their previous classifications (e.g. primitive neuroectodermal tumour (PNET) with medulloblastoma). Table 5 shows that, in this comparison, overall agreement was lower than that seen within the review panel (85% compared with 90%). This illustrates that paediatric pathology has advanced and that tumours can now be more reliably identified (although we are unable to specify further the roles in this study of specialist stains or other factors). Technical advances continue to be made, constantly improving diagnostic precision, and newer techniques, such as molecular genetics (currently more of a research tool), will become indispensable. The analysis by originating hospital shows that paediatric pathologists were more likely to arrive at the panel's definitive diagnosis than were general pathologists. An exception was noted for carcinomas, which are generally regarded as 'adult' cancers and which appeared to be more successfully diagnosed by the adult (80% agreement) than by the paediatric (73%) pathologists; this presumably reflects their greater familiarity with this form of the disease.
This study shows good overall agreement in the majority of tumour types, although it also demonstrates that, even among experts, identification of clinically significant groupings, based on histopathological examination alone, was not unanimous. This conclusion must be placed in the context that the final diagnosis in the clinical setting does not depend on pathology material alone but is supplemented by other diagnostic information. Our results do not imply, therefore, that any patients will necessarily have been misdiagnosed, and no judgment is implied in the results of this review either of inaccuracy or of infallibility. We do not suggest that there is an association between a change of diagnosis as a result of this review and the appropriateness of original treatment or ultimate outcome. We would simply say that our results support the case for routine review in most childhood tumours to improve the reliability of this component of the diagnostic process.