Reproducibility of measurements of potential doubling time of tumour cells in the multicentre National Cancer Institute protocol T92-0045

We compared the flow cytometric measurement and analysis of the potential doubling time (Tpot) between three centres involved in the National Cancer Institute (NCI) protocol T92-0045. The primary purpose was to understand and minimize the variation within the measurement. A total of 102 specimens were selected at random from patients entered into the trial. Samples were prepared, stained, run and analysed in each centre and a single set of data analysed by all three centres. Analysis of the disc data set revealed that the measurement of labelling index (LI) was robust and reproducible. The estimation of duration of S-phase (Ts) was subject to errors of profile interpretation, particularly DNA ploidy status, and analysis. The LI dominated the variation in Tpot such that the level of final agreement, after removal of outliers and ploidy agreement, reached correlation coefficients of 0.9. The sample data showed poor agreement within each of the components of the measurement. There was some improvement when ploidy was in agreement, but correlation coefficients failed to exceed values of 0.5 for Tpot. The data suggest that observer-associated analysis of Ts and tissue processing and tumour heterogeneity were the major causes of variability in the Tpot measurement. The first two aspects can be standardized and minimized, but heterogeneity will remain a problem with biopsy techniques. © 1999 Cancer Research Campaign

Radiotherapy remains an effective treatment modality in cancer management, being administered to 50-60% of all patients with malignant disease. Improvements in the success of radiotherapy can be expected from better dose prescription and the rational use of altered fractionation schedules and radiomodifiers and by the selection of patients for the most appropriate treatment.
Proliferation of tumour clonogens during treatment is thought to be a major reason for the failure of conventional (6-7 weeks) fractionation schedules to cure some tumours (Withers et al, 1988; Fowler and Lindstrom, 1992). If tumours that are likely to undergo rapid repopulation could be identified prior to treatment, reducing the overall treatment time using accelerated fractionation schedules (Peters et al, 1988; Fowler, 1990; Dische et al, 1997; Horiot et al, 1997) might increase the probability of controlling them.
Currently, the use of the halogenated pyrimidines and flow cytometry (FCM) is considered to be the best method to measure proliferation rates in vivo as the technique requires only a single biopsy to measure the potential doubling time (Begg et al, 1985). However, evidence that the T pot measurement can predict the outcome of radiotherapy has still to be established unequivocally (Begg, 1995). The reasons for this are twofold. Firstly, many of the studies in which T pot measurements have been applied to radiotherapy patients suffer from small patient numbers (Begg et al, 1990; Bourhis et al, 1993; Corvo et al, 1993). Secondly, the accuracy and reproducibility of the T pot measurement may not be adequate.
In 1992, a cooperative was set up to address these two issues in a large multicentre study of conventional radiotherapy. The NCI-sponsored T92-0045 trial consists of 23 European centres, which set out to accrue 1000 patients in four tumour localizations (head and neck, rectum, uterine cervix and bronchus); all patients were treated by conventional radiotherapy and in all cases a pretreatment T pot measurement was made. To date, over 800 patients have been entered into the trial.
In this paper, we report on a study of 102 sequential specimens from 97 patients entered into the trial. Three laboratories have investigated aspects of sample preparation and analysis to establish where the major sources of variation exist. We discuss whether quality control guidelines and a consensus opinion can increase the reliability and clinical utility of the measurement.

Patients
Material for the study was selected on a consecutive basis, the only criterion being that there should be enough tumour material. A total of 102 specimens were processed and analysed in the three centres and a single set of disc data, from these same patients, was analysed in each centre. The specimens consisted of 25 cervix, 36 oral cavity, 35 rectum and six lung tumours originating from 11 of the participating centres. Each patient had received an intravenous injection of 100 mg m⁻² iododeoxyuridine (IdUrd) (NCI, Investigational Drugs Branch, Brussels) several hours prior to the surgical procedure. The median time between injection and surgery was 6.33 h (range 3-10 h, 5.83 and 7.0 h being the 25th and 75th percentiles).

Sample preparation and staining
The specimens were fixed in ice-cold 70% ethanol for storage. Prior to the study, each centre agreed on a basic sample preparation and staining protocol. The fragments of tissue were minced using scissors and placed in 5 ml of 0.1 M hydrochloric acid containing 0.4 mg ml⁻¹ porcine pepsin (Sigma, St. Louis, MO, USA) and incubated at 37°C until the tissue was digested; this was typically between 45 and 60 min. The resultant nuclei suspension was filtered through 35-µm nylon mesh and centrifuged at 2000 r.p.m. for 5 min. The resulting pellet was resuspended in 2 ml of 2 M hydrochloric acid for 15 min to unwind DNA partially, allowing monoclonal antibody access to IdUrd. The samples were then washed twice with 5 ml of phosphate-buffered saline (PBS) to remove the acid and then resuspended in 0.5 ml of PNT (PBS containing 0.5% normal goat serum and 0.5% Tween-20) and 20 µl of mouse anti-5-bromo-2'-deoxyuridine (BrdUrd)/IdUrd monoclonal antibody (Dako). The tubes were incubated for 1 h at room temperature in the dark. After washing with PBS, the pellets were resuspended in 0.5 ml of PNT containing 20 µl of goat anti-mouse IgG (whole molecule) fluorescein isothiocyanate (FITC) conjugate and incubated for 30 min at room temperature. After washing with PBS, the specimens were resuspended in 1 ml of PBS containing 10 µg ml⁻¹ propidium iodide.

Flow cytometry
The samples were analysed on FACScan flow cytometers (Becton Dickinson, San Jose, CA, USA) in each centre. The DNA signal from propidium iodide was collected into the FL3 channel and cell doublets discriminated using the width and area signal. The FITC signal from IdUrd was collected on a logarithmic amplifier in the FL1 channel. At least 10 000 single events were collected.

Data analysis
Data analysis consisted of three distinct procedures: the decision as to the ploidy status, the setting of the regions to detect IdUrd incorporation and the setting of the regions to measure T s . The initial decision as to the ploidy status governed the regions that were used to assess the kinetic parameters. A tumour was considered diploid if only one stem line of cells could be observed. The decision as to whether a tumour was diploid or tetraploid was based on both the proportion of G 2 /tetraploid G 1 cells and the presence of IdUrd labelling. If significant labelling, distributed in the expected pattern, was associated with a 4n to 8n population, it was considered tetraploid irrespective of the proportion of cells. If the proportion of diploid G 2 cells exceeded 15%, then tetraploidy was classified. The major G 1 peak classified aneuploid tumours, but polyploidy was noted. The decision as to whether a tumour was hypo- or hyperdiploid was based on which population contained proliferating cells, i.e. if the first G 1 peak showed proliferation this would be assumed to be the tumour population and the classification would be hypodiploid. In some tumours with near-diploid DNA, it was not possible to analyse the labelled populations separately and these represented a category of tumours that were classified as aneuploid but analysed as diploid. Figure 1 shows the FCM profiles and regions required for analysing both a diploid and an aneuploid tumour. After a region (R1) was set for doublet discrimination on the FL3 area and width dot plot, a second region (R2) was set on the gated FL3 area and FL1 dot plot, to discriminate the IdUrd-labelled cells. The lower limit of this region was set by experience rather than using a control. The data for LI and T s were analysed from single-parameter DNA histograms generated from the total and the IdUrd-labelled cells only.
In diploid tumours, a total of four markers were set. On the total DNA profile, M1 and M2 identified the G 1 and G 2 populations, respectively, for analysis of T s . On the IdUrd-labelled DNA profile, M3 and M4 marked the divided cells and those still moving through S-phase for correction of the LI and calculation of T s respectively. In aneuploid tumours, four further regions were set. M1 and M2 identified the diploid and aneuploid G 1 /G 0 populations for calculation of the DNA index, M2 and M3 contained the G 1 and G 2 of the aneuploid population and M4 described the total number of aneuploid cells. In the IdUrd-labelled DNA profile, M5 and M6 were used to correct for cell division in the diploid and aneuploid population, M7 delineated the population still traversing S-phase for the calculation of T s and M8 measured the total number of aneuploid labelled cells.

Calculation of DNA index, LI, T s and T pot
All data were handled electronically; the numerical information was imported directly into an Excel spreadsheet from the FCM analysis program (PC Lysys).
The DNA index was calculated using the ratio of M2/M1 for aneuploid tumours. The total LI (TLI) was calculated making a correction for those cells that had divided between injection and biopsy. For diploid tumours, the calculation was as follows using the numbers of cells in each of the following regions:
The T s was calculated using the relative movement method (Begg et al, 1985) in which the mean DNA content of the G 1 and G 2 populations and that of the IdUrd-labelled cells, yet to divide, are required to calculate a parameter termed the relative movement (RM). The regions used for this are shown below, with the corresponding regions for aneuploid tumours in parentheses.
The T s was calculated from the RM using the original assumptions that the RM at time zero was 0.5 because of the uniform distribution of labelled cells throughout S-phase and that the value reaches 1.0 at a time equal to T s because of uniform progression of labelled cells through S. The T s was simply calculated from the relationship:

$$T_s = \frac{0.5 \times (\text{time between injection and biopsy})}{RM - 0.5}$$

The T pot was computed from the T s and the LI using the formula of Steel (1977):

$$T_{pot} = \lambda \times \frac{T_s}{LI}$$

where λ is a correction factor for the age distribution of the population and was assumed to be 0.8.
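The chain of calculations from mean DNA contents to T pot can be illustrated with a minimal sketch. The function names and the input values are hypothetical, and the RM definition used here (the position of the labelled, undivided cells between the G 1 and G 2 means) is the standard relative movement definition, assumed rather than taken from the regions of this protocol.

```python
def relative_movement(mean_labelled, mean_g1, mean_g2):
    """RM: mean DNA content of labelled undivided cells, scaled so G1 = 0 and G2 = 1.

    Assumed standard definition; the protocol's own region-based formula is not shown here.
    """
    return (mean_labelled - mean_g1) / (mean_g2 - mean_g1)


def s_phase_duration(rm, hours_since_injection):
    """Ts, assuming RM rises linearly from 0.5 at time zero to 1.0 at time Ts."""
    if rm <= 0.5:
        raise ValueError("RM must exceed 0.5 to give a finite, positive Ts")
    return 0.5 * hours_since_injection / (rm - 0.5)


def potential_doubling_time(ts_hours, labelling_index, lam=0.8):
    """Tpot = lambda * Ts / LI, with LI as a fraction (0.10 for 10%)."""
    return lam * ts_hours / labelling_index


# Hypothetical specimen: channel means 100 (G1), 200 (G2), 175 (labelled), biopsy at 6 h.
rm = relative_movement(mean_labelled=175.0, mean_g1=100.0, mean_g2=200.0)  # 0.75
ts = s_phase_duration(rm, hours_since_injection=6.0)                       # 12.0 h
tpot_hours = potential_doubling_time(ts, labelling_index=0.10)             # about 96 h
print(f"RM = {rm:.2f}, Ts = {ts:.1f} h, Tpot = {tpot_hours / 24:.1f} days")
```

Because RM - 0.5 sits in the denominator, an RM only slightly above 0.5 produces a very large T s ; with the same 6-h interval, an RM of 0.51 would give 300 h in this sketch.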

Statistical analysis
A variety of statistical tests were applied, including linear regression analysis, Spearman's rho, κ measure of agreement and Bland-Altman methods for assessing agreement (Bland and Altman, 1986). Statistical analyses were carried out using JMP (SAS Institute, Cary, NC, USA).

RESULTS
Table 1 summarizes the cell kinetic parameters obtained for the complete set of samples in each of the three centres. The nomenclature was such that the first letter represented the centre where the sample was processed and the second letter indicated where the sample was analysed; thus, LG signifies that the sample was processed in Lausanne and analysed in the Gray Laboratory. Considerable variation was present within and between data sets. The most notable discrepancies were the differences between the median and mean values for T s , indicating a non-symmetric distribution. This was particularly evident within the single data set analysed by all three centres in which the standard deviations were greater than those in samples processed separately in each centre. This variability was translated into the data for T pot , which showed significant differences between the data sets. The underlying reason for this result was the presence of spuriously high T s values within some of the disc analysis data sets, which were not reproduced within the sample data set. In one specimen, an extreme value of 1432 h was calculated for T s in one centre in comparison with 14.1 and 61.1 h from the other two centres. This was a particularly difficult and confusing profile to analyse and, of the 102 specimens, there were 12 in which either one or two of the centres recorded T s values that were considered to be spuriously high (> 40 h). Censoring of these data resulted in better agreement between mean and median values for T s and T pot (Table 1) but was without effect on LI.

Spearman rank correlation analysis
The true test of the T pot measurement is whether the estimated values are reproducible between different observers and different centres in the ranking of tumours as fast or slow. Table 2 demonstrates that the major discrepancies in the single data set reside within the calculation of T s as evidenced by the low correlation coefficients for both RM and T s . The agreement in determining the total LI was consistently high between all observers. The agreement in calculating the aneuploid LI was less precise and, as will be discussed later, was primarily due to interpretation of the DNA profile. T pot relies on both the LI and T s and, as a result, the agreement for this parameter was intermediate between those found for its determinants. However, the agreement was improved considerably by censoring the specimens with outlying T s values. The level of concordance in T pot between the three centres resulted in correlation coefficients ranging from 0.78 to 0.86.
The correlation coefficients when samples were processed and analysed in different centres fell dramatically for all parameters. The loss of agreement might result from variability in sample processing, staining and running and from tumour heterogeneity. Consideration of the LI data would appear to suggest that tumour heterogeneity might be the major determinant. The level of agreement was better for the aneuploid LI than for the total LI, the reverse of the disc data. Within a specimen, it is likely that the LI of the tumour cells alone might be more consistent between different areas than the total LI, which includes stromal and infiltrating cells. The correlation for T s was only slightly worse than for the disc data. The combination of these two parameters resulted in correlation coefficients of 0.30-0.45 for T pot irrespective of whether the complete or restricted data sets were considered. The disc and sample comparison between the Besancon and Gray Laboratory data sets resulted in correlation coefficients similar to those seen within the sample comparison.
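The rank-based comparison used in this section can be sketched as follows. This is a plain-Python illustration of Spearman's rho (Pearson correlation of average ranks), not the JMP routine used in the study, and the T pot values for the two centres are invented for the example.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Tpot values (days) for the same five specimens from two centres.
centre_a = [2.1, 3.4, 5.0, 7.8, 12.0]
centre_b = [2.5, 3.0, 6.1, 6.0, 15.0]
print(round(spearman_rho(centre_a, centre_b), 3))
```

Only the ordering of specimens enters the coefficient, which is why it suits the question of whether centres rank tumours as fast or slow in the same way, whatever the absolute values.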

Kappa statistic analysis
The kappa correlation (κ) was chosen to measure the degree of agreement between samples. In this analysis, each parameter was classified according to whether it was above or below the median value for each individual data set, and the results for LI, T s and T pot are shown in Table 3. The data reiterate the findings of the Spearman's analysis. In the disc-only data, the level of agreement in TLI was excellent (> 0.8), T pot was in the good category (0.61-0.8) and the T s values fell into the moderate category of agreement (0.41-0.6). In the sample and the disc-sample data, the majority of κ values fell below 0.5, which indicated poor agreement.
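The median-split κ described above can be illustrated with a short sketch. It assumes Cohen's κ for two raters and treats values equal to the median as "not above"; both choices, like the example values, are assumptions made for illustration.

```python
from statistics import median


def above_median(values):
    """Classify each value as above (True) or not above (False) its own data set median."""
    m = median(values)
    return [v > m for v in values]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical Tpot values (days) for eight specimens measured in two centres.
centre_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
centre_b = [1.0, 2.0, 3.0, 5.0, 4.0, 6.0, 7.0, 8.0]
kappa = cohen_kappa(above_median(centre_a), above_median(centre_b))
print(round(kappa, 2))
```

In this example six of the eight specimens fall on the same side of the median in both centres, and chance alone would be expected to give agreement in half of them, so κ summarizes only the agreement beyond chance.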

Bland-Altman analysis of agreement
The level of agreement was assessed using the procedure of Bland and Altman. The data are presented in Table 4 for TLI, T s and T pot for the restricted data set of 90 patients. The mean difference gives an indication of measuring bias in each data set and centre and two standard deviations defines the limits within which the differences would be expected to be found. Within the complete (data not shown) and censored data set, the TLI shows excellent agreement in the disc data, with mean differences of less than 0.1%. The 95% confidence intervals on the mean difference were no more than ± 0.3% around the mean value. The majority of differences would be found within ± 4.0% of the mean. However, within the sample and sample/disc data sets, the mean difference ranges from -2.0% to 3.9% and the standard deviations increased to between 5% and 6.0%. This would result in wide expected limits of agreement of ± 10-12% around the mean. The data for T s indicate variable agreement in both the disc and sample data, with the mean differences ranging from -3.3 to 1.9 h. There was a trend for both Besancon and the Gray Laboratory to produce lower T s values than Lausanne. The important parameter is the standard deviation, which varies between 4 and 6 h for each combination, indicating that observers or centres could differ by as much as ± 12 h from the mean value. The variability in T s was translated into differences in T pot . Within the restricted data set, the mean differences were reduced to ± 2 days and the limits of agreement ranged from 4 to 10 days around the mean.
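A minimal sketch of the Bland-Altman quantities discussed here is given below. It uses the sample standard deviation and limits of two standard deviations around the mean difference, as described in the text; the paired values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(measure_a, measure_b, k=2.0):
    """Return (bias, sd, (lower, upper)) for paired measurements.

    bias  : mean of the paired differences (a - b)
    sd    : sample standard deviation of the differences
    limits: bias -/+ k * sd, with k = 2 following the text
    """
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, (bias - k * sd, bias + k * sd)


# Hypothetical Ts values (h) for four specimens analysed by two observers.
observer_1 = [10.0, 12.0, 14.0, 16.0]
observer_2 = [9.0, 13.0, 13.0, 17.0]
bias, sd, (lower, upper) = bland_altman(observer_1, observer_2)
print(f"bias = {bias:.2f} h, SD = {sd:.2f} h, limits = {lower:.2f} to {upper:.2f} h")
```

A bias near zero with wide limits, as in this example, corresponds to the pattern reported for T s : no consistent offset between observers, but individual specimens may still differ substantially.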

The influence of DNA ploidy
The analysis of T pot is dependent on the classification and interpretation of the DNA profile. In the complete data set, 34 tumours were classified as diploid and 20 were classified as aneuploid (with the same DNA index) by all observers and centres; these represent 53% of the specimens. A further five specimens were uniformly classified as aneuploid but with some discrepancy in the DNA index values. In a further 25 specimens, four out of the five combinations agreed on either an aneuploid (18 cases) or a diploid (seven cases) profile. The majority of these discrepancies (22 cases) arose within the sample data. To examine the influence of DNA ploidy, the disc data and sample data sets were analysed separately to establish which factors were attributable to observer variation alone and which to observer and sampling variation.

Disc data set
Forty-two specimens were agreed as diploid and 41 were uniformly classified as aneuploid with similar DNA indices, representing 81% of the data set. The discrepant tumours were eight in which two observers classified the specimen as aneuploid and the other reported a diploid profile and six in which two diploid values were recorded and one aneuploid. One specimen had three different aneuploid values, three specimens had two similar aneuploid indices and one discrepant and one specimen was classified as diploid, hyperdiploid and hypodiploid. In the restricted data set (90 specimens), 42 specimens were diploid, 34 were uniformly aneuploid, six were classified as aneuploid by two observers and diploid by the other and five were classified as diploid by two observers and aneuploid by the other. Only three specimens did not fit into these classifications. It was noted that the 12 specimens excluded from the censored data set were either aneuploid or discrepant cases. Table 5 summarizes the improvement in agreement when consensus on ploidy was reached. The agreement in TLI was further increased to a coefficient of, on average, 0.97 while ALI coefficients exceeded 0.9. There was a significant increase in agreement in T s compared with the data in Table 2; the coefficients reached 0.8 for diploid tumours and 0.6-0.7 for aneuploid tumours. These improvements in agreement in LI and T s were translated into an increase in the correlation coefficients for T pot , with values of 0.9 overall in the restricted data set. Again, agreement was better in diploid tumours (0.94) than that obtained in aneuploid specimens.

Sample data
Ploidy agreement in the sample data set was worse than that found within the disc data set. In the 102 samples, 38 were classified as diploid by all three centres but only 21 were aneuploid with the same DNA index; this represents 58% of specimens. In a further 11 specimens, two diploid and one aneuploid classifications were recorded and, in 18 cases, two similar aneuploid values and one diploid were obtained. The rest consisted of seven cases in which there were two different aneuploid values accompanying one diploid, five cases with two similar aneuploid values with a different aneuploid DNA index and two cases with three different aneuploid indices. Table 6 summarizes the correlation data obtained after ploidy agreement in the sample data. It can be seen (in comparison with Table 2) that there is some improvement in the correlation coefficients for TLI, T s and T pot . This result was more evident in the complete data set rather than in the censored data set, as most of the censored data were due to aberrations in the disc data. Unlike the disc data (Table 5), the improvement in correlation for T pot was not superior in diploid compared with aneuploid tumours, although the correlation coefficients for T s were superior in the diploid group. The correlation coefficients for TLI, T s and T pot were not dissimilar to each other, with average values ranging from 0.5 to 0.7, but agreement was generally better in TLI than in T s .

Proliferative classification using cut-off values
The ultimate requirement of the T pot measurement will be to classify tumours as fast or slow with some degree of certainty. In this analysis, the LG data were considered to be the 'true' data and the others compared with this. Figure 2 shows the data plotted as a ratio of the median LG value (3.44 days) for the restricted data set. The data in the upper left and lower right quadrants indicate the misclassified measurements. LL would wrongly classify 10% as fast and 12% as slow; for LB it would be 18% and 8%, GG 22% and 9% and BB 4% and 9%.
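The quadrant counting behind this comparison can be sketched as below. The function name and data are hypothetical; the sketch assumes that both data sets are expressed as a ratio of the reference median, that "fast" means a ratio below 1, and that values exactly at the cut-off count as slow.

```python
from statistics import median


def misclassification(reference, test):
    """Percentage of specimens wrongly called fast or slow relative to a reference set.

    Both sets are divided by the median of the reference; a ratio below 1 is 'fast'.
    Returns (percent_wrongly_fast, percent_wrongly_slow).
    """
    cut_off = median(reference)
    ref_fast = [v / cut_off < 1.0 for v in reference]
    test_fast = [v / cut_off < 1.0 for v in test]
    n = len(reference)
    wrongly_fast = sum(t and not r for r, t in zip(ref_fast, test_fast))
    wrongly_slow = sum(r and not t for r, t in zip(ref_fast, test_fast))
    return 100.0 * wrongly_fast / n, 100.0 * wrongly_slow / n


# Hypothetical Tpot values (days): reference data set and one comparison data set.
reference = [2.0, 3.0, 4.0, 5.0]
comparison = [2.5, 4.0, 3.0, 6.0]
print(misclassification(reference, comparison))
```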

DISCUSSION
In order to use a FCM measurement in clinical practice for the potential selection of patients for more appropriate treatment schedules, the sources of variation within the technique need to be understood and minimized. In common with all FCM-based methods there will be variation among centres associated with differences in instrumentation and laboratory techniques. Some of these can be minimized by using the same model of flow cytometer, agreed machine set-up and standardized laboratory procedures. This will leave the main source of variation to be attributable to what has been termed the interaction component. This comprises sample heterogeneity and inconsistencies in sample preparation, staining and analysis. If T pot values are to be meaningful and reproducible within and between laboratories, then this variation must be eliminated or minimized otherwise the technique will not be transportable or the data interchangeable. The measurement of T pot from a single sample represents a relatively complex FCM procedure, many aspects of which have the potential to introduce variation (Terry and Peters, 1995). These include the sample itself, the tissue digestion with pepsin, the staining procedure (particularly the denaturation step), the interpretation of the DNA profile and the region setting for relative movement and labelling index analysis. Against this background, either the methodology must be robust or the tolerance limits of the measured parameter must be wide enough to permit some variation without compromising the clinical significance of the ultimate value. The NCI T92-0045 study was designed to address these two issues firstly with the comparisons reported here and, secondly, with the ultimate application of the measured values, with detailed knowledge of their variation, to the clinical data.
T pot reproducibility has been the subject of three previous reports (Wilson et al, 1993a; Haustermans et al, 1995; Tsang et al, 1995), which were restricted to two-centre analysis. The same general conclusions were reached in these studies, namely that the estimation of T s is the major source of analytical variation and that sample processing and tumour heterogeneity account for the variation between centres. In this present multicentre study, we found the measurement of LI to be robust and reproducible in the disc data, with Bland-Altman analysis revealing no evidence of any systematic errors, as was found in one of the previous studies (Haustermans et al, 1995). Correlation analysis, both Spearman's and kappa, showed a high level of agreement among observers. Concordance of absolute values was extremely high, with linear regression analysis resulting in slopes of greater than 0.9 and intercepts of less than 0.01 (data not shown) for all three combinations. Indeed, if the LI was used to rank tumours according to their proliferative characteristics, only two specimens would be classified wrongly as fast and two as slow using this data set. The measurement of LI is a relatively simple procedure requiring only two regions. However, the first region, which delineates the IdUrd-labelled cells, is set subjectively and has received some criticism and discussion (White and Terry, 1992). The data in this study would suggest that fitting a distribution to the unlabelled G 1 and G 2 populations, in the IdUrd dot plot, and using standard deviations to set the lower limit of detection is unwarranted.
The estimation of T s was problematical. The analysis depends on the same initial region to delineate the IdUrd-labelled cells (which has been shown above to be reproducible) and a further three regions to measure the mean DNA content of G 1 and G 2 and of the IdUrd-labelled cells which have not divided. This should be a relatively straightforward procedure (as described in Figure 1) but is subject to the complexity of the IdUrd-DNA distribution. Twelve specimens were classified as outliers with the common feature of a T s value that was considered unreasonable, either too long (11 specimens) or too short (one specimen). All 12 specimens were classified as aneuploid by one or more observers, and 8 of the 12 specimens had discrepancies in the ploidy value. The majority of these specimens (11 cases) had low RM values of between 0.5 and 0.6 caused by multiploid DNA and the selection of the wrong G 1 or G 2 peak in the DNA profile for the RM calculation. It has been suggested that the ratio of G 2 to G 1 should be considered constant such that only the mean G 1 need be calculated (Begg et al, 1988). This procedure improves the correlation in the T s in the disc data but is without effect in the sample data.
The inconsistencies in region setting were highlighted by the section dealing with concordance in DNA ploidy prior to evaluation of the T s . Indeed, even when all observers agreed that the profile was diploid, the correlations in T s failed to reach a value greater than 0.8. Analysis of the raw data in those diploid tumours  that failed to correlate revealed that there was very good agreement in the mean values for the G 1 population but discrepancies in the measured mean DNA values for G 2 and the region defining the IdUrd-labelled cohort. The discrepancy in G 2 can arise from the tightness of the computer-generated region, which is dependent on the definition of the peak. The RM region is dependent on the tightness of the G 1 region and whether or not it has been set juxtaposed to this region (see Figure 1). In aneuploid tumours that were ascribed the same DNA index by all observers, the correlation coefficients reached only 0.6 to 0.7 for T s . The underlying reason for this was again variation in the region delineating the IdUrd-labelled cohort and to a lesser extent the G 2 population. The discrepancies with the former arose because of the presence of the diploid S and G 2 populations within the RM window. In some instances, the observers had attempted to eliminate them from the measurement by setting the lower limit of the analysis window to the right-hand side of the diploid G 2 rather than juxtaposed to the aneuploid G 1 . This procedure is acceptable only if the IdUrd-labelled cohort of the aneuploid population has clearly progressed through S-phase, such that it is distinguishable from the diploid labelled population.
Although T s was the dominant feature in introducing variability into the agreement in the T pot measurement, the LI determines the extent of intrinsic variation in T pot . This is primarily due to the broader distribution of potential values (40-fold variation) for LI compared with T s (eight-fold variation). To some extent, the reproducibility of LI overcomes the inherent problems within the T s estimation to produce excellent correlation coefficients for T pot once outliers and ploidy agreement are taken into account (0.9 or greater). In the disc data set, this cohort represented 75% of the original number of specimens analysed.
The sample data present problems that have wider implications for the utility of T pot as a predictive measurement. The sample data were subject to all forms of potential variation and introduced the component due to sample processing and tumour heterogeneity. Tables 2 and 6 demonstrate that the correlation between all three centres is drastically reduced, compared with the disc data, for each of the parameters in the T pot measurement. Correlation analysis yielded values of no greater than 0.5-0.6 after accounting for ploidy differences; this comparison also introduced the greatest errors in the previous two reports of T pot reproducibility (Wilson et al, 1993a; Haustermans et al, 1995).
It was not possible to design this study to distinguish between sample processing and tumour heterogeneity because of the limitations of tissue. Clues from the data suggest that heterogeneity may be the dominant feature and thus a common problem to all biopsy-based measurements. In particular, the correlation for aneuploid LI exceeded that for total LI within the sample data, which is the converse of the disc data. The explanation for this reversal may reside within the consideration of genotypic and phenotypic heterogeneity within solid tumours (Shackney and Shankey, 1995). Tumour growth is determined by proliferation, differentiation and cell death, each of which is genetically controlled but subject to microenvironmental stimuli such as nutrient and oxygen deprivation. Proliferative heterogeneity between samples from the same tumour will certainly arise as a function of differentiation, tumour growth pattern, host cell infiltration and vascular perfusion. Each of these will cause areas of microregional variation in the percentage of proliferating and non-proliferating cells and in the ratio of tumour cells to normal cells. The aneuploid LI is subject to the genotypic and phenotypic variability induced by differentiation, growth patterns and tumour perfusion. However, the total LI reflects the influence of both normal and tumour cells and may be also affected, to a greater extent, by heterogeneity due to variation in host cell content.
The mean intratumoral coefficient of variation (CV) of the three samples processed and analysed in each centre was 49% for T pot . This was less than the 63% reported in the largest study of intratumour variability in T pot (Wilson et al, 1993b) in which six biopsies from 30 colorectal tumours were studied. In the colorectal study, CVs of 28% and 36% were reported for LI and T s compared with 38% and 27% in this present study. These data demonstrate that both T s and LI are important variables and can introduce variability into the ultimate T pot measurement. As has been found in other studies (Begg et al, 1988; Bennett et al, 1992; Wilson et al, 1993a), the intertumour variation in T pot , 127% in this present study, far outweighs the intratumour variability. This indicates that, despite heterogeneity, proliferation differences between tumours should be detectable.
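The comparison of intratumour and intertumour variation can be illustrated with a small sketch. It assumes the CV is the sample standard deviation divided by the mean, that the intratumour figure is the average of the per-tumour CVs, and that the intertumour figure is the CV of the per-tumour means; these definitions and the data are assumptions for illustration only.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return 100.0 * stdev(values) / mean(values)


def mean_intratumour_cv(per_tumour_values):
    """Average of the CVs computed separately within each tumour."""
    return mean([coefficient_of_variation(v) for v in per_tumour_values])


def intertumour_cv(per_tumour_values):
    """CV of the per-tumour mean values."""
    return coefficient_of_variation([mean(v) for v in per_tumour_values])


# Hypothetical Tpot values (days): three samples from each of two tumours.
tumours = [[2.0, 3.0, 4.0], [9.0, 10.0, 11.0]]
print(f"intratumour CV = {mean_intratumour_cv(tumours):.1f}%")
print(f"intertumour CV = {intertumour_cv(tumours):.1f}%")
```

When the intertumour CV is much larger than the intratumour CV, as in this example, differences between tumours remain detectable despite variation between samples from the same tumour.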
This present study demonstrates that significant differences exist between laboratories in the measurement of T pot and that these could result in misclassification of tumours as fast or slow by one or other centre. The following recommendations can minimize the analytical and processing errors but heterogeneity is inherent to the biology of individual tumours.
(1) Dissociation of the specimen into nuclei can introduce potential errors, particularly if the specimen is underdigested (Terry and Peters, 1995). It is important to maximize the yield, in individual tumours, to increase the chance of obtaining a representative sample. The timing of dissociation should be adjusted for individual specimen needs, and it is not satisfactory to use a constant time interval for all tumours.
(2) Staining procedures should be standardized to that recommended by a laboratory with experience and the quality assessed by intercomparison of samples.
(3) Machine conditions should be standardized, although this will be different for different machines. Doublet discrimination should be used, the FITC signal should be collected with log amplification and at least 10 000 events should be recorded. The last is only a guideline as, for each sample, a significant number of events in the regions of interest need to be collected.
(4) Interpretation of the DNA profile should follow the guidelines already suggested (Shackney et al, 1993). Aneuploid populations should not be considered if they represent < 5% of the specimen unless the IdUrd staining can aid in their identification. All specimens of near-diploid DNA should be regarded as hyperdiploid unless the proliferating cells are clearly seen to emanate from the peak with less DNA. In near-diploid DNA profiles it will not always be possible to analyse the aneuploid population separately and all tumours should be analysed as a single population.
(5) In tumours with multiple DNA peaks, the choice of which aneuploid population to analyse should be based firstly on the magnitude of the population and, secondly, on the distribution of IdUrd labelling. The largest population would normally be analysed unless more proliferation was associated with a minor peak; this may well be the evolving clone of the tumour. The G1/G2 ratio should always be checked to ensure that regions have been set around the appropriate population.
(6) Tpot should not be analysed in profiles that show no evidence of cell division in the time between injection and biopsy. This may well be a function of the time interval, which should not be less than 4 h.
(7) Tpot should not be analysed in profiles in which the region of interest is significantly impeded by overlapping cell populations from the diploid component or other aneuploid clones. It is not possible to set strict guidelines for this potential artefact; the necessary judgement should be gained by experience or with reference to profiles obtained by other expert researchers.
(8) The regions to measure LI and Ts should be as described in Figure 1, with the tightness of the G1 and G2 regions being determined by the CV of the corresponding peaks. Cell cycle analysis of the DNA profile may represent a better method of assessing the mean DNA contents of G1 and G2 and should be carried out if possible. The RM window should be set juxtaposed to the G1 region, except in aneuploid tumours in which the labelled population has clearly progressed through its S-phase to a value greater than the diploid G2.
(9) The transcription of data should be electronic to avoid errors.
(10) The calculation of Tpot has been carried out using the Begg algorithm in this study, but several other alternatives give different absolute values of Tpot but similar ranking of tumours (White et al, 1990). The clinical significance of each derivation of Tpot will be assessed in the final correlation of the measurement with clinical outcome in the T92-0045 trial.
(11) The question of heterogeneity can only be tackled by obtaining a large specimen and assessing at least two, and more if possible, areas from the sample. This may not always be possible and the quality of the nuclei suspension becomes crucial to ensure that it is representative.
Whatever the sample, it should always be accompanied by histological assessment to ensure that a tumour is present and to give insight into the composition of the specimen.
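To make the calculation in recommendation (10) concrete, the following is a minimal sketch of the relative movement approach as it is commonly stated, not a reproduction of the analysis software used in the trial. It assumes that RM is 0.5 at the time of injection and rises linearly to 1.0 after one S-phase duration, so that Ts = 0.5 t / (RM - 0.5), and that Tpot = λ Ts / LI. The value λ = 0.8 is an assumed default, and all function and parameter names are hypothetical.

```python
def relative_movement(f_labelled, f_g1, f_g2):
    """RM: mean DNA content of labelled undivided cells, scaled between G1 and G2."""
    if f_g2 <= f_g1:
        raise ValueError("G2 mean fluorescence must exceed G1 mean fluorescence")
    return (f_labelled - f_g1) / (f_g2 - f_g1)


def s_phase_duration(rm, hours_since_injection):
    """Ts in hours, assuming RM rises linearly from 0.5 at t = 0 to 1.0 at t = Ts."""
    if rm <= 0.5:
        raise ValueError("RM must exceed 0.5 for Ts to be defined")
    return 0.5 * hours_since_injection / (rm - 0.5)


def potential_doubling_time(labelling_index, ts_hours, lam=0.8):
    """Tpot in hours; lam is the age-distribution correction (0.8 is an assumed default)."""
    if not 0 < labelling_index <= 1:
        raise ValueError("labelling index must be a fraction in (0, 1]")
    return lam * ts_hours / labelling_index


# Illustrative values only, not data from the study
rm = relative_movement(f_labelled=175.0, f_g1=100.0, f_g2=200.0)
ts = s_phase_duration(rm, hours_since_injection=6.0)
tpot_days = potential_doubling_time(labelling_index=0.10, ts_hours=ts) / 24.0
```

Because LI appears in the denominator and Ts in the numerator, an error in either region setting propagates directly into Tpot, which is why recommendations (4) to (8) concentrate on profile interpretation and window placement.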
The overall conclusion from this intercomparison is that agreement can be reached on the measurement of Tpot in at least 75% of specimens by different observers analysing the same data. However, in a multicentre study, processing and analysis should be restricted to a single centre of excellence for the most consistent results. This does not preclude the interinstitutional comparison of results as long as the potential errors are clearly understood.