The United Kingdom Childhood Cancer Study: objectives, materials and methods

An investigation into the possible causes of childhood cancer has been carried out throughout England, Scotland and Wales over the period 1991–1998. All children known to be suffering from one or other type of the disease over periods of 4–5 years have been included, and control children matched for sex, age and area of residence have been selected at random from population registers. Information about both groups of children (with and without cancer) has been obtained from parental questionnaires, general practitioners' and hospital records, and from measurement of the extent of exposure to radon gas, terrestrial gamma radiation, and electric and magnetic fields. Samples of blood have also been obtained from the affected children and their parents and stored. Altogether 3838 children with cancer, including 1736 with leukaemia, and 7629 unaffected children have been studied. Detailed accounts are given of the nature of the information obtained in sections describing the general methodology of the study, the measurement of exposure to ionizing and non-ionizing radiation, the classification of solid tumours and leukaemias, and the biological material available for genetic analysis. © 2000 Cancer Research Campaign


Origin
Cancer in children under 15 years of age is rare, accounting for less than 1% of malignancies diagnosed each year in developed countries (Draper, 1995; Coleman et al, 1999). The types of cancer that occur in children are generally more responsive to chemotherapy than those in adults and, in recent years, improved treatment protocols have resulted in significant increases in survival for a wide range of diagnostic groups. These improvements have been particularly marked for leukaemia, which accounts for around a third of all childhood malignancies, and more than 70% of children diagnosed in the UK with this once rapidly fatal disease now survive into adulthood (Draper, 1995; Coleman et al, 1999). Cancer in children, nevertheless, remains a significant health problem. Of the 10 million children in the UK, around 1200 (one in every 8000) are diagnosed with the disease each year, and about one child in every 600 develops it before their 15th birthday (Parkin et al, 1998; Coleman et al, 1999). Despite the marked improvements in treatment, about one in ten of all childhood deaths occurring after the first month of life is attributed to cancer, and among 5- to 14-year-olds cancer is the second most common cause of death, after accidents (Botting and Crawley, 1995; Fear et al, 1999). Unfortunately, the marked advance in childhood cancer treatment seen over the last 20 years has not been matched by similar advances in knowledge of its aetiology, and the causes of the majority of childhood cancers remain unknown (Draper et al, 1995; Miller et al, 1995; Little, 1999).
When, in 1990, this situation was discussed informally by members of the Radiation and Cancer Subcommittee of the United Kingdom Coordinating Committee on Cancer Research (UKCCCR), the Advisory Group on Non-Ionising Radiation of the National Radiological Protection Board (NRPB) and the Medical and Scientific Advisory Panel of the Leukaemia Research Fund (LRF), the only known causes, apart from the hereditary factors that caused a few cases, were exposure to X-rays in utero or after birth, some types of chemotherapy given for a previous cancer, and infection with the Epstein-Barr virus. Many ideas had been floated, but often without distinguishing between the different types of the disease. Leukaemia had been the most extensively studied, although not always by sub-type. Among the many possible causes that had been suggested for leukaemia (or more specifically, in some studies for acute lymphoblastic leukaemia (ALL)) were: (1) exposure of the father to ionizing radiation before the child's conception (Gardner et al, 1990), or to a variety of other agents in the occupational and home environments (van Steensel-Moll et al, 1985; Arundel and Kinnier-Wilson, 1986; Lowengart et al, 1987); (2) exposure of the fetus transplacentally to a variety of N-nitroso compounds (Magee et al, 1976) or infection (Knox et al, 1983); and (3) exposure of the child postnatally to radon in house air (Henshaw et al, 1990) and to chloramphenicol given therapeutically (Shu et al, 1987). A history of viral infections (McKinney et al, 1987) and of infection arising from the mixing of urban and rural populations (Kinlen, 1988; Kinlen et al, 1990), or following a paucity of common infections in infancy (Greaves, 1988) and a deficit of immunizations (Kneale et al, 1986; McKinney et al, 1987) had also been associated with the disease.
Specific exposures suggested for specific sites or types of cancer other than ALL had, by 1990, included: chloramphenicol (Shu et al, 1987) and maternal drug use prior to or during pregnancy in relation to acute non-lymphoblastic leukaemia (Robison et al, 1989); barbiturates, and parental exposure to substances such as solvents (Gold and Gordis, 1978; Gold et al, 1979) and N-nitroso compounds (Preston-Martin et al, 1982) for central nervous system (CNS) tumours; and maternal exposures during or just prior to pregnancy to drugs (including recreational drugs and anti-convulsants), alcohol and hair colourants for neuroblastoma (Lipson and Bale, 1985; Kramer et al, 1987) and for Wilms' tumour (Bunin et al, 1987). It had also been suggested that Hodgkin's disease in children might arise as a rare sequel to infection with a common virus (Gutensohn and Cole, 1981; Gutensohn and Shapiro, 1982) and that children with rhabdomyosarcoma were less likely to be immunized than controls (Grufferman et al, 1982; Hartley et al, 1988).
For childhood cancer more generally, exposure to extremely low frequency electromagnetic fields from the passage of electricity (Wertheimer and Leeper, 1979) and the administration of vitamin K within the first week of life (Golding et al, 1990) had also been suggested as having a role.

Objectives
It seemed, therefore, that it would be scientifically, socially and economically sensible to mount a national study of all types of childhood cancer on a large enough scale to provide sufficient data to test adequately the major hypotheses about the possible causes of such cancers, excluding only wide-ranging family studies of genetic factors.
After detailed discussions, five broad hypotheses were defined that the study should attempt to test. These were that childhood cancer might be caused by:
1. a child's exposure to ionizing radiation (natural or man-made) either during the mother's pregnancy or after birth
2. similar exposure to potentially hazardous chemicals, some of which have been associated with specific cancers in adult life
3. exposure of the parental germ cells to either radiation or hazardous chemicals before the child's conception
4. extremely low frequency electromagnetic fields post-natally, particularly in the case of brain cancers and leukaemia
5. in the special case of leukaemias and lymphomas, abnormal responses to one or more common infectious agents.
Two specific hypotheses were considered within the umbrella explanation of the fifth hypothesis. First, some subtypes (in particular common ALL) follow (i) paucity of infectious exposure in infancy, and (ii) late or delayed exposure shortly before the onset of symptoms (3-12 months prior to diagnosis). Secondly, the risk of leukaemia (and possibly lymphoma) is increased in children in rural areas of marked population mixing. In either case, the leukaemia may also be associated with genetic factors that influence the immune response, in particular HLA class II alleles or haplotype.

Organization
A management committee was consequently set up comprising epidemiologists, paediatric oncologists, laboratory scientists and a representative of the NRPB, the members of which planned the conduct of the study under the aegis of the UKCCCR. The committee divided England & Wales into the nine regions shown in Figure 1, each of which became the responsibility of a leading epidemiologist who, in conjunction with local paediatric oncologists and the help of the UK Children's Cancer Study Group, arranged for the collection of the necessary information.
A similar study was initiated independently at about the same time in Scotland under the aegis of the Scottish Department of Health. Early contacts enabled it to be agreed that similar data would be collected in much the same way both in Scotland and in England & Wales, that the principal Scottish investigator would join the Management Committee and that the principal results obtained in all three countries would be pooled for central analysis. Consequently Scotland became, in effect, the tenth region in a study, named the United Kingdom Childhood Cancer Study (UKCCS), that covered the whole of the UK, apart from Northern Ireland.
With the rare exceptions described in the section on General Methodology, the study was planned to include all children up to 15 years of age registered with a Family Health Services Authority (FHSA) in England & Wales and with a Health Board (HB) in Scotland who were diagnosed as having cancer. Case accrual began in Scotland on 1 January 1991 and in England & Wales on 1 April 1992 and was planned to continue until at least 1000 cases of ALL had been studied, which it was thought should take about 4 years. In the event, accrual ceased in different areas at different dates between December 1994 and December 1996. For each child with cancer, two controls were sought matched on sex and month and year of birth from the same FHSA or HB list as that on which the child with cancer had been registered before the cancer was diagnosed. Altogether the study included 1461 children with acute lymphoblastic leukaemia, 2377 children with other types of cancer and 7629 control children (47 children with cancer having only one control).
Details of the information obtained and how it was collected are described in sections on materials and general methodology, electromagnetic fields, radon and terrestrial gamma radiation, classification of solid tumours, biological samples and classification of leukaemia, and genetic susceptibility.

MATERIALS AND GENERAL METHODOLOGY
The ten UKCCS data collection centres (Figure 1) were responsible for all aspects of regional study organization including the day-to-day running of the study, obtaining permission from local ethical committees and liaising with appropriate health care professionals. An outline of a typical organizational chart for data collection is given in Figure 2.
A common protocol was agreed by the Management Committee at the outset of the study. The need to respond at a local level to prevailing circumstances, such as specific requests by local ethical committees, resulted in a certain amount of regional variation. The main departures from the protocol, together with basic regional information, are listed by region in Table 1.

Subjects
In England & Wales, the UKCCS study population was defined as children (0-14 years) registered with one of the 98 FHSAs. Under the National Health Service, all General Practitioners (GPs) and their patients are registered with their local FHSA. This comprises around 98% of the total population (RCGP, 1987). In Scotland, where the system is similar but independent, the study population was defined as children registered with one of the 15 Health Boards (HBs) and has been described separately (McKinney et al, 1995).

Cases
The study began in Scotland on 1 January 1991 and on 1 April or 1 September 1992 in the regions of England & Wales (see Table 1). Following these start dates, children registered with an FHSA/HB who were diagnosed either with a confirmed malignancy or with any tumour of the CNS, as defined in the classification scheme devised by Birch and Marsden (1987), were potentially eligible for the study. In Scotland, case accrual ceased in December 1994, and in England & Wales it was restricted in all regions to patients with leukaemia and non-Hodgkin's lymphoma throughout 1995, and leukaemia alone throughout 1996.
Each regional centre was responsible for ensuring the completeness of ascertainment of cases diagnosed in residents within their boundaries. The majority of cases were notified directly from regional treatment centres by paediatric oncologists belonging to the UKCCSG (Mott et al, 1997). In other hospitals, and certain specialist units that treated adults as well as children, individually tailored referral systems were put in place. In addition, crosschecks were made against regional cancer registries and against the National Registry of Childhood Tumours (NRCT) (Stiller et al, 1995).
Detailed diagnostic information was obtained from multiple sources. For leukaemias, the principal diagnostic sources were the Medical Research Council's treatment trials (MRC, 1997, 1998). Information about leukaemic children who were not enrolled in trials was obtained from one of three sources: the UKCCSG, the NRCT, or the individual consultant treating the child. In addition, cytogenetic data on trial and non-trial patients were obtained from the Leukaemia Research Fund's cytogenetics database held at the Royal Free Hospital (Moorman et al, 1996), and molecular diagnostic information on patients with ALL was provided by the central reference laboratory for UKCCS pretreatment samples at the Leukaemia Research Fund Centre, Institute of Cancer Research. To obtain reliable information about the diagnosis of cancers other than leukaemia, a histopathology review database was specially created for the purposes of the study. Detailed information about this database, and the diagnostic verification procedures used, are given later.

Controls
Each case child was individually matched with two control children of the same sex, date of birth and region of residence. The mechanism by which this was achieved varied slightly from one region to another (Table 1).
Throughout the majority of English study regions controls, matched on sex and age (month and year of birth), were randomly selected from the same FHSA list as their corresponding case. This was achieved by obtaining computerized lists of children registered with each FHSA on 1 January and 1 July each year: the potential controls being randomly selected from the list on which the case child had appeared immediately prior to the diagnosis of the cancer. When the case child was less than 1 year old, however, controls were chosen from the first FHSA list on which the case appeared. Because of computing difficulties, two English study regions did not randomly select potential controls from within the same birth month as the corresponding case. Instead, the FHSA downloads were sorted in date of birth order and controls with the closest birthdates to the case selected (hence, most controls were born on the same day as their corresponding case) (Table 1).
In Scotland, the system was similar to the main method described above except that the Health Boards randomly selected the controls (McKinney et al, 1995). Unfortunately seven of the eight Welsh FHSAs and one English FHSA declined to participate in the study. For each case diagnosed within their boundaries the corresponding control selection had two stages: two GPs were randomly selected and approached and, if they agreed, potential controls (of the same sex and born within 3 months of the corresponding case) were randomly selected from their practice lists.
Whatever procedure was used to identify potential controls, the GPs of the first two identified were approached and, with their permission, the parents of the children were contacted and asked to participate in the study. When the GP refused permission to contact, or the parents refused to participate, another control was selected by the same method, and so on until two control families participated (Figure 2). All details of control replacements were logged, and details of non-participating controls retained (see Registration below).
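The selection-and-replacement procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's own software: the `Child` record, the `select_controls` function and the `participates` callback (standing in for GP permission followed by parental consent) are hypothetical names, and the register is simplified to a flat list matched on sex and month and year of birth.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Child:
    child_id: int
    sex: str          # "M" or "F"
    birth_year: int
    birth_month: int  # 1-12


def select_controls(case: Child,
                    register: List[Child],
                    participates: Callable[[Child], bool],
                    n_controls: int = 2,
                    rng: Optional[random.Random] = None) -> List[Child]:
    """Randomly draw matched controls, replacing those who do not participate."""
    rng = rng or random.Random()
    # Candidates share the case's sex and month/year of birth (hypothetical matching rule).
    pool = [c for c in register
            if c.child_id != case.child_id
            and c.sex == case.sex
            and (c.birth_year, c.birth_month) == (case.birth_year, case.birth_month)]
    rng.shuffle(pool)  # random order of approach
    chosen: List[Child] = []
    for candidate in pool:
        if len(chosen) == n_controls:
            break
        # A refusal (GP or parents) simply moves on to the next candidate.
        if participates(candidate):
            chosen.append(candidate)
    return chosen  # may hold fewer than n_controls if the pool is exhausted
```

Returning a short list when the pool runs out mirrors the situation in which a case ends the study with only one interviewed control.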
Notes (Table 1):
a For cases diagnosed before 9/93, FHSA records ordered by date of birth and same sex, and controls with closest birth-date selected.
b As in protocol but controls selected by the health boards.
c Each pair of controls matched on sex and month of birth (± 3 months) randomly selected from each of two randomly selected GP lists.
d One FHSA withheld permission for data access. Controls selected as in note c.
e For cases diagnosed before 4/96, controls selected as in note a.
Ethnicity was ignored during control selection, but a third control family was, where possible, interviewed in place of the second family when the parents of both control children were of a different ethnic group to the parents of the case child. In most regions this additional selection was possible only when the case was white and the two interviewed controls were not.

Ineligible children
Children (cases and controls) were ineligible if they had been born outside Great Britain or had a prior malignancy. For the purposes of the study, all control children were assigned a 'pseudo-diagnosis' date that coincided with the exact age at which their corresponding case was diagnosed. Children who themselves, or whose parents, were resident outside Great Britain in the 3 months leading up to diagnosis/pseudo-diagnosis were considered ineligible. For ethical reasons, children in residential local authority care at diagnosis/pseudo-diagnosis (< 1% of the total childhood population) were also excluded from the study.
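The pseudo-diagnosis date can be illustrated with a short calculation. The sketch below assumes that "exact age" is measured in days; the function name `pseudo_diagnosis_date` and the example dates are hypothetical.

```python
from datetime import date


def pseudo_diagnosis_date(case_dob: date, case_diagnosis: date, control_dob: date) -> date:
    """Date on which the control reached the case's age (in days) at diagnosis."""
    age_at_diagnosis = case_diagnosis - case_dob  # timedelta
    return control_dob + age_at_diagnosis


# Hypothetical example: the control was born 10 days after the case,
# so the pseudo-diagnosis falls 10 days after the case's diagnosis.
print(pseudo_diagnosis_date(date(1990, 3, 10), date(1994, 8, 1), date(1990, 3, 20)))
```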

Registration
At registration the date of birth, sex, initial diagnosis and home address of all eligible cases diagnosed in the regions were recorded, regardless of whether their families subsequently participated in the study. Controls were selected only for participating cases. All control children whose GP was approached were registered in the system, regardless of whether their GP/parents agreed to participate. The system was designed so that the origins of all controls attached to a case were known, the key fields being:
1. Eligibility: yes or no
2. Identification number: chronological order in the selection sequence
3. Replacement number: identification number of the control replaced
4. Choice number: 1st, 2nd, etc. (this variable is less than the identification number when new controls replace ineligible controls).
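The four key fields can be pictured as a small record. The sketch below is one possible reading, not the study's actual schema: the class and function names are hypothetical, and it assumes that ineligible controls do not use up a choice number, which is why the choice number can fall below the identification number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlRegistration:
    eligible: bool                        # key field 1
    identification_number: int            # key field 2: order in the selection sequence
    replacement_number: Optional[int]     # key field 3: id of the control replaced, if any
    choice_number: Optional[int] = None   # key field 4: 1st, 2nd, ... among eligible controls


def assign_choice_numbers(controls: List[ControlRegistration]) -> None:
    """Number eligible controls 1, 2, ... in selection order, skipping ineligible ones."""
    next_choice = 1
    for control in sorted(controls, key=lambda c: c.identification_number):
        if control.eligible:
            control.choice_number = next_choice
            next_choice += 1


# Hypothetical sequence: the second control selected proved ineligible and was replaced.
sequence = [
    ControlRegistration(eligible=True, identification_number=1, replacement_number=None),
    ControlRegistration(eligible=False, identification_number=2, replacement_number=None),
    ControlRegistration(eligible=True, identification_number=3, replacement_number=2),
]
assign_choice_numbers(sequence)
```

Under this reading the third control selected is recorded as the 2nd choice.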
For those children whose families were recruited, the address at diagnosis/pseudo-diagnosis was recorded at registration. For those whose families did not participate, the initial contact address supplied by the treating hospital (cases) or listed on the FHSA/HB database (controls) was recorded.

Participation
Eighty-seven per cent of the parents of the 4433 case children agreed to be interviewed but only 64% of the 11 987 potentially eligible control children. The numbers not participating for different reasons are shown in Figure 3.
A few (3% of the cases and less than 1% of the controls) were not approached. This occurred when the mother (biological or non-biological) of the child had died and no suitable surrogate could be found, when the parents did not speak English and no suitable translator could be found, or when the family emigrated shortly after diagnosis/pseudo-diagnosis. For both cases and controls, parental refusal to participate was the main reason an interview was not secured, accounting for 279 (47%) of the 595 non-interviewed cases and 2625 (60%) of the 4358 non-interviewed controls. Refusal by medical practitioners to allow their patients to participate in the study was also important: permission to approach families was not given by the consultants treating 116 cases and by the GPs of 1012 controls. For controls, another important factor for not securing an interview was 'failure to trace': the address held by the GP of 648 (5% of the total and 15% of the not interviewed) was incorrect, and the family's correct address was not traceable.
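The participation percentages quoted above follow directly from the reported counts. The short check below uses only figures given in the text; the helper name `pct` is an assumption introduced for illustration.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


cases_eligible, cases_interviewed = 4433, 3838
controls_eligible, controls_interviewed = 11987, 7629

cases_not_interviewed = cases_eligible - cases_interviewed            # 595
controls_not_interviewed = controls_eligible - controls_interviewed   # 4358

print(pct(cases_interviewed, cases_eligible))        # 87
print(pct(controls_interviewed, controls_eligible))  # 64
print(pct(279, cases_not_interviewed))               # 47 (parental refusal, cases)
print(pct(2625, controls_not_interviewed))           # 60 (parental refusal, controls)
print(pct(648, controls_eligible))                   # 5  (untraced, of all controls)
print(pct(648, controls_not_interviewed))            # 15 (untraced, of non-interviewed)
```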

Parental interview
Mothers and fathers were interviewed using separate, but similar, questionnaires. The parent(s) or guardian(s) with whom the child was living at the time of diagnosis/pseudo-diagnosis were approached and interviewed first, regardless of whether they were the biological or non-biological parent(s). In the case of the latter, permission was then sought to contact and interview the biological parent(s).
A trained interviewer using a structured questionnaire interviewed all participating parents. Telephone interviews were conducted only when face-to-face interviews could not be arranged. When one parent (usually the father) could not be contacted, the available parent (usually the mother) was asked to act as a surrogate.
Full residential and occupational histories, including specific information about occupational exposures and individual housing characteristics, were recorded for each parent. To improve the quality of these data, a form asking parents to list the places where they had lived and the jobs they had had was sent out in advance of the interview. At interview, mothers and fathers were also asked about their own health, social habits and illnesses in their families. Additional sections on pregnancies and on the index child's health, schooling and social history were incorporated into the mother's questionnaire.
On completion of the interview, parents were asked whether they were willing to be contacted again. In particular, consent was sought for their participation in both the ionizing radiation (radon and gamma) and the non-ionizing radiation (EMF) components of the study (Figure 2). Signed agreement was also requested for blood samples of the cases to be taken at a later date and for medical and other records to be accessed.

Medical note abstractions
With the interviewee's and their GP's permission, the GP notes of index children and their biological parents were abstracted onto structured abstraction forms. Similarly, details about the mother's pregnancy with the index child were transferred from her obstetric notes and the index baby's neonatal notes (where they existed) to structured obstetric abstraction forms. More information about the variables included on these forms is given elsewhere (Roman et al, 1997; McKinney et al, 1999).

Ionizing and non-ionizing radiation
The arrangements made for measuring the children's exposure to electromagnetic fields and to radon and terrestrial gamma radiation are described on pp. 1087-1090.

Biological samples
After obtaining parental permission, pretreatment blood samples were sought from all children with cancer, and marrow samples from all with leukaemia and lymphoma. Following interview, blood samples were also sought from the case children's first-degree relatives or, when appropriate, from the children during remission. The number of samples collected and what was done with them are described later.

Data management and census linkage
Registration and interview data were coded and entered into computers in the regional centres using the database management system FoxPro version 2.6 (FoxPro, 1991) (WHO, 1990, 1992). On completion of regional checking, the data were further scrutinized, merged and transferred to Microsoft SQL (structured query language) Server version 6.5 at the Study's national centre in Leeds. At the same time, the regionally collected case registration data were linked to various national diagnostic and biological databases, and consequent discrepancies resolved, as described later.
With a view to linking to areal information derived from the 1991 census (Census, 1991), particular attention was paid to collecting valid postcoded addresses for certain key points in time. For participants, these included the homes the child lived in at the time of diagnosis/pseudo-diagnosis and at the time of birth. For non-participants the only available address was the 'contact' address, as held by the FHSA/HB or by the GP. The postcodes of all addresses were checked and, where necessary, automatically reassigned using QuickAddress (1999). About 5% of all addresses could not be resolved in this way, and postcodes were manually allocated. Each address was then linked to one of the 108 336 enumeration districts (EDs) in England & Wales or 38 084 output areas (OAs) in Scotland using dictionaries of correspondence between postcode and census units. The establishment of this linkage enabled aggregate information derived from the census to be used as an areal indication of factors such as wealth and population density.
The 'deprivation' index used in the present report is similar to that developed for a national study of the geographical distribution of childhood leukaemia (Draper et al, 1991). For each ED in England & Wales and OA in Scotland, the following proportions were calculated:
1. households without a car
2. overcrowded households (more than one person per room)
3. persons unemployed: the denominator being persons who were 'economically active' (over 16 years of age, with a potential to be gainfully employed or in receipt of unemployment benefit).
In simple terms, following a series of standard transformation procedures, the standardized deviations from the mean of the three variables were summed for each area (Townsend et al, 1988). The 146 420 small areas were then ranked by the resulting 'deprivation' index (low values indicating relative affluence and high values deprivation) and divided into seven equal categories (categories one to six each containing 20 917 areas and category seven 20 918). To enable the contribution of the three components of the index to be examined separately, each component proportion was divided into seven categories (as for the deprivation index). For overcrowding, areas with zero proportions formed the bottom category, with the remaining areas being divided into six equally sized groups.
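The construction of the index can be sketched in a few lines. This is an illustration under simplifying assumptions rather than the procedure actually used: the unspecified "standard transformation procedures" are omitted, each proportion is simply standardized to a z-score, ties in ranking are broken by input order, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Deviation from the mean in standard-deviation units (requires non-zero spread)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def deprivation_index(no_car: Sequence[float],
                      overcrowded: Sequence[float],
                      unemployed: Sequence[float]) -> List[float]:
    """Sum of the three standardized proportions for each area."""
    components = [z_scores(no_car), z_scores(overcrowded), z_scores(unemployed)]
    return [sum(area) for area in zip(*components)]


def equal_categories(scores: Sequence[float], k: int = 7) -> List[int]:
    """Assign categories 1..k by rank; any remainder goes to the highest categories."""
    n = len(scores)
    base, extra = divmod(n, k)
    sizes = [base + (1 if i >= k - extra else 0) for i in range(k)]
    order = sorted(range(n), key=lambda i: scores[i])  # low score = relative affluence
    categories = [0] * n
    position = 0
    for category, size in enumerate(sizes, start=1):
        for i in order[position:position + size]:
            categories[i] = category
        position += size
    return categories
```

With 146 420 areas and seven categories, this split yields six categories of 20 917 areas and a seventh of 20 918, matching the counts given in the text.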

Characteristics of participants
The characteristics of the 3838 children diagnosed with cancer whose families participated in the study are summarized in Table 2.
Detailed diagnostic information about 1313 (90%) children with ALL and 188 (68%) children with other forms of leukaemia was obtained from MRC treatment trials (Table 2). Confirmatory diagnostic data on the 235 children with leukaemia who were not entered into a trial and the 2102 children with other forms of cancer were obtained from a variety of sources (see pp. 000 and 000). There are more boys than girls in each malignancy group. The largest sex difference was for lymphomas, where 72% of those diagnosed were male, and the smallest sex difference was for tumours of the CNS where just over 50% were male (Table 2). The age-distributions also varied by cancer type. Nearly 60% of the 1461 children with ALL were diagnosed before their 5th birthday, the median age at diagnosis being 4.4 years. By contrast, around 60% of the children with Hodgkin's disease were diagnosed after their 10th birthday, the median age at diagnosis being 11.9 years. Although 3791 (99%) cases had two interviewed controls, 47 (1%) had only one because a second interview was not secured before the end of the study (Table 3). Further controls were selected if those first selected did not participate. Both participating controls of 1760 (46%) cases were first choices, as was one of the two participating controls of 1597 (42%) cases; the remaining 481 (13%) cases had no first choice controls interviewed.
For each child enrolled in the study, up to four parents (biological and non-biological) were interviewed, with 121 cases (3%) and 182 controls (2%) having three or more participating parents. Overall, the biological mothers of 3818 (99%) cases and 7600 (99%) controls were interviewed. The corresponding proportions for biological fathers were slightly lower, at 3596 (94%) for cases and 7010 (92%) for controls.
The characteristics of the parents interviewed are summarized in Table 4. If both biological and non-biological parents were interviewed, the data for biological parents are given precedence. It should be noted that mothers and fathers were not necessarily living together at the time of the interview.
There are a number of differences between cases and controls with respect to the characteristics listed in Table 4. For fathers, but not for mothers, surrogate interviews were twice as common among controls as among cases: information on about 28% of control fathers compared to 14% of case fathers being supplied by someone else (P < 0.05). For both cases and controls, the surrogate in over 99% of instances was the child's mother. The time interval between diagnosis/pseudo-diagnosis and interview varied markedly between cases and controls: the medians being 5.8 months and 14.0 months for case and control mothers respectively.
At the time of interview, parents of control children were more likely than parents of case children to be married (but not necessarily to each other) and to own their own home. At the time of the child's diagnosis/pseudo-diagnosis, participating control parents were, on average, significantly older and of a higher social class (as assigned on the basis of their own occupation). With respect to the latter, the largest difference occurred in the proportion who were not employed at the time of interview: for cases and controls the proportions being 58% versus 47% and 22% versus 19% for mothers and fathers respectively (Table 4).
More information about the socio-economic differences between participating cases and controls is given in Table 5, which shows the census-derived findings for areal deprivation assigned on the basis of the child's address at the time of diagnosis/pseudo-diagnosis. Overall, the data indicated that participating control families tended to live in more affluent areas than case families: there being less unemployment and overcrowding and fewer homes without access to a car. Interestingly, in agreement with the findings for social class assigned on the basis of parental occupation (Table 4), the only significant heterogeneity was for the proportion unemployed within an area.
That these observations are principally due to participation bias is supported by the findings presented in Table 6, which compares the deprivation distributions of interviewed and non-interviewed first-choice controls. Within this group there are marked differences between the 5530 who were interviewed and the 2102 who were not, the latter being far more likely to live in deprived areas than the former. Importantly, the deprivation distribution of the 7632 first-choice controls was not significantly different from that of the 3838 cases shown in Table 5.

Discussion
The response from case families was good, with 3838 (87%) of the 4433 eligible cases identified participating. Direct assessment of the completeness of UKCCS case ascertainment for the years covered by the study is not straightforward. Ineligible case children (e.g. those who were born abroad) were not registered in the UKCCS, and data on all eligible diagnoses were collected in all regions only for the years 1993 and 1994. Broad comparison with national data for 1981-1990 suggests, however, that the proactive case ascertainment methods used in the study were efficient. Further, the age and sex distributions of the diagnoses included in the UKCCS are reassuringly similar to those reported for Britain in the 10-year period 1981-1990 (Parkin et al, 1998). Participation was not as high for control families as for case families, with parental interviews secured for 7629 (64%) of control families compared with 3838 (87%) of case families. The most important reason for not having an interview was parental refusal, with 2625 (22%) control parents and 279 (7%) case parents refusing to participate in the study. In addition to refusals, 5% of the selected control families did not live at the address supplied by the medical authorities and could not be traced to any other address. This 5% error rate for addresses is in line with other reports (RCGP, 1987; Page, 1991; Roberts et al, 1995).
Like all case-control studies that have relied on the recruitment of individuals for collection of information, the UKCCS findings have to be carefully evaluated in the light of participation bias. Participation bias is introduced when those who respond to a study differ in important respects from those who do not, the consequences in case-control studies invariably being exacerbated by the fact that cases and controls are not equally affected. The design of the UKCCS allows the potential effects of participation bias to be examined in more detail than has been possible in other large-scale case-control studies of childhood cancer (McBride et al, 1999): the census linkage enables the characteristics of areas in which participators lived to be compared with those of non-participators. The findings from this areal analysis confirm that the socio-economic differences between interviewed cases and controls arose because controls categorized as deprived were less likely to participate than controls who were more affluent.

POWER FREQUENCY MAGNETIC FIELDS: DESIGN CONSIDERATIONS
When the UKCCS was initiated, previous studies had suggested that a possible excess of childhood malignancy, particularly leukaemia and brain cancer, might be associated with above average levels of exposure to 50 or 60 Hz electromagnetic fields. The evidence, however, was inconclusive and further studies based on appropriate measurements were needed to elucidate the nature of the relationship between 50 and 60 Hz electromagnetic fields and childhood malignancy (Advisory Group on Non-ionising Radiation, 1992).
We describe here the design of the part of the UKCCS relevant to exposure from 50 Hz magnetic fields (and harmonics below 800 Hz) generated by the distribution and use of electricity. Residential 50 Hz electric fields were the subject of a pilot study embedded in the magnetic field study and will form the subject of a separate paper.
The primary hypothesis, based on the results of the studies reviewed by the NRPB's Advisory Group on Non-ionising Radiation (1992, 1994), was that risk for some childhood malignancies is positively associated with the average magnetic field level to which a child is exposed in the year before diagnosis. In particular, average exposures greater than 0.2 microtesla (µT) in the year before diagnosis would confer greater risk for (i) all leukaemias; (ii) ALL and (iii) CNS tumours, than average exposure less than 0.1 µT. A further hypothesis is that risk for each of the above diagnostic categories increases steadily across the range of dose from less than 0.1 µT to greater than 0.4 µT.
During the course of the UKCCS, but before any analyses were undertaken, further measurement-based studies had refined the hypothesis, suggesting that if any risk did exist it was typically above the 95th percentile of the distribution in the controls of these studies (Michaelis et al, 1997;Dockerty et al, 1998;McBride et al, 1999).

Assessment of EMF exposure
For the purposes of this study EMF exposure is defined as the time-averaged estimate of whole body average magnetic flux density as derived from the true root-mean-square (RMS) measurements made in three orthogonal directions. It is a physical entity for which a 'gold standard' measure is theoretically available, at least for the average field over a year, which is defined as the exposure of interest in this study. Personal monitors worn for extended periods of time would provide an approximation to the annual average exposure of an individual. Their use in this study was ruled out on two grounds: firstly, exposure measured by personal monitor after a diagnosis of malignancy cannot be taken as a measure of pre-disease exposure, since the lifestyle will change; secondly, the cost for the numbers involved would be excessive.
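As an illustration of this definition, the following minimal sketch combines RMS readings from three orthogonal axes into a resultant flux density and averages the resultant over equally spaced samples. The function names and the sample readings are assumptions made for illustration and are not taken from the study software.

```python
import math

def resultant_rms(bx, by, bz):
    """Resultant flux density (µT) from RMS readings on three orthogonal axes."""
    return math.sqrt(bx ** 2 + by ** 2 + bz ** 2)

def time_averaged_exposure(samples):
    """Arithmetic mean of the resultant over equally spaced (x, y, z) samples."""
    resultants = [resultant_rms(*s) for s in samples]
    return sum(resultants) / len(resultants)

# Hypothetical readings in µT, one (x, y, z) triple per sampling interval.
readings = [(0.03, 0.04, 0.0), (0.06, 0.08, 0.0)]
average = time_averaged_exposure(readings)
```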
The measurement protocols in the UKCCS were based on a pilot study conducted by NRPB. A total of 51 children from the North East of the Leeds metropolitan area, of a similar age and sex distribution to all children diagnosed with cancer in the UK, wore a magnetic field monitor for 48 h. Contemporaneous measures of 50 Hz magnetic flux density were taken in the family room (2 h), at the centre of up to five rooms in the home (5 min per room), on the centre of the bed (2 × 5 min, separated by 2 h) and at the bedside (48 h). Measurements were made also in the classroom (5 × 5 min) of those children attending a school or nursery.
On the basis of this pilot study, the 48-h personal exposure of children was moderately well correlated with combined short-term measurements made in the bedrooms and family rooms. The correlation was improved when the assessment of exposure included school measurements and/or residential measurements over 48 h (r ≥ 0.8). It was concluded that household assessment could provide an adequate measure of an individual's exposure, provided that schools were also measured when relevant.
The pilot study indicated that a restricted set of measurements would classify, with acceptable sensitivity and specificity, an individual into the lowest 90% of exposure. For exposures in the top 10%, more extensive measurements would be required, to give a more precise exposure estimate. Resources, however, were not available for extensive measurements in the households of all cases and controls. A two-phase approach was therefore adopted for the UKCCS.
In the first phase, data were collected for a series of matched case-control pairs with a single control per case. Information on EMF exposures was gathered from five different sources: (i) specified measures in the child's home; (ii) specified measures in schools or other institutions, such as purpose-built nursery schools, attended by the child; (iii) a questionnaire on electrical appliances in the home; (iv) the proximity and type of overhead power lines nearby from an external sources questionnaire (ESQ) and (v) electricity company databases of historical load data and other operating characteristics.
In the second phase, more extensive measurements of EMF levels were made in the home for all children indicated by phase I to be in the top 10% of exposures, together with the corresponding matched case or control. The value taken to approximate the 90th percentile of the phase I measurement was 0.1 µT. Also included in phase II were individuals living in the proximity of high voltage overhead lines and underground cables and those with appliances specified in the phase I questionnaire.
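The entry rule for the second phase can be sketched as a simple predicate on a matched pair. The dictionary keys, and the choice of "greater than or equal to" at the 0.1 µT cut-off, are assumptions made for illustration; the text does not say how values exactly at the cut-off were handled.

```python
PHASE_I_CUTOFF_UT = 0.1  # value taken to approximate the 90th percentile

def member_triggers_phase_two(member):
    """True if one child meets any of the phase II entry conditions."""
    return (
        member["phase1_ut"] >= PHASE_I_CUTOFF_UT   # assumed inclusive cut-off
        or member["near_hv_source"]                # high voltage line or cable nearby
        or member["specified_appliance"]           # appliance flagged in phase I
    )

def pair_enters_phase_two(case, control):
    """A pair is re-measured if either member triggers phase II."""
    return member_triggers_phase_two(case) or member_triggers_phase_two(control)

# Hypothetical pair: the control exceeds the cut-off, so both are re-measured.
case = {"phase1_ut": 0.03, "near_hv_source": False, "specified_appliance": False}
control = {"phase1_ut": 0.12, "near_hv_source": False, "specified_appliance": False}
selected = pair_enters_phase_two(case, control)
```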
Measurements were made after diagnosis, because of the retrospective nature of the study. Retrospective line load data for relevant high voltage lines were obtained from the electricity industry for the time when measurements were taken and for the year of interest, i.e. the year before diagnosis or for controls pseudodiagnosis. The line load data were used to compute fields for both the year of interest and for the measurement period. These data were used to adjust the measurements to generate a historical exposure estimate for the year of interest.
To prevent identification of high and low measurements by study technicians, meter readings were not displayed. Information on the levels of EMFs in individual homes and schools was provided to study participants on request, but otherwise remained strictly confidential.

The power of a two-phase approach to exposure measurement
It is straightforward to evaluate the power of a two-phase design compared either to a design in which all individuals have only phase I measurements, or to one in which all individuals have phase II measurements.
In the simple case where in the whole population the top p% of phase II exposures have a relative risk of r compared to the remainder, and the phase I measurement is used to identify q% of individuals, there are approximately 2q% of case-control pairs on whom phase II measurements are made.
Figures 4A (r = 1.5) and 4B (r = 2) consider a study of 500 case-control pairs. The three curves give the power to detect a difference at the 5% significance level for a study in which all individuals have phase II measurements, a study in which all individuals have only phase I measurements, and the two-phase design as described. The benefit of the two-phase approach depends on the sensitivity and specificity of phase I compared to phase II, but for a wide range of values there is a major gain in power for relatively little cost, compared with a phase I only study.
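One reason a phase I only study loses power is that misclassification by the surrogate pulls the observed odds ratio towards 1. The sketch below illustrates this attenuation for a binary exposure. It is not the power calculation behind Figure 4, and it treats r as an odds ratio, which is an assumption that is reasonable only for a rare disease. The function and parameter names are hypothetical.

```python
def observed_odds_ratio(p, r, sensitivity, specificity):
    """Odds ratio seen through an imperfect surrogate exposure measure.

    p: true proportion of controls exposed
    r: true odds ratio for exposed versus unexposed
    """
    # True proportion of cases exposed, implied by p and r.
    p_case = p * r / (1 - p + p * r)

    def classified_exposed(true_prevalence):
        # True positives plus false positives under the surrogate.
        return (sensitivity * true_prevalence
                + (1 - specificity) * (1 - true_prevalence))

    q_control = classified_exposed(p)
    q_case = classified_exposed(p_case)
    return (q_case / (1 - q_case)) / (q_control / (1 - q_control))

# With a perfect surrogate the true odds ratio is recovered;
# with an imperfect one the observed value lies between 1 and r.
perfect = observed_odds_ratio(p=0.1, r=2.0, sensitivity=1.0, specificity=1.0)
attenuated = observed_odds_ratio(p=0.1, r=2.0, sensitivity=0.8, specificity=0.98)
```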

Inclusion
All families who had agreed to take part in the UKCCS were considered for inclusion in the EMF study. Eligibility for inclusion of both cases and controls was based on the eligibility of the home address, since exposure was based on household measurements.
Home addresses, of both cases and controls, were eligible if the child had lived there for 12 months or more before diagnosis or pseudo-diagnosis and was still living there. 'Homes' included fixed-site caravans, but not 'mobile' addresses. If a child was aged less than 12 months at (pseudo) diagnosis, an address was eligible if the child had lived there continuously between birth and the time of (pseudo) diagnosis. No attempt was made to measure previous homes.
If a case family was ineligible (did not live in current home for 12 months prior to diagnosis) or refused to take part in the EMF study, no control was invited to participate. Otherwise, the family of the matched control with the lower identification number was approached. If this first control family was ineligible or refused to participate, then the second control was approached and incorporated in Phase I, if eligible. If both controls were ineligible or refused to participate, the exposure of the case was not measured.
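The order in which controls were approached can be sketched as follows. The data layout (a list of identification number and availability pairs) is an assumption made for illustration; "available" here stands for a control family that is both eligible and willing to take part.

```python
def select_emf_control(case_available, controls):
    """Return the identification number of the control to pair with the case.

    case_available: True if the case family is eligible and agrees to take part.
    controls: list of (identification_number, available) for the matched controls.
    Returns None when no pair can be formed, in which case the case is not measured.
    """
    if not case_available:
        return None  # no control is invited
    # Approach controls in ascending order of identification number.
    for control_id, available in sorted(controls):
        if available:
            return control_id
    return None

chosen = select_emf_control(True, [(205, True), (117, False)])
```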
Due to delays in starting the EMF study, there was an interval of variable length between the date of (pseudo) diagnosis, and the date of the Phase I measurements. This affected the number of case-control pairs on whom measurements were available.
Schools were eligible if the child had attended them for 15 or more hours per week during the winter (October to March) immediately preceding (i.e. not including) the date of (pseudo) diagnosis. If the child attended more than one school in this period, the school in which the longest time was spent, in terms of the total number of hours, was chosen. 'School' included pre-schools in established, purpose-built buildings.

Instrumentation
Commercially available Emdex II magnetic field meters (Enertech Consultants Ltd, USA) were used to measure resultant magnetic fields in the broadband frequency range 40-800 Hz. The meters incorporate three orthogonal sensing coils to measure true RMS magnetic flux density over the range 0.01-300 µT. The measurement resolution is 0.01 µT and the overall accuracy within the range 0.01-10 µT is ± 10% ± 0.005 µT. Computer memory inside the meter allowed data logging over extended periods. For short-term measurements, of two hours or less, magnetic flux density was recorded at sampling intervals of 1.5 and 3 s in the phase I and phase II assessments respectively. The sampling interval was adjusted to 10 s for longer measurements.

Figure 4 Comparison of three sampling strategies (one-stage with the phase II measure, one-stage with the phase I measure, and two-stage combining phase I and phase II): n = 500, P = proportion of controls exposed (phase II measure), q = proportion classified as exposed by the surrogate (phase I measure). Panel label: P = 0.1, q = 0.2, relative risk = 2.0
The instruments were calibrated in a Helmholtz coil facility which had an uncertainty of ±2.5%, traceable to national standards. Instruments were calibrated before issue to the study technicians and at intervals during the course of the study.
A check source generating a magnetic flux density of about 5 µT was developed to assess instrument integrity prior to and following each set of site measurements. General faults or the failure of any single sensing coil could be detected. Check source records were downloaded with measurement data following each site visit.
A subset of phase II case-control assessments also recorded electric field strength for a pilot study, using the same meters fitted with an external sensor. Based on laboratory tests the attenuation of magnetic flux density by the sensor was about 4%.

Phase I residential measurements
The phase I residential measurements comprised in order: (i) three 3-min spot measurements taken in the centre of the child's bedroom, at the centre of the child's bed, and on the centre of the child's pillow; (ii) a 90-min measurement taken in the centre of the main family room; (iii) a repeat of the three spot measurements after the 90-min measurement.
Apart from the readings made on the bed, all these measurements were made with the instruments placed at a height of 1 m from the floor in a polypropylene stand and at least 1 m from any operating appliances.
The aim was to measure the homes of the case and of the corresponding control less than 4 months apart. If the period between measurements was greater than this, the earlier measurement was repeated. If a household measurement was repeated, the earlier reading was retained for study of repeated measurements.
During the phase I household visits, questions were asked about the time that the child spent in bed and at school, which were used in the calculation of time-weighted exposures.

Phase II residential measurements
The matched case-control pair of phase II measurements were made as close as possible to each other and within a 4-week period.
Where the phase I questionnaire identified that a child's exposure might include exposure from a night storage heater or underfloor heating in the bedroom, then the phase II measurements covered a period during which the particular appliance was in use, typically during the winter months.
Phase II measurements took place during a period agreed with the appropriate electricity company as 'a period of typical operation of the local distribution system'. The measurements comprised: (i) four 3-min spot measurements taken at the centre of the family room, at the bedside position to be used for the 48-h measurement, at the centre of the child's bed and at the centre of the pillow; (ii) a 48-h measurement taken by the side of the middle of the child's bed; (iii) a repeat of the four spot measurements after the 48-h measurement.
Again, apart from the readings made on the child's bed, all of these measurements were made with the instruments placed at a height of 1 m from the floor in a polypropylene stand and at least 1 m from any operating appliances. Tamper-proof holders were used for the extended 48-h measurements by the side of the child's bed.

Measurement in schools
School measurements took place when the heating systems were operating normally, which in England & Wales was taken to be the months October to March inclusive. There were two measurement schemes. The first was used when the child occupied a single classroom for the majority of the time spent at school during the relevant winter period, typically in primary schools. This scheme consisted of five 2-min spot measurements near the centre and four corners of this single room. When many classrooms were used, typically in secondary schools, spot measurements were made in up to five of the rooms most frequently used during the relevant winter period. In each of these rooms, a single measurement was made near the room centre, the measurement time totalling 10 min and the measurements in the different rooms being of equal duration. All measurements were made at a height of 1 m from the floor and at least 1 m from any operating mains appliance.

Appliances
During the phase I visit, questions were asked on use of underfloor heating and night-storage heating in the house including the child's bedroom, with observation of position and type. These appliances were chosen firstly on the basis that the position of the appliance and child were well defined over the period of use and secondly that the appliance field contribution over the likely period of use could increase mean exposure from less than 0.1 µT to greater than 0.2 µT.
Where such an appliance was identified and used, the case-control pair qualified for the phase II assessment; this was carried out when the night-storage heater/under-floor heating was operating normally. Using appropriate periods of the 48-h measurement, specific exposure calculations were undertaken for these appliances, their contribution considered to be material only during the winter night-time. Use of an electric blanket was also ascertained; if a blanket was used and was available, permission was asked to send it to NRPB for detailed magnetic flux density measurements to assess the exposure.
In phase I exposure estimates, computed appliance field over the relevant period of use was combined with the measured background field by means of root sum of squares (RSS). Algorithms for computing appliance field were derived from investigations in the laboratories of NRPB and electrical testing houses, corroborated by evidence from UKCCS measurements. The same approach was adopted for phase II where there was no operating heating appliance during the overnight bedroom measurements.
Further questions were asked on other specific household sources of EMF (hairdryers, microwave ovens, electricity meters and central heating systems) to assess potential misclassification. This is to be the subject of secondary analyses.
The external sources questionnaire (ESQ) addressed the electricity supply close to homes and schools. Designed in cooperation with National Grid Company (NGC) and the Regional Electricity Companies (RECs) for England & Wales, and Scottish Power and Scottish Hydro Electric in Scotland, the specific purposes of the questionnaire were as follows: to identify high voltage lines and/or underground cables that had the potential to produce annual average field levels above 0.1 µT at the home and/or school; to obtain load and other circuit information to enable reconstruction of historical exposure; to check that the electricity distribution system was operating typically at the time of measurement; to identify substations and particular types of low voltage circuits that were near to the location of interest. The ESQs, blinded with respect to case-control status, were evaluated by NRPB.
Entry into phase II, on the basis of the ESQ was determined by the following criteria for England & Wales: (i) an NGC line within 400 m or underground cable located within 100 m of the home; (ii) a REC line of 66 kV or above, located within distances up to 200 m from a home; (iii) a REC line of 11-33 kV located within distances up to 80 m from a home; (iv) an operating substation or a phase-separated underground cable of 33 kV or above within 20 m; (v) a 3-phase 415 V distribution circuit located within 2 m of the home; (vi) atypical conditions of distribution circuits. In Scotland, equivalent criteria based on line voltages were used.
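The England & Wales criteria lend themselves to a rule check over the sources listed on an ESQ. The sketch below is illustrative only: the `Source` record, its field names and the category labels are hypothetical, "within" is read as "less than or equal to", the underground cable in criterion (i) is assumed to be an NGC cable, and only operating substations are assumed to be listed.

```python
from dataclasses import dataclass

@dataclass
class Source:
    kind: str                 # "line", "cable", "substation" or "lv_circuit"
    owner: str                # "NGC" or "REC"
    voltage_kv: float
    distance_m: float
    phase_separated: bool = False

def qualifies_for_phase_two(sources, atypical_conditions=False):
    """Apply ESQ criteria (i) to (vi) to the sources near one home."""
    if atypical_conditions:                                   # (vi)
        return True
    for s in sources:
        if s.owner == "NGC" and s.kind == "line" and s.distance_m <= 400:
            return True                                       # (i), overhead line
        if s.owner == "NGC" and s.kind == "cable" and s.distance_m <= 100:
            return True                                       # (i), underground cable
        if s.owner == "REC" and s.kind == "line":
            if s.voltage_kv >= 66 and s.distance_m <= 200:
                return True                                   # (ii)
            if 11 <= s.voltage_kv <= 33 and s.distance_m <= 80:
                return True                                   # (iii)
        if s.kind == "substation" and s.distance_m <= 20:
            return True                                       # (iv), substation
        if (s.kind == "cable" and s.phase_separated
                and s.voltage_kv >= 33 and s.distance_m <= 20):
            return True                                       # (iv), cable
        if s.kind == "lv_circuit" and s.distance_m <= 2:
            return True                                       # (v), 415 V circuit
    return False

# Hypothetical home with a 132 kV REC line 150 m away.
example = qualifies_for_phase_two([Source("line", "REC", 132, 150)])
```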
The threshold distances used to determine the relevant phase II circuits were based on design rating and therefore judged to be conservative. Typical loads on a REC line were found to be less than 20% of the circuit rating. Analysis of load data from a sample of NGC circuits during a winter period had indicated previously that 50% and 95% of the circuits had respective average loads of less than 30% and 50% of their rating (D Renew, NGC, personal communication).
External sources questionnaires were issued also for the nonmeasured participants in the main study. These were interviewed cases and controls who were either ineligible for EMF measurements (because they had moved house either during or since the time of interest) or who were eligible but had declined to participate in this part of the study. To investigate the possible effects of refusal bias, ESQs were also issued for a random sample of one thousand (approximately two-thirds) of the first-choice controls who had refused to take part in the full study.

Historical exposure computations
Line load data were requested for all overhead lines whose voltages were 66 kV and above within certain threshold distances from the location of interest. For REC lines, threshold distances were used to identify homes and schools where the annual average magnetic field could exceed 0.15 µT. The distances were based on maximum loads as defined by the circuit rating, and relative phasing information. In practice, loads were well below the circuit rating as described in the previous section. For transmission lines, the annual load current or half of the average circuit rating and other detailed operational information, were used to define line-specific threshold distances. These distances were established for annual average magnetic fields of 0.1 µT. On a similar basis, load data were also requested for phase-separated underground cables of 66 kV and above, located within 20 m of the property.
National Grid Company's EM2D program was used to compute magnetic field levels. The program, evaluated by NRPB for the purposes of the study, was used to calculate the RMS magnetic flux density averaged over the measurement period and over the year of interest, which were used in the exposure algorithm. To set up a model, circuit parameters including line/tower type, conductor height above ground, distance, measurement height, phasing of circuits and current direction, were used as input data. A three-dimensional (3D) version of the program was used in some circumstances.
Load data contemporaneous with the period of measurement and the year of interest were requested. These were usually provided as a continuous series of load current/power values for each circuit, sampled at 1-min intervals for phase I, and at half-hour intervals for the year of interest. To allow for the temporal characteristics of line loading (Kaune et al, 1998), the values were sorted into separate date-time files to cover the appropriate periods of the year of interest. Where companies could not provide data for the relevant period, data for the closest alternative period were requested. If load data were not available, engineers were requested to provide best estimates based on available company records.
In a previous study carried out by NGC (Swanson, 1995), good agreement was found between instantaneous measured magnetic fields and the computed values. Using a similar computational approach to that used in the UKCCS, the error was estimated to be ± 6% at 0.1 µT. In the UKCCS, the typical overall uncertainty was estimated to be ± 20%, and checks with measurements implied worst-case uncertainty of approximately ± 40% within 50 m of a power line, falling to ± 30% at 100 m. The additional computational error was mainly due to uncertain ESQ information, particularly the location and elevation data, and departures from the 2D modelling assumption.
In the UKCCS the year of interest exposure adjustment was made for all cases and controls where homes and schools attended met the criteria outlined above and for which load data were received. The calculation was based on a root sum/difference of squares method. A residual component, representing a constant 'local source', was derived by subtracting the contemporaneous calculated field from the measurement as a difference of squares, and this residual was combined by root sum of squares (RSS) with the year of interest calculated field. There were instances where the calculated field was greater than the measured field and these were ascribed to the inaccuracies in the parameters for the computational model, which have been described above. A comprehensive review was carried out of the modelling parameters used. Where calculated fields remained greater than the contemporaneous measured fields, the year of interest exposure was determined by adjusting the measured field by the ratio of the calculated fields for the two periods.
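A minimal sketch of this adjustment, under the reading that the residual is a difference of squares and the final value a root sum of squares, is given below. The function and argument names are hypothetical, and the intermediate review of modelling parameters is not represented.

```python
import math

def year_of_interest_field(measured, calc_contemporaneous, calc_year):
    """Adjust a measured field (µT) to the year of interest.

    measured: field measured at the site
    calc_contemporaneous: field computed from line loads at the time of measurement
    calc_year: field computed from line loads for the year of interest
    """
    residual_sq = measured ** 2 - calc_contemporaneous ** 2
    if residual_sq >= 0:
        # Constant 'local source' residual combined with the year-of-interest line field.
        return math.sqrt(residual_sq + calc_year ** 2)
    # Calculated field exceeds the measurement: scale by the ratio of calculated fields.
    return measured * calc_year / calc_contemporaneous

# Hypothetical values in µT.
adjusted = year_of_interest_field(0.5, 0.3, 0.4)
scaled = year_of_interest_field(0.2, 0.4, 0.3)
```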

Algorithm for combining measurements
On the basis of heating appliances contributing to exposure only during winter months, average exposure in the year preceding (pseudo) diagnosis was estimated as a time-weighted combination of the component exposures. The weights (W1 to W4, ΣWi = 1) were individually calculated for each child to reflect the time spent in bed and in school, as recorded in the study questionnaire. Where any of the information needed to compute the weights was missing, average, age-related values were used.
In phase I, bed and home non-bed exposures were estimated from measurements in appropriate locations covered by a 2-h period. Bed exposure was estimated from bedroom spot measurements and the average of the 90-min family room measurement was used as an estimate of exposure for time not spent in bed or at school. In phase II, bed exposure and home non-bed exposure were estimated from the 48-h measurement and the phase I family room measurement. School exposure was common to both phase I and phase II.
Where necessary, to allow for changes in line loading and circuit configuration between the year of interest and the time of measurement, home and school measurements were adjusted to reflect the situation in the year before (pseudo) diagnosis. For bed-winter exposure, measurements were combined with appliance fields where appropriate. For bed-summer exposure, measurements were unadjusted.
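A time-weighted average of this kind can be sketched as below. The four component names (bed in winter, bed in summer, home non-bed and school) are an assumption inferred from the surrounding description, as are the example weights and exposures; the assignment of W1 to W4 to particular components is not stated in this text.

```python
def annual_average_exposure(weights, exposures):
    """Time-weighted average exposure (µT); weights must sum to 1."""
    if weights.keys() != exposures.keys():
        raise ValueError("weights and exposures must cover the same components")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * exposures[name] for name in weights)

# Hypothetical child: fractions of the year spent in each setting.
weights = {"bed_winter": 0.20, "bed_summer": 0.20,
           "home_non_bed": 0.45, "school": 0.15}
# Hypothetical component exposures in µT.
exposures = {"bed_winter": 0.05, "bed_summer": 0.03,
             "home_non_bed": 0.04, "school": 0.02}
estimate = annual_average_exposure(weights, exposures)
```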

Characteristics of the phase I and phase II measures
The validity of the measures used in this study has been evaluated against the use of personal monitors in a substudy carried out by NRPB. This 'gold standard' validation study estimated individual exposure by the use of personal monitors in three separate weeks during the year on a sample of 100 healthy children. It will be reported elsewhere. The sample was chosen to have 50 children from 50 households with phase I measurements less than 0.1 µT, and 50 with phase I measurements greater than 0.1 µT.

Phase I measurements
The distribution of phase I measurements based on nearly 5000 measurements to the end of the study is displayed for annual average exposure, on a log scale, in Figure 5. The distribution is strikingly close to log normal, with a geometric mean of 0.032 µT, and a retransformed standard deviation of 0.031 µT. The mean phase I exposure level falls between the geometric mean levels of 0.029 and 0.037 µT, published previously for UK homes (Merchant et al, 1994a, 1994b;Preece et al, 1996;Swanson and Kaune, 1999). The residential and school components of the total exposure have a similar distribution. There was some evidence of diurnal and seasonal variation in phase I measurements, reflecting the known daily and seasonal changes in the use of electricity. Table 7 indicates the correlation between the different components of the phase I measurements, and with the overall residential exposure from phase I. All correlations are given for data on a log scale. Table 8 gives the correlation between phase I measurements taken at different times in the same home, by the time interval between the two measurements. The data include a sample taken in Scotland to examine replication of phase I measurements. The correlation shows no decrease with time, for intervals of up to 4 years between measurements.
These phase I Tables, and those for phase II, are based on all measurements taken. This includes measurements on individuals who will not be included in the final analysis; for example, those for whom the matching case or control was not measured, or those found to be ineligible for the study as a whole.

Phase II measurements
By the end of the study, nearly 1200 phase II measurements were made. Seasonal variation similar to that in the phase I data was observed. Table 7 gives the correlation between the different component measures in phase II and with an overall phase II household measure based on the 48-h bedside measurement, together with the corresponding phase I measures, all on a logarithmic scale. Table 9 gives the correlations between phase I and phase II estimates of total residential exposure, for different intervals of time separating them. The correlation is only weakly related to the length of the interval. The correlations are higher in Table 9 than in Table 8, which reflects the greater variance among individuals on whom phase II measurements are taken. The standard deviation of the differences between the two measurements (on a logarithmic scale) is almost identical in the two situations, about 0.8.
The relationship between the two measurements can also be expressed in terms of sensitivity and specificity as mentioned earlier. Table 10 cross-tabulates the two measures in terms of the cut-off for phase II, i.e. 0.1 µT, and the level at which risk is hypothesized to be elevated, i.e. 0.2 µT. The individuals in Table 10 with phase I measurements less than 0.1 µT represent only 15% of all such individuals, whereas all those with phase I measurements greater than 0.1 µT should have phase II measurements and thus be included in this Table. With this correction, the sensitivity and specificity of phase I with respect to phase II are 0.80 and 0.98 respectively. As can be seen from Figure 4A and B, these values suggest that the two-phase approach will lead to a substantial increase in power and be close to that achieved by taking phase II measurements on everyone.
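The correction for under-sampling of the low phase I group can be sketched by re-weighting that group by the inverse of its sampling fraction before computing sensitivity and specificity. The counts below are hypothetical and chosen only for illustration; they are not the Table 10 data, and the function name is an assumption.

```python
def corrected_sensitivity_specificity(table, negative_sampling_fraction):
    """Sensitivity and specificity of phase I against phase II.

    table maps (phase I positive, phase II positive) to a count.
    Phase I negatives are scaled up by 1 / negative_sampling_fraction
    because only that fraction of them received phase II measurements.
    """
    scale = 1.0 / negative_sampling_fraction
    true_pos = table[(True, True)]
    false_pos = table[(True, False)]
    false_neg = table[(False, True)] * scale
    true_neg = table[(False, False)] * scale
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical counts, with 15% of phase I negatives measured in phase II.
hypothetical_counts = {
    (True, True): 160, (True, False): 100,
    (False, True): 6, (False, False): 750,
}
sens, spec = corrected_sensitivity_specificity(hypothetical_counts, 0.15)
```

Without the re-weighting, the false negatives would be under-counted and the sensitivity overstated.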

Analysis
The diagnostic categories on which the primary analysis will focus are defined by the full study hypotheses, namely all leukaemias, ALL, CNS tumours and others.
The data relating to EMF exposure are classified in three groups.
In Group A, complete data (i.e. residential and school measurements where necessary and external source data) are available for both members of each case-control pair. It includes approximately 60% of all interviewed cases and a corresponding number of controls, not all of which were necessarily first choice. In Group B, EMF measurements are not available for each member of a pair, but external source data are available both for the case and the first-choice control.
In Group C, external source data are available for a random sample of 1000 of the 1582 first-choice controls for the cases with measurements in Group A that did not have full measurements and may or may not have had family interviews.
The primary analyses have been based on the case-control pairs with full EMF data available (Group A). The numbers of such case-control pairs for each main diagnostic category are given in Table 11. Approximately 20% of case-control pairs had phase II measurements. Annual average exposures based on the phase II measurements were used in the analysis when available on both a case and the corresponding control. If a phase II measurement was available for only one of a case-control pair, then phase I measurements were used for both. For the remaining 80% of case-control pairs, the annual average exposure based on the phase I measurement was used.
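The rule for choosing which measurement enters the analysis can be sketched as follows. The dictionary layout and key names are assumptions made for illustration.

```python
def exposures_for_pair(case, control):
    """Pick the annual average exposures (µT) used for one case-control pair.

    Each argument is a dict with a 'phase1' value and an optional 'phase2' value.
    Phase II values are used only when both members of the pair have them.
    """
    if case.get("phase2") is not None and control.get("phase2") is not None:
        return case["phase2"], control["phase2"], "phase II"
    return case["phase1"], control["phase1"], "phase I"

# Hypothetical pair in which only the case has a phase II measurement.
chosen = exposures_for_pair({"phase1": 0.12, "phase2": 0.15}, {"phase1": 0.04})
```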
We will use as the exposure of interest the arithmetic average exposure in the year preceding (pseudo) diagnosis. Analyses will focus on comparisons between high (greater than 0.2 µT) and low (less than 0.1 µT) exposures and on a linear relationship between risk and dose.

RADON AND TERRESTRIAL GAMMA RADIATION
Environmental exposure to low-level ionizing radiation is principally from radon and its decay products, cosmic rays, terrestrial gamma rays and radionuclides in food and drink (Hughes and O'Riordan, 1988). The ability of high linear-energy transfer alpha particles present in the terrestrial environment to cause cell damage, potentially leading to malignancy, has prompted particular interest. Radon (including its decay products) is established as a possible cause of lung cancer (BEIR IV, 1988) but other health effects have been suggested such as leukaemia in both adults and children (Henshaw et al, 1990).
Radon gas tends to become trapped within houses and may prove to be a source of a significant dose of alpha radiation. Representative measurements of the amount of radon in homes have been made on a national scale by the NRPB (Wrixon et al, 1988). Radon levels, however, vary between adjacent houses and direct measurements rather than geographic generalizations are essential to determine the levels to which individuals have been exposed. Attempts have, therefore, been made to measure the levels of radon in the present and past homes of all the children participating in the UKCCS.
Absorbed dose rates from external gamma sources are about twice as great indoors as outdoors in the UK. The mean indoor absorbed dose rate of 60 nGy per hour, which translates into an effective dose in dwellings of 280 µSv y⁻¹, is lower than in most countries studied (United Nations Scientific Committee on the Effects of Atomic Radiation, 1993) but is based on survey data from some years ago and a sample size of 2300 households. External gamma radiation indoors has also been measured and this survey will materially add to knowledge of UK gamma distribution generally as well as specifically investigating any association with childhood cancers.

Materials and methods
As part of the study, a face-to-face interview was conducted with each child's parents. A full residential history for the child was collected and all UK addresses lived in by the child, from birth to diagnosis, for 6 months or more were targeted for measurement of both radon gas and terrestrial gamma radiation. A written request to participate, in what was described as a 'Child Health Survey', was made to the occupants of each past residence, with a follow-up letter if there was no response to the first. Following agreement, detectors were sent to each household with instructions to place one in the main bedroom and one in the main living area. Two passive radon detectors, provided by the NRPB, were used to measure the concentration of radon gas (Wrixon et al, 1988) and two passive detectors were used to measure the terrestrial gamma radiation. After 6 months had elapsed, a letter was sent recalling the detectors, which were returned to the NRPB in pre-paid envelopes for processing and measurement of the cumulative radon exposure indicated by the alpha-particle tracks made by the radon decay products. We describe here the methods by which exposure to radon was assessed, using the data obtained for the control children to illustrate them. Details of the less complex assessment of exposure to terrestrial gamma radiation will be reported later with the results. Table 12 shows the number of control households approached, the number that participated and for which measurements were obtained, and the fate of the detectors that were sent out. Although detectors were intended to be in place for 6 months, the period over which measurements were actually made varied. The analyses were restricted to those measurements from detectors which had been in place for between 5 and 7 months within a household for both living room and bedroom.
After excluding faulty detectors and other missing information, there were 11 356 detector measurements available for 5678 control households (44.5% of the initial total).

Outdoor correction
The modelling of seasonal correction factors assumes the frequency distribution of radon concentrations in households is log-normal. Previous research, based on the NRPB data, suggested the overall distribution followed a log-normal distribution more closely when each household measurement had 4 Bq m⁻³ subtracted (Gunby et al, 1993). This was informally termed an 'outdoor correction' factor as limited surveys of radon indicated outdoor levels to be low, around 4 Bq m⁻³. The assumption driving the 'outdoor correction' was tested on the UKCCS data prior to correction for season.

Seasonality
The aim of the radon survey was to measure average concentration of radon gas over a year. Radon becomes trapped within a building, as it is drawn up from the underlying rock and released gradually from the house through windows, doors and other ventilation. Different levels of heating and ventilation throughout the course of a year cause radon levels to fluctuate by season (Pinel et al, 1995). Because of the time constraints of an epidemiological study, radon detectors are usually in place for approximately 6 months, so a multiplicative factor is required to correct for seasonal variation. The NRPB have derived corrections from their own national survey of houses (Wrixon et al, 1988). These, however, are not based on the same sampling frame as the UKCCS data, which are for houses in which there are young children. For this reason, seasonal correction factors were derived from the UKCCS sample, extending standard methodology by testing the validity of region-specific factors. In broad terms, the methodology of Pinel et al (1995) uses the set of household measurements to calculate the mean concentration of radon for each month of the year; these monthly values are then used to derive seasonal correction factors. The measurement, $m_i$, for each household $i$, obtained by the NRPB, was in kilobecquerel hours per cubic metre (kBq m⁻³ h; integrated exposure). This converts to a mean concentration of $1000\,m_i/(24\,p_i)$ Bq m⁻³, where $p_i$ is the number of days the instruments were in place. Measurements often differ substantially between living room and bedroom, because living rooms are generally closer to the ground, the source of radon. The NRPB surveyed times spent within different parts of the house (Wrixon et al, 1988), which suggested that an average person spent 55% of their time at home in the bedroom and 45% in the other living areas.
Previous UK studies (Darby et al, 1998) have used this division to allocate weighted average concentrations for each house, and this was used to calculate $m_i$ in preference to a simple average.
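The conversion from integrated exposure to mean concentration, and the occupancy-weighted household average, can be illustrated with a short sketch. It assumes only the unit relationship (kBq m⁻³ h divided by hours in place) and the 55%/45% occupancy split quoted above; the function and constant names are hypothetical.

```python
BEDROOM_WEIGHT = 0.55  # share of time at home spent in the bedroom
LIVING_WEIGHT = 0.45   # share of time at home spent in other living areas


def mean_concentration(exposure_kbq_h_per_m3: float, days_in_place: float) -> float:
    """Convert integrated exposure (kBq m^-3 h) to mean concentration (Bq m^-3)."""
    if days_in_place <= 0:
        raise ValueError("days_in_place must be positive")
    hours_in_place = 24.0 * days_in_place
    return 1000.0 * exposure_kbq_h_per_m3 / hours_in_place


def household_concentration(bedroom_bq_m3: float, living_bq_m3: float) -> float:
    """Occupancy-weighted household average of the two room concentrations."""
    return BEDROOM_WEIGHT * bedroom_bq_m3 + LIVING_WEIGHT * living_bq_m3
```

For example, an integrated exposure of 87.6 kBq m⁻³ h over 182.5 days corresponds to a mean of 20 Bq m⁻³, and room means of 20 (bedroom) and 40 (living room) Bq m⁻³ give a weighted household value of 29 Bq m⁻³ rather than the simple average of 30.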
The mean of the log-transformed radon concentrations for each month was calculated as $x_j = \frac{1}{N_j}\sum_{i} \ln m_i$ for all months, $j = 1, \ldots, 12$, where $m_i$ is the weighted mean concentration for dwelling $i$ and the sum runs over those dwellings for which the middle day of the measurement period falls between the middle of month $j-1$ and the middle of month $j$. This ensures the measurement duration is symmetrical around the month to which it is assigned. $N_j$ is the number of measurements in the $j$th month. The most appropriate locational measure for a log-normal distribution is the geometric mean, which was used throughout the modelling procedures to estimate the average radon concentration. The geometric mean concentration for each month was calculated as $d_j = \exp(x_j)$.
A sine-cosine curve was fitted to the geometric mean monthly concentrations. The derived seasonal variation allows a correction factor to be calculated (Pinel et al, 1995).
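A sine-cosine fit of this kind, and a correction factor derived from it, might be sketched as below. Several points are assumptions made for illustration and are not taken from the text or from Pinel et al (1995): the curve is taken to be $a + b\sin(2\pi j/12) + c\cos(2\pi j/12)$ fitted by ordinary least squares to the twelve monthly geometric means, the measurement period is described by whole calendar months, and the correction factor is taken as the annual mean of the fitted curve divided by its mean over the months the detector was in place.

```python
import numpy as np


def fit_seasonal_curve(monthly_geometric_means):
    """Least-squares fit of d_j ~ a + b*sin(2*pi*j/12) + c*cos(2*pi*j/12), j = 1..12.

    Returns the coefficients (a, b, c) as a NumPy array.
    """
    d = np.asarray(monthly_geometric_means, dtype=float)
    if d.shape != (12,):
        raise ValueError("expected one geometric mean per calendar month")
    theta = 2.0 * np.pi * np.arange(1, 13) / 12.0
    design = np.column_stack([np.ones(12), np.sin(theta), np.cos(theta)])
    coef, *_ = np.linalg.lstsq(design, d, rcond=None)
    return coef


def correction_factor(coef, months_in_place):
    """Annual mean of the fitted curve divided by its mean over the measured months.

    The sine and cosine terms average to zero over a full year, so the
    annual mean of the fitted curve is simply the intercept a.
    """
    a, b, c = coef
    theta = 2.0 * np.pi * np.asarray(months_in_place, dtype=float) / 12.0
    fitted = a + b * np.sin(theta) + c * np.cos(theta)
    return a / fitted.mean()
```

Under these assumptions, a detector in place over months when the fitted curve is above its annual mean receives a factor below 1, and one in place over months when the curve is below its annual mean receives a factor above 1.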

Seasonality correction by region
Seasonality is partly a function of the magnitude of variation in local climatic conditions, and the severity and lengths of seasons differ substantially across Great Britain. Therefore, it seems reasonable that seasonality in levels of radon concentration may vary across the UKCCS study regions, a hypothesis tested formally using these data. Mean monthly radon concentrations were calculated separately for households within each UKCCS study region, and these estimates were used with an indicator variable for the study region as a formal test of whether estimated coefficients were significantly different between regions. Statistically significant differences would support use of region-specific seasonal correction factors.
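One way such a comparison could be set up is as an F-test between nested least-squares models: a pooled sine-cosine curve against a model in which region indicators give each region its own intercept and seasonal terms. The sketch below is an assumption about the form of the test, not a description of the models actually fitted; in particular, the text does not say whether the region indicator was interacted with the seasonal terms, and all names here are hypothetical.

```python
import numpy as np


def seasonal_terms(months):
    """Design columns [1, sin, cos] for calendar months numbered 1..12."""
    theta = 2.0 * np.pi * np.asarray(months, dtype=float) / 12.0
    return np.column_stack([np.ones_like(theta), np.sin(theta), np.cos(theta)])


def residual_ss(design, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def region_f_statistic(months, regions, values):
    """F statistic comparing a pooled seasonal curve with region-specific curves.

    Returns (F, numerator df, denominator df).
    """
    y = np.asarray(values, dtype=float)
    base = seasonal_terms(months)
    labels, index = np.unique(np.asarray(regions), return_inverse=True)
    indicators = np.eye(len(labels))[index]  # one 0/1 column per region
    # Interact every seasonal column with every region indicator.
    full = np.hstack([base * indicators[:, [r]] for r in range(len(labels))])
    rss_pooled = residual_ss(base, y)
    rss_full = residual_ss(full, y)
    df_num = full.shape[1] - base.shape[1]
    df_den = len(y) - full.shape[1]
    f_stat = ((rss_pooled - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den
```

A large F relative to the F distribution on the returned degrees of freedom would indicate that region-specific curves fit better than a single pooled curve.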

Results
The frequency distribution of the radon concentration by household (regionally adjusted for season) is log-normally distributed, as would be expected for data of this type (Figure 6). Prior to seasonal correction the annual average radon concentration (arithmetic mean) for all households was 21.32 Bq m⁻³, ranging between 0.36 and 1192.97 Bq m⁻³ (Table 13). Regional annual averages, prior to seasonal correction, ranged from 16.36 Bq m⁻³ in Scotland to 34.43 Bq m⁻³ in the South West study region. Household measurements of radon concentration, prior to 'outdoor correction', closely follow a log-normal distribution and there was no improvement in fit after the correction (data not shown).
Seasonal correction factors for a detector in place for 6 months exactly are shown in Table 14. The proportion of variation in the data explained by the model, R², is also reported. All models fit the data, with between 0.50 (South West) and 0.79 (Trent) of the variation explained by the seasonal curve. The model which used mean monthly radon concentrations calculated for each region separately, with an indicator variable for region, gave a significantly better fit than a model without the term (Table 15). This provides confirmation that the calculation of seasonality correction factors on a region-specific basis is appropriate. Table 16 shows the annual average household concentration of radon for each study region after applying the region-specific seasonality correction factors. The South West study region continued to have the highest average level, 34.47 Bq m⁻³, and Scotland the lowest, 16.74 Bq m⁻³. Table 17 divides the households into categories, the highest category containing 0.7% of all measured control households, being that equal to or above the 'action level' of 200 Bq m⁻³, determined by the NRPB as the level above which remedial work to a house should be carried out.

Discussion
The UKCCS has collected radon gas measurements for a large number of residences, which are linked to information about the families living there. These measurements will provide the basis for an epidemiological analysis of the association between radon in homes and the incidence of childhood cancer. The study for which these measurements were made involved a population-based sample of children with cancer and two controls for each. The homes for the selected control children were representative of homes in the general population containing children of the same age and sex as the affected children. Satisfactory measurements were, however, obtained for less than half the target number and the effect of non-compliance on the risk estimates will have to be taken into account in studies of the association of childhood cancer with radon exposure.
The levels of radon gas in homes were generally low, with 74% of households having annual average radon concentrations of less than 25 Bq m⁻³. The seasonally adjusted (arithmetic) average annual concentration for the entire study, 24.7 Bq m⁻³, was higher than the NRPB estimates for UK housing of 20.5 Bq m⁻³ (Wrixon et al, 1988). This may reflect the differences between the sampling frames, the present one being houses with children, and between the UK housing stock at the two periods. The low proportion of households exceeding the government 'Action Level', 0.7% above 200 Bq m⁻³, is consistent with, but higher than, the national average of 0.5% (Kendall et al, 1988).
In contrast to our findings, previous research based on NRPB data suggested that the overall distribution of radon followed a log-normal distribution more closely when the entire data set had 4 Bq m⁻³ removed (Gunby et al, 1993). This was informally termed an 'outdoor correction' factor as limited surveys of radon indicated outdoor levels to be low: around 4 Bq m⁻³. Unfortunately, exact comparisons with our data could not be made as the results presented by Gunby et al (1993) were truncated, and omitted the first 5% of the data. The occupancy patterns used for the allocation of a weighted average household radon level were from a national study based on persons of all ages. Two small studies carried out for the UKCCS on children do not contradict these findings.
The data were examined regionally because of concern that seasonal patterns in radon concentrations may vary across the country because of differences in climate, housing type and behaviour. Examination of the data confirmed the fact that seasonal correction factors would be more appropriately calculated on a region-specific basis. The study regions were defined a priori for the purposes of data collection, were geographically contiguous and contained approximately similar numbers of persons. To check that the hypothesis can be supported by the data, the sine-cosine seasonality curve was modelled for region-specific mean radon concentrations. The model fit was improved after adding a variable for the region, as would be expected after inspection of the seasonal correction factors, with Scotland and the North West being significantly different from the other study regions.
The data set described here documents an important profile of radon gas concentrations in a specific section of UK housing; namely, that inhabited by children. This is the largest data set of measurements aimed not at identifying areas affected by high levels of radon (Miles et al, 1996) but at characterizing a population-based sample of the population. The appropriate allowance for geographical differences in seasonality provides a basis for epidemiological investigations into the association between radon and childhood cancer.

DIAGNOSTIC REVIEW AND CLASSIFICATION OF SOLID TUMOURS
For the purposes of data analysis cases must be classified according to diagnosis into a manageable number of groups. Descriptive epidemiological data on cancer are often presented in terms of categories in the neoplasms chapter of the International Classification of Diseases (WHO, 1992). This system is based on anatomical site and while this is broadly satisfactory for adult onset cancers, where over 85% of non-haematological malignancies are carcinomas (Berg, 1996) this is not satisfactory for paediatric cancers which form a specialized group of rare neoplasms with adult-type carcinomas being exceedingly uncommon (Stocker and Askin, 1998). Paediatric cancer data are more satisfactorily grouped, utilizing the International Classification of Diseases for Oncology (ICD-O) (WHO, 1990). In this system tumours are described in terms of morphology and topography. The second edition of ICD-O (ICD-O2) includes more than 1600 morphology codes (M codes) and more than 300 topography codes (T codes), consequently hundreds of thousands of M and T code combinations are possible. A system that groups combinations of M and T codes together is, therefore, required to provide a smaller number of diagnostic categories for practical purposes. This report describes the diagnostic review of non-leukaemia cases and the classification scheme based on ICD-O2 which has been employed to group cases for analysing data to test the UKCCS hypotheses.

The review process
All but the leukaemia cases were eligible for diagnostic review regardless of whether the children's parents were subsequently interviewed for the study. Registration details were obtained principally from the UKCCSG Data Centre. During the period of the study more than 80% of all incident cases of cancer in children in England, Scotland and Wales were referred for initial treatment to UKCCSG paediatric oncology member centres. Registration details for remaining cases were obtained from the UKCCS regional epidemiology centres or, for a small number of cases, through the National Register of Childhood Cancers. UKCCS registration numbers for each eligible case were obtained from the respective UKCCS regional epidemiology centres to allow diagnostic data to be linked to the epidemiology data. Pathology reports and, as appropriate, radiology reports were obtained for all non-UKCCSG cases and for those UKCCSG cases where this information was not appended to the registration form. More than half of all solid tumour cases were entered into national or international clinical trials. Special histopathological review is automatically carried out for all such cases entered into trials and it was agreed nationally that diagnostic review pathology reports for them would be made available to the UKCCS. Members of the pathology review panels associated with each clinical trial additionally agreed to review diagnoses of respective non-trial cases eligible for the UKCCS.
Collation of clinical trial diagnostic review reports and organization and coordination of review of non-trial cases nationally was carried out centrally for the UKCCS at the CRC Paediatric & Familial Cancer Research Group (PFCRG) in Manchester. Pathology review panels for Hodgkin's disease, non-Hodgkin's lymphoma, brain tumours, neuroblastoma, renal tumours, liver tumours, bone tumours, soft tissue sarcoma and germ cell tumours were already set up in association with clinical trials. An additional pathology panel was set up to review diagnoses of the remaining rare tumour groups.
[Table fragment; the column headings, the label of the first row and the final value of the last row are missing]
(row label missing)                                           0    0    0    0    0    0    2    3    2    3
Other specified renal tumours                                 3    1    2    2    4    0    0    0    9    3
Total renal tumours                                          15    6   64   62   21   18    6    7  106   93
Hepatoblastoma                                                5    5   11    4    0    1    2    0   18   10
Hepatic carcinoma                                             0    0    0    0    1    2    0    1    1    3
Total hepatic tumours                                         5    5   11    4    1    3    2    1   19   13
Osteosarcoma                                                  0    0    0    3   11    8   17   17   28   28
Chondrosarcoma                                                0    0    0    0    0    0    1    1    1    1
Embryonal rhabdomyosarcoma                                    2    3   30   19   19   11    7    4   58   37
Alveolar rhabdomyosarcoma                                     1    1    7    4    1    3    2    3   11   11
Other and unspecified rhabdomyosarcoma                        1    0    4    3    2    4    1    1    8    8
Other specified bone tumours                                  0    1    0    0    1    0    1    0    2    1
Intracranial & intraspinal soft tissue tumours                0    0    0    3    0    0    0    0    0    3
Other specified soft tissue tumours                           3    4    6    1   10    6   11    9   30   20
Unspecified soft tissue tumours                               1    1    4    1    3    1    4    0   12    3
Total sarcomas and other mesenchymal tumours                  8   10   51   34   47   33   44   35  150  112
Gonadal germ cell tumours                                     4    0   14    2    0    5    5    5   23   12
Intracranial & intraspinal germ cell tumours                  3    1    1    2    2    3   17    8   23   14
Other non-gonadal germ cell tumours                           3    2    3   12    1    1    1    1    8   16
Gonadal carcinoma                                             0    0    0    0    1    0    0    1    1    1
Total germ cell, trophoblastic and other gonadal tumours     10    3   18   16    4    9   23   15   55
Thus, the diagnoses of all cases both clinical trial and non-clinical trial were reviewed by the same specialist panels. Following receipt of registration details, confirmation of eligibility for UKCCS and ascertainment of a UKCCS registration number, clinical trial status was obtained from the UKCCSG Data Centre. For non-trial cases, histopathology material was requested by the coordinating centre in Manchester from the relevant consultant pathologists. Tissue sections were cut and mounted, and distributed to the appropriate diagnostic panel in accordance with the panel's requirements. Consensus reports were prepared on each case by the review panels and reports sent to the Manchester centre in due course. For trial cases a similar review process was followed but this was coordinated by the UKCCSG Data Centre in Leicester. For these latter cases, copies of the consensus pathology panel reports were provided to the coordinating centre in Manchester. These reports constitute the final reviewed diagnosis on each case.
Diagnoses were coded according to ICD-O2 and entered together with identification and other registration details onto the histopathology database held in Manchester. A hierarchical confirmation of diagnosis code was also allocated as follows:
1. special histopathological review (trial and non-trial)
2. review at UKCCSG member centre or other specialist oncological or neuropathology centre and pathology report received in Manchester
3. other pathology report received in Manchester
4. diagnosis histologically confirmed but pathology report not seen in Manchester
5. no histological confirmation but positive biological marker
6. clinical diagnosis with radiology report seen in Manchester
7. clinical diagnosis, radiology report not seen in Manchester
8. UKCCS registration form or notification from CCRG in Oxford with no accompanying pathology or radiology report.
Each case was allocated to a main diagnostic group and subgroup within the main group according to the classification scheme described below.

Diagnostic classification scheme for childhood cancers
In formulating the classification scheme priority was given to allocating individual tumour types to diagnostic groups on the basis of what is currently known about the histogenesis and biology of the tumours. The following features were incorporated into the scheme: (i) the more common types of childhood malignancies were specified as individual categories, (ii) rare tumours of particular biological interest were also individually specified, (iii) diagnostic categories were organized in a hierarchical way to allow major groups to be sub-divided into sub-groups and where appropriate into sub-categories within sub-groups. This allowed flexibility with regard to the diagnostic specificity with which data may be analysed and presented while retaining a standard framework.
For each morphological entity, or group of entities defined by ICD-O M codes, a set of corresponding topography codes was also defined. A computer data input routine was developed such that only compatible paired M and T codes would be accepted and incompatible M and T codes would be unclassified. This arrangement acts as an automatic check at the data input stage on the accuracy of information received on morphology and primary site of tumours and the accuracy of coding. Furthermore, by these means common data input errors such as reversal of two digits would be detected at the outset. The draft scheme was tested on the Manchester Children's Tumour Registry database which was already coded in ICD-O2. Primary information on those cases that were not classified to a diagnostic group was checked and inconsistencies in the allocation of M and T codes within the draft scheme were corrected. The final scheme and accompanying data input programme were then used for the UKCCS diagnostic data.
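The compatibility check performed at data input can be illustrated with a small sketch. The lookup table below is purely illustrative: it pairs three ICD-O morphology codes (nephroblastoma, hepatoblastoma, osteosarcoma) with their usual three-character topography categories, and it is not the UKCCS classification scheme, whose full M and T code sets are not reproduced here. The function name and the use of `None` for an unclassified record are likewise assumptions.

```python
from typing import Optional

# Illustrative subset only; NOT the UKCCS scheme.
# Morphology code -> (diagnostic group, compatible topography categories)
ILLUSTRATIVE_SCHEME = {
    "8960/3": ("Renal tumours", {"C64"}),                                   # nephroblastoma, kidney
    "8970/3": ("Hepatic tumours", {"C22"}),                                 # hepatoblastoma, liver
    "9180/3": ("Sarcomas and other mesenchymal tumours", {"C40", "C41"}),   # osteosarcoma, bone
}


def classify(m_code: str, t_code: str) -> Optional[str]:
    """Return the diagnostic group for a compatible M/T pair, else None.

    A record is left unclassified (None) when the morphology code is not
    in the scheme or the topography is not compatible with it.
    """
    entry = ILLUSTRATIVE_SCHEME.get(m_code)
    if entry is None:
        return None
    group, compatible_sites = entry
    site_category = t_code.split(".")[0]  # "C64.9" -> "C64"
    if site_category not in compatible_sites:
        return None
    return group
```

With this table, `classify("8960/3", "C64.9")` returns a group, whereas a transposed morphology code such as `"8690/3"` or an incompatible site such as `"C22.0"` is returned as unclassified and can be sent back for checking against the primary information.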
The UKCCS case series includes some relatively recently described diagnostic entities for which ICD-O codes do not exist.
Where it was thought that no appropriate alternative code existed, it was necessary to create a small number of new codes and to incorporate these into the classification scheme. The final scheme comprises 11 major diagnostic groups with a total of 45 subgroups. The distribution of interviewed cases included in UKCCS according to this classification scheme is shown in Table 18. Detailed results of the histopathology review will be presented elsewhere (Birch and Kelsey et al, in preparation).

Rationale of classification
The acute leukaemias have been traditionally classified by morphological and cytochemical criteria into lymphoblastic (ALL) and myeloblastic (AML) sub-classes for diagnostic and treatment purposes (Henderson et al, 1996). Beginning in the 1970s, it became apparent that these two broad divisions disguised considerable biological heterogeneity. Subsequently, immunophenotypic and chromosomal variations (structural or numerical) were identified and standardized using monoclonal antibodies (Greaves et al, 1985; van Dongen and Adriaansen, 1996) and G-banded chromosome analysis (Pui et al, 1993; Raimondi, 1993) respectively. Over the past decade, changes in individual genes (rearrangements/fusions, mutations, deletions, etc) have been recognized and have provided a further hierarchical tier of leukaemia classification (Greaves, 1996a). The validity of this biological approach has been endorsed by the finding that subsets so defined had different prognostic outcomes in the context of particular therapies (Kersey, 1997; Pui and Evans, 1998). The design of current therapeutic trials or patient selection for bone marrow transplantation now exploits these biological classifications.
As acute leukaemia in children (and in adults) is not a single biological entity or disease, it seems highly improbable that its aetiology will be attributable to a single causal mechanism. Epidemiological case-control studies in the past have mostly divided cases into myeloid or lymphoid (the ICD system). Although both ALL and AML as well as CML may be initiated by ionizing radiation, there are several precedents in haematological malignancy for more selective or preferential aetiological associations. These include HTLV1 with mature T-cell leukaemia/lymphoma, EBV with Burkitt-like B-NHL, benzene with AML, and certain genotoxic therapeutic agents with AML (Henderson et al, 1996; Smith et al, 1996; Greaves, 1997). The incidence rates or relative frequency of subtypes also appear to vary geographically, especially with respect to the age peak of incidence in ALL at 2-5 years of age, which consists of B-cell precursor ALL or common ALL (c-ALL) (Greaves et al, 1985, 1993). The prominence or height of this peak in different countries has changed asynchronously during the past 60 years (Greaves et al, 1985, 1993) and, although changes in ascertainment or diagnosis may have contributed to this, c-ALL has been proposed as having a distinct infectious aetiology in which social circumstances modulate patterns of exposure (Greaves, 1997). c-ALL in turn is heterogeneous by molecular genetic criteria but two large subgroups predominate: hyperdiploidy and TEL (ETV6)-AML1 (CBFA2) gene fusion (Bernard et al, 1996; Secker-Walker, 1997).
Age is a prognostic factor in ALL and it is now recognized that the higher risk groups of ALL below 1 year of age or above 10 years seldom have the common molecular characteristics of c-ALL in the 2- to 5-year age peak. Infants (< 1 year) in particular appear to have a distinct biological subtype of disease in which translocations at 11q23 are predominant. These generate fusions of the MLL gene with partner genes on other chromosomes, especially AF4 (at 4q) (Pui et al, 1995; Greaves, 1996b). Similar translocations involving 11q23/MLL (but with different partner genes) are common in paediatric AML, particularly below the age of 3 years. 11q23/MLL gene fusions are also frequent in secondary leukaemias arising from prior therapeutic exposure to drugs that inhibit the enzyme topoisomerase II. On this basis it has been proposed that transplacental exposure to chemically similar compounds in utero might provide a selective causal pathway for infant AML and ALL with 11q23/MLL rearrangements (Ross et al, 1994b; Greaves, 1997). Some prior epidemiological data suggest selective associations for acute leukaemia, and especially AML, in very young (< 2 years) patients (Ross et al, 1994a).
These data and insights all suggest that it would be prudent whenever possible to incorporate biological classification into epidemiological studies of leukaemia. This is only worthwhile, however, when studies are of a sufficient size to generate adequate numbers in the smaller subgroups. This is the first large case-control study where a systematic biological classification has been undertaken. Of the five major hypotheses under discussion (see p. 1074), those involving EMF and ionizing radiation do not involve any prior consideration of biological subsets of ALL. With respect to chemical exposures (in utero) and infection, however, we have identified biological subgroups in advance for which selective associations will be sought (e.g. delayed infection and HLA associations in c-ALL).
As diagnostic samples of acute leukaemia were referred to a central reference laboratory (see below) for molecular genetic analysis, it was also decided to use whatever material remained after the tests were completed to establish a sample bank. This is designated for use in add-on studies related to our major hypothesis (e.g. screening for viruses or for gene polymorphisms by polymerase chain reaction (PCR) methods).

Morphology
Individual haematologists in participating centres diagnosed ALL and AML on the basis of morphology, standard staining (PAS ± Sudan Black ± non-specific esterase ± acid phosphatase stains) and immunophenotyping. Central review was performed by a panel of three haematologists as part of the MRC therapy trials protocol. This panel ascribed individual patients to an appropriate French-American-British morphological classification type: for ALL, L1, L2 or L3; for AML, M0-M7 (Catovsky et al, 1991; Lilleyman et al, 1992).

Immunophenotyping
In 1989-1990, following national consensus meetings on the optimal battery of monoclonal antibodies required for appropriate immunophenotypic diagnosis of ALL, a series of workshops was held to produce consistent quality of performance in such testing, prior to the commencement of the MRC UKALL XI trial, which ran throughout the period of the National Case Control Study. Thereafter centres admitting patients to the MRC's ALL trials were requested to perform a standard panel of immunophenotyping tests (Hann et al, 1998), which included anti-CD2, CD7, CD10, CD13, CD19, CD33, CD34 and HLA-DR as well as testing for terminal deoxynucleotidyl transferase (TdT) and for cytoplasmic immunoglobulin. The latter (cyto µ) data have not been used for the classification scheme reported here.
In line with international agreement, a cut-off point for positivity was taken at ≥ 30% for CD2, CD7, CD19 and CD34. Patients were investigated by all 98 participating physicians using the same panel, but 42% of laboratories used flow cytometry analysis, 38% alternative methods in addition to flow cytometry and 20% used no flow cytometry and used only alternative immunophenotyping techniques (fluorescence microscopy, immunohistochemistry).
Cases of ALL were classified by immunophenotype into B-cell lineage or T-cell lineage subtypes. Cases that could not be unambiguously defined because of incomplete data or mixed lineage phenotypes were classified as 'Other'. Seven per cent of eligible cases of ALL samples were not available for analysis.
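The application of a positivity cut-off followed by assignment to B-cell lineage, T-cell lineage or 'Other' can be sketched as follows. The ≥ 30% cut-off is taken from the text; everything else is an assumption made for illustration and is not the study's classification rule. In particular, CD19 is used here as the B-lineage marker and CD2 and CD7 as T-lineage markers, incomplete data and mixed patterns are sent to 'Other', and the function names are hypothetical.

```python
from typing import Mapping

POSITIVITY_CUTOFF = 30.0  # per cent of cells; marker positive at >= 30%

# Assumed marker-to-lineage assignment, for illustration only.
B_LINEAGE_MARKERS = ("CD19",)
T_LINEAGE_MARKERS = ("CD2", "CD7")


def is_positive(percent_cells: float) -> bool:
    """Apply the cut-off for marker positivity."""
    return percent_cells >= POSITIVITY_CUTOFF


def lineage(marker_percentages: Mapping[str, float]) -> str:
    """Assign 'B-cell lineage', 'T-cell lineage' or 'Other'.

    Cases with missing markers or with both B and T markers positive
    (or neither) cannot be defined unambiguously and are 'Other'.
    """
    required = B_LINEAGE_MARKERS + T_LINEAGE_MARKERS
    if any(marker not in marker_percentages for marker in required):
        return "Other"  # incomplete data
    b_positive = any(is_positive(marker_percentages[m]) for m in B_LINEAGE_MARKERS)
    t_positive = any(is_positive(marker_percentages[m]) for m in T_LINEAGE_MARKERS)
    if b_positive and not t_positive:
        return "B-cell lineage"
    if t_positive and not b_positive:
        return "T-cell lineage"
    return "Other"  # mixed lineage or no lineage marker positive
```

Routing every ambiguous pattern to a single residual category keeps the B and T groups clean at the cost of a somewhat larger 'Other' group.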
Clinical presentation alone rarely provides clarity between the diagnosis of ALL or AML. The same battery of morphological staining and phenotyping was therefore performed on the majority of patients with AML, with the addition of erythroid and megakaryocyte antibody markers wherever appropriate. Central morphological review was also performed by the panel for AML cases.

Cytogenetics
Conventional G-banded cytogenetic analysis was performed on diagnostic bone marrow and/or peripheral blood samples from ALL and AML patients by local cytogenetic laboratories. Cytogenetic data from patients entered to MRC treatment trials were routinely collected by two national databases: the Leukaemia Research Fund UK Cancer Cytogenetics Group ALL Karyotype Database (ALL Database) and the Kay Kendall Leukaemia Fund Childhood AML Karyotype Database (AML Database). Slides are reviewed before the karyotypes are entered to the ALL Database, and all karyotypes are written according to the International System for Human Cytogenetic Nomenclature (ISCN, 1995). Cytogenetic data were then provided directly from the ALL and AML Databases to the UKCCS.

Molecular diagnostics and cell/DNA/plasma banks
Pretreatment biological samples from leukaemia and non-leukaemia cancer cases were referred to a single centre and entered into a FoxPro database. Of the 1711 leukaemia cases (ALL plus AML) entered into the study, biological samples were received from 1153. For the 2102 non-leukaemic cancers registered and interviewed, a blood sample was received from 738. Leukaemic samples consisted of up to 5 ml of peripheral blood (with anticoagulant) and/or up to five bone marrow smears. Blood only was received from non-leukaemic cancers.
On receipt of samples, slides were individually wrapped in foil and stored at -70°C. Plasma was separated from blood by centrifugation, aliquoted and stored at -2°C or -70°C. Mononuclear cells were separated over a density gradient and counted. In cases with a low yield of cells from blood (< 10⁶) the cells were either pellet frozen in liquid nitrogen or stored in guanidinium isothiocyanate (GIT) solution for subsequent RNA preparation. For higher yield cases (> 10⁶), some cells were control frozen in liquid nitrogen as viable cells (and as a source of DNA) and an aliquot stored in GIT.

Detection of TEL(ETV6)-AML1 (CBFA2)
RNA was reverse transcribed to generate cDNA, the integrity of which was assessed by reverse transcriptase PCR (RT-PCR) amplification of a housekeeping gene (c-ABL). Screening for (TEL-AML1) by RT-PCR was with a single pair of primers as described previously (Romana et al, 1995). Fifty cases that had been investigated for evidence of the TEL-AML1 fusion by RT-PCR (see below), were also independently tested by interphase fluorescence in situ hybridization (FISH) using dual colour probes (Vysis, UK) in the cytogenetics reference laboratory (C Harrison, Royal Free Hospital, London). FISH revealed six TEL-AML1-positive cases, which had not been detected by RT-PCR (and one case was negative by FISH, which had been found to be positive by RT-PCR).
RT-PCR was also used to screen for the t(1;19) (E2A-PBX) and t(9;22) (BCR-ABL) translocations and for translocations of MLL with AF4, AF6, AF9 and ENL (Repp et al, 1995). These tests used nested primers with high sensitivity (10⁻⁴/10⁻⁵). These latter screens produced an unacceptable level of false positives (and some false negatives), and therefore karyotype alone was used for the final molecular classification based upon these three types of translocation. Deletion of the TAL gene in cases of T-ALL was detected by direct PCR using a single set of primers as previously described (Janssen et al, 1993). A subset of ALL cases with high white cell counts was also assayed for deletion and methylation of the cell cycle inhibitor genes CDKN2/p16INK4A and CDKN1/p15INK4B. These data were not, however, incorporated into the final molecular diagnostic scheme and are reported in detail elsewhere (Iravani et al, 1997).

FISH detection of ploidy changes
Cases that were registered as karyotypically normal or failed in the LRF/UK Cancer Cytogenetics Study Group Database (see above) were screened (by the molecular genetics reference laboratory) by interphase FISH for extra copies of chromosomes X and 21 using chromosome-specific probes. The presence of extra copies of both X and 21 was required to define hyperdiploidy. These two chromosomes were selected because our pilot study and published reports (Moorman et al, 1996;Raimondi et al, 1996) indicated that > 90% of cases of ALL that are high hyperdiploid include additional copies of both 21 and X.

Sample banking
Material remaining from some patients after molecular screening included bone marrow smears, plasma and small quantities of cells, RNA, cDNA and DNA suitable for PCR-based assays. These have been stored for future studies.

Results and discussion
A haematological and immunophenotypic subdivision of all patients eligible for the study and interviewed is given in Table 19. Non-Hodgkin's lymphoma data are given separately in Table 18 but we note here there is a biological and clinical case for T-NHL and T-ALL being subtle manifestations of essentially the same thymic or T-cell precursor malignancy. They may therefore be pooled together in later analyses. Similarly, rare cases of FAB L3 Burkitt-like ALL (mature B immunophenotype) are more rationally grouped along with B-NHL. Figures 7A, 7B and 8 show the breakdown of cases according to the major cytogenetic and molecular genetic abnormalities. Molecular genetic diversity in acute leukaemia is extensive; indeed, in complete detail it is likely that every patient has a distinct molecular pattern of clonal diversity. A substantial number of patients had either infrequently occurring chromosomal abnormalities or highly complex karyotypes in which none of the common chromosomal abnormalities were identified; these were grouped together under 'Others'. Some chromosomal deletions appear to occur relatively commonly in different immunological subtypes and in conjunction with other molecular changes, for example deletion of the long arm of chromosome 6 [del(6q)] or the short arms of chromosomes 9 and 12 [del(9p), del(12p)]. It is suspected, though unproven, that these may be secondary changes associated with disease progression and, although we have these data available for subsequent analysis, we have not incorporated them here for classification purposes. The numbers in some molecularly defined subgroups in this study are small, which will limit the statistical power of any subsequent analyses; however, some highly selective associations may emerge. Two subgroups are of appreciable size: c-ALL with hyperdiploidy (423) and c-ALL with TEL-AML1 fusion (139).
These data should enable us to assess whether any epidemiological associations found for ALL are more pronounced within these molecularly defined subgroups.

Age and gender
As age and gender have been shown to be of some prognostic importance in leukaemia clinical trials (Kersey, 1997) and some biologically defined subgroups are already known to have a marked age and/or gender bias (Greaves et al, 1985;Greaves, 1999), we also summarize here, in Figures 9-13, the age distributions within the UKCCS data set of the major biological subsets and in Tables 19 and 20 provide a breakdown according to gender and broad age groupings.

MOLECULAR ANALYSIS OF GENETIC SUSCEPTIBILITY
There is now increasing recognition that genetic susceptibility can influence the risk of childhood cancer (Perera, 1997). Identification of genes that increase susceptibility might therefore help to pinpoint the role of specific carcinogens (Suk and Collman, 1998). To facilitate studies of genetic susceptibility, the UKCCS has involved the systematic collection of post-treatment blood samples from children with cancer, and, as healthy controls, from their parents, sibs and unrelated children. In the first instance the molecular analysis has involved comparisons of HLA class II (DPB1, DQA1, DQB1) allele frequencies in cases and controls for evidence of allele associations that could, in conjunction with epidemiological data, provide support for the role of infection in the aetiology of childhood leukaemia and lymphoma. Since no previous study of childhood cancer in the UK has involved the collection of post-treatment blood samples from children with cancer on a national scale, this paper presents details of the methods used, reviews sample collection, and provides brief details of the HLA typing methodology.

[...] to retrieve viable lymphoid cells. Samples (~10 ml) from the parents and siblings of children with leukaemia or lymphoma were collected in the same way. Samples from the parents and siblings of children with solid tumours were collected in EDTA only. Blood sample collection commenced in 1992, and continued until the end of 1997. The aim was to obtain as complete a set of samples from eligible index case children, and their parents and siblings, as circumstances would allow. Genomic DNA was extracted from all blood samples since it provides the most robust and versatile source of biological material for the direct molecular analysis of genetic susceptibility. The PCR was used in the molecular analysis because it conserves DNA whilst allowing genetic screening to be carried out at high resolution.
Since the amount of blood sample obtained was limited, viable lymphoid cells from children with leukaemia and lymphoma were frozen with the objective of preparing lymphoid cell-lines as an additional source of material.

Sample processing and storage
Each child with cancer from whom a blood sample was received was assigned a unique laboratory code, and together with the child's name, diagnosis and other identifiers, this was entered into a Microsoft Access database. Information about relatives (parents, siblings) of the child with cancer from whom a sample was obtained was also recorded in the database. Genomic DNA was extracted from all blood samples using a resin-based commercial kit (BACC2, Nucleon), the DNA diluted to 50 ng µl⁻¹, and stored at -80°C. Lymphoid cells were isolated from the blood of children with leukaemia and lymphoma and from their parents by centrifugation over Lymphoprep (Nycomed), and stored in a viable state in liquid nitrogen.

Blood samples from children with cancer
Post-treatment blood samples were obtained from 1863 (48%) of the 3838 children eligible for the UKCCS. These included samples for 1183 (68%) of the 1736 children with leukaemia, 181 (51%) of the 349 children with lymphoma (Hodgkin's and non-Hodgkin's), and 499 (28%) of the 1753 children with solid tumours.
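The collection rates quoted above can be re-derived from the counts. The short Python sketch below is only an arithmetic check on the figures given in the text; the variable and function names are illustrative and are not part of the study's data handling.

```python
# Counts taken from the text: eligible children and post-treatment samples received.
ELIGIBLE = {"leukaemia": 1736, "lymphoma": 349, "solid tumours": 1753}
SAMPLED = {"leukaemia": 1183, "lymphoma": 181, "solid tumours": 499}


def collection_rates(sampled, eligible):
    """Return the percentage of eligible children sampled, per diagnostic group."""
    return {group: 100 * sampled[group] / eligible[group] for group in eligible}


rates = collection_rates(SAMPLED, ELIGIBLE)
total_sampled = sum(SAMPLED.values())
total_eligible = sum(ELIGIBLE.values())
overall = 100 * total_sampled / total_eligible

for group, pct in rates.items():
    print(f"{group}: {SAMPLED[group]}/{ELIGIBLE[group]} = {pct:.1f}%")
print(f"all cancers: {total_sampled}/{total_eligible} = {overall:.1f}%")
```

The group counts sum to the overall totals (1863 of 3838), and the whole-number percentages printed in the text correspond to these quotients rounded down.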

Family controls
Where possible, blood samples were collected from both biological parents of each child with cancer for the purpose of providing family-based controls. Samples were obtained from both parents of 846 (48%) of the eligible children with leukaemia, 131 (37%) of those with lymphoma and 301 (17%) of those with solid tumours. Blood samples were obtained from the healthy sibs of 719 of the children with cancer. These included 426 sibs of children with leukaemia, 91 sibs of children with lymphoma and 202 sibs of children with solid tumours.

Unrelated childhood controls
Ethical considerations precluded the collection of blood samples from UKCCS case-matched control children. Children with other types of cancer were therefore used as controls for any specified type of cancer (for example, children with solid tumours served as controls for children with leukaemia). In the absence of an adequate national sample of case-matched control children without cancer, data from 1500 full-term healthy newborn babies delivered in St Mary's Hospital, Manchester were additionally used as a provisional comparison group.

Molecular analysis
The molecular analysis (Figure 14) set out to compare allele, phenotype, and haplotype frequencies between children with different types of cancer, and with parental, sib or unrelated controls, at three HLA class II loci (DPB1, DQA1, DQB1). Using genomic DNA, exon 2 of HLA-DPB1, DQA1 or DQB1 was amplified in the polymerase chain reaction using locus-specific primers (Fernandez-Vina and Bignon, 1997). The amplified PCR products were arrayed at high density by dot-blotting onto replicate nylon membranes using a Beckman Biomek 2000, and the membranes hybridized with ³²P-labelled sequence-specific oligonucleotides (SSO). The membranes were scanned for hybridization with SSO probes on a Packard InstantImager, and alleles assigned to each individual based on patterns of positive and negative SSO probe reactions for each locus.
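The final step, assigning alleles from patterns of positive and negative probe reactions, can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the study: the probe names (p1 to p4), the allele labels and their reactivity patterns are invented placeholders, and the matching rule (an individual's positive probes equal the union of the patterns of the two alleles carried) is a simplified reading of how SSO typing is usually interpreted.

```python
from itertools import combinations_with_replacement

# Hypothetical reference panel: the SSO probes each allele is expected to bind.
# Probe and allele names are placeholders, not the panel used in the study.
ALLELE_PATTERNS = {
    "allele_A": frozenset({"p1", "p3"}),
    "allele_B": frozenset({"p2", "p3"}),
    "allele_C": frozenset({"p1", "p4"}),
}


def assign_genotypes(positive_probes, patterns=ALLELE_PATTERNS):
    """Return every allele pair whose combined probe pattern equals the observed one.

    A homozygote appears as a pair of identical alleles. More than one returned
    pair means the panel cannot resolve the genotype; an empty list means the
    observed pattern is not explained by any pair in the panel.
    """
    observed = frozenset(positive_probes)
    matches = []
    for first, second in combinations_with_replacement(sorted(patterns), 2):
        if patterns[first] | patterns[second] == observed:
            matches.append((first, second))
    return matches


heterozygote = assign_genotypes({"p1", "p2", "p3"})
homozygote = assign_genotypes({"p1", "p3"})
unexplained = assign_genotypes({"p2"})
```

Returning all consistent pairs, rather than the first match, keeps ambiguous or unexplained patterns visible so that they can be retested instead of being silently assigned.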
Figure 14 Flow diagram showing steps in the molecular screening of index cases and controls for HLA class II alleles

The HLA-DPB1, DQA1 and DQB1 type of each subject was stored in the Microsoft Access database, and data relevant for the analysis of HLA associations collated in Access Queries for output into Microsoft Excel files. These were used to compute allele, genotype and phenotype frequencies for each diagnostic case and control series. Associations with specific HLA alleles were expressed as odds ratios with 95% confidence intervals (CI), and the significance of differences between groups was assessed using Fisher's two-sided exact test, with appropriate Bonferroni corrections.
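The statistics named above can be sketched for a single allele in a 2 x 2 table (allele present or absent, in cases and controls). The Python sketch below is illustrative only. The text does not state how the confidence interval was calculated, so the Woolf (logit) interval used here is an assumption; the counts and the number of comparisons are invented for the example; and the odds ratio function assumes that no cell of the table is zero.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval; z=1.96 gives about 95%.

    a, b: cases with / without the allele; c, d: controls with / without it.
    Assumes all four counts are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for a 2 x 2 table.

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x) for x in range(x_min, x_max + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value for n_tests comparisons, capped at 1."""
    return min(1.0, p_value * n_tests)


# Invented counts, for illustration only: 30/100 cases and 15/100 controls carry the allele.
or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
p = fisher_two_sided(30, 70, 15, 85)
p_corrected = bonferroni(p, n_tests=20)  # 20 alleles tested is an assumed figure
```

The correction multiplies each p-value by the number of alleles compared at a locus, which is the simplest form of Bonferroni adjustment; how the number of comparisons was chosen in the study is not specified in the text.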

Discussion
It is increasingly recognized that the aetiology of childhood cancer involves pre- or early post-natal exposure to environmental carcinogens and a host response modulated both by developmental and inter-individual genetic variations. The identification of constitutional variations in specific genes involved in the metabolism of carcinogens offers a way of predicting the aetiological role of a given environmental factor.
The UKCCS has included the systematic collection of post-treatment blood samples from children with cancer and from related and unrelated controls for molecular studies of variations in cancer susceptibility. The primary aim of these studies was to analyse the frequency of HLA class II alleles in childhood leukaemia and lymphoma. An allele association could be construed as supporting the hypothesis of an immune-mediated, possibly infectious, aetiology in childhood leukaemia/lymphoma (Greaves, 1999). Existing evidence for the role of HLA class II alleles in susceptibility to childhood leukaemia and lymphoma is limited to small patient series (Taylor et al, 1998). Although the UKCCS has involved much larger patient and control series, logistic considerations required the development of a high-throughput HLA typing method. This included the batchwise amplification with HLA locus-specific PCR primers and the analysis of SSO hybridization by automated scanning.
The molecular screening of genetic variations in cases compared with controls is becoming a more widely accepted part of epidemiological studies. Molecular methods are able to identify inter-individual genetic variations that may influence childhood cancer susceptibility with remarkable precision. The intrinsically difficult assessment of environmental exposures in relation to the risk of childhood cancer is thus complemented by data on inter-individual genetic variations in carcinogen-response genes that may help to define susceptible patient groups. Careful selection of candidate genes with known functions and sound aetiological hypotheses reduces the empirical nature of this analysis. The development of high-throughput microarray-based assays offers an opportunity for testing the role of given genetic variations in statistically robust patient and control groups. The collection of a comprehensive series of post-treatment blood samples from children with cancer as part of the UKCCS shows that this can be achieved as part of a comprehensive epidemiological study.

UK CHILDHOOD CANCER STUDY INVESTIGATORS
1 Management Committee. 2 Co-opted member of biological or epidemiological subcommittee of Management Committee. †Deceased

ACKNOWLEDGEMENTS
The success of the UKCCS has depended on the help of many thousands of people, most notably the parents of nearly 4000 children, who have had the tragic experience of seeing their children develop cancer, and those of nearly 8000 healthy children, all of whom have given their time to answer long questionnaires; we are most grateful for their cooperation. We are grateful, too, for the assistance provided by thousands of general practitioners who have allowed us to examine their records; members of the UK Childhood Cancer Study group and other clinicians whose names have not been listed above, but who have notified us of children under their care and allowed us to interview their patients and obtain, in many cases, samples of blood and bone marrow; Mary Martineau and the members of the UK Cancer Cytogenetics Group, Marjan Iravani, Ruby Dhat and Abiromi Anonda; David Gokhali, Mark Robinson and Carolyn Watson who assisted in the study of immunogenetics; the staff of the Regional Cancer Registries who helped ensure case ascertainment and of the FHSAs and HBs who helped with the selection of control children; the members of the UKCCCR staff, most notably Peter Twentyman, Jean Mossman, Stephanie Ashby and Joanne Bull who have in different capacities controlled expenditure on the study and serviced the work of the Management Committee; Barbara Deverson and Cathy Harwood, who prepared manuscripts of this report, and the many clerical and secretarial staff, not listed above, who have helped in recording the study data at different times over 8 years. We acknowledge above all our debt to Professor Ged Adams, one time Chairman of the Radiation Subcommittee of the UKCCCR whose inspiring and gentle leadership brought and kept together such a diverse group of investigators, but who died shortly before any results of the study were known.