Original Article

Genes and Immunity (2007) 8, 57–68. doi:10.1038/sj.gene.6364359; published online 7 December 2006

Genomic DNA pooling for whole-genome association scans in complex disease: empirical demonstration of efficacy in rheumatoid arthritis

S Steer1,10, V Abkevich2,10, A Gutin2, H J Cordell3, K L Gendall4, M E Merriman4, R A Rodger4, K A Rowley4, P Chapman5, P Gow6, A A Harrison7, J Highton8, P B B Jones9, J O'Donnell5, L Stamp5, L Fitzgerald2, D Iliev2, A Kouzmine2, T Tran2, M H Skolnick2, K M Timms2, J S Lanchbury2 and T R Merriman4

  1. Kings College London School of Medicine at Guy's, Department of Rheumatology, King's and St Thomas', London, UK
  2. Myriad Genetics Inc., Salt Lake City, UT, USA
  3. Institute of Human Genetics, University of Newcastle, Newcastle, UK
  4. Department of Biochemistry, University of Otago, Dunedin, New Zealand
  5. Department of Rheumatology, Christchurch Hospital, Christchurch, New Zealand
  6. Department of Rheumatology, Middlemore Hospital, Auckland, New Zealand
  7. Wellington School of Medicine, University of Otago, Wellington, New Zealand
  8. Otago School of Medicine, University of Otago, Dunedin, New Zealand
  9. Department of Rheumatology, QE Hospital, Rotorua, New Zealand

Correspondence: Dr TR Merriman, Biochemistry Department, 710 Cumberland Street, Dunedin 9054, New Zealand. E-mail: tony.merriman@stonebow.otago.ac.nz

  10. These authors contributed equally to this work.

Received 2 August 2006; Revised 25 October 2006; Accepted 25 October 2006; Published online 7 December 2006.


Abstract

A pragmatic approach that balances the benefit of a whole-genome association (WGA) experiment against the cost of individual genotyping is to use pooled genomic DNA samples. We aimed to determine the feasibility of this approach in a WGA scan in rheumatoid arthritis (RA) using the validated human leucocyte antigen (HLA) and PTPN22 associations as test loci. A total of 203 269 single-nucleotide polymorphisms (SNPs) on the Affymetrix 100K GeneChip and Illumina Infinium microarrays were examined. A new approach to the estimation of allele frequencies from Affymetrix hybridization intensities was developed involving weighting for quality signals from the probe quartets. SNPs were ranked by z-scores, combined from United Kingdom and New Zealand case–control cohorts. Within a 1.7 Mb HLA region, 33 of the 257 SNPs and at PTPN22, 21 of the 45 SNPs, were ranked within the top 100 associated SNPs genome wide. Within PTPN22, individual genotyping of SNP rs1343125 within MAGI3 confirmed association and provided some evidence for association independent of the PTPN22 620W variant (P=0.03). Our results emphasize the feasibility of using genomic DNA pooling for the detection of association with complex disease susceptibility alleles. The results also underscore the importance of the HLA and PTPN22 loci in RA aetiology.

Keywords:

genome scan, association, DNA, pooling


Introduction

Rheumatoid arthritis (RA) is a chronic debilitating autoimmune disease caused by inflammation of synovial tissue. Although it clearly has a genetic basis,1 the genetic causes of RA remain poorly defined. Until now, most insight into genetic aetiology has come from the study of functional candidate genes. Genetic association with alleles of the class II antigen-presenting molecule human leucocyte antigen (HLA)-DRB1 on chromosome 6p has been established for decades (odds ratios (ORs)=2.5–3.0), with the shared epitope, defined mainly by subtypes of DRB1*04 and *01, prominent in Caucasians.2 Recently, the 620W allele of the PTPN22 gene (which encodes the lymphoid tyrosine phosphatase) has been confirmed as a determinant of RA by extensive replication of association in Caucasian patient cohorts.3 Other genes are implicated in RA susceptibility, with CTLA4 and PADI4 the closest to being 'confirmed',4, 5 although their effect (OR=1.1–1.3) is less than that of PTPN22 (OR=1.5–2.0).

Microarray-based technology to enable whole-genome scanning for association (WGA) has evolved to the point where this approach to elucidating the genetic basis for common disease has become feasible.6, 7, 8 By the simultaneous genotyping of hundreds of thousands of single-nucleotide polymorphisms (SNPs) using the widely available Affymetrix and Illumina technologies, WGA scanning offers the promise of disease gene discovery through linkage disequilibrium (LD) to causal DNA changes. Although the optimal study design for a WGA experiment is a matter for debate, identification and validation of the genes encoding complement factor H, insulin-induced gene 2 and interferon-induced helicase as determinants of age-related macular degeneration, body mass index and type 1 diabetes, respectively,9, 10, 11 do provide confidence that this approach can be widely applied to complex disease.

Whole-genome association studies can be very expensive if case–control or family-based cohorts of 1000 or more subjects are individually genotyped. This is likely to limit the number of primary discovery experiments that can be conducted. A pragmatic approach that balances the benefit of a WGA experiment against the cost of individual genotyping is to use pooled genomic DNA samples, followed by individual genotyping for validation in an expanded or independent sample.12, 13 Pooled DNA samples have been analysed for several generations of DNA-based genetic markers, such as microsatellites and candidate SNPs, using several technologies, including the Affymetrix GeneChip Mapping 100K Array.14, 15, 16, 17, 18, 19, 20 To date, however, the empirical validity of DNA pooling and genotyping using array technology has not been sufficiently demonstrated to enable researchers to apply the method with confidence in WGA experiments.

Here our aim was to investigate whether DNA pooling was an effective approach for a WGA study in an empirical setting with the Affymetrix GeneChip Mapping 100K and Illumina Infinium microarray platforms with specific testing for association of the established HLA and PTPN22 loci with RA. To maximize accuracy of pooled allele frequency estimates (and hence power of WGA scanning) using the Affymetrix GeneChip Mapping 100K Array, a novel algorithm to account for the quality of individual probe quartets was developed. We also improved our WGA scan by overlapping analysis of pools of case–control cohorts from two racially and clinically similar populations (Caucasians from New Zealand (NZ) and the United Kingdom (UK)).


Results

Forty different sets of pooled samples were run on Affymetrix 100K GeneChip microarrays (data not shown). Data from replicate microarrays were compared and the median standard deviation (s.d.) in allele frequency was 2.9%, in comparison to 7.4% using the Affymetrix algorithm (based on averaging relative allele signal (RAS)1 (sense probes) and RAS2 (antisense probes) values). The median s.d. in allele frequency obtained from comparison of data obtained from running 14 different sets of pooled samples on replicate Illumina Infinium microarrays was similar at 2.7% (data not shown).
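As a rough illustration of how a reproducibility figure of this kind can be computed, the sketch below takes the allele frequency estimates for each SNP across replicate arrays of one pool, computes a per-SNP s.d. and reports the median across SNPs. The function name and the toy numbers are hypothetical, and the sample (n−1) form of the s.d. is an assumption, since the text does not state which form was used.

```python
from statistics import median, stdev

def median_replicate_sd(freqs_by_snp):
    """Median across SNPs of the sample s.d. of replicate allele frequency estimates.

    freqs_by_snp maps a SNP identifier to the list of allele frequencies
    estimated from replicate arrays of the same pool (at least two values).
    """
    return median(stdev(reps) for reps in freqs_by_snp.values())

# Hypothetical estimates from four replicate arrays of one pool.
toy = {
    "snp_a": [0.31, 0.29, 0.33, 0.30],
    "snp_b": [0.12, 0.15, 0.10, 0.13],
    "snp_c": [0.48, 0.52, 0.47, 0.50],
}
print(f"median s.d. = {median_replicate_sd(toy):.3f}")
```

Using the median rather than the mean keeps a handful of poorly performing SNPs from dominating the summary.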

In order to evaluate how well allele frequency could be estimated from a pool, 39 individually typed Centre d'Etude du Polymorphisme Humain (CEPH) samples were pooled and genotyped across an Affymetrix 100K GeneChip microarray. Figure 1 shows the relationship between minor allele frequencies calculated from the pool and actual frequencies derived from the individually typed samples for 200 randomly selected SNPs. The median absolute difference between the corresponding frequencies was 3.1% and mean dispersion 3.4%. The median s.d. of predicted allele frequency for the same CEPH pool run on four different chips was 3.2%.
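The comparison described above can be sketched as follows: the reference frequency for each SNP is derived from individual genotype counts, and the pooled estimates are summarised by the median absolute difference from those references. Function names and the example values are hypothetical and serve only to illustrate the calculation.

```python
from statistics import median

def frequency_from_genotype_counts(n_aa, n_ab, n_bb):
    """Frequency of allele A from individual genotype counts."""
    total = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * total)

def median_abs_difference(pooled, individual):
    """Median absolute difference between paired allele frequency estimates."""
    if len(pooled) != len(individual):
        raise ValueError("inputs must be paired per SNP")
    return median(abs(p - q) for p, q in zip(pooled, individual))

# Hypothetical pooled estimates and individually derived frequencies for three SNPs.
pooled_est = [0.30, 0.12, 0.50]
reference = [0.28, 0.15, 0.49]
print(f"median |difference| = {median_abs_difference(pooled_est, reference):.3f}")
```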

Figure 1.

Comparison between allele frequencies estimated from the pooling experiment and allele frequencies based on individual genotypes of 94 Caucasian CEPH individuals. Each point represents one of 200 randomly selected SNPs from the Affymetrix 100K GeneChip microarray. The diagonal line represents equal allele frequencies.


In order to compare the ability to detect association using our DNA pooling approach with the ability to detect association based on individual genotyping, a data set of 250 cases and 250 matched controls was considered. Assuming that two pools, one for cases and one for controls, were run on four replicate microarrays each, we were able to estimate s.d. for the typical SNP (see Patients and methods, Detection of association). We defined the minimal detectable difference in allele frequency (MDDAF) as an expected (in infinitely large data sets) allele frequency difference between cases and controls at which power to detect association is 80% as Δf = z* × σ, where z* is a threshold value for a z-score to detect association. Assuming that an association is detected whenever a P-value is below P=4 × 10⁻⁵ (z* ≈ 4.13) and σ_0=0.028, we calculated MDDAF as a function of minor allele frequency. Figure 2 shows the loss in ability to detect allele frequency differences using DNA pooling compared to individual genotyping. This loss appears to be acceptable.
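A minimal sketch of this comparison is given below. It uses the threshold and measurement s.d. quoted above, but the variance model is an assumption: the s.d. of the case–control difference is taken as binomial sampling variance for both groups, plus, for pooling, a measurement variance of σ_0² divided by the number of replicate arrays for each of the two pools. The exact expression in the 'Detection of association' section may differ, and the function names are illustrative.

```python
import math

Z_STAR = 4.13    # threshold z-score quoted in the text
SIGMA_0 = 0.028  # measurement s.d. quoted in the text (assumed here to be per array)

def sampling_variance(maf, n_cases, n_controls):
    """Binomial sampling variance of the case-control allele frequency difference."""
    pq = maf * (1.0 - maf)
    return pq / (2 * n_cases) + pq / (2 * n_controls)

def mddaf_individual(maf, n_cases=250, n_controls=250, z_star=Z_STAR):
    """MDDAF when every subject is genotyped individually (no measurement error)."""
    return z_star * math.sqrt(sampling_variance(maf, n_cases, n_controls))

def mddaf_pooled(maf, n_cases=250, n_controls=250, replicates=4,
                 sigma_0=SIGMA_0, z_star=Z_STAR):
    """MDDAF for two pools, each measured on `replicates` arrays."""
    # Assumption: each pool adds sigma_0**2 / replicates of measurement
    # variance on top of the sampling variance.
    measurement = 2 * sigma_0 ** 2 / replicates
    total = sampling_variance(maf, n_cases, n_controls) + measurement
    return z_star * math.sqrt(total)

for maf in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"MAF {maf:.2f}: individual {mddaf_individual(maf):.3f}, "
          f"pooled {mddaf_pooled(maf):.3f}")
```

Under these assumptions the pooled MDDAF is always larger than the individual-genotyping MDDAF, and the relative penalty is greatest at low minor allele frequency, where the sampling variance is smallest.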

Figure 2.

Ability to detect association using the DNA pooling approach. The MDDAF is plotted as a function of minor allele frequency at a significance level P=0.001.


We then performed a WGA scan in RA and specifically examined association at two loci previously unambiguously associated with RA. If association at HLA and PTPN22 could be detected using DNA pooling, this would be empirical demonstration of efficacy of this technique in WGA analysis in RA genetics and would warrant later examination of the entire genome for novel disease-associated loci. NZ and UK case and control pools were hybridized, in quadruplicate, to Affymetrix 100K GeneChip and Illumina Infinium microarrays and a combined z-score determined for all SNPs. The estimated allele frequencies, z-scores and genome-wide ranks of SNPs within the class II/III HLA and PTPN22 windows are presented in Tables 1 and 2, respectively. The maximal z-score at HLA was 5.554 (P=3 × 10⁻⁸) and at PTPN22 was 5.511 (P=4 × 10⁻⁸). Both of these remain significant after correcting for the total number of SNPs analysed (Pc<0.01).
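The relationship between the reported z-scores, P-values and corrected P-values can be checked with a short sketch. It assumes two-sided P-values from a standard normal distribution and a Bonferroni correction over the 203 269 SNPs given in the abstract; the text does not name the correction method, so that choice is an assumption.

```python
import math

N_SNPS = 203269  # total SNPs examined, as stated in the abstract

def two_sided_p(z):
    """Two-sided P-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni(p, n_tests=N_SNPS):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p * n_tests)

for locus, z in (("HLA", 5.554), ("PTPN22", 5.511)):
    p = two_sided_p(z)
    print(f"{locus}: z={z}, P={p:.1e}, Pc={bonferroni(p):.4f}")
```

Under these assumptions both corrected values fall below 0.01, in line with the statement above.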



At HLA, a total of 33 SNPs (representing 18 'unique hits' after exclusion of SNPs exhibiting complete intermarker LD with at least one other SNP) were ranked within the top 100 associated SNPs genome wide from both the Affymetrix and Illumina microarrays (Table 1). HapMap CEPH Utah (CEU) genotyping data were available on 181 of the 257 HLA SNPs on the microarrays and the LD relationships between these are shown in Figure 3. Twenty-six of the disease-associated SNPs (for 15 of which HapMap data were available) were clustered within four closely related LD blocks encompassing the predicted gene C6orf10, BTNL2 and HLA-DRA. These blocks are defined by HapMap markers rs9296015–rs3129941 (blocks 9–10, Figure 3), and rs2076530–rs1041885 (blocks 11–12, Figure 3). Outside the C6orf10 and HLA-DR regions, seven other SNPs (for six of which HapMap data were available) within the major histocompatibility complex class III region provided evidence for association. Six of these were clustered around the complement component C2 (rs3020664, rs1042663 and rs541862), complement factor B, RD RNA binding protein (rs760070) and superkiller viralicidic activity 2-like homologue (rs438999) loci, of which four exhibited strong intermarker LD (r²>0.75; LD block 4 in Figure 3). The seventh, rs9296009, is intergenic and lies approximately 2.5 kb p-telomeric of proline-rich transmembrane protein 1.

Figure 3.

Intermarker LD between SNPs within the class II and class III HLA window. Only those SNPs contained on the Affymetrix 100K GeneChip and Illumina Infinium microarrays and for which CEPH CEU genotype data were available are shown. SNPs ranked in the top 100 are arrowed. Haplotype blocks (n=22) generated in Haploview (www.broad.mit.edu/mpg/haploview) are outlined by a solid black line. Block numbers referred to in the text are from left to right. Horizontal lines indicate genes; A=C2/CFB/RDBP/SKIV2L, B=PRRT1, C=C6orf10, D=BTNL2, E=DRA, F=HLA-DRB1.


Inspection of LD in the CEPH CEU families did not identify any SNP markers that were in strong LD with the known associated HLA-DRB1*0401 allele. The top-ranked Affymetrix 100K GeneChip SNP was one of the markers closest to HLA-DRA (rs9268614), 155 kb from HLA-DRB1, that exhibited some LD with HLA-DRB1*0401 in the CEPH CEU individuals (r²=0.26). The extended NZ cohort was individually genotyped for rs9268614 and we tested for the possibility of an effect independent of HLA-DRB1*04. Association was confirmed, with allele G occurring at a significantly higher frequency in cases than controls (Table 3a; P=5.4 × 10⁻¹⁵). This SNP was in LD with DRB1*04 alleles (r²=0.49 in NZ controls and 0.66 in the NZ extended case cohort). Conditional analysis of rs9268614 on the presence of DRB1*04 alleles showed weak evidence for independent association (P=0.02). Analysis of haplotypes between rs9268614 and the DRB1*04 allele revealed global association (P=4.4 × 10⁻³⁰) and confirmed the major effect of the presence or absence of the DRB1*04 allele.
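For readers less familiar with the r² measure of LD used here, the sketch below computes it for two biallelic loci from a haplotype frequency and the two allele frequencies, using the standard definition r² = D²/(p_A(1−p_A)p_B(1−p_B)) with D = p_AB − p_A p_B. It assumes phased haplotype frequencies are already available; in practice these are usually estimated from unphased genotypes, which is not shown. The function name and example frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

# Hypothetical example: two alleles that usually, but not always, travel together.
print(round(ld_r_squared(p_ab=0.15, p_a=0.25, p_b=0.19), 2))
```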


At PTPN22, 21 SNPs (representing seven 'unique hits' after exclusion of SNPs exhibiting complete intermarker LD with at least one other SNP) were ranked within the top 100 associated SNPs genome wide (Table 2). LD relationships between the associated SNPs for which HapMap CEPH CEU genotyping data were available are shown in Figure 4. These data suggest that the PTPN22 620W variant is the major disease-causing allele in the extended haplotype block; there was correlation between amount of LD with the R620W variant (rs2476601) and z-score (correlation coefficient=0.64). Four of the disease-associated SNPs (rs1343128, rs1418958, rs1343125 and rs1217201) were clustered at the telomeric end of the region within the membrane-associated guanylate kinase-related 3 (MAGI3) gene. Although it is possible that the association at the MAGI3 SNPs is due to LD with the PTPN22 620W variant (Figure 4), given the presence of a disease-associated haplotype in US Caucasians that is independent of PTPN22 620W (ref. 21) we hypothesized that the MAGI3 SNPs themselves, or variants in LD, defined a disease association distinct from the 620W association. The MAGI3 SNP rs1343125 was genotyped across the extended NZ RA case–control cohort. This confirmed association of the C allele with disease (Table 3a; P=1.8 × 10⁻⁴). Disease association of rs1343125-R620W haplotypes was then analysed (Table 3b). The T allele at rs1343125 was present on a protective haplotype with the 620R allele (OR=0.68; P=1.3 × 10⁻⁵) whereas there was weak evidence for over-representation of the C-620R haplotype in cases compared to controls (P=0.05). Association analysis at rs1343125 conditional on PTPN22 R620W also provided evidence for an effect independent of PTPN22 R620W (P=0.03).

Figure 4.

Intermarker LD between SNPs within the PTPN22 haplotype block. Only those SNPs contained on the Affymetrix 100K GeneChip and Illumina Infinium microarrays and for which CEPH CEU genotype data were available are shown. Those with z≥4.0 are red, 4.0>z≥3.0 are blue, 3.0>z≥2.0 are green and z<2.0 are black. The genes are not marked to scale. *Other relevant SNPs shown are PTPN22 R620W (rs2476601) and rs12760457 (defines protective 'haplotype 5'; ref. 21) – neither was included in the WGA scan.




Discussion

This paper reports methodology for WGA scanning in complex disease using pooled genomic DNA samples and the Affymetrix 100K GeneChip and Illumina Infinium microarray platforms. The empirical efficacy of the method for detecting loci of moderate to strong effect in complex disease was demonstrated by detection of the HLA and PTPN22 loci in RA (OR>1.5). Our data emphasize the importance of the HLA and PTPN22 loci (relative to the rest of the genome) in the aetiology of RA. Using control population allele frequencies of 0.19 for HLA-DRB1*04 and 0.099 for PTPN22 R620W, and genotype relative risk estimates from the extended NZ RA cohort, the estimated population attributable risk22 for each locus was 41.7 and 11.4%, respectively. Considering that the environment also contributes to the aetiology of RA, it is unlikely that more than a few other genetic variants of effect greater than PTPN22 R620W remain to be discovered. Of course, it is most important to acknowledge that the genotyping platforms used here tag, at r²≥0.8, only approximately 50% of common variation in the human genome in Caucasians.23 However, an association analysis of >6500 nonsynonymous SNPs (nsSNPs) has also emphasized the importance of PTPN22 R620W in autoimmunity;11 this nsSNP was the most strongly associated with type 1 diabetes outside HLA.
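One common way to obtain a population attributable risk from an allele frequency and genotype relative risks is sketched below. This is an assumption about the form of the calculation, not necessarily the formula of the cited reference: genotype frequencies are taken from Hardy–Weinberg proportions and combined in a Levin-type expression. The genotype relative risks in the example are hypothetical placeholders, since the values estimated from the extended NZ cohort are not given in the text.

```python
def population_attributable_risk(risk_allele_freq, grr_het, grr_hom):
    """Levin-type population attributable risk for a biallelic locus.

    risk_allele_freq: population frequency of the risk allele
    grr_het, grr_hom: genotype relative risks of heterozygotes and
        risk-allele homozygotes relative to non-carriers
    Genotype frequencies are assumed to follow Hardy-Weinberg proportions.
    """
    q = risk_allele_freq
    f_het = 2 * q * (1 - q)
    f_hom = q * q
    excess = f_het * (grr_het - 1) + f_hom * (grr_hom - 1)
    return excess / (1 + excess)

# Allele frequency from the text; relative risks are hypothetical.
par = population_attributable_risk(0.099, grr_het=1.6, grr_hom=2.5)
print(f"PAR = {100 * par:.1f}%")
```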

The use of DNA pooling has potential as an extremely cost-effective method to identify a reduced set of potentially disease-associated SNPs suitable for follow-up in the second phase of a WGA experiment. The key to using DNA pooling in WGA scanning is reducing variability in estimation of allele frequency in genomic DNA pools. Given the current wider use and longer availability of the Affymetrix GeneChip Mapping 100K Array, we focused on developing an improved algorithm able to minimize variation in estimation of allele frequencies in DNA pools. Previous methods have improved the Affymetrix algorithm by averaging the RAS scores corresponding to each of the sense and antisense probe sets, applying the k correction factor (often used in pooling experiments to correct for unequal efficiencies in measuring allele signals24) and repeated measurement of DNA pools.19, 25, 26 The fundamental difference between our and previous algorithms for estimating allele frequencies in DNA pools using the Affymetrix GeneChip Mapping 100K Array19, 26 is in calculation of RAS scores for each SNP. The Affymetrix algorithm obtains RAS as a median value from five quartet sense (RAS1) and five quartet antisense (RAS2) sequences (containing match and mismatch SNP probes). For most SNPs, this is sufficient to distinguish between homozygous and heterozygous genotypes in analysis of individual samples.6 However, the s.d. of the estimated allele frequency in pooled samples using the Affymetrix algorithm (7.4%) is simply too great to apply to WGA scanning. To address this, we calculated all 10 RAS scores (corresponding to the five sense and five antisense probe quartets) and summed these scores weighted with coefficients that were inversely proportional to the square of their variability. The s.d. of the difference in allele frequency between pools of cases and controls consists of two components, the first coming from the limited size of the pools and the second from imprecise measurement of the allele frequency in the pool (see above). Although the s.d. will always decrease with increasing number of pooled samples it cannot become lower than its second component. Thus the power of the study will quickly plateau after the size of the pools becomes sufficiently large for the second component of the s.d. to become larger than the first. Such conditions will be obtained for pools with approximately 400 samples for Affymetrix 100K chips and with approximately 550 samples for Illumina 100K chips. In our study, the NZ case pool approaches the optimal sample size for the Affymetrix chip, but other sample sizes would need to be increased to improve power to detect loci with statistically smaller effects than PTPN22. The approach of combining data from the NZ and UK cohorts also reduced the noise in the WGA scan data observed when the cohorts were analysed separately (Figure 5); the HLA and PTPN22 associations are considerably more obvious when the cohorts are analysed together (Figure 5b and d) than separately (Figure 5a and c).
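Two ideas in this paragraph lend themselves to a short sketch: weighting estimates inversely to the square of their variability, and the plateau in the s.d. of the case–control difference as pools grow. The code below is illustrative only. It uses generic inverse-variance weights rather than the fitted weights described in Patients and methods, assumes equal-sized case and control pools, and uses a hypothetical measurement s.d. of 0.015 per pool, so it is not expected to reproduce the pool sizes quoted above.

```python
import math

def inverse_variance_weights(sds):
    """Weights proportional to 1/sd**2, normalised to sum to 1."""
    inv = [1.0 / (s * s) for s in sds]
    total = sum(inv)
    return [v / total for v in inv]

def weighted_estimate(values, sds):
    """Combine per-quartet estimates, down-weighting the noisier ones."""
    weights = inverse_variance_weights(sds)
    return sum(w * v for w, v in zip(weights, values))

def difference_sd(maf, pool_size, measurement_sd):
    """s.d. of the case-control frequency difference for two equal pools.

    First component: binomial sampling of 2 * pool_size chromosomes per pool.
    Second component: measurement error of each pooled frequency estimate.
    """
    sampling = 2 * maf * (1 - maf) / (2 * pool_size)
    measurement = 2 * measurement_sd ** 2
    return math.sqrt(sampling + measurement)

# The s.d. falls with pool size but never below sqrt(2) * measurement_sd.
for n in (100, 200, 400, 800, 1600, 10**6):
    print(n, round(difference_sd(0.3, n, 0.015), 4))
```

The printed values show diminishing returns: once the sampling term is smaller than the measurement term, adding further samples to the pool changes the total s.d. very little.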

Figure 5.

Z-score plots of SNPs for the separate (a) and combined (b) analyses of the NZ (red dots) and UK (blue dots) WGA Affymetrix data and for the separate (c) and combined (d) analyses of the Illumina data. The scale is in units of 100 kbp, beginning at Chr 1 and finishing at Chr X. PTPN22 is at 113 Mb, HLA at 3900 Mb.


For several reasons, if economic considerations are not central to the design of a WGA study, individual genotyping is preferable to estimation of allele frequencies by DNA pooling. This is because the use of DNA pooling does not enable the detection and controlling of population stratification, and there is the loss of ability to study haplotypes and to undertake gene–gene interaction studies. However, if cost is an impediment then this DNA pooling methodology will enable WGA scanning. We have demonstrated the efficacy of DNA pooling on both the Affymetrix 100K GeneChip and Illumina Infinium platforms and have demonstrated its utility on the Illumina HapMap300 chip (data not shown). In general, cost considerations still limit current WGA scans using individual genotyping to only one platform.

Carlton et al.21 identified one disease-susceptibility PTPN22 haplotype (haplotype '2', uniquely defined by the 620W allele) and one disease-protective haplotype in US Caucasians (haplotype '5'). Our data confirm the existence of a disease-protective haplotype in NZ Caucasians: the T allele of the MAGI3 SNP rs1343125 in combination with PTPN22 620R defined a protective haplotype (Table 3b; P=1.3 × 10⁻⁵). Our data also provide evidence that there is an additional risk haplotype, carrying the rs1343125 risk C allele with the protective PTPN22 620R allele. Combined with the analysis of association of rs1343125 conditional on genotype at R620W (P=0.03), these findings are evidence of a second RA locus in the extended PTPN22 haplotype block (Table 3b). MAGI3 can be considered a candidate RA susceptibility gene; MAGI3 associates with the Notch-activating Delta proteins27 and NOTCH signalling has been implicated in the pathophysiology of RA.28 The possibility of an RA susceptibility determinant elsewhere in the PTPN22 haplotype block is further supported by the observation that the haplotype '5'-defining PTPN22 SNP rs12760457 (ref. 21) is in complete LD with four MAGI3 SNPs (Figure 4; rs10489936, rs3761931, rs3747998 and rs1080307). None of these SNPs were associated with RA in our study (Table 2; z=0.84 for the SNPs on the Affymetrix 100K GeneChip and z=0.69 for the SNPs on the Illumina Infinium chip). There is, however, a group of SNPs within PHTF1 (putative homeodomain transcription factor 1) and the 3' end of PTPN22 (rs2273758, rs3789598, rs3789600 and rs2476600) that are in moderate LD (0.3<r²<0.4) with the four MAGI3 SNPs and rs12760457, and for which there is stronger evidence for association with RA (3.1<z<4.2). These PHTF1/PTPN22 SNPs exhibit low LD with PTPN22 R620W (rs2476601) (r²≤0.2). These WGA data do indicate that further analysis of the PTPN22 region is warranted; understanding the relationship of these SNPs with the protective haplotype '5' previously identified21 should be most informative in identifying a possible second RA susceptibility determinant in the PTPN22 region.

The HLA region, and to a lesser extent the PTPN22 locus, dominated the top-ranked SNPs (Tables 1 and 2). This is atypical in the context of other WGA scans.9, 10, 29 PTPN22 maps within an unusually large block of extended LD (>300 kb) that contains several other genes. This large haplotype block, combined with the selection of gene-centric SNPs on the Illumina Infinium microarray,7 goes some way to explaining the number of top-ranked PTPN22 SNPs in our WGA data. Dominance of SNPs in the HLA region is less surprising, given previous evidence for the existence of multiple RA susceptibility loci in this region. The HLA region was one of the top seven regions detected as associated with RA in Japanese as a result of the WGA genotyping of microsatellites over pooled genomic DNA samples.30 The strongest associated HLA SNP was the 41st ranked Affymetrix SNP in our genome-wide scan (rs2227139) and maps within the same HLA-DRA-containing block of LD (32489917–32521295 bp) as the highest ranked Affymetrix SNP (rs9268614) in our WGA scan. This block is flanked by the BTNL2 and HLA-DRB3 loci, as well as C6orf10 (for which no transcripts have been identified to date). Several previous studies have reported associations in the telomeric class III region, independent of HLA-DRB1, particularly centred on the lymphotoxin alpha and tumour necrosis factor-alpha loci.31, 32, 33, 34, 35 Our WGA scan data did not provide evidence for association with RA of SNPs in this region (all z<0.8), but did demonstrate association with SNPs in and around loci encoding components of the complement pathway, complement component C2 and complement factor B. Previous studies that have demonstrated association in and around these loci have not been able to consistently show that association is independent of the HLA-DRB1*04 haplotype.36, 37 However, taken together, data generated by us and others30, 31, 32, 33, 34, 35, 38, 39 strongly suggest multiple RA susceptibility loci within the HLA region, with the HLA class II association the strongest. A comprehensive HLA SNP genotyping experiment is warranted in RA, using sufficiently large cohorts to enable detection of effects independent of HLA-DRB1.

We have empirically demonstrated that a WGA scan using DNA pooling and combination of data from independent cohorts is an effective method for detecting association at a genome-wide level of significance to complex disease loci of relatively large effect. However, additional strategies will be needed to enable detection of genuine association to loci situated in regions of lower LD than PTPN22 (which would have lower numbers of associated SNPs) and to loci of weaker effect (OR 1.2–1.5, e.g. CTLA4, PADI4). Lowering the threshold of significance for selection of SNPs for follow-up analysis may be necessary for SNPs within functionally relevant genes and genes mapping within areas of linkage to disease. This can be achieved using the false-discovery rate (FDR) principle, which maximizes power by controlling the fraction of false rejections rather than the type I error rate. An FDR approach for weighting WGA scan P-values on the basis of previous linkage data has been proposed.40 Finally, replication of putative associations to loci of weak effect will be vital in additional cohorts of large size. Guidelines for conducting WGA scans have been published.41 However, it is likely that optimal strategies for novel disease gene discovery using WGA scanning will be refined in an empirical process, of which this study and others9, 20, 29 represent the first steps. The use of DNA pooling should facilitate development of these WGA scanning strategies.
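To make the FDR principle concrete, the sketch below implements the plain Benjamini–Hochberg step-up procedure. It is not the linkage-weighted approach cited above, which is not reproduced here; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at FDR level q (BH step-up).

    The largest rank k with p_(k) <= q * k / m is found, and the k smallest
    P-values are all rejected, even if some of them exceed their own threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical P-values for ten SNPs selected for follow-up.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example, q=0.05))
```

Compared with a Bonferroni threshold, this procedure admits more SNPs to the follow-up stage when many true associations are present, at the cost of tolerating a controlled proportion of false positives among them.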


Patients and methods

Clinical samples

The NZ cohort consisted of 384 (268 females and 116 males) patients and 296 healthy controls (148 females and 148 males) and the UK cohort of 241 (184 females and 57 males) patients and 262 healthy controls (116 females and 146 males). All patients satisfied the American College of Rheumatology criteria for the classification of RA.42 Clinical characteristics for the cohorts are (NZ/UK, ±s.d.): 69.4/76.3% females, 42.7±14.5/48.3±13.5 years of age at onset, 17.6±9.0/13.5±10.8 years disease duration, 81.7/64.7% rheumatoid factor positive and 82.1/77.2% positive for the shared epitope.2 Ethical approval for the study in NZ was given by the Otago ethics committee (as lead committee), and in the UK by the Lewisham Hospital and Guy's and St Thomas' Hospitals local research ethics committees. All subjects were white Caucasian and gave written informed consent.

Genomic DNA preparation and pooling

The NZ genomic DNA samples were all prepared from white blood cells pelleted from whole blood using a standard Gu-HCl-based white blood cell lysis and chloroform extraction protocol; the UK case genomic DNA samples were all extracted using a standard Tris-HCl-based white blood cell lysis and phenol/chloroform extraction protocol, and the UK control genomic DNA samples were extracted from immortalized cell lines using Qiagen spin column separation technology. DNA samples were electrophoresed on agarose gels and samples with intact genomic DNA showing no smearing were selected for pooling. Intact genomic DNA was diluted to a concentration of 50 ng/μl based on Quant-iT PicoGreen (Invitrogen, Eugene, Oregon) quantitation, and the concentration was then confirmed by repeating the PicoGreen analysis. Concentrations were adjusted based on these results and then the PicoGreen analysis was repeated. This process was repeated until all samples consistently measured 50 ng/μl. Pools were constructed by combining equal volumes of each DNA. All pipetting steps were of volumes greater than 2 μl to minimize pipetting error. Four replicates from each pool were prepared and hybridized to Affymetrix GeneChip Mapping 100K Array and Illumina Infinium microarrays according to the manufacturers' protocols.

Estimation of pooled allele frequency using Affymetrix GeneChip mapping 100K array

A new approach was developed to estimate allele frequency by combining hybridization intensities for each SNP from the 10 different oligonucleotide probe quartets consisting of perfect-match (PM) and mismatch (MM) pairs for both alleles (five in the sense direction and five in the antisense direction). In the current version of the Affymetrix algorithm, signals from perfectly matched probes are adjusted for the signal from mismatched probes:43

$$\mathrm{PM}'_A = \mathrm{PM}_A - \mathrm{MM}, \qquad \mathrm{PM}'_B = \mathrm{PM}_B - \mathrm{MM}$$

PM_A is the signal from a probe perfectly matched to allele A, PM_B is the signal from a probe perfectly matched to allele B, and MM=(MM_A+MM_B)/2 is the averaged signal from probes mismatched both to alleles A and B. If either PM'_A or PM'_B is negative its value is made equal to zero. For each of the quartets a RAS can be calculated:

$$\mathrm{RAS} = \frac{\mathrm{PM}'_A}{\mathrm{PM}'_A + \mathrm{PM}'_B}$$
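The per-quartet calculation described in words above can be sketched as follows, assuming that each adjusted signal is the perfect-match signal minus the averaged mismatch signal, truncated at zero, and that RAS is the adjusted allele A signal as a fraction of the two adjusted signals. The function name, the handling of a quartet with no usable signal (returning None) and the intensity values are illustrative assumptions.

```python
def relative_allele_signal(pm_a, pm_b, mm_a, mm_b):
    """RAS for one probe quartet, with negative adjusted signals set to zero."""
    mm = (mm_a + mm_b) / 2.0          # averaged mismatch signal
    adj_a = max(pm_a - mm, 0.0)       # adjusted signal for allele A
    adj_b = max(pm_b - mm, 0.0)       # adjusted signal for allele B
    if adj_a + adj_b == 0:
        return None                   # assumption: no usable signal in this quartet
    return adj_a / (adj_a + adj_b)

# Hypothetical intensities for one quartet.
print(relative_allele_signal(pm_a=1000, pm_b=200, mm_a=100, mm_b=100))
```

Because of the truncation at zero, values from this definition always lie between 0 and 1.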

These RAS values allow clear distinction between homozygous and heterozygous genotypes for the majority of SNPs. However, there are three reasons (A, B and C) why they do not provide a sufficiently accurate estimate of average allele frequency among pooled samples. A: Not all RAS values for the same SNP are equally good for determination of allele frequency (e.g., some of them might have a higher noise level than others). B: RAS values are always between 0 and 1, which makes the distribution of RAS values different from normal for homozygous genotypes in individual samples and for SNPs with low allele frequency in pools. In particular, variation of RAS values for homozygous calls is artificially reduced. This creates bias towards higher heterozygosity (average RAS values even for homozygous calls appear to be higher than 0 and lower than 1) and makes any further statistical analysis difficult. C: The intensity of fluorescence of different alleles is not necessarily equal, creating a bias towards overestimating the frequency of the allele with the stronger signal. In our analysis we addressed these problems in order to decrease the s.d. First of all, we slightly changed the definition of RAS values:

$$\mathrm{RAS} = \frac{PM_A - MM}{PM_A + PM_B - 2\,MM}$$

This change allows RAS to be <0 and >1, thus eliminating the bias towards higher heterozygosity and resolving problem B. (We tried excluding the averaged mismatch signal MM from the above equation, but the precision of allele frequency estimation decreased for the majority of SNPs; data not shown.) We started by genotyping 94 individual CEPH samples. Then, for each quartet, we calculated the median RAS values for genotypes AA, AB and BB ($\mathrm{RAS}_{AA}$, $\mathrm{RAS}_{AB}$ and $\mathrm{RAS}_{BB}$). We used median rather than mean values to decrease the effect of outliers. For simplicity, we assume in the following that $\mathrm{RAS}_{AA} \ge \mathrm{RAS}_{AB} \ge \mathrm{RAS}_{BB}$. To estimate the average allele frequency among pooled samples, linear interpolation was first used to calculate the frequency of allele A based on the data from each quartet. If in the pool $\mathrm{RAS} \ge \mathrm{RAS}_{AB}$, the following equation was used to estimate allele frequency:

$$f_q = \frac{1}{2} + \frac{1}{2}\,\frac{\mathrm{RAS} - \mathrm{RAS}_{AB}}{\mathrm{RAS}_{AA} - \mathrm{RAS}_{AB}}$$

If $\mathrm{RAS}_{AB} \ge \mathrm{RAS}$, the following equation was used instead:

$$f_q = \frac{1}{2}\,\frac{\mathrm{RAS} - \mathrm{RAS}_{BB}}{\mathrm{RAS}_{AB} - \mathrm{RAS}_{BB}}$$
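The per-quartet estimate described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the function names and example values are hypothetical, the untruncated RAS is assumed to be (PMA - MM)/(PMA + PMB - 2MM), and the interpolation is assumed to be anchored at allele A frequencies of 1, 0.5 and 0 for the AA, AB and BB reference medians.

```python
def modified_ras(pm_a, pm_b, mm_a, mm_b):
    """RAS without truncation at zero, so values may fall outside [0, 1]."""
    mm = (mm_a + mm_b) / 2.0
    return (pm_a - mm) / (pm_a + pm_b - 2.0 * mm)


def quartet_allele_frequency(ras, ras_aa, ras_ab, ras_bb):
    """Piecewise-linear estimate of the allele A frequency from one quartet.

    ras_aa, ras_ab and ras_bb are reference medians from individually
    genotyped samples; the sketch assumes ras_aa > ras_ab > ras_bb.
    """
    if ras >= ras_ab:
        # Interpolate between the AB median (f = 0.5) and the AA median (f = 1).
        return 0.5 + 0.5 * (ras - ras_ab) / (ras_aa - ras_ab)
    # Interpolate between the BB median (f = 0) and the AB median (f = 0.5).
    return 0.5 * (ras - ras_bb) / (ras_ab - ras_bb)


# Hypothetical pool intensities and reference medians for one quartet.
ras_pool = modified_ras(pm_a=900.0, pm_b=500.0, mm_a=210.0, mm_b=190.0)
freq_a = quartet_allele_frequency(ras_pool, ras_aa=0.9, ras_ab=0.5, ras_bb=0.1)
```

Using separate slopes on either side of the heterozygote median is what lets the estimate absorb unequal signal strength of the two alleles.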

This approach allowed minimization of the bias from problems B and C. After that we combined these frequencies with weights reflecting the quality of the corresponding RAS values, which allowed resolution of problem A:

$$f = \sum_{q} w_q f_q$$

where the sum is over q=1 to 10 and $w_q$ is the weight for quartet q. To determine these weights we used genotyping data from the individually typed CEPH samples (with the 0.25 quality parameter cutoff, the average call rate was 97.8% for the Affymetrix Xba chip and 96.8% for the Hind chip). Each individually genotyped sample s can be considered as a pool consisting of a single sample. The allele frequency $f_s$ in this pool is fully determined by the individual's genotype: $f_s=1$ for the AA genotype, $f_s=0.5$ for the AB genotype and $f_s=0$ for the BB genotype. In order to determine the optimal weights $w_q$ for the quartets, we minimized the differences between the predicted allele frequencies $f=\sum_q w_q f_q$ and the known frequencies $f_s$. More precisely, the following expression was minimized over $w_q$:

$$\sum_{s=1}^{n}\Bigl(\sum_{q} w_q f_{q,s} - f_s\Bigr)^2$$

where s denotes the individually genotyped samples (s=1, ..., n), $f_{q,s}$ is the allele frequency estimated from quartet q for sample s, and $f_s$ is 0, 0.5 or 1.0 according to the genotype of the individual.
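One way to carry out such a fit is ordinary least squares, sketched below in plain Python via the normal equations. This is an assumption-laden illustration: the text does not state that a squared-error criterion or this solver was used, no constraint on the weights (such as summing to 1) is imposed, and the function names and the two-quartet toy data are hypothetical (the array has 10 quartets per SNP).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (quartet estimates not collinear).
    """
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_quartet_weights(quartet_freqs, known_freqs):
    """Weights w_q minimising sum_s (sum_q w_q * f_qs - f_s)^2.

    quartet_freqs[s][q] is the estimate from quartet q for sample s;
    known_freqs[s] is 1.0, 0.5 or 0.0 according to the genotype.
    """
    n_q = len(quartet_freqs[0])
    # Normal equations: (F^T F) w = F^T f
    ftf = [[sum(row[i] * row[j] for row in quartet_freqs) for j in range(n_q)]
           for i in range(n_q)]
    rhs = [sum(row[i] * f for row, f in zip(quartet_freqs, known_freqs))
           for i in range(n_q)]
    return solve_linear_system(ftf, rhs)


def pooled_frequency(weights, quartet_freqs_for_pool):
    """Weighted combination f = sum_q w_q * f_q for one pool."""
    return sum(w * f for w, f in zip(weights, quartet_freqs_for_pool))


# Toy data: three individually genotyped samples (AA, AB, BB), two quartets.
quartet_freqs = [[1.0, 1.0], [0.6, 0.2], [0.1, -0.3]]
known_freqs = [1.0, 0.5, 0.0]
weights = fit_quartet_weights(quartet_freqs, known_freqs)
pool_estimate = pooled_frequency(weights, [0.62, 0.58])
```

In this toy data set the first quartet tracks the known genotypes more closely and therefore receives the larger weight.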

Estimation of pooled allele frequency using the Illumina Infinium array

A similar approach was used to estimate the average allele frequency among pooled samples using the Illumina Infinium array. For each SNP there are two probes on this array, corresponding to the two alleles. We started with analysis of genotyping data for 120 individual DNA samples provided to us by Illumina (the average call rate for the Infinium microarray was 99.96%). The median signal was determined for both alleles for genotypes AA ($A_{AA}$ and $B_{AA}$), AB ($A_{AB}$ and $B_{AB}$) and BB ($A_{BB}$ and $B_{BB}$). Three RAS values, one for each genotype, were estimated using the following formulae:

[Equation not available in this version: $\mathrm{RAS}_{AA}$, $\mathrm{RAS}_{AB}$ and $\mathrm{RAS}_{BB}$ expressed in terms of the median allele signals and the parameter k.]

Parameter k compensates for over-expression of one of the alleles. It was estimated from the formula $\mathrm{RAS}_{AB}=(\mathrm{RAS}_{AA}+\mathrm{RAS}_{BB})/2$.

If in the pool $\mathrm{RAS} \ge \mathrm{RAS}_{AB}$, the following equation was used to estimate allele frequency:

$$f = \frac{1}{2} + \frac{1}{2}\,\frac{\mathrm{RAS} - \mathrm{RAS}_{AB}}{\mathrm{RAS}_{AA} - \mathrm{RAS}_{AB}}$$

If $\mathrm{RAS}_{AB} \ge \mathrm{RAS}$, the following equation was used instead:

$$f = \frac{1}{2}\,\frac{\mathrm{RAS} - \mathrm{RAS}_{BB}}{\mathrm{RAS}_{AB} - \mathrm{RAS}_{BB}}$$

SNPs in LD

Estimated allele frequencies for SNPs in total LD were averaged with weights based on their performance:

$$f_0 = \frac{\sum_i f_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$$

where $\sigma_i$ is the SNP-specific s.d. of the allele frequency estimation, $f_i$ is the estimate of allele frequency for one SNP in the group of SNPs in LD, and $f_0$ is the averaged estimate of allele frequency for this group of SNPs.
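A common way to weight estimates by their performance is inverse-variance weighting, sketched below. That this exact weighting was used is an assumption inferred from the symbols defined above; the function name and the example numbers are hypothetical.

```python
def combine_ld_group(freqs, sds):
    """Inverse-variance weighted mean of allele frequency estimates.

    freqs[i] is the estimate for SNP i in a group of SNPs in total LD;
    sds[i] is the SNP-specific s.d. of that estimate (must be > 0).
    """
    weights = [1.0 / (sd * sd) for sd in sds]
    return sum(w * f for w, f in zip(weights, freqs)) / sum(weights)


# Two SNPs in total LD; the first is measured twice as precisely.
f0 = combine_ld_group(freqs=[0.30, 0.34], sds=[0.01, 0.02])
```

The combined value is pulled towards the estimate with the smaller s.d., so poorly performing SNPs contribute less.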

Exclusion of SNPs from analysis

Of the 116 204 Affymetrix GeneChip Mapping 100K Array SNPs, 15.35% were excluded from analysis on the basis of not being informative in the CEPH samples (12.69%), location in duplicated regions (0.27%; defined as a 100% match of the 50 surrounding bases to two different positions within the human genome), deviation from Hardy–Weinberg equilibrium (HWE) (1.19%; $P < 1 \times 10^{-5}$), or poor performance in estimating pooled allele frequency in CEPH individuals (1.20%; difference >12%). In the case of the Illumina Infinium array, 4.08% of the 109 365 SNPs were excluded from analysis on the basis of not being informative (4.07%) or presence in duplicated regions (0.01%). Insufficient Caucasian CEPH samples were genotyped on this array to obtain data on SNPs deviating from HWE or performing poorly in allele frequency estimates from genomic DNA pools.
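The HWE exclusion criterion can be illustrated with a Pearson chi-square test on genotype counts. The text does not specify which HWE test was applied, so the choice of a 1 d.f. chi-square test is an assumption; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 d.f.) for deviation from HWE.

    Assumes both alleles are observed, so all expected counts are > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts with a marked heterozygote deficit.
chi2, p_value = hwe_chi_square(n_aa=40, n_ab=20, n_bb=40)
exclude_snp = p_value < 1e-5                 # threshold quoted in the text
```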

Differential bias in genotype scoring between case and control cohorts can arise when DNA samples originate from different laboratories; this was observed using the MegAllele (ParAllele BioScience) technology, with the authors having no reason to believe that the phenomenon is specific to that genotyping platform.44 This is unlikely to occur with our method: all SNPs with a high 'half-call' rate, which contribute to this phenomenon,44 would be excluded from further analysis owing to their high s.d.

Detection of association

The difference in allele frequency between pools was expressed as a z-score:

$$z = \frac{f_{\mathrm{cases}} - f_{\mathrm{controls}}}{\sigma}$$

where $f_{\mathrm{cases}}$ is the frequency of a SNP allele in cases, $f_{\mathrm{controls}}$ is the same in controls and $\sigma$ is the s.d., whose square consists of two parts:

$$\sigma^2 = \sigma_s^2 + \sigma_m^2$$

The first part, $\sigma_s^2$, represents the sampling error owing to the limited number of cases and controls; it is the only error in the situation where cases and controls are genotyped individually, and can be expressed as:

$$\sigma_s^2 = f(1-f)\left(\frac{1}{N_{\mathrm{cases}}} + \frac{1}{N_{\mathrm{controls}}}\right)$$

where $N_{\mathrm{cases}}$ and $N_{\mathrm{controls}}$ are the numbers of chromosomes in cases and controls, respectively, and f is the population allele frequency. The second part, $\sigma_m^2$, is the error owing to imprecise measurement of the allele frequency in a pool:

$$\sigma_m^2 = \sigma_0^2\left(\frac{1}{n_{\mathrm{cases}}} + \frac{1}{n_{\mathrm{controls}}}\right)$$

where $n_{\mathrm{cases}}$ and $n_{\mathrm{controls}}$ are the numbers of microarray replicates for pools of cases and controls, respectively (for these studies $n_{\mathrm{cases}}=4$ and $n_{\mathrm{controls}}=4$), and $\sigma_0$ is the SNP-specific s.d. of the allele frequency estimation. Under the null hypothesis of no association, the z-score is normally distributed with mean=0 and variance=1. Therefore, a z-score can be translated into a significance level expressed as a P-value.
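A minimal Python sketch of this calculation follows. It assumes a binomial sampling variance f(1-f)(1/Ncases + 1/Ncontrols) and a measurement variance equal to the squared SNP-specific s.d. divided by the number of replicates in each pool; both forms are inferred from the verbal description. The population frequency f is approximated here by the chromosome-weighted mean of the two pool estimates, which is also an assumption, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_z_score(f_cases, f_controls, n_chr_cases, n_chr_controls,
                   sd0, reps_cases=4, reps_controls=4):
    """z-score and two-sided P-value for a case-control pool comparison."""
    # Assumed estimate of the population allele frequency.
    f = ((f_cases * n_chr_cases + f_controls * n_chr_controls)
         / (n_chr_cases + n_chr_controls))
    # Sampling variance from the finite number of chromosomes.
    var_sampling = f * (1.0 - f) * (1.0 / n_chr_cases + 1.0 / n_chr_controls)
    # Measurement variance, reduced by averaging over replicate arrays.
    var_measure = sd0 ** 2 * (1.0 / reps_cases + 1.0 / reps_controls)
    z = (f_cases - f_controls) / math.sqrt(var_sampling + var_measure)
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical pools: 500 cases and 500 controls (1000 chromosomes each).
z, p = pooled_z_score(f_cases=0.35, f_controls=0.30,
                      n_chr_cases=1000, n_chr_controls=1000, sd0=0.02)
```

Setting `sd0` to zero recovers the individual-genotyping case, in which only the sampling term remains.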

Composite z-scores were calculated in four steps (A–D). Step A. For each study, SNPs were ordered according to their z-score ($z_1^{\mathrm{UK}}$, $z_1^{\mathrm{NZ}}$) and the corresponding rank number was assigned to each SNP. Step B. For each SNP, the P-value for attaining the given rank in each study was then calculated: $P_i=R_i/N$, where $R_i$ is the rank of SNP i and N is the total number of SNPs. These P-values were then converted back to z-scores ($z_2^{\mathrm{UK}}$, $z_2^{\mathrm{NZ}}$) assuming a normal distribution. Step C. For each SNP with the allele frequency in cases lower than in controls, $z_3=-z_2$; otherwise, $z_3=z_2$. Step D. The composite z-score was calculated as $|z_3^{\mathrm{UK}}+z_3^{\mathrm{NZ}}|/\sqrt{2}$. Composite z-scores were obtained from Illumina Infinium-derived allele frequency estimates by applying Steps C and D only, whereas Steps A–D were applied to Affymetrix 100K GeneChip-derived allele frequency estimates, because strong fluctuations in signal intensity for some SNPs caused the original z-scores to deviate from the normal distribution. Steps A and B were therefore added to enforce a Gaussian (normal) distribution of z-scores. Unfortunately, this also decreased the power of our method, weakening the effect from genuinely disease-associated SNPs whose z-scores do not fall in a Gaussian distribution.
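Steps A–D can be sketched in Python as below. Several details are assumptions rather than statements from the text: SNPs are ranked by the absolute value of the z-score with rank 1 as the strongest signal, the rank-based P-value is converted with a two-sided normal quantile, and ties are broken arbitrarily. All function names and the four-SNP example are hypothetical.

```python
import math
from statistics import NormalDist


def rank_normalised_z(z_scores):
    """Steps A and B: replace each z-score by the normal quantile of its rank."""
    n = len(z_scores)
    order = sorted(range(n), key=lambda i: abs(z_scores[i]), reverse=True)
    z2 = [0.0] * n
    for rank, i in enumerate(order, start=1):     # rank 1 = largest |z|
        p = rank / n                              # P_i = R_i / N
        z2[i] = NormalDist().inv_cdf(1.0 - p / 2.0)   # two-sided (assumed)
    return z2


def apply_direction(z_values, f_cases, f_controls):
    """Step C: negate z when the allele is rarer in cases than in controls."""
    return [-z if fc < fk else z
            for z, fc, fk in zip(z_values, f_cases, f_controls)]


def composite_z(z3_uk, z3_nz):
    """Step D: combine the signed z-scores of the two cohorts."""
    return [abs(a + b) / math.sqrt(2.0) for a, b in zip(z3_uk, z3_nz)]


# Hypothetical z-scores and pool frequencies for four SNPs in two cohorts.
z_uk, z_nz = [3.0, 0.2, 1.5, 0.8], [2.5, 0.1, 1.0, 1.9]
uk_cases, uk_controls = [0.40, 0.21, 0.33, 0.10], [0.30, 0.20, 0.30, 0.12]
nz_cases, nz_controls = [0.41, 0.20, 0.28, 0.14], [0.31, 0.21, 0.30, 0.12]

z3_uk = apply_direction(rank_normalised_z(z_uk), uk_cases, uk_controls)
z3_nz = apply_direction(rank_normalised_z(z_nz), nz_cases, nz_controls)
composite = composite_z(z3_uk, z3_nz)
```

The first SNP ranks highest in both cohorts with the same direction of effect and so receives the largest composite score, whereas SNPs whose direction differs between cohorts largely cancel.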

Target regions examined

For PTPN22, analysis was limited to SNPs within a large conserved haplotype block of approximately 365 kb (113.800–114.165 Mb), which contained all of PTPN22 (114.004–114.127 Mb). In the case of the HLA region, analysis extended to the most centromeric class II locus (DPA3). The HLA class II and III physical window was 31.500–33.207 Mb. Across the Affymetrix GeneChip Mapping 100K and Illumina Infinium arrays, 257 SNPs were included in the analysis within the HLA window (60 SNPs on the Affymetrix GeneChip Mapping 100K array and 202 SNPs on the Illumina Infinium array; five of these SNPs appeared on both arrays) and 45 within the PTPN22 window (11 SNPs on the Affymetrix GeneChip Mapping 100K array and 35 SNPs on the Illumina Infinium array; one of these SNPs appeared on both arrays). Haplotype block structure and intermarker LD relationships were inferred from CEU Caucasian data available from HapMap (www.hapmap.org).

Individual genotyping

Individual genotyping was carried out for the HLA rs9268614 and MAGI3 rs1343125 SNPs over 869 NZ cases and 563 controls (the same cohort genotyped for rs2476601 (PTPN22 R620W)45) using PCR-RFLP. For rs9268614, PCR with primers ACACGGGCCATGAAGGAATCTGAA and GTTGAAGGCAGGAATGAGTGTGGT created a 229 bp product that was cleaved by HhaI into 150 and 79 bp products in the presence of the G allele. For rs1343125, PCR with primers CTATCTACTGACCATTCTGGTATC and TCCTCTATAGTGTGAAATTGAGGG created a 305 bp product that was cleaved by MspI into fragments of 207 and 98 bp. A total of 11.3% of the samples were also genotyped for rs1343125 using SNPlex (Applied Biosystems, Foster City, CA, USA); there was 100% concordance in genotypes between the two methodologies. This SNP was not in HWE in the NZ controls (Table 3a; P=0.002); a possible explanation is a 63.1 kb deletion within MAGI3 encompassing rs1343125.46 Allele frequencies at individual SNPs were compared between cases and controls using standard $\chi^2$ statistics, with Fisher's P-values reported. Alleles were tested for deviation from HWE using the SHEsis software platform.47

COCAPHASE from UNPHASED48 was used to implement the expectation-maximization (EM) algorithm for haplotype estimation and to perform haplotype association testing between cases and controls (unconditional logistic regression based on the maximum-likelihood frequency estimates from the EM algorithm). Individual haplotypes were tested for association by grouping all others together, and haplotype-specific ORs were generated in the same way. Ninety-five percent confidence intervals (CIs) were calculated from estimated haplotype 'counts' using Woolf's method; this is likely to underestimate the width of the 95% CI owing to uncertain haplotype phase. Conditional analysis of one locus on the allele at a second locus, in LD with the first locus and known to be associated, was also performed in COCAPHASE; this implements the test of equality of ORs for haplotypes identical at conditioning loci.
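Woolf's method for the confidence interval of an odds ratio can be sketched as follows. The counts are hypothetical and are treated as exact, which is why intervals based on estimated haplotype counts tend to be too narrow; the function name is illustrative only.

```python
import math


def woolf_odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (logit) confidence interval.

    a, b: counts of the test haplotype and of all others in cases;
    c, d: the same in controls. All counts must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical haplotype counts in cases (20 vs 80) and controls (10 vs 90).
odds_ratio, ci_low, ci_high = woolf_odds_ratio_ci(20, 80, 10, 90)
```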


References

  1. MacGregor AJ, Snieder H, Rigby AS, Koskenvuo M, Kaprio J, Aho K, Silman AJ. Characterising the quantitative genetic contribution to rheumatoid arthritis using data from twins. Arthritis Rheum 2000; 43: 30–37.
  2. Gregersen PK, Silver J, Winchester RJ. The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis. Arthritis Rheum 1987; 30: 1205–1213.
  3. Gregersen PK, Lee HS, Batliwalla F, Begovich AB. PTPN22: setting thresholds for autoimmunity. Semin Immunol 2006; 18: 214–223.
  4. Iwamoto T, Ikari K, Nakamura T, Kuwahara M, Toyama Y, Tomatsu T et al. Association between PADI4 and rheumatoid arthritis: a meta-analysis. Rheumatology 2006; 45: 804–807.
  5. Plenge RM, Padyukov L, Remmers EF, Purcell S, Lee AT, Karlson EW et al. Replication of putative candidate-gene associations with rheumatoid arthritis in >4000 samples from North America and Sweden: association of susceptibility with PTPN22, CTLA4, and PADI4. Am J Hum Genet 2005; 77: 1044–1060.
  6. Matsuzaki H, Dong S, Loi H, Di X, Liu G, Hubbell E et al. Genotyping over 100 000 SNPs on a pair of oligonucleotide arrays. Nat Methods 2004; 1: 109–111.
  7. Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS. A genome-wide scalable SNP genotyping assay using microarray technology. Nat Genet 2005; 37: 549–554.
  8. Gunderson KL, Kuhn KM, Steemers FJ, Ng P, Murray SS, Shen R. Whole-genome genotyping of haplotype tag single nucleotide polymorphisms. Pharmacogenomics 2006; 7: 641–648.
  9. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C et al. Complement factor H polymorphism in age-related macular degeneration. Science 2005; 308: 385–389.
  10. Herbert A, Gerry NP, McQueen MB, Heid IM, Pfeufer A, Illig T et al. A common genetic variant is associated with adult and childhood obesity. Science 2006; 312: 279–283.
  11. Smyth DJ, Cooper JD, Bailey R, Field S, Burren O, Smink LJ et al. A genome-wide association study of nonsynonymous SNPs identifies a type 1 diabetes locus in the interferon-induced helicase (IFIH1) region. Nat Genet 2006; 38: 617–619.
  12. Zuo Y, Zuo G, Zhao H. Two-stage designs in case–control association analysis. Genetics 2006; 173: 1747–1760.
  13. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 2006; 38: 209–213.
  14. Fisher PJ, Turic D, Williams NM, McGuffin P, Asherson P, Ball D et al. DNA pooling identifies QTLs on chromosome 4 for general cognitive ability in children. Hum Mol Genet 1999; 8: 915–922.
  15. Bansal A, van den Boom D, Kammerer S, Honisch C, Adam G, Cantor CR et al. Association testing by DNA pooling: an effective initial screen. Proc Natl Acad Sci USA 2002; 99: 16871–16874.
  16. Begovich AB, Carlton VE, Honigberg LA, Schrodi SJ, Chokkalingam AP, Alexander HC et al. A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis. Am J Hum Genet 2004; 75: 330–337.
  17. Spector TD, Reneland RH, Mah S, Valdes AM, Hart DJ, Kammerer S et al. Association between a variation in LRCH1 and knee osteoarthritis. Arthritis Rheum 2006; 54: 524–532.
  18. Spinola M, Meyer P, Kammerer S, Falvella FS, Boettger ME, Hoyal CR et al. Association of the PDCD5 locus with lung cancer risk and prognosis in smokers. J Clin Oncol 2006; 24: 1672–1678.
  19. Meaburn E, Butcher LM, Schalkwyk LC, Plomin R. Genotyping pooled DNA using 100K SNP microarrays: a step towards genomewide association scans. Nucl Acids Res 2006; 34: e27.
  20. Downes K, Barratt BJ, Akan P, Bumpstead SJ, Taylor SD, Clayton DG, Deloukas P. SNP allele frequency estimation in DNA pools and variance components analysis. Biotechniques 2004; 36: 840–845.
  21. Carlton VE, Hu X, Chokkalingam AP, Schrodi SJ, Brandon R, Alexander HC et al. PTPN22 genetic variation: evidence for multiple variants associated with rheumatoid arthritis. Am J Hum Genet 2005; 77: 567–581.
  22. Altshuler D, Hirschhorn JN, Klannemark M, Lindgren CM, Vohl MC, Nemesh J et al. The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 2000; 26: 76–80.
  23. Barrett JC, Cardon LR. Evaluating coverage of genome-wide association studies. Nat Genet 2006; 38: 659–662.
  24. Moskvina V, Norton N, Williams N, Holmans P, Owen M, O'Donovan M. Streamlined analysis of pooled genotype data in SNP-based association studies. Genet Epidemiol 2005; 28: 273–282.
  25. Meaburn E, Butcher LM, Liu L, Fernandes C, Hansen V, Al-Chalabi A et al. Genotyping DNA pools on microarrays: Tackling the QTL problem of large samples and large numbers of SNPs. BMC Genomics 2005; 6: 52–59.
  26. Kirov G, Nikolov I, Georgieva L, Moskvina V, Owen MJ, O'Donovan MC. Pooled DNA genotyping on Affymetrix SNP genotyping arrays. BMC Genomics 2006; 7: 27.
  27. Wright GJ, Leslie JD, Ariza-McNaughton L, Lewis J. Delta proteins and MAGI proteins: an interaction of Notch ligands with intracellular scaffolding molecules and its significance for zebrafish development. Development 2004; 131: 5659–5669.
  28. Ando K, Kanazawa S, Tetsuka T, Ohta S, Jiang X, Tada T et al. Induction of Notch signaling by tumor necrosis factor in rheumatoid synovial fibroblasts. Oncogene 2003; 22: 7796–7803.
  29. Arking DE, Pfeufer A, Post W, Kao WHL, Newton-Cheh C, Ikeda M et al. A common genetic variant in the NOS1 regulator NOS1AP modulates cardiac repolarization. Nat Genet 2006; 38: 644–651.
  30. Tamiya G, Shinya M, Imanishi T, Ikuta T, Makino S, Okamoto K et al. Whole genome association study of rheumatoid arthritis using 27 039 microsatellites. Hum Mol Genet 2005; 14: 2305–2321.
  31. Newton J, Brown MA, Milicic A, Ackerman H, Darke C, Wilson JN et al. The effect of HLA-DR on susceptibility to rheumatoid arthritis is influenced by the associated lymphotoxin alpha-tumor necrosis factor haplotype. Arthritis Rheum 2003; 48: 90–96.
  32. Newton JL, Harney SM, Timms AE, Sims AM, Rockett K, Darke C et al. Dissection of class III major histocompatibility complex haplotypes associated with rheumatoid arthritis. Arthritis Rheum 2004; 50: 2122–2129.
  33. Jawaheer D, Li W, Graham RR, Chen W, Damle A, Xiao X et al. Dissecting the genetic complexity of the association between human leukocyte antigens and rheumatoid arthritis. Am J Hum Genet 2002; 71: 585–594.
  34. Singal DP, Li J, Lei K. Genetics of rheumatoid arthritis (RA): two separate regions in the major histocompatibility complex contribute to susceptibility to RA. Immunol Lett 1999; 69: 301–306.
  35. Zanelli E, Jones G, Pascual M, Eerligh P, van der Slik AR, Zwinderman AH et al. The telomeric part of the HLA region predisposes to rheumatoid arthritis independently of the class II loci. Hum Immunol 2001; 62: 75–84.
  36. Dyer PA, Thomson W, Sanders PA, Grennan DM. Are major histocompatibility system class III products independent markers for susceptibility to rheumatoid arthritis? Dis Markers 1986; 4: 151–155.
  37. Fielder AH, Ollier W, Lord DK, Burley MW, Silman A, Awad J et al. HLA class III haplotypes in multicase rheumatoid arthritis families. Hum Immunol 1989; 25: 75–85.
  38. Okamoto K, Makino S, Yoshikawa Y, Takaki A, Nagatsuka Y, Ota M et al. Identification of I kappa BL as the second major histocompatibility complex-linked susceptibility locus for rheumatoid arthritis. Am J Hum Genet 2003; 72: 303–312.
  39. Brintnell W, Zeggini E, Barton A, Thomson W, Eyre S, Hinks A et al. Evidence for a novel rheumatoid arthritis susceptibility locus on chromosome 6p. Arthritis Rheum 2004; 50: 3823–3830.
  40. Roeder K, Bacanu SA, Wasserman L, Devlin B. Using linkage genome scans to improve power of association in genome scans. Am J Hum Genet 2006; 78: 243–252.
  41. Ehm MG, Nelson MR, Spurr NK. Guidelines for conducting and reporting whole genome/large-scale association studies. Hum Mol Genet 2005; 14: 2485–2488.
  42. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988; 31: 315–324.
  43. Liu WM, Di X, Yang G, Matsuzaki H, Huang J, Mei R et al. Algorithms for large-scale genotyping microarrays. Bioinformatics 2003; 19: 2397–2403.
  44. Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM et al. Population structure, differential bias and genomic control in a large-scale, case–control association study. Nat Genet 2005; 37: 1243–1246.
  45. Simkins HM, Merriman ME, Highton J, Chapman PT, O'Donnell JL, Jones PB et al. Association of the PTPN22 locus with rheumatoid arthritis in a New Zealand Caucasian cohort. Arthritis Rheum 2005; 52: 2222–2225.
  46. Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet 2006; 38: 75–81.
  47. Shi YY, He L. SHEsis, a powerful software platform for analyses of linkage disequilibrium, haplotype construction, and genetic association at polymorphic loci. Cell Res 2005; 15: 97–98.
  48. Dudbridge F. Pedigree disequilibrium tests for multilocus haplotypes. Genet Epidemiol 2003; 25: 115–121.

Acknowledgements

This work was supported by the Health Research Council of New Zealand, the Arthritis and Rheumatism Council in the United Kingdom, Myriad Genetics Inc. and NHS Research and Development funding for recruitment carried out at Guy's and St Thomas' and Lewisham hospitals. We thank NZ research nurses Gael Hewett and Sue Yeoman, UK research nurse Janet Grumley, and Bhaneeta Lad for technical assistance, and Cathryn Lewis and Sheila Fisher for statistical advice.