Introduction

Multiple sclerosis (MS) is a complex genetic disorder, assumed to have multiple genes involved, each conferring a small increase in risk. Evidence for the genetic contribution to MS comes from studies of familial risks, and from association in the HLA region. However, no non-HLA genes for MS have been identified through linkage or association studies. Many research groups have performed linkage studies with MS families, but significant and replicated evidence for linkage has been difficult to achieve, and findings have been inconsistent across studies. Collaboration between research groups in MS has been strong, with several analyses pooling genotypes across studies1, 2 and a recent study regenotyping families on a SNP chip to increase the informativeness of families.3 These studies have highlighted chromosomal regions that may contain susceptibility genes but, as in previous analyses, the evidence is not compelling.

Several reasons may be postulated for the disappointing results in MS linkage studies. First, the genes may confer such a low increase in risk that many thousands of families would be necessary to attain enough power to identify linkage. Second, heterogeneity may exist between families in the same study, or (more particularly) across studies performed in different populations. Third, the power of linkage studies can be substantially reduced by incomplete genetic information from genotyping errors, missing genotypes, or sparse marker maps. The familial data for a genetic risk in MS are quite compelling, and the upcoming genome-wide association studies may identify genes that were not previously considered as candidates, or genes with effects that would be difficult to detect in linkage studies. Meanwhile, much effort has been expended in family collections and genotyping of linkage studies, and it is necessary to extract all possible information from such studies.

One statistical tool which has not been applied to recent MS linkage studies is the Genome Search Meta-Analysis (GSMA) method. Previously, the GSMA was applied to four MS studies4 but many additional and extended studies are now available. The GSMA performs meta-analysis pooling results (eg LOD scores) across the genome, but does not require individual-level genotype data. It has become a widely used meta-analysis method for genome-wide linkage studies, and has previously been applied to diseases such as inflammatory bowel disease, psoriasis, schizophrenia and autism.5, 6, 7, 8 Our goals were to strengthen evidence of linkage in candidate regions of MS, to identify novel regions not detected by eye-balling results in individual studies, and to explore how the GSMA results can be affected by analysing different bin widths and by moving bin boundaries.

Materials and methods

Study inclusion

We identified all published genome-wide linkage studies performed in MS using Medline and PubMed searches, and by examining reference lists of papers on the genetics of MS. Genome-wide association studies were excluded, as were candidate region linkage studies which considered only small regions of the genome. Linkage studies which overlapped in patient samples or were extended versions of previous publications were identified, and only independent studies were included. Where studies had performed a two-stage analysis, genotyping more markers in targeted regions in stage 2, only the stage 1 results were used, as the GSMA requires a uniform distribution of markers and families across the genome. In total, 10 independent studies were included,9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and study characteristics are described in Table 1. Most studies consisted of populations of European/Caucasian ethnicity and ascertained affected sibling pairs, with some additional affected individuals sampled where available in the families. Most linkage analyses were non-parametric, using a range of programs (Mapmaker/Sibs, ASPEX, Genehunter, Genehunter-plus, Allegro) reflecting the family structures, and the date the analysis was performed. The French and US studies performed single-point parametric linkage analysis including heterogeneity (using the programme HOMOG). The Modin study18 used a single consanguineous family, which may have different genetic contributions from families in the other studies. This family contributed fully to the unweighted analysis, but contributed little information to the weighted analysis, owing to its small size. Kenealy et al16 analysed both a US and a French cohort; these were included separately in the meta-analysis, to give 11 studies in total. All linkage studies used microsatellite markers distributed on the autosomes and on the X chromosome.

Table 1 Summary of genome-wide linkage studies included (ordered by first author)

Nine studies performed multipoint non-parametric linkage analysis and presented the results as chromosomal graphs. Linkage scores across the genome were extracted from these graphs using the digitising programme Engauge Digitiser (v.2.14,© Mark Mitchell, 2002, http://digitizer.sourceforge.net/), which converts curves into (x,y) coordinates. Bins were defined by dividing each chromosome length from each study into the required number of equal-width segments. The maximum linkage statistic per bin was then identified. Full details of this procedure for defining bins, and its accuracy, are given in Forabosco et al (in press).19
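The binning step described above can be sketched as follows. This is a minimal illustration, not the procedure of Forabosco et al: the function name `max_score_per_bin` and the toy coordinates are hypothetical, and the sketch assumes the digitised curve is available as paired positions (in cM) and linkage scores.

```python
import numpy as np

def max_score_per_bin(positions_cm, scores, chrom_length_cm, n_bins):
    """Return the maximum linkage score in each of n_bins equal-width bins."""
    positions = np.asarray(positions_cm, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Equal-width bin edges spanning the chromosome length reported by the study.
    edges = np.linspace(0.0, chrom_length_cm, n_bins + 1)
    # Using only the interior edges maps every position to an index 0..n_bins-1.
    bin_index = np.digitize(positions, edges[1:-1])
    maxima = np.full(n_bins, np.nan)  # NaN marks a bin with no digitised point
    for b in range(n_bins):
        in_bin = scores[bin_index == b]
        if in_bin.size:
            maxima[b] = in_bin.max()
    return maxima

# Toy digitised curve (hypothetical values) on a 120 cM chromosome, 4 bins of 30 cM.
pos = [5, 20, 35, 50, 70, 95, 110]
lod = [0.1, 0.8, 1.5, 0.4, 0.2, 0.9, 0.3]
bin_max = max_score_per_bin(pos, lod, chrom_length_cm=120, n_bins=4)
```

Dividing by the chromosome length reported in each study, rather than by fixed cM positions, mirrors the equal-width segmentation described in the text.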

The French and US studies presented results as single-point parametric HLOD scores, which were downloaded from the Vanderbilt Center for Human Genetics Research website (http://phg.mc.vanderbilt.edu/content/publication_data).

We also included a large collaborative study which regenotyped 4500 SNPs on an Illumina BeadArray linkage mapping panel in an extended sample of 730 families from the UK, US, Australian and Nordic studies.3 Families overlap substantially (70%) with those studied previously,9, 10, 15, 16 and 216 new families were included. This IMSGC study was analysed with the remaining seven non-IMSGC studies, to give eight studies in total.

Statistical analysis

The GSMA method tests for evidence of linkage within a series of bins of traditionally 30 cM. Chromosomes are divided into approximately equal length bins, with chromosome 1 having 10 bins, and chromosomes 21 and 22 having two bins each, giving a total of 118 bins on the autosomes. The X chromosome comprises six bins, as appropriate for the female chromosome length (184 cM on the Marshfield map). The notation c_n is used to refer to the nth bin on chromosome c. For each study, the chromosome length was divided into the required number of bins, and the linkage test statistics within each bin were extracted. The maximum evidence for linkage achieved within a bin was noted (eg maximum LOD or NPL). Bins were then ranked, with the bin containing the highest linkage score given rank 124, the next highest given rank 123, etc. Ranks were summed across studies, and the summed rank (SR) forms a test statistic for assessing linkage within the bin. Bins with high SR may show significant evidence for linkage.
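The ranking and summation can be sketched in a few lines. This is a minimal illustration with hypothetical function names and toy scores, not the GSMA software; in particular, tied scores are broken here by bin order, which is an assumption of this sketch.

```python
import numpy as np

def rank_bins(bin_maxima):
    """Rank the bins of one study: the highest score gets rank n_bins, the lowest 1."""
    scores = np.asarray(bin_maxima, dtype=float)
    order = np.argsort(scores, kind="stable")  # ties broken by bin order (assumption)
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.arange(1, scores.size + 1)
    return ranks

def summed_ranks(study_bin_maxima):
    """Sum the within-study ranks across studies, giving one SR per bin."""
    return np.sum([rank_bins(s) for s in study_bin_maxima], axis=0)

# Three toy studies (hypothetical scores), four bins each.
studies = [
    [0.8, 1.5, 0.2, 0.9],
    [0.1, 2.0, 0.3, 0.4],
    [1.1, 0.9, 0.0, 0.5],
]
sr = summed_ranks(studies)
```

With 124 bins the same code assigns rank 124 to the top bin of each study, as in the text.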

The significance of the SR in each bin was assessed using Monte Carlo simulation methods, permuting the bin location of ranks within each study, to obtain a P-value (PSR) for linkage. Ten thousand simulations were run to obtain GSMA statistics, using the GSMA software20 available from the GSMA website at http://www.kcl.ac.uk/depsta/memoge/gsma/. Multiple testing is a problem: each of 124 bins produces a P-value, so under the null hypothesis of no linkage, 6.2 bins would be expected to attain P-values of PSR<0.05. To control for this, the GSMA uses a Bonferroni correction for the number of bins: a P-value of 0.05/124=0.0004 is necessary for genome-wide evidence of linkage, and a P-value of 1/124=0.0081 for suggestive evidence of linkage.21 Simulation studies have shown that these thresholds are appropriate for the GSMA,22 with a P-value exceeding the genome-wide significance threshold arising just once in every 20 GSMA studies, and a P-value exceeding the suggestive significance threshold arising on average once in each GSMA study.
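The permutation test and the thresholds can be sketched as below. This is an illustrative sketch under stated assumptions, not the GSMA software: the function names and toy ranks are hypothetical, and the P-value is taken as the plain proportion of simulations in which the permuted SR is at least as large as the observed SR.

```python
import numpy as np

def sr_p_values(ranks, n_sim=10_000, seed=0):
    """Monte Carlo P-value for each bin's summed rank (SR).

    ranks: array of shape (n_studies, n_bins) holding within-study ranks.
    """
    ranks = np.asarray(ranks, dtype=float)
    rng = np.random.default_rng(seed)
    observed_sr = ranks.sum(axis=0)
    exceed = np.zeros(ranks.shape[1])
    for _ in range(n_sim):
        # Shuffle the bin location of the ranks independently within each study.
        null_sr = rng.permuted(ranks, axis=1).sum(axis=0)
        exceed += null_sr >= observed_sr
    return exceed / n_sim

def bonferroni_thresholds(n_bins, alpha=0.05):
    """Genome-wide (alpha / n_bins) and suggestive (1 / n_bins) thresholds."""
    return alpha / n_bins, 1.0 / n_bins

# Toy data (hypothetical): 5 studies, 20 bins, bin 0 top-ranked in every study.
demo_rng = np.random.default_rng(1)
toy_ranks = np.array(
    [np.concatenate(([20], demo_rng.permutation(np.arange(1, 20)))) for _ in range(5)]
)
p_sr = sr_p_values(toy_ranks, n_sim=2000)

genome_wide, suggestive = bonferroni_thresholds(124)
expected_nominal_bins = 0.05 * 124  # bins expected below P = 0.05 under the null
```

For 124 bins, `bonferroni_thresholds` reproduces the values quoted in the text (0.0004 and 0.0081 to four decimal places).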

We repeated the analyses using two additional bin widths: 20 cM (giving a total of 182 bins) and 40 cM (giving a total of 92 bins). Corresponding genome-wide and suggestive significance thresholds were calculated using a Bonferroni correction for the total number of bins in each analysis. We also assessed the effects of bin placement by offsetting the 30 cM bins by 15 cM, so that the bin boundaries of the ‘shifted 30 cM bins' were the mid-points of the original 30 cM bins, with the first and last 15 cM of the chromosome merged into a single bin to keep the number of bins consistent. A schematic illustration of the bin definitions used in these additional analyses is given in Figure 1.
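The bin definitions can be illustrated for a hypothetical 120 cM chromosome. The sketch below is an assumption-laden illustration (the function names are invented here, and the bins are idealised as exact fractions of the chromosome length): it builds the equal-width bins and the shifted bins, in which the two half-width end segments are merged into one bin.

```python
def equal_width_bins(chrom_length_cm, n_bins):
    """Bin intervals (start, end) dividing a chromosome into n_bins equal parts."""
    width = chrom_length_cm / n_bins
    return [(i * width, (i + 1) * width) for i in range(n_bins)]

def shifted_bins(chrom_length_cm, n_bins):
    """Bins offset by half a bin width, keeping the number of bins at n_bins.

    Each bin is a list of (start, end) intervals; the last bin merges the first
    and last half-width segments of the chromosome.
    """
    width = chrom_length_cm / n_bins
    half = width / 2
    inner = [[(half + i * width, half + (i + 1) * width)] for i in range(n_bins - 1)]
    ends = [(0.0, half), (chrom_length_cm - half, chrom_length_cm)]
    return inner + [ends]

# A 120 cM chromosome holds six 20 cM bins, four 30 cM bins or three 40 cM bins.
bins_20 = equal_width_bins(120, 6)
bins_30 = equal_width_bins(120, 4)
bins_40 = equal_width_bins(120, 3)
bins_30_shifted = shifted_bins(120, 4)
```

Here `bins_30_shifted` holds three full-width bins starting at 15 cM and a fourth bin made of the segments 0 to 15 cM and 105 to 120 cM.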

Figure 1 Graphical representation of the overlap between the different bin widths (20, 30 and 40 cM bin lengths) and using a shifted 30 cM bin width for a chromosome of 120 cM.

Meta-analysis of MS was performed both unweighted (assuming equal contribution from each study) and weighted by study size (using as weighting factor the square root of the number of affecteds in a study). These weights were scaled to a mean of 1, and range from 1.70 for the UK study15 to 0.18 for the study of one consanguineous family.18 When the IMSGC study was analysed with the seven remaining, independent studies, it had a weight of 2.87, and contributed 36% of the SR. Both weighted and unweighted results for the 11 studies and the eight studies including IMSGC are shown for the traditional 30 cM bin width analysis, whereas only weighted results are presented for the additional analyses (unweighted results being similar). We have also repeated the analyses removing the study of Modin et al18 performed on a single large consanguineous family, to evaluate the impact of this family on unweighted results.
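The weighting scheme can be sketched as follows. This is a minimal illustration, assuming that each study's ranks are multiplied by its weight before summation; the function names and the counts of affected individuals are hypothetical and are not taken from Table 1.

```python
import numpy as np

def study_weights(n_affected):
    """Square root of the number of affecteds per study, scaled to a mean of 1."""
    w = np.sqrt(np.asarray(n_affected, dtype=float))
    return w / w.mean()

def weighted_summed_ranks(ranks, weights):
    """Multiply each study's ranks by its weight, then sum across studies."""
    ranks = np.asarray(ranks, dtype=float)      # shape (n_studies, n_bins)
    weights = np.asarray(weights, dtype=float)  # shape (n_studies,)
    return (weights[:, None] * ranks).sum(axis=0)

# Hypothetical numbers of affected individuals in four studies.
weights = study_weights([400, 100, 25, 4])
toy_ranks = [[2, 4, 1, 3], [1, 4, 2, 3], [4, 3, 1, 2], [3, 1, 4, 2]]
weighted_sr = weighted_summed_ranks(toy_ranks, weights)

# Because the weights average 1, they sum to the number of studies, so a
# study's share of the total weight is weight / n_studies.
imsgc_share = 2.87 / 8  # IMSGC weight among eight studies, about 36%
```

The last line shows why a weight of 2.87 among eight studies corresponds to the 36% contribution quoted above.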

Results

The meta-analysis of 11 MS genome-wide linkage studies used 982 families with 2121 affected individuals, and the analysis of eight studies including IMSGC used 1145 families comprising 2490 affecteds (Table 1). Results from the weighted and unweighted analyses for the original 30 cM bin width are shown in Table 2 (and Figure 2), which lists all nominally significant bins (PSR<0.05), highlighting those with suggestive and genome-wide evidence for linkage (PSR<0.0081 and PSR<0.0004, respectively).

Table 2 Summary of nominally significant results (PSR<0.05), for all studies and IMSGC, using 30 cM bins
Figure 2 Weighted summed ranks from the analysis of (a) all studies, and (b) including the IMSGC study. Thresholds for suggestive significance are shown.

In the unweighted meta-analysis of 11 studies, suggestive evidence for linkage occurred on chromosome 6p (bin 6_2, PSR=0.0006) and chromosome 18p (bin 18_1, PSR=0.0054). Bins flanking bin 6_2 showed nominal evidence for linkage. In addition, nominal evidence for linkage was found on chromosome 6q (bin 6_4, PSR=0.0300), chromosome 10q (bin 10_3, PSR=0.0163) and chromosome 20p (bin 20_1, PSR=0.0263). The weighted analysis detected the same bins, with genome-wide evidence for linkage in bin 6_2 of PSR=0.00004 and almost suggestive significance in bin 18_1 (PSR=0.0084). Bin 6_4 was no longer significant. Additional nominal evidence for linkage to chromosome 11p (bin 11_1, PSR=0.0321) was also observed. When analyses were performed using different bin widths and the shifted 30 cM bin width, genome-wide evidence for linkage (after correction for the number of bins defined in each analysis) was consistently observed on chromosome 6p in all analyses (PSR varying from 0.00003 to 0.00009 in the weighted analyses). Chromosome 6q yielded suggestive evidence for linkage in the 20 cM bin and the shifted analyses (PSR=0.0024 and 0.0056, respectively), whereas nominal evidence for linkage was obtained with the 40 cM bin analysis (PSR=0.0214). The results for chromosome 6 GSMA analyses using different bin definitions are shown in Figure 3. The region on chromosome 18p was also detected with suggestive evidence in the 40 cM bin analysis (PSR=0.0070).

Figure 3 Graphical representation of the GSMA results for chromosome 6 in (a) all studies, and (b) including the IMSGC study, using different bin widths. The y-axis is the log base 10 of the PSR-values, multiplied by the total number of bins, to give a consistent definition for genome-wide and suggestive significance for all bin widths.
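One reading of the y-axis scaling in Figure 3 is sketched below. It assumes that the plotted quantity is log10(PSR × number of bins); the sign convention of the published figure is not stated in the caption, and the function name is hypothetical. Under this reading, the suggestive threshold (1/number of bins) maps to 0 and the genome-wide threshold (0.05/number of bins) maps to log10(0.05) for every bin width.

```python
import math

def scaled_log_p(p_sr, n_bins):
    """log10 of the P-value after multiplying by the number of bins."""
    return math.log10(p_sr * n_bins)

# Thresholds coincide across the three bin definitions (124, 182 and 92 bins).
suggestive_levels = [scaled_log_p(1 / n, n) for n in (124, 182, 92)]
genome_wide_levels = [scaled_log_p(0.05 / n, n) for n in (124, 182, 92)]

# Weighted result for bin 6_2 from the text (PSR = 0.00004 with 124 bins).
bin_6_2 = scaled_log_p(0.00004, 124)
```

On this scale `bin_6_2` lies below the genome-wide level, in line with the genome-wide significance reported for that bin.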

When the IMSGC study was used with the original 30 cM bin width (Table 2), genome-wide significant evidence for linkage was seen in two adjacent bins on chromosome 6p, bin 6_2 (PSR=0.0003) and bin 6_3 (PSR=0.0001) in the weighted analysis, whereas unweighted analysis provided suggestive evidence (PSR=0.0043 and PSR=0.0016, respectively). Suggestive evidence was also obtained on 10q (PSR=0.0077, in the unweighted analysis), and on 20p (PSR=0.0079, in the weighted analysis). Additional regions with nominal significance in the weighted and unweighted analyses were 6q (PSR=0.0103) and 18p (PSR=0.0221). When we analysed the data using different bin widths and using the shifted 30 cM bins, we observed consistency across the different analyses, yielding genome-wide evidence in the 6p region (Figure 3) and suggestive evidence on chromosome 10q for the 20 cM and the shifted 30 cM bin width analyses (PSR=0.0013 and PSR=0.0016, respectively).

As expected, removing the study of Modin et al18 did not affect the weighted GSMA analysis. Unweighted analysis identified the same regions as obtained previously in the full analysis and when the IMSGC study was included. Generally, higher significance was achieved; for example, bin 18_1 reached suggestive significance (PSR=0.0061) in the IMSGC analysis.

Discussion

The GSMA is heavily dependent on the definition used for bins, which form the chromosomal region in which linkage can be detected. Our original description of the GSMA listed bins of 30 cM width, defined by specific boundary markers (see GSMA website for full information). The original 30 cM bin width is the largest that allows having at least two bins of equal width on the smallest chromosomes. Smaller bin widths may provide finer localisation, given that the linkage signal is not too broad. However, peak linkage scores in individual studies may map to adjacent bins, and dilute the evidence for linkage in the GSMA, reducing both power and precision.21 This is also a problem where genes are located close to bin boundaries, and shifting bins to span the region from the mid-points of the original bins should increase the evidence for linkage.

In this study, we performed GSMA on all available genome-wide searches for MS using the traditional 30 cM bin width definition (dividing the genome into 124 bins). To assess the effects of bin widths on the identified regions and to evaluate the consistency of our results, we repeated the analyses using two additional bin widths of 20 cM (giving 182 bins) and 40 cM (giving 92 bins). Similarly, we assessed the effects of bin placement on GSMA results by shifting bins by approximately 15 cM in the original 30 cM bin width analysis. Previous GSMA studies have proposed other methods for reducing the dependency of the GSMA on bins of fixed width and location: Babron et al23 used a novel analysis method incorporating weighted SRs from adjacent bins; Cooper et al24 suggested repeating GSMA analysis by offsetting the start-site of bins by 7 and 13 cM in an attempt to refine the significant region for fine mapping.

The results from these GSMA analyses suggest that chromosomes 6p, 6q, 10q and 18p may harbour genes for MS. Each region shows suggestive evidence for linkage in at least one analysis, with consistency across the different analyses.

The most significant results were obtained in the HLA region and flanking bins, with genome-wide significance for linkage in bin 6_2 (31.0–63.3 cM in the Marshfield map). Significant or suggestive linkage at HLA was detected in the US, UK, Canadian and Finnish studies, but not in the other studies (Figure E1, see online supplement), confirming the ability of the GSMA to detect linkage in the presence of genetic heterogeneity. Nominal linkage to region 6q15–q23.2 (bin 6_4, 96.1–126.5 cM in the Marshfield map) was found in the full GSMA (unweighted) and when the IMSGC study was included (weighted and unweighted); together with suggestive evidence in two additional analyses (namely, the 20 cM bin width and the shifted 30 cM bins), this potentially represents an additional MS locus on chromosome 6. No individual study showed strong evidence for linkage in this region, although four had nominally significant LOD scores (LOD>0.7). In nine of the linkage studies, the meta-analysis is based on results from multipoint linkage analysis. Multipoint LOD or NPL scores can be elevated in a region of 30–50 cM flanking a maximum value, which leads to correlated SRs in neighbouring bins, several of which may show significant evidence for linkage. Linkage to HLA (bin 6_2) leads to increased SRs in adjacent bins 6_1 and 6_3. However, the significant results for bin 6_4 are unlikely to be due to a carry-over effect from the HLA locus. In other GSMA studies of autoimmune diseases, linkage to HLA affected only bins 6_1 to 6_3, and bin 6_4 was not significant.5, 6, 23 This result may therefore indicate a novel susceptibility locus for MS on chromosome 6q.

Nominal evidence for linkage on chromosome 10 is seen in the full meta-analysis (weighted and unweighted) and when the IMSGC study is included (suggestive linkage), with the most significant results obtained in bins 10_3 and 10_4 (approximate cytogenetic bands 10p12.1–q23.33). Suggestive evidence for linkage is also observed in the 20 cM bin width and in the shifted 30 cM analyses. This region showed nominal evidence for linkage in individual studies (Italy and Nordic) and in the pooled genotype study.3

Suggestive linkage on chromosome 18p (bin 18_1) is observed in the full analysis (unweighted), and nominal evidence for linkage in the weighted GSMA and including the IMSGC study, although that study shows LOD scores of almost zero in this region. Suggestive linkage to this region was also observed when removing the Modin et al18 study (both in the weighted and unweighted analyses). Suggestive evidence of linkage to 18p was found in the 40 cM bin width analysis including all studies.

Nominal evidence for linkage in the MS GSMA is also seen on chromosome 20 (bin 20_1) in all analyses, and this region showed suggestive evidence of linkage including the IMSGC study. None of the additional analyses confirmed linkage to this region with suggestive significance.

The regions described above have been selected based on the strength of significant results (suggestive evidence for linkage is only expected to be achieved once in a GSMA study), and on consistency of results across the different analyses, which implies that linkage evidence was robust to the choice of bin definitions. In interpreting the results, it should be noted that the power to detect linkage falls as the number of studies analysed decreases (assuming an equal effect size in all studies). This may lead to less significant P-values in the analysis with the IMSGC study, if the decrease in the number of studies is not compensated for by substantially increased significance in the IMSGC study.

Genetic research groups in MS have collaborated closely, in an attempt to improve the power of linkage studies. Notably, a pooled genotype analysis of 719 families showed suggestive linkage on 17q21 and 22q13, in addition to the MHC region.2 Our analysis includes all the studies in GAMES, but the Canadian and US cohorts of families are much extended (Canada: increased from 61 to 172 families; US: from 52 to 151 families). Additionally, we use French and Middle-Eastern studies, giving a total of 982 families. The IMSGC study regenotyped individuals in four studies from the US, UK, Australia and the Nordic countries, using a dense SNP marker map. They found suggestive linkage on chromosomes 6p21, 17q23 and 5q33, with weaker evidence for linkage on 20p12, which overlaps with the region found in our study.

The regions of suggestive linkage from the pooled genotype and the regenotyping studies are distinct from the regions found in our GSMA study. The discrepancies arise primarily from the different family sets included. Our previous MS GSMA study4 used only four studies and found linkage to MS in the MHC region of chromosome 6, and on chromosome 19, which is not found in the current study.

The power to detect a gene will depend on its genetic model and the specific analysis method used; given the difficulties in identifying linkage for MS, many complementary approaches should be used to analyse the data, with different methods having the potential to detect differently acting genes. The GSMA method has several strengths that may allow it to detect linkage where other methods have failed. The GSMA retains the design and chosen analysis method of each individual study, without the compromises that may be required for a unified analysis of pooled genotypes. It uses bins to assess linkage, and therefore may pool evidence for linkage that maximises at different locations in individual studies, which is a common feature of linkage studies in complex disorders.25 The method also copes well with heterogeneity, retaining good power to detect linkage when some studies are unlinked.26 One further advantage of the GSMA is that the correction for multiple testing of 124 bins is less stringent than that required in linkage analysis,27 so the power to detect linkage in the GSMA compares well with individual study genotyping.21 The relative power to detect linkage in this meta-analysis and in pooled genotype studies will depend on the precise genetic model for a gene, its effect size, and its role in the families/studies included. The lack of evidence for linkage in a region should not therefore be interpreted as evidence against linkage: the GSMA will only have power to identify linkage in specific scenarios where linkage is present (albeit weakly) in a substantial proportion of the studies.

In summary, this study presents a meta-analysis of genome-wide linkage studies in MS, using the GSMA method. The strongest evidence for linkage in this study occurred on chromosome 6p (HLA region), 6q, 10q and 18p. Aside from 6p, these regions did not show strong evidence for linkage in individual studies or in previous pooled analyses of the MS studies, and provide novel candidate regions. Together with findings from previous studies, these regions may be used to prioritise results from genome-wide association studies.28 We have also extended the GSMA methodology to include analysis of 20, 30 and 40 cM bin widths, and to allow for different bin starting points.