# Statistical colocalization of genetic risk variants for related autoimmune diseases in the context of common controls

## Abstract

Determining whether potential causal variants for related diseases are shared can identify overlapping etiologies of multifactorial disorders. Colocalization methods disentangle shared and distinct causal variants. However, existing approaches require independent data sets. Here we extend two colocalization methods to allow for the shared-control design commonly used in comparison of genome-wide association study results across diseases. Our analysis of four autoimmune diseases—type 1 diabetes (T1D), rheumatoid arthritis, celiac disease and multiple sclerosis—identified 90 regions that were associated with at least one disease, 33 (37%) of which were associated with 2 or more disorders. Nevertheless, for 14 of these 33 shared regions, there was evidence that the causal variants differed. We identified new disease associations in 11 regions previously associated with one or more of the other 3 disorders. Four of eight T1D-specific regions contained known type 2 diabetes (T2D) candidate genes (COBL, GLIS3, RNLS and BCAR1), suggesting a shared cellular etiology.

### Supplementary Figure 4 A Manhattan plot of the 19p13.2 region containing the candidate causal genes ICAM1, ICAM3 and TYK2.

The SNPs considered most likely to be causal by our analysis are highlighted. The green signal is shared by all diseases, whereas the magenta signal is unique to celiac disease (CEL).

### Supplementary Figure 5 Information from the UCSC Genome Browser for the 1q24.3 FASLG region.

This region shows association with type 1 diabetes and celiac disease. Note that there is strong evidence of regulatory activity in the region of rs78037977, suggesting that this SNP may be functionally significant.

### Supplementary Figure 6 Signal clouds for rs78037977, a SNP within the 1q24.3 region containing candidate causal gene FASLG.

This SNP was removed from the celiac disease data in the original analysis because it failed a missingness check. However, the clustering shown here is of good quality, implying that the rs78037977 genotype calls can be considered reliable.

### Supplementary Figure 7 A Manhattan plot of the 7p12.2 region containing the candidate causal gene IKZF1.

This gene overlaps two Immunochip regions separated by a recombination hotspot, one at the 5ʹ end and one at the 3ʹ end. The 5ʹ region contains a colocalized signal for multiple sclerosis (MS) and type 1 diabetes (T1D), whereas the 3ʹ end contains only a T1D signal.

### Supplementary Figure 8 P values for type 2 diabetes at the peak SNP for all T1D-associated regions.

These regions are divided into those associated with T1D only and those also associated with other autoimmune diseases. Regions associated with no other autoimmune disease tend to have lower type 2 diabetes (T2D) P values. T2D data were taken from the stage 1 GWAS and stage 2 Metabochip study (summary statistics downloaded from http://diagram-consortium.org/).

### Supplementary Figure 9 P-value and colocalization data from the regions with newly identified associations.

For each region, the most significant SNP for the known association is identified, and its P value for the newly identified association is computed. This P value is plotted against the posterior probability of colocalization (as computed using the Bayesian colocalization approach).
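The lookup described in this caption can be illustrated with a minimal sketch. It assumes per-SNP P values are available as plain dictionaries for the known and the newly identified trait; the SNP names, P values, posterior probability, and function names below are hypothetical and are not taken from the study.

```python
import math

# Hypothetical per-SNP P values for one region (illustrative numbers only).
region = {
    "known_trait": {"snp_a": 3e-9, "snp_b": 2e-12, "snp_c": 4e-4},
    "new_trait": {"snp_a": 1e-3, "snp_b": 5e-6, "snp_c": 0.2},
}

# Hypothetical posterior probability of colocalization for this region.
posterior_coloc = 0.85


def peak_snp(p_values):
    """Return the SNP with the smallest P value."""
    return min(p_values, key=p_values.get)


def new_trait_p_at_known_peak(region):
    """Find the peak SNP for the known trait and look up its P value
    for the newly identified trait (None if the SNP is absent there)."""
    lead = peak_snp(region["known_trait"])
    return lead, region["new_trait"].get(lead)


lead, p_new = new_trait_p_at_known_peak(region)

# One point for a plot of this kind: (-log10 P, posterior probability).
point = (-math.log10(p_new), posterior_coloc)
print(lead, p_new, point)
```

Repeating this for every region with a newly identified association would give one point per region; how the points are drawn is left open here.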

## Supplementary information

### Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 4 and 5, and Supplementary Note. (PDF 2234 kb)

### Supplementary Table 1

The regions analyzed and the number of SNPs within each region (after quality control). (CSV 6 kb)

### Supplementary Table 2

Detailed results from the two colocalization methods for each region/trait pair. (CSV 110 kb)

### Supplementary Table 3

The results from the conditional Bayesian analysis. (CSV 7 kb)

