Conjunctival fibrosis and the innate barriers to Chlamydia trachomatis intracellular infection: a genome wide association study

Chlamydia trachomatis causes both trachoma and sexually transmitted infections. These diseases have similar pathology and potentially similar genetic predisposing factors. We aimed to identify polymorphisms and pathways associated with pathological sequelae of ocular Chlamydia trachomatis infections in The Gambia. We report a discovery phase genome-wide association study (GWAS) of scarring trachoma (1090 cases, 1531 controls) that identified 27 SNPs with strong, but not genome-wide significant, association with disease (5 × 10−6 > P > 5 × 10−8). The most strongly associated SNP (rs111513399, P = 5.38 × 10−7) fell within a gene (PREX2) with homology to factors known to facilitate chlamydial entry to the host cell. Pathway analysis of the GWAS data showed significant enrichment for mitotic cell cycle processes (P = 0.001), the immune response (P = 0.00001) and multiple cell surface receptor signalling pathways. New analyses of published transcriptome data sets from The Gambia, Tanzania and Ethiopia also revealed that the same cell cycle and immune response pathways were enriched at the transcriptional level in various disease states. Although unconfirmed, the data suggest that genetic associations with chlamydial scarring disease may be focussed on processes relating to the immune response, the host cell cycle and cell surface receptor signalling.

Clusters filled in green had at least one pathway that was significant in each of the two pathways analyses.

*Expected frequency of the effect allele. **The OR indicates the estimated allele frequency odds ratio for the effect allele. Values less than one indicate that the effect allele is less common in cases than controls, and vice versa.

Tests for population stratification
The directly genotyped SNP data and the 1000 Genomes reference data 31 were filtered to obtain a subset of variants with MAF ≥ 0.01 and HWE p-value > 1 × 10−5 in both datasets. The

Pathways analysis
ALIGATOR counts the number of genes in a pathway that contain a SNP with a P_EMMAX value more extreme than a pre-specified threshold value and then determines the significance of pathways in permutation tests. The signal-to-noise ratio of the test can be controlled by calibrating the P threshold value that is considered nominally significant. The ALIGATOR literature recommends exploring the effects of using a number of different thresholds.
The number of pathways expected to be significant by chance alone at P < 0.01 and P < 0.05 is predictable (given 1345 pathways) as 13.45 and 67.25, respectively. An optimal threshold for pathways discovery might be one that returns more significant pathways than would be expected by chance alone. In this study, the effects of using threshold values of 0.01, 0.001 and 0.0001 were explored. Using a threshold value of 0.01 led to fewer significant pathways than would be expected by chance (Supplementary table 1), suggesting that this cut-off was too relaxed.
Threshold values of 0.001 and 0.0001 both returned more significant pathways than expected, with a peak at 0.001 (Supplementary table 1); the 0.001 threshold was therefore used in the main analysis.
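The chance expectation quoted above is simply the number of pathways multiplied by the significance level. A minimal sketch of that arithmetic follows; the function names are illustrative, and the calculation assumes that the pathway tests behave independently under the null, which is only an approximation because pathways share genes.

```python
# Expected number of pathways reaching significance by chance alone.
# Assumption: tests are treated as independent under the null hypothesis.
N_PATHWAYS = 1345  # number of pathways tested, as stated in the text


def expected_by_chance(alpha, n_pathways=N_PATHWAYS):
    """Return the number of pathways expected to have P < alpha by chance."""
    return n_pathways * alpha


def exceeds_chance(observed, alpha, n_pathways=N_PATHWAYS):
    """True if the observed count of significant pathways exceeds expectation."""
    return observed > expected_by_chance(alpha, n_pathways)


for alpha in (0.01, 0.05):
    print(alpha, round(expected_by_chance(alpha), 2))
```

A SNP threshold would then be judged informative when `exceeds_chance` is true for the observed counts at both significance levels.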
The high number of SNPs involved, combined with a high number of permutations, makes this process computationally intensive, so a relatively modest number (n = 100) of re-samplings was initially used to pre-screen the pathways. Candidate pathways for fine ALIGATOR analysis were those with p < 0.05 in the initial screening. Candidate pathways were then tested again by ALIGATOR using 100,000 permutations of the phenotypes.
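The two-stage strategy (a cheap pre-screen followed by a fine test of the candidates only) can be illustrated with a simplified sketch. This is not ALIGATOR's implementation: as a stand-in for the phenotype permutations described above, the null distribution here is built from random gene sets of the same size as the pathway. All names (`gene_min_p`, `count_hit_genes`, `empirical_p`, `two_stage`) are hypothetical, and `gene_min_p` is assumed to map every gene to the smallest SNP P value observed within it.

```python
import random


def count_hit_genes(genes, gene_min_p, threshold):
    """Count genes containing at least one SNP with P below the threshold."""
    return sum(1 for g in genes if gene_min_p.get(g, 1.0) < threshold)


def empirical_p(pathway, gene_min_p, threshold, n_perm, rng):
    """Empirical P: share of random equal-sized gene sets with >= observed hits.

    Swapped-in null model: random gene sets, not phenotype permutations.
    """
    observed = count_hit_genes(pathway, gene_min_p, threshold)
    all_genes = list(gene_min_p)
    size = len(pathway)
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, size)
        if count_hit_genes(sample, gene_min_p, threshold) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)


def two_stage(pathways, gene_min_p, threshold=0.001, screen_perm=100,
              fine_perm=100_000, screen_alpha=0.05, seed=0):
    """Pre-screen every pathway cheaply, then re-test only the candidates."""
    rng = random.Random(seed)
    results = {}
    for name, genes in pathways.items():
        p_screen = empirical_p(genes, gene_min_p, threshold, screen_perm, rng)
        if p_screen < screen_alpha:
            results[name] = empirical_p(genes, gene_min_p, threshold,
                                        fine_perm, rng)
    return results
```

The defaults mirror the numbers in the text (threshold 0.001, 100 screening re-samplings, p < 0.05 to qualify, 100,000 fine permutations). The benefit of the design is that the expensive fine test runs only for the small subset of pathways that pass the screen.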
For each pathway in PODA analysis, a score 'S' is determined for each individual's sample.
This score describes the genetic distance of the current sample from other cases relative to its distance from controls. The distributions of these scores in cases and controls are then compared to obtain the pathway Distinction Score 'DS'. DS is normalised by resampling with randomisation of the phenotypes. It is tested for significance by an implementation of the permutation test in which a set of arbitrary pathways of equal length to the pathway of interest is tested for association with the phenotype, with resampling of phenotypes. The DSp is a confidence measure that is analogous to a standard p-value and that describes the proportion of permutations in which the DS value was larger in the simulated pathway than in the true pathway. The reported Odds Ratio (OR) for the pathway describes the increase in relative odds of disease given each unit increase in S. For each pathway we implemented 100 re-samplings and 1000 permutations of the test.
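The DSp definition above reduces to a simple proportion, which the following sketch makes concrete. It is not PODA code: the function names are hypothetical, and the DS values for the true and simulated pathways are assumed to have been computed already. The second helper only illustrates how a per-unit OR compounds over a larger change in S.

```python
def dsp(observed_ds, simulated_ds):
    """Proportion of simulated pathways whose DS is larger than the true DS."""
    if not simulated_ds:
        raise ValueError("at least one simulated DS value is required")
    larger = sum(1 for ds in simulated_ds if ds > observed_ds)
    return larger / len(simulated_ds)


def odds_multiplier(or_per_unit, delta_s):
    """Factor by which the odds of disease change when S rises by delta_s.

    Follows from the OR being defined per unit increase in S.
    """
    return or_per_unit ** delta_s
```

With 1000 permutations, as used in the study, the smallest non-zero value `dsp` can return is 0.001.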