Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis

Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T cells over 24 h, to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of T cell mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and confirm assigned genes or show complex interactions for 20% of disease associated loci, including FOXO1, which we confirm using CRISPR/Cas9.


Data
Raw sequencing data and processed counts data for ATAC-seq, RNA-seq, CHi-C and Hi-C that support the findings of this study have been deposited in the National Center for Biotechnology Information's Gene Expression Omnibus and are accessible through GEO Series accession number GSE138767 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138767). Data for Fig. 2-4, 5b, 6b, 7 and Supplementary Fig. 1-8

The study was observational and not designed to test a specific hypothesis. No statistical methods were used to predetermine sample size. The final sample size was decided based on similar gene expression and chromatin QTL mapping studies performed previously.
Following a common quality-control pipeline for sequencing data, ATAC-seq duplicates and reads with mapping quality lower than 30 were filtered out. Duplicate reads and mitochondrial reads were removed. For RNA-seq, genes whose summed counts across the six time points were less than 10 were removed in each replicate, and only genes expressed in both replicates were retained. For Hi-C/Capture Hi-C, the maximum and minimum di-tag lengths were set to 800 and 150, respectively, after comparing Hi-C results from different di-tag lengths. Interactions with a CHiCAGO score over 5 in at least one time point were kept, where 5 is the default recommended threshold in the CHiCAGO package.

CD3/CD28 Dynabeads, SYTOX red stain and TruStain FcX were validated by the manufacturer. No further validation was carried out. Purified anti-human CD3 antibody and anti-human CD4 antibody (Biolegend): each lot of these antibodies is quality-control tested by immunofluorescent staining with flow cytometric analysis. The OKT3 monoclonal antibody reacts with an epitope on the epsilon subunit within the human CD3 complex. The OKT4 antibody binds to the D3 domain of CD4.
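
The RNA-seq and CHiCAGO exclusion rules stated above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the dictionary-based data layout and the example gene and interaction identifiers are all hypothetical, and the strict "greater than 5" comparison is an assumption based on the wording "score over 5".

```python
from typing import Dict, List, Set

def filter_expressed_genes(
    replicates: List[Dict[str, List[int]]], min_total: int = 10
) -> Set[str]:
    """Keep genes whose counts, summed across time points, reach
    min_total in every replicate (hypothetical helper)."""
    kept = None
    for counts in replicates:
        # Genes passing the per-replicate threshold on summed counts.
        passing = {gene for gene, row in counts.items() if sum(row) >= min_total}
        # Retain only genes that pass in all replicates seen so far.
        kept = passing if kept is None else kept & passing
    return kept if kept is not None else set()

def filter_interactions(
    scores: Dict[str, List[float]], threshold: float = 5.0
) -> Dict[str, List[float]]:
    """Keep interactions whose CHiCAGO score exceeds the threshold
    in at least one time point (strict comparison is an assumption)."""
    return {key: row for key, row in scores.items() if max(row) > threshold}

# Hypothetical toy data: six time points per gene, two replicates.
rep1 = {"GENE_A": [5, 3, 2, 1, 0, 0], "GENE_B": [1, 1, 1, 1, 1, 1], "GENE_C": [10, 0, 0, 0, 0, 0]}
rep2 = {"GENE_A": [4, 4, 4, 0, 0, 0], "GENE_B": [20, 0, 0, 0, 0, 0], "GENE_C": [1, 2, 3, 1, 1, 1]}
chicago_scores = {
    "bait1:frag1": [1.2, 5.0, 3.3, 0.0, 2.1, 4.9],
    "bait1:frag2": [0.5, 6.1, 2.0, 0.0, 0.0, 1.0],
}

print(sorted(filter_expressed_genes([rep1, rep2])))
print(sorted(filter_interactions(chicago_scores)))
```

In this toy input only GENE_A passes the count threshold in both replicates, and only the second interaction has a score strictly above 5 at some time point.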