A leukemia-protective germline variant mediates chromatin module formation via transcription factor nucleation

Noncoding variants coordinate changes in transcription factor (TF) binding and chromatin mark enrichment over regions spanning >100 kb. These molecularly coordinated regions, termed “variable chromatin modules” (VCMs), provide a conceptual framework for how regulatory variation might shape complex traits. To better understand the molecular mechanisms underlying VCM formation, we mechanistically dissect a VCM-modulating noncoding variant that is associated with reduced chronic lymphocytic leukemia (CLL) predisposition and disease progression. This common germline variant is a 5-bp indel that controls the activity of an AXIN2 gene-linked VCM by creating a MEF2 binding site, which, upon MEF2 binding, activates a super-enhancer-like regulatory element. This triggers a large change in TF binding activity and chromatin state at an enhancer cluster spanning >150 kb, coinciding with subtle, long-range chromatin compaction and robust AXIN2 up-regulation. Our results support a model in which the indel acts as an AXIN2 VCM-activating TF nucleation event that modulates CLL pathology.

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code
Data collection: BD FACSDiva software v8.0.2 was used for flow cytometry data acquisition.

All studies must disclose on these points even when the disclosure is negative.

Sample size
No formal sample size calculation was performed. At least three biological or technical replicates were obtained per experiment, based on standards in the field and to allow statistical testing (e.g. a t-test). Additional replicates were obtained when possible to increase statistical power. For CRISPR editing, we aimed for >3 clones per genotype and report results for all good clones obtained, from two batches (ALT and REF) or one batch (ALT.PU.1 and MEF2Δ). For the in vivo experiment, we used two batches of 10 mice each (20 mice in total), so that the value E = total number of mice − number of groups fell between 10 and 20 (resource equation; Charan and Kantharia, 2013, PMID: 24250214).
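The resource equation cited above reduces to simple arithmetic: E is the total number of animals minus the number of groups, and a design is considered adequate when E lies between 10 and 20. The sketch below encodes that check; the function names are hypothetical, and the group count used in the example (two groups) is an assumption for illustration only, since the number of groups is not stated here.

```python
def resource_equation_e(total_animals: int, n_groups: int) -> int:
    """E = total number of animals - number of groups (Charan and Kantharia, 2013)."""
    if n_groups < 1:
        raise ValueError("at least one group is required")
    if total_animals < n_groups:
        raise ValueError("each group needs at least one animal")
    return total_animals - n_groups


def e_is_adequate(e: int, low: int = 10, high: int = 20) -> bool:
    """True when E falls in the 10-20 range recommended by the resource equation."""
    return low <= e <= high


# Hypothetical example: 2 batches x 10 mice, ASSUMING 2 groups.
total_mice = 2 * 10
assumed_groups = 2  # assumption for illustration, not taken from the study
e = resource_equation_e(total_mice, assumed_groups)
print(e, e_is_adequate(e))  # 18 True
```

Because E only depends on the totals, adding groups without adding animals lowers E, while adding animals beyond E = 20 is regarded by this method as wasteful rather than harmful.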
Data exclusions
No data were excluded during the course of this study.

Replication
In vitro DNA pulldown was performed in three technical replicates. ORCA was performed in two independent batches. NGS Capture-C and ATAC-seq were performed in three biological replicates. ChIP-qPCR was performed in six biological replicates (each qPCR in three technical replicates). Luciferase assays were performed in four biological replicates. RT-qPCR for AXIN2 expression in cell lines was performed in three technical replicates and three biological replicates. RT-qPCR for AXIN2 expression in CRISPR clones was performed in three technical replicates per clone (each clone was considered an independent biological replicate, and we aimed for >3 clones per genotype). RNA-seq was performed in three biological replicates. In vitro proliferation assays were performed in three biological replicates and five technical replicates. The MEC1 in vivo competition experiment was performed in two batches of 10 mice each. All replicated experiments showed concordant results (successful replication), except for some CRISPR clones, which showed some degree of heterogeneity.
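For the CRISPR clones, the biological replicate is the clone, not the individual qPCR well. One common way to respect that level before comparing genotypes is to average the technical replicates within each clone first. The sketch below assumes that convention; the source does not state how technical replicates were aggregated, and the function name, genotype labels, clone IDs and values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_technical_replicates(measurements):
    """Average technical replicates so each clone contributes one value.

    measurements: iterable of (genotype, clone_id, value) tuples, one per qPCR well.
    Returns {genotype: [per-clone mean, ...]} with clones in sorted order.
    """
    wells_per_clone = defaultdict(list)
    for genotype, clone_id, value in measurements:
        wells_per_clone[(genotype, clone_id)].append(value)

    per_genotype = defaultdict(list)
    for (genotype, _clone_id), values in sorted(wells_per_clone.items()):
        per_genotype[genotype].append(mean(values))
    return dict(per_genotype)


# Hypothetical relative-expression values: three technical replicates per clone.
data = [
    ("ALT", "A1", 1.0), ("ALT", "A1", 2.0), ("ALT", "A1", 3.0),
    ("ALT", "A2", 4.0), ("ALT", "A2", 4.0), ("ALT", "A2", 4.0),
    ("REF", "R1", 1.0), ("REF", "R1", 1.0), ("REF", "R1", 1.0),
]
print(collapse_technical_replicates(data))
# {'ALT': [2.0, 4.0], 'REF': [1.0]}
```

Any subsequent test would then use n = number of clones per genotype, not the number of wells.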
Randomization
Mice were randomized when assigning them to injection with either the MEC1-AXIN2-mCherry/MEC1-ctr-GFP or the MEC1-AXIN2-GFP/MEC1-ctr-mCherry mixture. Randomization was not applied to the other experiments, as is common in this field, because most of them start from a single cell line or from defined samples.
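The allocation of mice to the two colour-swapped mixtures can be illustrated with a seeded shuffle. This is a minimal sketch under stated assumptions: the source does not describe the randomization procedure, so the balanced half/half split, the seed and the mouse IDs are hypothetical choices, not the authors' protocol.

```python
import random

MIXTURES = (
    "MEC1-AXIN2-mCherry/MEC1-ctr-GFP",
    "MEC1-AXIN2-GFP/MEC1-ctr-mCherry",
)


def assign_mixtures(mouse_ids, seed=None):
    """Randomly split mice between the two colour-swapped mixtures.

    Shuffles the IDs and gives the first half to one mixture and the rest to
    the other, so group sizes differ by at most one (assumed balanced design).
    """
    ids = list(mouse_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {MIXTURES[0]: sorted(ids[:half]), MIXTURES[1]: sorted(ids[half:])}


# Hypothetical batch of 10 mice.
batch = [f"mouse{i:02d}" for i in range(1, 11)]
groups = assign_mixtures(batch, seed=1)
for mixture, mice in groups.items():
    print(mixture, mice)
```

Recording the seed makes the allocation reproducible and auditable; swapping the reporter colours between the two mixtures controls for any effect of GFP versus mCherry itself.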

Blinding
The investigators were not blinded during the experiments or sample processing. Results were obtained using objective quantitative methods.

Wild animals
The study did not involve wild animals.
Field-collected samples
The study did not involve field-collected samples.

Ethics oversight
Mice were bred and maintained at the EPFL animal facility. All animal work was carried out in accordance with Swiss national guidelines. This study was reviewed and approved by the cantonal veterinary service, Vaud.
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Human research participants
Policy information about studies involving human research participants

Population characteristics
The study is observational, retrospective and non-interventional. The main clinical characteristics of the patients (i.e. age, gender, date of diagnosis and of date) were collected anonymously.

Recruitment
Consecutive CLL patients referred to our institution were recruited.

Ethics oversight
The study was approved by our local Ethics Committee: Comitato Etico Interaziendale di Novara, Italy (study numbers CE 8/11 and CE 120/19).
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Flow Cytometry
Plots
Confirm that:
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.

Sample preparation
… male mice per experiment. Prior to injection, we analyzed the percentage of each cell population in the input by flow cytometry on an LSR Fortessa (BD Biosciences). After 26 days, the mice were sacrificed, and bone marrow immune cells were extracted from both legs (femur and tibia) and the hip bones. Single-cell suspensions were prepared as previously described (Wilson et al., 2001, PMID: 11581321). Cells were stained with anti-human CD20-PE/Cy7 (302312, BioLegend), anti-mouse CD45-APC (17-0451-83, eBioscience) and DAPI, and analyzed by flow cytometry.

Instrument
Flow cytometry data from the MEC1 input cells were acquired on an LSR Fortessa (BD) analyzer equipped with five lasers and 18 detectors. Its configuration is adapted for new-generation fluorochromes such as Brilliant Violet and Brilliant UltraViolet dyes. Xenograft samples were sorted on a FACSAria III (BD) sorter or analyzed on an LSR Fortessa (BD) analyzer.

Software
BD FACSDiva software v8.0.2 was used for flow cytometry data acquisition on the LSR Fortessa and the FACSAria III. FCS files were analyzed with FlowJo v10.7.1 from TreeStar (FlowJo, https://www.flowjo.com/solutions/flowjo; RRID:SCR_008520).

Cell population abundance
The sorted and analyzed mCD45-CD20+ cell population, subdivided into GFP+ and mCherry+ populations, ranged from 5% to 35% of the total cell population, depending on CLL progression. Events were acquired until the least common population (GFP+ or mCherry+ cells) reached 20,000 events.
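The acquisition stopping rule amounts to: keep reading gated events until the rarer of the two reporter populations has 20,000 events. The sketch below expresses that rule over a stream of per-event labels. The function name, the label strings and the idea of an already-classified event stream are assumptions for illustration; they are not an interface of FACSDiva or FlowJo.

```python
from typing import Iterable


def events_until_target(labels: Iterable[str], target: int = 20_000) -> int:
    """Count events read until both GFP+ and mCherry+ counts reach `target`.

    labels: hypothetical stream of per-event labels; 'GFP' and 'mCherry' are
    counted, any other label (e.g. ungated events) is read but ignored.
    Raises ValueError if the stream ends before the rarer population reaches target.
    """
    counts = {"GFP": 0, "mCherry": 0}
    for n, label in enumerate(labels, start=1):
        if label in counts:
            counts[label] += 1
            # The rarer population decides when to stop.
            if min(counts.values()) >= target:
                return n
    raise ValueError(f"stream ended before the rarer population reached {target}: {counts}")


# Toy stream with a target of 3 instead of 20,000.
print(events_until_target(["GFP", "mCherry", "other"] * 3, target=3))  # 8
```

Stopping on the minimum rather than on a fixed total guarantees that both populations are counted with at least the target precision, even when one reporter is strongly depleted.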

Gating strategy
Sorts were performed at room temperature with a 100 μm nozzle and the sorting mask set to purity (sorting purity greater than 98%). For each sample, the gating strategy was as follows:
1. FSC/SSC
2. Single cells
3. DAPI- or Zombie NIR-negative cells
4. Exclusion of murine CD45+ cells (anti-mouse CD45-APC)
5. Human CD20-PE-Cy7+ cells (anywhere from 5% to 35%, depending on how the disease is progressing)
6. Subdivision of the CD20+ cells into mCherry+ (sorting gate 1) and GFP+ (sorting gate 2) populations
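The gate hierarchy above can be read as a sequence of filters applied to each event. The sketch below follows the gate order described, but everything else is an assumption: the field names, the threshold values, and the simplification of FSC/SSC and singlet gates (normally drawn as polygons in FlowJo or FACSDiva) into rectangular thresholds and an FSC-H/FSC-A ratio. Treating double-positive events as unsorted is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    fsc_a: float        # forward scatter area
    fsc_h: float        # forward scatter height
    ssc_a: float        # side scatter area
    viability: float    # DAPI or Zombie NIR signal
    mcd45_apc: float    # anti-mouse CD45-APC
    hcd20_pecy7: float  # anti-human CD20-PE-Cy7
    gfp: float
    mcherry: float


# Hypothetical thresholds (arbitrary units); real gates are drawn on the data.
THRESHOLDS = {
    "fsc_min": 5e4,
    "ssc_max": 2e5,
    "singlet_ratio": (0.8, 1.2),
    "viability_max": 1e3,
    "mcd45_max": 1e3,
    "hcd20_min": 1e3,
    "reporter_min": 1e3,
}


def sorting_gate(e: Event, t: dict = THRESHOLDS) -> Optional[str]:
    """Apply the gates in order; return the sorting gate, or None if excluded."""
    if e.fsc_a < t["fsc_min"] or e.ssc_a > t["ssc_max"]:  # 1. FSC/SSC
        return None
    low, high = t["singlet_ratio"]
    if not low <= e.fsc_h / e.fsc_a <= high:  # 2. single cells
        return None
    if e.viability >= t["viability_max"]:  # 3. DAPI/Zombie NIR-negative only
        return None
    if e.mcd45_apc >= t["mcd45_max"]:  # 4. exclude murine CD45+ cells
        return None
    if e.hcd20_pecy7 < t["hcd20_min"]:  # 5. keep human CD20+ cells
        return None
    mcherry_pos = e.mcherry >= t["reporter_min"]  # 6. split by reporter
    gfp_pos = e.gfp >= t["reporter_min"]
    if mcherry_pos and not gfp_pos:
        return "sorting gate 1 (mCherry+)"
    if gfp_pos and not mcherry_pos:
        return "sorting gate 2 (GFP+)"
    return None  # double-positive or double-negative events are not sorted
```

Ordering the checks this way mirrors a hierarchical gate tree: each event only reaches the reporter split after passing every upstream gate.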
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.