Genome organization and DNA accessibility control antigenic variation in trypanosomes

Many evolutionarily distant pathogenic organisms have evolved similar survival strategies to evade the immune responses of their hosts. These include antigenic variation, through which an infecting organism prevents clearance by periodically altering the identity of proteins that are visible to the immune system of the host1. Antigenic variation requires large reservoirs of immunologically diverse antigen genes, which are often generated through homologous recombination, as well as mechanisms to ensure the expression of one or very few antigens at any given time. Both homologous recombination and gene expression are affected by three-dimensional genome architecture and local DNA accessibility2,3. Factors that link three-dimensional genome architecture, local chromatin conformation and antigenic variation have, to our knowledge, not yet been identified in any organism. One of the major obstacles to studying the role of genome architecture in antigenic variation has been the highly repetitive nature and heterozygosity of antigen-gene arrays, which has precluded complete genome assembly in many pathogens. Here we report the de novo haplotype-specific assembly and scaffolding of the long antigen-gene arrays of the model protozoan parasite Trypanosoma brucei, using long-read sequencing technology and conserved features of chromosome folding4. Genome-wide chromosome conformation capture (Hi-C) reveals a distinct partitioning of the genome, with antigen-encoding subtelomeric regions that are folded into distinct, highly compact compartments. In addition, we performed a range of analyses (Hi-C, fluorescence in situ hybridization, assays for transposase-accessible chromatin using sequencing and single-cell RNA sequencing) that showed that deletion of the histone variants H3.V and H4.V increases antigen-gene clustering, DNA accessibility across sites of antigen expression and switching of the expressed antigen isoform, via homologous recombination. Our analyses identify histone variants as a molecular link between global genome architecture, local chromatin conformation and antigenic variation.

Software and code
Policy information about availability of computer code

Custom code
The described analysis workflows and the required custom-made Unix shell, Python and R scripts have been deposited and are accessible at Zenodo (DOI 10.5281/zenodo.823671). All data are available via NCBI GEO (accession GSM2586510) and EBI ENA (accession PRJEB18945).

Data
Policy information about availability of data
The RNA-seq, scRNA-seq, ChIP-seq, ATAC-seq and Hi-C sequencing data used in this publication have been deposited in NCBI's Gene Expression Omnibus69

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
Sample sizes were not statistically predetermined.

Data exclusions
In the scRNA-seq analysis, only data from cells with more than 500 genes with more than 10 reads per gene were included. Data from 34 single cells were therefore excluded because they did not meet these criteria.
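
The exclusion criterion above can be illustrated with a minimal sketch. This is not the authors' deposited script: the function names, the dictionary-of-lists layout of the count matrix and the toy data are assumptions made for illustration, and "more than" is read as a strict inequality for both thresholds.

```python
from typing import Dict, List, Tuple

# Thresholds taken from the exclusion criterion stated in the text.
MIN_READS_PER_GENE = 10   # a gene counts as detected only above this read count
MIN_DETECTED_GENES = 500  # a cell is kept only above this number of detected genes


def detected_genes(counts: List[int], min_reads: int = MIN_READS_PER_GENE) -> int:
    """Return how many genes in one cell have more than `min_reads` reads."""
    return sum(1 for c in counts if c > min_reads)


def filter_cells(
    count_matrix: Dict[str, List[int]],
    min_genes: int = MIN_DETECTED_GENES,
    min_reads: int = MIN_READS_PER_GENE,
) -> Tuple[List[str], List[str]]:
    """Split cell IDs into (kept, excluded) using the two thresholds.

    `count_matrix` maps a hypothetical cell ID to its per-gene read counts.
    """
    kept: List[str] = []
    excluded: List[str] = []
    for cell_id, counts in count_matrix.items():
        if detected_genes(counts, min_reads) > min_genes:
            kept.append(cell_id)
        else:
            excluded.append(cell_id)
    return kept, excluded


if __name__ == "__main__":
    # Toy data with a lowered gene threshold so the example stays readable.
    toy = {
        "cell_A": [12, 30, 11, 0],  # three genes above 10 reads
        "cell_B": [10, 10, 50, 2],  # one gene above 10 reads (10 is not > 10)
    }
    kept, excluded = filter_cells(toy, min_genes=2)
    print(kept, excluded)  # ['cell_A'] ['cell_B']
```

With the default thresholds, a cell with exactly 500 detected genes, or with genes at exactly 10 reads, is excluded; whether the original analysis treated these boundary cases the same way is not stated in the text.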

Replication
All attempts at replication were successful. Hi-C experiments were performed in triplicate for each cell line. RNA-seq experiments were performed in triplicate for each isolate. MNase-ChIP-seq experiments were performed in duplicate, with an input control for each experiment. Scc1-ChIP experiments were performed in triplicate, with an input control for each experiment. ATAC-seq experiments were performed in duplicate with different cell numbers. Two gDNA samples were included as internal controls for accessibility. Single-cell RNA-seq analysis is based on 40 and 408 cells per cell line, respectively. Flow cytometry experiments were performed in triplicate for each cell line. Quantification of telomere clusters (FISH microscopy) was performed in duplicate (total number of cells: 1128).

Randomization
Not relevant for this study, as allocation of samples or organisms was neither needed nor intended.

Blinding
Image acquisition and analysis were performed in a blinded fashion. a) For IF images, pictures were taken at random from the slide without "searching" for a suitable area. b) FISH images were taken and analyzed by an unbiased, external investigator, who also chose the representative images for the publication.
For other analyses, investigators were not blinded.

Validation
Primary antibodies were validated for specificity and checked for cross-reactivity as described in the corresponding publications:
BB2 antibody: Bastin, P., Bagherzadeh, A., Matthews, K. R. & Gull, K. A novel epitope tag system to study protein targeting and organelle biogenesis in Trypanosoma brucei. Mol. Biochem. Parasitol. 77, 235-239 (1996).

Methodology
Sample preparation
1 × 10^6 cells were centrifuged in a chilled microtube at 1,500 g for 4 min at 4 °C. Cells were resuspended in 100 µl of ice-cold HMI-11 and a VSG-specific antibody (Anti-VSG-2 or Anti-VSG