Introduction

Long-range chromatin contacts that bring genes and regulatory sequences in close proximity are necessary for co-transcription of biologically related and developmentally co-regulated genes.1, 2 Correspondingly, genomic structural changes were associated with disruption of the organization of chromatin compartments by shifting regulatory elements between domains and/or modifying domain boundaries, which resulted in ectopic interactions, gene misexpression and disease.3, 4 In the last 15 million years the 16p11.2–12.2 region rapidly integrated segmental duplications contributing to profound modifications of these chromosomal bands in hominoids.5, 6 It allowed the emergence of new transcripts7 and placed the whole region at risk for various recurrent rearrangements8, 9, 10 through non-allelic homologous recombination11 (Figure 1). These rearrangements include a recurrent interstitial deletion of ~600 kb defined by 16p11.2 breakpoints 4–5 (BP4-BP5; OMIM#611913), which encompasses 28 ‘unique’ genes and four genes with multiple copies12 (Figure 1). 
With a population prevalence of ~0.05%, this variant is one of the most frequent known etiologies of autism spectrum disorder (ASD).9, 13, 14, 15 It impacts adaptive behavior and language skills and predisposes to a highly penetrant form of obesity and macrocephaly.15, 16, 17, 18, 19 A mirror phenotype is observed in carriers of the reciprocal duplication (OMIM#614671), who are at high risk of schizophrenia (SCZ), Rolandic epilepsy, underweight and microcephaly.18, 19, 20, 21, 22 Case series have reported variable expressivity; systematic phenotyping showed that deletion and duplication lead to average IQ decreases of 26 and 16 points, respectively, in probands compared with non-carrier family members.15, 23 Correspondingly, the phenotypes of carriers identified in unselected populations are reminiscent of those described for carriers of 16p11.2 rearrangements ascertained in clinical cohorts.24 Deletions and duplications show a mirroring impact on brain volume and specific cortico-striatal structures implicated in reward, language and social cognition.25 Changes in copy numbers of this interval are associated with significant modifications of the mRNA levels of ciliopathy and ASD-associated genes in humans and mice.12, 26 Correspondingly, mouse models engineered to have three copies of the 7qF3 orthologous region showed reduced cilia length in the CA1 hippocampal region, whereas modulation of the expression of ciliopathy-associated genes rescued phenotypes induced by under- and overexpression of KCTD13 (MIM#608947),12 one of the key drivers of the traits associated with the 16p11.2 600 kb BP4-BP5 CNV.27 Distal to BP4-BP5, the deletion of the 16p11.2 220 kb BP2-BP3 interval was similarly associated with obesity, developmental delay, intellectual disability and SCZ.16, 28, 29, 30, 31, 32, 33, 34, 35 However, detailed data about the phenotypes associated with the reciprocal duplication are still lacking.

Figure 1

The 16p11.2 region and its 4C interaction profile (panels from top to bottom). Transcripts: The transcripts mapping within the human chromosome 16 GRCh37/hg19 27–31 Mb region are indicated. The 4C-targeted SH2B1, LAT, MVP, KCTD13, ALDOA, TBX6 and MAPK3 genes are highlighted in red. Segmental duplications/viewpoints: The duplicated regions containing the low-copy repeats (LCR) that flank these rearrangements telomerically and centromerically are shown, whereas the positions of the restriction fragments used as viewpoints are marked with red ticks. CNVs: The positions of the 600 kb BP4-BP5 (orange) and 220 kb BP2-BP3 (blue) intervals are depicted. Brain/LCLs: The mean z-score for transcript expression per group (Brain or LCLs) from GTEx is displayed. The corresponding RNA-seq heatmap color legend is shown at the bottom left corner. PC/BRICKs: Smoothed and profile-corrected 4C signal (upper part of each panel) and BRICKs (lower part) identified for each of the seven 4C viewpoints within the 16p11.2 cytoband, that is, from top to bottom SH2B1, LAT, MVP, KCTD13, ALDOA, TBX6 and MAPK3. The corresponding BRICKs significance heatmap color legend is shown at the bottom right corner.

We hypothesized that copy number modification of the 16p11.2 600 kb BP4-BP5 interval alters the three-dimensional positioning of these genes resulting in expression alterations of pathways involved in its phenotypic manifestation. We used chromatin conformation capture to explore the chromosome-wide effects of the 16p11.2 600 kb BP4-BP5 structural rearrangements on chromatin structure and assessed how these underlay the associated phenotypes. This region engages in multi-gene complex structures that are disrupted when its copy number changes. The implicated genes are known to be linked to 16p11.2-associated phenotypes, such as primary cilium alteration, energy imbalance, head circumference (HC) and ASD. We also demonstrate that our approach could be used to identify additional loci, whose copy number changes are associated with strikingly similar phenotypic manifestations.

Materials and methods

Recruitment and phenotyping of patients

The institutional review board of the University of Lausanne, Switzerland approved this study. Participants were enrolled in the study after signing an informed consent form and being clinically assessed by their respective physicians. For the data collected through questionnaires, information was gathered retrospectively and anonymously by physicians who had ordered chromosomal microarray analyses performed for clinical purposes only. Consequently, research-based informed consent was not required by the institutional review board of the University of Lausanne, which granted an exemption for this part of the data collection. Overall cognitive functioning was assessed as published.15

To better assess the phenotypic features associated with the 16p11.2 220 kb BP2-BP3 rearrangements, we recruited and phenotyped 110 and 57 carriers of the 220 kb BP2-BP3 deletion (OMIM#613444) and duplication from 88 and 49 families, respectively (Supplementary Table S1). Whereas these structural variants were previously reported to be among the CNVs most frequently harboring a possibly deleterious second genetic lesion (29 and 13% of the time, respectively),32 we do not confirm such a propensity. Indeed, second-site structural variants were identified in ‘only’ 7% (6/88) and 4% (2/49) of the enrolled BP2-BP3 deletion and duplication carrier probands, respectively. Deleterious CNVs were defined as: (i) a known recurrent genomic disorder, (ii) a CNV encompassing a published critical genomic region or disrupting a gene that is a known etiology of neurodevelopmental disorders or (iii) a >500 kb CNV with AF<0.001. We compared available data on weight, height, body mass index (BMI) and HC for 77 and 39 unrelated deletion and duplication carriers, respectively (including published cases). The mean age of this group of patients was 16 years (range 0.42–78 years, with 34 cases older than 18 years). The prevalences of the 16p11.2 deletion and duplication were inferred from six European population-based genome-wide association study cohorts, sets of chromosomal microarray-genotyped control individuals and clinical cohorts.9, 30, 33, 35, 36, 37, 38, 39 CNV analyses were carried out as described in Jacquemont et al.18

We similarly enrolled 26 and 9 unrelated carriers of 2p15 deletions and duplications, including 12 deletion cases from the literature.40, 41, 42, 43, 44, 45, 46, 47, 48, 49 The Signature Genomics cases were recently described in Jorgez et al.50 Patients were identified through routine etiological work-ups of patients ascertained for developmental delay/intellectual disability in cytogenetic centers. The coordinates of the rearrangements’ breakpoints (Supplementary Table S2) were recognized by different chromosomal microarray platforms.

Lymphoblastoid cell lines and transcriptome profiling

We had previously established lymphoblastoid cell lines (LCLs) by Epstein-Barr virus transformation from 16p11.2 BP4-BP5 patients, as well as controls. The LCL transcriptome of 50 deletion and 31 duplication carriers, as well as 17 control individuals, was previously profiled with Affymetrix GeneChips Human Genome U133+ PM 24 array plates (Affymetrix, Santa Clara, CA, USA). The results are deposited in the NCBI Gene Expression Omnibus under accession number GSE57802. The Robust Multi-array Average approach was used for the creation and normalization of the summarized probe set signals. We applied a nonspecific filter to discard probe sets with low variability and low signal, that is, those without detectable expression levels. Specifically, probe sets with both (i) signal SD>median of signal SD of all probe sets and (ii) larger signal>median of larger signal of all probe sets were retained as described in Migliavacca et al.12 This selection yielded a total of 23,602 probe sets. To reduce a potential bias toward genes with multiple probe sets, for the modular analysis, only the probe set with the highest variance per gene was kept, for a total of 15,112 probe sets. Using a dosage effect model and moderated t-statistics, we identified 1188 and 2209 significantly differentially expressed genes (false discovery rate (FDR)≤1 and 5%, respectively; uniquely mapping probes).12 We used Geneprof to access data pertaining to gene expression and co-regulation.
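
The two filtering steps above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the toy signals, the probe-to-gene mapping and the function names are hypothetical, and ‘larger signal’ is assumed here to mean the maximum signal of a probe set across samples.

```python
from statistics import median, stdev, variance

# Hypothetical toy data: probe set ID -> normalized signal in each sample.
signals = {
    "p1": [5.0, 9.0, 7.0],
    "p2": [6.0, 6.1, 6.0],
    "p3": [3.0, 8.0, 10.0],
    "p4": [2.0, 2.5, 2.2],
}
# Hypothetical probe set -> gene annotation.
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_B", "p3": "GENE_A", "p4": "GENE_C"}


def nonspecific_filter(signals):
    """Keep probe sets whose SD and maximum signal both exceed the medians."""
    sds = {p: stdev(v) for p, v in signals.items()}
    maxima = {p: max(v) for p, v in signals.items()}  # assumed 'larger signal'
    sd_cutoff = median(sds.values())
    max_cutoff = median(maxima.values())
    return [p for p in signals if sds[p] > sd_cutoff and maxima[p] > max_cutoff]


def one_probe_set_per_gene(signals, probe_to_gene, kept):
    """For each gene, keep only the retained probe set with the highest variance."""
    best = {}
    for p in kept:
        gene = probe_to_gene[p]
        if gene not in best or variance(signals[p]) > variance(signals[best[gene]]):
            best[gene] = p
    return best


kept = nonspecific_filter(signals)
best = one_probe_set_per_gene(signals, probe_to_gene, kept)
print(kept, best)
```

In this toy example only the two variable, well-expressed probe sets pass the filter, and the more variable of the two represents their shared gene.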

We are well aware of the limitations of the study of LCLs, for instance for genes whose expression specificity resides in other cell lineages. These experiments are nevertheless worth pursuing because (i) the primary human target tissues often remain beyond reach; (ii) we cannot exclude a broad to ubiquitous expression pattern and chromatin contacts for the genes involved in these disease processes; and (iii) the pattern of expression in peripheral tissue may be used as a biomarker in translational projects. Similar limitations apply to the use of embryonic stem cell-derived material, while animal tissues have a different set of shortcomings.

Quantitative reverse transcriptase-PCR

For qPCR, 100 ng of high-quality total RNA was converted to cDNA using Superscript VILO (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. Primers were designed using PrimerExpress 2.0 software (Applied Biosystems, Foster City, CA, USA), with default parameters except for the primer- and minimal amplicon lengths, which were set at 17–26 and 60 bp respectively. The amplification factor of each primer pair was tested using a cDNA dilution series and only assays with amplification factors between 1.75 and 2.00 were retained. A representative set of samples was tested for genomic contamination. qPCR experiments were performed in triplicate using SYBR-Green (Roche, Basel, Switzerland) as reporter. The reaction mixtures were prepared in 384-well plates using a Freedom Evo robot (Tecan, Männedorf, Switzerland) and run in an ABI 7900HT sequence detection system (Applied Biosystems) using the following conditions: 50°C for 2 min, 95°C for 10 min, followed by 45 cycles of 95°C for 15 s and then 60°C for 1 min, after which dissociation curves were established. Applicable normalization genes were included in each experiment to enable compensation for fluctuations in expression levels between experiments. Using SDS v2.4 software (Applied Biosystems) the threshold and baseline values were adjusted when necessary to obtain raw cycle threshold (Ct) values. The Ct values were further analysed using qBase plus software (Biogazelle, Zwijnaarde, Belgium), which calculates relative expression values per sample per tested gene upon designation of the normalization genes and corrects for the amplification efficiency of the performed assay. 
We assessed by qPCR the RNA levels of seven DE genes belonging to the ciliopathy or PTEN pathway (BBS4 (MIM#600374), BBS7 (MIM#607590), BBS10 (MIM#610148), XPOT (MIM#603180), NUP58 (MIM#607615), PTPN11 (MIM#176876) and SMAD2 (MIM#601366)), and five others that map either to the BP4-BP5 (ALDOA (MIM#103850), KCTD13, MAPK3 (MIM#601795) and MVP (MIM#605088)) or the BP2-BP3 interval (SH2B1 (MIM#608937)) in LCLs from eight carriers of the 220 kb BP2-BP3 deletion, eight carriers of the 600 kb BP4-BP5 deletion and 10 control individuals. In particular, we identified a significant decrease of the hemizygous gene SH2B1 but not of the neighboring normal-copy KCTD13, MVP and MAPK3 in BP2-BP3 deletion carriers.
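
The efficiency-corrected relative quantification mentioned above can be sketched as follows. This is a generic illustration of the commonly used model (relative quantity = amplification factor raised to the Ct difference, normalized to the geometric mean of the reference genes), not a reproduction of the qBase plus implementation; all Ct values and function names are hypothetical.

```python
from math import prod


def assay_is_retained(amplification_factor):
    """Retention rule stated in the text: amplification factor in [1.75, 2.00]."""
    return 1.75 <= amplification_factor <= 2.00


def relative_quantity(ct, ct_calibrator, amplification_factor=2.0):
    """Relative quantity of one gene in one sample versus a calibrator sample."""
    return amplification_factor ** (ct_calibrator - ct)


def normalized_relative_quantity(target_rq, reference_rqs):
    """Divide the target quantity by the geometric mean of the reference genes."""
    geometric_mean = prod(reference_rqs) ** (1.0 / len(reference_rqs))
    return target_rq / geometric_mean


# Hypothetical example: the target crosses the threshold one cycle later in the
# carrier than in the calibrator, while two reference genes are unchanged.
target_rq = relative_quantity(ct=26.0, ct_calibrator=25.0, amplification_factor=2.0)
reference_rqs = [
    relative_quantity(ct=20.0, ct_calibrator=20.0),
    relative_quantity(ct=22.0, ct_calibrator=22.0),
]
nrq = normalized_relative_quantity(target_rq, reference_rqs)
print(nrq)
```

With a perfect amplification factor of 2, a one-cycle delay corresponds to half the relative expression, which is the order of magnitude expected for a hemizygous gene.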

Viewpoint selection

We used an adaptation of the 4C method,51, 52, 53 the high-resolution Chromosome Conformation Capture Sequencing technology (4C-seq),54 to identify chromosomal regions that physically associate with the promoters of MVP, KCTD13, ALDOA, TBX6 (MIM#602427) and MAPK3, five of the 28 ‘unique’ genes of the BP4-BP5 interval selected according to their potential role in the described phenotype. Reduction by 50% of the RNA levels of the ortholog of ALDOA (Aldolase A) was associated with a change in brain morphology in zebrafish, suggesting that this gene is dosage sensitive.55 In humans, recessive ALDOA deficiency is associated with glycogen storage disease XII (OMIM#611881).56 Morpholino-driven reduction of the expression level of the KCTD13 (Potassium Channel Tetramerization Domain containing protein 13) ortholog resulted in macrocephaly in zebrafish, while its depletion in the brain of mouse embryos resulted in an increase of proliferating cells. The mirroring microcephaly was seen upon overexpression of human KCTD13 cDNA in zebrafish embryos’ heads, a phenotype further amplified upon concomitant overexpression of either MAPK3 (mitogen-activated protein kinase 3) or MVP (Major Vault Protein).27 TBX6 (T-Box Transcription Factor 6) is a candidate gene for the vertebral malformations observed in some deletion carriers since (i) mice homozygous for a Tbx6 mutation showed rib and vertebral body anomalies;57 (ii) TBX6 polymorphisms were associated with congenital scoliosis in the Han population;58 (iii) a stoploss variant in TBX6 segregates with congenital spinal defects in a three-generation family59 (OMIM#122600); and (iv) carriers of 16p11.2 600 kb BP4-BP5 deletions and a common hypomorphic TBX6 allele suggest a compound inheritance in congenital scoliosis.60 TBX6 was selected as a viewpoint even though this gene is not expressed in LCLs (or only at an extremely low level), as studies have shown that the contacted domains are stable across cell lines and tissues regardless of expression status.3 Within the group of genes chromatin-contacted by the above viewpoints we selected two more viewpoints within the 16p11.2 220 kb BP2-BP3 region (Figure 1), that is, the promoters of SH2B1 and LAT. The SH2B1 gene was suggested to be a crucial candidate for the obesity phenotype associated with this genomic interval16, 28 as it encodes an Src homology adaptor protein involved in leptin and insulin signaling.61, 62 Common variants in this locus were repeatedly associated with BMI, serum leptin and body fat in genome-wide association studies,63, 64, 65, 66 while rare dominant mutations were reported to cause obesity, social isolation, aggressive behavior, and speech and language delay.67 In a recent large-scale association study, the deletion was also significantly linked with SCZ.35 The LAT (linker for activation of T cells) adaptor molecule participates in AKT activation and plays an important role in the regulation of lymphocyte maturation, activation and differentiation.68, 69 Its inactivation could be circumvented by Ras/MAPK constitutive activation.70

4C-seq

4C libraries were prepared from LCLs of two control individuals and two carriers each of the 16p11.2 BP4-BP5 deletion and duplication, sex- and age-matched (Supplementary Table S3). Briefly, LCLs were grown at 37°C. A total of 5 × 10^7 exponentially growing cells were harvested and crosslinked with 1% formaldehyde, lysed and cut with DpnII, a 4-cutter restriction enzyme that allows higher resolution.53 After ligation and reversal of the crosslinks, the DNA was purified to obtain the 3C library. This 3C library was further digested with NlaIII and circularized to obtain a 4C library. The inverse PCR primers to amplify 4C-seq templates were designed to contain Illumina adaptor tails, sample barcodes and viewpoint-specific sequences. Viewpoints were selected at the closest suitable DpnII fragment relative to the transcriptional start sites of the targeted genes. The sequences of the 4C-seq primers are reported in Supplementary Table S4. For all viewpoints, we amplified at least 1.6 μg of 4C template (using about 100 ng of 4C template per inverse PCR reaction, for a total number of 16 PCRs). We multiplexed the 4C-seq templates in equimolar ratios and analyzed them on a 100-bp single-end Illumina HiSeq flow cell. The numbers of raw, excluded, and mapped reads for each viewpoint and LCL sample are detailed in Supplementary Table S5.

4C-seq data analysis

4C-seq data were analyzed as described in Noordermeer et al.53 and Gheldof et al.54 through the 4C-seq pipeline (available at http://htsstation.epfl.ch/)71 and visualized with gFeatBrowser. Briefly, the multiplexed samples were separated, and undigested and self-ligated reads were removed. Remaining reads were aligned and translated to a virtual library of DpnII fragments. Read counts were then normalized to the total number of reads and replicates were combined by averaging the resulting signal densities (Supplementary Figure S1). The local correlation between the profiles of the two samples per viewpoint was calculated (0.46 ≤ r^2 ≤ 0.74 for controls, 0.29 ≤ r^2 ≤ 0.67 for deletions and 0.22 ≤ r^2 ≤ 0.61 for the duplications). The combined profiles were then smoothed with a window size of 29 fragments. The region directly surrounding the viewpoint is usually highly enriched and can show considerable experimental variation, thereby influencing overall fragment count. To minimize these effects, the viewpoint itself and the directly neighboring ‘undigested’ fragment were excluded during the procedure. In addition to this filtering, we modeled the data to apply a profile correction similar to the one described in Tolhuis et al.72 using a fit with a slope −1 in a log–log scale.73 Significantly interacting regions were detected by applying a domainogram analysis as described.74 We selected BRICKS (Blocks of Regulators In Chromosomal Kontext) with a P-value threshold smaller than 0.01 for both ‘cis’ and ‘trans’ interactions. To determine differentially interacting regions between the 16p11.2 600 kb BP4-BP5 deletion (Del), duplication (Dup) and control (Ctrl), we considered all non-null BRICKS found by a domainogram analysis74 in either condition and quantified both signals in each BRICK. The resulting table was scaled to the sample with the largest interquartile range and the difference of signals was compared with random in order to associate a P-value (FDR) with each BRICK. Finally, only BRICKS with a P-value <0.01 were considered.
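
The normalization, replicate averaging and smoothing steps described above can be sketched as follows. This is a simplified illustration, not the HTSstation pipeline: the per-fragment counts are hypothetical, the scaling constant and the treatment of profile edges (shrinking windows) are assumptions, and the profile correction and domainogram analysis are not covered.

```python
def normalize_to_total(counts, scale=1.0):
    """Express per-fragment read counts as a fraction of all reads (times scale)."""
    total = sum(counts)
    return [c * scale / total for c in counts]


def average_replicates(profile_a, profile_b):
    """Combine two replicate profiles by averaging their signal densities."""
    return [(a + b) / 2 for a, b in zip(profile_a, profile_b)]


def running_mean(values, window=29):
    """Smooth a profile with a centered window (in fragments), shrunk at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical read counts per DpnII fragment for two replicates of one viewpoint.
replicate_1 = [0, 10, 30, 10, 0]
replicate_2 = [0, 20, 60, 20, 0]

combined = average_replicates(
    normalize_to_total(replicate_1), normalize_to_total(replicate_2)
)
# A window of 3 is used only because the toy profile is short; the text uses 29.
smoothed = running_mean(combined, window=3)
print(smoothed)
```

Normalizing before averaging prevents the more deeply sequenced replicate from dominating the combined profile.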

All the viewpoints mapping on the BP4-BP5 interval, except KCTD13, contact the 146 and 147 kb long low copy repeats that flank the 16p11 600 kb BP4-BP5 rearrangements. To unravel whether the signal reflected the interaction with the centromeric, the telomeric or both low copy repeats, and given the high similarity (99.5% identity) of the two blocks, we separately treated the reads mapping within these regions (chr16: 29460515–29606852 and chr16: 30199854–30346868 according to the GRCh37/hg19 assembly, February 2009) using different and more stringent criteria, that is, no mismatch and unique site mapping. All values were normalized to the total number of reads mapping to the two regions (per thousand reads). We observed a higher proportion of contacts occurring with the centromeric segmental duplication compared with the telomeric one for MAPK3 and TBX6, while the trend was reversed for MVP, in agreement with their proximity to the centromeric and telomeric low copy repeat blocks, respectively. No conclusive results were obtained for ALDOA.

Hi-C data

Hi-C matrices from Rao et al.75 were prepared by first applying a KR normalization to the 5 and 100 kb resolution observed matrices, and then by dividing each normalized score by the expected one extracted from the KR expected file (as described in section II.c of the Extended Experimental Procedures of reference75). KR expected values less than 1 were set to 1 to avoid long-distance interaction biases.
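
The observed/expected computation described above can be sketched as follows, assuming the matrix has already been KR-normalized and that the expected vector is indexed by genomic distance in bins. The matrix, the expected values and the function name are hypothetical.

```python
def observed_over_expected(kr_matrix, kr_expected):
    """Divide each KR-normalized contact by the expected value at its distance.

    kr_expected[d] is the expected contact frequency for bins d bins apart;
    values below 1 are set to 1, as stated in the text.
    """
    n = len(kr_matrix)
    oe = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            expected = max(kr_expected[abs(i - j)], 1.0)
            oe[i][j] = kr_matrix[i][j] / expected
    return oe


# Hypothetical 3-bin KR-normalized matrix and distance-indexed expected values.
kr_matrix = [
    [8.0, 4.0, 1.0],
    [4.0, 8.0, 4.0],
    [1.0, 4.0, 8.0],
]
kr_expected = [8.0, 4.0, 0.5]

oe = observed_over_expected(kr_matrix, kr_expected)
print(oe)
```

Without the floor of 1, the most distant pair in this toy example would score 2.0 instead of 1.0, which illustrates how small expected values can inflate long-distance scores.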

Enrichment analyses

Gene annotation was obtained through BioScript (http://gdv.epfl.ch/bs). Protein interaction networks for the genes selected by BRICKS calling and from the list of interacting regions affected by the rearrangements were determined using STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) v9.1 (http://string-db.org/).76 We exploited Bioscript for Gene Ontology analysis (topGO) (http://gdv.epfl.ch/bs), DAVID GO and KEGG, OMIM Disease and KEGG coupled with Enrichr to assess if the chromatin-contacted genes were enriched in specific pathways and genes associated with Mendelian diseases77, 78, 79, 80 (http://www.omim.org/downloads). The OMIM gene-set library was obtained directly from the NCBI’s OMIM Morbid Map.81 We exploited the SFARI Gene lists and scores (https://sfari.org/; March 2014 release), the union of the genes cataloged in Girard et al.82 and Xu et al.82, 83, 84 and the genome-wide association studies hits for BMI85 to assess enrichment for ASD, SCZ and BMI genes, respectively. We also used the de novo ‘high confidence’ ASD targets (selected with FDR<0.1 in De Rubeis et al.86 and likely gene disrupting recurrent mutations target genes in Iossifov et al.87) to assess enrichment of ASD-associated genes. Ciliary genes enrichment was computed merging the SYSCILIA gold standard (SCGS) and potential ciliary gene lists (genes with no additional evidence for ciliary function were excluded).88 We used Enrichr Chromosome Location tool and BRICKS count in different window sizes (5 Mb, 1 Mb and 500 kb) to determine whether any cytogenetic band other than 16p11.2 was enriched for BRICKS. Other than 16p11.2, we identified enrichments at 16p12, 16p13, 16q13, 1p36, 11q13, 16q22, 7q31, 15q15 and 1q32 cytobands. As a large proportion of the 16p12.2 BRICKS maps to segmental duplications that are highly similar to the low copy repeats flanking the 16p11 600 kb BP4-BP5 rearrangements we conservatively did not consider this region.

All seven tested viewpoints showed enrichment for contacts with loci that encode proteins that interact together (all P<0.01). A single process, focal adhesion assembly, was shared between the BP2-BP3 and the BP4-BP5 groups of viewpoints (GO:0048041, P=6.02e−03 for the BP2-BP3 group and P=1.12e−03 for the BP4-BP5 group). Focal adhesion links the internal actin cytoskeleton to the extracellular matrix; it is used by cells to explore their environment, and depends strongly on microtubule dynamics89 in coordination with the primary cilium.90, 91 Genes participating in focal adhesion (GSK3B (MIM#605004); PAK7 (MIM#608038)), axon guidance (ROBO1 (MIM#602430); EPHA6 (MIM#600066); PAK7; GSK3B) and Golgi apparatus-related processes (SND1 (MIM#602181); FRMD5 (MIM#616309)) are among the 24 genes trans-contacted by both the BP4-BP5 and BP2-BP3 viewpoints. The gene MPZL1 (MIM#604376), contacted by both KCTD13 and SH2B1, is associated with SCZ.92 It is a downstream target of PTPN11.93

Fluorescence in situ hybridization

Interphase nuclei were prepared from an LCL of a control individual. Fluorescence in situ hybridization experiments were performed using fosmid clones (300 ng) directly labeled by nick-translation with Cy3-dUTP and fluorescein-dUTP as previously described94 with minor modifications. Fosmid and BAC clones (G248P86150B3 for the ALDOA locus, G248P800063B6 for the SH2B1 locus; G248P86115A10 for the KIAA0556 locus; RP11-301D18 for the KCTD13 and MVP loci; RP11-383D9 for PTEN (MIM#601728); RP11-477N2 for USP34 (MIM#615295)/XPO1 (MIM#602559) and RP11-43E18 for MARK4 (MIM#606495)) were obtained from the CHORI BACPAC Resources Center (https://bacpac.chori.org/). We picked the MARK4-encompassing BAC as ‘control BAC’ because it maps to a gene-rich region on chromosome 19, a centrally-positioned chromosome within the nucleus. Hybridization was performed at 37 °C in 2 × SSC, 50% (v/v) formamide, 10% (w/v) dextran sulfate, 3 μg C0t-1 DNA and 3 μg sonicated salmon sperm DNA in a volume of 10 μl. Post-hybridization washing was at 60 °C in 0.1 × SSC, three times. Nuclei were DAPI-stained and digital images were obtained using a Zeiss Imager A1 fluorescence microscope (Carl Zeiss, Oberkochen, Germany). We considered 50–60 cells per experiment (i.e. at least 100 distances) and co-localization was called when the distance between signals was ≤0.3 μm. The contact between SH2B1 and ALDOA, versus the control KIAA0556, was estimated by calculating the distance between BAC probes (median SH2B1-ALDOA and -KIAA0556 distances=0.43 and 1.24 μm, respectively; Wilcoxon rank-sum test, P=1.45e−17). 
The contact of MVP and KCTD13 with, respectively, PTEN and USP34/XPO1, compared with the control MARK4, was estimated as the percentage of co-localization (25 and 14% co-localization versus 2% with the control locus; Fisher’s test enrichment: P=6.9e−05 and P=0.01, respectively; median MVP/KCTD13-USP34/XPO1 distances=1.76, MVP/KCTD13-PTEN= 2.61 and MVP/KCTD13-MARK4 = 4.96 μm; Wilcoxon rank-sum test, P=5.4e−10 and P=9.3e−05, respectively).
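
The co-localization analysis above can be sketched as follows. The distances and the 2×2 table are hypothetical (the per-nucleus measurements are not given here), the 0.3 μm threshold is assumed to be an upper bound, and a one-sided Fisher's exact test is written out from the hypergeometric distribution to match the ‘enrichment’ wording; the software actually used is not stated in the text.

```python
from math import comb


def colocalization_fraction(distances_um, threshold_um=0.3):
    """Fraction of measured distances at or below the co-localization threshold."""
    return sum(d <= threshold_um for d in distances_um) / len(distances_um)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact P-value for enrichment in the table [[a, b], [c, d]].

    a, b: co-localized / not co-localized signals for the tested locus pair;
    c, d: the same counts for the control locus pair.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical inter-signal distances (in μm) for four measurements.
fraction = colocalization_fraction([0.10, 0.25, 0.50, 1.20])

# Hypothetical counts: 3 of 4 tested pairs co-localize versus 1 of 4 controls.
p_value = fisher_one_sided(3, 1, 1, 3)
print(fraction, p_value)
```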

ChIP-seq and RNA-seq molecular associations

Detailed experimental procedures and results are described in Waszak et al.95 Briefly, ChIP-seq (chromatin immuno-precipitation coupled with sequencing) and mRNA-seq data were produced from LCLs of 54 individuals of European origin from the 1000 Genomes Project.96 ChIP-seq with antibodies recognizing H3K4me1, H3K4me3, H3K27ac, PU.1 and RNA Pol2 binding, as well as mRNA-seq gene expression profiling, were carried out from a single growth of LCLs as previously described.97 Genotypes were obtained from the GEUVADIS consortium.98 To map associations between pairs of ChIP-seq and/or RNA-seq peaks, we retained 47 individuals after data quality control and proceeded as follows for each of the 15 possible unordered pairs of distinct molecular phenotypes (A1, A2). First, we measured inter-individual Pearson correlation between every possible pair of normalized quantifications at peaks (p1, p2) within the 16p11.2 interval (28.1–34.6 Mb) such that p1 and p2 belong to A1 and A2, respectively. Note that the distances here were measured between the respective peak centers, except for mRNA, for which we used the transcription start site. Then, we assessed to what extent the correlations significantly differed from zero by calculating P-values using the R function cor.test and corrected them for multiple testing by using the Benjamini and Hochberg procedure as implemented in the R function p.adjust (FDR 5 and 10%).
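
The pairing and multiple-testing steps above can be illustrated with a short sketch. The analysis itself was run in R (cor.test and p.adjust); the Python functions below are an independent re-implementation for illustration, the P-values and quantifications are hypothetical, and the phenotype labels are shorthand for the six data types listed in the text.

```python
from itertools import combinations
from math import sqrt

# The six molecular phenotypes give 15 unordered pairs of distinct phenotypes.
phenotypes = ["H3K4me1", "H3K4me3", "H3K27ac", "PU.1", "RNA Pol2", "mRNA"]
pairs = list(combinations(phenotypes, 2))


def pearson(x, y):
    """Pearson correlation between two equally long lists of quantifications."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])            # hypothetical peak signals
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])  # hypothetical P-values
print(len(pairs), r, adjusted)
```

Associations would then be retained when the adjusted value falls below the chosen FDR threshold (5 or 10% in the text).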

Results

Distinct and non-overlapping loci at 16p11.2 are associated with mirror phenotypes on BMI and HC and autism susceptibility

To comprehensively assess phenotypic features associated with the distal 16p11.2 220 kb BP2-BP3 CNVs (Figure 1), we collected de-identified data on 137 unrelated carriers (88 deletions and 49 duplications; Supplementary Table S1) and compared BMI and HC with gender-, age- and geographical location-matched reference population as described18 (Figures 2a and b). The BMI mean Z-score of deletion carriers deviated significantly from that of the general population (t-test, P=3.1e−14), replicating the earlier described association of the deletion with obesity.16, 28 We observed a trend towards increased HC in deletion carriers. The duplication carriers showed a mirroring decrease of BMI and HC values when compared with those of the control population (t-test, P=0.005 and 1.1e−4, respectively). We also observed an increase in ASD prevalence in both deletion (23/88; 26%) and duplication (11/49; 22%) carriers compared with the general population (5,338/363,749; 1.5%)99 (Fisher’s enrichment test: OR=23.7, P=2.5e−22; OR=19.4, P=1.2e−10) in agreement with published results.29, 30, 31, 32, 33, 34, 35 Thus, genomic rearrangements at 600 kb BP4-BP5 and 220 kb BP2-BP3, two loci 650 kb apart, present similar clinical patterns: large effect sizes on BMI and HC, as well as association with ASD.
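
The odds ratios quoted above can be approximated directly from the reported counts. The sketch below computes the simple sample (cross-product) odds ratio; Fisher's exact test implementations may instead report a conditional maximum-likelihood estimate, so the last digit can differ slightly from the published values. The function name is hypothetical.

```python
def odds_ratio(cases_carriers, n_carriers, cases_reference, n_reference):
    """Cross-product odds ratio for ASD in carriers versus a reference population."""
    a = cases_carriers
    b = n_carriers - cases_carriers
    c = cases_reference
    d = n_reference - cases_reference
    return (a * d) / (b * c)


# Counts taken from the text: ASD cases / total individuals.
or_deletion = odds_ratio(23, 88, 5338, 363749)
or_duplication = odds_ratio(11, 49, 5338, 363749)
print(round(or_deletion, 1), round(or_duplication, 1))
```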

Figure 2

Phenotypic characterization of carriers of 16p11.2 BP2-BP3 and 2p15 rearrangements. Distribution of Z-score values of BMI (a) and head circumference (b) in unrelated carriers of the 16p11.2 220 kb BP2-BP3 deletion (red) and duplication (blue) taking into account the normal effect of age and gender observed in the general population as described in Jacquemont et al.18 The general population has a mean of zero. (c) Comparison of the genomic breakpoints of 2p15 deletions (red bars) and duplications (blue bars) in 26 and 9 unrelated carriers, respectively. The breakpoints’ coordinates are detailed in Supplementary Table S2. The genes mapping within the interval and cytobands’ positions are shown above, while the extent of the critical region is indicated by a black bar. Distribution of Z-score values of BMI (d) and head circumference (e) in carriers of the 2p15 deletion (red) and duplication (blue).

Cis-acting chromatin loops that link the 16p11.2 BP4-BP5 and BP2-BP3 genomic intervals are perturbed in BP4-BP5 CNV carriers

We posited that the remarkable overlap of phenotypic features associated with the BP2-BP3 and BP4-BP5 CNVs might derive from the rearrangement-mediated disruption of the 3D chromatin structure within the 16p11.2 cytoband. To challenge this hypothesis, we assessed the pattern of chromosomal interactions of selected ‘viewpoints’ from both loci in two LCLs derived from control individuals using an adapted version of the 4C method (4C-seq: circularized chromosome conformation capture combined with multiplexed high-throughput sequencing)51, 52, 54, 100 (Materials and Methods, Supplementary Table S3). Despite the limitations of the study of LCLs (Materials and Methods), these experiments are worth pursuing as studies have shown that chromatin contacts are stable across cell lines and tissues regardless of contacted-gene expression status3 and that LCL transcriptome profiles can be recapitulated in other tissues and species.12 Specifically, our previous analyses of LCL transcriptomes showed that genes whose expression correlated with the dosage of the 16p11.2 locus are significantly enriched in genes associated with ASD and ciliopathies both in human LCLs and mouse cortex.12 In particular, we identified chromosomal regions that physically associate with the promoters of MVP, KCTD13, ALDOA, TBX6 and MAPK3, five genes mapping to the BP4-BP5 interval, and SH2B1 and LAT, two genes mapping to the BP2-BP3 one. These were investigated based on their potential role in the phenotype (Supplementary Figure S2).27, 55, 57, 58, 59

Genome-wide we identified an average of 265 BRICKs (FDR≤1%), i.e., three-dimensionally interacting genomic fragments, for the seven viewpoints (range: 168–442; Supplementary Tables S6–S12). In particular, we observed complex chromatin looping between genes located in the proximal BP4-BP5 and those mapping both to the distal BP2-BP3 region and the equidistant downstream region rich in Zn-finger genes (Figures 1 and 3a). For instance, each of the nine genes of the BP2-BP3 interval (ATXN2L (MIM#607931), TUFM (MIM#602389), SH2B1, ATP2A1 (MIM#108730), RABEP2 (MIM#611869), CD19 (MIM#107265), NFATC2IP (MIM#614525), SPNS1 (MIM#612583) and LAT) is contacted by at least one of the five assessed viewpoints in the BP4-BP5 interval (Figures 1 and 3a). We reciprocally validated these chromatin interactions using the promoters of SH2B1 and LAT as viewpoints (e.g. the chromatin loops of MVP, KCTD13, ALDOA, TBX6 and MAPK3 with SH2B1 are all recapitulated using SH2B1 as viewpoint; Figure 3a and Supplementary Tables S11 and S12). The preferential contacted domain of the BP2-BP3 viewpoints extends proximally to the BP4-BP5 and Zn-finger gene-rich regions (Figures 3a and b and Supplementary Figure S3). Conversely, significantly fewer interactions are called in the gene-rich and equidistant distal region (t-test P=0.011, Supplementary Figure S3), suggesting that these interactions do not merely reflect the spatial clustering of gene-dense regions.

Figure 3

Chromatin interactions between the 16p11.2 600 kb BP4-BP5 and 220 kb BP2-BP3 genomic intervals. (a) Circos plot representation of the chromatin loops identified in the human chromosome 16 27.5–31.0 Mb window. The 220 kb BP2-BP3 and 600 kb BP4-BP5 intervals are depicted by blue and orange bars on the peripheral circle, respectively. Darker sections indicate the positions of the viewpoints. Central blue and orange lines indicate the chromatin interactions corresponding to BP2-BP3 and BP4-BP5 viewpoints, respectively. Note the quasi absence of loops between the BP2-BP3 viewpoints (LAT and SH2B1) and the 27.5-28.4 Mb region. The mapping position of the KIAA0556 gene, used as control locus in fluorescence in situ hybridization experiments, is indicated. (b) High-resolution Hi-C chromosome conformation capture results obtained in reference75 with the GM12878 LCL within the chromosome 16 0–34 Mb window (left panels) and zoom in within the 28–31 Mb region encompassing the two CNVs (5 kb resolution; right panels). The positions of the 220 kb BP2-BP3 and 600 kb BP4-BP5 intervals are shown by blue and orange bars, respectively. Observed (top panel), observed/expected (central panel) and Pearson correlation results are presented (bottom panel). (c) Fluorescence in situ hybridization experiments show colocalization of SH2B1 foci (green) that map to the 220 kb BP2-BP3 interval with ALDOA foci (red) that map to the 600 kb BP4-BP5 genomic interval (left panel) but not with the equidistant KIAA0556 (red) foci (central panel). The distributions of interphase nuclei distances between the SH2B1 and ALDOA (deep pink) and SH2B1 and KIAA0556 foci (gray) are shown in the lower panel. The mapping positions of ALDOA, SH2B1 and KIAA0556 are indicated in (a).

We confirmed the genomic interaction between the 600 kb BP4-BP5 and 220 kb BP2-BP3 intervals using fluorescence in situ hybridization. This independent method showed that the BP2-BP3-mapping SH2B1 locus was significantly closer to the BP4-BP5-encompassed ALDOA locus than to a control region, the KIAA0556 locus, situated equidistantly on its telomeric side (median SH2B1-ALDOA and SH2B1-KIAA0556 distances=0.43 and 1.24 μm, respectively; Wilcoxon rank-sum test, P=1.45e−17) (Figure 3c, Supplementary Figure S4). We also examined published Hi-C (genome-wide conformation capture) and high-resolution Hi-C data from LCLs. Although they cannot confirm our chromatin connections given their limited resolution, they support a preferential three-dimensional proximity of these two regions73, 75 (Figure 3b). Concordant results were found in both human IMR90 fibroblasts and embryonic stem cells26, 101 and mouse cortex and embryonic stem cells, suggesting the conservation of the structure of this topological-associated domain across species and tissues,101 as recently shown for the topological-associated domain spanning the WNT6/IHH/EPHA4/PAX3 (MIM#604663; #600726; #602188; #606597) locus.3
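
The distance comparison above rests on a Wilcoxon rank-sum test between two sets of inter-foci distances. As an illustration only, the following minimal sketch implements a two-sided rank-sum test with a normal approximation using the Python standard library. The function name and the distance values are hypothetical placeholders, not the measurements of this study, and the original analysis may have used different software or an exact test.

```python
import math
from statistics import median


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Tied values receive average ranks; no tie or continuity correction is
    applied to the variance, so the p-value is approximate.
    Returns (U statistic for x, z score, two-sided p-value).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        ranks_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = ranks_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Hypothetical inter-foci distances in micrometres (placeholders, not study data).
test_pair = [0.30, 0.40, 0.50, 0.35, 0.45, 0.60, 0.42, 0.38]
control_pair = [1.10, 1.30, 0.90, 1.50, 1.20, 1.00, 1.40, 1.25]

u_stat, z_score, p_value = rank_sum_test(test_pair, control_pair)
print(f"median test pair = {median(test_pair):.2f}, "
      f"median control pair = {median(control_pair):.2f}")
print(f"U = {u_stat:.1f}, z = {z_score:.2f}, two-sided P = {p_value:.2e}")
```

A rank-based test makes no assumption that the distances are normally distributed, which is why it suits skewed interphase distance measurements.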

As chromatin interactions were determined in a normal diploid context, we next assessed the effect of BP4-BP5 CNVs on these chromatin loops. We identified genomic fragments that interact with the same seven viewpoints in LCLs of two BP4-BP5 deletion patients and two reciprocal duplication patients (Materials and Methods). We observed a genome-wide decrease in the number of BRICKs per viewpoint ranging from 27 to 84%, suggesting that both rearrangements triggered dramatic reorganizations. Consistent with this hypothesis, the SH2B1 viewpoint, whose copy number is not affected by the proximal BP4-BP5 CNV, shows a 36% reduction in the number of interacting regions (all BRICKs listed in Supplementary Tables S13–S26). We compared the 4C-seq results from control individuals and the four patients and identified, across all conditions and considering all viewpoints, 1193 genes with significantly modified chromosomal contacts (FDR<1%; Materials and Methods, Supplementary Table S27). These results support the idea that large structural rearrangements perturb the 3D genomic structure by modifying both cis and trans contacts.
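
The per-viewpoint decreases quoted above are relative reductions in the number of called interacting fragments. A minimal sketch of that arithmetic follows; the viewpoint labels and counts are invented placeholders, not values from the Supplementary Tables, and the helper name is an assumption made for illustration.

```python
def percent_reduction(control_count, carrier_count):
    """Percentage decrease in called fragments relative to the control count."""
    if control_count <= 0:
        raise ValueError("control_count must be positive")
    return 100.0 * (control_count - carrier_count) / control_count


# Placeholder counts per viewpoint as (control, CNV carrier); NOT the study's data.
example_counts = {
    "viewpoint_A": (200, 150),
    "viewpoint_B": (400, 100),
}

reductions = {
    viewpoint: percent_reduction(control, carrier)
    for viewpoint, (control, carrier) in example_counts.items()
}
for viewpoint, reduction in reductions.items():
    print(f"{viewpoint}: {reduction:.0f}% fewer interacting fragments")
```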

Perturbations of the chromatin interactions’ landscape at 16p11.2 are associated with gene expression modification

Our results show that the gene-rich BP2-BP3 and BP4-BP5 16p11.2 intervals, whose CNVs are linked to overlapping phenotypes, are reciprocally engaged in complex chromatin looping as determined by 4C, fluorescence in situ hybridization and Hi-C. The recent discovery of multigene complexes where chromosomal loops orchestrate co-transcription of interacting genes2, 102 is suggestive of functional implications for the chromosomal contacts between the BP2-BP3 and BP4-BP5 intervals.

To assess this possibility, we first used our recent association analyses of population-wide transcription factor DNA binding (PU.1 and RPB2—the second largest subunit of RNA polymerase II), histone modification enrichment patterns (H3K4me1, H3K4me3 and H3K27ac) and gene expression measured by ChIP-seq and RNA-seq in LCLs derived from 47 European unrelated individuals whose genomes were sequenced in the frame of the 1000 Genomes Project.95 We measured the extent of quantitative coordination of natural inter-individual variation between pairs of these six molecular phenotypes at putative regulatory regions mapping within cytoband 16p11.2 and identified coordinated behavior in terms of mapping enrichment. For example, we found association between active regulatory regions mapping within the BP4-BP5 interval and expression of BP2-BP3 genes (Supplementary Figure S5),95 consistent with the notion that some of the chromatin loops uncovered between these two intervals might bring together regulatory elements and genes.

Secondly, we examined if genes involved in primary cilium function and related pathways, which are modified in BP4-BP5 deletion patients’ cells,12 are also changed in BP2-BP3 deletion carriers (Materials and Methods). We found that the ciliary genes BBS4, BBS7, SMAD2, XPOT and NUP58 are correspondingly modified in LCLs derived from both BP4-BP5 and BP2-BP3 deletion carriers (Supplementary Figure S6). The interplay between the 600 kb BP4-BP5 and the 220 kb BP2-BP3 interval is further substantiated: (i) by published data showing perturbed expression of genes mapping within the BP2-BP3 distal interval (i.e. LAT, SPNS1 and ATP2A1) in cells derived from BP4-BP5 patients;26 as well as (ii) by the observation that, within the top-10 genes correlated with SH2B1 expression according to GeneProf (www.geneprof.org), three (ZNF500; CDAN1 (MIM#607465); LRRC14) are contacted by viewpoints located in the BP4-BP5 region and two (PIGO (MIM#614730); TUBGCP6 (MIM#610053)) are differentially expressed in BP4-BP5 CNV patients.

We then assessed whether the structural chromatin changes identified in the 600 kb BP4-BP5 CNVs carriers (i.e. 1193 genes with significantly modified chromosomal contacts, FDR<1%) are paralleled by transcriptome modifications identified in Migliavacca et al.12 (2209 significantly differentially expressed genes, FDR5%; uniquely mapping probes). We find a significant overlap between the two gene lists as 125 genes with modified chromatin loops are concomitantly differentially expressed in 16p11.2 600 kb BP4-BP5 CNV carriers’ cells (125/665 4C-modified genes with detectable expression (see Materials and Methods); Fisher’s enrichment test: OR=1.4, P=0.002; Supplementary Tables S28–S30 and Figure 4a). There is no correlation between the direction of the change in 4C contact (increased or decreased) and the direction of the change in gene expression, that is, a loss of contact does not necessarily imply a decrease in expression and vice versa (Supplementary Figure S7 and Supplementary Table S31).
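
The overlap between two gene lists can be tested for enrichment with Fisher's exact test on a 2x2 contingency table. The sketch below is an illustration under stated assumptions: it computes the sample odds ratio and a one-sided (enrichment) p-value from the hypergeometric distribution using only the standard library. The function name and the toy counts are hypothetical; the background set of expressed genes used in this study is described in Materials and Methods and is not reproduced here, and the published test may have been two-sided.

```python
from math import comb


def fisher_enrichment(overlap, n_a, n_b, universe):
    """One-sided Fisher exact test for the overlap of two gene sets.

    overlap  : genes present in both sets
    n_a, n_b : sizes of set A and set B (both drawn from the same universe)
    universe : total number of genes considered (the background)
    Returns (sample odds ratio, P[overlap >= observed]).
    """
    a = overlap                            # in A and in B
    b = n_a - overlap                      # in A only
    c = n_b - overlap                      # in B only
    d = universe - n_a - n_b + overlap     # in neither set
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent with the universe size")
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    # Upper tail of the hypergeometric distribution.
    total = comb(universe, n_a)
    tail = sum(
        comb(n_b, k) * comb(universe - n_b, n_a - k)
        for k in range(overlap, min(n_a, n_b) + 1)
    )
    return odds_ratio, tail / total


# Toy counts for illustration only (NOT the gene counts of this study).
toy_or, toy_p = fisher_enrichment(overlap=3, n_a=5, n_b=5, universe=20)
print(f"odds ratio = {toy_or:.2f}, one-sided P = {toy_p:.4f}")
```

The choice of universe matters: restricting it to genes with detectable expression, as done for the 665 4C-modified genes above, avoids inflating the enrichment with genes that could never be called differentially expressed.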

Figure 4

Extensive overlap between differentially expressed genes and loci that show modified chromatin interactions. (a) Top panel: weighted Venn diagram showing the overlap between the 2209 genes that are differentially expressed in 16p11.2 rearrangement carriers (DE, yellow disk; FDR5%12), the 1193 genes that show modified chromatin interactions in 16p11.2 rearrangement carriers (4C-modified, purple disk; only 665 with detectable expression are considered for the DE enrichment; see Supplementary Table S31) and the 604 genes listed in SFARI Gene (https://sfari.org/; 323 expressed), an annotated list of candidate genes for ASD (ASD; blue disk). The numbers of common genes are indicated and the 12 4C-modified, DE and ASD-SFARI genes are specified on the right. Bottom panels: weighted Venn diagrams showing the overlap between the DE genes and the LCLs-expressed ASD and 4C-modified genes (lower left and right, respectively). (b) Circos plot representation of the modified chromatin loops identified in human chromosomes 16 and 22 (right-hand panel). The 220 kb BP2-BP3 and 600 kb BP4-BP5 intervals are depicted by blue and orange bars on the peripheral circle, respectively. Central blue and orange lines indicate the CNVs-modified chromatin interactions corresponding to BP2-BP3 and BP4-BP5 viewpoints, respectively. Ticks on the three internal rings indicate BRICKS with significantly modified interactions between 16p11.2 600 kb BP4-BP5 duplication and control samples (light blue ring), between 16p11.2 600 kb BP4-BP5 deletion and control samples (dark gray), and between 16p11.2 600 kb BP4-BP5 deletion and duplication samples (yellow). Blue and red ticks on the most external rings denote genes differentially expressed in 16p11.2 patients (DE) and SFARI-ASD-associated genes (ASD), respectively. A zoomed-in view with examples of genes with modified chromatin interactions mapping within the 22q13 cytoband is presented in the left-hand panel. 
(c) The 1193 genes that show modified chromatin interaction in cells with 16p11.2 600 kb BP4-BP5 rearrangements encode proteins that interact. The confidence view interaction network of the encoded proteins corresponding to the enriched GO terms (GO:0030030 cell projection organization, GO:0042995 cell projection, GO:0030173 integral to Golgi membrane, GO:0000138 Golgi trans cisterna, GO:0034504 protein localization to nucleus and GO:0005874 microtubule) is visualized with STRING. Proteins belonging to cell projection (blue), microtubule (red), Golgi apparatus (green), stereocilium bundle (purple) and cilium (yellow) process/cell components are highlighted by colored beads. Disconnected nodes are not shown. FDR, false discovery rate.

These results show that the BP2-BP3 and BP4-BP5 loci are linked by chromatin loops, coordinated molecular phenotypes and co-regulation of genes.

Genomic regions contacted by 16p11.2 viewpoints are associated with autism, BMI and HC phenotypes and enriched in ciliary genes

Consistent with the notion that chromatin contacts connect biologically related genes, BRICKs genes are enriched for genes that encode proteins that interact (P<0.01 for all viewpoints; Supplementary Table S32, Materials and Methods). Processes overrepresented within BRICKs genes are listed in Supplementary Table S33. They are also enriched for genes listed in SFARI Gene (https://sfari.org/), an annotated list of candidate genes for ASD (union of the SFARI Syndromic, High Confidence and Strong Candidate Gene categories (Categories S+1-2), 13/76, OR=2.9, P=0.0014) and for ASD-associated genes identified by whole-exome studies (8/50 of the ‘high confidence de novo’ ASD-associated genes;86, 87 OR=2.7, P=0.016; Supplementary Table S34). Genes contacted by the BP4-BP5 and BP2-BP3 viewpoints include GRID1 (MIM#610659) and PTEN at 10q23.2–q23.31, USP34/XPO1 at 2p15, but also genes linked to HC phenotypes like CHD1L (MIM#613039) at 1q21.1103, 104, 105, 106, 107, 108, 109, 110, 111, 112 and EP300 (MIM#602700) at 22q13113 (Figure 5a, Supplementary Figure S8 and Supplementary Table S35). To validate these interactions, we verified a subset by fluorescence in situ hybridization (e.g. MVP-PTEN and KCTD13-USP34/XPO1; Figures 5b–e and Materials and Methods). The comparison of distributions of Hi-C scores in selected versus non-selected BRICKs for each of our 4C viewpoints (Materials and Methods) further demonstrates the reproducibility of the 4C results (Supplementary Figure S9).

Figure 5

Examples of 16p11.2 viewpoints chromatin-contacted regions. (a) Examples of regions (BRICKS) interacting with 16p11.2 viewpoints showing some of the contacted genes, that is, GRID1, PTEN and USP34/XPO1. Other examples (CHD1L and EP300) are shown in Supplementary Figure S8. Fluorescence in situ hybridization experiments show colocalization of the 600 kb BP4-BP5 interval-encompassed KCTD13 (red) and 2p15-mapping XPO1 foci (green) (b) and the 600 kb BP4-BP5 interval-encompassed MVP (red) and 10q23.31-mapping PTEN foci (green) (d). The distributions of interphase nuclei distances between KCTD13 and XPO1 (c) and between MVP and PTEN (e) foci are compared with those between KCTD13/MVP and MARK4 (control) foci (25 and 14% co-localization versus 2% with the control locus; Fisher’s test enrichment: P=6.9e−05 and P=0.01, respectively; median MVP/KCTD13-USP34/XPO1 distances=1.76, MVP/KCTD13-PTEN=2.61 and MVP/KCTD13-MARK4=4.96 μm; Wilcoxon rank-sum test, P=5.4e−10 and P=9.3e−05, respectively).

Reminiscent of our transcriptome findings,12 enrichment analysis of the 1193 genes with modified chromosomal contacts showed significant over-representation of ciliary genes88 (40/493, OR=1.47, P=0.030, Supplementary Tables S29–30), OMIM terms associated with dysfunction of ciliary structures (Supplementary Table S28) and candidate genes for ASD (Supplementary Tables S29–S30). We showed previously12 that differentially expressed genes were similarly enriched for SFARI-ASD-associated genes (91/323 with detectable expression; OR=2.3, P=2.43e−10; Figure 4a). Notably five (TCF4 (MIM#602272), EP300, ADK (MIM#102750), TUBGCP5 (MIM#608147), VPS13B (MIM#607817)) of the 12 genes that are concurrently SFARI-ASD, differentially expressed and modified in 4C contacts (OR=5.01, P=1.5e−05) were previously associated with head circumference changes114, 115, 116, 117 (Figures 4a and b and Supplementary Tables S29 and S30).

Phenotypes associated with 2p15-16.1 CNVs

Our results suggest that chromatin interactions captured by the BP4-BP5 and BP2-BP3 viewpoints and their perturbations can be exploited to identify additional genes/loci, which when genetically perturbed, are associated with similar pathways, diseases and phenotypes. We challenged this hypothesis by assessing whether the phenotypic features of 2p15-p16.1 deletion and duplication carriers overlap with those of carriers of the 600 kb BP4-BP5 and 220 kb BP2-BP3 rearrangements. Haploinsufficiency of the chromatin-contacted USP34/XPO1 was suggested to be responsible for the 2p15-p16.1 deletion syndrome (MIM#612513) phenotypes that included intellectual disability, ASD, microcephaly, dysmorphic facial features and a variety of congenital organ defects.111, 112

We collected clinical data on 26 and 9 unrelated 2p15-p16.1 deletion and duplication carriers, respectively (Figure 2c and Supplementary Table S2). Comparison of data on BMI and HC of both variants pinpoints mirror phenotypes for these two traits (Figures 2d and e). Whereas we do not formally demonstrate a mechanistic and functional link between the 600 kb BP4-BP5 and the 2p15 interval, it should be noted that the KCTD13-USP34/XPO1 interactions are present in control LCLs, but neither in 16p11.2 deletion nor duplication LCLs (see below). Furthermore, five (C2orf74, COMMD1 (MIM#607238), FAM161A (MIM#613596), PEX13 (MIM#601789), PUS10 (MIM#612787)) and 11 (AHSA2, BCL11A (MIM#606557), PAPOLG, REL (MIM#164910), USP34, XPO1) of the 13 genes mapping within the 2p15-16.1 syndrome minimal overlapping interval111, 112 show perturbed expression levels in 16p11.2 CNV patients with cutoffs of 5 and 15% FDR, respectively. Thus, the cis- and trans-chromatin contacts we uncovered bridge genomic regions, whose rearrangements are associated with ASD and mirror phenotypes on BMI and HC.

Discussion

The 16p11.2 600 kb BP4-BP5 rearrangements allow investigation of molecular mechanisms underlying the co-morbidity triad of neurodevelopmental disorders, energy imbalance and HC alterations, all associated with changes in gene dosage. To identify pathways that are perturbed when the dosage of this region is modified we cataloged the chromosomal contacts of genes mapping within this genomic interval. Using chromosome conformation capture we uncovered a network of chromatin loops with genes previously associated with ASD and HC, demonstrating the pertinence of this approach. We show, for example, that the promoters of the 16p11.2 phenotype drivers MVP and MAPK3 have long-range chromatin interactions with PTEN and CHD1L, respectively. The MVP protein regulates the intracellular localization of PTEN,118 a dual-specificity phosphatase that antagonizes PI3K/AKT and Ras/MAPK signaling pathways. Both PTEN germline mutations in humans and targeted inactivation in mice are associated with macrocephaly/ASD syndrome (MIM#605309).103, 104, 105, 106, 119 Congruently, germline mutations in the Ras/MAPK pathway cause a group of syndromes frequently regrouped under the term RASopathies, recently shown to affect social interactions.120, 121 Corroborating this, we showed that expression of PTEN pathway members is sensitive to gene dosage at the 16p11.2 locus.12 CHD1L was suggested to be a major driver of the phenotypes associated with 1q21.1 rearrangements (OMIM#612474; #612475).107, 108 Analogous to 16p11.2, deletions and duplications of this interval are linked to micro- and macrocephaly, respectively.109

These results suggest that chromatin interactions, even when tested in peripheral tissues (such as LCLs) not considered to play a central role in the resulting neurodevelopmental phenotype, could reveal genes or pathways that are co-regulated and associated with similar phenotypes. Several studies have shown that the contacted domains can be highly stable across species and cell lines (even when the contacted genes are not expressed),3, 101 supporting the notion that patient-derived samples can provide direct insight into regulatory abnormalities and that LCLs still contain valuable information for the study of the patients' phenotype.12 Consistent with this hypothesis, we established that dosage perturbations of two chromatin-contacted loci, the cis-contacted 16p11.2 220 kb BP2-BP3 and the trans-interacting 2p15 intervals, are associated with mirror phenotypes on BMI and HC. Similar regulatory cores engaged in multiple physical interactions were recently described, for example, the 8q24 oncogenetic locus.122

The physiological relevance of the underlying chromatin architecture is further exemplified by the extensive overlap between differentially expressed genes and the loci that show modified loops upon dosage changes of the 16p11.2 600 kb BP4-BP5 interval. Together with the observed enrichments in various pathology-relevant gene ontology terms and pathways, this sheds light on a possible ‘chromatin hub’ role of the 16p11.2 locus in the observed phenotypes. The 12 genes that are concurrently SFARI-ASD, differentially expressed and modified in 4C contacts include (i) VPS13B, whose mutations cause Cohen syndrome (OMIM#216550), an autosomal recessive disorder characterized by intellectual disability, microcephaly, retinal dystrophy and truncal obesity;123 (ii) TCF4, whose haploinsufficiency is associated with Pitt-Hopkins syndrome (OMIM#610954) typified among other traits by intellectual disability, recurrent seizures and microcephaly (of note, the expression level of this transcription factor of the Wnt/β-catenin signaling cascade was shown to be altered in the cortex of mouse models engineered to carry one or three copies of the 16p11.2 orthologous region26); (iii) the TSC2 (MIM#191092) tuberous sclerosis (OMIM#613254) gene, which encodes an inhibitor of mTORC1 signaling that limits cell growth and was linked to ciliary dysfunction;124 and (iv) EP300, which is mutated in a form of Rubinstein-Taybi syndrome 2 (OMIM#613684) associated with a more severe microcephaly.114, 125

Whereas the functions and natures of the detected contacts remain to be elucidated, recent reports show that transcription of co-regulated genes occurs in the context of spatial proximity, which is lost in knockout studies.2, 102 While we cannot exclude that the observed proximity of genes is brought about by proteins that do not directly interact with any of them126 or occurs through their use of the same transcription factory, for example,127, 128 we de facto witness that expression of multiple genes of converging pathways is modified and that the chromatin-contacted genes encode proteins of overlapping interactomes (Figure 4c and Supplementary Figures S10–S11). For example, MVP loops with genes implicated in maintenance of cell polarity (P=2.75e−03) (Supplementary Table S33).

Chromatin spatial organization is conserved, to some extent at least, through evolution.129 The distal and proximal 16p11.2 regions are physically interacting in both human and mouse cells.73, 101 This chromatin crosstalk is preserved despite modifications in the human lineage of the orientation of both the BP2-BP3 and the BP4-BP5 regions, as well as doubling of the size of the intervening region. While we cannot rule out that similar studies on the clinically relevant tissues might uncover additional important partners, these studies demonstrate that maintenance of chromatin crosstalk across tissues (from fibroblasts to cortical neurons) and in different lineages lends credence to the use of LCLs and animal models as proxies to study chromatin properties of the human central nervous system, the tissue most likely to determine the phenotypes associated with 16p11.2 600 kb BP4-BP5 and BP2-BP3 CNVs.

The identified cis- and trans-chromatin contacts bridge loci whose rearrangements result in mirror phenotypes on BMI and HC, and involve known ASD candidate genes. While investigations of the 3D genomic structures of additional regions are warranted, the results presented here support the idea that the elucidation of chromatin contacts can be proposed as a new and effective tool to unravel genes participating in similar pathways or disease mechanisms, and to identify loci associated with overlapping phenotypic manifestations. Our study also suggests that modifications of chromatin interplays play a crucial role in the observed phenotypes.

Accession numbers

GEO Series accession number: GSE57802.

The Web Resources section

4C-seq pipeline: http://htsstation.epfl.ch/
BioScript: http://gdv.epfl.ch/bs
GeneProf: http://www.geneprof.org
GTEx: http://www.gtexportal.org/home/
gFeatBrowser: http://www.gfeatbrowser.com
Online Mendelian Inheritance in Man: http://www.omim.org
SFARI: https://sfari.org/
STRING: http://string-db.org/