Pseudotime analysis reveals novel regulatory factors for multigenic onset and monogenic transition of odorant receptor expression

Hussainy, Mohammad; Korsching, Sigrun I.; Tresch, Achim

doi:10.1038/s41598-022-20106-w


Published: 28 September 2022

Mohammad Hussainy^1,2,
Sigrun I. Korsching²^na1 &
Achim Tresch^1,3,4^na1

Scientific Reports volume 12, Article number: 16183 (2022)

Abstract

During their maturation from horizontal basal stem cells, olfactory sensory neurons (OSNs) are known to select exactly one out of hundreds of olfactory receptors (ORs) and express it on their surface, a process called monogenic selection. Monogenic expression is preceded by a multigenic phase during which several OR genes are expressed in a single OSN. Here, we perform pseudotime analysis of a single cell RNA-Seq dataset of murine olfactory epithelium to precisely align the multigenic and monogenic expression phases with the cell types occurring during OSN differentiation. In combination with motif analysis of OR gene cluster-associated enhancer regions, we identify known and novel transcription (co-)factors (Ebf1, Lhx2, Ldb1, Fos and Ssbp2) and chromatin remodelers (Kdm1a, Eed and Zmynd8) associated with OR expression. The inferred temporal order of their activity suggests novel mechanisms contributing to multigenic OR expression and monogenic selection.

Introduction

The sense of smell is tasked with the daunting challenge of making sense of potentially billions of different chemical stimuli to enable a multitude of different behaviors such as food search, prey hunting, predator evasion, mating and other social interactions¹. This task is solved by several different receptor families, of which the best studied is both the evolutionarily oldest and the largest (odorant receptor genes, OR genes)². The analytical power of this system is maximal when information gathered from activation of individual receptors is kept separate at the peripheral level. Indeed, for both vertebrates and insects, it has been shown that individual olfactory sensory neurons (OSNs) express only a single OR gene out of the entire olfactory receptor repertoire, a phenomenon termed monogenic expression^3,4. Sensory neurons expressing the same receptor are distributed across the olfactory sensory surface, but their axons converge into a single target region in the olfactory bulb, the first relay station of olfactory information processing^5,6,7,8. These target regions (so-called glomeruli) show a stereotyped arrangement, resulting in a receptotopic map on the olfactory bulb (or antennal lobe in the case of insects). Thus, monogenic expression is of central importance for the olfactory coding logic. In fact, expression of ORs is even monoallelic, i.e. restricted to one allele of the OR selected in monogenic expression^3,4,9,10. The molecular path towards monogenic and monoallelic expression is still not well understood, and the relative timing of these processes is not clear.

Reaching monogenic and monoallelic expression presents a massive challenge in the case of very large gene families such as those of mouse and rat ORs, which both number well over one thousand intact genes². A striking feature of the genomic arrangement of OR genes is their occurrence in several clusters, which contain from a single to over one hundred different OR genes¹¹. For mouse, 68 such clusters have been identified, with the largest cluster containing 269 OR genes¹¹. Another large cluster contains all 145 class I OR genes, which show a spatially restricted expression pattern in the olfactory epithelium^12,13,14. These observations have prompted the search for cluster-specific regulatory elements. In a seminal publication, 63 genomic regions containing such elements were identified and named after Greek islands^11,15. Forty-two class II OR clusters and the single class I OR cluster are associated with these Greek islands, which lie proximal to and sometimes even inside the clusters¹¹. A common feature of Greek islands is the presence of closely adjacent Lhx2 and Ebf1-binding motifs, which are also found individually in promoter regions of individual OR genes^16,17,18,19.

Beyond individual cluster-associated regulatory elements, the chromatin structure itself appears to play an essential role in regulating OR expression. OSNs possess a unique nuclear architecture compared to other cell types including the basal cells giving rise to the OSN lineage. In the silent phase before onset of expression, OR genes are aggregated in constitutive heterochromatin and are associated with its molecular hallmarks, H3K9me3 and H4K20me3^20,21. Onset of expression is concomitant with selective de-methylation (of H3K9me3), tri-methylation of H3K27 and re-location into expression-competent territory^21,22,23,24. Moreover, of the two alleles of an active OR only one is found in the more plastic facultative heterochromatin^22,23,24, i.e. amenable to expression, whereas the other remains blocked inside the constitutive heterochromatin, resulting in monoallelic expression. This suggests an involvement of chromatin remodelers in regulation of expression of OR genes. Furthermore, the stabilization of monogenic expression appears to require negative feedback from an active OR gene^25,26, which may be mediated by silencing of the activating demethylase LSD1, synonym Kdm1a^27,28. Recent progress in deep sequencing techniques has made it possible to obtain high quality single cell transcriptomes (scRNA-Seq), resulting in the surprising observation that monogenic expression of ORs found in mature OSNs is preceded by a multigenic phase in immature OSNs^29,30.

It is so far mostly unclear how these OR expression phases align to the developmental stages of OSN differentiation. Furthermore, although the basic stages (stem cell, dividing precursor cell, immature neuron, mature neuron) were known previously^29,30, deep sequencing techniques allow an unbiased ordering of individual cells along pseudotime trajectories according to their entire transcriptome. This enables a more precise and more stringent categorization of developmental stages compared to previous attempts.

Here, we re-analyzed a scRNA-Seq dataset obtained by Fletcher et al.³¹ with the goal to precisely determine the timing of multigenic and monogenic expression during OSN differentiation. A combination of sequence binding motif and time series analysis then identifies novel regulatory components involved in establishing OR gene expression patterns. We ascertain the transcription (co-)factors and chromatin remodelers that are specifically correlated with the onset of multigenic and of monogenic expression (e.g., Fos, Ssbp2, Eed and Zmynd8). Finally, we suggest potential mechanisms for multigenic and monogenic selection.

Results

Re-analysis of a single cell RNA-Seq data set reveals four lineages originating from quiescent globose basal cells

We have searched the GEO and ENA repositories for murine and human single-cell RNA-seq data sets that contain cells from the olfactory neuronal lineage (see STable 1 for a comprehensive list). These datasets were generated by either the SMART-Seq or the 10 × Genomics technology. While the latter typically allows the analysis of many cells, the former provides a higher number of unique mapped sequence reads per cell³². For our purpose, it is most important to detect the sporadic (and often low-level) expression of OR genes with sufficient sensitivity. Therefore, we selected the SMART-seq dataset by Fletcher et al.³¹, which contains the largest number of neuronal lineage cells, as the primary source for our re-analysis. For validation, we compare our results to the 10 × Genomics dataset richest in OSNs³³.

After quality filtering and pre-processing of the data from Fletcher et al.³¹ (Methods, SFigure 1), 687 cells were included in further analysis and grouped into 13 clusters using Seurat KNN clustering on the top 15 principal components (Methods). Dimension reduction and visualization were performed using principal component analysis (Fig. 1A) and tSNE / UMAP. Using an extensive set of known marker genes, we assigned clusters to cell types of the main olfactory epithelium (MOE) (Fig. 1B, Methods, SFigure 2a and STable 2). We detected all cell types described by Fletcher et al.³¹, and additionally we could subdivide the globose basal stem cell cluster into quiescent cells (qGBC) and active cells (GBC). Active GBC were identified by their expression of Ascl1/Mash1^34,35,36,37 and by the presence of cell cycle genes such as Mki67 and Top2a. GBC are known as the adult OSN stem cells responsible for a sustained self-renewal of the OSNs throughout life³⁷.

Further, trajectory inference by Slingshot³⁸ (Methods) revealed a tree with four leaves (Fig. 1A). Slingshot does not provide information on the direction of development but merely a tree topology. We chose qGBC as a root node for pseudotime analysis because qGBC has been reported as a general stem cell population in the MOE following injury³⁷. This results in four lineages starting from qGBC (Fig. 1A):

(1) The basal stem cells lineage, which connects the quiescent horizontal basal stem cells (HBC0, represented by 90 cells in the data) via two transient populations of horizontal basal stem cells (HBC1, 32 cells, and HBC2, 115 cells) with the qGBC (50 cells).
(2) The supporting cells lineage ranges from mature sustentacular cells (mSus, 44 cells) to qGBCs, and includes immature sustentacular cells (iSus, 75 cells) and HBC2.
(3) The microvillous cells lineage contains merely microvillous cells (MV, 36 cells). No transient cell types have been detected in this trajectory.
(4) The neuronal lineage ends with mature olfactory sensory neurons (mOSN, 32 cells). It spans a range of several stages, namely GBC (38 cells), three intermediate neuronal precursors Early.INP (26 cells), Mid.INP (20 cells), Late.INP (34 cells) and immature olfactory sensory neurons (iOSN, 95 cells).
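
As a quick consistency check on the cell numbers quoted in the four lineages, the per-cluster counts can be tallied as follows. This is only a bookkeeping sketch in Python; the dictionary and variable names are illustrative and not part of the original analysis.

```python
# Cell counts per annotated cluster, copied from the lineage descriptions above.
cluster_sizes = {
    "HBC0": 90, "HBC1": 32, "HBC2": 115, "qGBC": 50,
    "mSus": 44, "iSus": 75, "MV": 36,
    "GBC": 38, "Early.INP": 26, "Mid.INP": 20, "Late.INP": 34,
    "iOSN": 95, "mOSN": 32,
}

# Stages of the neuronal lineage, starting at the chosen root (qGBC).
neuronal_lineage = ["qGBC", "GBC", "Early.INP", "Mid.INP", "Late.INP", "iOSN", "mOSN"]

total_cells = sum(cluster_sizes.values())
neuronal_cells = sum(cluster_sizes[stage] for stage in neuronal_lineage)

print(len(cluster_sizes))  # 13 clusters
print(total_cells)         # 687 cells after quality filtering
print(neuronal_cells)      # 295 cells along the neuronal lineage
```

The totals agree with the 687 cells and 13 clusters reported after filtering, and with the 295 neuronal lineage cells used in the pseudotime analysis.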

A previous analysis by Fletcher et al.³¹ did not recognize qGBC as a separate population and set their root node as HBC0, which is a leaf node of the tree, and therefore leads to merely three lineages. Apart from this, their reconstruction is essentially identical to ours. Notably, both reconstructions agree on the neuronal lineage, i.e., the sequence of cell types qGBCs - GBC - Early.INP - Mid.INP - Late.INP - iOSN - mOSN. In the following, we restrict our analysis to the neuronal lineage.

For comparison purposes, we selected the main cell types of the olfactory epithelium from a dataset by Wang et al.³³. We kept more than 35,000 cells annotated as HBC, SUS, MV, GBC, INP, iOSN and OSN (see SFigure 3a). The median number of genes per cell is 2448 (compared to 4164 in Fletcher) and the median number of counts per cell is 5694 (460,561 in Fletcher). Cells from the neuronal lineage (branch 4) were re-clustered (Methods, see SFigure 3b for a UMAP plot of the cells and their annotation), which led to the cell type assignment used in the following.

OR gene expression is limited to the last three stages of OSN differentiation: Sudden onset of multigenic OR expression in Late.INP is followed by transition to monogenic expression in immature OSN stage

As expected, OR expression is essentially unique to the neuronal lineage (Fig. 2 and data not shown). Next, we used Slingshot to assign a pseudotime to each cell, thereby providing a linear order of all 295 cells in the neuronal lineage from qGBC to the terminal cell cluster (Methods). Our analysis detected 157 cells of the neuronal lineage that express at least one OR gene at relevant levels, i.e. ≥ 50 normalized counts. Of these, 132 belonged to the last three stages of OSN differentiation. Most Late.INP cells (28 of 34), iOSN (76 of 95) and mOSN (28 of 32) express at least one OR. Many cells express more than one OR (multigenic expression), in particular in the Late.INP stage (26 of the 28 cells expressing OR genes, 92.8%). The frequency of multigenic expression drops sharply in later stages, to 42% and 32% for iOSN and mOSN, respectively. We found 212 different OR genes that were expressed at least once in a single cell of the neuronal lineage (STable 3). Figure 2A shows the total number of reads that were assigned to OR genes, separately for each cell. While aggregate OR expression levels are almost zero for qGBC/GBC, early and mid INPs, there is a steep onset of OR expression in Late.INP. Then, overall OR expression stays at similarly high levels in iOSN and mOSN (Fig. 2A). A few OSNs do not appear to express any OR at a relevant level. More precisely, 19 out of 95 iOSN cells (20%) and 4 out of 32 mOSN (12.5%) have fewer than 50 normalized OR counts. While it cannot be excluded that this is caused by incomplete annotation of the OR repertoire, or the exclusion of reads due to multiple mapping to closely similar OR genes, it is also possible that some OSNs cannot solve the challenging task of selecting an OR gene.
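
The per-cell calls used here (no OR, one OR, or several ORs at ≥ 50 normalized counts) can be illustrated with a minimal Python sketch. The function name, the labels and the toy matrix are hypothetical; only the threshold of 50 normalized counts is taken from the text.

```python
import numpy as np

OR_THRESHOLD = 50  # normalized counts; threshold stated in the text

def classify_or_expression(or_counts, threshold=OR_THRESHOLD):
    """Label each cell by how many OR genes reach the threshold.

    or_counts: array of shape (n_cells, n_or_genes) with normalized counts.
    Returns one of "none", "monogenic", "multigenic" per cell.
    """
    n_expressed = (np.asarray(or_counts) >= threshold).sum(axis=1)
    labels = np.where(n_expressed == 0, "none",
                      np.where(n_expressed == 1, "monogenic", "multigenic"))
    return labels.tolist()

# Toy matrix (hypothetical values): 4 cells x 3 OR genes.
toy_counts = np.array([
    [0, 10, 49],     # nothing reaches 50
    [800, 0, 3],     # one OR above threshold
    [120, 75, 60],   # three ORs above threshold
    [50, 50, 0],     # a count of exactly 50 is treated as expressed
])
labels = classify_or_expression(toy_counts)
print(labels)  # ['none', 'monogenic', 'multigenic', 'multigenic']
```

Counting the labels per differentiation stage would then give stage-wise fractions of the kind reported above.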

It is known that mature OSNs express only a single OR gene^3,4,9,39, after a transient period of multigenic expression^29,30. We therefore decided to rank OR genes by expression level, separately in each cell. We then investigated the temporal behavior of the top four ranked OR genes in each cell. These genes account for 99% of all reads (413,388 out of 416,033) mapped to OR genes in cells from the neuronal lineage. The top-ranked gene of each cell will be referred to as the ‘winner’ and the others as the ‘runners-up’. While the abundance of all runners-up drops sharply after Late.INP, the winner does not drop and in fact absolute levels keep increasing several fold until the mOSN stage (Fig. 2C,D). As a consequence, the distance between winner and runners-up increases considerably in the iOSN stage and even more so in the mature neurons (mOSN). Since we observe each cell only once, we cannot be sure that the winner gene observed in one cell at a certain stage will be the highest expressed OR gene when that cell matures. However, this is the most plausible explanation, since a rank switch between the winner and a runner-up before/during the iOSN stage would require a coordinated switch of expression between these two specific OR genes, from high to almost zero and vice versa, an unlikely scenario.
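
The ranking step can be sketched in a few lines of Python. This is an illustration under assumed inputs (a cells × OR genes matrix of normalized counts); the function name and toy values are hypothetical.

```python
import numpy as np

def rank_or_genes(or_counts, top_k=4):
    """Sort OR counts within each cell in descending order and keep the top_k.

    Column 0 of the result is the 'winner'; the remaining columns are the 'runners-up'.
    """
    counts = np.asarray(or_counts, dtype=float)
    ranked = -np.sort(-counts, axis=1)  # descending sort, separately per cell
    return ranked[:, :top_k]

# Toy matrix (hypothetical values): 2 cells x 6 OR genes.
toy_counts = np.array([
    [30, 25, 20, 10, 5, 0],   # similar levels, as in a multigenic cell
    [900, 4, 0, 2, 1, 0],     # one dominant OR, as in a monogenic cell
])
top = rank_or_genes(toy_counts)
winner = top[:, 0]
runners_up = top[:, 1:]

# Share of all OR reads captured by the top four genes, pooled over cells.
top4_share = top.sum() / toy_counts.sum()
```

Applied to the pooled figures quoted above, the same ratio is 413,388 / 416,033 ≈ 0.994.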

Taken together, the pseudotime analysis of neuronal lineage cells suggests three main phases for OR expression (Fig. 2C,D):

(1) The silent phase exhibits virtually no OR expression, which is the case for the four early stages of neuronal lineage (qGBC, GBC, Early.INP and Mid.INP). In the present data, the silent phase is represented by 134 cells.
(2) The onset of OR expression (multigenic phase) is characterized by the simultaneous expression of several OR genes per cell at relatively similar levels; this phase is represented by 34 cells and contains specifically the Late.INP stage and the beginning of the iOSN stage.
(3) Finally, the monogenic phase includes the end of the iOSN stage and the mOSN stage, where each cell expresses essentially one functional OR allele (henceforth called the “winner”, Fig. 2C) at a very high level while the remaining ORs (the “runners-up”, Fig. 2D) show no or very low expression. Our data contains 127 cells in this phase.

Our analysis reveals that Late.INP is a crucial stage in stochastic selection.

We performed the same analysis for the Wang data. It turns out that the coverage per cell is not enough to reveal the characteristic expression time course of the runners-up during the multigenic phase (SFigure 4a), which is the reason why we chose Smart-seq data as a primary source.

OR gene expression during multigenic phase shows no sign of OR cluster-specific activation

Next, in an attempt to infer the mechanism of activation in the multigenic stage and transition to monogenic stage, we analyzed the joint OR gene expression per cell (Fig. 2B, STable 4). We found that each of the cells expressing more than one OR gene in the Late.INP stage had at least two active OR clusters (i.e., clusters with an OR gene expressed at a level of at least 50 counts). The number of active clusters reached up to 9 for some cells. We performed a permutation test to assess whether OR genes that are jointly active in one cell have the tendency to be located in the same cluster (see SCode 1 and the description therein). We calculated, across all cells, the average frequency of clusters with more than one active OR gene. The null hypothesis we challenge is that this number is due to chance. We calculate the probability of observing this or a higher number if one randomly shuffles the genes × cells matrix such that the marginal frequency of active genes per cell as well as the marginal frequency of cells expressing a given gene stays constant. The results however show that the average number of clusters with more than 1 active OR gene is consistent with the null model of random cluster allocation of OR genes (SFigure 5, p = 0.112). Moreover, we found that the top two active OR clusters (ranked by expression level) always belonged to different chromosomes in each cell of the Late.INP stage (Fig. 2B). Thus the onset of OR gene expression in the multigenic stage cannot be caused by activation of an individual chromosome or a particular OR cluster. Conversely, the transition to the monogenic stage could be partially caused by the restriction of expression to a single chromosome and cluster. However, this may not be the only selection mechanism involved, as some cells express up to 3–4 different OR genes simultaneously within a single cluster. 
Specifically, for the 34 Late.INP cells we found 20 cells which expressed at least 2 OR genes from the same cluster (expression defined as ≥ 50 normalized counts). Thus, additional steps are required to restrict expression to a single gene within a cluster.
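
The logic of such a permutation test can be sketched as follows. SCode 1 is not reproduced here, so this Python example is only one possible realization: it assumes a binary cells × genes activity matrix and uses checkerboard swaps as the shuffling scheme that keeps both marginals fixed. All function names, parameters and the toy data are hypothetical.

```python
import numpy as np

def clusters_with_coexpression(active, gene_cluster):
    """Mean number of clusters per cell that hold more than one active OR gene.

    active: boolean array (n_cells, n_genes); gene_cluster: integer cluster id per gene.
    """
    n_clusters = int(gene_cluster.max()) + 1
    per_cell = []
    for row in active:
        hits = np.bincount(gene_cluster[row], minlength=n_clusters)
        per_cell.append((hits > 1).sum())
    return float(np.mean(per_cell))

def swap_randomize(active, n_swaps, rng):
    """Shuffle a 0/1 matrix by checkerboard swaps, keeping all row and column sums."""
    m = active.copy()
    n_cells, n_genes = m.shape
    for _ in range(n_swaps):
        r1, r2 = rng.integers(n_cells, size=2)
        c1, c2 = rng.integers(n_genes, size=2)
        # Only a 2x2 submatrix [[1,0],[0,1]] or [[0,1],[1,0]] can be flipped;
        # draws with identical rows or columns fail this check and are skipped.
        if m[r1, c1] == m[r2, c2] and m[r1, c2] == m[r2, c1] and m[r1, c1] != m[r1, c2]:
            m[r1, c1], m[r1, c2] = m[r1, c2], m[r1, c1]
            m[r2, c1], m[r2, c2] = m[r2, c2], m[r2, c1]
    return m

def permutation_pvalue(active, gene_cluster, n_perm=200, n_swaps=500, seed=0):
    """One-sided p-value: how often a shuffled statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = clusters_with_coexpression(active, gene_cluster)
    null = [clusters_with_coexpression(swap_randomize(active, n_swaps, rng), gene_cluster)
            for _ in range(n_perm)]
    p = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return observed, p

# Toy data (hypothetical): 12 cells, 20 OR genes arranged in 5 clusters of 4 genes.
toy_rng = np.random.default_rng(1)
toy_active = toy_rng.random((12, 20)) < 0.2
toy_cluster = np.repeat(np.arange(5), 4)
observed, p_value = permutation_pvalue(toy_active, toy_cluster)
```

Keeping both marginals fixed means that cells with many active ORs, and ORs that are active in many cells, remain so in every shuffled matrix; only the assignment of active genes to clusters within a cell is randomized.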

Motif search in Greek island enhancers identifies novel transcription factors

Our next goal was to identify DNA-binding proteins potentially involved in the onset of multiple OR gene expression or the transition to monogenic expression. Such candidates should have a characteristic temporal gene expression pattern along the neuronal lineage. We thus require such candidates to be expressed at detectable levels in our scRNA-seq dataset. A factor is considered detectable if it has at least 35 counts in at least 15 out of 295 cells in the neuronal lineage, leaving us with 1358 (co-)TFs. A characteristic expression pattern, however, might merely be the consequence and not the cause of the OR selection and differentiation process. Therefore, to make our search more specific for factors causally involved in OR selection, we confine our search to factors whose binding motif is enriched in enhancer regions of OR gene clusters.
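
The detectability criterion can be written down as a short filter. The following Python sketch assumes a cells × genes count matrix; the function name, gene labels and toy values are hypothetical, while the two thresholds (35 counts, 15 cells) come from the text.

```python
import numpy as np

MIN_COUNTS = 35  # per-cell count threshold from the text
MIN_CELLS = 15   # minimum number of cells that must reach that threshold

def detectable_genes(counts, gene_names, min_counts=MIN_COUNTS, min_cells=MIN_CELLS):
    """Return the genes with >= min_counts in at least min_cells cells."""
    counts = np.asarray(counts)
    n_cells_above = (counts >= min_counts).sum(axis=0)
    keep = n_cells_above >= min_cells
    return [gene for gene, kept in zip(gene_names, keep) if kept]

# Toy matrix (hypothetical values): 20 cells x 3 genes.
toy = np.zeros((20, 3), dtype=int)
toy[:15, 0] = 40    # 15 cells at 40 counts: detectable
toy[:14, 1] = 500   # high counts, but only 14 cells: not detectable
toy[:, 2] = 34      # every cell just below 35: not detectable
kept = detectable_genes(toy, ["TfA", "TfB", "TfC"])
print(kept)  # ['TfA']
```

Note that the criterion depends on the number of cells above the threshold, not on the total or mean count of a gene.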

Previously, 63 intergenic enhancer regions, termed Greek islands, have been identified inside or near OR clusters using DNase I hypersensitivity-sequencing and chromatin immunoprecipitation sequencing¹⁵ and ATAC-seq¹¹. SFigure 6 shows the co-localization of the Greek islands and the OR clusters on a map of the murine genome. Chromatin conformation capture experiments have revealed that Greek islands extensively contact OR clusters, remarkably both in cis and trans^40,41.

We performed a de novo motif search on all Greek island enhancer regions as annotated by Monahan et al.¹¹ using MEME^42,43 (Methods). Ungapped motif analysis of Greek islands identified one known motif, TYCCYWKGGGVCTHATTARM (reported in Monahan et al.¹¹), and two novel motifs, GVDHCYYCAGRGAV and TBYTCHTCTCYMCAGDGWBNY, with E-values of 1.7e-57, 3.7e-27 and 4.1e-8, respectively. Almost all Greek islands contain each of these motifs exactly once, except 8 Greek islands which are missing the third motif. TOMTOM was employed to align these motifs with known transcription factor motifs from the JASPAR database⁴⁴ (Methods). TOMTOM did not predict any significant TF binding for the two novel motifs; therefore, we do not discuss them further. We found 65 significant target binding sites for transcription factors inside the first motif (see STable 5). Of those, 9 TFs were expressed at a detectable level.
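
The motifs above are written in IUPAC nucleotide codes, where for example Y stands for C or T and W for A or T. To make the notation concrete, the following Python sketch expands a consensus string into a regular expression and scans a sequence for exact matches. This is a simplification for illustration only: MEME itself works with position weight matrices rather than consensus strings, only the given strand is scanned, and the test sequence is hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(sequence, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

GREEK_ISLAND_MOTIF = "TYCCYWKGGGVCTHATTARM"

# Hypothetical sequence with one motif instance flanked by a few extra bases.
toy_sequence = "GGA" + "TCCCTAGGGGACTAATTAAC" + "TTT"
print(find_motif(toy_sequence, GREEK_ISLAND_MOTIF))  # [3]
```

The lookahead in the pattern allows overlapping occurrences to be reported, which matters for short or low-complexity motifs.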

The most significant motif, TYCCYWKGGGVCTHATTARM, is composed of two adjacent submotifs, which are overlapped by a third submotif (Fig. 3A). The first submotif is targeted by the COE1 DNA-binding domain which is found exclusively in the Ebf transcription factor family (Ebf1-4). The second submotif is bound by homeodomain TFs such as Lhx2, Emx2 and Uncx (Fig. 3A). These results are consistent with the previous report that Ebf1 and Lhx2 motifs are positioned next to each other in most Greek islands¹¹. Furthermore, the second submotif is expected to interact with transcription factors from three other families, the homeobox domain TF family, the Pou TF family (Pou6f1, this family has a strong enrichment in OR genes) and the ARID (AT-Rich Interaction Domain) domain TF family (Arid3a).

Notably, our analysis predicted a third submotif, which is a possible binding site for Fos, Fosb and Fosl2. Fos and Fosb are well-known early response transcription factors, which in turn regulate a broad variety of other transcription factors, thus regulating many physiological processes^45,46,47. This Fos-binding motif overlaps with the end of the first and the beginning of the second submotif, suggesting a cooperative/inhibitory interaction of the respective binding factors. This possibility will be investigated further in the context of the pseudotime analysis.

Complementary to the de novo motif analysis, we did a forward motif search with AME⁴⁸ looking for known TF binding sites enriched in the 63 Greek islands (Methods). This returned a total of 120 TFs (STable 6), of which 10 had a detectable expression (at least 35 counts per cell in 15/295 cells of neuronal lineage) in our scRNA-seq dataset (SFigure 7). Six of those are also detected in the de novo motif search (see above); the additional TFs are Pax6, Dlx5, Pou2f1 and Arid3b (Fig. 3). Dlx5 is part of the same TF family as Lhx2, which was found in the de novo motif search. Pax6 and Pou2f1 have a homeobox domain, whereas Arid3b is part of the same TF family as Arid3a.

Taken together we describe 13 TFs that are found at detectable levels and predicted to bind to Greek islands by de novo and/or forward motif search (SFigure 7). Next we determined the pseudotime profiles of these TFs and found clear and distinct temporal expression patterns, providing additional evidence for their active involvement in the regulation of OR expression.

Pseudotime analysis suggests transcription factors involved in OR expression

The sorting of cells according to pseudotime (Methods) generates, for each gene, a time course of its expression (see above). Notably, all TFs found by motif search in the previous section show a pronounced temporal expression pattern, which belongs to one of three groups (Fig. 3B and SFigure 7): The first group is active early in the silent phase, but strongly downregulated in late silent phase to reach a minimum in the multigenic phase (Fos, Fosb, Fosl2, Pax6 and Pou2f1). Some, but not all, are upregulated again in the monogenic phase (Fos, Fosb). The second group peaks within the multigenic phase (Ebf1, Lhx2, Emx2, Uncx and Dlx5). The third group is specifically upregulated during the monogenic phase (Arid3a and Pou6f1). Hereafter we will refer to these group definitions.

The fact that all TFs with a known Greek island binding site show a clear temporal pattern prompted us to perform a systematic search for TFs that change their expression upon transition between the three phases of OR expression. We also include co-factors in this analysis, because co-factors such as LDB1 have been found to be selectively associated with Greek islands and were suggested to initiate OR expression⁴⁰. We searched for (co-)TFs with a significant differential gene expression of at least twofold between silent vs. multigenic phase or between multigenic vs. monogenic phase (Methods), resulting in 83 (34 going up, 49 down) and 39 (8 up, 31 down) relevant hits, respectively (Fig. 4A, see STable 7 and STable 8).
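
The twofold criterion corresponds to an absolute log2 fold change of at least 1 between two phases. The Python sketch below shows only this fold-change part; the statistical test for significance described in the Methods is deliberately omitted, and the pseudocount, function name and toy values are assumptions made for illustration.

```python
import numpy as np

def log2_fold_change(expr, phase, phase_a, phase_b, pseudocount=1.0):
    """Per-gene log2 ratio of mean expression in phase_b over phase_a.

    expr: (n_cells, n_genes) normalized counts; phase: phase label per cell.
    The pseudocount is an assumption added here to avoid division by zero.
    """
    expr = np.asarray(expr, dtype=float)
    phase = np.asarray(phase)
    mean_a = expr[phase == phase_a].mean(axis=0)
    mean_b = expr[phase == phase_b].mean(axis=0)
    return np.log2((mean_b + pseudocount) / (mean_a + pseudocount))

# Toy data (hypothetical values): 4 cells x 3 genes.
toy_expr = np.array([
    [7, 31, 10],
    [7, 31, 12],
    [31, 7, 11],
    [31, 7, 11],
])
toy_phase = ["silent", "silent", "multigenic", "multigenic"]

lfc = log2_fold_change(toy_expr, toy_phase, "silent", "multigenic")
# |log2 fold change| >= 1 is equivalent to an at least twofold change.
direction = np.where(lfc >= 1, "up", np.where(lfc <= -1, "down", "unchanged"))
print(direction.tolist())  # ['up', 'down', 'unchanged']
```

In a full analysis, the fold-change filter would be combined with a multiple-testing-corrected significance threshold before counting up- and downregulated factors.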

Several of these differentially expressed factors were also identified by motif analysis (e.g., Ebf1, Lhx2, Dlx5, Fosb and Fos), but many are not. We manually inspected the pseudotime patterns of all differentially expressed (co-)TFs, and for detailed discussion, we selected those factors whose pseudotime expression pattern falls clearly into one of the three groups described above. We limit ourselves in the following to discussion of novel (co-)factors with additional evidence from motif analysis or a previous link to OR expression^11,18,19,40. All other factors with a characteristic expression time course are shown in SFigure 8A.

For group 1 (active early in silent phase, downregulated in late silent phase and minimum in the multigenic phase) the differential expression search newly identified Egr1, whose expression resembles that of Fos and Fosb, which were already identified by motif analysis (see above). Therefore we searched explicitly for the known Egr1 binding motif (Methods) in Greek islands, which could be identified in 16 of 63 Greek islands. Fos, Fosb and Egr1 are immediate early genes, which are rapidly upregulated in response to external stimuli, immune response, and cellular stress⁴⁷. Egr1, Fos and Fosb are specifically downregulated during the multigenic phase (Fig. 4B and SFigure 8A). This suggests combinatorial interactions with the other components that regulate OR expression and will be discussed later.

The second group peaks specifically within the multigenic phase and 6 factors have been identified by motif analysis (see above). Differential expression analysis further obtains the cofactor Ssbp2 (Fig. 4B) and three factors, Cebpg, Rcor1 and Ldb1, which have been reported previously to be involved in OR expression^11,40,49,50. Ssbp2 binds to Ldb1 and thereby prevents Ldb1 from degradation^51,52. While Ldb1 and Lhx2 were shown to bind to Greek island enhancers to regulate OR expression in trans⁴⁰, Ssbp2 is a novel candidate with such a function.

The third group is specifically upregulated during the monogenic phase and two factors from this group have been identified by motif analysis. Differential analysis identifies additionally the TFs Mef2b, Rfx3 and Sub1, also known as PC4 (Fig. 4B and SFigure 8A). Among these factors, only Rfx3 has a known motif, for which we performed a strict motif search in Greek islands (Methods). We report that 56 out of 63 Greek islands contain the binding motif for Rfx3. Moreover, note that Mef2a, which shares a similar SRF binding domain with Mef2b⁵³, is found to be strongly bound to OR promoters¹⁹.

The expression time courses of the transcription factors shown in Fig. 4B were found to be in good agreement with the Wang data (SFigure 4b).

Taken together, our pseudotime analysis recovers a large proportion of candidate (co-)TFs identified by motif analysis—both for initiating onset of OR expression, and for the transition to monogenic stage. Moreover it extends the range of candidates whose time course correlates with these two transitions, and consequently the regulatory repertoire for these transitions.

Changes in chromatin remodeler expression accompany both transitions in the OR selection process

It is known that chromatin changes accompany the selection of OR genes^20,21,23. We therefore searched our data for chromatin remodelers that show expression changes during OR selection (Methods, Excel files 5,6). We confirmed previous observations that the chromatin remodelers Lbr and Cbx5 (SFigure 8B) are expressed at earlier stages and are downregulated in the course of OSN differentiation^15,21. Furthermore, we discovered novel candidates for silencing OR genes, for onset of (multigenic) expression, and for transition to monogenic expression (Fig. 4C):

Among the genes whose expression profiles fall into group 1 (minimum in multigenic phase), we found Eed, one of the constitutive subunits of the polycomb repressive complex 2, PRC2 (Fig. 4C). Eed is required to maintain repressive H3K27me3 marks^54,55 and its downregulation may lead to de-repression of OR expression in Late.INP stage. Note that another PRC2 subunit, Ezh2, is expressed during the silent phase as well, but decreases later, at the transition to monogenic phase (SFigure 9). Nsd1 is a histone methyltransferase that dimethylates H3K36 (H3K36me2)⁵⁶ (Fig. 4C). All remodelers found with a group 1 pseudotime profile (Hells, H2afz and Set) are predicted to play a repressive role in the silent phase of OR selection (we only show H2afz as an example, SFigure 8B).

For group 2 (peak in multigenic phase), we found prominent chromatin remodelers such as Zmynd8, Ell3, Sertad2, Med12l and Scai (Fig. 4C, SFigure 8B). We also investigated the expression profile of Kdm1a, which was known before as a regulator of OR expression. Kdm1a, alias LSD1, is a lysine demethylase and functions both as a coactivator by demethylation of mono- or di-methylated H3K9 and as a corepressor through demethylation of mono- or di-methylated H3K4^57,58,59,60. There have been contradictory reports on the function of Kdm1a in OR expression as an activator²⁷ or repressor of transcription⁵⁰. The present data shed light on this debate: While Kdm1a expression sharply peaks directly before the multigenic phase (arguing for its role as activator), it can be part of the Co-REST repressor complex⁶¹. Two components of the Co-REST repressor complex, Rcor1 and Hdac2, sharply peak during multigenic phase (Fig. 5C, SFigure 10A), arguing for a change of function of Kdm1a by recruitment to the Co-REST complex at the transition to monogenic phase^57,62,63.

Of the four novel remodelers with a group 2 pseudotime profile, Zmynd8 and Ell3 are highly differentially expressed (Fig. 4C). Zmynd8, a chromatin reader, is a particularly appealing candidate, since it is also known to play a role in the selective expression of the immunoglobulin heavy chain (Igh) regions in immune cells (B cells). Its product ZMYND8 binds Igh super-enhancers known as the 3′ regulatory region (3′RR). ZMYND8 thereby controls the 3′RR activity by modulating the enhancer transcriptional status⁶⁴. Consistent with an activating role during the multigenic phase, Ell3 not only binds to active enhancers, but also marks enhancers that are in a poised or inactive state in ES cells⁶⁵.

Remodelers whose pseudotime profiles fall into group 3 (specifically upregulated during the monogenic phase) are Cbx4 (Chromobox 4) and Jarid2 (Fig. 4C). Cbx4 is a component of a Polycomb group (PcG) multiprotein PRC1-like complex, which is required to maintain the transcriptionally repressive state of many genes^66,67. Jarid2 (Jumonji and AT-Rich Interaction Domain Containing 2) is required to repress expression of cyclin-D1 (CCND1) in cardiac cells by setting H3K9 methylation marks⁶⁸, and it is upregulated upon transition from Late.INP to iOSN (SFigure 2b), i.e. upon exit from the cell cycle.

We investigated the expression of the chromatin remodelers shown in Fig. 4C in the Wang data (SFigure 4b) and we found a good agreement with our time courses.

So far, we identified several chromatin remodelers that add to the regulatory repertoire for the onset of OR expression, and for the transition to monogenic stage. Another important feature which requires chromatin remodeling is the monoallelic expression of ORs in mature OSNs. This feature appears to be established from the very beginning of OR expression^4,23. Factors involved in generating allelic exclusion therefore would be expected to peak at least as early as factors regulating the onset of (multigenic) expression (group 2 factors). We found two remodeling factors with a very early onset within group 2 which could potentially play such a role: Smchd1 (structural maintenance of chromosomes flexible hinge domain-containing protein 1) and Cdyl2 (chromodomain Y-like protein 2) (see SFigure 11).

Based on the interactions and temporal coordination of remodelers and (co-)TFs, we generate and discuss some hypotheses about OR selection below.

Discussion

The monogenic expression of ORs presents a big challenge for the olfactory system, since it requires a random selection of exactly one OR per cell from the large family of OR genes. OR genes are known to be aggregated in silenced chromatin clusters during the silent phase, before the selection is made²¹. The identification of a multigenic state in early immature OSN^29,30 by scRNA-seq made clear that the initial escape from silencing is not limited to a single OR gene per cell. In a single cell, we observed up to nine different OR clusters and more than a dozen different OR genes concomitantly active. Both transcription factors and chromatin remodelers have been identified as regulators of OR expression.

Here we have employed pseudotime analysis of a single cell transcriptome data set³¹ focusing on OR expression. We aligned the three main phases of OR expression, silent, multigenic and monogenic^29,30, with the stages of OSN differentiation as defined in³¹. By analysis of winner vs. runners-up OR expression, we could precisely assign the onset of multigenic phase to Late.INP, and the onset of monogenic selection to iOSN. Most cells in the multigenic phase express ORs from more than one cluster and more than one chromosome (Figs. 2B, 5A). We tested whether ORs co-expressed in one cell tend to lie in identical OR clusters and did not find evidence for that (SCode 1, SFigure 5). We conclude that the selective expression of OR genes during multigenic phase is not caused by escape from compaction of merely one OR cluster. In the monogenic phase, the vast majority of cells express only a single OR gene (Fig. 5A).

It is worth noting that using data with a high coverage per cell (Smart-seq) was instrumental to detecting the multigenic phase, as it was not detectable in data with lower coverage (10x Genomics). Moreover, the 10x Genomics technology primarily queries the 3′ end of a gene, while Smart-seq captures reads from the whole transcript region. OR genes have a high sequence similarity, which leads to exclusion of many 3′ end reads that map to multiple loci. This may be another reason for the disproportionately low count numbers for OR genes in 10x Genomics data.

We then identified candidate factors (TFs and cofactors including chromatin remodelers) differentially expressed between these stages and thus potentially involved in the transitions silent-to-multigenic (40 TFs, 43 cofactors) and multigenic-to-monogenic (19 TFs, 20 cofactors) (Methods, Excel files 5,6). Many of these differentially expressed factors are likely not directly involved in OR selection, since cells undergo many substantial changes along OSN differentiation. Thus, we performed an independent de novo motif analysis of Greek islands, which should enrich TFs involved in OR selection. This search revealed one motif that could be decomposed into three consecutive submotifs, only two of which were described previously¹¹. We additionally recognized a central Fos binding motif overlapping the previously described two elements (Fig. 3A). All but one of the factors that bind to these motifs show characteristic pseudotime expression time courses, which could be classified into three groups (Fig. 5B). The temporal coordination of these factors together with the precise location of their respective binding sites enables us to generate hypotheses about their molecular interactions and possible functional consequences. We cite extensive evidence from the literature in support of our hypotheses. Of course, this cannot replace pending experimental verification.

In the silent phase, the transcriptional regulator Fos binds to the central motif and may competitively prevent binding of the known activators of OR expression, Lhx2 and Ebf1¹¹, which bind to the left and right submotifs (Fig. 3A, top row of Fig. 5C). The strong downregulation of Fos expression in the multigenic phase then would allow Lhx2 and Ebf1 to access their binding motifs, and recruit their known binding partner Ldb1 to Greek islands⁴⁰. In situ Hi-C experimental data indicate that Ldb1 mediates trans interactions between different Greek islands, creating super-enhancer hubs that include neighboring OR clusters⁴⁰. Our pseudotime analysis additionally predicted the co-factor Ssbp2 to play a role in this process. Ssbp2 is known in other contexts to bind to Ldb1 and thereby prevent its degradation by the proteasome^51,52. Thus, elevated expression of Ssbp2 in the multigenic phase would amplify the effect of Ldb1.

All four components of the super-enhancer-forming complex (Lhx2, Ebf1, Ldb1 and Ssbp2) peak during the multigenic phase (Fig. 4A,B, Fig. 5B and SFigure 8A), and thus are anti-correlated to Fos (rs: −0.58, −0.591, −0.264 and −0.258, respectively). In the monogenic phase the persistent expression of Ssbp2 would maintain Ldb1 protein levels^51,52. Thus, the lower levels of Ldb1 transcript in the monogenic phase presumably are still sufficient to keep the expression of the selected OR at high levels, the relatively high activity of Fos in the monogenic phase notwithstanding. However, the relatively high Fos levels in the monogenic phase, together with lower levels of Lhx2 and Ldb1, may counteract formation of additional super-enhancer complexes, thus ensuring stably monogenic expression, in line with the competitive interaction hypothesis outlined above (first row of Fig. 5C).

We found no OR expression during the Mid.INP stage despite significant expression of Lhx2, Ebf1 and Kdm1a (known activators of OR genes). We note that all components of the polycomb repressive complex 2 (PRC2), Eed, Ezh2 and Suz12, are active in Mid.INP, which could explain the absence of OR gene expression in Mid.INP despite the presence of the activators (second row of Fig. 5C). Moreover, an essential subunit of PRC2, Eed, is significantly reduced during onset of OR expression in Late.INP (Fig. 4 and SFigure 9). This elimination of Eed is sufficient to disassemble the PRC2 complex, which then can no longer maintain the repressive H3K27me3 mark^54,55. Furthermore, we observed a dramatic reduction in expression of the Ezh2 and Suz12 subunits of PRC2 along OSN differentiation (second row of Fig. 5C and SFigure 9). We conclude that PRC2 activity may be involved in repression of OR expression in Mid.INP. The disassembly of PRC2 in Late.INP then enables the Greek island hubs to transiently activate the corresponding OR gene(s) in cis, which enables the expression of multiple OR genes in most Late.INP stage cells at the same time, at relatively low levels compared to later stages of OSN differentiation (monogenic phase).

Heterochromatic silencing of ORs throughout OSN differentiation is enforced by the (interchromosomal) convergence of OR loci to OSN-specific, highly compacted nuclear bodies²¹. It has been shown that in the monogenic phase individual active OR genes require de-silencing by lysine demethylase Kdm1a²⁷ and spatial segregation of the single chosen OR allele towards euchromatic nuclear territories²¹. Another study however showed that the deletion of Kdm1a leads to persistent multigenic expression, suggesting a silencing role of Kdm1a rather than activating one⁵⁰. Here, pseudotime analysis sheds light on the seemingly contradictory role of Kdm1a (third row of Fig. 5C):

The CBX5 protein is responsible for silencing of OR genes during the silent phase by binding to and thereby protecting the repressive H3K9me3 mark in gene bodies²¹. It vanishes upon transition to multigenic phase (SFigure 8B). After H3K9me3 has lost one methyl group (e.g., through the action of Kdm4a), Kdm1a can demethylate H3K9me2. Kdm1a peaks at Mid.INP stage and acts on di-methylated lysines only⁵⁷. Thus the action of Kdm1a on H3K9me2, once H3K9me3 has lost one methyl group, leads to activation in Mid.INP (Fig. 5C).

In contrast, methylation at another site, H3K4, marks an active promoter/enhancer (trimethylated for promoters, monomethylated for enhancers). Kdm5a peaks during multigenic phase (Fig. 5C, SFigure 10A) and can demethylate tri- or di-methylated H3K4 to its monomethylated form. Kdm1a by itself cannot act on H3K4me, but in a complex with Rcor1 and Hdac2 (CoREST complex) it is able to demethylate H3K4me2/1^57,62,63 (Fig. 5C). Rcor1 and Hdac2 have their peak expression during Late.INP stage, i.e. shortly after onset of the Kdm1a peak (Fig. 5B, SFigure 10A). Thus, in Late.INP but not in Mid.INP, Kdm1a can demethylate H3K4me2/1, resulting in repression. In other words, during the multigenic phase (Late.INP) the cell begins to build up the molecular machinery to downregulate all but one of the expressed ORs.

Taken together, the same enzymatic activity of the same factor (demethylation by Kdm1a) results in opposing effects on transcription due to co-factors Rcor1 and Hdac2 modulating substrate specificity of Kdm1a. Moreover our pseudotime analysis supports the hypothesis that OR expression issues a negative feedback signal on Kdm1a which is mediated by Atf5 and Adcy3 during transition from multigenic to monogenic phase^25,27,28 (see SFigure 10B).

During the monogenic phase, a super-enhancer is formed by the trans interaction between multiple Greek islands mediated by Lhx2-Ebf1-Ldb1-Ssbp2. Among the OR genes associated with this super-enhancer, merely one OR is expressed at very high levels⁴⁰. During the multigenic phase, H3K4me3 marks of Greek islands²⁰ are converted to H3K4me, e.g. by Kdm5a (Fig. 5C). The latter histone mark can be recognized by the chromatin reader Zmynd8⁶⁹. Zmynd8 has been shown to participate in another, highly specific selection process, namely the expression of Igh genes⁶⁴. There, it recognizes H3K4me and represses super-enhancer activity in B cells. We therefore speculate that it might play a similar role in OR expression.

Expression of Zmynd8 peaks simultaneously with expression of the super-enhancer forming complex Ebf1-Lhx2-Ldb1-Ssbp2 during multigenic phase (Fig. 4C for Zmynd8 and Ssbp2, Fig. 3 for Ebf1 and Lhx2, SFigure 8A for Ldb1). Upon transition to the monogenic phase, Zmynd8 expression ceases while Ebf1 and Ssbp2 maintain intermediate expression. The disappearance of the ZMYND8 protein might abolish the suppression of the super-enhancer and could allow very high expression of the selected OR via the Lhx2-Ebf1-Ldb1-Ssbp2 complex (Fig. 5C bottom row).

So far it is unknown whether the monoallelic expression characteristic for mature OSNs is preceded by a biallelic phase. The dataset evaluated here does not allow us to analyze this question directly, because it does not exhibit enough sequence diversity between alleles to distinguish them. We searched for epigenetic factors known to be involved in allelic selection in other contexts, which show significant and relevant expression changes during OSN differentiation. We found two factors, Smchd1 and Cdyl2 (SFigure 11), which have been discussed as stabilizing monoallelic expression. They play a role in epigenetic silencing, spermatogenesis and random inactivation of the X chromosome^70,71,72. Both factors peak sharply in mid to late INP. Assuming these two factors initiate monoallelic expression, this would place monoallelic selection before or concomitant with the onset of (multigenic) expression in Late.INP; in other words, there might be no biallelic stage at all.

The dissection of the OSN maturation process into different stages allowed us to reveal three phases of OR gene selection. This in turn enabled an in-depth analysis of pseudotime expression profiles⁷³, leading to several promising candidates and to testable hypotheses on the mechanisms involved in OR gene selection. However, the provision of additional, independent experimental evidence is beyond the scope of the present work, and will form the basis of future studies. For example, it will be interesting to complement the current data with single-cell chromatin accessibility data (single-cell ATAC-seq) or single-cell chromatin conformation data (Hi-C). Those experiments could also be carried out in the context of conditional knockouts of the factors identified above. Experiments should be carried out in hybrid crosses to additionally monitor allelic expression. Our study has narrowed down the cell stages and the time window that need to be analyzed for these purposes, thereby enhancing future research on this topic.

Methods

Most statistical analyses and visualization were done in RStudio using R version 3.6.3.

Data processing and quality control

Our analysis of OR expression patterns during OSN differentiation is based on a scRNA-seq dataset generated by Fletcher et al.³¹ (GEO: GSE95601, file “GSE95601_oeHBCdiff_Cufflinks_eSet.Rda.gz”). The Ngai group investigated the homeostatic differentiation in the postnatal olfactory epithelium. Horizontal basal stem cells (HBC) were released from quiescence by a conditional knockout of the Trp63 transcription factor. Briefly, cells were selected by FACS (fluorescence-activated cell sorting) for Sox2-EGFP-positive, ICAM1-negative, SCARB1/F3-negative expression to enrich for the cell population of interest (GBCs, later neuronal intermediates, and microvillous cells over sustentacular cells). Then scRNA-seq was done using the Fluidigm C1 microfluidics cell capture platform followed by Illumina multiplex sequencing. Processing of the raw data was done in Fletcher et al.³¹: reads were aligned to the GRCm38.3 mouse genome with Tophat2 using RefSeq transcript annotations, followed by Trimmomatic, featureCounts, and then Cufflinks. This resulted in 849 cells with 47,083 transcripts (before read and cell quality control). Cells were filtered according to Fletcher et al.³¹ to remove contaminants, doublets and other technical artifacts, which resulted in 687 cells. Our transcript filtering slightly differs from Fletcher et al.³¹: We included all genes that have more than 40 counts in at least one cell to ensure retrieval of OR genes with sporadic expression (keeping 18,558 transcripts, among them 222 OR transcripts from an initial set of 1654 quantified OR transcripts). The filtering can be reproduced using an R script (Supplementary file ’modified_filtering.R’) which is modified from https://github.com/rufletch/p63-HBC-diff. Count normalization and further transcript filtering was performed by SCTransform⁷² with default parameters as implemented in Seurat (version 3.1.4)^74,75,76. This resulted in 687 cells with a mean library size of 460 k unique reads, a median number of 4164 genes per cell, and 17,179 quantified transcripts including 222 OR genes; we then followed the LogNormalization Seurat workflow. The library size distribution of these cells is shown in SFigure 1.
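The transcript filter described above (keep every gene with more than 40 counts in at least one cell) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the supplementary R script; the function name `filter_genes`, the data layout and the toy counts are assumptions.

```python
from typing import Dict, List


def filter_genes(counts: Dict[str, List[int]], min_count: int = 40) -> Dict[str, List[int]]:
    """Keep genes whose count exceeds `min_count` in at least one cell.

    `counts` maps a gene name to its per-cell counts (hypothetical layout).
    """
    return {gene: row for gene, row in counts.items()
            if any(c > min_count for c in row)}


# Toy matrix with made-up counts for three genes across four cells.
toy = {
    "Olfr_a": [0, 0, 55, 0],    # sporadic but strong expression: kept
    "Gene_b": [12, 30, 40, 8],  # never exceeds 40: dropped
    "Gene_c": [41, 0, 0, 0],    # exceeds 40 in a single cell: kept
}
kept = filter_genes(toy)
print(sorted(kept))
```

A per-cell maximum rather than a mean or sum is what lets a gene expressed in only one cell, as is typical for an OR, survive the filter.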

Dimension reduction, clustering and cell type assignment

We followed the Seurat clustering workflow. First, dimension reduction was done using Principal Component Analysis (PCA). The number of principal components kept was set to 15, after assessing the goodness of approximation by JackStraw and ElbowPlot functions. A shared k-nearest neighbor graph was built by the FindNeighbors function. Afterwards, the Louvain algorithm was applied to define 13 distinct clusters from the shared nearest neighbor graph using the FindClusters function and the Jaccard index as a similarity measure. This number matches the number of clusters identified in (Fletcher et al.³¹) for this data. The expression of cell type marker genes that were collected from the literature (STable 2) served to assign cell clusters manually to known cell types according to the Seurat guidelines⁷⁴. At least two expressed markers were required to confidently annotate a specific cell type. Visualization of the data was performed by PCA, UMAP⁷⁷ and tSNE⁷⁸.
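The idea behind a shared nearest neighbor graph weighted by the Jaccard index can be sketched as follows. This is a simplified illustration in Python, not Seurat's implementation (which, among other things, prunes weak edges and works on the PCA embedding); the function names, the choice of k and the toy coordinates are assumptions.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def knn_sets(points: Sequence[Tuple[float, ...]], k: int) -> List[Set[int]]:
    """For each point, return the indices of its k nearest neighbours plus itself."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append({i, *others[:k]})
    return result


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def snn_graph(points: Sequence[Tuple[float, ...]], k: int) -> Dict[Tuple[int, int], float]:
    """Weight each pair of cells by the overlap of their neighbourhoods."""
    nbrs = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            w = jaccard(nbrs[i], nbrs[j])
            if w > 0:
                edges[(i, j)] = w
    return edges


# Two well-separated toy groups of three "cells" each (made-up coordinates).
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = snn_graph(cells, k=2)
print(edges)
```

Community detection such as the Louvain algorithm would then operate on these edge weights; in the toy example no edge connects the two groups, so they separate trivially.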

Trajectory inference and pseudotime assignment

Slingshot³⁸ was used to construct a minimum spanning tree (MST) based on the top 10-dimensional representation of the cells obtained above. The topology of the MST is independent of the root choice. For biological reasons, we selected qGBC as the root for the assignment of pseudotime, since it can differentiate to any cell type of the MOE³⁷. For each cell, a pseudotime between 0 (cells at the root node) and 1 (leaf node cells) was assigned by the slingPseudotime function.
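Two statements in this paragraph, that the MST topology does not depend on the root and that pseudotime is scaled to the interval from 0 to 1, can be illustrated with a small sketch. It is a strong simplification in Python: Slingshot fits smooth curves through the tree and assigns pseudotime per cell, whereas this sketch only measures distances between cluster centroids along the tree. The node names, coordinates and function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple

Point = Tuple[float, ...]


def mst_edges(centroids: Dict[str, Point], start: str) -> Set[FrozenSet[str]]:
    """Prim's algorithm: grow a minimum spanning tree from `start`."""
    in_tree = {start}
    edges: Set[FrozenSet[str]] = set()
    while len(in_tree) < len(centroids):
        # Pick the shortest edge that links the tree to a node outside it.
        u, v = min(
            ((a, b) for a in in_tree for b in centroids if b not in in_tree),
            key=lambda e: math.dist(centroids[e[0]], centroids[e[1]]),
        )
        in_tree.add(v)
        edges.add(frozenset((u, v)))
    return edges


def tree_pseudotime(centroids: Dict[str, Point], edges: Set[FrozenSet[str]],
                    root: str) -> Dict[str, float]:
    """Distance from `root` along tree edges, rescaled to [0, 1]."""
    dist = {root: 0.0}
    stack = [root]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                (v,) = e - {u}
                if v not in dist:
                    dist[v] = dist[u] + math.dist(centroids[u], centroids[v])
                    stack.append(v)
    longest = max(dist.values())
    return {node: d / longest for node, d in dist.items()}


# Made-up centroids of four clusters in a 2D embedding.
toy = {"root": (0.0, 0.0), "s1": (1.0, 0.0), "s2": (2.5, 0.0), "branch": (1.0, 2.0)}
tree = mst_edges(toy, start="root")
pseudotime = tree_pseudotime(toy, tree, root="root")
print(sorted(pseudotime.items(), key=lambda kv: kv[1]))
```

Starting Prim's algorithm from a different node yields the same edge set when edge lengths are distinct; only the direction in which pseudotime is read off the tree depends on the chosen root.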

Differential expression analysis

For each cell stage, we identified marker genes showing differential expression compared to all other cell stages using FindAllMarkers in Seurat, using the Wilcoxon rank sum test. Supplemental Fig. 3 shows a heatmap of the top 10 differentially expressed genes (i.e., putative markers) for each cell stage of MOE.

Among 2712 (co-)TFs (including chromatin remodelers) obtained from the GO.db package and AnimalTFDB3.0 (http://bioinfo.life.hust.edu.cn/AnimalTFDB/#!/), we found 2004 (co-)TFs expressed (at least one count in one cell) in the neuronal lineage. In later differential expression analysis we compared the expression profiles of cell stages that were placed consecutively along the neuronal trajectory (i.e., the maturation of OSN) to identify genes that change their expression upon transition between cell stages (SFigure 2b). Volcano plots for each cell type transition were generated by a slightly modified EnhancedVolcano function (https://github.com/kevinblighe/EnhancedVolcano).

Differential expression analyses for the silent-to-multigenic and multigenic-to-monogenic phase transitions were performed by comparison of pre-Late.INP cells vs. Late.INP cells and Late.INP vs. post-Late.INP cells, respectively. Genes with a Bonferroni adjusted p-value < 0.05 (Wilcoxon rank sum test, FindMarkers function in Seurat) and an average absolute FC ≥ 2 were considered differentially expressed. This yielded 83 and 39 differentially expressed (co-)TFs for the two transitions, respectively.
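The two selection criteria can be sketched as a simple filter. This Python sketch assumes that raw p-values and log2 fold changes are already available per gene, and it reads "absolute FC ≥ 2" as a two-fold change in either direction, i.e. |log2 FC| ≥ 1; both are assumptions. It also corrects by the number of genes passed in, whereas Seurat's correction is based on the number of genes in the dataset. The function name `call_de` and the toy values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def call_de(results: Dict[str, Tuple[float, float]],
            alpha: float = 0.05, min_fc: float = 2.0) -> List[str]:
    """Return genes passing the Bonferroni and fold-change thresholds.

    `results` maps gene -> (raw p-value, log2 fold change).
    """
    n_tests = len(results)
    hits = []
    for gene, (p, log2fc) in results.items():
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction, capped at 1
        if p_adj < alpha and abs(log2fc) >= math.log2(min_fc):
            hits.append(gene)
    return sorted(hits)


# Made-up test results for four genes.
toy = {
    "g1": (0.001, 1.5),    # adjusted p = 0.004, 2.8-fold up: kept
    "g2": (0.02, 2.0),     # adjusted p = 0.08: fails significance
    "g3": (0.0001, 0.5),   # significant, but below two-fold: dropped
    "g4": (0.005, -1.2),   # adjusted p = 0.02, 2.3-fold down: kept
}
print(call_de(toy))
```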

Motif analysis

The genomic ranges of 68 OR clusters and 63 Greek islands were compiled from Monahan et al.¹¹, which allows the matching of OR clusters and Greek islands. UCSC genome browser tools were used to find all genes inside OR clusters. We performed a motif search on the 63 Greek island sequences (STable 9 and FASTA file) and on the approximate promoter regions (500 bp upstream of the transcription start sites) of all OR genes; the sequences were obtained with the "ucsc-twobittofa" bioconda package and the "biomart" R package, respectively. The MEME suite web server for motif search and analysis^42,43 was used to predict the transcription factors (TF) that bind to Greek islands. We applied MEME using default values for all parameters to find the novel, ungapped motifs inside Greek islands with the following command: meme greek_islands.fa -dna -oc . -nostatus -time 14400 -mod zoops -nmotifs 3 -minw 6 -maxw 50 -objfun classic -revcomp -markov_order 0.

Then we compared each motif found in the above-mentioned analysis against a database of known TF motifs (JASPAR2018_CORE_non-redundant and uniprobe_mouse databases) using the Tomtom tool⁴⁴. The Pearson correlation coefficient was used to measure the similarity between columns of position weight matrices (PWMs) and we restricted the results by setting q-value ≤ 0.1 (rather than 0.5 by default) as a threshold (10% FDR) using the following command: tomtom -no-ssc -oc . -verbosity 1 -min-overlap 5 -mi 1 -dist pearson -thresh 0.1 -time 300 query_motifs db/MOUSE/uniprobe_mouse.meme db/JASPAR/JASPAR2018_CORE_non-redundant.meme.
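As an illustration of the column similarity measure, the following Python sketch computes the Pearson correlation between two PWM columns, each given as probabilities for A, C, G and T. It shows only the per-column score and is not Tomtom's implementation, which additionally aligns motifs at all offsets and derives p- and q-values; the function name and the toy columns are assumptions.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a uniform column carries no preference; treat as uncorrelated
    return cov / (sd_x * sd_y)


# Made-up PWM columns (order A, C, G, T).
prefers_a = [0.7, 0.1, 0.1, 0.1]
prefers_t = [0.1, 0.1, 0.1, 0.7]
print(pearson(prefers_a, prefers_a))
print(pearson(prefers_a, prefers_t))
```

Columns with the same base preference score close to 1, while columns that prefer different bases score negative, which is why the measure rewards motifs whose aligned positions agree.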

We also investigated motif enrichment in the 63 Greek island sequences using the AME tool⁴⁸ with the average odds score method and Fisher’s exact test as the motif enrichment test, through the following command: ame --verbose 1 --oc . --scoring avg --method fisher --hit-lo-fraction 0.25 --evalue-report-threshold 1.0 --control --shuffle-- --kmer 2 greek_islands.fa db/MOUSE/uniprobe_mouse.meme db/JASPAR/JASPAR2018_CORE_non-redundant.meme.
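The principle of a Fisher's exact test for enrichment can be sketched as a one-sided hypergeometric tail probability. This Python sketch assumes that each sequence has already been classified as containing or lacking a motif hit; how AME scores sequences and chooses that classification is not reproduced here, and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(hits_primary: int, n_primary: int,
                        hits_control: int, n_control: int) -> float:
    """One-sided Fisher's exact test.

    Probability of observing at least `hits_primary` motif-containing sequences
    among the primary set if hits were distributed at random over all sequences.
    """
    total = n_primary + n_control
    total_hits = hits_primary + hits_control
    upper = min(n_primary, total_hits)
    tail = sum(comb(total_hits, k) * comb(total - total_hits, n_primary - k)
               for k in range(hits_primary, upper + 1))
    return tail / comb(total, n_primary)


# Made-up counts: 3 of 4 primary sequences and 1 of 4 control sequences carry the motif.
print(fisher_enrichment_p(3, 4, 1, 4))
```

With shuffled versions of the input as control, a small p-value indicates that the motif occurs in the real sequences more often than their nucleotide composition alone would predict.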

Finally, a strict motif search in Greek islands for selected TFs was done by "ucsc-findmotif" bioconda package, allowing for 3 mismatches.
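A motif search that tolerates a fixed number of mismatches amounts to sliding the motif along the sequence and counting differing positions (Hamming distance). The Python sketch below illustrates this; it is not the ucsc-findmotif tool, scanning both strands is an assumption, and the function names and toy sequence are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq: str, motif: str, max_mismatch: int = 3) -> List[Tuple[int, str, int]]:
    """Return (position, strand, mismatches) for every window within the tolerance."""
    hits = []
    width = len(motif)
    patterns = (("+", motif), ("-", revcomp(motif)))
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        for strand, pattern in patterns:
            d = hamming(window, pattern)
            if d <= max_mismatch:
                hits.append((i, strand, d))
    return hits


# Toy sequence with one exact occurrence of a 7-mer at position 2.
print(find_motif("GGTGACTCAGG", "TGACTCA", max_mismatch=0))
```

Raising `max_mismatch` trades specificity for sensitivity: for a short motif, allowing three mismatches will match many windows by chance, so hits are best interpreted together with the expression data.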

Visualization of time series

Grouped time series⁷⁹ was used to visualize pseudotime series of individual genes and to calculate and visualize aggregated groups of genes, e.g. all OR genes. Since the original expression count matrix is sparse (75.45% zero count entries), we first applied ALRA⁸⁰, which has specifically been designed for the imputation of missing values in scRNA-Seq data. The imputed expression matrix retrieved ∼2403 missing values, reducing the fraction of zero count entries to 61.50%. The median number of expressed genes per cell was 6715 (see SFigure 1b). Note that the imputed expression matrix was used only for visualization; for all analysis steps we used normalized counts without data imputation.

Data availability

The single cell data were generated by Fletcher et al.³¹ and are available at GEO under the accession number GSE95601 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE95601]. The processed data and all additional data used in our analysis are available as Supplemental Materials.

References

Korsching, S. I. Olfaction. in The Physiology of Fishes 256, Chapter 14 (CRC Press, 2020).
Niimura, Y. Olfactory receptor multigene family in vertebrates: from the viewpoint of evolutionary genomics. Curr. Genomics 13, 103–114 (2012).
Buck, L. & Axel, R. A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell 65, 175–187 (1991).
Chess, A., Simon, I., Cedar, H. & Axel, R. Allelic inactivation regulates olfactory receptor gene expression. Cell 78, 823–834 (1994).
Mombaerts, P. et al. Visualizing an Olfactory Sensory Map. Cell 87, 675–686 (1996).
Ressler, K. J., Sullivan, S. L. & Buck, L. B. Information coding in the olfactory system: Evidence for a stereotyped and highly organized epitope map in the olfactory bulb. Cell 79, 1245–1255 (1994).
Buck, L. B. Information coding in the vertebrate olfactory system. Annu. Rev. Neurosci. 19, 517–544 (1996).
Feinstein, P. & Mombaerts, P. A contextual model for axonal sorting into glomeruli in the mouse olfactory system. Cell 117, 817–831 (2004).
Serizawa, S., Miyamichi, K. & Sakano, H. One neuron–one receptor rule in the mouse olfactory system. Trends Genet. 20, 648–653 (2004).
Mombaerts, P. Odorant receptor gene choice in olfactory sensory neurons: the one receptor–one neuron hypothesis revisited. Curr. Opin. Neurobiol. 14, 31–36 (2004).
Monahan, K. et al. Cooperative interactions enable singular olfactory receptor expression in mouse olfactory neurons. eLife 6, e28620 (2017).
Zhang, X. et al. High-throughput microarray detection of olfactory receptor gene expression in the mouse. Proc. Natl. Acad. Sci. 101, 14168–14173 (2004).
Miyamichi, K., Serizawa, S., Kimura, H. M. & Sakano, H. Continuous and overlapping expression domains of odorant receptor genes in the olfactory epithelium determine the dorsal/ventral positioning of glomeruli in the olfactory bulb. J. Neurosci. 25, 3586–3592 (2005).
Tsuboi, A., Miyazaki, T., Imai, T. & Sakano, H. Olfactory sensory neurons expressing class I odorant receptors converge their axons on an antero-dorsal domain of the olfactory bulb in the mouse. Eur. J. Neurosci. 23, 1436–1444 (2006).
Markenscoff-Papadimitriou, E. et al. Enhancer interaction networks as a means for singular olfactory receptor expression. Cell 159, 543–557 (2014).
Michaloski, J. S., Galante, P. A. F. & Malnic, B. Identification of potential regulatory motifs in odorant receptor genes by analysis of promoter sequences. Genome Res. 16, 1091–1098 (2006).
Hirota, J. & Mombaerts, P. The LIM-homeodomain protein Lhx2 is required for complete development of mouse olfactory sensory neurons. Proc. Natl. Acad. Sci. 101, 8751–8755 (2004).
Clowney, E. J. et al. High-throughput mapping of the promoters of the mouse olfactory receptor genes reveals a new type of mammalian promoter and provides insight into olfactory receptor gene regulation. Genome Res. 21, 1249–1259 (2011).
Plessy, C. et al. Promoter architecture of mouse olfactory receptor genes. Genome Res. 22, 486–497 (2012).
Magklara, A. et al. An epigenetic signature for monoallelic olfactory receptor expression. Cell 145, 555–570 (2011).
Clowney, E. J. et al. Nuclear aggregation of olfactory receptor genes governs their monogenic expression. Cell 151, 724–737 (2012).
Lyons, D. B. et al. Heterochromatin-mediated gene silencing facilitates the diversification of olfactory neurons. Cell Rep. 9, 884–892 (2014).
Armelin-Correa, L. M., Gutiyama, L. M., Brandt, D. Y. C. & Malnic, B. Nuclear compartmentalization of odorant receptor genes. Proc. Natl. Acad. Sci. 111, 2782–2787 (2014).
Armelin-Correa, L. M., Nagai, M. H., Silva, A. G. L. & Malnic, B. Nuclear architecture and gene silencing in olfactory sensory neurons. BioArchitecture 4, 160–163 (2014).
Serizawa, S. et al. Negative feedback regulation ensures the one receptor-one olfactory neuron rule in mouse. Science 302, 2088–2094 (2003).
Lewcock, J. W. & Reed, R. R. A feedback mechanism regulates monoallelic odorant receptor expression. Proc. Natl. Acad. Sci. 101, 1069–1074 (2004).
Lyons, D. B. et al. An epigenetic trap stabilizes singular olfactory receptor expression. Cell 154, 325–336 (2013).
Dalton, R. P., Lyons, D. B. & Lomvardas, S. Co-opting the unfolded protein response to elicit olfactory receptor feedback. Cell 155, 321–332 (2013).
Hanchate, N. K. et al. Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis. Science https://doi.org/10.1126/science.aad2456 (2015).
Tan, L., Li, Q. & Xie, X. S. Olfactory sensory neurons transiently express multiple olfactory receptors during development. Mol. Syst. Biol. 11, 844 (2015).
Fletcher, R. B. et al. Deconstructing olfactory stem cell trajectories at single-cell resolution. Cell Stem Cell 20, 817-830.e8 (2017).
Wang, X., He, Y., Zhang, Q., Ren, X. & Zhang, Z. Direct comparative analyses of 10X genomics chromium and smart-seq2. Genomics Proteomics Bioinformatics 19, 253–266 (2021).
Wang, I.-H. et al. Spatial transcriptomic reconstruction of the mouse olfactory glomerular map suggests principles of odor processing. Nat. Neurosci. 25, 484–492 (2022).
Caggiano, M., Kauer, J. S. & Hunter, D. D. Globose basal cells are neuronal progenitors in the olfactory epithelium: A lineage analysis using a replication-incompetent retrovirus. Neuron 13, 339–352 (1994).
Chen, X., Fang, H. & Schwob, J. E. Multipotency of purified, transplanted globose basal cells in olfactory epithelium. J. Comp. Neurol. 469, 457–474 (2004).
Chen, M. et al. Wnt-responsive Lgr5+ globose basal cells function as multipotent olfactory epithelium progenitor cells. J. Neurosci. 34, 8268–8276 (2014).
Jang, W., Chen, X., Flis, D., Harris, M. & Schwob, J. E. Label-retaining, quiescent globose basal cells are found in the olfactory epithelium. J. Comp. Neurol. 522, 731–749 (2014).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).
Monahan, K. & Lomvardas, S. Monoallelic expression of olfactory receptors. Annu. Rev. Cell Dev. Biol. 31, 721–740 (2015).
Monahan, K., Horta, A. & Lomvardas, S. LHX2- and LDB1-mediated trans interactions regulate olfactory receptor choice. Nature 565, 448–453 (2019).
Lomvardas, S. et al. Interchromosomal interactions and olfactory receptor choice. Cell 126, 403–413 (2006).
Bailey, T. L. et al. MEME suite: Tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME suite. Nucleic Acids Res. 43, W39–W49 (2015).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Wang, H.-N. et al. Inhibition of c-Fos expression attenuates IgE-mediated mast cell activation and allergic inflammation by counteracting an inhibitory AP1/Egr1/IL-4 axis. J. Transl. Med. 19, 261 (2021).
Ray, N. et al. c-Fos suppresses systemic inflammatory response to endotoxin. Int. Immunol. 18, 671–677 (2006).
Bahrami, S. & Drabløs, F. Gene regulation in the immediate-early response process. Adv. Biol. Regul. 62, 37–49 (2016).
McLeay, R. C. & Bailey, T. L. Motif enrichment analysis: A unified framework and an evaluation on ChIP data. BMC Bioinformatics 11, 165 (2010).
Zhang, G., Titlow, W. B., Biecker, S. M., Stromberg, A. J. & McClintock, T. S. Lhx2 Determines odorant receptor expression frequency in mature olfactory sensory neurons. eNeuro 3, (2016).
Vyas, R. N., Meredith, D. & Lane, R. P. Lysine-specific demethylase-1 (LSD1) depletion disrupts monogenic and monoallelic odorant receptor (OR) expression in an olfactory neuronal cell line. Mol. Cell. Neurosci. 82, 1–11 (2017).
Wang, Y. et al. SSBP2 is an in vivo tumor suppressor and regulator of LDB1 stability. Oncogene 29, 3044–3053 (2010).
Wang, H. et al. Crystal structure of human LDB1 in complex with SSBP2. Proc. Natl. Acad. Sci. 117, 1042–1048 (2020).
Wu, Y. et al. Structure of the MADS-box/MEF2 domain of MEF2A bound to DNA and its implication for myocardin recruitment. J. Mol. Biol. 397, 520–533 (2010).
Cao, Q. et al. The central role of EED in the orchestration of polycomb group complexes. Nat. Commun. 5, 3127 (2014).
Potjewyd, F. et al. Degradation of polycomb repressive complex 2 with an EED-targeted bivalent chemical degrader. Cell Chem. Biol. 27, 47-56.e15 (2020).
Qiao, Q. et al. The structure of NSD1 reveals an autoregulatory mechanism underlying histone H3K36 methylation. J. Biol. Chem. 286, 8361–8368 (2011).
Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–395 (2011).
Wang, J. et al. Opposing LSD1 complexes function in developmental gene activation and repression programmes. Nature 446, 882–887 (2007).
Kilinc, S., Savarino, A., Coleman, J. H., Schwob, J. E. & Lane, R. P. Lysine-specific demethylase-1 (LSD1) is compartmentalized at nuclear chromocenters in early post-mitotic cells of the olfactory sensory neuronal lineage. Mol. Cell. Neurosci. 74, 58–70 (2016).
Rusconi, F., Grillo, B., Toffolo, E., Mattevi, A. & Battaglioli, E. NeuroLSD1: Splicing-generated epigenetic enhancer of neuroplasticity. Trends Neurosci. 40, 28–38 (2017).
Coleman, J. H., Lin, B. & Schwob, J. E. Dissecting LSD1-dependent neuronal maturation in the olfactory epithelium. J. Comp. Neurol. 525, 3391–3413 (2017).
Song, Y. et al. Mechanism of crosstalk between the LSD1 demethylase and HDAC1 deacetylase in the CoREST complex. Cell Rep. 30, 2699-2711.e8 (2020).
Shi, Y.-J. et al. Regulation of LSD1 histone demethylase activity by its associated factors. Mol. Cell 19, 857–864 (2005).
Delgado-Benito, V. et al. The chromatin reader ZMYND8 regulates Igh enhancers to promote immunoglobulin class switch recombination. Mol. Cell 72, 636-649.e8 (2018).
Lin, C., Garruss, A. S., Luo, Z., Guo, F. & Shilatifard, A. The RNA Pol II elongation factor Ell3 marks enhancers in ES cells and primes future gene activation. Cell 152, 144–156 (2013).
Levine, S. S. et al. The core of the polycomb repressive complex is compositionally and functionally conserved in flies and humans. Mol. Cell. Biol. 22, 6070–6078 (2002). https://doi.org/10.1128/MCB.22.17.6070-6078.2002
Vandamme, J., Völkel, P., Rosnoblet, C., Le Faou, P. & Angrand, P.-O. Interaction proteomics analysis of polycomb proteins defines distinct PRC1 complexes in mammalian cells. Mol. Cell. Proteomics 10 (2011).
Shirato, H. et al. A Jumonji (Jarid2) protein complex represses cyclin D1 expression by methylation of histone H3–K9. J. Biol. Chem. 284, 733–739 (2009).
Li, N. et al. ZMYND8 reads the dual histone mark H3K4me1-H3K14ac to antagonize the expression of metastasis-linked genes. Mol. Cell 63, 470–484 (2016).
Mould, A. W. et al. Smchd1 regulates a subset of autosomal genes subject to monoallelic expression in addition to being critical for X inactivation. Epigenetics Chromatin 6, 19 (2013).
Qin, R. et al. CDYL deficiency disrupts neuronal migration and increases susceptibility to epilepsy. Cell Rep. 18, 380–390 (2017).
Liu, S. et al. Chromodomain protein CDYL acts as a crotonyl-CoA hydratase to regulate histone crotonylation and spermatogenesis. Mol. Cell 67, 853-866.e5 (2017).
Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272 (2019).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Hyndman, R. & Athanasopoulos, G. Forecasting: Principles and Practice (2021).
Linderman, G. C., Zhao, J. & Kluger, Y. Zero-preserving imputation of scRNA-seq data using low-rank approximation. bioRxiv 397588 (2018) https://doi.org/10.1101/397588.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

These authors contributed equally: Sigrun I. Korsching and Achim Tresch.

Authors and Affiliations

Institute of Medical Statistics and Computational Biology, Faculty of Medicine, University of Cologne, Cologne, Germany
Mohammad Hussainy & Achim Tresch
Institute of Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany
Mohammad Hussainy & Sigrun I. Korsching
Cologne Excellence Cluster On Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
Achim Tresch
Center for Data and Simulation Science, University of Cologne, Cologne, Germany
Achim Tresch

Authors

Mohammad Hussainy
Sigrun I. Korsching
Achim Tresch

Contributions

A.T. and S.I.K. conceived and designed the analysis; M.H. collected the data, conducted the analysis and prepared the figures. All authors wrote, read and approved the manuscript.

Corresponding author

Correspondence to Achim Tresch.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Supplementary Information 5.

Supplementary Information 6.

Supplementary Information 7.

Supplementary Information 8.

Supplementary Information 9.

Supplementary Information 10.

Supplementary Information 11.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hussainy, M., Korsching, S.I. & Tresch, A. Pseudotime analysis reveals novel regulatory factors for multigenic onset and monogenic transition of odorant receptor expression. Sci Rep 12, 16183 (2022). https://doi.org/10.1038/s41598-022-20106-w

Received: 12 February 2022
Accepted: 08 September 2022
Published: 28 September 2022
DOI: https://doi.org/10.1038/s41598-022-20106-w