Plasma cells (PCs) are terminally differentiated B cells that are specialized in producing antibodies. After initial differentiation in secondary lymphoid organs or inflamed tissues, some of these cells enter niches in the bone marrow (BM) and survive as long-lived PCs (LLPCs), persisting for up to the lifetime of the hosts in the absence of repeated antigen stimulation1,2. LLPCs are important components of humoral immune memory and the basis for vaccine-induced long-term protection3,4. Therefore, it is critical to understand LLPC differentiation and maintenance.

The study of LLPCs is hampered by their rarity. In humans, LLPCs are estimated to account for only around 25% of total bone marrow PCs (BMPCs), and fewer than 10 LLPCs induced by a given vaccine can be found in 1 ml of BM5. In mice, it is estimated that the BM only contains 20,000–30,000 virus-specific LLPCs after lymphocytic choriomeningitis virus infection1. Additional difficulties come from the fact that PCs in secondary lymphoid organs and BM appear to be a heterogeneous population consisting of cells derived from different B cell precursors (for example, B1 or B2 cells) through diverse activation pathways and at various stages of differentiation4,6. In mice, all of these PCs express a shared set of surface and intracellular markers, for example, CD138, XBP1, BLIMP1 (encoded by Prdm1) and IRF4 (ref. 4). Therefore, it remains a challenge to identify bona fide LLPCs among all PCs, a prerequisite for a deeper understanding of the cell biology underlying their differentiation and maintenance.

LLPCs that remain long after protein immunization typically carry B cell receptors (BCRs) that are heavily mutated and of higher affinities7,8, indicative of their germinal center (GC) origin. Therefore, it has been generally assumed that T cell help and affinity maturation during the GC reaction are crucial for LLPC development and longevity. Transcriptomic analyses of bulk splenic PCs and BMPCs at the population level have pointed to upregulation of genes involved in metabolism, chemotaxis and cell–cell interactions in the BM compartment9, where comparative enrichment of LLPCs is expected. Whether and how GC experience might promote those changes is not yet elucidated, and it is not clear whether any of those changes are unique to LLPCs. This uncertainty is further complicated by the observations that GC-independent LLPC generation may also take place4.

Mature B cells include two major lineages, namely, B1 cells and conventional B2 cells. B1 cells are innate-like B cells that develop at the fetal and neonatal stages and are found in peritoneal and pleural cavities and solid organs10. The antigen receptor repertoire in B1 cells is distinct from that of B2 cells, including BCRs that are preferentially rearranged during early development11,12 and recognize common self-antigens and microbial products13,14. Both B1 and B2 lineages of B cells can give rise to PCs15, but LLPCs are thought to be mainly derived from B2 cells, consistent with the notion of GC dependence. PCs with B1-lineage signatures can be found in murine BM9,16, although whether they represent LLPCs is unresolved.

To better characterize LLPCs and probe their heterogeneity and function, we have combined single-cell RNA sequencing (scRNA-seq) to analyze transcriptomes and BCR repertoires of splenic PCs (SPPCs) and BMPCs in naive and immunized mice, immune profiling to identify subset-defining surface markers and a genetic pulse–chase approach to estimate half-lives of PC subsets. Our results demonstrate that mouse antigen-specific IgA and IgG LLPCs can be uniquely identified by surface phenotypes of Ly6AhiTigit− and EpCAMhiCXCR3−, respectively. We further report an unmutated IgM LLPC subset that develops in resting mice in a T cell-independent manner but transcriptionally resembles IgG LLPCs induced by T cell-dependent antigens. These cells partially depend on the microbiota and are enriched with public antibody sequences, including those found in B1a cells that recognize self-antigen and commensal microbes.


Heterogeneity of SPPCs and BMPCs

To understand the potential heterogeneity of PCs and identify the long-lived subsets, we first performed scRNA-seq to analyze both the transcriptomes and the BCR repertoires of individual SPPCs and BMPCs pooled from naive mice and mice immunized with 4-hydroxy-3-nitrophenylacetyl hapten conjugated to keyhole limpet hemocyanin (NP-KLH; Fig. 1a,b). This immunization primarily led to generation of IgG LLPCs. The bacterial artificial chromosome transgenic Blimp1–enhanced yellow fluorescent protein (Blimp1–eYFP) reporter strain17 was used to aid in the identification of PCs. In B6 mice, B cells carrying BCRs composed of a VH1–72 (VH186.2 in previous nomenclature) heavy chain and a λ-light chain are reactive to NP hapten18, which permitted us to track antigen-specific PCs by sequencing. To capture the temporal order of PC generation, we analyzed two independent time series (before immunization and 14, 21 and 60 d after immunization), and in each series, we sampled SPPCs and BMPCs from six mice per condition (Fig. 1a). PCs were enriched from spleen (SP) and BM cells by their surface expression of CD138 and then sort purified as CD138+eYFP+ cells (Fig. 1b,c and Extended Data Fig. 1a). Four separate library preparations and scRNA-seq runs on the 10x Genomics platform were conducted to process all samples (Fig. 1d). In total, 44,189 PCs with assembled BCRs were recovered across all time points and tissue types (Fig. 1e). After correction for batch effects (Extended Data Fig. 1b), automatic clustering classified PCs into 15 different clusters according to their transcriptomes after exclusion of immunoglobulin transcripts (Fig. 1f and Supplementary Table 1). We verified the quality of our datasets by checking the uniform distribution of genes and unique molecular identifiers (UMIs) detected per cell (Extended Data Fig. 1c). The identity of recovered cells in all clusters as PCs was verified by high expression of PC signature genes, including Prdm1, Xbp1 and Sdc1 (Cd138), and low expression of Bach2 (Extended Data Fig. 1d).

Fig. 1: Heterogeneity of SPPCs and BMPCs revealed by scRNA-seq.

a, Experimental design; i.p., intraperitoneal; LPS, lipopolysaccharide. b–e, Workflow of scRNA-seq, including bead enrichment for CD138+ cells from the SP and the BM (b), sorting (c), scRNA-seq library preparation and sequencing (d) and analysis of the transcriptome (e; top) and assembled BCRs (e; bottom). CB, cell barcode. f, UMAP of 15 PC clusters color coded and numbered from C0 to C14. g, UMAP visualization of PC isotypes (left) and isotype distribution in each cluster (right). ‘None’ indicates cells without BCR constant regions assembled. h, Left, PC clusters spatiotemporally resolved to tissue origins and time of isolation after immunization color coded as in f and superimposed onto the UMAP of all PCs. Right, summary statistics of relative abundance of PC clusters. i, UMAP visualization of SHM profiles of the heavy (left) and light (right) chains. The color gradient indicates numbers of SHMs by the shown scale (see Methods for details of SHM calculation). j, Violin plots of heavy chain (top) and light chain (bottom) SHM counts in each cluster. The y axis is pseudo-log transformed. k, Composition of BMPCs in TCR KO (T cell receptor knockout, Tcrb−/−Tcrd−/−) mice as assayed by scRNA-seq; left, relative abundance of indicated PC clusters; right, SHM profiles.

Even though clustering was done without taking immunoglobulin transcripts into account, isotype-switched PCs still clustered separately from unswitched IgM PCs (Fig. 1g). IgG PCs were predominantly in C8 and C9, while IgA PCs were in C3 and C11, with IgM PCs dominating the remaining clusters. Given that isotype switching takes place relatively early in the primary response19, there might be certain coordination between factors that drive isotype switching and factors that imprint transcriptomes during PC differentiation. Alternatively, IgG and IgA receptors may signal to B cells differently compared to the IgM receptor20,21.

When PC clusters were further resolved according to tissue type and time of isolation (Fig. 1h), some clusters showed a tissue-biased distribution and only weak dependence on immunization. For example, C0 and C7, both containing an overwhelming IgM+ majority, were almost exclusively in the BM and appeared to be a relatively constant component before and after immunization. C1 and C5, however, represented SP-dominated unswitched PC clusters, with C5 but not C1 showing an increased presence after immunization (Fig. 1h). C2, C4 and C6 were also predominantly IgM+ but showed minimally to mildly tissue-biased distribution. IgA PCs were prominently located in the BM, consistent with previous findings22,23, whereas IgG PCs were more equally distributed between the BM and SP. The relative abundance of IgA cells (C3 and C11) in both BM and SP peaked on day 21 after immunization and then decreased on day 60 to a level slightly higher than that before immunization (Fig. 1h). IgG PCs (C8 and C9) exhibited a similar kinetic pattern in the BM. These data are consistent with the notion that the BM represents a compartment of terminal accumulation for IgA and IgG PCs generated in response to environmental antigen or active immunization.

Different PC clusters carried different contents of somatic hypermutations (SHMs). As shown in Fig. 1i,j, C1, C5 and C7 contained essentially no SHMs and are likely GC independent. Consistent with this possibility, these clusters made up the majority of BMPCs in T cell-deficient animals (Fig. 1k). Although also predominantly unswitched IgM+, C0, C2, C4 and C6 contained substantial numbers of SHMs, suggesting their GC origin. IgA-dominated C3 and C11 had the highest number of SHMs, followed by IgG subsets C8 and C9, indicating that cells in these clusters had the most extensive GC experience in comparison to PCs in other subsets. Cluster C10 was unique in that it was the only cluster to express a notable level of Mki67 (Extended Data Fig. 1e) and probably represents plasmablasts in transition to quiescent PCs. Taken together, these initial analyses reveal considerable heterogeneity in the PC compartment that is partly determined by immunoglobulin isotypes, tissue of residence and ancestral GC experience. These data also support the proposal that both intrinsic and extrinsic factors play a role in regulating PC maintenance and survival4.
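The SHM counts compared above are, in essence, the number of nucleotide differences between an assembled V-region sequence and its inferred germline counterpart. The exact procedure is described in Methods and is not reproduced here; the following is only a minimal sketch, assuming the two sequences have already been aligned to equal length, with gaps and ambiguous bases ignored. The function name `count_shm` and the example sequences are hypothetical.

```python
def count_shm(query, germline):
    """Count nucleotide mismatches between an aligned query and germline.

    Assumption of this sketch: both strings are pre-aligned and of equal
    length. Positions holding a gap ('-' or '.') or an ambiguous base ('N')
    in either sequence are skipped rather than counted as mutations.
    """
    if len(query) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    skipped = {"-", ".", "N"}
    mismatches = 0
    for q, g in zip(query.upper(), germline.upper()):
        if q in skipped or g in skipped:
            continue
        if q != g:
            mismatches += 1
    return mismatches


# Hypothetical toy sequences: one substitution at the fourth position.
example_germline = "ACGTACGT"
example_query = "ACGAACGT"
example_count = count_shm(example_query, example_germline)
```

Under this simplification, a cluster with essentially no SHMs would yield counts at or near zero for most of its cells.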

Identification of immunization-induced LLPCs

To identify LLPCs among various PC clusters, we focused our analysis on VH1–72+λ+ NP-specific PCs generated in response to NP-KLH immunization. A total of 713 such cells were found in data obtained from immunized mice, while 4 were found in data from naive mice. Projected onto the 15-cluster map, the majority of NP-specific cells were concentrated in C8 and C9 (Fig. 2a) and mostly IgG switched (Fig. 2b). When spatiotemporally resolved, NP-specific cells exhibited a trend of accumulation in C8 and depletion from C9 over time (Fig. 2c). The C8 accumulation trend was particularly evident in the BM, while the C9 depletion trend was particularly evident in the SP. We enumerated the characteristic W33L affinity-enhancing mutation and found that NP-specific BMPCs were of the highest affinity in C8 by day 60, whereas C9 cells on day 21 were of the highest affinity among all SPPCs (Fig. 2d), suggesting that NP-specific cells in splenic C9 might give rise to C8 in the BM over time. Taken together, these data indicate that cluster C8 represents LLPCs generated in response to immunization, and C9 is likely a more short-lived and temporary cell state.
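Enumerating the W33L mutation amounts to reading the residue at heavy-chain position 33 in each NP-specific cell and tallying it per cluster. A minimal sketch follows, assuming each cell is represented by its cluster label and a mature heavy-chain V-region amino acid sequence in which position 33 sits at 0-based index 32 (no leader peptide and no upstream insertions). The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def tally_position_33(cells):
    """Tally W (germline) versus L (high affinity) at heavy-chain position 33.

    cells: iterable of (cluster_label, heavy_chain_aa) pairs.
    Assumption of this sketch: index 32 of the amino acid string is
    position 33. Sequences too short to cover it are ignored.
    """
    tally = {}
    for cluster, heavy_chain_aa in cells:
        if len(heavy_chain_aa) < 33:
            continue
        residue = heavy_chain_aa[32]
        key = residue if residue in ("W", "L") else "other"
        tally.setdefault(cluster, Counter())[key] += 1
    return tally


# Hypothetical toy input: placeholder residues ('A') around position 33.
toy_cells = [
    ("C8", "A" * 32 + "L" + "A" * 10),
    ("C8", "A" * 32 + "L" + "A" * 10),
    ("C8", "A" * 32 + "W" + "A" * 10),
    ("C9", "A" * 32 + "W" + "A" * 10),
]
toy_tally = tally_position_33(toy_cells)
```

Splitting the same tally by tissue and time point would give per-condition counts of the kind summarized in Fig. 2d.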

Fig. 2: Identification of LLPCs induced by NP-KLH immunization.

a, UMAP visualization of NP-specific PCs that carry VH1–72 paired with a λ-light chain from NP-KLH-immunized mice, color coded according to clusters as in Fig. 1f. b, Isotype distribution of C8, C9 and C10 VH1–72/λ cells. ‘None’ indicates cells without BCR constant regions assembled. c, Cluster distribution of VH1–72/λ cells in the BM and SP at indicated time points. The number of NP-specific PCs under each condition is given at the top of each corresponding bar graph. d, Numbers of W33- (W; germline affinity) and L33-carrying (L; high affinity) cells in C8 and C9 VH1–72/λ cells recovered from the BM (left) or the SP (right) at indicated time points.

Identification of LLPCs by a genetic pulse–chase approach

To more systematically identify LLPC subsets without a preconception about antigen specificities, we constructed a knock-in Blimp1-IRES-CreERT2-P2A-DTR strain (called BICREAD hereafter; Extended Data Fig. 2a). When crossed with the Rosa26-loxP-STOP-loxP-tdTomato (Ai14) reporter strain, the BICREAD allele allows for inducible labeling of PCs following tamoxifen treatment or PC deletion following administration of diphtheria toxin (DT). Indeed, after treatment with 2 mg of tamoxifen for 4 consecutive days, over 95% of PCs in the BICREAD × Ai14 mice became tdTomato+ (Extended Data Fig. 2b,c).

Next, we performed scRNA-seq to analyze tdTomato+ LLPCs that remained 90 d after the initial labeling (Fig. 3a). By mapping these cells to the reference dataset (Fig. 1f), we identified each of the 15 clusters (Fig. 3b) and estimated their relative abundance after the 90-d chase period compared to their relative abundance in unmanipulated mice at the steady state (Fig. 3c). In the BM, C1, C4, C7 and C12, but not other IgM PC clusters, were enriched among cells persisting for at least 90 d, as were C3 but not C11 IgA PCs and C8 but not C9 IgG PCs. In the SP, the pattern of relative enrichment was very similar except that C1 was not enriched (Fig. 3c).
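The enrichment comparison can be illustrated as a ratio of cluster fractions: the share of each cluster among cells persisting after the chase divided by its share in the steady-state reference. The sketch below is illustrative only; the function names and the toy labels are hypothetical, and it assumes each cell has already been assigned a cluster label by reference mapping.

```python
from collections import Counter


def cluster_fractions(cluster_labels):
    """Return the fraction of cells falling in each cluster."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def fold_change(chase_labels, reference_labels):
    """Fold change in relative abundance, chase versus reference.

    Only clusters present in the reference are reported; a cluster
    absent from the chase sample gets a fold change of 0.0.
    """
    chase = cluster_fractions(chase_labels)
    reference = cluster_fractions(reference_labels)
    return {c: chase.get(c, 0.0) / reference[c] for c in reference}


# Hypothetical toy labels: C8 gains and C9 loses share after the chase.
toy_reference = ["C8", "C8", "C9", "C9"]
toy_chase = ["C8", "C8", "C8", "C9"]
toy_fold = fold_change(toy_chase, toy_reference)
```

A value above 1 marks a cluster that is over-represented among persisting cells, and a value below 1 marks one that is depleted.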

Fig. 3: Identification of LLPCs by a genetic pulse–chase approach.

a, Experimental design. In BICREAD × Ai14 mice, PCs were labeled with tdTomato following tamoxifen (TAM) treatment as indicated. b,c, Characterization of PCs persisting 90 d after labeling. b, UMAP visualization of PC clusters. c, Fold changes in abundance of each PC cluster compared to the same cluster in naive mice in the reference dataset (Fig. 1f, day 0). d,e, Expression of selected genes, as in the reference dataset (Fig. 1f). d, UMAP visualization (left) and violin plots (right) of UMIs of immunoglobulin genes. e, Violin plots showing genes known to regulate survival, immunoglobulin production and the unfolded protein response.

We further examined transcriptomes of the remaining PC clusters in search of known characteristics of LLPCs. First, LLPCs are thought to have a very high capacity for continuous antibody production24. Consistent with this notion, C3, C7 and C8 were the three clusters that contained the most abundant immunoglobulin transcripts (Fig. 3d). Second, genetic evidence indicates that the long-term survival of LLPCs requires antiapoptotic proteins MCL1 and BCL2, the co-stimulatory receptor CD28 and the BCMA receptor (encoded by the Tnfrsf17 gene)4. We found that C3, C7 and C8 cells highly expressed all of these genes, whereas shorter-lived C11 IgA and C9 IgG PC subsets expressed little Bcl2. In addition, C3, C7 and C8 cells also expressed relatively high levels of genes encoding factors essential for antibody production (for example, transcription regulators EAF2 and ELL2, factors regulating nutrient uptake, including SLC3A2, SLC1A5 and ENPP1, and mediators of the unfolded protein response ATF4, ATF6 and ERN1 (ref. 4); Fig. 3e).

Taken together, these results further support our initial identification of C8 as the IgG LLPC subset (Fig. 2) and identification of C3 as the long-lived IgA subset. We also found a considerable level of heterogeneity within persisting IgM PCs. While C7 PCs highly express LLPC-related genes, cells in C4 and C12 might invoke different survival mechanisms yet to be defined.

Surface phenotyping of LLPCs

To be able to identify LLPC subsets in the BM or SP by surface phenotyping, we scrutinized the expression of genes encoding surface markers that may distinguish the 15 clusters (Extended Data Fig. 3 and Supplementary Table 2). For IgG and IgM cells, Epcam was highly expressed by the long-lived (C7, C8 and C12) and Cxcr3 by the short-lived (C2 and C9) cells in both BM and SP (Fig. 4a). For IgA PCs, the long-lived C3 subset can be distinguished from the short-lived C11 subset by relatively higher expression of Ly6a and lower expression of Tigit (Fig. 4a). By fluorescence-activated cell sorting (FACS) analysis of SPPCs and BMPCs from naive mice, we found that, in both BM and SP, most IgG and IgM PCs expressed EpCAM over a wide range of levels on the cell surface, similar to what was previously reported (Fig. 4b,c)9. It appears that EpCAMhiCXCR3−IgG+ and EpCAM+CXCR3+IgG+ cells, as identified by FACS, could respectively correspond to the C8 and C9 clusters as identified by scRNA-seq. Additionally, EpCAMhiCXCR3−IgM+ and EpCAM+CXCR3+IgM+ cells, as identified by FACS (Fig. 4c), may mainly correspond to the C7 and C2 clusters as identified by scRNA-seq, respectively (Fig. 4a). To validate these findings, we sort purified EpCAMhiCXCR3− BMPCs (Extended Data Fig. 4a) and conducted a new scRNA-seq analysis of these cells together with total BMPCs in parallel. Using the dataset in Fig. 1f as the reference dataset, we readily mapped the new batch of BMPCs onto the 15 clusters with an expected isotype distribution (Extended Data Fig. 4b). As shown in Extended Data Fig. 4c,d, the majority of sort-purified EpCAMhiCXCR3− BMPCs were mapped to C8 and C7, with additional presence in C12 that also expressed a high level of EpCAM (Fig. 4a). By FACS, we identified Ly6AhiTigit− and Tigit+ IgA PCs that presumably correspond to C3 and C11, respectively (Fig. 4b,d).

Fig. 4: Surface phenotyping of PC subsets.

a, Dot plots of surface marker expression, with the color intensity indicating expression level and the dot size indicating percentage of expressing cells in the PC clusters. b–d, Pseudocolor FACS plots of PCs from the SP and BM at the steady state, showing intracellular staining to reveal different isotypes (b). The surface expression of CXCR3 versus EpCAM (c) and Tigit versus Ly6A (d) is shown for each isotype. e,f, Histograms of EpCAM and Ly6A expression in PC subsets color coded as in c. e, Surface EpCAM expression in IgA subsets compared to EpCAMhiCXCR3− IgG and IgM PCs. f, Surface Ly6A expression in different IgG (left) or IgM subsets (right) compared to Ly6AhiTigit− IgA PCs.

To evaluate turnover and persistence of different PC subsets identified by surface markers as above, we enumerated tdTomato+ PCs at different time points (days 2, 30 and 60) after tamoxifen treatment of BICREAD × Ai14 mice (Fig. 5a). First, as shown in Fig. 5b, numbers of PCs of all major isotypes decayed over time in both the SP and BM, with those in the BM outlasting those in the SP25. Second, we separated tdTomato+ IgG or IgM PCs into EpCAMhiCXCR3− and EpCAMlowCXCR3− subsets (Fig. 5c,d) and tdTomato+ IgA PCs into Ly6AhiTigit− and Tigit+ subsets (Fig. 5e). While these subsets were evenly labeled by tdTomato expression 2 d after tamoxifen treatment, by 60 d, those persisting IgG and IgA PCs were predominantly EpCAMhiCXCR3− and Ly6AhiTigit−, respectively (Fig. 5c,g). IgM PCs that persisted to 60 d included both EpCAMhiCXCR3− cells and some CXCR3− cells that expressed low to intermediate levels of EpCAM (Fig. 5e). These latter IgM PCs probably represent the C4 cluster for which no unique markers were yet identified. Quantitatively, EpCAMhiCXCR3− IgG and IgM PCs in the BM exhibited a half-life greater than 100 d (Fig. 5f), far longer than their CXCR3+ counterparts at around 10 d. Ly6AhiTigit− IgA PCs in the BM barely decayed, giving an apparent half-life of >730 d as opposed to ~23 d in the SP (Fig. 5h). Therefore, IgA LLPCs are Ly6AhiTigit−, while IgG and IgM LLPCs are EpCAMhiCXCR3−. It is notable that, while IgA LLPCs did not express as high levels of EpCAM as IgG LLPCs (Fig. 4c,e), all IgG PCs irrespective of half-lives highly expressed Ly6A (Fig. 4d,f), again indicating considerable heterogeneities within the LLPC compartment.
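The half-lives quoted above derive from one-phase decay fits to the fraction of tdTomato+ cells remaining over time. As a minimal sketch of how such an estimate can be obtained, assume decay to a zero plateau with the baseline fixed at 100%, so that y(t) = 100·exp(−kt) and t1/2 = ln 2 / k. The rate constant is estimated here by least squares on log-transformed fractions, which is a simplification and not necessarily the fitting procedure used for Fig. 5. The function name and the data points are hypothetical.

```python
import math


def half_life_one_phase(days, percent_remaining):
    """Estimate t1/2 from a one-phase decay y = 100 * exp(-k * t).

    Assumptions of this sketch: zero plateau and a baseline fixed at 100%.
    k is fit by least squares through the origin on ln(y / 100) = -k * t.
    """
    numerator = 0.0
    denominator = 0.0
    for t, y in zip(days, percent_remaining):
        if t == 0:
            continue  # the baseline point carries no information about k
        if y <= 0:
            raise ValueError("percent remaining must be positive")
        numerator += t * math.log(y / 100.0)
        denominator += t * t
    if denominator == 0:
        raise ValueError("need at least one time point after baseline")
    k = -numerator / denominator
    if k <= 0:
        return math.inf  # no measurable decay over the sampled interval
    return math.log(2) / k


# Hypothetical data: a population that halves every 10 days.
toy_days = [0, 30, 60]
toy_remaining = [100.0, 12.5, 1.5625]
toy_half_life = half_life_one_phase(toy_days, toy_remaining)
```

With only a 60-d observation window, a subset that barely decays gives a very small k and therefore a very large, poorly constrained half-life, which is why such values are best read as lower bounds.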

Fig. 5: Half-life analysis of EpCAMhiCXCR3− and Ly6AhiTigit− PCs.

a, Experimental schedule. PCs in BICREAD × Ai14 mice were genetically pulse labeled with tamoxifen (TAM) for 4 consecutive days and analyzed by FACS at the indicated time points. Numbers of tdTomato+ PCs isolated from one tibia and one femur were counted for each mouse. b, Half-lives (t1/2) of total IgG, IgM and IgA PCs in the SP and BM, respectively. c–h, Half-lives of IgG (c and d), IgM (e and f) and IgA (g and h) PC subsets. c,e,g, FACS dot plots of tdTomato+ cells overlaid on top of total PCs on day 0 or day 60 with IgM and IgG subsets further resolved by EpCAM and CXCR3 (c and e) and IgA subsets by Ly6A and Tigit (g). d,f,h, Fractions of tdTomato+ PCs remaining at the indicated time points, with day 0 set as 100%. Curve fitting and t1/2 calculations were performed by using a one-phase decay model. Data are shown as mean ± s.e.m. pooled from three (b–f) or two (g and h) independent experiments with at least two animals in each experiment.

Validation of EpCAMhiCXCR3− IgG LLPCs and Ly6AhiTigit− IgA LLPCs

To further validate the phenotypes of IgG and IgA LLPCs, we examined antigen-specific BMPCs following protein immunization and viral infection. In the first set of experiments, NP-binding IgG BMPCs were identified by intracellular staining at multiple time points after NP-KLH immunization (Fig. 6a). As shown in Fig. 6b,c, the fractional abundance of EpCAMhiCXCR3− cells in NP-binding IgG+ PCs greatly increased in the BM but not SP from day 14 to day 21 after immunization. Concomitantly, the overall surface EpCAM level on NP-binding BMPCs also increased from day 14 to day 21 (Fig. 6d). Most importantly, NP-binding BMPCs that persisted 7 months after immunization were EpCAMhiCXCR3− (Fig. 6e,f). We then tested prime-boost immunization with influenza hemagglutinin (HA) protein emulsified in a manganese adjuvant26 (Fig. 6g). Four months after immunization, HA-specific IgG BMPCs were also mostly EpCAMhiCXCR3− (Fig. 6h,i). We went further to test infection with the influenza virus, which is known to induce both IgA and IgG BMPCs27,28. By 140 d after intranasal infection with the PR8 virus (Fig. 6j), HA-specific IgA BMPCs were mostly Ly6AhiTigit−, while HA-specific IgG BMPCs were mostly EpCAMhiCXCR3− (Fig. 6k,l). Therefore, Ly6AhiTigit− IgA PCs and EpCAMhiCXCR3− IgG PCs represent LLPCs irrespective of the protein antigens and adjuvants used, and these phenotypes also apply to viral infection.

Fig. 6: LLPC phenotypes in immunization and infection models.

a–f, LLPC validation following NP-KLH immunization. a, Experimental setup. b–d, EpCAM and CXCR3 on NP-specific IgG+ BMPCs 14 and 21 d after NP-KLH immunization. Shown are representative FACS profiles (b), relative abundance (c) and EpCAM mean fluorescence intensity (MFI) (d) of EpCAMhiCXCR3− cells in NP-specific IgG+ BMPCs; AU, arbitrary units. e,f, Seven months after immunization. e, FACS gating of NP-specific IgG BMPCs. FSC, forward scatter. f, EpCAM and CXCR3 expression overlays; left, IgG+ (red) overlaid on total BMPCs; right, NP-specific IgG+ (red) overlaid on total IgG BMPCs. Data were from pooled cells of three Blimp1–eYFP mice. g–l, LLPC validation following HA immunization (g–i) or intranasal (i.n.) PR8 infection (j–l). g,j, Experimental setup; Mn, manganese; p.f.u., plaque-forming units. h,k, Gating (left) and statistical summaries (right) of HA-specific cells in total IgG+ or IgA+ BMPCs, as indicated. i,l, Representative FACS profile overlays (left) and summary statistics (right) showing phenotypes of HA-specific IgG+ or IgA+ BMPCs (in red) overlaid on total IgG+ or IgA+ BMPCs. Each symbol indicates one mouse, and lines denote means. Data were pooled from three (a–i) or two (j–l) independent experiments, with at least two animals included in the infected or immunized group and at least one animal in the control group. All P values were determined by two-sided t-test.

IgM LLPCs contain frequent public clones

In the BM, the EpCAMhiCXCR3− IgM LLPCs contained a unique C7 subset that harbored essentially no SHMs (Fig. 1j), exhibited a half-life very similar to IgG LLPCs (Fig. 5) and was transcriptomically more similar to IgG LLPCs than to C1 or C5 (Fig. 7a and Extended Data Fig. 5), two IgM-dominated clusters with very few SHMs (Fig. 1j). These cells carried a distinct repertoire with the least overlap with any other PC clusters (Fig. 7b). Strikingly, we found that many C7 cells from different RNA-seq runs carried identical BCRs (Fig. 7c), suggesting that they were public clones of germline recombination shared by different animals. Prompted by these findings, we systematically searched and cataloged a total of 415 public clones in the entire dataset, with each clone having various numbers of cells (Extended Data Fig. 6a and Supplementary Table 3). We divided all public clones into those statistically overrepresented in C7 (C7-enriched; Supplementary Table 3 and Fig. 7d) and those not enriched in C7 (non-C7; Supplementary Table 3 and Extended Data Fig. 6d). C7-enriched public clones were exclusively non-switched IgM PCs containing few SHMs, while non-C7 public clones were mainly IgM PCs but also contained IgG- and IgA-switched PCs and exhibited considerable numbers of SHMs (Extended Data Fig. 6b,c). Finally, among all 15 clusters, C7 exhibited the highest prevalence of public clones (Fig. 7e).
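Conceptually, cataloging public clones means grouping cells by BCR identity and keeping the groups that appear in more than one independent sample. The sketch below is illustrative only: it assumes a clone is defined by an exact match of heavy- and light-chain sequence strings and that each cell is annotated with its source (an independent run or an individual mouse). The actual clone definition is given in Methods; the function name, the record keys and the toy sequences are hypothetical.

```python
from collections import defaultdict


def find_public_clones(cells, min_sources=2):
    """Return {(heavy, light): cell_count} for clones seen in >= min_sources.

    cells: iterable of dicts with keys 'source', 'heavy' and 'light'.
    Assumption of this sketch: identical heavy and light chain strings
    define one clone, and 'source' names an independent run or mouse.
    """
    sources_by_clone = defaultdict(set)
    cells_by_clone = defaultdict(int)
    for cell in cells:
        key = (cell["heavy"], cell["light"])
        sources_by_clone[key].add(cell["source"])
        cells_by_clone[key] += 1
    return {
        key: cells_by_clone[key]
        for key, sources in sources_by_clone.items()
        if len(sources) >= min_sources
    }


# Hypothetical toy records: one clone shared by two runs, one private clone.
toy_cells = [
    {"source": "run1", "heavy": "HSEQ_A", "light": "LSEQ_A"},
    {"source": "run2", "heavy": "HSEQ_A", "light": "LSEQ_A"},
    {"source": "run2", "heavy": "HSEQ_A", "light": "LSEQ_A"},
    {"source": "run1", "heavy": "HSEQ_B", "light": "LSEQ_B"},
    {"source": "run1", "heavy": "HSEQ_B", "light": "LSEQ_B"},
]
toy_public = find_public_clones(toy_cells)
```

Note that a clone expanded within a single source, like the second one in the toy data, is not public under this definition regardless of its cell count.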

Fig. 7: C7 LLPCs are enriched with public clones.

a, Heat map of pairwise Pearson correlation coefficients among major PC clusters. b, Heat map of clone abundance in major PC clusters. Each row is a clone, and its color represents its relative abundance in a given cluster. Only clones with a relative abundance >0.2% were included in the analysis. Columns and rows were arranged according to unsupervised clustering. c, A Circos plot showing public clones shared by C7 clusters from eight independent collections of mice. Linked lines represent public clones, with the eight most abundant clones colored, and the rest are in gray. d, UMAP visualization of C7 public clones superimposed on total PCs in gray, with cells grouped by their usage of VH and JH segments. The five most abundant public clones in each group are differentially colored, with the rest of the C7 public clones in red. e, Histogram showing relative abundance of public clones in each cluster. f–h, Verification of C7 public clones in individual mice. f, Barcoding and pooling strategy for analysis of EpCAMhiCXCR3− BMPC cells from six mice. See Methods for details. ND, barcode not detected. g, Pie charts showing public clones in EpCAMhiCXCR3− IgM BMPCs from the six mice. Clones detected in at least four mice are in rainbow colors, with the other public clones in light gray and remaining non-public clones in dark gray. The total numbers of cells recovered in each mouse are given at the center of the chart, surrounded by numbers of cells in each category. h, Heat map summary of public clones found in individual mice shown in g. Public clones defined in the reference dataset (Fig. 1f) were grouped by the number of individual mice they were detected in. ‘New public clones’ were the clones that were found in more than two individual mice but not identified as public clones in the reference dataset.

To formally validate public clones at the level of individual mice, we used a barcoding strategy to perform RNA-seq on C7-corresponding EpCAMhiCXCR3−IgM+ BMPCs from six individual mice (Fig. 7f). As shown in Fig. 7g, in every single mouse we detected clones shared by at least one other mouse, with 14–74% of the C7-corresponding BMPCs being public clones. Importantly, C7-enriched public clones already detected in the Fig. 1f dataset were once again recovered in C7-corresponding BMPCs in these six individual mice (Fig. 7h). In addition, we also identified new clones that were shared by at least two of the six mice newly analyzed but were not detected as public clones in the Fig. 1f reference dataset, implying that the true prevalence of public clones in C7 cells is even higher.

In aggregate, these results suggest that C7 public-clone BCRs may recognize universal antigens, either self or environmental (for example, commensals), and promote LLPC generation independent of the conventional GC pathway.

T cell-independent LLPCs recognize self and microbial antigen

To explore whether C7 PCs recognize self or environmental antigen, we first measured the abundance of C7-corresponding EpCAMhiCXCR3−IgM+ cells in the BM of specific pathogen-free (SPF) or germ-free (GF) B6 mice from different colony breeders. As shown in Fig. 8a, mice from GemPharmatech (GP) and the Tsinghua University (THU) facilities had significantly more C7-corresponding cells than mice from Vital River Laboratory (VRL), and GF mice from two different colony breeders, GP and Institute of Laboratory Animals Sciences (ILAS), had significantly reduced, albeit not eliminated, C7-corresponding BMPCs compared to all SPF mice as a whole. These data imply that some C7-corresponding BMPCs are induced by components of certain commensal microbes that differentially colonize animals among different facilities. In addition, we found that many of the C7-enriched public clones used the VH11-2 + J1 germline antibody sequences (Supplementary Table 3) that were previously identified in B1a cells as reactive to phosphatidylcholine13,29, an autoantigen exposed on dead cells. It is therefore very likely that at least some C7 BMPCs develop from self-reactive B1a cells. Consistent with T cell-independent activation of B1a cells10, our scRNA-seq results confirmed the presence of C7 BMPCs in T cell-deficient animals (Fig. 1k), and a comparable abundance of EpCAMhiCXCR3−IgM+ BMPCs was found in the two groups by FACS analyses (Fig. 8a).

Fig. 8: C7 LLPCs are in part induced by gut microbiota.

a, Relative abundance of EpCAMhiCXCR3− IgM BMPCs (C7) found in GF or SPF mice of indicated genotypes (WT, wild type; TCR KO) from different breeder facilities. Each symbol represents one mouse, and data from each group were pooled from six individual animals from three (GF group) or two (the rest of the groups) independent experiments, with at least two animals analyzed each time. b–e, Development of C7 PCs. b, Schedule of DT treatment and time-course analysis of BICREAD mice (see also Extended Data Fig. 7a). c, Relative abundance of C7 EpCAMhiCXCR3− IgM BMPCs. Each symbol represents one mouse, and the bars represent the mean. Data were pooled from three independent experiments. d, Single-cell gene expression in PBS- or DT-treated groups after a 3-month recovery, as measured by scRNA-seq, in C7 (top) and C8 (bottom) cells. Each dot represents one gene, with differentially expressed genes labeled in red or blue. Pearson correlation coefficients between PBS and DT groups are shown above each plot. e, Pie charts comparing the composition of C7 public clones in the reference dataset (Fig. 1f) and those in PBS (top) or DT (bottom) groups after a 3-month recovery. Each slice represents a single public clone detected in C7, and the arc length represents its relative proportion in total C7 public clones in the reference dataset (Fig. 1f). The red color intensity of each slice indicates the number of cells of a given public clone found in the current dataset. Gray slices are those C7 public clones not detected in the current dataset. f, The abundance of C7 EpCAMhiCXCR3− IgM BMPCs in BICREAD mice 30 d after DT or PBS treatment with or without antibiotic cocktail (Abx) in the drinking water. Each symbol represents one mouse, and bars indicate means. Data were pooled from three independent experiments. All P values were determined by two-sided t-test.

To confirm that some C7 BMPCs were induced by the microbiota, we examined their regeneration after PC deletion in the absence of the microbiota. Intraperitoneal injection of 20 ng of DT into the heterozygous BICREAD mouse ablated over 90% of SPPCs and BMPCs within 24 h, without changing the total BM cellularity (Extended Data Fig. 2d,e). While almost all of the EpCAMhiCXCR3− IgM BMPCs were lost 1 d after DT treatment, the population gradually recovered to ~60% of the level found in the PBS control group after 30 d (Fig. 8b,c), indicating the antigen and environment for inducing C7 cells persist at the steady state. Importantly, our scRNA-seq and BCR sequence analysis of EpCAMhiCXCR3− BMPCs from the DT and PBS control groups showed that after a 3-month recovery, the recovered C7 cells were essentially indistinguishable from the ones detected in the control group (Fig. 8d) and contained a similar set of public clones of comparable diversity and abundance (Fig. 8e). Next, to directly test whether the recovery of C7 BMPCs depends on the microbiota, we treated heterozygous BICREAD mice with DT or PBS and supplemented the drinking water with a cocktail of antibiotics to continuously deplete their microbiota30. After 30 d, whereas approximately 40% of the EpCAMhiCXCR3− IgM BMPCs recovered in the control mice, less than 10% recovered in mice given antibiotics (Fig. 8f). Without the initial DT depletion (PBS group), antibiotic treatment over 1 month only slightly reduced the abundance of these cells (Fig. 8f). Therefore, once generated in response to microbiota, EpCAMhiCXCR3− IgM BMPCs could persist without rapid turnover, consistent with the relatively long half-life estimated (Fig. 5f).

To examine whether public-clone antibodies produced by C7 BMPCs can directly recognize microbiota antigen, we expressed five of the C7 public clones (C7 clones 1–5) and four of the most abundant non-C7 public clones (non-C7 clones 1–4) as monoclonal antibodies as well as a W33L-carrying VH1–72/λ antibody as a control (Supplementary Table 4). Subsequently, we tested binding of these antibodies to three types of antigen: fecal microbes from the gut of GP B6 mice (as characterized in Fig. 8a), dead cells as a source of autoantigen recognized by natural IgM antibodies10 and live cells as a negative control. As exemplified in Extended Data Fig. 7a,b,d,e and summarized in Extended Data Fig. 7c,f, two of the five C7 antibodies exhibited notable binding to fecal microbes without reactivities to dead cells or live cells. By contrast, all four non-C7 public clones could bind to dead cells, and two of the four could also bind to fecal microbes, suggestive of broader polyreactivities. Therefore, some C7 BMPCs are generated from publicly used somatic recombination in response to commensal antigens.

Finally, similar to C7, a substantial fraction of C3 BMPCs can be found in T cell-deficient animals, suggesting that IgA LLPCs could also be generated through T cell-independent mechanisms. Interestingly, among all C3 cells in our reference dataset, we found 15 clones that shared the same heavy chains with antibodies that were previously cloned from the small intestine or BM and demonstrated the ability to recognize components of the microbiota (Supplementary Table 5)31. Some of these antibodies are naturally polyreactive and develop in a microbiota-independent manner31. Therefore, similar to the formation of C7 IgM LLPCs, there is an endogenous mechanism to promote homeostatic production of polyreactive IgAs with innate specificity to microbiota and deposition of such IgA-secreting cells in the LLPC compartment.


PC longevity is of central importance to our understanding of long-term antibody-mediated immune protection and appears to be regulated by both extrinsic and intrinsic factors4. On the one hand, LLPCs are clearly not self-sufficient in survival, losing the long-lived potential when removed from their microenvironment32,33 in which multiple immune and non-immune cell types provide survival factors, such as interleukin-6 (IL-6), CXCL12, APRIL, BAFF and CD80/CD86 (ref. 4). On the other hand, most immunization-induced LLPCs are hypermutated7,8 and require IL-21 derived from follicular helper T cells34, implying that longevity may be partly imprinted during PC differentiation in GCs. While extrinsic factors essential for LLPC survival are relatively well characterized, it has been challenging to define a transcriptomic program underlying the longevity of LLPCs, in part because a mixture of PCs with different inferred lifespans exists in the BM4,6,25, but no surface markers have yet been able to distinguish these subsets4. By combining single-cell sequencing, FACS analysis and a genetic pulse–chase mouse model, we have characterized PC subsets in the mouse SP and BM, measured their half-lives and identified the long-lived populations. Our results suggest that PC longevity is associated with specific transcriptomic states and that these states take shape relatively early during PC development, supporting the notion that PC longevity is instructed by intrinsic transcription programs in addition to being shaped by the microenvironment and metabolic changes incurred35.

Intravital imaging has revealed that BMPCs are motile but also forge long-lasting interactions with stromal cells36,37. In this context, it is interesting to note that high-level expression of the adhesion molecule EpCAM and a lack of chemokine receptor CXCR3 expression characterize IgG and IgM LLPCs. EpCAM mediates homophilic adhesion between epithelial cells and is involved in epithelial-to-mesenchymal transition and cancer metastasis38,39. EpCAM has also been shown to promote memory T cell survival when ectopically expressed40. An intriguing possibility is that EpCAM might help anchor IgG and IgM PCs to their particular survival niches and transmit signals to regulate their longevity, while CXCR3 might distract potential LLPCs away from those niches. Previous observations that CXCR3+ PCs express lower levels of BLIMP1 than CXCR3− PCs, suggestive of relative immaturity in the former, are also consistent with LLPC characterization reported here9.

It is interesting that IgA LLPCs are transcriptionally distinct from IgG and IgM LLPCs, potentially reflecting their unique differentiation history at the mucosa27,41. The fact that PC transcriptomes exhibit isotype-specific features was also reported in a recent transcriptome analysis of PCs42. Importantly, IgA LLPCs only express an intermediate level of EpCAM, but they can be distinguished from other IgA PCs by an Ly6AhiTigit− phenotype (Fig. 4c,e). However, all IgG BMPCs exhibit an Ly6Ahi phenotype, while none of the IgM subsets highly express Ly6A (Fig. 4d,f). These observations suggest that PC longevity of different isotypes is differentially regulated, and there may be different survival niches dedicated to PCs of different isotypes. This level of heterogeneity creates the possibility to manipulate LLPCs in an isotype-specific manner and requires further mechanistic understanding.

Another unexpected aspect of PC heterogeneity rests in the similarity and distinction between EpCAMhiCXCR3− IgM and IgG LLPCs. Both C7 IgM and C8 IgG PCs are long lived and share similar transcriptomes, but they differ greatly in isotype and developmental pathways (Supplementary Table 1). The cell state shared by these two LLPC populations could be achieved through either a GC-dependent (in the case of C8) or GC-independent (in the case of C7) pathway. Most surprisingly, development of C7 IgM LLPCs does not require T cell help, suggesting that whatever survival advantage may be conferred by a GC experience could also be imprinted through a very different route. The fact that C7 is unique in its public BCR repertoire suggests that the precursor state and the antigen being recognized play an important role.

It is striking that different mice share in their respective C7 EpCAMhiCXCR3− IgM LLPC compartment identical clones, some of which carry phosphatidylcholine-specific antibody sequences previously identified in innate-like B1a cells29,43. In addition, although EpCAMhiCXCR3− IgM BMPCs can recover after PC deletion in the presence of normal microbiota, a majority of them cannot when the microbiota is removed by antibiotics, indicating that a relatively large fraction of these cells recognize microbiota-associated antigens. Consistent with this idea, at least two C7 public clones demonstrably recognize gut microbes but not dead cells. These results suggest that there exist mechanisms to ensure that B cells carrying evolutionary humoral memory of self and microbial antigens in the form of preferentially used VDJ recombinations can develop into LLPCs without requiring GCs or T cells. Further supporting this idea, we also identified clones in germline configuration with affinity for the microbiota in the IgA LLPC compartment (Supplementary Table 5). Taken together, our results indicate that a T cell-independent route permits LLPC differentiation from B cells with innate-like features.

We found that mouse BM harbors a large number of IgM PCs, consistent with previous studies44,45. This is in contrast to the fact that human BMPCs are mainly of IgA and IgG isotypes, although IgM BMPCs can also be found46,47. This difference between the two species could reflect their evolutionary separation. Alternatively, it could also result from the relatively limited experience of antigen exposure by SPF mice. In addition, B1 cells may be able to switch to IgG and IgA isotypes48,49 and contribute to the IgA PC pool31,50. Future studies are warranted to understand the difference in IgM BMPCs between the species and how innate-like B cells develop into LLPCs, as such mechanistic understanding may hold a key to improving our ability to program PCs of desired specificities, isotypes and longevity for vaccine development and treatment of antibody-mediated diseases.


Mice and study design

C57BL/6J (Jax 664), Blimp1–eYFP (Jax 8828), Ai14 (Jax 7914) and Tcrb−/−Tcrd−/− (Jax 2122, C57BL/6J background) mice were originally from the Jackson Laboratory and were maintained in SPF facilities. Blimp1–eYFP animals were backcrossed to the C57BL/6J background for more than five generations. GF mice were obtained from either the ILAS in the Chinese Academy of Medical Sciences or GP. For certain experiments, C57BL/6J SPF mice were purchased from either GP or VRL.

BICREAD mice were constructed with the standard CRISPR–Cas9 approach on the C57BL/6N background by inserting the IRES-creERT2-P2A-DTR cassette downstream of the stop codon of the Prdm1 open reading frame, targeted with one single guide RNA (TGAAAATCTTAAGGATCCATCGG). Founders were screened by PCR, and the genotype of F1 progenies was confirmed by both PCR and Southern blotting. The strain was subsequently backcrossed to C57BL/6J for two generations, and heterozygotes were used for experiments. Primers used for genotyping (ordered from RuiBiotech) were 5′-AGAAACCAGTCTGAAAGTGTCTTTGC-3′ (forward), 5′-TAATTCCAGAGAGGTGCAGGGAAG-3′ (wild type reverse) and 5′-GGTCGCTACAGACGTTGTTTGTCTT-3′ (mutant reverse).

All animal experiments were performed in the SPF facilities at THU according to the governmental and institutional guidelines for animal welfare and approved by the Institutional Animal Care and Use Committee (IACUC) at THU. Animals were housed with filtered air, sterile pellet food, an acidified watering system and a 12-h light/12-h dark cycle. The temperature was kept at 22–26 °C and humidity at 40–70%, as required by IACUC.

Cohoused and age-matched animals were randomly assigned into experimental and control groups. For FACS analyses, at least two animals were included in each group unless indicated differently. Sample sizes were determined either empirically or based on preliminary data rather than being predetermined by statistical methods. Each experiment was independently and reproducibly repeated two to three times using both female and male mice aged 6 to 12 weeks, and no data points were excluded from the analysis. For sequencing experiments, cells were pooled from five to eight mice in one to two experiments, and data exclusion was performed based on preestablished criteria (details are available in the Nature Research Reporting Summary linked to this article). Data collection and analysis were not performed blind to the conditions of the experiments.

Immunization, tamoxifen and antibiotic treatment

To analyze NP-specific PC formation, mice were immunized intraperitoneally with 100 µg of NP-KLH (Biosearch Technologies) emulsified in 50% alum adjuvant (Invitrogen) with 1 µg of lipopolysaccharide (Sigma-Aldrich) in a total volume of 100 µl. For phenotyping HA-specific PCs, mice were either intraperitoneally immunized twice, 14 d apart, with 50 µg of HA protein51 emulsified in 50% manganese adjuvant (MnStarter Biotechnology) or intranasally infected with 200 plaque-forming units of PR8-GP33 virus. For PC labeling experiments, the BICREAD × Ai14 mice were gavaged once per day with 2 mg of tamoxifen (ApexBio) dissolved in 200 µl of sunflower seed oil (JSENB) for 4 consecutive days. For PC deletion experiments, mice were intraperitoneally injected with 20 ng of DT (Enzo Life Sciences) diluted in 100 µl of sterile PBS. For certain experiments, mice were given drinking water containing 1 g liter–1 ampicillin, 1 g liter–1 neomycin, 1 g liter–1 metronidazole, 0.5 g liter–1 vancomycin and 1% sucrose. Freshly prepared antibiotic water was replaced every 3 d.

PC isolation and sorting

PCs were first enriched from different organs by the use of magnetic beads. BM cells (from the tibia, femur and humerus) and splenocytes were incubated with a biotinylated anti-CD138 antibody (281-2, BD) in staining buffer (PBS with 1% fetal bovine serum and 5 mM EDTA) for 30 min, washed and incubated with streptavidin-conjugated microbeads (Miltenyi Biotec) for 30 min before bead-bound cells were isolated according to the manufacturer’s instruction. The bead-enriched cells were subsequently stained with streptavidin-PE-Cy7, PE-Cy7–anti-CD138 (281-2) and Zombie Yellow from BioLegend and ef450–anti-B220 (RA3-6B2) from eBioscience. For scRNA-seq analysis, 10,000–20,000 PCs (CD138+eYFP+ cells in Blimp1–eYFP mice, CD138+ cells in B6 mice or CD138+ tdTomato+ in BICREAD × Ai14 mice) were sorted into 5 µl of PBS containing 1% fetal bovine serum in 200-µl microtubes using an Aria III flow cytometer (BD). For certain experiments, bead-enriched cells were further stained with PE–anti-CXCR3 (CXCR3-173, BioLegend) and APC–anti-EpCAM (G8.8, BD) in addition to antibodies listed above and gated as indicated to sort purify different PC subsets.

Preparation of barcoded anti-EpCAM

To analyze EpCAM-expressing PCs from individual animals in a pooled manner by scRNA-seq, DNA barcodes were added to PE-Cy7–anti-EpCAM (BioLegend, clone G8.8) using the following procedure: 50 μg of antibodies was exchanged into PBS and concentrated to 50 μl with an Amicon Ultra 0.5-ml 30-kDa molecular weight cutoff centrifugal filter (Millipore) by two additions of 500 μl of PBS and centrifugation at 14,000g for 10 min at 4 °C. Then, 24 μl of dibenzocyclooctyne-PEG4-N-hydroxysuccinimidyl ester (DBCO-PEG4-NHS) was added to the antibodies, and the mixture was incubated for 2 h at 4 °C with rotation. Residual DBCO groups were removed by washing with 300 μl of PBS five times using the 30-kDa filter. Centrifugation was performed at 14,000g for 10 min at 4 °C, and antibodies were concentrated to 60 μl. DBCO antibodies were then divided into six tubes, and each antibody was mixed with a different 5′-azide barcoded DNA oligonucleotide (CGGAGATGTGTATAAGAGACAG-15 nucleotide barcode-CCCATATAAGA*A*A; the asterisk (*) represents phosphorothioate; ordered from Sangon Biotech). The mixtures were incubated at 4 °C overnight with rotation. Residual DNA oligonucleotides were removed by washing five times with 300 μl of PBS using the 30-kDa filter. To prepare barcoded cells for scRNA-seq, BM cells were isolated from individual animals and stained with different barcoded PE-Cy7–anti-EpCAM and biotinylated anti-CD138. The cells were then washed twice with 5 ml of staining buffer and pooled together for PC enrichment as described above. The bead-enriched cells were then stained with streptavidin-BV510, BV510–anti-CD138 (281-2, BioLegend), PE–anti-CXCR3, Zombie Yellow and ef450–anti-B220. CD138+EpCAMhiCXCR3− cells were sorted for analysis.

Flow cytometry

For the analysis of PC subsets and NP-specific cells, splenocytes and BM cells were blocked with 10 µg ml–1 2.4G2 (BioXCell) antibody and stained with a combination of antibodies as indicated, including AF700–anti-B220 (RA3-6B2) from Invitrogen and PE-Cy7–anti-Tigit (1G9), Pacific Blue–anti-Ly6A (D7), BV510–anti-CD138 (281-2), APC–anti-EpCAM (G8.8), PE-Cy7–anti-EpCAM (G8.8), BV421–anti-CXCR3 (CXCR3-173) and Zombie Yellow from BioLegend. The cells were subsequently washed once with staining buffer and fixed and permeabilized by using the FOXP3/transcription factor staining buffer set (eBioscience). Antibodies used for intracellular staining include AF647–anti-BLIMP1 (5E7), PE–anti-BLIMP1 (5E7) and AF594–anti-IgG (poly4053) from BioLegend, ef450–anti-IgM (eB121-15F9) and PE–anti-IgA (mA-6E1) from Invitrogen, FITC–anti-IgA (C10-3) from BD and NP–PE from Biosearch Technologies. For the detection of HA-specific PCs, HA protein was labeled with AF488 using a microscale labeling kit according to the manufacturer’s instructions (A30006, Invitrogen). For the analysis in Fig. 6e,f, PCs were first enriched from the indicated organs and processed for FACS analysis. For enumeration of PCs, BM cells were isolated from one femur and one tibia from each mouse, and precision count beads (BioLegend) were added to the samples as a reference to achieve absolute cell counts. Data were acquired on the Aurora full-spectrum cytometer (Cytek) and analyzed with FlowJo software (Treestar).

scRNA-seq and BCR sequencing

After PC enrichment, single cells from different experimental groups were captured separately in nanoliter droplets using a Chromium single-cell A chip kit and Chromium controller (10x Genomics). scRNA-seq and scBCR-seq libraries were prepared according to the manufacturer’s guidelines using the Chromium Single-Cell 5′ Library & Gel Bead kit (10x Genomics). Single PC lysis and first-strand cDNA synthesis were performed in droplets. After reverse transcription, droplet cleanup was performed to collect cDNA samples from all PCs for downstream cDNA amplification. For the analysis of cells from individual animals, 5 μl of the 0.2 μM barcode amplification primer (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3′, ordered from Sangon Biotech) was added to the cDNA amplification reaction. The amplified mouse barcodes were separated from the cDNA by size-selection DNA purification using AMPure XP beads (Beckman Coulter), and sequencing adaptors were then added to the barcode-amplified products by index PCR. For scRNA-seq libraries, 50 ng of cDNA products was used for library construction according to the manufacturer’s guidelines. Briefly, cDNA was enzymatically fragmented and size selected to optimal size. Subsequently, sequencing adaptors were added via end repair, dA tailing, adaptor ligation and index PCR. For scBCR-seq libraries, BCR products were enriched from cDNA via PCR using the Chromium Single-Cell V(D)J Enrichment kits (mouse B cell, 10x Genomics) according to the manufacturer’s guidelines, and the sequencing libraries were generated following the same procedure for scRNA-seq library construction. All cDNAs, BCRs, barcodes and sequencing libraries were quantified by using Qubit (Thermo Fisher), and size was analyzed by using the 2100 Bioanalyzer High-Sensitivity DNA kit (Agilent). Sequencing was performed on an Illumina HiSeq X Ten system and NovaSeq 6000 system.

Preprocessing of single-cell sequencing data

The raw scRNA-seq and scBCR-seq fastq files were processed using Cellranger (version 3.0.2, 10x Genomics). Reads from scRNA-seq libraries were aligned to the mm10 genome supplied by 10x Genomics (refdata-cellranger-mm10-2.1.0), and cell gene expression matrices for each sample were obtained using the ‘cellranger count’ function with default parameters. Full-length BCR assembly and basic annotations were performed using the ‘cellranger vdj’ function with compatible references supplied by 10x Genomics (refdata-cellranger-vdj-GRCm38-alts-ensembl-2.2.0). The cell barcodes shared between scRNA-seq reads and scBCR-seq reads were used to link gene expression data and BCR information from the same cell.

Gene expression data analysis

Gene expression data were analyzed using Seurat52 (version 3.0.1). The following workflow was used for each dataset: (1) matrices of all samples were combined, and cells with fewer than 200 genes detected or with >10% of detected genes belonging to the mitochondria were excluded. Cells with more than one productive immunoglobulin heavy chain or light chain were possibly doublets and excluded, and cells with no heavy chain detected were also discarded. (2) Immunoglobulin-related genes (for example, IgH V genes, IgH J genes and constant genes) were removed from the gene expression matrix. (3) Data normalization, identification of highly variable genes, dimensionality determination, identification of cell clusters and data visualization via uniform manifold approximation and projection (UMAP) dimensional reduction were performed according to the package manual. (4) Cell clusters without PC marker gene expression (that is, Sdc1 and Prdm1) were excluded. (5) For the first dataset presented in Fig. 1f (including four time points and two tissue origins (BM and SP)), experimental batch effect correction was performed using the ‘IntegrateData’ function, and all processes of step 3 above were performed for the integrated data. This dataset was termed ‘reference dataset’ and used as a reference for further cluster prediction of other datasets. (6) For the other datasets, all processes of step 3 were rerun, and cell clusters of the reference dataset were projected and transferred as predicted cluster using the ‘TransferData’ function for each dataset, respectively. After all these procedures, data visualization and presentation were performed using the built-in function of Seurat and ggplot2 (ref. 53).
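The cell-level exclusion rules in step 1 can be illustrated with a short sketch. This is not the Seurat-based code used in the study; it is a plain Python restatement of the stated cutoffs, and the `Cell` record, its field names and the example values are hypothetical. How the mitochondrial percentage is computed is left to the upstream tool.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str
    n_genes: int      # number of genes detected in the cell
    pct_mito: float   # mitochondrial percentage, as computed upstream (assumed field)
    n_heavy: int      # productive immunoglobulin heavy chains detected
    n_light: int      # productive immunoglobulin light chains detected


def passes_qc(cell, min_genes=200, max_pct_mito=10.0):
    """Apply the step 1 exclusion rules to one cell."""
    if cell.n_genes < min_genes:
        return False              # too few genes detected
    if cell.pct_mito > max_pct_mito:
        return False              # mitochondrial fraction above the cutoff
    if cell.n_heavy == 0:
        return False              # no heavy chain detected
    if cell.n_heavy > 1 or cell.n_light > 1:
        return False              # possible doublet
    return True


# Hypothetical example cells, one per exclusion rule.
cells = [
    Cell("a", 1500, 3.0, 1, 1),   # kept
    Cell("b", 150, 3.0, 1, 1),    # fewer than 200 genes
    Cell("c", 1500, 12.0, 1, 1),  # more than 10% mitochondrial
    Cell("d", 1500, 3.0, 2, 1),   # two heavy chains
    Cell("e", 1500, 3.0, 0, 1),   # no heavy chain
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

Only cell `a` survives in this example, because each of the others violates exactly one rule.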

BCR data analysis

Mapping and annotation of full-length BCR sequences were performed against the reference downloaded from the IMGT54 database using NCBI IgBLAST55 (version 1.14.0). Data filtering and SHM calculation were performed for each assembled BCR chain, with non-productive BCR sequences excluded from the analysis. SHM numbers were calculated for V and J regions, respectively, as region length * region identity, and the total numbers of the three (two) regions represented the SHM numbers of BCR chains. Then heavy chains and light chains of the same cells were paired using cell barcodes. Doublet identification was performed as mentioned above, and cells with only BCR data but no gene expression data were discarded.
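A per-chain SHM count of this kind can be sketched as follows. The sketch rests on one assumption that should be checked against the actual pipeline: that "region identity" is the fraction of aligned positions matching the germline, so that the number of mutated positions in a region is the region length multiplied by one minus the identity. The function names and the example values are hypothetical.

```python
def region_mutations(length, identity):
    """Mutated positions in one region.

    Assumes `identity` is the fraction of aligned positions that match
    the germline sequence (for example 0.97 for 97% identity).
    """
    return round(length * (1.0 - identity))


def chain_shm(regions):
    """Sum mutations over all annotated regions of one BCR chain.

    `regions` maps a region name to a (length, identity) pair.
    """
    return sum(region_mutations(length, identity)
               for length, identity in regions.values())


# Hypothetical light chain with two annotated regions.
light_chain = {"V": (300, 0.97), "J": (50, 0.96)}
total = chain_shm(light_chain)
```

In this example the V region contributes 9 mutations and the J region 2, giving a chain total of 11.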

In this study, cells of the same ‘clone’ had identical junctional DNA sequences in the heavy chain- and light chain-coding genes. A ‘public clone’ was defined as a clone that contains cells from two or more different mice. In the reference dataset, as defined above, public clones were from two or more independent experiments, which by definition entails origins from different mice. A C7-enriched public clone was defined by statistical tests of enrichment for each public clone using hypergeometric distribution. In brief, N was the number of all public clones, and n was the number of public clones in C7; for each public clone, K was the cell number of this clone, and k was the cell number of this clone in C7. Then, the probability of k or more cells of this clone being detected by chance in C7 was defined as

$$P = \sum_{i = k}^{n} \frac{C_K^i \, C_{N - K}^{n - i}}{C_N^n}.$$

Public clones with P values less than 0.05 were defined as C7-enriched public clones, and P values and clone classification of all public clones are listed in Supplementary Table 3.
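The enrichment P value above is the upper tail of a hypergeometric distribution and can be computed directly from binomial coefficients. The following is a minimal sketch, not the code used in the study. It assumes that N and n are counted in cells, like K and k, and the function name and example counts are hypothetical.

```python
from math import comb


def enrichment_p(N, n, K, k):
    """Probability of observing k or more cells of a clone in C7 by chance.

    N: total count in the background population
    n: count falling in C7
    K: cells belonging to the clone of interest
    k: cells of that clone found in C7
    """
    upper = min(n, K)  # terms with i > K are zero, so the sum can stop here
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Hypothetical counts: a 4-cell clone with all 4 cells in C7,
# where C7 holds 5 of 10 cells overall.
p = enrichment_p(N=10, n=5, K=4, k=4)
is_enriched = p < 0.05
```

With these counts the sum has a single term, C(4,4)·C(6,1)/C(10,5) = 6/252, so the clone would pass the 0.05 threshold.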

For the identification of microbiota-reactive IgA LLPCs, C3 clones from the reference dataset were matched with IgA sequences with known antigen specificities31. Clones that share the same CDR3 amino acid sequences and VDJ usage of heavy chains with published sequences are listed in Supplementary Table 5.

Calculation of PC half-lives

The curve-fit and half-life calculations were done using Prism 8, following a one-phase decay model by the following equation,

$$Y = (Y_0 - \mathrm{plateau})\,e^{-kx} + \mathrm{plateau},$$

where Y0 is the Y value when time x is zero, k is the rate constant and plateau is set to zero.
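With the plateau fixed at zero, the one-phase decay model reduces to a simple exponential, and the half-life follows from the rate constant as ln(2)/k. The sketch below illustrates that relationship only; the fitting itself was done in Prism, and the function names and numerical values here are hypothetical.

```python
import math


def one_phase_decay(x, y0, k, plateau=0.0):
    """Value of the one-phase decay model at time x."""
    return (y0 - plateau) * math.exp(-k * x) + plateau


def half_life(k):
    """Time for the signal above the plateau to fall by half."""
    return math.log(2) / k


# Hypothetical rate constant, in units of 1/day.
k = 0.1
t_half = half_life(k)                      # about 6.93 days
remaining = one_phase_decay(t_half, 100.0, k)
```

Evaluating the model at the half-life returns half of the starting value, which is a quick check that a fitted k and a reported half-life agree.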

Analysis of VH1–72 cells

Cells with VH1–72 paired with the λ-light chain, the canonical NP-specific cells in B6 mice, were extracted from the reference dataset, and gene expression analyses were redone following step 3 mentioned above.

Mouse barcoding data analysis

For analysis of cells from individual animals in a pooled manner, gene expression and BCR analysis were performed as mentioned above. To determine mouse origin, all six mouse barcodes were counted for each 10x cell barcode using a custom Python (version 3.6.10) pipeline. Cells with no gene expression data or mouse barcode data were excluded, and the resulting mouse barcode cell matrix was processed using the ‘HTODemux’ function of the Seurat package. Mouse IDs were linked to gene expression and BCR results by shared cell barcodes. After barcode analysis, cells could be divided into four groups: doublets (cells with more than one possible origin), singlets (cells with only one determined origin), negative (cells with undetermined origin) and ND (barcode not detected). Only cells with determined mouse IDs were used for further analysis. In Fig. 7g, IgM PCs were used as an approximation for C7 cells for further public-clone analysis.
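The counting step and the four-way classification can be illustrated with a simplified sketch. It is not the pipeline used in the study and it does not reproduce HTODemux, which derives its cutoffs from the data. Here a single fixed read threshold (`min_reads`, a hypothetical parameter) stands in for that step, and all names and example reads are invented for illustration.

```python
from collections import Counter, defaultdict


def count_barcodes(reads):
    """Count mouse barcodes per cell barcode.

    `reads` is an iterable of (cell_barcode, mouse_barcode) pairs.
    """
    counts = defaultdict(Counter)
    for cell, mouse in reads:
        counts[cell][mouse] += 1
    return counts


def classify(cell_counts, min_reads=5):
    """Assign one cell to ND, negative, singlet or doublet.

    A fixed threshold is an assumption made for this sketch only.
    """
    if not cell_counts:
        return ("ND", None)                # no mouse barcode detected
    positives = [m for m, c in cell_counts.items() if c >= min_reads]
    if not positives:
        return ("negative", None)          # origin undetermined
    if len(positives) == 1:
        return ("singlet", positives[0])   # one determined origin
    return ("doublet", None)               # more than one possible origin


# Hypothetical reads for three cells.
reads = ([("cell1", "m1")] * 20
         + [("cell2", "m1")] * 12 + [("cell2", "m3")] * 9
         + [("cell3", "m2")] * 2)
counts = count_barcodes(reads)
calls = {cell: classify(c) for cell, c in counts.items()}
```

Only singlets carry a mouse ID forward, which matches the rule that cells without a determined origin are dropped from further analysis.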

Production of monoclonal antibodies

Monoclonal antibodies from chosen clones were expressed by AbCipher Bio-technology. Briefly, variable regions from the matched heavy chain and light chain of each public clone were assembled from the single-cell sequencing data, connected with the IgM and murine light chain constant regions, respectively, tagged with His-tag and cloned into the pTT5 vectors for expression. One night before the transfection, 293-F cells were seeded at a concentration of 1.5 × 10^6 cells per ml. The plasmids containing matched heavy chains and light chains were transfected into 293-F cells at a concentration of 1 µg ml–1 and mixed with polyethylenimine hydrochloride (molecular weight 4,000, Polysciences) at a ratio of 1:3 following the manufacturer’s instructions. Fresh medium was added to the transfected cells every 2 d, and the supernatants were collected after 6 d of culture. Subsequently, equilibrated nickel-nitrilotriacetic acid agarose beads (Smart-lifesciences) were added to the supernatant. The beads were washed twice, and the purified antibodies were eluted according to the manufacturer’s instructions. The antibodies were further concentrated and buffer exchanged into PBS (pH 7.2) using ultracentrifugal filters. The purity of the purified antibodies was verified by SDS–PAGE, and the protein concentrations were calculated based on optical density at 280 nm measurements.

Antibody-binding assays

To assess public antibody binding to dead cells, A20 cells (ATCC, TIB-208) were collected from cultures, washed once with staining buffer, blocked with 2.4G2 antibody and incubated with 10 µg ml–1 of different IgM antibodies. The cells were then incubated with ef450–anti-IgM, followed by Annexin V-APC (Invitrogen) and 7-AAD (Biotium) staining according to the manufacturer’s instructions. All incubations were done for 30 min on ice, and cells were washed once with staining buffer between steps. Dead cells were gated as 7-AAD+Annexin V+.

To test for public antibody binding to fecal microbes, fecal samples were collected from 8-week-old male mice purchased from GP and processed as previously described56. Briefly, fecal pellets were resuspended in sterile PBS at 25 mg ml–1 and were homogenized and filtered through a 30-µm cell strainer. Microbes were collected by centrifugation at 12,000g for 30 s and resuspended in blocking buffer (PBS with 2% bovine serum albumin). Public IgM antibodies were added at a concentration of 10 µg ml–1, followed by incubation with APC–anti-IgM (RMM-1, BioLegend). The microbes were then fixed with 4% paraformaldehyde and stained with 7-AAD at a concentration of 10 µg ml–1. Centrifugation was done at 12,000g for 30 s. All incubations were done in blocking buffer for 30 min on ice.

Statistical analyses

Statistical analyses were performed by using Prism 8 (GraphPad) on data pooled from multiple independent experiments. Two-sided Student’s t-tests were used to calculate P values. Data distribution was assumed to be normal, but this was not formally tested.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.