Date: 2021-08-23

TCRβ sequences from human CoNGA clusters were matched to bulk TCRβ repertoires using TCRdist. To score the overlap between the set of TCR sequences in a CoNGA cluster and the set of sequences in a bulk repertoire, we developed a variant of the Morisita-Horn (MH) overlap index that accounts for sequence similarity in addition to exact identity (see Methods for further details). (a) The MH overlaps (y-axis) are plotted against subject age (x-axis) for the two CoNGA clusters indicated in the panel titles. Overlap with the first cluster (a MAIT cluster) appears to decline with subject age, while overlap with the second (a HOBIT cluster) appears to increase (R value and two-sided P value are given in the legend). (b) The distribution of MH overlaps for a set of CD4+ repertoires is compared with the distribution of MH overlaps for a set of CD8+ repertoires for two different clusters from the thymus_atlas dataset. (c) The distribution of MH overlaps for a set of memory repertoires is compared with the distribution of MH overlaps for a set of naive repertoires for the two clusters indicated in the panel titles. Boxes in panels (b) and (c) show quartiles, with whiskers extending to 1.5 × IQR.
(d) All-vs-all scatter plots (with kernel density estimates along the diagonal) for the following CoNGA cluster features (see Methods for feature calculation details): log10_Pgen, the average log10 generation probability of the cluster TCRβ chains; log10_publicity, the average log10 rate of occurrence in a large (N = 666) dataset of PBMC repertoires; age_correlation, the linear correlation coefficient between MH overlap and subject age (see panel (a)); CD8_vs_CD4, the t-statistic comparing MH overlaps for CD8 and CD4 repertoires (higher values indicate greater preference for CD8 repertoires; see panel (b)); memory_vs_naive, the t-statistic comparing MH overlaps for memory and naive repertoires (higher values indicate greater preference for memory repertoires; see panel (c)). The CoNGA clusters are grouped according to the discussion in the main text; ‘pre_hobit’ refers to the two clusters in the thymus_atlas dataset that may be precursors of the HOBIT+ population: CD8αα(I):2 and CD8αα(II):2.
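
For orientation, the quantities named in this caption can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the analysis code behind the figure: `morisita_horn` implements the classic exact-identity Morisita-Horn index only (the similarity-aware variant is described in Methods and is not reproduced here), `pearson_r` stands in for the linear correlation used for age_correlation, and `welch_t` assumes an unequal-variance (Welch) t-statistic, which the caption does not specify. All function names and the count-dictionary input format are hypothetical.

```python
from math import sqrt


def morisita_horn(counts_a, counts_b):
    """Classic Morisita-Horn overlap between two clonotype count tables.

    Inputs are dicts mapping a clonotype label to its count. Only exact
    matches contribute; sequence similarity is NOT taken into account.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    # Convert raw counts to within-repertoire frequencies.
    freq_a = {k: v / total_a for k, v in counts_a.items()}
    freq_b = {k: v / total_b for k, v in counts_b.items()}
    # Sum of frequency products over clonotypes present in both sets.
    shared = sum(f * freq_b[k] for k, f in freq_a.items() if k in freq_b)
    # Simpson-type concentration terms for each set.
    simpson_a = sum(f * f for f in freq_a.values())
    simpson_b = sum(f * f for f in freq_b.values())
    return 2.0 * shared / (simpson_a + simpson_b)


def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient, e.g. overlap vs. age."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def welch_t(group_1, group_2):
    """Welch t-statistic (assumed variant); positive if group_1 is higher."""
    n1, n2 = len(group_1), len(group_2)
    mean_1 = sum(group_1) / n1
    mean_2 = sum(group_2) / n2
    # Unbiased sample variances.
    var_1 = sum((x - mean_1) ** 2 for x in group_1) / (n1 - 1)
    var_2 = sum((x - mean_2) ** 2 for x in group_2) / (n2 - 1)
    return (mean_1 - mean_2) / sqrt(var_1 / n1 + var_2 / n2)


# Toy inputs with placeholder clonotype labels (not real data).
cluster = {"clone_1": 3, "clone_2": 1}
repertoire = {"clone_1": 10, "clone_3": 30}
overlap = morisita_horn(cluster, repertoire)
```

Under these assumptions, a CD8_vs_CD4-style feature would correspond to `welch_t(cd8_overlaps, cd4_overlaps)`, so that higher values indicate greater overlap with CD8 repertoires, and an age_correlation-style feature to `pearson_r(ages, overlaps)`.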

