Leukocytes with chromosome Y loss have reduced abundance of the cell surface immunoprotein CD99

Mosaic loss of chromosome Y (LOY) in immune cells is a male-specific mutation associated with increased risk for morbidity and mortality. The CD99 gene, positioned in the pseudoautosomal regions of chromosomes X and Y, encodes a cell surface protein essential for several key properties of leukocytes and immune system functions. Here we used CITE-seq for simultaneous quantification of CD99-derived mRNA and cell surface CD99 protein abundance in relation to LOY in single cells. The abundance of CD99 molecules was lower on the surfaces of LOY cells than on cells without this aneuploidy in all six types of leukocytes studied, while the abundance of CD proteins encoded by genes located on autosomal chromosomes was independent of LOY. These results connect LOY in single cells with immune-related cellular properties at the protein level, providing mechanistic insight into disease vulnerability in men affected by mosaic loss of chromosome Y in blood leukocytes.

Y genes (EDY) in patients with cancer 31 as well as Alzheimer's disease 32 . Outside of the Y chromosome, transcriptome analyses of peripheral leukocytes identified almost 500 autosomal genes showing LOY-associated transcriptional effects (LATE), including genes involved in immune functions and other biological processes, likely disturbing cellular homeostasis 7 . Moreover, men diagnosed with prostate cancer were primarily affected by LOY in T-lymphocytes and granulocytes, while patients with Alzheimer's disease displayed higher levels of LOY in NK cells 7 . In aggregate, the results from recent studies suggest that LOY in the hematological system of aging men is not phenotypically neutral.
One of the immune genes showing LATE, through consistent downregulation in LOY cells, is CD99, located in pseudoautosomal region 1 (PAR1) of chromosomes X and Y 33 . CD99 escapes X-inactivation in females, suggesting the importance of its balanced expression 34,35 . In males, previous transcriptome analyses of single cells and bulk sorted cellular populations found lower levels of CD99 mRNA in all studied types of leukocytes with LOY, including NK cells, monocytes, and T- and B-lymphocytes 7 . It is likely that the reduced expression of CD99 is directly linked to the copy number change at this locus caused by the aneuploidy, while expression of the X-linked copy is retained 7 . CD99 is a transmembrane glycoprotein found at low levels in most tissues and highly expressed in cell types such as hematopoietic progenitor cells, peripheral blood cells and endothelial cells 36,37 . Its normal functions were recently reviewed 36 ; notably, when present at the cell surface, this protein is essential for transendothelial migration (TEM), the process by which immune cells cross vascular walls through a sequence of interactions with endothelial cells 38 . Likewise, cell surface CD99 is involved in cell-cell adhesion that facilitates immune cell interactions 39,40 . Furthermore, intracellular CD99 regulates post-Golgi trafficking and transport of proteins to the plasma membrane 41 . For example, the cell surface abundance of human leukocyte antigen (HLA class I) and the T-cell receptor (TCR), as well as of major histocompatibility complex (MHC class I and II) molecules, has been linked to CD99 function 41,42 . Moreover, associations between LOY and blood cell counts were recently reported in human populations 4,28 . 
Altered cell differentiation could be connected with LOY-associated dysregulation of CD99, since in vitro studies of hematopoietic progenitors suggest a role for CD99 in normal immune cell differentiation, selection and apoptosis [43][44][45] . Given the previously described reduced level of CD99 mRNA in leukocytes with LOY and the vital immunological roles of CD99, we sought here to study leukocytes collected in vivo to investigate whether LOY also affects the cell surface abundance of the functionally relevant CD99 protein.

Results and discussion
From freshly collected blood samples, we studied the abundance of CD99 cell surface protein as well as CD99 mRNA in single cells with and without LOY using Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) 46 . The method combines droplet-based high-throughput single-cell RNA sequencing with oligonucleotide-labelled antibodies targeting cell surface proteins. Because the cells are incubated with the antibodies prior to droplet generation, the oligonucleotide labels are indexed with the same cell-specific barcodes as the mRNA during sequencing library preparation, and can thus be quantified and traced to the cell of origin after sequencing. CITE-seq therefore provides both transcriptional and phenotypic information simultaneously at the single-cell level 46 . In the CITE-seq protocol applied here, RNA expression and protein level readouts were generated for individual leukocytes by combining the 10X Genomics 3' transcriptome single cell solution v.2 with antibody-linked sequence tags for cell surface protein markers. After sequencing, a pooled dataset of 14,376 single cells originating from four male subjects diagnosed with Alzheimer's disease (median age 81.5 years) was established. Cell identities were determined from RNA expression profiles and visualized using Uniform Manifold Approximation and Projection (UMAP) (Figs. 1, S1). Consistent results across the single cell experiments included in the pooled dataset suggest comparability between batches and individuals (Fig. S1). In addition to the CITE-seq construct targeting CD99, a protein encoded in PAR1 of chromosomes X and Y, we also applied antibody constructs targeting six cell surface proteins encoded by autosomal genes: CD19, CD14, CD16, CD56, CD8 and CD4. These markers are normally present on the cell surfaces of B-lymphocytes, classical and non-classical monocytes, NK cells, CD8 + T-lymphocytes and CD4 + T-lymphocytes, respectively. 
The occurrence of these cell surface proteins on the studied leukocytes is visualized in Fig. S2 and confirms the mRNA-based cell type identification and clustering.
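To illustrate how the shared cell barcode links the two readouts, the following minimal Python sketch collapses toy reads to unique molecular identifiers (UMIs) and pairs RNA and antibody-derived tag (ADT) counts per cell. The read tuples, barcodes and feature names are hypothetical; real pipelines (Cellranger and CITE-seq-Count in this study) also correct sequencing errors in barcodes and UMIs, which is not shown here.

```python
from collections import defaultdict

# Hypothetical toy reads as (cell_barcode, UMI, feature). In CITE-seq, mRNA reads
# and ADT reads from the same droplet carry the same cell barcode.
rna_reads = [
    ("AAAC", "u1", "CD99"), ("AAAC", "u1", "CD99"),  # PCR duplicate: same UMI
    ("AAAC", "u2", "RPS4Y1"),
    ("TTTG", "u3", "CD99"),
]
adt_reads = [
    ("AAAC", "a1", "CD99_ADT"), ("AAAC", "a2", "CD99_ADT"),
    ("TTTG", "a3", "CD99_ADT"), ("TTTG", "a4", "CD14_ADT"),
]


def count_umis(reads):
    """Collapse duplicate reads to unique UMIs, then count per cell and feature."""
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, feature in set(reads):
        counts[barcode][feature] += 1
    return counts


def pair_modalities(rna_reads, adt_reads):
    """Return {barcode: (rna_counts, adt_counts)} for barcodes seen in both assays."""
    rna, adt = count_umis(rna_reads), count_umis(adt_reads)
    shared = sorted(set(rna) & set(adt))
    return {bc: (dict(rna[bc]), dict(adt[bc])) for bc in shared}


paired = pair_modalities(rna_reads, adt_reads)
for bc, (rna_counts, adt_counts) in paired.items():
    print(bc, rna_counts, adt_counts)
```

Keying both count tables on the barcode is what allows the RNA-based LOY call and the protein readout to be compared within the same cell.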
Next, the LOY status of each sequenced cell was determined from the transcriptome data by the lack of expression of genes located in the male-specific region of chromosome Y (MSY), as described previously 2,7 . Single cells with LOY were observed in all subjects and in all studied types of leukocytes, at frequencies ranging from 2.4 to 20.9% between cell types (Figs. 1, S1, Table S1). To test whether the abundances of CD99 mRNA and CD99 protein are altered in cells with LOY compared with normal cells without the aneuploidy, we first performed logistic regression on the pooled dataset, in models adjusted for confounders such as cell donor, experimental batch, UMI counts, percentage of mitochondrial RNA and cell type. These primary analyses showed a significant overall reduction in both the level of CD99 mRNA (Z = -7.6, p = 3.66e-14) and the cell surface protein level (Z = -12.6, p < 2e-16) in single cells with LOY. Further exploratory analyses showed that the reduction of CD99 protein was present in all studied types of leukocytes with LOY (Figs. 2, 3, Table 1). The largest reduction was observed in B-lymphocytes, with an average log fold change between single cells of -0.31 (adj. p = 0.0006), representing a 27% decrease of CD99 protein abundance on the surface of B-lymphocytes with LOY. In contrast to the significant reduction of CD99 protein levels, the abundance of the six investigated cell surface CD proteins encoded by autosomal genes was not affected by LOY in any of the studied cell types (Fig. S3, Table 1).
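The 27% figure follows from exponentiating the log fold change. A minimal check, assuming the reported value is on the natural-log scale (the convention for `avg_logFC` in Seurat v3):

```python
import math


def percent_change_from_log_fc(log_fc: float, base: float = math.e) -> float:
    """Percent change implied by a log fold change (negative values = decrease)."""
    return (base ** log_fc - 1.0) * 100.0


# CD99 protein in B-lymphocytes with LOY: average log fold change of -0.31
change = percent_change_from_log_fc(-0.31)
print(f"{change:.1f}%")  # -26.7%
```

Reading the same value as a log2 fold change would instead imply a decrease of about 19%, so the stated 27% is consistent with the natural-log scale.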
The RNA readout also displayed an overall reduction in CD99 mRNA expression in single cells with LOY, and exploratory analyses showed that the downregulation was significant in CD14 + monocytes and CD56 + NK cells (Table 1).
www.nature.com/scientificreports/
Compared with the general decrease of CD99 protein abundance on cell surfaces, we observed a greater variation in CD99 mRNA abundance between single cells (Fig. 2). For example, a substantial proportion of single cells in our assay displayed no CD99 mRNA transcripts, highlighting the zero-inflation commonly observed in single-cell RNA-seq data. In contrast, the cell surface abundance of the CD99 protein displayed a more even distribution and a significant reduction in single cells with LOY. This result supports the view that proteins are more stable markers of the properties of single cells than the generally more fluctuating mRNA levels [46][47][48] . From a translational perspective, the possible functional consequences of LOY-associated CD99 deficiency in leukocytes merit attention. Previous studies have shown that CD99 and PECAM-1 independently mediate interactions necessary for TEM, specifically the passage of leukocytes through endothelial junctions 38 . When monoclonal antibodies were used to block cell surface CD99 on monocytes, TEM was severely inhibited in vitro 38 as well as in vivo 49 , with monocytes arrested partway through the junction. Other functional studies show an impact on cell-to-cell adhesion of lymphocytes after blocking CD99 with monoclonal antibodies 39,40 . Furthermore, CD99 regulates transport of proteins to the plasma membrane. For example, low abundance of CD99 in B-lymphocytes was associated with reduced cell surface levels of MHC class I proteins, a deficiency that could be restored by increasing CD99 abundance 41 . 
Moreover, blocking CD99 resulted in the intracellular accumulation of MHC class I molecules in B- as well as T-lymphocytes 41,42 . Conversely, engagement of CD99 increased the abundance of the immunoproteins TCR and MHC class I and II molecules on the surface of human T-lymphocyte progenitors 50 , further supporting the importance of CD99 for physiological intracellular protein transport. CD99 has also been shown to be involved in the regulation of apoptosis and differentiation of developing B- and T-lymphocytes 44,45,51 . These studies showed that cell death could be induced by ligation of CD99 with monoclonal antibodies. Another study showed that the cell surface level of CD99 affected the developmental trajectories of human hematopoietic progenitors 43 ; for example, B-lymphocytes were mainly produced by hematopoietic progenitors with high CD99 levels. Interestingly, recent studies suggest an association between LOY and blood cell counts in human populations 4,28 , which might be connected with dysregulation of CD99 in progenitors with LOY. In aggregate, the results from functional studies, together with our results showing an overall reduction of CD99 connected with LOY, suggest that this aneuploidy could have a direct impact on leukocyte physiology.
In summary, men carrying circulating immune cells without the Y chromosome display an increased risk of disease and mortality. Here we demonstrate that single cells with LOY show a reduced abundance of CD99 protein, encoded by a gene located in the pseudoautosomal region of chromosomes X and Y. This cell surface immunoprotein is a key molecule for leukocyte properties such as transendothelial migration, adhesion, differentiation, apoptosis and intracellular trafficking of proteins involved in immune surveillance. These results provide proof of concept for the detection of a disease-associated protein on single cells with LOY and support the hypothesis that LOY in leukocytes could be directly connected with impaired immune functions via disruption of physiological CD99 biology. This result, however, does not exclude other potential LOY-related disease mechanisms. Nonetheless, a direct role

Methods
Generation of CITE-seq probes. The antibodies used for detection of the different target proteins (Table S2) were buffer exchanged into PBS using 7K MWCO Zeba columns (ThermoFisher, USA) and concentrated to 1 µg/µl using Amicon 30 kDa spin columns (Merck). The antibodies were conjugated to azide-modified DNA oligonucleotides (Table S2) with a DBCO-NHS ester cross-linker (Sigma-Aldrich), at a cross-linker:antibody ratio of 30:1 and an oligonucleotide:antibody ratio of 3.33:1. After successful conjugation was confirmed by polyacrylamide gel electrophoresis, NaN3 (0.04% final concentration) was added to the conjugates to quench further conjugation. Antibody-DNA conjugates were pooled at equal ratios. Unconjugated oligonucleotides were removed using Amicon 100 kDa spin columns (Merck).
and incubated with the pooled antibody-conjugates (1 µg of each antibody) as previously described 46 . The study was performed in accordance with relevant guidelines and regulations and was approved by the local research ethics committee in Uppsala, Sweden (Regionala Etikprövningsnämnden i Uppsala (EPN), Dnr: 2013/350), and informed consent was obtained from all participants.
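The conjugation ratios can be translated into reagent amounts as sketched below. This sketch assumes that the ratios are molar and uses a nominal IgG mass of about 150 kDa; both are assumptions not stated in the protocol, and the 100 µg input is a hypothetical example amount.

```python
# Assumption: nominal IgG molecular mass (~150 kDa); ratios assumed to be molar.
IGG_MASS_G_PER_MOL = 150_000.0


def conjugation_amounts_pmol(antibody_ug, crosslinker_ratio=30.0, oligo_ratio=3.33,
                             antibody_mass=IGG_MASS_G_PER_MOL):
    """Picomoles of antibody, DBCO-NHS cross-linker and azide oligo for a given antibody mass."""
    # (antibody_ug * 1e-6 g) / (g/mol) * 1e12 pmol/mol simplifies to antibody_ug * 1e6 / mass
    antibody_pmol = antibody_ug * 1e6 / antibody_mass
    return {
        "antibody": antibody_pmol,
        "crosslinker": antibody_pmol * crosslinker_ratio,
        "oligo": antibody_pmol * oligo_ratio,
    }


amounts = conjugation_amounts_pmol(100.0)  # hypothetical 100 µg of antibody
for reagent, pmol in amounts.items():
    print(f"{reagent}: {pmol:.0f} pmol")
```

At the stated concentration of 1 µg/µl, the same hypothetical 100 µg corresponds to 100 µl of antibody solution.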
Library preparation and sequencing. Sequencing libraries were prepared using the Chromium Single Cell 3' v2 protocol CG00052 (10X Genomics) with modifications described in the detailed protocol CITE-seq_190213 (cite-seq.com/protocol) 46 . Two libraries (from subjects UAD100 and UAD101) were first sequenced in a pilot batch and then re-sequenced together with two additional libraries (from subjects UAD104 and UAD105) in a second batch. The RNA and protein libraries were pooled at a 19:1 ratio and sequenced on a HiSeq2500 instrument in the pilot batch and on a NovaSeq S1 flow cell (Illumina, USA) in the second batch, according to the manufacturer's instructions.
Bioinformatic analyses. After sequencing, the raw base calls for each sample were demultiplexed and mapped to the hg19 version of the human genome or to the oligonucleotide sequences linked with specific antibody tags using Cellranger v2.0.2 (10X Genomics) 52 . RNA reads mapping to the human genome were then quantified using Cellranger. Reads mapping to the antibody-derived tags of each investigated protein were counted using CITE-seq-Count with the recommended settings for 10X-derived sequencing libraries (https://github.com/Hoohm/CITE-seq-Count). We used R (version 3.6.1) with the package Seurat (version 3.0.0) 53,54 , and the data from all samples were pooled into a single Seurat object for further analyses. The package future (version 1.12.0, https://github.com/HenrikBengtsson/future) was used to multi-thread Seurat functions. Quality assessment of each single cell was based on the following three criteria: number of expressed genes, number of unique molecular identifiers (UMIs) and percentage of mitochondrial reads. Specifically, to reduce the risk of including droplets containing more than one cell, all cells with more than 2000 expressed genes were excluded. To avoid inclusion of dead or low-quality cells, at least 2500 UMI counts were required. Furthermore, the percentage of reads originating from the mitochondrial genome was quantified, and only cells with 1.5-5% mitochondrial content were considered normal. The three cut-offs used in quality control were defined from visual inspection of histograms of the data. For the 14,376 cells passing quality control, the RNA and protein assays were normalized separately in Seurat: the LogNormalize method with a scale factor of 10,000 was used for the RNA data, and the centered log-ratio (CLR) method was used for the protein assay data.
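The three QC cut-offs and the two normalizations can be sketched in Python with NumPy. This is an illustrative re-implementation, not the Seurat code used in the study. The CLR function follows the log1p pseudocount form used by Seurat's CLR method; the axis over which CLR was applied in the original analysis is not stated, so `axis` is left as a parameter here.

```python
import numpy as np


def qc_mask(counts, is_mito, max_genes=2000, min_umi=2500, mito_pct=(1.5, 5.0)):
    """Boolean mask of cells passing the three QC cut-offs.

    counts  : cells x genes array of raw UMI counts
    is_mito : boolean array over genes marking mitochondrial genes
    """
    n_genes = (counts > 0).sum(axis=1)          # expressed genes per cell
    n_umi = counts.sum(axis=1)                  # UMIs per cell
    pct_mito = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(n_umi, 1)
    return ((n_genes <= max_genes)              # more genes suggests a doublet
            & (n_umi >= min_umi)                # fewer UMIs suggests a dead/poor cell
            & (pct_mito >= mito_pct[0]) & (pct_mito <= mito_pct[1]))


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style: scale each cell to scale_factor total counts, then log1p."""
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    return np.log1p(counts / totals * scale_factor)


def clr(counts, axis=0):
    """CLR in Seurat's log1p form; axis=0 normalizes each ADT across cells."""
    n = counts.shape[axis]
    geo = np.exp(np.log1p(counts).sum(axis=axis, keepdims=True) / n)
    return np.log1p(counts / geo)


# Toy data: 4 cells x 5 genes, last gene mitochondrial; thresholds shrunk to fit.
counts = np.array([[50, 48, 0, 0, 2],
                   [10, 10, 0, 0, 0],
                   [50, 40, 0, 0, 10],
                   [30, 30, 20, 18, 2]])
is_mito = np.array([False, False, False, False, True])
keep = qc_mask(counts, is_mito, max_genes=3, min_umi=50)
print(keep)
```

With the toy thresholds, only the first cell passes: the second has too few UMIs, the third has 10% mitochondrial reads and the fourth has too many expressed genes.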
Clustering and identification of cell types. Clustering was performed on the RNA assay using the 1000 most variable features (i.e. expressed genes) identified with the Seurat function FindVariableFeatures. To minimize variation due to technical factors, the included features were scaled with the number of UMIs, percentage of mitochondrial RNA, library preparation batch and sequencing batch regressed out. Principal component analysis was then performed so that clustering was based only on the most informative dimensions: 36 principal components were evaluated using the functions JackStrawPlot and ElbowPlot, and the first 22 were subsequently used for clustering. Clusters were identified with the FindNeighbors and FindClusters functions (resolution 0.6) and visualized by Uniform Manifold Approximation and Projection using the package umap (version 0.3.8, https://github.com/tkonopka/umap). To determine the type of leukocytes within each of the 14 predicted clusters, we applied an in-house script based on the expression of previously known cell type-specific markers. Next, the six cell types targeted by the CITE-seq constructs were identified by plotting the protein-derived data on the RNA-based clusters using the viridis package (version 0.5.1, https://sjmgarnier.github.io/viridis) and validated using heatmaps.
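A simplified NumPy sketch of the dimensionality-reduction steps follows. It substitutes plain variance ranking for FindVariableFeatures (whose default is the variance-stabilizing "vst" method) and assumes covariates are numeric or already one-hot encoded. It also omits the shared-nearest-neighbour graph clustering performed by FindNeighbors and FindClusters.

```python
import numpy as np


def top_variable_genes(lognorm, n_top=1000):
    """Indices of the n_top highest-variance genes (simplified stand-in for 'vst')."""
    return np.argsort(lognorm.var(axis=0))[::-1][:n_top]


def regress_and_scale(x, covariates):
    """Regress technical covariates out of each gene, then z-score the residuals."""
    design = np.column_stack([np.ones(x.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ beta
    sd = resid.std(axis=0)
    sd[sd == 0] = 1.0                           # avoid division by zero for flat genes
    return (resid - resid.mean(axis=0)) / sd


def pca_scores(scaled, n_pcs=22):
    """Cell embeddings on the first n_pcs principal components via SVD."""
    u, s, _ = np.linalg.svd(scaled - scaled.mean(axis=0), full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Hypothetical toy data: 200 cells x 50 genes, 2 numeric technical covariates.
rng = np.random.default_rng(0)
lognorm = rng.poisson(1.0, size=(200, 50)).astype(float)
covs = rng.normal(size=(200, 2))
genes = top_variable_genes(lognorm, n_top=30)
scaled = regress_and_scale(lognorm[:, genes], covs)
emb = pca_scores(scaled, n_pcs=10)
print(emb.shape)  # (200, 10)
```

Because the residuals are orthogonal to the covariates, the principal components can no longer reflect linear effects of UMI count or batch, which is the intent of regressing them out before PCA.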
Determination of LOY in single cells. LOY status in each single cell was determined as described previously 2,7 . Briefly, all genes located on the Y chromosome were retrieved from Ensembl (v.99) 55 using the biomaRt package (v.2.40.0) 56 . The sum of all features in the RNA assay with HGNC symbols matching those on the Y chromosome was calculated for each cell. Each sequenced cell with expression of autosomal genes but without transcripts from genes located in the male-specific region of chromosome Y (MSY) was classified as an LOY cell.
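The LOY rule can be expressed as a simple per-cell test. In this sketch, per-gene chromosome labels and PAR flags are assumed inputs (for example derived from an Ensembl annotation), and the genes in the usage example are illustrative. Because single-cell data are sparse, a normal cell with few UMIs may lack MSY transcripts by chance, so the UMI cut-off applied during quality control may also matter for this classification.

```python
import numpy as np

AUTOSOMES = [str(i) for i in range(1, 23)]


def call_loy(counts, gene_chrom, is_par):
    """Flag cells with autosomal expression but zero MSY transcripts as LOY.

    counts     : cells x genes UMI matrix
    gene_chrom : chromosome label per gene ("1".."22", "X", "Y")
    is_par     : boolean per gene, True for pseudoautosomal genes
    """
    gene_chrom = np.asarray(gene_chrom)
    is_msy = (gene_chrom == "Y") & ~np.asarray(is_par)
    is_auto = np.isin(gene_chrom, AUTOSOMES)
    msy_total = counts[:, is_msy].sum(axis=1)
    auto_total = counts[:, is_auto].sum(axis=1)
    return (msy_total == 0) & (auto_total > 0)


# Illustrative genes: two MSY genes, CD99 (PAR1, annotated on X), two autosomal genes.
genes = ["RPS4Y1", "DDX3Y", "CD99", "ACTB", "B2M"]
chrom = ["Y", "Y", "X", "7", "15"]
par = [False, False, True, False, False]
counts = np.array([[3, 1, 2, 10, 20],   # normal cell: MSY transcripts present
                   [0, 0, 2, 8, 15],    # LOY cell: no MSY transcripts
                   [0, 0, 0, 0, 0]])    # empty profile: not called
print(call_loy(counts, chrom, par))
```

Excluding PAR genes from the MSY set matters here, because PAR transcripts such as CD99 are still produced from the retained X chromosome in LOY cells.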

Analyses of phenotypical effects in LOY cells. The Seurat wrapper FindMarkers was used to estimate the average log fold change in RNA and protein abundances between LOY and normal cells, separately for each cell type. The FindMarkers parameters for the minimum log fold-change threshold (logfc.threshold) and the minimum fraction of cells with any expression (min.pct) were set to zero in order to capture the maximal amount of information from all features in the models. A summary of relevant single cell metrics, including the number of cells, reads and percentage of LOY per subject, is provided in Table S3.
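The per-cell-type average log fold change can be approximated as below. This sketch assumes the Seurat v3 convention of averaging in linear space (expm1 of the normalized values) and taking the natural log of each group mean plus one; it is intended for intuition, not to reproduce FindMarkers output exactly.

```python
import numpy as np


def avg_log_fc(lognorm_a, lognorm_b):
    """Natural-log fold change of group a vs b, averaging in linear space."""
    mean_a = np.log(np.mean(np.expm1(lognorm_a)) + 1.0)
    mean_b = np.log(np.mean(np.expm1(lognorm_b)) + 1.0)
    return mean_a - mean_b


def per_cell_type_lfc(values, cell_type, is_loy):
    """LOY vs normal log fold change computed separately within each cell type."""
    results = {}
    for ct in np.unique(cell_type):
        in_type = cell_type == ct
        results[str(ct)] = avg_log_fc(values[in_type & is_loy],
                                      values[in_type & ~is_loy])
    return results


# Hypothetical normalized values for 8 cells of two types.
values = np.log1p(np.array([1.0, 3.0, 1.0, 3.0, 2.0, 2.0, 2.0, 2.0]))
cell_type = np.array(["B"] * 4 + ["NK"] * 4)
is_loy = np.array([True, False] * 4)
print(per_cell_type_lfc(values, cell_type, is_loy))
```

Setting logfc.threshold and min.pct to zero, as described above, corresponds to computing this statistic and testing every feature rather than pre-filtering on effect size or detection rate.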
Statistical analyses. In the FindMarkers wrapper, the MAST algorithm (version 1.9.2), which is specifically developed to handle the zero-inflation of scRNA-seq data, was used for the RNA assay. Logistic regression was used for the protein assay, as it did not suffer from zero-inflation. Both FindMarkers models were corrected for batch effects (library preparation and sequencing run), number of UMIs and percentage of mitochondrial reads. All p-values, from every cell type and assay, were adjusted together using the Benjamini-Hochberg correction for multiple testing. Logistic regression models implemented with the glm function in R were used to test overall CD99 abundance in relation to LOY in single cells. These models included the same covariates as the FindMarkers tests described above, with LOY status as the binary outcome.
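The two statistical steps, Benjamini-Hochberg adjustment and logistic regression with LOY status as the outcome, can be sketched with NumPy alone. The regression is fitted by Newton-Raphson, which for the logit link is equivalent to the iteratively reweighted least squares used by R's glm. The simulated data, effect sizes and variable names are hypothetical, and categorical covariates such as cell type or donor would need to be one-hot encoded first.

```python
import math

import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def logistic_wald(predictors, outcome, n_iter=25):
    """Fit logit(P(outcome=1)) = b0 + X b by Newton-Raphson; return (coef, Wald Z)."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    y = np.asarray(outcome, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (prob * (1.0 - prob))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - prob))
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, beta / se


def two_sided_p(z):
    """Two-sided normal p-value for a Wald Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical simulation: lower CD99 abundance makes LOY more likely.
rng = np.random.default_rng(1)
n = 2000
cd99 = rng.normal(size=n)
umi = rng.normal(size=n)                 # stand-in for a standardized covariate
true_logit = -1.5 - 0.8 * cd99 + 0.3 * umi
loy = rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))

coef, z = logistic_wald(np.column_stack([cd99, umi]), loy)
print(f"CD99 coefficient {coef[1]:.2f}, Z = {z[1]:.1f}, p = {two_sided_p(z[1]):.2e}")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

In this framing a negative CD99 coefficient means that cells with lower CD99 abundance are more likely to be LOY cells, which is how a reduction of CD99 in LOY cells shows up as a negative Z statistic.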

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.