Single-cell transcriptional profiles in human skeletal muscle

Skeletal muscle is a heterogeneous tissue comprised of muscle fiber and mononuclear cell types that, in addition to movement, influences immunity, metabolism and cognition. We investigated the gene expression patterns of skeletal muscle cells using RNA-seq of subtype-pooled single human muscle fibers and single cell RNA-seq of mononuclear cells from human vastus lateralis, mouse quadriceps, and mouse diaphragm. We identified 11 human skeletal muscle mononuclear cell types, including two fibro-adipogenic progenitor (FAP) cell subtypes. The human FBN1+ FAP cell subtype is novel and a corresponding FBN1+ FAP cell type was also found in single cell RNA-seq analysis in mouse. Transcriptome exercise studies using bulk tissue analysis do not resolve changes in individual cell-type proportion or gene expression. The cell-type gene signatures provide the means to use computational methods to identify cell-type level changes in bulk studies. As an example, we analyzed public transcriptome data from an exercise training study and revealed significant changes in specific mononuclear cell-type proportions related to age, sex, acute exercise and training. Our single-cell expression map of skeletal muscle cell types will further the understanding of the diverse effects of exercise and the pathophysiology of muscle disease.

through different cell cycle stages, although no specific cell cycle markers are present. As can be seen in Supplementary Figure S10, sequencing depth can vary considerably between cells, even post-normalization. The major outlier among endothelial cell clusters is cluster 7, which has 205 differentially expressed genes relative to clusters 0 and 1, 89 of which have a higher L2FC than the top differentially expressed marker between clusters 0 and 1. The top differentiating marker for cluster 7 is Duffy antigen/chemokine receptor (DARC), which has been shown to be expressed exclusively in post-capillary and small collecting venule endothelial cells and to be completely absent from other potential endothelial cell populations 6 . This distinction led us to classify cluster 7 as post-capillary venule (PCV) endothelial cells.
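The comparison of cluster 7 against clusters 0 and 1 can be sketched as follows. This is a minimal illustration with made-up numbers, not the analysis code behind the reported counts; the function name and the toy L2FC values are hypothetical, and the use of absolute L2FC as the ranking quantity is an assumption.

```python
import numpy as np

def count_exceeding_reference(l2fc_outlier, l2fc_reference):
    """Count genes in the outlier comparison whose |L2FC| exceeds the
    largest |L2FC| seen in the reference comparison."""
    # Largest effect size in the reference comparison (e.g. cluster 0 vs 1).
    top_reference = float(np.max(np.abs(np.asarray(l2fc_reference, dtype=float))))
    # Number of outlier-comparison genes that are stronger than that.
    outlier = np.abs(np.asarray(l2fc_outlier, dtype=float))
    n_exceeding = int(np.sum(outlier > top_reference))
    return n_exceeding, top_reference

# Toy values only (hypothetical, not taken from the data).
cluster7_vs_0_and_1 = [3.1, 2.4, 0.9, -2.8, 1.2]
cluster0_vs_1 = [0.8, -1.1, 0.5]

n_exceeding, top_reference = count_exceeding_reference(
    cluster7_vs_0_and_1, cluster0_vs_1
)
print(n_exceeding, top_reference)
```

A large count relative to the total number of differentially expressed genes is what marks a cluster as an outlier in this sense.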
One other interesting division of endothelial cell populations, too slight to form a separate cluster, is a group of cells in the main endothelial cell population that lacks expression of AQP1 and ITGA6, two of the main endothelial cell marker genes. We can again look at differentially expressed markers between the AQP1+ and AQP1- populations. We find a set of genes that are overexpressed in AQP1- cells, topped by adenosylmethionine decarboxylase 1 (AMD1); these genes require further study. The divide between endothelial cells expressing AQP1 and those expressing AMD1 can be seen in Supplementary Figure S11a. This set of endothelial cells likely represents another location-dependent subset of endothelial cells, as location is a major determinant of their function and gene expression.
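Splitting a population by the presence of one marker and comparing mean expression between the two groups can be sketched as below. This is a crude illustration under stated assumptions, not the differential expression method used in this study: the function name, the pseudocount of 1, the "expression greater than zero" rule, the gene called CONTROL and all toy values are hypothetical, and no statistical test is performed.

```python
import numpy as np

def l2fc_negative_vs_positive(expr, genes, split_gene, pseudocount=1.0):
    """Split cells by whether split_gene is detected (> 0) and return, per gene,
    log2(mean in negative cells / mean in positive cells)."""
    expr = np.asarray(expr, dtype=float)      # cells x genes
    col = genes.index(split_gene)
    positive = expr[:, col] > 0               # cells expressing the split gene
    negative = ~positive
    if not positive.any() or not negative.any():
        raise ValueError("both groups must contain at least one cell")
    mean_pos = expr[positive].mean(axis=0)
    mean_neg = expr[negative].mean(axis=0)
    # The pseudocount avoids division by zero for undetected genes.
    l2fc = np.log2((mean_neg + pseudocount) / (mean_pos + pseudocount))
    return dict(zip(genes, l2fc.tolist()))

# Toy matrix (hypothetical): rows are cells, columns follow `genes`.
genes = ["AQP1", "AMD1", "CONTROL"]
toy_expr = [
    [3, 1, 2],
    [5, 1, 2],
    [0, 7, 2],
    [0, 7, 2],
]
result = l2fc_negative_vs_positive(toy_expr, genes, "AQP1")
print(result)
```

Positive values indicate genes that are higher in the marker-negative group; in practice a proper test with multiple-testing correction would replace the simple ratio.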

LUM+ FAP and FBN1+ FAP Cells
Clusters 2 and 5 both express the two canonical markers for fibro/adipogenic progenitor (FAP) cells: platelet-derived growth factor receptor alpha (PDGFRA) and CD34 (Supplementary Figure S2a). PDGFRA expression is specific to FAPs 7 and, while CD34 is also expressed in endothelial cells, it distinguishes FAPs from fibroblasts 8 . While both FAP clusters heavily express collagen types I, III and VI, collagen types IV, XIV and XV are differentially expressed between them. Cluster 2 expresses a number of collagen-producing cell-specific markers, including apolipoprotein D (APOD), which is primarily found in fibroblasts near blood vessels 9 . Lumican (LUM) regulates collagen fibril assembly and is involved in fibril contractility, and decorin (DCN) binds to type I collagen fibrils and is expressed in both cluster 2 and cluster 5 10 . Alcohol dehydrogenase 1B (ADH1B) expression is also highly specific to cluster 2, as is that of myocilin (MYOC), which plays a role in cytoskeletal structural function; mutations in this gene can be a major cause of glaucoma 11 . The protein myocilin is normally expressed in corneal fibroblasts and is secreted into the aqueous humor of the eye.  13,14 . Procollagen C-endopeptidase enhancer 2 (PCOLCE2) is a cartilage marker as well, with increased expression in neocartilage 15 . Additionally, microfibrillar-associated protein 5 (MFAP5), also known as MAGP2, is often used as a synovial cell marker 16 . This is surprising, as our muscle biopsy was not located near a portion of muscle tissue that is believed to contain synovial cells or chondrocytes. We label the cluster 5 cells FBN1+ FAP Cells and the cluster 2 cells LUM+ FAP Cells to reflect the expression of two of their most distinguishing markers.

Satellite Cells
Cluster 3 expresses two canonical satellite cell markers: paired box protein 7 (PAX7) 17 and myogenic factor 5 (MYF5) 18 . Interestingly, the two genes with the strongest p-values for differential expression for cluster 3 have a less well-known relationship with satellite cells and are in fact two other apolipoproteins: apolipoprotein C1 (APOC1) and apolipoprotein E (APOE).
Both apolipoproteins were believed to be primarily expressed in the liver 19 , although recent research has also implicated them in late-onset Alzheimer's disease 20 . APOE has been shown to be expressed in skeletal muscle, concentrated at neuromuscular junctions, which could be the source of its association with the risk of neurodegenerative diseases such as Alzheimer's disease and Parkinson's disease 21 .

Pericytes
The three canonical pericyte markers are CSPG4, also known as neuron-glial antigen 2 (NG2); beta-type platelet-derived growth factor receptor (PDGFRB); and melanoma cell adhesion molecule (MCAM) 22 , also known as CD146. While CSPG4 expression is very limited in our sample, each of these genes is most highly expressed in cluster 4. The marker with the strongest p-value is RGS5, whose expression is known to align strongly with that of PDGFRB and which is a pericyte marker as well 23 . Neurogenic locus notch homolog protein 3 (NOTCH3), whose expression in vascular tissue is restricted to mural cells, including pericytes, is also primarily expressed in cluster 4 24 .

NK Cells
Cluster 6 expresses many NK cell-specific markers, such as natural killer cell granule protein 7 (NKG7) 25 , granulysin (GNLY) 26 and granzyme A (GZMA) 27 , all of which are expressed only in this cluster. Chemokine ligands 3, 4 and 5 are all top markers for cluster 6; although they are often associated with macrophages, research has shown all three to be expressed in NK cells as well 28 . NK cells can often be differentiated based on expression of CD16 (FCGR3A) and CD56 (NCAM1) 29 . Supplementary Figure S11b highlights the expression of these two marker genes in this cluster. While FCGR3A is expressed in cluster 6 cells, NCAM1 expression is very rare, which suggests cluster 6 consists of CD16+ CD56dim NK cells.
Interestingly, both of these markers are expressed in other cell populations in our samples: FCGR3A is present in cluster 10 (myeloid cells), most probably due to the presence of CD16+ monocytes, and NCAM1 is lightly expressed in cluster 3 (satellite cells). This highlights the complex nature of marker genes in gene expression studies and the importance of context when using cell-type-classifying markers.

T & B Cells
Cluster 8 includes expression of lymphocyte-specific markers such as lymphotoxin beta (LTB) 30 , interleukin-7 receptor-α (IL7R) 31 and L-selectin (SELL) 32 . CD52 is expressed in both monocytes and lymphocytes and we see it present in both clusters 8 and 10 as expected 33 .
Because sequencing depth is limited in single cell data, we do not capture every canonical marker for T and B cells; however, we can still differentiate these two populations based on the markers that are present. Supplementary Figure S11c shows expression of CD3D, a T cell marker gene 34 , and MS4A1, a B cell marker gene 35 , within cluster 8. There is no overlap between cells expressing these two genes, so although these cells cannot be divided by our clustering algorithm because of their similarity in overall gene expression, we can still differentiate between these two cell types.
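A check for overlap between cells expressing two marker genes can be sketched as below. This is an illustrative example only: the function name, the detection threshold of zero and the toy per-cell values are assumptions, not values from cluster 8.

```python
import numpy as np

def coexpression_counts(expr_a, expr_b, threshold=0.0):
    """Tabulate cells by detection (> threshold) of two genes."""
    a = np.asarray(expr_a, dtype=float) > threshold
    b = np.asarray(expr_b, dtype=float) > threshold
    return {
        "a_only": int(np.sum(a & ~b)),    # e.g. CD3D only (T cell-like)
        "b_only": int(np.sum(~a & b)),    # e.g. MS4A1 only (B cell-like)
        "both": int(np.sum(a & b)),       # cells expressing both markers
        "neither": int(np.sum(~a & ~b)),  # marker not captured in these cells
    }

# Toy per-cell expression values (hypothetical).
cd3d = [2, 0, 1, 0, 0, 3]
ms4a1 = [0, 4, 0, 0, 2, 0]
counts = coexpression_counts(cd3d, ms4a1)
print(counts)
```

A "both" count of zero corresponds to the absence of overlap described above; cells in the "neither" group remain unassigned because neither marker was captured.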

Smooth Muscle Cells
Cluster 9 exhibits expression of a number of top markers for both smooth muscle cells and myofibroblasts. Both alpha smooth muscle actin (ACTA2) 36 and transgelin (TAGLN) 37 are well-known markers for myofibroblasts; however, they are both also expressed in smooth muscle cells. What does differentiate these cell types is expression of myosin, which is only seen in smooth muscle 38 .

Myeloid Cells
The smallest cluster in our sample is cluster 10, consisting of myeloid cells. This cluster likely includes multiple cell types of myeloid origin, including granulocytes, monocytes and mast cells, but because their individual populations are so small and their gene expression so similar, they can be neither divided nor treated as individual cell types. Top markers include S100 calcium binding proteins A8 and A9, which are secreted by active monocytes, granulocytes and neutrophils 39 . Lysozyme (LYZ), leukocyte-specific transcript 1 (LST1), allograft inflammatory factor 1 (AIF1) and m-ficolin (FCN1) are all top markers for cluster 10 and are expressed specifically in myeloid-lineage cells 40-43 .

Comparison of blood- and muscle-resident immune cells
Fifty marker genes were found for each blood immune cell type in the IRIS dataset using the default CellCODE method, with a cutoff of 2.0 (as used previously) 44 . Then, for each muscle-resident immune cell type (neutrophils, monocytes, B cells, T cells, and NK cells), we normalized its transcriptomic profile in the context of the other blood immune cell types (e.g., for neutrophils, we normalized the muscle-resident myeloid cells relative to the transcriptome of blood monocytes, B cells, T cells, and NK cells) using quantile normalization. We then found fifty marker genes for the muscle-resident cell type relative to the other blood immune cell types using the default CellCODE method with a cutoff of 2.0. Finally, we compared the fifty marker genes for the blood immune cell type to the fifty marker genes for the muscle-resident immune cell type and found the marker genes that overlap.

Figure S1: Cell type-specific expression of dissociation-affected genes. Stacked histograms of the fraction of cells for each cell type that contain a given fraction of dissociation-affected genes. Satellite cells (gold) and myeloid cells (yellow) possess the highest-skewing distribution of dissociation-affected gene fractions.
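The comparison of blood and muscle-resident immune cell markers (quantile normalization, marker selection, overlap) can be sketched as below. This is a minimal sketch under stated assumptions, not the pipeline used here: CellCODE is an R package and is not called, so `top_markers` is a hypothetical stand-in whose cutoff (a minimum margin over the highest other cell type) may differ from CellCODE's definition. Gene names and expression values are toy data.

```python
import numpy as np

def quantile_normalize(matrix):
    """Quantile-normalize the columns of a genes x samples matrix.
    Ties are broken by position, which is a simplification."""
    m = np.asarray(matrix, dtype=float)
    order = np.argsort(m, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each value in its column
    reference = np.sort(m, axis=0).mean(axis=1)       # mean of each quantile across columns
    return reference[ranks]

def top_markers(target, others, genes, n=50, cutoff=2.0):
    """Hypothetical stand-in for marker selection (NOT CellCODE): keep genes whose
    expression in the target exceeds the highest other cell type by at least
    `cutoff`, ranked by that margin, and return at most n of them."""
    target = np.asarray(target, dtype=float)
    others = np.asarray(others, dtype=float)          # genes x other cell types
    margin = target - others.max(axis=1)
    keep = np.flatnonzero(margin >= cutoff)
    ranked = keep[np.argsort(-margin[keep], kind="stable")][:n]
    return [genes[i] for i in ranked]

def marker_overlap(markers_a, markers_b):
    """Genes present in both marker lists."""
    return sorted(set(markers_a) & set(markers_b))

# Toy data (hypothetical): five genes, one target cell type, two other blood cell types.
genes = ["g1", "g2", "g3", "g4", "g5"]
blood_target = [9, 8, 1, 2, 3]
blood_others = np.array([[1, 2], [2, 3], [9, 8], [3, 1], [8, 9]])
muscle_target = [9, 1, 2, 8, 3]

# Markers of the blood cell type relative to the other blood cell types.
blood_markers = top_markers(blood_target, blood_others, genes)

# Normalize the muscle-resident profile together with the other blood cell types,
# then select its markers relative to those cell types.
stacked = np.column_stack([muscle_target, blood_others])
normalized = quantile_normalize(stacked)
muscle_markers = top_markers(normalized[:, 0], normalized[:, 1:], genes)

shared = marker_overlap(blood_markers, muscle_markers)
print(blood_markers, muscle_markers, shared)
```

In the toy data every column is a permutation of the same values, so quantile normalization leaves the matrix unchanged and the marker lists can be checked by hand.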