Immuno-Detection by sequencing (ID-seq) enables large-scale high-dimensional phenotyping in cells

Cell-based small molecule screening is an effective strategy leading to new medicines. Scientists in the pharmaceutical industry as well as in academia have made tremendous progress in developing both large-scale and smaller-scale screening assays. However, an accessible and universal technology for measuring large numbers of molecular and cellular phenotypes in many samples in parallel is not available. Here, we present the Immuno-Detection by sequencing (ID-seq) technology that combines antibody-based protein detection and DNA-sequencing via DNA-tagged antibodies. We used ID-seq to simultaneously measure 84 (phospho-)proteins in hundreds of samples and screen the effects of ~300 kinase inhibitor probes on primary human epidermal stem cells to characterise the role of 225 kinases. Our work highlighted a previously unrecognized downregulation of mTOR signaling during differentiation and uncovered 13 kinases regulating epidermal renewal through distinct mechanisms.


Introduction
Quantification of protein levels and phosphorylation events is central to investigating the cellular response to perturbations such as drug treatment or genetic defects. This is particularly important for cell-based phenotypic screens to discover novel drug leads in the pharmaceutical industry.
However, the complexity of biological and disease processes is not easily captured by changes in individual markers. Currently, a major limitation is the trade-off between the number of samples and the number of (phospho-)proteins that can be measured in a single experiment. For instance, immunohistochemistry 1 and immunofluorescence 2 allow high-throughput protein measurements using fluorescently labelled antibodies. However, these methods are limited in the number of (phospho-)proteins that can be measured simultaneously in each sample due to spectral overlap of the fluorescent reporter dyes. One commercial solution, Luminex®, has circumvented this limitation by using color barcoded antibody loaded beads and allows multiplexing of some 50 proteins per sample [3][4][5] . However, this approach requires cell lysis and does currently not include phospho-specific signaling detection. Several alternative approaches based on antibody-DNA conjugates have been developed in recent years 6,7 . For instance, Ullal et al. use the Nanostring system to quantify 88 antibody-ssDNA oligo conjugates in fine needle aspirates 6 . Although powerful, this strategy is not well suited for high-throughput applications. Furthermore, the commercial Proseek® strategy entails a proximity extension assay using pairs of ssDNA oligo coupled antibodies in combination with quantitative PCR as a read-out 7 . This assay is generally performed on cell lysates and currently there are no assays for phospho-proteins available. In addition, several other recently described antibody-DNA conjugate based methods that use high-throughput sequencing as a read-out detect only a few epitopes or at low sample throughput [8][9][10][11][12] , limiting their scope. Here, we present Immuno-Detection by sequencing (ID-seq) as a streamlined universal technology for measuring large numbers of molecular and cellular phenotypes in many samples in parallel. We show that high-throughput sequencing of antibody coupled DNA-barcodes allows accurate and reproducible quantification of 84 (phospho-)proteins in hundreds of samples simultaneously. We applied ID-seq in conjunction with the Published Kinase Inhibitor Set (PKIS) to start investigating the role of >200 kinases in primary human epidermal stem cell renewal and differentiation. This demonstrated a previously unrecognized downregulation of mTOR signaling during differentiation and uncovered 13 kinases regulating epidermal renewal through distinct mechanisms.

Precise and sensitive multiplexed (phospho-)protein detection via sequencing antibody-coupled DNAtags.
We designed the Immuno-Detection by sequencing (ID-seq) technology to simultaneously measure many proteins and post-translational modifications in high-throughput (Figure 1a). At the basis of IDseq lie antibodies that are labelled with a double-stranded DNA-tag 13 containing a 10 nucleotide antibody-dedicated barcode and a 15 nucleotide Unique Molecular Identifier (UMI, Figure S1, Supplementary note 1). Each antibody signal is now digitized and non-overlapping, allowing many antibodies to be combined and measured simultaneously. Following immunostaining and washing, DNA-barcodes are released from the antibodies through reduction of a chemically cleavable linker 13 and a sample-specific barcode is added through PCR. Finally, samples are pooled to prepare an indexed sequencing library (Figures 1a, S1 and Supplementary note 1). This triple barcoding strategy facilitates straightforward incorporation of hundreds (and potentially thousands) of samples per experiment and achieves count-based quantification ( Figure S2 and Supplementary note 2) over four orders of magnitude ( Figure S3). Furthermore, analyses of 17 antibody-DNA conjugates using singleplex and multiplexed measurements show high correspondence (R = 0.98 ± 0.046), demonstrating that multiplexing does not interfere with antibody detection (Figure 1b). Moreover, the ID-seq library preparation procedure is reproducible (R = 0.98, Figure 1c) and precise, as determined using nine distinct DNA-tag sequences per antibody, serving as technical replicates (R > 0.99, Figure S4). Thus, the ID-seq technology allows precise and sensitive multiplexed protein quantification through sequencing antibody-coupled DNA-tags.
Constructing a 70 antibody-DNA conjugate ID-seq panel.
In order to fully exploit the multiplexing capacity of ID-seq we obtained 111 antibodies targeting intracellular and extracellular epitopes. All of the selected antibodies were validated for specificity in immunofluorescence (IF), immunohistochemistry (IHC) and/or fluorescence activated cell-sorting (FACS) applications by the vendor (see supplementary table 1 for details and links to datasheets). As these applications include cell fixation we reasoned that this selection would increase the chance of identifying antibodies that are suitable for ID-seq. From these 111 antibodies, 84 showed robust signals in In-Cell-Western /IF and/or immuno-PCR experiments using antibody dilutions and/or IgG control antibodies. Next, to increase our confidence in this set of antibodies, we performed a series of experiments using IF, immuno-PCR and/or ID-seq as a read-out to verify that the tested antibodies show the expected signal dynamics in response to specific perturbations. These perturbations included: induction of differentiation, stimulation with epidermal growth factor (EGF) or bone morphogenetic proteins (BMP), induction of DNA damage signalling with mitomycin C or hydroxyurea, as well as inhibition of EGF and BMP signalling with the small molecule inhibitors AG1478 and DMH1, respectively. In addition, signals of a subset of phospho-specific antibodies were decreased upon phosphatase treatment of fixed cell populations. Taken together, 64 out of the 84 antibodies exhibited the expected protein or phospho-protein dynamics in our primary skin stem cells in these experiments, whereas the rest was stable (Supplementary table 1 and Figure S5), indicating their utility in ID-seq. These 84 antibodies displayed ~75-fold signal over no-cell background, a measure of technical noise ( Figure S6). Moreover, we found that the variability of the signals from a subpanel of 69 antibody-DNA conjugates was below 20% among 14 biological replicates (coefficient of variation, CV < 0.2, Figure 1d

Exploring (phospho-)protein dynamics in human epidermal stem cells with ID-seq.
Primary human epidermal stem cells (keratinocytes) depend on active epidermal growth factor receptor (EGFR) signalling for self-renewal in vitro and vivo 14 . We inhibited this pathway using the potent and selective inhibitor AG1478 (10 μM, 48 hrs) to determine whether ID-Seq recapitulates keratinocyte biology. In these experiments, we would expect to at least observe dynamic changes in downstream EGFR signalling pathway activity, as well as in the expression of differentiationassociated proteins. To analyse the effects of AG1478 treatment on each antibody signal, we developed a generalised linear mixed (glm) model that takes into account the negative binomial distribution of ID-seq count data and incorporates potential sources of variation (e.g., replicates, batches, sequencing depth). This model derives the effect ('estimate') of treatment on each antibody, followed by a likelihood ratio-test to determine the significance of the effect (Supplementary note 3). We identified 13 increased and 7 decreased (phospho-)proteins upon AG1478 treatment (p < 0.01, Figure 1e). Up-regulation of the known differentiation markers transglutaminase 1 (TGM1) and Notch1 confirmed successful differentiation. Next, we projected the estimates and significance levels of our ID-seq results onto a literature derived signalling network ( Figure 1f, see Figure S7a for node identities). As expected, EGFR pathway activity was down regulated upon AG1478 treatment. We also identified effects on the activity of several other pathways, including the bone morphogenetic protein (BMP) and Notch cascades, which are known players in epidermal biology [15][16][17][18] . RT-qPCR analysis revealed that these effects arose from changes in mRNA expression of BMP ligands and Notch receptors ( Figure S7b). We confirmed activation of the Replicate screens were highly correlated (R = 0.98) and had low UMI duplicate rates (1.2%), indicating high data quality ( Figure S8). We annotated the effect and its significance for each kinase inhibitor probe on each of the measured molecular phenotypes (as measured by our antibody conjugates) and used the results of this analysis to interrogate the effect of kinase inhibition on skin stem cell biology.
A key advantage of the high multiplexing capacity of ID-seq is its potential to measure multiple antibodies reflecting a given biological process simultaneously. We anticipate this to result in a more reliable and comprehensive measurement of the affected processes than quantification of a single marker. We exploited the multiplexed nature of the ID-seq data by combining the individual phenotypic ID-seq measurements into principal components (PCs) through principal component analysis. This essentially aggregates the phenotypes that jointly explain independent fractions of variation in the data into a single score that represents the effect of the inhibitory probes on the skin stem cells. To determine the underlying processes associated with each PC per probe, we correlated and clustered the measured phenotypes with the top 4 principal components explaining 38% of total variation in the dataset (Figures 2b, S9). As expected from a screen using kinase inhibitor probes a considerable fraction of this variation is associated with effects on signalling pathway activity phenotypes, as represented by PC1 (Figure 2b). The second largest principal component, PC2, strongly correlates with proteins that are significantly up-regulated upon differentiation, including the known marker proteins TGM1 and Notch1 (Figures 2b, 15,27 ). Interestingly, Cyclin B1, GAPDH and Smad3 were also included in this cluster. We confirmed that these phenotypes truly reflect keratinocyte differentiation, by forcing the cells to differentiate using the EGFR inhibitor AG1478 for 48 hours and subjecting these samples to ID-seq (n=6). Indeed, protein levels of TGM1, Notch1, Smad3, CyclinB1 and GAPDH are upregulated upon differentiation, corroborating the results of our screen ( Figure 2c). To validate this further, we selected 18 probes that showed high PC2-scores from the PKIS library for colony formation experiments, the gold standard in vitro assay for epidermal stem cell activity ( Figure S11a,b, 28 ). Automated image analysis was used to quantify colony number, colony size (and size distribution), as well as the level of the differentiation marker TGM1 level per colony for each probe (n = 3 replicates). Fifteen out of the eighteen tested probes showed a significant effect on at least one of the measured colony phenotypes ( Figure S11a,b). This indicates that that high-PC2 probes indeed affect epidermal cell colony forming capacity and authenticates the PC2-score as a bona fide reflection of differentiation.

Processes distinguishing differentiating and renewing epidermal stem cells.
We gathered that the top and bottom 10% of PC2-ranked probes are likely to distinguish the differentiating (high-PC2) and non-differentiated (low-PC2) epidermal cell states. To determine which cellular processes are different between these two cell states, we determined the molecular phenotypes that display a significant in-or decrease (p < 0.01, 1% FDR) between cell-populations with high-PC2 versus low-PC2. This revealed involvement of the Wnt pathway (measured by Fzd3 and phosphorylated-LRP6), MAPK signalling (phospho-p38, phospho-SRC, phospho-cFOS and phospho-RSK), integrin-mediated adhesion (phospho-FAK) and the mammalian target-of-rapamycin (mTOR) pathway (phospho-mTOR and phospho-S6) in epidermal renewal and differentiation (Figure 3a and S10). Plotting the PC2-scores versus the measured phenotypes (Figure 3b-d), revealed that high-PC2 probes indeed have increased differentiation markers (Figure 3b), but also have increased cell-cycle arrest markers (Figure 3c). Indeed, keratinocyte differentiation is associated with a G2/M cell cycle arrest in vivo and vitro [29][30][31] . Strikingly, all high PC2 probes have a strong down-regulation of the mTOR pathway activity, suggesting an integral role in epidermal biology (Figure 3d). Thus, the ID-seq dataset captures relevant molecular phenotypes associated with keratinocyte differentiation.

Identification of novel kinases involved in epidermal stem cell renewal.
We reasoned that the inhibited kinases that strongly associated with PC2 are likely involved in epidermal stem cell renewal, as their inhibition leads to increased differentiation and cell-cycle arrest. To determine which kinases are inhibited by probes with high PC2 scores, we made use of available data on the biochemical selectivity and potency of the PKIS compounds towards 225 individual kinases. 26 We applied outlier statistics to assign a set of inhibitory probes to each of these 225 kinases (p < 0.01, supplementary table 2). Subsequent Gene Set Enrichment Analysis (GSEA), identified 13 probe-sets enriched (p < 0.01, 1% FDR) in PC2 (i.e., probes inducing cell differentiation and/or cell cycle arrest) and of which the corresponding kinase is expressed in keratinocytes (Figures 4, S12 and S13). Our analysis returned the EGFR as the top hit, reflecting its key importance in epidermal stem cell renewal in vitro and in vivo. The PKIS probe library was designed as a platform for lead discovery and to provide chemical scaffolds for further medicinal chemistry [24][25][26] .
Consequently, each probe may inhibit more than one kinase ( Figure S14a). Importantly, the probes that inhibit the EGFR are distinct from those inhibiting the other kinases, indicating that the identification of these 12 kinases did not result from cross-reactivity of the probes towards the EGFR ( Figure S14b). The list of other identified kinases included PRKD3 and FYN, two intracellular kinases shown to impact epidermal biology [32][33][34][35][36] . Moreover, immunohistochemical staining of human skin sections showed that the expression of the EGFR, RSK1, and PHKG1 is restricted to cells residing in the epidermal stem cell niche, whereas NUAK1 is expressed throughout the epidermis ( Figure S15), consistent with our findings that inhibition of these kinases affects epidermal stem cell biology.
Taken together, ID-seq identified both known and previously unrecognized kinase effectors of epidermal renewal across four major kinase families (Figure 2a, yellow nodes).

Four subclasses of kinases affect epidermal stem cells through distinct mechanisms.
The rich information contained in the ID-seq dataset may be used to explore the underlying molecular mechanism of the kinase inhibition. For each kinase-set, we calculated the mean effect on each of the measured molecular phenotypes. These kinase-set level molecular profiles were used for both hierarchical clustering and PCA analysis, separating these 13 enriched kinases into 4 distinct subgroups (Figure 5a and b). The EGFR and its immediate downstream kinases, LYN and FYN, form a tight cluster, indicating that this grouping reflects molecular mechanistic relationships. In turn, this predicts that the kinases in the other subgroups may function through mechanisms that are different and potentially independent from the EGFR. If this were indeed the case, we would expect inhibition of these kinases to result in distinctive effects on (subsets of) the interrogated molecular phenotypes.
We compared the inhibitory effects (average model derived estimate +/-SEM) on the 20 phenotypes distinguishing differentiated and renewing epidermal cells (as defined in Figure 2c) for exemplars of these 4 subgroups of kinases. Ranking these phenotypes based on the effect of EGFR inhibition and plotting the data of the other kinases in the same order showed that the overall trend in the measured phenotypes was conserved among the kinases, reflecting that these probe-sets indeed affect epidermal cell differentiation (figure 5c). Importantly, most of the kinases showed statistically significant deviations (p < 0.05) in distinct subsets of phenotypes compared to the EGFR, indicating that they function through discrete mechanisms to regulate epidermal renewal (Figure 5c). Together, this analysis shows that ID-seq allows highly multiplexed screening of hundreds of chemical probes, identifying kinases involved in epidermal renewal and at the same time provides information on the underlying molecular mechanism to categorize the identified effector kinases.
Finally, to verify the importance of the kinases representing these 4 subgroups in epidermal selfrenewal, we obtained proven highly selective inhibitors of the EGFR, RSK1-4, p70S6K, NUAK1, and DYRK1A and examined their effect on epidermal stem cell function using in vitro colony formation assays (Figure 6a and b). For this, stem cells were first allowed to adhere to the culture plate containing a layer of feeder cells. Twenty-four hours after seeding the cultures were exposed to the kinase inhibitors in a broad range of concentrations for the remainder of the culture period. Subsequent automated imaging based quantification of the resulting colonies revealed that inhibition of the EGFR, RSK1-4, and DYRK1A decreased the self-renewal capacity of the epidermal stem cells (as determined by colony size) and stimulated the expression of the late differentiation marker TGM1. In contrast, p70S6K and NUAK1 inhibition resulted in decreased renewal but did not increase TGM1 expression. These results highlight the importance of these kinases in epidermal renewal.

Discussion
Cell-based phenotypic screens are frequently used in academia and the pharmaceutical industry to identify leads for drug development 37,38 . However, obtaining insight into the molecular mechanism of action of the selected compound is generally time consuming and expensive [39][40][41] . We developed the Immuno-Detection by sequencing (ID-seq) technology as an approach to facilitate high throughput highly multiplexed molecular phenotyping. We showed that ID-seq is precise, sensitive and applicable to large numbers of samples. We applied ID-seq on primary human epidermal keratinocytes in conjunction with the Published Kinase Inhibitor Set (PKIS) and identified 13 kinases that are important for epidermal stem cell function. Since many of the targeted therapies that are approved or in clinical trials are directed against kinases 41,42 , our findings serve as an important resource to identify potential drugs that could lead to severe skin toxicity. The straightforward ID-seq workflow was designed to be compatible with automation for applications in industry and will enable many potential mechanisms of action to be assayed for 100s, potentially even 1000s of compounds in a single experiment. In principle, ID-seq can be applied to any cell system, any perturbation and with any validated high-quality antibody, making it a flexible solution for large-scale high-dimensional phenotyping.  Conjugation of antibodies to dsDNA. Antibodies and dsDNA were functionalised and conjugated as described 44 . See Supplementary Table 1  Subsequently, a buffer exchange into PBS (pH7.4) was performed using two Zeba-spin desalting columns. The size of the eluted DNA was checked on an agarose gel, confirming that all unconjugated DNA was removed using this purification approach. Supplementary   Table 1. In brief, we selected antibodies suitable for immuno-histochemistry or immunofluorescence validated by manufacturer. The DSHB antibodies were produced and purified as described 44 . We performed antibody-dilution series with all antibodies on our primary human keratincoytes to show antibody-dependent signal via immuno-fluorescence. Moreover, we coupled antibodies to DNA barcodes as described in our conjugation and immuno-PCR protocol 44 Table 1) or ID-seq ( Figure S5). ID-seq data analysis. Sequence data from the NextSeq500 (Illumina) was demultiplexed using bcl2fastq software (Illumina). The quality of the sequencing data was evaluated using a FastQC tool (version 0.11.4 and 0.11.5, Babraham Bioinformatics). Then, all reads were processed using our dedicated R-package (IDSeq, Supplementary Note 2). In short, the sequencing reads were split using a common "anchor sequence" identifying the position of the UMI sequence, Barcode 1 (antibody specific) and Barcode 2 (well specific) sequence. After removing all duplicate reads, the number of UMI sequences were counted per barcode 1 and 2. Finally, barcode 1 and barcode 2 sequences were matched to the corresponding antibody and well information.

Immuno-staining with
Using R-package DESeq2 45 , we calculated normalisation factors (Estimated Size factor) to account for differences in sequencing depth per sample. Using lme4, we analysed the effect of a specific condition using a linear mixed effect model (Supplementary Note 3).
For each antibody in the PKIS screen the effect and significance of each treatment was determined as described in Supplementary Note 3. Then the 'signed p-value' was derived from the sign of the model estimate (positive/negative) and the p-value. This signed p-value was used as input for the PCA analyses (Figure 2b and S9). To calculate effects of probe-sets per phenotype, the mean model estimate was calculated. These means were used for subsequent PCA analysis and 'phenotype profiles' described in Figure 5.
Immuno-PCR experiments. The immuno-PCR experiments were performed as described previously in our paper on antibody-DNA conjugates 44 . In short, each antibody was conjugated to dsDNA, and used in an immunostaining as described. DNA was released using 10 mM DTT in BBS pH8.4 and measured by quantitative PCR using iQTM SYBR Green Supermix on CFX 96 machine. The 2^-Ct values were used to calculate the mean signal and standard deviation from 4 biological replicates.
The Pearson correlation between these immuno-PCR and the multiplexed ID-seq signal was calculated using the mean. Sequencing was performed using the NextSeq500 from Illumina.
Colony formation assay. In 6 wells plate, 200.000 feeders (J2-3T3) cells were seeded in DMEM (with 10% BS and 1% pen/strep). After one day, feeder cells were inactivated by 3-hour treatment with mitomycin C. After thorough washes with DMEM, 1000 keratinocytes were seeded into each well 31 .
The following day, treatment was started (day 0) by refreshing medium and addition of the indicated concentration of compound, or DMSO as a vehicle control. Cells were grown in the presence of compounds for eight more days, and the medium was refreshed on days 2 and 5. Rocki was present until day 2 of the treatment. Cells were fixed, stained with TGM1 specific antibodies and scanned as described before 27 . Raw images from the LiCor Odessey system were processed with CC Photoshop and CellProfiler with consistent settings. Data obtained via automatic counting and imaging analysis via CellProfiler was analysed and visualized in the R programming language.     Figure S12). Probes (points in the graph) are ranked according to PC2 (x-axis), and point size shows % of inhibition for the indicated kinase by the probe.