Global quantitative analysis of the human brain proteome and phosphoproteome in Alzheimer’s disease

Alzheimer’s disease (AD) is characterized by an early, asymptomatic phase (AsymAD) in which individuals exhibit amyloid-beta (Aβ) plaque accumulation in the absence of clinically detectable cognitive decline. Here we report an unbiased multiplex quantitative proteomic and phosphoproteomic analysis using tandem mass tag (TMT) isobaric labeling of human post-mortem cortex (n = 27) across pathology-free controls, AsymAD and symptomatic AD individuals. With off-line high-pH fractionation and liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) on an Orbitrap Lumos mass spectrometer, we identified 11,378 protein groups across three TMT 11-plex batches. Immobilized metal affinity chromatography (IMAC) was used to enrich for phosphopeptides from the same TMT-labeled cases and 51,736 phosphopeptides were identified. Of these, 48,992 were quantified by TMT reporter ions representing 33,652 unique phosphosites. Two reference standards in each TMT 11-plex were included to assess intra- and inter-batch variance at the protein and peptide level. This comprehensive human brain proteome and phosphoproteome dataset will serve as a valuable resource for the identification of biochemical, cellular and signaling pathways altered during AD progression.

proteins across AsymAD and AD stages of disease may reveal defects in kinase- or phosphatase-mediated signaling pathways and biomarkers involved in AD progression.
Advances in liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) now facilitate high-throughput detection and quantification of thousands of proteins in a given sample. Data-dependent acquisition (DDA), or shotgun, approaches are the traditional methods for proteomics workflows 18 . However, a drawback of DDA approaches is that precursor selection for MS/MS is biased towards high-abundance peptides 19 . To improve the detection of low-abundance peptides and enhance proteome depth, different off-line fractionation methods have been used to reduce the complexity of peptide mixtures prior to LC-MS/MS analysis. Methods including two-dimensional gel electrophoresis 20,21 , strong cation exchange (SCX), electrostatic repulsion-hydrophilic interaction chromatography (ERLIC) 22 , and high-pH reversed-phase chromatography 23,24 are used to increase peptide identification by separating peptides in an orthogonal dimension 25 . With the advancement of multiplex isobaric tandem mass tags (TMT), off-line fractionation and high-resolution MS, proteomic datasets are beginning to rival the depth and breadth of transcriptomic datasets 22,26 . Moreover, integrative proteomic and transcriptomic analyses in an AD post-mortem brain cohort suggest that transcriptome- and proteome-wide analyses can generate both complementary and unique information 5 . Proteomic analyses also offer the important opportunity to identify disease-specific PTMs that may participate in key pathological processes and potentially serve as novel biomarkers and therapeutic targets.
Here we used a multiplex TMT MS-based proteomic approach, following protocols similar to those established by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) 23 , to comprehensively quantify the total proteome and phosphoproteome of human post-mortem cortical tissue (n = 27) from pathology-free controls, AsymAD and symptomatic AD individuals. This led to the quantification of 11,378 unique protein groups, as well as 48,992 phosphorylated peptides representing 33,652 phosphosites. This dataset can serve as a valuable resource to help researchers elucidate the complexity of AD as it relates to proteomic signatures found in post-mortem human brain.

Methods
Human brain tissue. Post-mortem human brain tissues from the dorsolateral prefrontal cortex (frontal cortex, Brodmann area 9) were obtained from the Emory Alzheimer's Disease Research Center (ADRC) brain bank. All ADRC procedures are approved by the Emory University Institutional Review Board (IRB), and written informed consent is obtained before tissue collection. In accordance with Emory University policy, the use of control post-mortem tissues was considered exempt research. Post-mortem neuropathological evaluation of Aβ plaque distribution was performed according to the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) criteria 27 , while the extent of neurofibrillary tangle pathology was assessed in accordance with the Braak staging system 28 . In total, 27 samples from 3 groups (n = 10 control, n = 8 AsymAD, and n = 9 AD) were used for brain proteome and phosphoproteome analyses. All case metadata, including disease state, age at death, post-mortem interval (PMI), sex, and apolipoprotein E (APOE) genotype, are listed in the sample traits file 29 (Supplementary Table 1).
Brain tissue homogenization and protein digestion. Tissue homogenization was performed essentially as previously described 22 . Approximately 100 mg (wet tissue weight) of brain grey matter was homogenized in 500 μL of 8 M urea lysis buffer (8 M urea, 100 mM Na2HPO4, pH 8.5) with HALT protease and phosphatase inhibitor cocktail (ThermoFisher) using a Bullet Blender (NextAdvance). The samples were homogenized for two full 5 min cycles at 4 °C with ~100 μL of stainless-steel beads (0.9 to 2.0 mm blend, NextAdvance). The lysates were transferred to new Eppendorf LoBind tubes, followed by 3 cycles of sonication, each consisting of 5 s of active sonication at 30% amplitude with 15 s incubation periods on ice between sonication pulses. Samples were then centrifuged for 5 min at 15,000 × g and the supernatant was transferred to a new tube. Prior to further processing, protein concentration and integrity were assessed by bicinchoninic acid (BCA) assay (Pierce) and SDS-PAGE, respectively. For protein digestion, 500 μg of each sample was aliquoted and volumes were normalized with additional lysis buffer. Samples were reduced with 1 mM dithiothreitol (DTT) for 30 min, followed by alkylation with 5 mM iodoacetamide (IAA) in the dark for another 30 min. Samples were diluted 4-fold with 50 mM triethylammonium bicarbonate (TEAB) before incubation with lysyl endopeptidase (Wako) at 1:100 (w/w) for 12 hr. After the urea concentration was further diluted to 1 M with 50 mM TEAB, trypsin (Promega) was added at a 1:50 (w/w) ratio and digestion was carried out for another 12 hr. The peptide solutions were desalted with a C18 Sep-Pak column (Waters). Briefly, the Sep-Pak columns were activated with 3 × 1.5 mL of methanol, then equilibrated with 6 × 1.5 mL of 0.1% trifluoroacetic acid (TFA). The samples were loaded after acidification to a final concentration of 1% formic acid (FA) and 0.1% TFA. Each column was washed with 6 × 1.5 mL of 0.1% TFA, and elution was performed with 2 × 1.5 mL of 50% acetonitrile.
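The enzyme amounts and urea concentrations implied by the digestion steps can be checked with simple arithmetic. The sketch below assumes 500 µg of protein per sample, w/w ratios expressed as enzyme:protein, and that the 4-fold dilution acts on the 8 M urea lysis buffer; the helper name `digestion_plan` is hypothetical and not part of the original workflow.

```python
def digestion_plan(protein_ug=500.0, urea_start_m=8.0, first_dilution=4.0,
                   final_urea_m=1.0, lysc_ratio=100.0, trypsin_ratio=50.0):
    """Hypothetical helper: enzyme masses (from w/w ratios) and urea after each dilution."""
    urea_lysc_m = urea_start_m / first_dilution      # urea present during Lys-C digestion
    second_dilution = urea_lysc_m / final_urea_m     # extra fold dilution needed to reach 1 M
    return {
        "lysc_ug": protein_ug / lysc_ratio,          # 1:100 enzyme:protein (w/w)
        "urea_during_lysc_m": urea_lysc_m,
        "second_dilution_fold": second_dilution,
        "trypsin_ug": protein_ug / trypsin_ratio,    # 1:50 enzyme:protein (w/w)
    }


plan = digestion_plan()
print(plan)
```

Under these assumptions, each 500 µg sample receives 5 µg of Lys-C in 2 M urea and 10 µg of trypsin after a further 2-fold dilution.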
Tandem mass tag (TMT) peptide labeling. A 600 µL aliquot from each sample was pooled and the mixture was divided into 6 global internal standard (GIS) samples of 2400 µL each, consistent with our previous work 22 , and peptide solutions were dried by vacuum (Labconco). All 27 samples from the 3 groups and the 6 GIS samples were divided into batches and labeled using two sets of 5 mg 11-plex TMT reagents (Thermo Scientific A34808; lot number for the TMT 10-plex: SI258088, and for the 131C channel: SJ258847). The batch arrangement is provided in Supplementary Table 2. Briefly, each TMT reagent was dissolved in 256 μL of anhydrous acetonitrile, and the corresponding channels from the two 5 mg reagent sets were combined. The samples were reconstituted in 400 μL of 100 mM TEAB buffer and mixed with 3.2 mg (164 μL) of the corresponding labeling reagent channel. The reaction was incubated for 1 hr and subsequently quenched with 32 μL of 5% hydroxylamine (Pierce). For each TMT plex, labeled peptides from all 11 channels were mixed and desalted with a 500 mg Sep-Pak column (Waters). The labeled peptide mixture was eluted in 4.5 mL of 50% acetonitrile and dried by vacuum.
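The reagent volumes above are internally consistent: pooling one channel from two 5 mg kits gives 10 mg in 512 µL, so 164 µL delivers about 3.2 mg. The sketch below reproduces that arithmetic; it assumes complete recovery of each reagent vial, and the function name is hypothetical.

```python
def tmt_reagent_per_sample(mg_per_kit=5.0, kits=2, acn_ul_per_kit=256.0,
                           volume_used_ul=164.0):
    """Hypothetical helper: mg of one TMT channel delivered in `volume_used_ul`."""
    total_mg = mg_per_kit * kits              # same channel pooled from two 5 mg kits
    total_ul = acn_ul_per_kit * kits          # each vial dissolved in 256 uL acetonitrile
    conc_mg_per_ul = total_mg / total_ul
    return conc_mg_per_ul * volume_used_ul, total_mg


mg_per_sample, total_mg = tmt_reagent_per_sample()
reactions_per_channel = int(total_mg // mg_per_sample)
print(round(mg_per_sample, 3), reactions_per_channel)
```

The pooled 10 mg of each channel therefore supports three 3.2 mg labeling reactions, which matches the three TMT batches.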
IMAC phosphorylated peptide enrichment. For phosphorylated peptide enrichment, the remaining 95% of each of the 24 high-pH fractions was further combined into 12 fractions in an alternating manner (1 and 13, 2 and 14, etc.). Peptide amounts were assumed to be equally distributed across all fractions. IMAC enrichment was performed according to the CPTAC protocol with minor modifications 23 . Briefly, 1200 μL of bead slurry (beads/solvent ratio of 1:1, v/v) was used for one batch of TMT fractions. Beads were stripped of nickel with 8 mL of 100 mM EDTA and then equilibrated with 8 mL of 50 mM FeCl3, each by end-to-end rotation for 30 min. To remove excess Fe3+ ions, beads were washed with 3 × 8 mL of water and resuspended in 2.4 mL of a 1:1:1 (v/v/v) mixture of acetonitrile/methanol/0.01% (v/v) acetic acid. The beads were then rinsed with 2.4 mL of 100% acetonitrile/0.1% TFA and divided into 12 tubes, and the supernatant was removed before the peptide mixture was added. All 12 dried fractions were reconstituted in 0.4 mL of 50% acetonitrile/0.1% TFA and then diluted 1:1 with 100% acetonitrile/0.1% TFA to obtain a final peptide solution in 75% acetonitrile/0.1% TFA at a concentration of 0.5 μg/μL. The peptide mixture was incubated with the treated beads for 30 min with end-to-end rotation. Enriched IMAC beads were resuspended in 100 μL of 80% acetonitrile/0.1% TFA before the stage tips were conditioned. Stage tips were equilibrated with 2 × 100 μL methanol washes, 2 × 100 μL 100% acetonitrile/0.1% TFA washes, and 2 × 100 μL 1% FA washes. The IMAC bead slurry was loaded onto the stage tips and washed with 3 × 100 μL of 80% acetonitrile/0.1% TFA, then 3 × 100 μL of 1% FA. The phosphorylated peptides were released from the IMAC beads with 3 × 100 μL of 500 mM dibasic sodium phosphate (Na2HPO4, Sigma, S9763), pH 7.0, and washed with 3 × 100 μL of 1% FA. The phosphorylated peptides were eluted from the stage tips with 3 × 100 μL of 50% acetonitrile/0.1% FA, and the eluates were dried by vacuum.
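The alternating pooling scheme (1 and 13, 2 and 14, and so on) and the final acetonitrile percentage of the loading solution can be expressed compactly. This is a minimal sketch: it only handles 2:1 pooling and assumes ideal volume-additive mixing (the slight volume contraction of acetonitrile/water mixtures is ignored). Both function names are hypothetical.

```python
def concatenate_pairs(n_fractions=24, n_pools=12):
    """Pair fraction i with fraction i + n_pools (1 and 13, 2 and 14, ...)."""
    if n_fractions != 2 * n_pools:
        raise ValueError("this sketch only handles 2:1 pooling")
    return [(i, i + n_pools) for i in range(1, n_pools + 1)]


def mixed_percent(v1_ml, pct1, v2_ml, pct2):
    """Volume-weighted percentage after mixing two solutions (ideal mixing assumed)."""
    return (v1_ml * pct1 + v2_ml * pct2) / (v1_ml + v2_ml)


pools = concatenate_pairs()
acn = mixed_percent(0.4, 50.0, 0.4, 100.0)  # 0.4 mL of 50% ACN diluted 1:1 with 100% ACN
print(pools[:3], acn)
```

Pairing early- and late-eluting fractions in this way keeps each pooled fraction spread across the low-pH gradient rather than concentrating similar peptides in one run.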

LC-MS/MS and TMT data acquisition on an Orbitrap Lumos mass spectrometer. Both proteome and phosphoproteome samples were run on an Orbitrap Fusion Lumos equipped with a NanoFlex nano-electrospray source (ThermoFisher). The same volume of loading buffer (19 μL of 0.1% TFA) was added to each fraction, assuming equal distribution of peptide concentration across all 24 proteomic subfractions; therefore, an equal volume of 2 μL (1 μg equivalent) of each fraction was loaded for proteomic analysis. Phosphorylated peptides were assumed to constitute ~1% (w/w) of all peptides. The same volume of loading buffer (7 μL of 0.1% TFA) was added to all IMAC elution samples, and 2 μL (1 μg equivalent) of this was analyzed by mass spectrometry. All proteome and phosphoproteome samples were separated on 25 cm long (75 μm ID) fused silica columns (New Objective, Woburn, MA) packed in-house with 1.9 μm Reprosil-Pur C18-AQ resin (Dr Maisch). All fractions were eluted over a 140 minute gradient using an Easy nLC 1200 (ThermoFisher). The gradient started at 1% buffer B (A: water with 0.1% formic acid; B: 80% acetonitrile in water with 0.1% formic acid), increased to 7% in 3 minutes, then from 7% to 30% over 137 minutes, then to 95% within 5 minutes, and was held at 95% for 25 minutes. The mass spectrometer was operated in top speed mode with 3 second cycles. Both MS and MS/MS scans were collected in the Orbitrap. The full scan was performed over a range of 375-1500 m/z at a nominal resolution of 120,000 at 200 m/z, with automatic gain control (AGC) at 4 × 10⁵, a 50 ms maximum injection time and a radio frequency (RF) lens setting of 30%. Higher-energy collision dissociation (HCD) MS/MS scan settings were as follows: resolution of 50,000, AGC at 1 × 10⁵, isolation width of 0.7 m/z, maximum injection time of 105 ms, and a collision energy of 38%. Only precursors with charge states from 2+ to 7+ were selected for MS/MS. All resulting raw files (n = 108) are provided 30 .
Protein identification and quantification. Raw data files obtained from the Orbitrap Fusion Lumos were processed using Proteome Discoverer™ (version 2.3). MS/MS spectra were searched against the UniProt Knowledgebase (UniProtKB) human proteome database (downloaded in 2015) containing 90,411 total sequences, as previously reported 22 . UniProtKB is a comprehensive 31 and widely used database 32 that contains both reviewed Swiss-Prot and unreviewed TrEMBL sequences 33 . Because the additional depth provided by off-line basic reversed-phase liquid chromatography (bRPLC) fractionation enables the sequencing of rare protein isoforms, this complete database maximizes the potential for protein identification. A few peptides were also manually added to the database 34 , including one APOE 2/3-specific peptide and one APOE 3/4-specific peptide, which allow APOE proteotyping of samples, and 4 non-tryptic C-terminal Aβ peptides specific to Aβ38 (GAIIGLMV), Aβ40 (GAIIGLMVGGVV), Aβ42 (GAIIGLMVGGVVIA), and Aβ43 (GAIIGLMVGGVVIAT). Sequences that map to the tau microtubule-binding repeat (MTBR) domains were also set as an additional entry encompassing residues 224-370 (tau 2N4R isoform, residues 1-441) of the UniProt sequence, while all tau isoform sequences were modified by removing the MTBR peptides and replicated as new "deltaMTBR" entries 35,36 . The Aβ sequence (corresponding to residues 1-43) within the canonical APP isoform (P05067) was also excluded. Separation and quantification of these peptide sequences facilitated the investigation of APOE allele-, Aβ peptide- and MTBR peptide-specific regulation of biology in AD datasets 35 . The FASTA database used in this study was deposited on Synapse (syn20820455).
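The MTBR/deltaMTBR database construction described above can be sketched as a sequence split. This is an illustration only: it assumes the MTBR is removed as one contiguous span given in 1-based inclusive coordinates (224-370 on the 2N4R isoform in the text), and that the flanking sequence is simply rejoined. The text does not specify how removal was implemented, other isoforms would need their own coordinates, and rejoining the flanks creates an artificial junction sequence. The toy sequence and coordinates below are hypothetical, as are the function names.

```python
def split_mtbr(name, seq, start=224, end=370):
    """Return (MTBR entry, deltaMTBR entry) for 1-based inclusive residues start..end."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("MTBR coordinates fall outside the sequence")
    mtbr = seq[start - 1:end]
    delta = seq[:start - 1] + seq[end:]   # flanks rejoined (assumption; creates a junction)
    return (f"{name}-MTBR", mtbr), (f"{name}-deltaMTBR", delta)


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy example: residues 4-6 stand in for the MTBR span (hypothetical coordinates).
records = split_mtbr("TOY", "MAEPRQEFEV", start=4, end=6)
print(to_fasta(records))
```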
The SEQUEST HT search engine was used, with identical parameters for the total and IMAC proteomes: fully tryptic specificity; a maximum of two missed cleavages; a minimum peptide length of 6; fixed modifications of TMT tags on lysine residues and peptide N-termini (+229.162932 Da) and carbamidomethylation of cysteine residues (+57.02146 Da); variable modifications of methionine oxidation (+15.99492 Da), asparagine and glutamine deamidation (+0.984 Da), and serine, threonine and tyrosine phosphorylation (+79.9663 Da); a precursor mass tolerance of 20 ppm; and a fragment mass tolerance of 0.05 Da. Percolator was used to filter peptide-spectrum matches (PSMs) and peptides to a false discovery rate (FDR) of less than 1% using a target-decoy strategy. The phosphosite localization probability threshold was set to 0.75, ensuring a < 5% false-localization rate (FLR) of PTM assignments as described 37 . Following spectral assignment, peptides were assembled into proteins, which were further filtered based on the combined probabilities of their constituent peptides to a final FDR of 1%. In cases of redundancy, shared peptides were assigned to proteins according to the principle of parsimony. Reporter ions were quantified from MS2 scans using an integration tolerance of 20 ppm with the most confident centroid setting. The search results and TMT quantification are included 38 .
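To make the modification masses and the 20 ppm precursor tolerance concrete, the sketch below computes the monoisotopic mass of a peptide carrying the fixed modifications listed above, its precursor m/z, and the ppm error of an observed value. It uses standard monoisotopic residue masses and the modification masses from the search parameters. It is a simplified illustration (variable oxidation and deamidation are omitted), not how SEQUEST HT scores spectra, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
TMT = 229.162932          # fixed: peptide N-terminus and every K
CARBAMIDOMETHYL = 57.02146  # fixed: every C
PHOSPHO = 79.9663         # variable: S/T/Y


def peptide_mass(seq, n_phospho=0):
    """Neutral monoisotopic mass with the fixed TMT and carbamidomethyl modifications."""
    mass = sum(RESIDUE[aa] for aa in seq) + WATER
    mass += TMT * (1 + seq.count("K"))
    mass += CARBAMIDOMETHYL * seq.count("C")
    mass += PHOSPHO * n_phospho
    return mass


def precursor_mz(seq, charge, n_phospho=0):
    """m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(seq, n_phospho) + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=20.0):
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


theo = precursor_mz("GAK", charge=2)
print(round(theo, 5), within_tolerance(theo * (1 + 15e-6), theo))
```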

Data Records
All files have been deposited on Synapse 39 . These include sample traits 29 , mass spectrometry raw files (n = 108) from both the total proteome and phosphoproteome 30 , the FASTA database 34 , search results 38 , and the ANOVA analysis input and output results 40 . The mass spectrometry proteomics raw files and data analysis files have also been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository 41 .

Technical Validation
Deep dive proteome of human AD brain. We used a modified version of the CPTAC protocol to identify the total proteome and phosphoproteome from the same cases across different stages of AD. Control, AsymAD and AD tissues were randomized across 3 batches (each containing 11 TMT channels) with 9 individual cases per batch (Fig. 1a and Supplementary Tables 1 and 2). Two TMT channels in each batch were dedicated to global internal standards (GIS), representing an equivalent amount of pooled peptides from all cases, which allows assessment of intra- and inter-batch variance 22 . To reduce sample complexity and increase proteome depth prior to LC-MS/MS, we employed off-line high-pH reversed-phase fractionation essentially as described in the CPTAC protocol 23 . For each batch, a total of 77 individual fractions were collected and combined into 24 fractions for total proteome analysis (Supplementary Figure 1) using a step-wise concatenation strategy. A total of 5% of the material by volume was used for total proteome analysis, and the remaining 95% was used for phosphopeptide enrichment by immobilized metal affinity chromatography (IMAC) with Fe3+-loaded nitrilotriacetic acid (NTA) beads. The 24 fractions were pooled into 12 subfractions prior to IMAC. Both the total proteome (n = 72 fractions across 3 batches) and phosphoproteome (n = 36 fractions across 3 batches) were analyzed by LC-MS/MS with high-resolution precursor and MS/MS scans on an Orbitrap Fusion Lumos mass spectrometer.
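The batch design (two GIS channels plus nine randomized cases per 11-plex) can be laid out programmatically. The sketch below uses a plain seeded shuffle, which does not guarantee group balance within each batch; a stratified assignment would be a reasonable alternative. The case identifiers and seed are hypothetical, and the actual arrangement used in this study is the one in Supplementary Table 2.

```python
import random

TMT11 = ["126", "127N", "127C", "128N", "128C", "129N",
         "129C", "130N", "130C", "131N", "131C"]
GIS_CHANNELS = {"126", "131C"}


def layout_batches(case_ids, n_batches=3, seed=1):
    """Shuffle cases and assign them to the non-GIS channels of successive 11-plex batches."""
    cases = list(case_ids)
    random.Random(seed).shuffle(cases)
    sample_channels = [ch for ch in TMT11 if ch not in GIS_CHANNELS]
    per_batch = len(sample_channels)  # 9 sample channels per batch
    if len(cases) != n_batches * per_batch:
        raise ValueError("need exactly one case per sample channel")
    batches = []
    for b in range(n_batches):
        chunk = iter(cases[b * per_batch:(b + 1) * per_batch])
        batches.append({ch: "GIS" if ch in GIS_CHANNELS else next(chunk) for ch in TMT11})
    return batches


# Hypothetical case IDs matching the group sizes (10 control, 8 AsymAD, 9 AD).
cases = ([f"CTL{i}" for i in range(1, 11)] + [f"AsymAD{i}" for i in range(1, 9)]
         + [f"AD{i}" for i in range(1, 10)])
for b, plex in enumerate(layout_batches(cases), start=1):
    print(b, plex)
```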
For the total proteome runs, a total of 164,034 unique peptides were identified that mapped to 11,378 protein groups at a 1% FDR at the peptide-spectrum match (PSM) level across all batches, representing 10,373 coding gene products. The total numbers of identified peptides, proteins and PSMs for all batches in the total proteome are listed in Table 1. Approximately 10,000 protein groups were identified in each batch (Fig. 1b), comparable to the depth achieved with the CPTAC protocol using different tissue sources 42,43 . Identification confidence at the peptide and protein level is closely related to the number of PSMs and unique peptides. In the total proteome dataset, more than 77% of the proteins were identified with 2 or more unique peptides (Fig. 2a), and each unique peptide averaged approximately 6 PSMs (Table 1). Approximately 93% of all proteins were identified with at least 2 PSMs (Fig. 2b).
To obtain deep coverage of the phosphoproteome, enrichment strategies are usually applied owing to the relatively low abundance of phosphorylated peptides. To assess the quality of our IMAC phosphopeptide enrichment, we calculated the percent phosphopeptide content (peptide level) in both the total proteome and IMAC phosphoproteome datasets (Tables 1 and 2, and Fig. 1d). Of the 164,034 unique peptides identified in the total proteome, approximately 2% were phosphopeptides (Fig. 1d). Although fewer peptides were identified overall in the IMAC dataset (n = 72,138), approximately 71% of them were phosphopeptides (Table 2 and Fig. 1d). The IMAC enrichment method therefore led to an 18-fold increase in phosphopeptide identification using half of the instrument time. We set the phosphosite localization threshold in SEQUEST to 0.75, corresponding to an estimated false localization rate (FLR) of less than 5% for each assigned site. After filtering, approximately 83% of all identified phosphosites had localization scores of at least 0.99. In total, 51,736 phosphorylated peptides representing 33,652 unique phosphosites mapped to 8,415 proteins. A total of 34,379 phosphorylated peptides were identified in at least two of the three IMAC TMT batches (Fig. 1c). These figures are similar to the depth reported using the same protocol in breast cancer tissue 44 . The numbers of identified peptides in each IMAC batch are listed in Table 2, along with the phosphorylation enrichment calculated at both the peptide and PSM levels. Of note, phosphopeptides showed a higher level of enrichment at the PSM level (83.55% on average) than at the peptide level (71.72% on average), indicating that phosphopeptides were more intense and thus more frequently sequenced by LC-MS/MS than an average unmodified peptide in the phosphoproteome.
Indeed, each non-phosphopeptide was identified by an average of 3.5 PSMs, whereas each phosphopeptide was identified by an average of 7 PSMs (Table 2). In total, 74% of all phosphopeptides identified by IMAC enrichment had two or more PSMs, which is consistent with the frequency of PSMs for peptides identified from the total proteome (Fig. 2c,d). Thus, although phosphopeptides were highly enriched in the IMAC proteome, they were sampled at a rate generally consistent with non-phosphopeptides from the total proteome, allowing greater sequencing depth of the phosphoproteome.
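The enrichment figures above can be cross-checked from the reported counts and averages. The sketch below computes (i) the fold increase in phosphopeptide identifications from peptide totals and phosphopeptide fractions (using the 1.7% and 71% values from Fig. 1d), and (ii) the phosphopeptide share expected at the PSM level when phosphopeptides average 7 PSMs and non-phosphopeptides 3.5 PSMs. Because the inputs are rounded averages, the results only approximately reproduce the reported 18-fold and 83.55% values; the function names are hypothetical.

```python
def fold_increase(total_peptides, total_frac, imac_peptides, imac_frac):
    """Ratio of phosphopeptide counts (IMAC vs total proteome)."""
    return (imac_peptides * imac_frac) / (total_peptides * total_frac)


def psm_level_fraction(pep_frac, psms_per_phospho, psms_per_nonphospho):
    """Expected phosphopeptide share of PSMs given the peptide-level share and PSMs per peptide."""
    phospho = pep_frac * psms_per_phospho
    nonphospho = (1.0 - pep_frac) * psms_per_nonphospho
    return phospho / (phospho + nonphospho)


fold = fold_increase(164_034, 0.017, 72_138, 0.71)
psm_frac = psm_level_fraction(0.7172, 7.0, 3.5)
print(f"{fold:.1f}-fold; PSM-level phospho fraction = {psm_frac:.3f}")
```

The PSM-level calculation shows why doubling the PSMs per phosphopeptide lifts a ~72% peptide-level share to roughly 84% of PSMs.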
Fig. 1 Deep coverage of human brain proteome and phosphoproteome in Alzheimer's disease using TMT. (a) TMT workflow. 27 human post-mortem tissues from the dorsolateral prefrontal cortex were digested and labeled with 3 batches of 11-plex TMT reagents. Two pooled global internal standards (GIS) were labeled with channels 126 and 131C in each batch. Nine samples from 3 groups (control, AsymAD and AD) were randomized and labeled in the remaining channels. The labeled sample mixture was fractionated by off-line high-pH reversed-phase chromatography. Whole proteome analysis was conducted using 5% of peptides from each fraction. Phosphopeptides were enriched from the remaining 95% of the samples by immobilized metal affinity chromatography (IMAC) with Fe3+-loaded nitrilotriacetic acid (NTA) beads. All fractions were analyzed on an Orbitrap Fusion Lumos Tribrid mass spectrometer. (b) Venn diagram of total proteome protein groups from 3 batches. Approximately 10,000 protein groups were identified from each batch, with 8,694 shared protein groups found in all 3 batches, and 9,966 protein groups found in at least 2 batches. (c) Venn diagram of phosphopeptides identified in 3 batches. There were 19,474 shared phosphopeptides found across all 3 batches. (d) Peptide composition of the total proteome and IMAC proteome. Non-phosphopeptides (grey) and phosphopeptides (red) are shown. Phosphopeptides were greatly enriched in the IMAC proteome (71%) compared with the total proteome (1.7%).
Assessing intra- and inter-batch variance utilizing a pooled global internal standard. A major advantage of TMT approaches is the ability to quantify multiple samples in a single run, thereby substantially reducing overall MS instrument time. This becomes especially important when the total number of samples increases to dozens or even hundreds 36,45,46 . Typically, one or more TMT channels are dedicated to global internal standard(s) (GIS) included in all batches, which can be used to normalize protein or peptide signals from all samples across all batches 22 . In this study, we included two pooled reference standards in each 11-plex TMT batch (channels 126 and 131C) (Fig. 1a), which allows normalization within and across TMT batches. The two reference standards essentially serve as technical replicates (i.e., a null experiment), which can be used to assess the variance in measurements. Thus, the degree of signal variation between the two GIS channels can be used as a threshold to filter out poorly quantified data. Indeed, the protein and peptide signals from the 126 and 131C channels were highly consistent and showed a strong linear correlation across all three batches (Fig. 3a). We also consistently observed a strong correlation at the peptide level for both the total proteome (Fig. 3b) and the phosphoproteome (Fig. 3c). Notably, some peptides exhibited large variation in signal between the two pooled standard channels, especially those with lower total signal abundance, as previously described 22 . According to the central limit theorem, the log2 ratio of the two GIS channels (log2 TMT channel 126/131C) should approximately follow a Gaussian distribution with a mean at or near zero (Supplementary Figure 2a), which can be used to assess the technical variation of measurements 47,48 . This allows end-users of the datasets to impose a filtering criterion that removes peptides or proteins that do not meet variance metrics (> 2 standard deviations (SD) from the mean). Using this criterion, a total of 1,123 peptides were filtered out of the analyses due to large variance, equivalent to ~4% of all quantified peptides in IMAC Batch 1.
In Batch 2 and Batch 3, 2,465 and 2,289 peptides, respectively, were filtered out by the > 2 SD criterion, representing ~5% of all peptides identified (Supplementary Figure 2b). It is also worth noting that batch effects may be significant when the sample number is large and variation due to sample preparation cannot be ignored. In such cases, post-hoc data normalization strategies should be employed to remove these batch effects 36,49 . In this project, however, this step was not necessary given the relatively modest sample size (n = 27) and because all samples were digested at the same time.
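The GIS-based filter described above can be sketched with NumPy. This sketch assumes one row per peptide and one column per TMT channel in 11-plex order, so columns 0 and 10 hold the 126 and 131C GIS channels. It uses the sample standard deviation of the log2(126/131C) ratio and, as one common choice (an assumption, since the exact normalization formula is not stated here), normalizes each channel to the mean of the two GIS channels. The demo intensities are synthetic.

```python
import numpy as np


def gis_normalize(intensities, gis_cols=(0, 10)):
    """Divide every channel by the mean of the two GIS channels in the same row."""
    x = np.asarray(intensities, dtype=float)
    gis_mean = x[:, list(gis_cols)].mean(axis=1, keepdims=True)
    return x / gis_mean


def gis_variance_filter(intensities, gis_cols=(0, 10), n_sd=2.0):
    """Keep rows whose log2(GIS_a / GIS_b) lies within n_sd SDs of the mean log2 ratio."""
    x = np.asarray(intensities, dtype=float)
    log_ratio = np.log2(x[:, gis_cols[0]] / x[:, gis_cols[1]])
    mu, sd = log_ratio.mean(), log_ratio.std(ddof=1)
    keep = np.abs(log_ratio - mu) <= n_sd * sd
    return keep, log_ratio


rng = np.random.default_rng(0)
demo = np.full((100, 11), 1000.0)                   # synthetic reporter intensities
demo[:, 0] *= 2 ** rng.normal(0.0, 0.1, size=100)   # small technical noise on channel 126
demo[0, 0] = 32_000.0                               # one peptide with a 32-fold GIS discrepancy
keep, log_ratio = gis_variance_filter(demo)
normalized = gis_normalize(demo[keep])
print(keep.sum(), normalized.shape)
```

Filtering before normalization keeps peptides whose GIS channels disagree from distorting the per-row reference used to scale the case channels.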
Assessing amyloid levels and tau phosphorylation. Aβ plaques and hyperphosphorylated tau neurofibrillary tangles (NFTs) in the brain are the core pathological hallmarks of AD 50,51 . Thus, as a quality control of our measurements, we assessed the levels of Aβ and tau in our dataset. To confirm increased Aβ levels in diseased cases, the ion intensities of two tryptic peptides of Aβ, corresponding to residues 6-16 (Peptide 1) and 17-28 (Peptide 2) of the Aβ sequence, were used as a surrogate for amyloid levels in the brain 5 , because the C-terminal non-tryptic peptides were not consistently detected in all batches. Indeed, both peptides showed significant increases in the AsymAD and AD groups compared with control samples (Fig. 4a). Additionally, measurements of Peptide 1 and Peptide 2 were highly correlated (Fig. 4b). Given this, the summed intensity of the two peptides was used to represent Aβ levels in each sample, which showed a significant increase in both AsymAD and AD samples compared with the control group (Fig. 4c).
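The origin of the two surrogate peptides can be shown with an in silico tryptic digest of Aβ(1-43). The sketch applies the simplified rule "cleave after K or R unless followed by P" with no missed cleavages, and reports 1-based coordinates. It yields DAEFR (1-5), HDSGYEVHHQK (6-16, Peptide 1), LVFFAEDVGSNK (17-28, Peptide 2) and the C-terminal GAIIGLMVGGVVIAT (29-43). The function names are hypothetical.

```python
ABETA_1_43 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIAT"


def trypsin_digest(seq):
    """Fully tryptic, zero-missed-cleavage digest: cut after K/R unless followed by P.

    Returns (start, end, peptide) tuples with 1-based inclusive coordinates.
    """
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append((start + 1, i + 1, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        peptides.append((start + 1, len(seq), seq[start:]))
    return peptides


def abeta_surrogate(peptide1_intensity, peptide2_intensity):
    """Summed intensity of the 6-16 and 17-28 peptides, used as the Aβ surrogate."""
    return peptide1_intensity + peptide2_intensity


for start, end, pep in trypsin_digest(ABETA_1_43):
    print(f"{start}-{end}\t{pep}")
```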
Another hallmark of AD is hyperphosphorylated tau 52 , the core component of NFTs in diseased neurons. Remarkably, 22 phosphorylated tau peptides were detected in the total proteome even without IMAC enrichment, highlighting the extensive phosphorylation of this protein in AD brain (Fig. 5a). After IMAC enrichment, a total of 112 tau phosphopeptides were identified. Of note, 47 peptides contained two or more phosphosites, approximately 42% of all phosphopeptides mapped to tau (Fig. 5a). Because the MTBR domains form the core of NFTs and are required to seed tau aggregation 53,54 , the MTBR was included as an additional protein entry (MAPT-MTBR) in the database, while all tau isoforms were replaced with new "deltaMTBR" entries from which the MTBR sequence had been removed. As shown in Fig. 5b,c, both MAPT-deltaMTBR (MAPTΔMTBR) and MAPT-MTBR differed between the AD and control groups. As expected, however, the effect size (log2 fold change) for the tau MTBR was larger than that for tau without the MTBR (ΔMTBR) in AD. The tau MTBR quantified from the phosphoproteome, which contained a stronger phosphopeptide signal, yielded even better separation between AD and control groups than the MTBR quantified from the total proteome.
Fig. 5 (b) The GIS-normalized abundances of MAPTΔMTBR in CTL, AsymAD, and AD cases. The MTBR was removed from MAPT and set as an additional sequence in the FASTA. The AD group was significantly increased compared to the control and AsymAD groups. (c) The GIS-normalized abundances of MAPT-MTBR in CTL, AsymAD, and AD cases. The AD group was significantly increased compared to the control and AsymAD groups. Compared to MAPTΔMTBR, the MTBR domains showed better separation in both the total proteome and phosphoproteome data. (d) Volcano plots showing log2 abundance fold changes (AD/CTL) and log10-transformed p-values of non-phosphopeptides (grey) and phosphopeptides (red) after one-way ANOVA across three groups (CTL/AsymAD/AD) in the total proteome dataset (left) and IMAC-enriched phosphoproteome dataset (right). To achieve better accuracy, only peptides quantified across all 3 batches were plotted. Several phosphorylated MAPT peptides (blue) identified from both the total proteome and IMAC proteome were significantly increased in AD compared with CTL and are labeled. Both tryptic Aβ peptides increased in the AD group. (e) The log2 fold changes (AD over CTL) of tau phosphopeptides were largely increased in the proline-rich (Pro-rich) domain (yellow) and MTBR domains (brown). Unchanged phosphopeptides mapped to the N-terminal acidic domains (green). The log2 abundances of tau peptides quantified in at least two batches are illustrated, ranging from increased [log2(X/CTL) = +4, red] to decreased [log2(X/CTL) = -4, blue].
A one-way ANOVA of peptide levels across the three groups (CTL, AsymAD and AD) was also performed 40 , and peptide volcano plots showing log2 fold changes and log10-transformed p-values were generated from both the total proteome and phosphoproteome datasets (Fig. 5d). In the total proteome peptide data, both Peptide 1 and Peptide 2 from Aβ were significantly increased in AD compared with controls.
Additionally, tau phosphopeptides were among the most changed peptides between AD and control, with significantly increased abundances (Fig. 5d). Consistent with this, tau phosphopeptides were also among the most significantly changed peptides in the IMAC proteome. To illustrate this, all tau phosphopeptides quantified in more than two batches were colored according to their fold change between AD and control in the IMAC proteome (Fig. 5e). Importantly, IMAC enrichment allowed deep sequencing and quantification of phosphopeptides mapping to the proline-rich (Pro-rich) domain (residues 103-244) and the MTBR domain (residues 244-368) 55 . Both regions showed the most consistent increases in abundance in AD compared with other regions of the tau protein.
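Summarizing fold changes by tau domain, as in Fig. 5e, amounts to mapping each phosphosite position to a region and aggregating per region. The sketch uses the coordinates given in the text (Pro-rich 103-244, MTBR 244-368, which share residue 244) and labels every other position "other", because no coordinates are given here for the N-terminal acidic domain. The fold-change values in the example are hypothetical, and taking the median is an illustrative choice.

```python
from statistics import median

PRO_RICH = (103, 244)  # residue range from the text
MTBR = (244, 368)      # residue range from the text


def tau_domains(position):
    """Domains (per the text's coordinates) that contain a residue position."""
    hits = []
    if PRO_RICH[0] <= position <= PRO_RICH[1]:
        hits.append("Pro-rich")
    if MTBR[0] <= position <= MTBR[1]:
        hits.append("MTBR")
    return hits or ["other"]


def median_log2fc_by_domain(sites):
    """sites: iterable of (position, log2 AD/CTL fold change); returns the median per domain."""
    groups = {}
    for pos, fc in sites:
        for domain in tau_domains(pos):
            groups.setdefault(domain, []).append(fc)
    return {domain: median(values) for domain, values in groups.items()}


# Hypothetical fold changes at a few positions, for illustration only.
print(median_log2fc_by_domain([(50, 0.1), (231, 2.0), (262, 3.0), (305, 1.0)]))
```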

Usage Notes
Ultimately, these deep human brain proteomic and phosphoproteomic datasets serve as a valuable resource for a variety of research endeavors, including, but not limited to, the following applications. Use case 1: protein abundance at steady state. This dataset provides a reference for relative protein abundance in brain, for example when an investigator wants to determine whether a protein of interest is abundantly expressed in human brain 38 . Use case 2: AD stage-specific differential protein expression. The human post-mortem tissues fall into three separate clinical and pathological groups representing three stages of AD. One can compare expression differences between stages at the protein, peptide or phosphopeptide level. The volcano plots in Fig. 5d display the substantive changes in peptide levels between AD and control groups, and the same analysis can be applied between AsymAD and control. The ANOVA results across the three groups (for proteins, peptides and phosphopeptides) are included 40 . The average levels for each group are also included in the output and can be used to assess stage-specific trends across the groups.
This analysis also includes the quantification of both the phosphorylated and unmodified forms of the same peptide within the same sample 38 , which can greatly benefit investigators seeking to fully describe the phosphorylation stoichiometry of specific proteins.
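A volcano-plot table like the one behind Fig. 5d can be produced from log2 abundances with a per-feature one-way ANOVA. The sketch below assumes a features × samples array of log2 (for example GIS-normalized) abundances and one group label per sample. It uses `scipy.stats.f_oneway` for the three-group test and the difference of mean log2 values as the AD/CTL log2 fold change. Multiple-testing correction and any post hoc tests are omitted, and the data are hypothetical.

```python
import numpy as np
from scipy import stats


def volcano_table(log2_abund, groups, case="AD", control="CTL"):
    """Per-feature one-way ANOVA across all groups plus the case-vs-control log2 fold change."""
    x = np.asarray(log2_abund, dtype=float)
    labels = np.asarray(groups)
    names = list(dict.fromkeys(groups))  # unique group labels, first-seen order
    results = [stats.f_oneway(*(row[labels == g] for g in names)) for row in x]
    f_stat = np.array([r.statistic for r in results])
    p_val = np.array([r.pvalue for r in results])
    # Difference of mean log2 values = log2 of the ratio of geometric means.
    log2_fc = x[:, labels == case].mean(axis=1) - x[:, labels == control].mean(axis=1)
    return log2_fc, -np.log10(p_val), f_stat


groups = ["CTL"] * 3 + ["AsymAD"] * 3 + ["AD"] * 3
data = [
    [0.0, 0.1, -0.1, 0.5, 0.6, 0.4, 2.0, 2.1, 1.9],    # hypothetical peptide increased in AD
    [0.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0, -1.0],  # hypothetical unchanged peptide
]
fc, neg_log10_p, f = volcano_table(data, groups)
print(fc, neg_log10_p)
```

The same function can compare AsymAD with control by passing `case="AsymAD"`.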
Use case 3: protein co-expression network analysis. More than 10,000 proteins were quantified in this dataset, which is more than sufficient for systems-level analysis. Weighted Gene Co-expression Network Analysis (WGCNA) and related algorithms can be used for systems-based network analyses, which generate modules of proteins clustered by correlated expression patterns 5,36,46 . The protein modules can then be related to molecular functions and pathways, and these programs can also be used to correlate expression modules with various biological traits. Furthermore, the cell-type specificity of individual proteins may be investigated according to the module membership of a protein and the brain cell-type enrichment of that module. The systems-level co-expression analysis also includes the average abundance of proteins or phosphopeptides across disease stages, which form early, intermediate and late change clusters 46 . Use case 4: Identification and quantification of signaling pathways. Pathway analysis is routinely performed with software 56 or web services 57,58 to analyze different high-throughput omics data, such as genomics, transcriptomics, proteomics, lipidomics and metabolomics data. Pathway analyses help to organize a list of proteins into a coherent set of pathway maps for interpreting proteomics results. These analyses have proved to be a powerful interpretation tool in biological research, facilitating novel insights in fields as disparate as development 59 , apoptosis 60 , cancer 61,62 , and other diseases 63,64 . Several biological pathways have been linked to AD using similar methods 36,65-67 . Given the excellent coverage of the AD proteome and phosphoproteome from the same samples described here, this dataset may serve as a useful resource for pathway analysis.
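Many pathway tools are built on an over-representation test. A minimal version is the one-sided hypergeometric p-value for observing at least k pathway members in a list of n proteins drawn from a background of N proteins, K of which belong to the pathway. The sketch below uses only the Python standard library. Choosing the quantified proteins as the background is an assumption, multiple-testing correction across pathways is omitted, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric(N, K, n).

    N: background size (e.g. all quantified proteins); K: background members of the pathway;
    n: size of the protein list of interest; k: pathway members found in that list.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    hi = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return tail / comb(N, n)  # exact integer arithmetic until the final division


# Hypothetical counts, for illustration only.
print(hypergeom_enrichment_p(N=10_000, K=120, n=400, k=15))
```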
Use case 5: Domain or motif analysis. A protein domain or motif is a part of a protein sequence that can serve as a substrate recognized and chemically modified by kinases or other enzymes; such sequences recur across the proteome and play conserved roles in protein function 68,69 . Advances in genomics and proteomics sequencing, together with the development of bioinformatics 70,71 , make large-scale domain or motif analyses possible. Because kinases phosphorylate motif sequences specific to each enzyme, altered phosphorylation of certain motifs may reflect impaired kinase dynamics in AD. Given the enhanced coverage of the AD proteome and phosphoproteome, this dataset can be an excellent tool for AD-related domain or motif analysis. Use case 6: Targeted proteomics. Owing to the multiplexed nature of the TMT method, proteomic sample processing has become increasingly high throughput, and TMT-based workflows have become increasingly popular. As technical advances in instrumentation, computing and processing have steadily improved, TMT-labeled peptide analyses have begun to be applied to targeted proteomic methods, such as TOMAHAQ 72 . In this dataset, we identified 164,034 peptides and 51,736 phosphopeptides through TMT isobaric labeling. Importantly, this dataset includes peptide-specific characteristics such as intensity, charge and modification state 38 , which can serve as a reference resource for future targeted proteome analyses.
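For the motif analysis in use case 5, a common first step is to extract a fixed-width sequence window centred on each localized phosphosite and then classify or align the windows. The sketch below builds ±flank windows (padded with "_" at protein termini) and flags the minimal S/T-P motif recognized by proline-directed kinases. The window width, padding character, peptide and site are illustrative assumptions, and the function names are hypothetical.

```python
def site_window(seq, site, flank=7, pad="_"):
    """Sequence window of 2*flank+1 residues centred on a 1-based site, padded at the termini."""
    if not 1 <= site <= len(seq):
        raise ValueError("site outside sequence")
    i = site - 1
    left = seq[max(0, i - flank):i].rjust(flank, pad)
    right = seq[i + 1:i + 1 + flank].ljust(flank, pad)
    return left + seq[i] + right


def is_proline_directed(window, flank=7):
    """True for the minimal S/T-P motif (phosphoacceptor immediately followed by proline)."""
    return window[flank] in "ST" and window[flank + 1] == "P"


# Hypothetical sequence and site, for illustration only.
w = site_window("KAAGSPRVEK", 5, flank=3)
print(w, is_proline_directed(w, flank=3))
```

Fixed-width, padded windows make sites near the protein termini directly comparable to internal sites in downstream position-specific counting.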