Complementary proteomics strategies capture an ataxin-1 interactome in Neuro-2a cells

Ataxin-1 mutation, arising from a polyglutamine (polyQ) tract expansion, is the underlying genetic cause of the late-onset neurodegenerative disease spinocerebellar ataxia type 1 (SCA1). To identify protein partners of polyQ-ataxin-1 in neuronal cells under control or stress conditions, we report here complementary proteomics strategies of proximity-dependent biotin identification (BioID) and affinity purification (via GFP-Trap pulldown) in Neuro-2a cells expressing epitope-tagged forms of ataxin-1[85Q]. These approaches enriched proximal proteins and interacting partners, respectively, with subsequent protein identification performed by liquid chromatography-MS/MS. Background proteins, not dependent on the presence of the polyQ-ataxin-1 protein, were additionally defined by their endogenous biotinylation (for the BioID protocol) or by their non-specific interaction with GFP only (in the GFP-Trap protocol). All datasets were generated from biological replicates. Following the removal of the identified background proteins from the acquired protein lists, our experimental design has captured a comprehensive set of polyQ-ataxin-1 proximal and direct protein partners under normal and stress conditions. Data are available via ProteomeXchange, with identifier PXD010352.


Methods
These methods are expanded versions of descriptions in our related work 23.

Sample preparation
A workflow for sample preparation prior to the mass spectrometry analysis is shown in Fig. 2.
For the BioID protocol, Neuro-2a cells were transiently transfected using Lipofectamine 2000 (ThermoFisher) to express mycBioID-ataxin-1[85Q] or remained untransfected as a control; biotin (50 μM) was included during the transfection protocol. The untransfected control allowed the endogenous biotinylation pattern in Neuro-2a cells to be established without the restricted compartmentalization that would be expected for a cytoplasm-restricted GFP-tagged BirA* ligase or for a BirA* ligase linked to a nuclear localization sequence.
For the affinity purification protocol, subsequently called the GFP-Trap protocol, Neuro-2a cells were transiently transfected using Lipofectamine 2000 (ThermoFisher) to express GFP-ataxin-1[85Q] or GFP as a control. The GFP-transfected control allowed identification of the proteins that interact with GFP in the absence of ataxin-1.

Protein identification and background reduction
The resultant MS/MS data were analyzed using the Mascot search engine (Matrix Science version 2.4) against the SWISSPROT database released July 2015, with the following settings: Taxonomy: Mus musculus; Enzyme: Trypsin; Protein Mass: ±20 ppm; Fragment Mass Tolerance: ±0.2 Da; Max Missed Cleavages: 2. Identifications in all samples (test or background/non-specific binding) were accepted for proteins with ≥2 significant peptides (p < 0.05). Background/non-specific binding proteins were identified as follows. For the GFP-Trap protocol, proteins identified in samples prepared from GFP only-transfected cells, processed in parallel in each of the 3 replicates, were defined as background/non-specific binding. For the BioID protocol, proteins identified in samples prepared from untransfected cells across all 3 replicates were pooled and defined as background, i.e. endogenously biotinylated proteins. Proteins remaining after background reduction were retained for comparison across biological triplicates and for bioinformatics analyses.
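The two background definitions above differ in how controls are matched to test samples, which can be illustrated with a minimal Python sketch. This is not the procedure used for the deposited data (list comparison was performed in Excel and an online Venn tool); the function names and the toy identifiers P1 to P5 are hypothetical.

```python
def subtract_background(test_proteins, background_proteins):
    """Return proteins seen in the test sample but not in the background list."""
    return set(test_proteins) - set(background_proteins)


def reduce_paired(test_replicates, control_replicates):
    """GFP-Trap style: replicate i is cleaned with the control processed in parallel."""
    return [subtract_background(t, c)
            for t, c in zip(test_replicates, control_replicates)]


def reduce_pooled(test_replicates, control_replicates):
    """BioID style: controls from all replicates are pooled into one background list."""
    pooled = set().union(*control_replicates)
    return [subtract_background(t, pooled) for t in test_replicates]


# Toy identifiers only; these are not proteins from the dataset.
test = [{"P1", "P2", "P3"}, {"P1", "P3", "P4"}, {"P1", "P2", "P5"}]
control = [{"P3"}, {"P4"}, {"P5"}]

paired = reduce_paired(test, control)
pooled = reduce_pooled(test, control)
```

In this toy case the pooled background removes P3 from the second replicate, whereas the paired background does not, because P3 was only seen in the control processed alongside the first replicate.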

Code availability
Data comparison was performed in Microsoft® Excel for Mac (Version 15.19.1) using the conditional formatting function, and with the online Venn diagrams tool (http://bioinformatics.psb.ugent.be/webtools/Venn/).
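For readers who prefer a scripted comparison, the kind of list overlap reported by a Venn diagram tool can be approximated as follows. This is an illustrative sketch only and was not part of the original analysis; the function name, list names and identifiers are hypothetical.

```python
def venn_regions(named_lists):
    """Group each protein by the exact combination of lists that contain it."""
    regions = {}
    all_proteins = set().union(*named_lists.values())
    for protein in all_proteins:
        members = frozenset(name for name, proteins in named_lists.items()
                            if protein in proteins)
        regions.setdefault(members, set()).add(protein)
    return regions


# Toy lists; names and identifiers are placeholders.
lists = {
    "A": {"P1", "P2"},
    "B": {"P2", "P3"},
    "C": {"P2", "P3", "P4"},
}
regions = venn_regions(lists)
```

Each key of `regions` is the set of list names sharing the proteins in that region, so `regions[frozenset({"B", "C"})]` holds the proteins found in B and C but not in A.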

Data Records
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (Data Citation 1). The dataset includes 21 raw files, 21 mgf files, and 21 mzid files. Raw files are unprocessed outputs from the Q Exactive Plus mass spectrometer. Mgf files are the original peak list files used by the MASCOT search engine. Mzid files describe the results of peptide/protein identification. There are 21 samples that represent 7 conditions (Fig. 2

Technical Validation
Three steps, as illustrated in Fig. 3, were taken in technical validation to maximize data quality. At Step 1, the protein identification stage following the MASCOT search, ion scores for each peptide were compared to ion score significance thresholds (p < 0.05). Peptides with an ion score (peptide value) higher than the thresholds (expected value) were considered as significant peptides (SP). To ensure a false discovery rate (FDR) of <5%, proteins with <2 significant peptides were discarded (proteins identified by MASCOT search, Data Citation 2). This step reduced inaccuracies introduced during the mass spectrometer operation or the peptide searching/matching processes.
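The Step 1 filter can be sketched in Python as below. The record layout is an assumption made for illustration (it is not the Mascot export format), each record is assumed to be a distinct peptide, and all names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    protein: str       # protein the peptide was assigned to
    ion_score: float   # peptide ion score from the search
    threshold: float   # ion score significance threshold (p < 0.05)


def accepted_proteins(matches, min_significant=2):
    """Keep proteins supported by at least `min_significant` significant peptides."""
    counts = {}
    for m in matches:
        if m.ion_score > m.threshold:  # peptide counts as significant
            counts[m.protein] = counts.get(m.protein, 0) + 1
    return {protein for protein, n in counts.items() if n >= min_significant}


# Toy values only.
matches = [
    PeptideMatch("P1", 45.0, 30.0),
    PeptideMatch("P1", 38.0, 30.0),
    PeptideMatch("P2", 50.0, 30.0),
    PeptideMatch("P2", 12.0, 30.0),  # below threshold, not counted
    PeptideMatch("P3", 10.0, 30.0),
]
kept = accepted_proteins(matches)
```

Here only P1 is retained, since P2 has a single significant peptide and P3 has none.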
Step 2 was undertaken for background reduction. We included negative controls to define background proteins: non-transfected cells to determine endogenously biotinylated proteins in the BioID protocol, and GFP-transfected cells to determine non-specific binding in the GFP-Trap protocol. The background protein reduction (BPR) percentage for each sample was calculated as the ratio of the number of background proteins identified in that sample to the total number of background proteins (Fig. 3, bar graph). The average BPR percentage for each condition was thus calculated to be >90%, indicating an effective background reduction that identified a majority of the background proteins in each sample (proteins after background reduction, Data Citation 2).
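One reading of the BPR definition above is the share of the background list that is also detected in a given sample. The following Python sketch implements that reading as an assumption; the function names and toy values are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def bpr_percent(sample_proteins, background_proteins):
    """Percentage of the background list that is also identified in the sample."""
    background = set(background_proteins)
    if not background:
        raise ValueError("background list is empty")
    found = set(sample_proteins) & background
    return 100.0 * len(found) / len(background)


def mean_bpr(sample_replicates, background_proteins):
    """Average BPR percentage across the replicates of one condition."""
    return mean(bpr_percent(s, background_proteins) for s in sample_replicates)


# Toy identifiers only.
background = {"B1", "B2", "B3", "B4"}
replicates = [
    {"P1", "B1", "B2"},              # 2 of 4 background proteins
    {"P1", "B1", "B2", "B3", "B4"},  # 4 of 4
    {"P2", "B1", "B2", "B3"},        # 3 of 4
]
per_sample = [bpr_percent(r, background) for r in replicates]
average = mean_bpr(replicates, background)
```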
Step 3 was performed to ensure the consistency of the results. Biological triplicates were performed and the results compared to filter lower confidence proteins. This step decreased noise caused by random biological variations in samples, enriching proteins with reproducible binding/proximity. Only proteins that appeared at least twice in triplicates in each condition were considered for further bioinformatics analyses (final proteins after biological replicates comparison, Data Citation 2).
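The "at least twice in triplicates" rule of Step 3 can be expressed compactly in Python. This is a sketch under the assumption that each replicate is available as a list of protein identifiers after background reduction; the function name and identifiers are hypothetical.

```python
from collections import Counter


def reproducible_proteins(replicates, min_hits=2):
    """Return proteins present in at least `min_hits` of the replicate lists."""
    # set(rep) guards against a protein being listed twice within one replicate.
    counts = Counter(protein for rep in replicates for protein in set(rep))
    return {protein for protein, n in counts.items() if n >= min_hits}


# Toy identifiers only.
triplicate = [
    ["P1", "P2", "P3"],
    ["P1", "P3", "P3"],  # duplicate entry is counted once
    ["P1", "P4"],
]
final = reproducible_proteins(triplicate)
```

With these toy lists, P1 (3 of 3) and P3 (2 of 3) pass the filter, while P2 and P4 are dropped as single-replicate identifications.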
Our results revealed 675 protein identifications in total (ataxin-1 interactome, Data Citation 2), with 91 proteins appearing three or more times in the four conditions. Our complementary approaches have identified numerous proximal or interacting proteins of ataxin-1, including many RNA-binding proteins, reinforcing ataxin-1's role in transcription and RNA splicing 12,24,25. The results also show good coverage of well-defined binding partners of ataxin-1, e.g. CIC, U2AF65, and 14-3-3 proteins 9,11,12. Importantly, these results also reveal multiple new discoveries not previously made in the single screening approaches by others; these new members of the ataxin-1 interactome thus constitute a rich resource requiring further interrogation. Overall, the combined use of proximity labelling (BioID) and affinity purification (GFP-Trap) can reveal a complex interactome of the polyQ-ataxin-1 protein, and this combined approach could be applied to various other proteins, including other polyQ proteins that drive neurodegenerative diseases.