An equation to estimate the difference between theoretically predicted and SDS PAGE-displayed molecular weights for an acidic peptide

The molecular weight (MW) of a protein can be predicted from its amino acid (AA) composition. However, in many cases a protein that is not chemically modified shows an SDS PAGE-displayed MW larger than its predicted size. Some reports have linked this to a high content of acidic AA in the protein, but the exact relationship between acidic AA composition and SDS PAGE-displayed MW has not been established. Zebrafish nucleolar protein Def is composed of 753 AA and shows an SDS PAGE-displayed MW approximately 13 kDa larger than its predicted MW. The first 188 AA of Def form a glutamate-rich region containing ~35.6% acidic AA. In this report, we analyzed the relationship between the SDS PAGE-displayed MW of thirteen peptides derived from Def and the AA composition of each peptide. We found that the difference between the predicted and SDS PAGE-displayed MW showed a linear correlation with the percentage of acidic AA that fits the equation y = 276.5x − 31.33 (x represents the percentage of acidic AA, 11.4% ≤ x ≤ 51.1%; y represents the average ΔMW per AA). We demonstrated that this equation could be applied to predict the SDS PAGE-displayed MW of thirteen different natural acidic proteins.

The N-terminus of Def is not modified by glycosylation. We previously showed that the N-terminus (1-377 AA) of Def is essential for the Def-Capn3 pathway in mediating p53 degradation in the nucleolus [2]. Because the N-terminus of Def is also responsible for the MW difference, we set out to study its characteristics. Given that the ΔMW is ~13 kDa, we first asked whether the discrepancy between the predicted Def MW and its SDS PAGE-displayed MW could be attributed to post-translational modification(s), such as glycosylation or ubiquitination, which often cause a drastic gel mobility shift [16,17]. To facilitate our study, we split D10 (encoding 188 amino acid residues) into two halves, D14 and D15 (Fig. 2a), and fused each to the EGFP tag. Western blot analysis revealed that the SDS PAGE-displayed MWs of EGFP-D14 and EGFP-D15 were approximately 5.3 kDa and 7.9 kDa larger than their predicted MWs, respectively (Fig. 2b). Since N- or O-glycosylation often drastically increases the MW of a protein, we treated EGFP-D14 and EGFP-D15 with PNGase (an N-glycosidase, for N-deglycosylation) or with O-glycosidase plus Neuraminidase (for O-deglycosylation). The gel mobility of EGFP-D14 and EGFP-D15 was not affected by treatment with any of these glycosidases (Fig. 2c) whereas, as expected, RNase B and Fetuin, two positive controls, migrated faster after glycosidase treatment (Fig. 2d). Therefore, the N-terminus of Def is not glycosylated, which rules out a contribution of glycosylation to the observed MW difference.

The N-terminus of Def is not modified by ubiquitination/sumoylation. Lysine (Lys or K) is an important amino acid for post-translational modifications including methylation, acetylation, sumoylation and ubiquitination [18]. Since D15 exhibited a MW that is ~8 kDa larger than its predicted MW (Fig. 2b), we wondered whether any of its K residues are modified. In total there are seven K residues in D15. We substituted these K with R singly (K129R; K136R; K144R), in pairs (K139, 140R; K164, 165R) or all together (all seven K with R, 7KR) in the EGFP-D15 plasmid by site-directed mutagenesis and found that none of the mutant proteins exhibited an obvious gel mobility shift (Fig. 3a,b), thus excluding a contribution of lysine modification to the observed MW difference of D15. We also mutated all ten K in D14 and found that the EGFP-D14_10KR protein displayed two bands, one with a MW identical to that of the wild type D14 peptide and the other ~1 kDa smaller (Fig. 3c). Although we cannot currently explain this observation, this lower band is still about 4 kDa larger than the predicted D14 MW, so we conclude that the difference between the predicted MW and the SDS PAGE-displayed MW is not caused by K modification in D14.
[Figure legend fragment: (f) Western blot using the anti-EGFP antibody to detect EGFP, EGFP-D9 and EGFP-D10 in embryos eight hours after injection with their respective mRNA. (g) ΔMW for EGFP, EGFP-D9 and EGFP-D10 based on f. Loading control: CBB (Coomassie brilliant blue) staining or western blot of GAPDH or β-Actin. In a (n = 3), b (n = 3), e (n = 3) and g (n = 4), the value above indicates the ΔMW mean and the error bar stands for SEM. The gel picture (for CBB staining) and western blot images were cropped with a grey cropping line. All gels for western blot analysis were run under the same experimental conditions.]
High percentage of acidic AA in the N-terminus of Def is the key determinant of the observed MW discrepancy. Size analysis showed that both the D14 and D15 fragments (Fig. 2b) contributed to the SDS PAGE-displayed MW of D10 (Fig. 1e). We further divided D15 into D16 and D17, fused each to EGFP, and found that both EGFP-D16 and EGFP-D17 showed an obvious difference between the predicted and SDS PAGE-displayed MW (Fig. 4a,b). Previous reports have suggested that a high percentage of acidic amino acid residues might retard protein mobility [10-15]. Domain analysis showed that the N-terminus of Def contains a glutamate-rich region (amino acid residues 82-206) (Fig. 4c) [2]. We therefore set out to determine the relationship between the SDS PAGE-displayed MW and the amino acid composition of Def. We divided the 20 AA into five groups: hydrophobic (A, I, L, F, W, V), polar (N, C, Q, S, T, Y), strongly basic (K, R), strongly acidic (E, D) and the remaining amino acids (G, M, P, H). We then calculated the percentage of each group in each peptide: Myc-Def, Myc-D1, Myc-D2, Myc-D3, Myc-D4, Myc-D9, Myc-D10, D9, D10, D14, D15, D16 and D17.
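The five-group classification described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code: the function name `group_percentages` and the toy sequence are hypothetical, and only the group memberships are taken from the text.

```python
from collections import Counter

# Residue groups exactly as listed in the text (one-letter codes).
AA_GROUPS = {
    "hydrophobic": "AILFWV",
    "polar": "NCQSTY",
    "strongly_basic": "KR",
    "strongly_acidic": "ED",
    "other": "GMPH",
}


def group_percentages(sequence: str) -> dict[str, float]:
    """Return the percentage of residues falling into each of the five groups."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    # Reject anything outside the 20 standard residues covered by the groups.
    unknown = set(counts) - set("".join(AA_GROUPS.values()))
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return {
        name: 100.0 * sum(counts[aa] for aa in members) / len(seq)
        for name, members in AA_GROUPS.items()
    }


# Hypothetical toy sequence, not a Def fragment.
print(group_percentages("EEDDKRAG"))
```

Applied to each peptide, the "strongly_acidic" value would correspond to the x-axis quantity that the text plots against the average ΔMW per residue.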
We also calculated the ΔMW for each of the aforementioned peptides and divided it by the number of amino acids to obtain the average ΔMW per amino acid residue in each peptide (Fig. 5a). We then plotted the percentage of each of the five groups against the ΔMW per amino acid residue in each peptide (Fig. 5b-f). We found that only the percentage of strongly acidic AA showed a linear correlation with the average ΔMW per amino acid residue (Fig. 5b), while none of the other four groups showed such a correlation (Fig. 5c-f). A mathematical calculation yielded the linear equation y = 276.5x − 31.33, where x stands for the percentage of strongly acidic amino acids (11.4% ≤ x ≤ 51.1%) and y for the average ΔMW per amino acid residue (Fig. 5b). To consider the possible effect of positively charged amino acids (K and R) on the equation, we noted that among these eleven peptides the percentages of K/R range from 7.4% (D15) to 17% (D14). Plotting the ΔMW per amino acid residue against the percentages of K/R did not reveal a linear correlation (Fig. 5c). However, we cannot rule out the possibility that a higher percentage of K/R affects our equation.
[Figure legend fragment: or EGFP-D15 mRNA injection into one-cell stage embryos. (d) RNase B and Fetuin were used as the positive controls in glycosidase treatment as indicated and were stained with CBB (Coomassie brilliant blue). The gel picture (for CBB staining) and western blot images were cropped with a grey cropping line. All gels for western blot analysis were run under the same experimental conditions.]
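The linear relationship y = 276.5x − 31.33 can be turned into a small estimator. The sketch below rests on two assumptions that this section does not state explicitly: x is entered as a fraction (0.356 for 35.6%) and y is in Da per residue. With these assumptions a 188-residue region containing 35.6% acidic AA gives a shift of about 12.6 kDa, in line with the ~13 kDa reported for Def. All function names are hypothetical.

```python
# Coefficients of the linear equation reported in the text (Fig. 5b).
SLOPE = 276.5
INTERCEPT = -31.33
# Range of acidic content covered by the fit, as fractions (11.4% to 51.1%).
X_MIN, X_MAX = 0.114, 0.511


def acidic_fraction(sequence: str) -> float:
    """Fraction of E and D residues in a one-letter-code sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(seq.count(aa) for aa in "ED") / len(seq)


def delta_mw_per_residue(x: float) -> float:
    """Average shift per residue (assumed Da) for acidic fraction x."""
    return SLOPE * x + INTERCEPT


def estimate_displayed_mw_kda(predicted_mw_kda: float, n_residues: int, x: float) -> float:
    """Composition-based MW plus the estimated gel shift, in kDa."""
    shift_kda = n_residues * delta_mw_per_residue(x) / 1000.0
    return predicted_mw_kda + shift_kda


def in_fitted_range(x: float) -> bool:
    """True if x lies inside the range of acidic content used for the fit."""
    return X_MIN <= x <= X_MAX


# Shift estimated for a 188-residue region with 35.6% acidic AA.
print(estimate_displayed_mw_kda(0.0, 188, 0.356))
```

Calling `in_fitted_range` before trusting an estimate is a design choice of this sketch: the text notes that the equation has only been examined between 11.4% and 51.1% acidic AA.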
Successful prediction of the SDS PAGE-displayed MW for three acidic proteins Sas10, Mpp10 and Bms1l. Sas10 [19], Mpp10 [13] and Bms1l [20,21] are all nucleolar acidic proteins. We cloned sas10, mpp10 and bms1l into the expression vector pCS2+ with an HA tag and expressed each of them in cultured human 293T cells. Western blot was used to determine the MW of these proteins in an SDS PAGE gel. HA-Sas10 is composed of 485 AA with a predicted isoelectric point of 5.13 (Fig. 6a). The theoretically predicted MW for HA-Sas10 is 56.6 kDa (Fig. 6a) and the actual SDS PAGE-displayed MW obtained by western blot analysis is 77.7 kDa (Fig. 6b, left panel). The actual SDS PAGE-displayed MW matched well the SDS PAGE-displayed MW predicted for HA-Sas10 with the equation (73.5 kDa) (Fig. 6c, panel for HA-Sas10). HA-Mpp10 is composed of 707 AA with a predicted MW of 81.2 kDa and an isoelectric point of 4.39 (Fig. 6a). The actual SDS PAGE-displayed MW for HA-Mpp10 (114.8 kDa) (Fig. 6b, left panel) also matched well the SDS PAGE-displayed MW predicted with the equation (110.5 kDa) (Fig. 6c, panel for HA-Mpp10). HA-Bms1l is composed of 1230 AA with a predicted MW of 141.1 kDa and an isoelectric point of 5.18 (Fig. 6a). Similarly, the equation was successfully applied to predict the SDS PAGE-displayed MW for Bms1l (Fig. 6b, middle panel; 6c, panel for HA-Bms1l). As expected, the equation is also applicable to predicting the SDS PAGE-displayed MW for Rcl1, a non-acidic protein (Fig. 6a-c, right panel in 6b, panel for HA-Rcl1 in 6c).
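To make the comparison above concrete, the following sketch computes how far each estimate falls from the MW observed on the gel, using only the values quoted in the text for HA-Sas10 and HA-Mpp10. The helper name `percent_deviation` is hypothetical, and the choice of the observed MW as the reference for the percentage is an assumption of this sketch.

```python
# (label, MW from composition, MW estimated with the equation, MW observed on the gel), in kDa.
# All numbers are the ones quoted in the text.
REPORTED = [
    ("HA-Sas10", 56.6, 73.5, 77.7),
    ("HA-Mpp10", 81.2, 110.5, 114.8),
]


def percent_deviation(estimate_kda: float, observed_kda: float) -> float:
    """Deviation of an estimate from the observed MW, as a percentage of the observed MW."""
    return 100.0 * (observed_kda - estimate_kda) / observed_kda


for label, composition_mw, equation_mw, observed_mw in REPORTED:
    uncorrected = percent_deviation(composition_mw, observed_mw)
    corrected = percent_deviation(equation_mw, observed_mw)
    print(f"{label}: composition only {uncorrected:.1f}%, with equation {corrected:.1f}%")
```

For both proteins the equation-based estimate lies much closer to the observed band than the composition-based MW does, which is the sense in which the text describes the match.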
Successful prediction of the SDS PAGE-displayed MW for ten acidic proteins reported in the literature. To further examine the applicability of the equation, we searched the literature [10,22-30] and retrieved records for 10 acidic proteins with E/D percentages ranging from 18.8% to 31.2% (Table 1). As expected, the SDS PAGE-displayed MW of each of these ten proteins was larger than the MW predicted from its amino acid composition, as shown in the cited references (Table 1). We used the equation to estimate the SDS PAGE-displayed MWs of these ten proteins. The results showed that the equation predicted the SDS PAGE-displayed MW well for each of these ten acidic proteins (Table 1).
[Figure legend fragment: D14 and D15. (a-c) Western blot using EGFP antibody to detect EGFP-D15 and single or double K to R mutants of EGFP-D15 (a) or EGFP-D15_7KR (all seven K in D15 were mutated to R) (b) or EGFP-D14_10KR (all 10 K in D14 were mutated to R) (c). Protein samples were extracted from embryos at 8 hpf after corresponding mRNA injection into one-cell stage embryos. CBB staining: loading control. The gel picture (for CBB staining) and western blot images were cropped with a grey cropping line. All gels for western blot analysis were run under the same experimental conditions.]

Discussion
It is not unusual for a protein to display a MW on an SDS PAGE gel that differs from its predicted size (for example, while the predicted size of human p53 is 43.7 kDa, it runs as a 53 kDa band in an SDS PAGE gel [31,32]). In many cases, this MW difference is attributed to chemical modifications of the protein, especially glycosylation and ubiquitination/sumoylation, which drastically retard gel mobility [16,17]. Phosphorylation, on the other hand, may cause a subtle but significant band shift on an SDS PAGE gel [33], except for hyper-phosphorylation, which appears as a slower-migrating smear [34]. Therefore, there is a need to determine whether a larger size in an SDS PAGE gel results from chemical modifications or from certain features (e.g. AA composition) of the protein.
Def displayed an SDS PAGE MW approximately 13 kDa larger than its predicted one. To find out the reason, we carried out a series of peptide mapping experiments and found that the N-terminal 188 amino acid residues (2-189 AA) (D10 fragment), but not other regions of Def, were responsible for its ~13 kDa mobility shift. We then used various approaches (including enzyme treatment and site-directed mutagenesis) to check whether the size difference was due to post-translational modification(s) of amino acid residue(s). We ruled out the contribution of glycosylation and ubiquitination/sumoylation to this MW difference. We did find that the first 95 amino acids at the N-terminus of Def were phosphorylated; however, this accounts for a mobility shift of only ~1.7 kDa (data not shown), far less than the observed ~13 kDa MW difference for Def. Eventually, we turned our focus to the AA composition of Def, since the N-terminal region of Def (first 188 AA) contains a high percentage (35.6%) of acidic AA (E and D). We analyzed the relationship between the MW of each of the thirteen peptides derived from Def and the percentages of AA grouped by their properties. Surprisingly, we found that the difference between the predicted and SDS PAGE-displayed MW for the Def N-terminus showed a linear correlation with the percentage of acidic AA (E and D) that fits the equation y = 276.5x − 31.33 (where x represents the percentage of E and D, and y represents the average ΔMW per amino acid residue). Based on this formula we predicted that y (ΔMW) will be zero when x is 11.3%; in that case the observed MW on an SDS PAGE gel would match the predicted MW. This was indeed the case for the non-acidic protein Rcl1, a nucleolar protein with 10.3% acidic AA (pI = 8.60). Finally, we demonstrated that this equation could be successfully applied to predict the SDS PAGE-displayed MW for thirteen acidic proteins, including Sas10, Mpp10 and Bms1l and ten others reported in the literature.
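The zero-shift point quoted above follows directly from the equation. The short derivation below assumes that x is entered as a fraction (0.113 for 11.3%) and that y is in Da per residue; neither convention is stated explicitly in this section.

```latex
\[
y = 276.5\,x - 31.33 = 0
\quad\Longrightarrow\quad
x_0 = \frac{31.33}{276.5} \approx 0.113 \quad (11.3\%)
\]
\[
y(0.103) = 276.5 \times 0.103 - 31.33 \approx -2.85
\]
```

Under these assumptions, the value for Rcl1 (10.3% acidic AA) is a small negative shift per residue, close to zero. Note that 10.3% lies slightly below the 11.4% lower bound of the fitted range, so this is a mild extrapolation.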
Therefore, this equation is practically useful: when a researcher encounters a MW difference, it allows the SDS PAGE-displayed MW to be predicted conveniently before determining whether the difference is caused by chemical modifications of the protein of interest. Since the range of x was deduced from the lowest (Myc-D9) and highest (D16) percentages of E/D in the eleven peptides we tested (Fig. 5), it would be interesting in the future to test whether our equation can predict the SDS PAGE-displayed MW of proteins with percentages of acidic amino acids beyond this range (lower or higher).
Plasmid construction. Target gene cDNA fragments (including def and its derivatives, sas10, mpp10, rcl1 and bms1l) were cloned into the pCS2+ vector for in vitro mRNA synthesis. myc tagged def, D1, D2, D3 and D4 were constructed by Tao et al. [2]. The primers used for myc tagged D9 and D10 are listed in Supplementary Table 1.
To make the EGFP-D9 construct, primer pairs (EGFP Fw) + (EGFP tag Rv) and (D9 Fw) + (myc-D9 Rv) were used to amplify EGFP and D9, respectively. The PCR products were mixed and denatured together to allow annealing of the sticky ends to join the two parts, and this mixture was then used as the template for a second-round PCR using primers (EGFP Fw) and (myc-D9 Rv) to obtain EGFP-D9. A similar method was used to obtain EGFP tagged D10, D15 and D17. EGFP tagged D14 and D16 were amplified from EGFP-D10 and EGFP-D15, respectively. The primers are listed in Supplementary Table 1. All def-related mutant genes with point mutations were produced by site-directed mutagenesis PCR using the primer pairs listed in Supplementary Table 2.
mRNA synthesis and western blot. mRNAs were synthesized in vitro using the mMESSAGE mMACHINE® Kit (Ambion) according to the manufacturer's instructions. mRNA was injected into one-cell stage zebrafish embryos to overexpress the protein of interest. Embryos were deyolked and then lysed in SDS lysis buffer supplemented with 1× Complete Protease Inhibitor Cocktail (EDTA-free, Roche). The protein samples were used for western blot analysis immediately or kept at −20 °C for storage. The Def rabbit polyclonal antibody used in western blotting was generated by Hangzhou HuaAn Biotechnology Company (China) using the synthesized peptide CLRLPDSPQRPEPDS. Anti-Myc tag antibody was purchased from Clontech (No. 631206). Sigma HA mouse monoclonal (HA-7) antibody (H3663) was used to detect the HA tag. GFP Antibody (B-2) (Santa Cruz, sc-9996) was used to detect EGFP. β-Actin antibody was purchased from Cell Signaling (#4967). GAPDH rabbit monoclonal antibody (EPR1977Y) was from Epitomics (#5632-1).
40, 55, 70, 100, 130 and 170 kDa, Fermentas) was loaded along with different protein samples. After gel electrophoresis, an Rf value (the migration distance of a protein divided by the migration distance of the front-running dye) was obtained for each standard control protein.
The Rf value was plotted against the lg(MW) of the corresponding standard control protein to obtain the linear formula lg(MW) = a·Rf + b (where a is the slope and b is the y-intercept). The Rf value of each protein sample was then obtained and used to calculate the SDS PAGE-displayed MW of the corresponding protein.
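The calibration step described above amounts to a straight-line fit of lg(MW) against Rf followed by back-transformation. A minimal sketch is given below, assuming that lg denotes the base-10 logarithm and that the line is obtained by ordinary least squares (the text does not name the fitting method). The marker sizes are those listed in the text; the Rf values and function names are hypothetical.

```python
import math


def fit_calibration(standards):
    """Least-squares fit of lg(MW) = a * Rf + b from (Rf, MW in kDa) pairs."""
    xs = [rf for rf, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two standards")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("Rf values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # y-intercept
    return a, b


def displayed_mw(rf, a, b):
    """SDS PAGE-displayed MW (kDa) for a sample band with the given Rf."""
    return 10 ** (a * rf + b)


# Marker sizes from the text; the Rf values are made up for illustration.
markers = [(0.72, 40), (0.60, 55), (0.51, 70), (0.38, 100), (0.28, 130), (0.18, 170)]
a, b = fit_calibration(markers)
print(displayed_mw(0.45, a, b))
```

Because the fit is done in log space, the estimate is most reliable for bands that migrate between the smallest and largest markers.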