Integral approach to biomacromolecular structure by analytical-ultracentrifugation and small-angle scattering

Currently, a sample for small-angle scattering (SAS) is usually highly purified and appears monodisperse: the Guinier plot of its SAS intensity shows a clean straight line. However, the sample may still contain slight aggregates that make the experimental SAS profile differ from the monodisperse one. A concerted method combining analytical ultracentrifugation (AUC) and SAS, named AUC-SAS, provides the precise scattering intensity of a biomacromolecule of interest in solution even in the presence of aggregates, as well as that of a complex under an association-dissociation equilibrium. AUC-SAS overcomes the aggregation problem, which has long been an obstacle to SAS analysis, and furthermore has the potential to enable structural analysis of general multi-component systems.

Structural investigations of biomacromolecules and their complexes in solution are essential to understanding physiological phenomena in biological systems. Several analytical methods have been developed and/or improved to address such systems. Small-angle X-ray and neutron scattering (SAXS and SANS) are classical techniques that give size information on particles in solution 1 . In addition, improved modern SAS gives further structural information, namely three-dimensional structure and structural fluctuation, by combining scattering data with computational analyses such as ab initio modeling 1,2 and molecular dynamics [3][4][5] , respectively. These analyses require high-quality scattering data, i.e. data arising purely from the target biomacromolecule.
There is an intrinsic obstacle to satisfying this requirement. Because SAS offers an ensemble-averaged scattering intensity of all particles in solution, unspecified aggregates in a solution (Supplementary Fig. 1a) contaminate the scattering intensity; in particular, the aggregates cause an abnormal upturn of the scattering intensity at low scattering angles (Supplementary Fig. 1b, c). In addition, a hidden problem has emerged 6,7 . For the structural analysis of a single molecule, we usually prepare a purified sample. However, even when the measured scattering intensity shows no such abnormal upturn and obeys the Guinier approximation 6 , it could still include a small amount of aggregates, which should be removed to obtain correct structural parameters 7 . In other words, it is difficult to judge with SAS alone whether a small amount of aggregates remains in a solution: a highly purified sample (Supplementary Fig. 2) appears to obey the Guinier approximation nicely, yet it included a small amount of aggregates. In practice, "revealing and removal of unspecified aggregates" has been one of the most significant challenges for SAS for many years.
The breakthrough for the "removal" is the development of size exclusion chromatography SAXS (SEC-SAXS) 7-9 , which directly observes the scattering of a size-separated particle solution eluting from a SEC column. However, even with SEC-SAXS, problems remain: the need for a relatively large amount of sample (>2 mg in a typical case), quick re-aggregation, and destruction of weakly bound complexes.
Analytical ultracentrifugation (AUC) 10 is attractive for the "revealing" because it provides the concentration distribution of particles in solution with a small amount of sample (~0.05 mg) and is less destructive to complexes. Therefore, several previous studies used AUC to check the quality of SAS samples [11][12][13] . We, on the other hand, conceived of also using AUC for the "removal": the SAS intensity of a multi-component solution can be decomposed into those of its components by utilizing the concentration distribution obtained with AUC. In this way, the SAS intensity of a certain component, such as a monomer, can be extracted from that of the whole solution even when the solution includes unspecified aggregates.
By integral use of AUC and SAS, we have succeeded in extracting the SAS intensity of a specific monomer from a solution that also contains its aggregates. Furthermore, we have applied the method to obtain the SAS intensity of a weakly bound complex under an association-dissociation equilibrium. In this paper, we report on this newly developed "AUC-SAS" method.

Results
Extraction of monomer scattering (removing aggregation). The scattering intensity $I_{\mathrm{mul}}(q)$ ($q$: magnitude of the scattering vector) of a multi-component system is represented by

$$I_{\mathrm{mul}}(q) = c\, i_{\mathrm{mul}}(q) = \sum_{j=1}^{n} c_j\, i_j(q), \qquad c = \sum_{j=1}^{n} c_j \tag{1}$$

where $c$, $i_{\mathrm{mul}}(q)$, and $n$ are the total mass concentration, the scattering intensity per $c$, and the number of components, and $c_j$ and $i_j(q)$ are the mass concentration and the scattering intensity per $c_j$ of the $j$th component, respectively. Because $i_{\mathrm{mul}}(q)$ is provided by the SAS experiment as $i_{\mathrm{exp}}(q)\ (= I_{\mathrm{exp}}(q)/c)$, solving for the scattering intensity $i_k(q)$ of a component of interest $k$ requires the following: the number of components $n$, the concentrations of all components $\{c_j\}$, and the scattering intensities of all components except the one of interest, $i_j(q)\ (j \neq k)$. AUC gives $n$ and $\{c_j\}$. Therefore, what remains is $i_j(q)\ (j \neq k)$.
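The forward model of Eq. (1) is simply a concentration-weighted sum. A minimal numpy sketch, assuming each component's intensity per unit mass concentration is sampled on a common q grid (the helper name `mixture_intensity` is hypothetical):

```python
import numpy as np

def mixture_intensity(conc, intensities):
    """Return (c, i_mul(q)) for a multi-component solution, following Eq. (1).

    conc        : mass concentrations c_j of the n components, shape (n,)
    intensities : scattering intensities per unit mass concentration i_j(q),
                  shape (n, len(q)), all on the same q grid
    """
    conc = np.asarray(conc, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    c_total = conc.sum()               # c = sum_j c_j
    I_mul = conc @ intensities         # I_mul(q) = sum_j c_j i_j(q)
    return c_total, I_mul / c_total    # i_mul(q) = I_mul(q) / c
```

AUC-SAS runs this relation in reverse: with $n$ and $\{c_j\}$ from AUC and $i_{\mathrm{mul}}(q)$ from SAS, the unknown is a single $i_k(q)$.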
First, our purpose is confined to extracting the scattering intensity of the monomer ($k = 1$) from the scattering of the whole solution including unspecified aggregates ($j \geq 2$). Accordingly, the task is to determine the scattering intensities of the unspecified aggregates, $i_j(q)\ (j \geq 2)$. In general, it is difficult to know these scattering intensities individually. Here, we note that samples for general SAS are highly purified, so that the following three conditions hold in most cases. Condition (i): Highly denatured contaminants have already been removed by purification. The remaining aggregates are simple oligomers with a low aggregation degree ($2 \leq j \leq 4$ at most), and their total weight fraction is less than ca. 10%. Condition (ii): The inner structures of the aggregates are identical to that of the monomer, because the aggregates are assumed to be simple homo-oligomers, such as assemblies of intact monomers. Condition (iii): The Guinier approximation holds for the experimental scattering intensity $c\,i_{\mathrm{exp}}(q)$. This is the dangerous point, because it could mislead us into concluding that the solution is monodisperse, as described in the "Introduction".
Under these conditions, we have developed a method that extracts $i_1(q)$ from $i_{\mathrm{exp}}(q)$ containing the unspecified aggregates. Here, we explain the five-step protocol shown in Fig. 1, taking a purified bovine serum albumin (BSA) solution as an example (see "Methods" and Supplementary Fig. 2).
Step 1: SAS measurement. The scattering intensity of the multi-component system, $c\,i_{\mathrm{mul}}(q)$, is measured as $c\,i_{\mathrm{exp}}(q)$ (open black circles in Fig. 2a). The mean gyration radius $R_{ge}$ and the forward scattering intensity $c\,i_{\mathrm{exp}}(0)$ are calculated by Guinier analysis (black circles and line in Fig. 2b, and "non-treated SAXS" in Supplementary Table 1). If the scattering intensity deviates from the Guinier approximation (Supplementary Fig. 1b, c) or the obtained $R_{ge}$ is abnormally large, further purification should be applied.
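The Guinier analysis of Step 1 is a linear least-squares fit of $\ln I$ against $q^2$. The sketch below uses `numpy.polyfit` with `cov=True` and first-order error propagation ($\sigma_{R_g} = 3\sigma_{\mathrm{slope}}/(2R_g)$, $\sigma_{I(0)} = I(0)\,\sigma_{\mathrm{intercept}}$). The helper name `guinier_fit` is hypothetical, and the upper limit $qR_g \lesssim 1.3$ mentioned in the docstring is a common rule of thumb for globular proteins, not a value stated in this paper.

```python
import numpy as np

def guinier_fit(q, I, q_max):
    """Guinier analysis: fit ln I(q) = ln I(0) - (R_g**2 / 3) * q**2.

    Only points with 0 < q <= q_max and I > 0 are used; the caller should
    check that q_max * R_g stays below ~1.3 (assumed rule of thumb).
    Returns I(0), R_g, their standard errors, and q_max * R_g.
    """
    q = np.asarray(q, dtype=float)
    I = np.asarray(I, dtype=float)
    sel = (q > 0) & (q <= q_max) & (I > 0)
    # Linear least squares on (q^2, ln I); cov=True also returns the
    # covariance matrix of (slope, intercept) scaled by the residuals.
    (slope, intercept), cov = np.polyfit(q[sel] ** 2, np.log(I[sel]), 1, cov=True)
    Rg = np.sqrt(-3.0 * slope)
    I0 = np.exp(intercept)
    # Error propagation: |dRg/dslope| = 3 / (2 Rg), dI0/dintercept = I0
    sigma_Rg = 1.5 * np.sqrt(cov[0, 0]) / Rg
    sigma_I0 = I0 * np.sqrt(cov[1, 1])
    return I0, Rg, sigma_I0, sigma_Rg, q_max * Rg
```

An abnormally large $R_{ge}$ from this fit is the cue for further purification; a normal-looking one does not rule out small aggregate fractions.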
Step 2: AUC measurement. Sedimentation velocity AUC (SV-AUC) 14 is conducted on the same solution subjected to the SAS measurement. The measured sedimentation coefficient distribution $c(s_{20,w})$ gives $n$ and $\{c_j\}$ (Fig. 3, Supplementary Table 2).
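The Methods state that each weight fraction is obtained from the corresponding peak area of $c(s_{20,w})$. A minimal sketch of that bookkeeping, assuming the signal is proportional to mass concentration (as with interference optics) and that integration windows are chosen by the user around each peak (the window limits and the helper name `component_concentrations` are hypothetical):

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def component_concentrations(s, cs, windows, c_total):
    """Split the total mass concentration into {c_j} from a c(s) distribution.

    s, cs    : sedimentation coefficients (S) and c(s) values on that grid
    windows  : list of (s_low, s_high) limits, one per peak (user choice)
    c_total  : total mass concentration of the SAS sample (e.g. mg/mL)
    """
    s = np.asarray(s, dtype=float)
    cs = np.asarray(cs, dtype=float)
    areas = []
    for lo, hi in windows:
        m = (s >= lo) & (s <= hi)
        areas.append(_trapezoid(cs[m], s[m]))   # peak area of component j
    areas = np.array(areas)
    weight_fractions = areas / areas.sum()
    return weight_fractions * c_total           # {c_j}
```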
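Step 3: Forward scattering intensity of monomer, $i_1(0)$. The ratio $t_1$ of the forward scattering intensity of the monomer, $c_1 i_1(0)$, to the whole forward scattering intensity, $c\,i_{\mathrm{mul}}(0)$, is calculated from the AUC-measured $\{c_j\}$ as follows (Supplementary Note 1):

$$t_1 = \frac{c_1\, i_1(0)}{c\, i_{\mathrm{mul}}(0)} = \frac{c_1 M_1}{\sum_{j=1}^{n} c_j M_j} \tag{2}$$

where $M_j$ is the molecular weight of the $j$th component, using $i_j(0) \propto M_j$. Then $i_1(0)$ is calculated as $i_1(0) = t_1 c\,i_{\mathrm{mul}}(0)/c_1 = t_1 c\,i_{\mathrm{exp}}(0)/c_1$, because $c\,i_{\mathrm{mul}}(0)$ is given as $c\,i_{\mathrm{exp}}(0)$ by the SAS experiment on the multi-component system, as described above. The values $c\,i_{\mathrm{exp}}(0)$ and $c_1 i_1(0)$ are shown as black and blue squares in Fig. 4, respectively, clearly indicating that a few percent of aggregates (5.9% in the example, BSA1) generates the excess scattering $c_a i_a(0)\ (= \sum_{j=2}^{4} c_j i_j(0)$ in the example) (red bar in Fig. 4) in the observed whole $c\,i_{\mathrm{exp}}(0)$.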
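Step 3 reduces to a few lines once the aggregates are treated as homo-oligomers. This sketch assumes the forward scattering per unit mass concentration is proportional to molecular weight with identical contrast, so $M_j = jM_1$ and $t_1 = c_1/\sum_j j\,c_j$; the helper name `monomer_forward_intensity` and the input values in the test are hypothetical.

```python
import numpy as np

def monomer_forward_intensity(conc, degree, ci_exp0):
    """Step 3 sketch: monomer forward scattering from AUC concentrations.

    conc    : {c_j} from SV-AUC; conc[0] is the monomer
    degree  : aggregation number j of each component (1 for the monomer)
    ci_exp0 : c * i_exp(0) from the Guinier analysis of Step 1
    Assumes i_j(0) is proportional to M_j = j * M_1 (same contrast).
    """
    conc = np.asarray(conc, dtype=float)
    degree = np.asarray(degree, dtype=float)
    t1 = conc[0] / np.sum(degree * conc)   # t_1 = c_1 M_1 / sum_j c_j M_j
    i1_0 = t1 * ci_exp0 / conc[0]          # i_1(0) = t_1 c i_exp(0) / c_1
    excess = ci_exp0 - conc[0] * i1_0      # c_a i_a(0), the aggregate excess
    return t1, i1_0, excess
```

For example, 10% dimer by mass inflates the whole forward scattering by 10% relative to a pure monomer at the same total concentration, which is the kind of excess drawn as the red bar in Fig. 4.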
Step 4: Scattering of monomer in the high q-range, $i_{1h}(q_h)$. The aggregates are assumed to be simple associated homo-oligomers (condition (ii)). In this case, the inner structure of the aggregates is the same as that of the monomer. Therefore, the scattering intensity of the monomer in the high q-range, $i_{1h}(q_h)$, is almost identical to that of the oligomers, $i_{jh}(q_h)\ (j \geq 2)$, and hence, from Eq. (1), to that of the whole solution:

$$i_{1h}(q_h) \simeq i_{\mathrm{mul}}(q_h) \tag{3}$$
It should be considered that a difference between $i_{1h}(q_h)$ and $i_{\mathrm{mul}}(q_h)$ arises toward the lower q-range. This consideration is briefly described here (see Supplementary Figs. 3 and 4 and Supplementary Note 2 for details). To estimate the lowest $q^*$ at which Eq. (3) holds, and to calculate the maximum difference at $q^*$, the ratio $r(q)$ of the whole scattering intensity to that of the monomer is introduced as follows:

$$r(q) = \frac{i_{\mathrm{mul}}(q)}{i_1(q)} = \frac{\sum_{j=1}^{n} c_j\, i_j(q)}{c\, i_1(q)}$$

Here, two simple models for oligomers are introduced: linearly aligned and close-packed oligomerization models ($j \leq 4$; Supplementary Fig. 3). As a first assumption, the orientation of the monomers in the oligomers is averaged, and their scattering intensities $i_j(q)\ (j \geq 2)$ (Eqs. (S7)-(S12) in Supplementary Note 2-1) are then calculated with the Debye function from the monomer scattering intensity $i_1(q)$ (Eq. (S6) in Supplementary Note 2-1).
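Under the orientation-averaging assumption, a $j$-mer of identical subunits has $i_j(q) = i_1(q)\,\frac{1}{j}\sum_{k,l}\frac{\sin(q d_{kl})}{q d_{kl}}$ per unit mass concentration, where $d_{kl}$ are subunit centre-to-centre distances. The monomer intensity therefore cancels in $r(q) = \sum_j c_j i_j(q)/(c\,i_1(q))$, which depends only on the oligomer geometry and $\{c_j\}$. The sketch below is our illustration of that idea, not the Supplementary Note's exact equations; the centre spacing (e.g. a contact distance $2\sqrt{5/3}\,R_{g1}$ for spherical subunits) is a hypothetical modelling choice.

```python
import numpy as np

def oligomer_factor(q, centers):
    """(1/j) * sum_k sum_l sin(q d_kl) / (q d_kl) for j subunits at `centers`.

    Multiplying by the monomer intensity i_1(q) gives i_j(q) per unit
    mass concentration (orientation-averaged Debye sum over subunits).
    """
    q = np.asarray(q, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x)/(pi x), so sin(q d)/(q d) = np.sinc(q d / pi)
    S = np.sinc(np.outer(q, d.ravel()) / np.pi).sum(axis=1)
    return S / len(centers)

def intensity_ratio(q, conc, models):
    """r(q) = sum_j c_j i_j(q) / (c i_1(q)); i_1(q) cancels in this model."""
    conc = np.asarray(conc, dtype=float)
    total = sum(cj * oligomer_factor(q, m) for cj, m in zip(conc, models))
    return total / conc.sum()
```

For a linear dimer, `models` would be `[[[0, 0, 0]], [[0, 0, 0], [d, 0, 0]]]`; evaluating `intensity_ratio` at $q = 1/R_{g1}$ shows how far $r(q^*)$ sits from unity for the chosen geometry and composition.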
As shown in Supplementary Fig. 4, $r(q)$ calculated with $i_j(q)$ and $\{c_j\}$ for the present example (BSA1) approaches unity rapidly, deviating from unity by less than 1.8% for $q^* R_{g1} \geq 1.0$, where $R_{g1}$ is the gyration radius of the monomer. Therefore, with a few percent of aggregates (5.9% in the example, BSA1), it is approximated that

$$i_{1h}(q_h) \simeq i_{\mathrm{mul}}(q_h) \qquad (q_h > q^* = 1.0/R_{g1})$$

(Fig. 5a). Accordingly, in the present AUC-SAS protocol, the initial (tentative) scattering intensity of the monomer in the high q-range, $i_{1h}(q_h)^*$, is set to $i_{\mathrm{exp}}(q_h)\ (q_h > q^*)$. The closed blue circles in Fig. 5b represent $c_1 i_{1h}(q_h)^*$, and the open blue circles show its extrapolation, $c_1 i_{1h}(q)^*$, to the lower q-range ($q < q^*$).

[Fig. 1: Flowchart of the AUC-SAS protocol. Step 1: SAS measurement; Step 2: AUC measurement; Step 3: calculation of forward scattering intensity; Step 4: estimation of scattering intensity in the high q-range (when the inner structures of the aggregates are identical to that of the monomer); Step 5A/5B: extraction of the scattering intensity of interest from the information of Steps 1 and 2, using the scattering intensities of measurable components (e.g. $i_A$ and $i_B$) where available.]
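Step 4 amounts to masking the measured profile at $q^* = 1.0/R_{g1}$. A minimal sketch, assuming an estimate of $R_{g1}$ is already available (how it is obtained is outside this sketch; the helper name `high_q_monomer` is hypothetical):

```python
import numpy as np

def high_q_monomer(q, i_exp, Rg1, q_star_factor=1.0):
    """Step 4 sketch: i_1h(q_h)* = i_exp(q_h) for q_h > q* = q_star_factor / R_g1.

    Returns q*, the boolean mask of high-q points, and the tentative
    monomer intensity (NaN where it is not yet defined, i.e. q <= q*).
    """
    q = np.asarray(q, dtype=float)
    i_exp = np.asarray(i_exp, dtype=float)
    q_star = q_star_factor / Rg1
    high = q > q_star
    return q_star, high, np.where(high, i_exp, np.nan)
```

The NaN entries mark the low-q gap that Step 5A fills.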
Step 5A: Extraction of scattering intensity of monomer. To obtain the scattering intensity of the monomer over the whole q-range, it is necessary to find the scattering intensity in the low q-range, $i_{1l}(q_l)$, filling the gap between $i_1(0)$ calculated in Step 3 and $i_{1h}(q_h)^*\ (= i_{\mathrm{exp}}(q_h))$ set in Step 4.

5A-1. Setting the initial scattering intensity, $i_1(q)^*$: In general, $i_{1l}(q)$ should satisfy the Guinier approximation with the y-intercept $i_1(0)\ (= t_1 c\,i_{\mathrm{mul}}(0)/c_1)$. Therefore, the initial scattering intensity in the low q-range, $i_{1l}(q_l)^*$, is semi-empirically set as follows:

$$i_{1l}(q_l)^* = i_1(0)\exp\left(-\frac{q_l^2 R_{g1}^{*2}}{3}\right)$$

Here, the initial gyration radius of the monomer, $R_{g1}^*$, is chosen so that $i_{1l}(q_l)^*$ connects smoothly to $i_{1h}(q_h)^*$ at the connection point $q_c$ (Supplementary Note 3-1). The initial whole scattering intensity of the monomer, $i_1(q)^*$, is then given by $i_{1l}(q_l)^*\ (q_l < q_c)$ and $i_{1h}(q_h)^*\ (q_h \geq q_c)$ (Supplementary Fig. 5).

5A-2. Refinement of the whole scattering intensity, $R(i_1(q)^*)$: The initial whole scattering intensity $i_1(q)^*$ may involve errors arising from the semi-empirically obtained $i_{1h}(q_h)^*$ and $i_{1l}(q_l)^*$. To refine $i_1(q)^*$, we utilized an expanded Guinier formula, which holds up to a relatively higher q-range and is derived from the polynomial expansion of the Debye formula 15 (Supplementary Figs. 6 and 7, Supplementary Notes 3-2 and 3-3). The final result $i_1(q)$, the full scattering profile of the monomer extracted by AUC-SAS, is shown by the closed blue circles in Fig. 2a and Supplementary Fig. 8. The structural parameters extracted by the above procedure agree with those from SEC-SAXS, as listed in Supplementary Table 1, and the derived three-dimensional ab initio model 2 reproduces the crystal structure well (Fig. 2a). AUC-SAS has advantages over SEC-SAXS for obtaining the monomer scattering intensity from a solution including aggregates.
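Step 5A-1 can be sketched by matching the Guinier form $i_{1l}(q)^* = i_1(0)\exp(-q^2R_{g1}^{*2}/3)$ to the tentative high-q intensity at a single connection point $q_c$. Taking $q_c$ as the first data point above $q^*$, and matching values only (not slopes), are assumptions of this sketch; the paper's criterion for a smooth connection is in Supplementary Note 3-1, and the refinement of Step 5A-2 is not included. The helper name `initial_monomer_profile` is hypothetical.

```python
import numpy as np

def initial_monomer_profile(q, i_exp, i1_0, q_star):
    """Step 5A-1 sketch: Guinier low-q part joined to i_exp at q_c.

    Assumes at least one q > q_star and i1_0 > i_exp(q_c), so that
    R*_g1 = sqrt(3 ln(i1_0 / i_exp(q_c))) / q_c is real.
    """
    q = np.asarray(q, dtype=float)
    i_exp = np.asarray(i_exp, dtype=float)
    k = int(np.argmax(q > q_star))          # index of the first q above q*
    q_c = q[k]
    Rg_star = np.sqrt(3.0 * np.log(i1_0 / i_exp[k])) / q_c
    guinier = i1_0 * np.exp(-(q * Rg_star) ** 2 / 3.0)
    i1 = np.where(q < q_c, guinier, i_exp)  # low-q Guinier, high-q measured
    return i1, Rg_star, q_c
```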
The sample amount required for AUC-SAS is 0.1-0.25 mg of protein (1-3 mg/mL in 30 μL for SAXS and 50 μL for SV-AUC), whereas even recent high-performance SEC-SAXS (https://www-ssrl.slac.stanford.edu/smbsaxs/content/documentation/sec-saxs/introduction) requires 0.2-0.5 mg of protein (4-10 mg/mL in 50 μL). Furthermore, AUC-SAS offers better resolution for molecular separation than SEC and is not affected by the quick re-assembly of monomers after separation, which is a problem with SEC. However, the present AUC-SAS protocol also has a limitation: it provides correct results only up to an aggregate concentration of around 12%. This is discussed in Supplementary Note 4 (Supplementary Figs. 11-13 and Supplementary Table 3).
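The quoted sample amounts follow directly from concentration times volume (treating the SAXS and SV-AUC aliquots as separate portions):

```latex
m_{\mathrm{AUC\text{-}SAS}} = (1\text{--}3\ \mathrm{mg\,mL^{-1}}) \times (30 + 50)\ \mu\mathrm{L}
  = 0.08\text{--}0.24\ \mathrm{mg} \approx 0.1\text{--}0.25\ \mathrm{mg}
\\
m_{\mathrm{SEC\text{-}SAXS}} = (4\text{--}10\ \mathrm{mg\,mL^{-1}}) \times 50\ \mu\mathrm{L}
  = 0.2\text{--}0.5\ \mathrm{mg}
```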
Extraction of complex scattering. AUC-SAS can extract the scattering intensity of a complex under an association-dissociation equilibrium. It should be noted that SEC-SAXS cannot observe a weakly bound complex, because the SEC process destroys it (Supplementary Fig. 14). In contrast, AUC can reveal the concentration distribution of all components under an association-dissociation equilibrium even when the complex is weakly bound. For the equilibrium system A + B ↔ AB, Eq. (1) is explicitly rewritten as

$$I_{\mathrm{exp}}(q) = c\, i_{\mathrm{exp}}(q) = c_A\, i_A(q) + c_B\, i_B(q) + c_{AB}\, i_{AB}(q) \tag{6}$$

The AUC-SAS protocol is slightly modified to extract $i_{AB}(q)$ from $I_{\mathrm{exp}}(q)$. Here, $i_A(q)$ and $i_B(q)$ are known from their individual SAXS measurements (Step 1), so Steps 3 and 4 are skipped. Therefore, the key is to determine accurate concentrations of all components with AUC (Step 2) and to perform the combined analysis (Step 5B). The modified protocol is demonstrated here with an association-dissociation equilibrium system of hHR23b-UBL (component A) and PNGase-PUB (component B), which forms a weakly bound complex 16 .
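For the A + B ↔ AB system, $I_{\mathrm{exp}}(q) = c_A i_A(q) + c_B i_B(q) + c_{AB} i_{AB}(q)$ is linear in the single unknown $i_{AB}(q)$. A minimal sketch of Step 5B, assuming all intensities share one q grid and absolute scale (the helper name `complex_intensity` is hypothetical):

```python
import numpy as np

def complex_intensity(I_exp, cA, iA, cB, iB, cAB):
    """Solve I_exp = cA*iA + cB*iB + cAB*iAB for iAB(q).

    I_exp       : measured mixture intensity, I_exp(q) = c * i_exp(q)
    iA, iB      : per-concentration intensities from separate SAXS runs
    cA, cB, cAB : equilibrium mass concentrations of A, B and AB
    """
    I_exp, iA, iB = (np.asarray(x, dtype=float) for x in (I_exp, iA, iB))
    return (I_exp - cA * iA - cB * iB) / cAB
```

Because the result is divided by $c_{AB}$, errors in the subtracted terms are amplified when the complex is a small fraction of the mixture, which is why accurate concentrations from Step 2 are the key input.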
Step 2: AUC measurement. SV-AUC cannot provide the concentrations of all components ($c_A$, $c_B$, and $c_{AB}$) for a system undergoing fast association-dissociation (Supplementary Fig. 16, Supplementary Note 5). Therefore, the concentrations of all components should be calculated from the dissociation constant $K_D$ measured by sedimentation equilibrium AUC (SE-AUC). Here, $K_D$ for the demonstration system was measured by SE-AUC at three concentrations and three rotation speeds, and $c_A$, $c_B$, and $c_{AB}$ were obtained (Fig. 6b, Supplementary Fig. 17, Supplementary Table 5, and Supplementary Note 6). It is important to confirm the absence of aggregates with SV-AUC (Supplementary Note 5) prior to the SE-AUC measurements.
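Turning $K_D$ into species concentrations is standard 1:1 mass action: with total molar concentrations $A_0$ and $B_0$, $[AB]$ is the smaller root of $[AB]^2 - (A_0 + B_0 + K_D)[AB] + A_0B_0 = 0$. The sketch converts mass concentrations to molar units and back; the helper name is hypothetical, and the $K_D$ and totals in the test are illustrative only (the molecular weights, 9.5 and 12.5 kDa, are those given in Methods).

```python
import math

def equilibrium_mass_concentrations(cA_tot, cB_tot, MA, MB, KD):
    """Species mass concentrations for A + B <-> AB from K_D (1:1 binding).

    cA_tot, cB_tot : total mass concentrations of A and B (g/L = mg/mL)
    MA, MB         : molecular weights (g/mol)
    KD             : dissociation constant (mol/L)
    Returns (c_A, c_B, c_AB) in g/L; their sum equals cA_tot + cB_tot.
    """
    A0, B0 = cA_tot / MA, cB_tot / MB           # total molar concentrations
    b = A0 + B0 + KD
    AB = (b - math.sqrt(b * b - 4.0 * A0 * B0)) / 2.0   # physical (smaller) root
    return (A0 - AB) * MA, (B0 - AB) * MB, AB * (MA + MB)
```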
Step 5B: Extraction of scattering intensity of complex. The scattering intensity of the complex, $i_{AB}(q)$ (closed blue circles in Fig. 6a), is obtained from Eq. (6) with the intensities and concentrations obtained in Steps 1 and 2. The extracted scattering intensity was subjected to three-dimensional modeling as well as size analysis (Supplementary Fig. 18, Supplementary Table 4, Supplementary Note 7). This is the first report of detailed structural information on this complex in solution.

Discussion
Recently, SAS has been required to study biomacromolecular structures in more complicated multi-component solutions, for example, an association-dissociation equilibrium system involving aggregates. As an advanced example demonstrating this technique, the nucleosome was measured with AUC-SAS (Supplementary Note 8). The sample was highly purified but still included small amounts of aggregates, liberated histone complex, and DNA (Supplementary Fig. 19a, Supplementary Table 6). Therefore, both techniques, aggregation removal and complex scattering extraction, were required to find the precise scattering intensity of the nucleosome in solution. As shown in Supplementary Fig. 19b, c and Supplementary Table 7, AUC-SAS succeeded in providing the scattering intensity of the nucleosome, from which the 3D structure was reconstructed (Supplementary Fig. 20).
For many years, aggregation has been a real and difficult obstacle in bio-SAS. AUC-SAS overcomes this aggregation problem and offers a precise scattering intensity of a biomacromolecule of interest. In these days, the structural analysis of oligomers 17 (Supplementary Note 8). Furthermore, AUC-SAS also enables the structural analysis of weakly bound complexes. In conclusion, AUC-SAS has the potential to become one of the standard methods for analyzing the structures of biomacromolecules in solution, as shown in Supplementary Fig. 20.
Finally, we would like to remark on the following. AUC-SAS does not require the very high-intensity beam needed for flow experiments such as SEC-SAXS. Therefore, AUC-SAS performed with laboratory-based SAXS or standard SANS has the potential to be a complementary method to synchrotron-based SEC-SAXS for the structural analysis of biomacromolecules in solution.

Methods
Samples. BSA (product# A2153), ovalbumin (OVA; product# A5503), and apoferritin (AF; product# A3641) purchased from Sigma Aldrich Co. were dissolved in 100 mM Tris/HCl (pH 7.5) buffer containing 100 mM NaCl. The solutions were purified by anion-exchange chromatography with a Resource Q 6 mL column (GE Healthcare) followed by size exclusion chromatography with a Superdex 200 Increase 10/300 GL column (GE Healthcare). The mass concentrations subjected to SAXS and AUC measurements were 2.29 mg/mL for BSA1 (used to demonstrate the AUC-SAS protocol in the main text), 3.15 mg/mL for BSA2-4 (used for the concentration boundary check in Supplementary Note 4), 2.00 mg/mL for OVA, and 1.81 mg/mL for AF. The quality of BSA1 was checked by SDS-PAGE. As shown in Supplementary Fig. 2a, no clear aggregation or contamination was observed in the sample.
A mixture solution of the ubiquitin-like domain of the proteasome shuttle factor hHR23b (hHR23b-UBL; PDB code, 1P1A) and the PUB domain of peptide:N-glycanase (PNGase-PUB; PDB code, 2D5U) was used as a demonstration system that forms a weakly bound complex under an association-dissociation equilibrium. The molecular weights of hHR23b-UBL and PNGase-PUB are 9.5 and 12.5 kDa, respectively. The expression and purification of the proteins have been described previously 16 . The nucleosome was prepared in the same manner as in the previous report 18 . The sample was dialyzed against 20 mM Tris/HCl (pH 7.5) buffer containing 50 mM NaCl and 1 mM DTT. The mass concentration subjected to SAXS and AUC measurements was 1.29 mg/mL.
Small-angle X-ray scattering. All SAXS measurements were carried out with a laboratory-based instrument, NANOPIX (Rigaku), equipped with a high-brilliance point-focused Cu-Kα source (MicroMAX-007 HFMR, wavelength λ = 1.54 Å). The scattered X-rays were detected with a two-dimensional semiconductor detector (HyPix-6000) with a spatial resolution of 100 μm. The sample-to-detector distances were set to 1280 mm (covered q-range: 0.010-0.20 Å⁻¹) for the BSA, OVA, AF, and nucleosome experiments, and 355 mm (covered q-range: 0.030-0.70 Å⁻¹) for the PNGase-PUB + hHR23b-UBL and nucleosome experiments. Two-dimensional scattering patterns were converted to one-dimensional scattering intensities with SAngler software 19 . After correction for transmittance and subtraction of the buffer scattering, the absolute-scaled scattering intensity was obtained by reference to the standard scattering intensity of water (1.632 × 10⁻² cm⁻¹) 20 . All measurements were conducted at 25°C.
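The distances and q-ranges above are linked by standard scattering geometry: $q = 4\pi\sin\theta/\lambda$ with $2\theta = \arctan(r/L)$ for a flat detector perpendicular to the beam at distance $L$ and radial position $r$ (with 100 μm pixels, $r$ = pixel offset × 0.1 mm). This sketch ignores detector tilt and beam-centre refinement, and the helper names are hypothetical.

```python
import numpy as np

WAVELENGTH = 1.54  # Cu-Kα wavelength in Å, as in the setup above

def q_from_radius(r_mm, distance_mm, wavelength=WAVELENGTH):
    """q (Å^-1) at radial distance r from the beam centre on a flat detector."""
    two_theta = np.arctan(np.asarray(r_mm, dtype=float) / distance_mm)
    return 4.0 * np.pi * np.sin(two_theta / 2.0) / wavelength

def radius_from_q(q, distance_mm, wavelength=WAVELENGTH):
    """Inverse mapping: radial distance (mm) at which a given q is recorded."""
    two_theta = 2.0 * np.arcsin(np.asarray(q, dtype=float) * wavelength / (4.0 * np.pi))
    return distance_mm * np.tan(two_theta)
```

Under these assumptions, both the 1280 mm/0.20 Å⁻¹ and the 355 mm/0.70 Å⁻¹ settings correspond to a radial distance of roughly 62 mm on the detector, consistent with the same detector being used at both distances.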
Analytical ultracentrifugation. AUC measurements were conducted with a ProteomeLab XL-I analytical ultracentrifuge (Beckman Coulter). Cells with a small volume (optical path: 1.5 mm, volume: 50 μL, Nanolytics) were used for the measurements. Two methods, sedimentation velocity AUC (SV-AUC) and sedimentation equilibrium AUC (SE-AUC), were used depending on the sample. The former measures the sedimentation speed of particles and gives the sedimentation coefficient distribution $c(s_{20,w})$. The latter observes the concentration gradient at sedimentation equilibrium, which provides the dissociation constant $K_D$ of an association-dissociation system. The sample solutions were loaded at 50 μL for SV-AUC and 20 μL for SE-AUC. SV-AUC was performed with Rayleigh interference optics at a rotor speed of 40,000 r.p.m. for BSA, OVA, AF, and nucleosome, and 60,000 r.p.m. for hHR23b-UBL, PNGase-PUB, and their mixture. SE-AUC was carried out with absorbance optics at 20,000, 30,000, and 35,000 r.p.m. for PNGase-PUB and hHR23b-UBL. All measurements were conducted at 25°C. The SV-AUC data were analyzed with SEDFIT (http://www.analyticalultracentrifugation.com/sedfit.htm) software, which fits the data with the Lamm equation 14 . The sedimentation coefficients were converted to values at 20°C in pure water ($s_{20,w}$). The molecular weight of each component was calculated from the peak $s_{20,w}$ and the friction ratio $f_r$. The weight fraction $c_j$ of each component was obtained from the peak area. The SE-AUC data were analyzed with SEDPHAT (http://www.analyticalultracentrifugation.com/sedphat/default.htm) software, which fits the SE-AUC results with the association-dissociation equilibrium model as follows 21 :

$$a(r, K_D) = \varepsilon_A c_A(r_0)\, e^{\frac{M_A(1-\bar{v}_A\rho)\omega^2}{2RT}(r^2-r_0^2)} + \varepsilon_B c_B(r_0)\, e^{\frac{M_B(1-\bar{v}_B\rho)\omega^2}{2RT}(r^2-r_0^2)} + \varepsilon_{AB}\,\frac{c_A(r_0)\,c_B(r_0)}{K_D}\, e^{\frac{M_{AB}(1-\bar{v}_{AB}\rho)\omega^2}{2RT}(r^2-r_0^2)}$$

where $a(r, K_D)$ is the absorbance at radius $r$ for the equilibrium system with dissociation constant $K_D$, $\varepsilon$ is the extinction coefficient, $c(r_0)$ is the concentration at the reference radius $r_0$, $\omega$ is the angular velocity, $M$ is the molecular weight, $\bar{v}$ is the partial specific volume, $\rho$ is the solvent density, and $RT$ is the product of the gas constant and the absolute temperature. To determine the free parameters accurately, we carried out a global fitting analysis of the three different concentrations and three different rotation speeds with a common $K_D$. For the analysis, the partial specific volumes ($\bar{v}$) of the proteins were calculated from their amino acid sequences with SEDNTERP (http://www.jphilo.mailway.com/download.htm) software. The density and viscosity of the solvents were measured with a DMA4500M density meter (Anton Paar) and a Lovis 2000 M/ME viscometer (Anton Paar), respectively.
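A standard form of the A + B ↔ AB sedimentation-equilibrium model, assumed here, sums over A, B and AB terms $\varepsilon_k c_k(r_0)\exp[M_k(1-\bar{v}_k\rho)\omega^2(r^2-r_0^2)/(2RT)]$ with $c_{AB}(r_0) = c_A(r_0)c_B(r_0)/K_D$. The sketch below evaluates that profile in SI units; it is an illustration, not SEDPHAT's implementation, omits baseline terms, and would need a least-squares optimizer wrapped around it for the global fit across concentrations and speeds. Helper names and the parameter values in the test are hypothetical.

```python
import numpy as np

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1

def buoyant_exponent(M_kg, vbar_m3kg, rho_kgm3, omega, r, r0, T):
    """M (1 - vbar rho) omega^2 (r^2 - r0^2) / (2 R T), all in SI units."""
    return M_kg * (1.0 - vbar_m3kg * rho_kgm3) * omega**2 * (r**2 - r0**2) / (2.0 * R_GAS * T)

def se_absorbance(r, r0, cA0, cB0, KD, eps, M, vbar, rho, rpm, T=298.15):
    """Absorbance profile a(r, K_D) for A + B <-> AB at sedimentation equilibrium.

    r, r0    : radii in metres (r0 is the reference radius)
    cA0, cB0 : molar concentrations of free A and B at r0
    KD       : dissociation constant (mol/L)
    eps      : molar absorbance (extinction coefficient x path length) for (A, B, AB)
    M, vbar  : molecular weights (kg/mol) and partial specific volumes (m^3/kg) for (A, B, AB)
    """
    omega = 2.0 * np.pi * rpm / 60.0          # rotor speed in rad/s
    r = np.asarray(r, dtype=float)
    cAB0 = cA0 * cB0 / KD                      # mass action at the reference radius
    terms = zip(eps, (cA0, cB0, cAB0), M, vbar)
    return sum(e * c0 * np.exp(buoyant_exponent(m, v, rho, omega, r, r0, T))
               for e, c0, m, v in terms)
```

If $M_{AB}(1-\bar{v}_{AB}\rho)$ equals the sum of the A and B buoyant masses, the mass-action relation then holds at every radius, not only at $r_0$.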
Statistics and reproducibility. The fittings of the Guinier formula to derive $i(0)$ and $R_g$ were performed with the linear least-squares method in Igor Pro (7.04), and the fitting of the SE-AUC data to derive $K_D$ was performed with the non-linear least-squares method (Levenberg-Marquardt algorithm), also in Igor Pro (7.04). The errors were calculated by error propagation with $\chi^2$ and are listed in Supplementary Tables 1, 3, 4 and 7.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Source data for Figs. 2, 3, 6a, b are included in Supplementary Data 1-4, respectively. Any other data in the supplementary materials are available from the authors upon reasonable request.