Automated Pipeline for Purification, Biophysical and X-Ray Analysis of Biomacromolecular Solutions

Small angle X-ray scattering (SAXS), an increasingly popular method for structural analysis of biological macromolecules in solution, is often hampered by inherent sample polydispersity. We developed an all-in-one system combining in-line sample component separation with parallel biophysical and SAXS characterization of the separated components. The system coupled to an automated data analysis pipeline provides a novel tool to study difficult samples at the P12 synchrotron beamline (PETRA-3, EMBL/DESY, Hamburg).


S1. Automated SAXS data collection and analysis.
For automatic data collection and analysis, the pipeline currently in operation at P12 [1] was extended with additional modules specifically for SEC-SAXS/TDA experiments (Fig. 1c), so that user intervention for individual runs is kept to a minimum. The user simply loads the sample onto the chromatography system (recording the volume), follows the standard safety measures, and sets the interlock system of the experimental hutch before starting the data collection (SYNC). The required input parameters primarily consist of the sample name, the desired exposure time (default 1 s) and the number of frames (typically 1000-4000), which depends on the chromatographic separation step, i.e. the column volume and the reduced delivery flow rate (typically 0.2-0.3 ml/min).
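The dependence of the frame count on elution volume, flow rate and exposure time can be sketched as follows. This is a minimal illustration and not part of the P12 pipeline: the function name, the `dead_time_s` parameter and the 10 ml elution window in the example are assumptions made here.

```python
import math


def frames_for_run(window_volume_ml, flow_rate_ml_min, exposure_s=1.0, dead_time_s=0.0):
    """Estimate how many frames are needed to cover an elution window.

    window_volume_ml : elution volume to be covered by SAXS frames (hypothetical input)
    flow_rate_ml_min : delivery flow rate of the SEC run
    exposure_s       : exposure time per frame (1 s is the default quoted in the text)
    dead_time_s      : assumed per-frame overhead; 0 if frames are contiguous
    """
    if flow_rate_ml_min <= 0 or exposure_s <= 0:
        raise ValueError("flow rate and exposure time must be positive")
    run_time_s = window_volume_ml / flow_rate_ml_min * 60.0
    return math.ceil(run_time_s / (exposure_s + dead_time_s))


# Example: a 10 ml window at 0.25 ml/min with 1 s frames
n_frames = frames_for_run(10.0, 0.25)
```

With these example values the run lasts 40 min, so the estimate falls inside the typical 1000-4000 frame range.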
Submission of the run parameters to the beamline meta server (BMS) immediately starts the collection of TDA data. The user can decide whether the SAXS data should be collected immediately or closer to the time point at which the protein of interest is expected to elute. For the latter case, the BMS can be instructed to wait until a rise in the RI signal is detected before triggering data acquisition.
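How the BMS detects the rise in RI signal is not described here. One simple possibility, given purely as an assumed sketch, is to estimate a baseline from the first RI samples and report the first point at which the signal stays above a noise threshold. The function name and all parameters (`n_baseline`, `n_sigma`, `n_consecutive`) are hypothetical.

```python
import numpy as np


def first_rise_index(ri_signal, n_baseline=50, n_sigma=5.0, n_consecutive=5):
    """Return the index at which the RI trace first rises above baseline.

    The baseline mean and standard deviation are taken from the first
    n_baseline samples. A rise is reported once n_consecutive samples in a
    row exceed mean + n_sigma * std. Returns None if no rise is found.
    """
    ri = np.asarray(ri_signal, dtype=float)
    baseline = ri[:n_baseline]
    threshold = baseline.mean() + n_sigma * baseline.std(ddof=1)
    run_length = 0
    for i in range(n_baseline, ri.size):
        run_length = run_length + 1 if ri[i] > threshold else 0
        if run_length == n_consecutive:
            return i - n_consecutive + 1  # first sample of the sustained rise
    return None
```

Requiring several consecutive samples above the threshold keeps a single noisy reading from triggering the acquisition too early.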
During the SEC-SAXS/TDA run, the BMS checks a configurable file system location for incoming data files and starts the radial averaging of the two-dimensional (2D) images into one-dimensional (1D) curves, incorporating the header information (txt). Once a run is completed and all the 1D files have been generated, the data processing pipeline is started. The first step in data processing is the subtraction of the solvent blank to obtain the scattering from the macromolecules. For this, frames comprising only buffer components are identified (BUFFER). Frames collected at the beginning of the run are evaluated in terms of their probability of similarity by comparison of the respective correlation maps (CORMAP) [2]. In some cases, frames closer to the elution peak are more suitable for buffer subtraction; these can be identified by inspection of the RI signal, which is very sensitive to changes in buffer composition during the SEC run. Statistically similar data frames are averaged (DATAVER) and then subtracted (DATOP) from each acquired data frame to produce reduced scattering profiles (Reduced).
Running AUTORG on each frame yields the forward scattering I(0) and the radius of gyration Rg, which can be plotted against the respective frame number from the automatically generated I(0) vs frame CSV file. This plot can be used to evaluate the outcome of the experiment (e.g., the stability of Rg across an elution peak (Fig. 2b)) and to correlate the SAXS data with the RITDA data (CORR). The TDA elution profile is brought onto the same scale as the SAXS frame numbers by shifting the trace along the x-axis, giving an overlay of I(0) with the RI elution profile, which is directly proportional to the concentration of the eluting components. The TDA output file can be generated automatically with the Omnisec software (Malvern Instruments Ltd., Malvern, UK).
It is, however, recommended that the user check and process the TDA data with the available Omnisec software, as the accuracy of the molecular weight estimation (MWRALS) and concentration improves by manually setting baselines and integration limits. Correlation of the SAXS and TDA data allows RI concentration to be extracted for each frame, so that the respective scattering profile can then be
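The reduction steps described in this section (averaging buffer frames, subtracting them from every frame, and extracting I(0) and Rg per frame) can be sketched with NumPy. This is an illustrative stand-in and not the DATAVER, DATOP or AUTORG implementations: the buffer frames are assumed to be already selected, and the Guinier range is chosen by a simple iteration on q·Rg ≤ 1.3, which is an assumption made for this example.

```python
import numpy as np


def subtract_buffer(frames, buffer_idx):
    """Subtract the averaged buffer frames from every frame.

    frames     : array of shape (n_frames, n_q) with 1D intensities
    buffer_idx : indices of frames assumed to contain buffer only
    """
    frames = np.asarray(frames, dtype=float)
    buffer_mean = frames[buffer_idx].mean(axis=0)
    return frames - buffer_mean


def guinier_fit(q, intensity, q_rg_max=1.3, n_init=10, n_iter=10):
    """Estimate I(0) and Rg from ln I(q) = ln I(0) - (Rg^2 / 3) q^2.

    Starts from the first n_init points and iteratively restricts the fit
    to q * Rg <= q_rg_max. Returns (nan, nan) if no valid fit is found.
    """
    q = np.asarray(q, dtype=float)
    inten = np.asarray(intensity, dtype=float)
    mask = np.zeros(q.size, dtype=bool)
    mask[:n_init] = True
    i0 = rg = float("nan")
    for _ in range(n_iter):
        use = mask & (inten > 0)          # the logarithm needs positive values
        if use.sum() < 3:
            return float("nan"), float("nan")
        slope, intercept = np.polyfit(q[use] ** 2, np.log(inten[use]), 1)
        if slope >= 0:                    # not a physical Guinier region
            return float("nan"), float("nan")
        rg = float(np.sqrt(-3.0 * slope))
        i0 = float(np.exp(intercept))
        new_mask = q * rg <= q_rg_max
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return i0, rg
```

Applying `guinier_fit` to every row returned by `subtract_buffer` gives the I(0) and Rg traces that are plotted against frame number.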

S2. Comparison of different methods for molecular weight estimations.
The emphasis of this research is to provide a method to separate polydisperse systems into their respective monodisperse components, so as to increase the confidence in MW determinations and in the subsequent analysis of the measured SAXS data. Assessing MW is a key step in the interpretation of biological solution SAXS data [9]. By collecting additional biophysical data on the analysed components using the TDA, MW estimates of the components of a sample can be derived and cross-correlated with the estimates from SAXS. It is important to understand the assumptions on which these MW estimates rest and to define their accuracy.

S2.1. SAXS-based MW estimates.
The molecular weight (MW) of a component can be estimated from the SAXS data via:

S2.1.1. The forward scattering intensities at zero angle, I(0).
The scattering intensity at zero angle, I(0), obtained from the Guinier approximation [3,4] or from the real-space distance distribution, p(r) [10], is directly related to the volume and scattering length density of a particle and can thus be used to determine the molecular weight of monodisperse samples. The MW from I(0), MWI(0), can be determined by calibrating the sample scattering relative to a standard with the same scattering length density. For example, the I(0) determined from standard proteins of known molecular weight, such as lysozyme [11], bovine serum albumin [12] or glucose isomerase [13], can be used for calibration and estimation of the molecular weights of protein samples. This estimate is, however, strongly dependent on accurate determination of the concentrations of both the protein sample and the standard protein used for the calibration. The accuracy of the MW determination using this method has been estimated to be around 10-15% [14]. Similarly, Lupolen [15] or water [16] can be used for an absolute calibration of the scattering intensities (I(q), cm⁻¹); here, however, the estimate of the partial specific volume of the protein (which can be calculated from the primary amino acid sequence, e.g., using NucProt [17]) may be a source of error. For the data collected with the SEC-SAXS/TDA set-up described here, the I(0) values automatically determined from each processed frame were normalized based on a batch measurement of BSA at known concentration. MWI(0) was then determined from the I(0) of the final PEAK scattering profile combined with the corresponding concentration determined from the RI measurements of the TDA.
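The calibration against a protein standard amounts to comparing concentration-normalized forward scattering. A minimal sketch, assuming that sample and standard share the same scattering length density and partial specific volume (the function and argument names are hypothetical):

```python
def mw_from_i0(i0_sample, conc_sample, i0_std, conc_std, mw_std):
    """Estimate MW from I(0) relative to a standard of known MW.

    Uses MW = MW_std * (I0 / c) / (I0_std / c_std). Concentrations must be
    in the same mass units (e.g. mg/ml) for sample and standard.
    """
    if conc_sample <= 0 or conc_std <= 0 or i0_std <= 0:
        raise ValueError("concentrations and standard I(0) must be positive")
    return mw_std * (i0_sample / conc_sample) / (i0_std / conc_std)


# Example with made-up intensities, calibrated against BSA monomer (66 kD):
# the sample scatters twice as strongly per unit concentration.
mw_example = mw_from_i0(i0_sample=400.0, conc_sample=2.0,
                        i0_std=100.0, conc_std=1.0, mw_std=66.0)
```

Because the concentration enters as a simple divisor, any relative error in either concentration propagates directly into the MW estimate, which is why this method depends so strongly on accurate concentration measurements.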

S2.1.2. The excluded volume, MWVp and MW3D.
The MW estimates based on the Porod volume (Vp, MWVp) or on a derived ab initio bead model (MW3D) are computed without the need to normalize the SAXS data against a known standard. Consequently, these MW estimates do not depend on accurate concentration estimates. However, a number of other factors, such as particle anisometry and flexibility, influence the relationship between MW and the excluded volume [3].
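For illustration, the Porod volume follows from the Porod invariant Q = ∫ q² I(q) dq as Vp = 2π² I(0) / Q, and the volume is then divided by an empirical volume-to-MW ratio. The sketch below is a simplified version under stated assumptions: it integrates over the measured q-range only, without the high-q corrections that dedicated software applies, and it takes the volume-to-MW ratio as an input. The summary of methods in this document gives V3D/2 = MW3D for bead models; the ratio for the Porod volume is not stated in the text and must be supplied by the reader. Units are assumed to be nm⁻¹ for q, nm³ for volumes and kD for MW.

```python
import numpy as np


def porod_volume(q, intensity, i0):
    """Porod volume Vp = 2 * pi^2 * I(0) / Q over the measured q-range.

    Q is the Porod invariant, integrated here with the trapezoidal rule.
    Truncating the integral at finite q makes Q too small and Vp too large.
    """
    q = np.asarray(q, dtype=float)
    y = q ** 2 * np.asarray(intensity, dtype=float)
    invariant = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(q))
    return 2.0 * np.pi ** 2 * i0 / invariant


def mw_from_volume(volume_nm3, volume_per_kd):
    """Convert a particle volume into MW using an empirical ratio (nm^3 per kD)."""
    return volume_nm3 / volume_per_kd


# Bead-model example using the V3D / 2 relation quoted in this document
mw3d_example = mw_from_volume(200.0, 2.0)
```

For a homogeneous sphere the truncated integral overestimates the true volume by a few percent, one concrete way in which the angular range of the data affects the MW estimate.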

S2.2. MW estimates from standardized SEC and with the described TDA set-up.
The molecular weights estimated from the SAXS data can be validated against the MW estimates from SEC, either using a standardized SEC-column or from RI(UV)/RALS data obtained from the TDA.

S2.2.1. Standardized SEC column MW estimates (MWSEC).
The separation of components by size exclusion chromatography relies on the different migration behaviour of molecules through a medium of porous beads [19]. Small, globular proteins penetrate these pores more easily and therefore elute from the column with an increased retention time compared to larger molecules. Any SEC column can be calibrated with 4-5 standard proteins, and comparison of the elution volumes provides a rough estimate of MWSEC without the need for RALS measurements. However, separation depends on the hydrodynamic volume rather than on MW, and the migration behaviour can be altered by interactions of the column matrix with the mobile phase.
Consequently, the MW estimate from a calibrated SEC column, based only on retention time, can be erroneous. For example, PlaB, analysed in this study, displays an increased retention volume (main peak at 10.8 ml). The elution volumes of the different oligomeric components of BSA with this set-up are: monomer (66 kD) at 12 ml, dimer (132 kD) at 10.5 ml, and trimer (197 kD) at 9.45 ml. Thus, from the PlaB elution profile alone, one would conclude that it is a dimer (Fig. 2a). However, when the MW is determined from both the SAXS data and the RI/RALS measurements from the TDA, PlaB is unambiguously identified as a tetramer.
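The column calibration can be illustrated with the BSA elution volumes quoted above. The sketch assumes the usual log-linear relation between MW and elution volume, which is not stated explicitly in the text, and uses only the three BSA oligomers as calibrants rather than the 4-5 standard proteins recommended for a real calibration. The function names are hypothetical.

```python
import numpy as np


def fit_sec_calibration(elution_ml, mw_kd):
    """Fit log10(MW) = slope * V_elution + intercept by least squares."""
    volumes = np.asarray(elution_ml, dtype=float)
    log_mw = np.log10(np.asarray(mw_kd, dtype=float))
    slope, intercept = np.polyfit(volumes, log_mw, 1)
    return slope, intercept


def mw_from_elution(volume_ml, slope, intercept):
    """Apparent MW (kD) at a given elution volume from the calibration line."""
    return 10.0 ** (slope * volume_ml + intercept)


# BSA monomer, dimer and trimer as quoted in the text
bsa_volumes_ml = [12.0, 10.5, 9.45]
bsa_mw_kd = [66.0, 132.0, 197.0]
slope, intercept = fit_sec_calibration(bsa_volumes_ml, bsa_mw_kd)

# Apparent MW at the PlaB main peak (10.8 ml)
plab_apparent_kd = mw_from_elution(10.8, slope, intercept)
```

The apparent value at 10.8 ml lies between the BSA monomer and dimer. It reflects hydrodynamic volume and column interactions only, which is why it disagrees with the tetramer assignment obtained from SAXS and RI/RALS.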

S2.2.2. MW estimates from the TDA data (MWRALS).
The inclusion of the TDA for molecular weight validation of the components eluting from a SEC column overcomes the problem of relying on column retention times to estimate the molecular mass of a sample. For particles in the size range of proteins, Rayleigh light scattering principles essentially apply, so that the scattering intensities recorded at 90° (RALS) relate to the size of a macromolecule in solution. As in SAXS, these RALS intensities have to be normalized to the particle concentration and calibrated against a MW standard (e.g., BSA) with a known concentration and injection volume to obtain a MWRALS estimate. By correlating the RALS signal with the RI (or UV) concentration, MWRALS can be estimated across an elution peak independently of the retention volume, and the stability of this MW estimate can be assessed across the peak. The error in the MW determination depends on the stability of the RI(UV)/RALS correlation and on the range chosen for the calculation. In our experience, the error is in the range of 5-10%.
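The slice-by-slice estimate can be sketched as follows, assuming that the RALS intensity is proportional to the product of MW and concentration and that sample and standard have the same refractive index increment (dn/dc). This is not the Omnisec algorithm; the function names and the `min_conc_fraction` cut-off are assumptions made for the example.

```python
import numpy as np


def rals_calibration_constant(rals_std, conc_std, mw_std):
    """Constant k such that MW = k * RALS / c, from a standard of known MW."""
    return mw_std * conc_std / rals_std


def mw_across_peak(rals, conc, k_cal, min_conc_fraction=0.2):
    """Per-slice MW across an elution peak and its relative spread.

    Only slices with concentration >= min_conc_fraction * max(conc) are
    used, since the RALS/concentration ratio becomes noisy in the tails.
    Returns (mw_per_slice, mean_mw, relative_std).
    """
    rals = np.asarray(rals, dtype=float)
    conc = np.asarray(conc, dtype=float)
    keep = conc >= min_conc_fraction * conc.max()
    mw = k_cal * rals[keep] / conc[keep]
    mean_mw = float(mw.mean())
    return mw, mean_mw, float(mw.std() / mean_mw)
```

A small relative spread across the selected slices indicates a stable RI(UV)/RALS correlation. Changing `min_conc_fraction` shows how the chosen range affects the estimate.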

Summary of various methods for MW estimation

MWI(0)
Basic principle: Forward scattering at zero angle is proportional to the volume squared and concentration of the solute and thus relates to the molecular weight of the solute.
Major source of error: Requires accurate estimation of solute concentration and knowledge of the partial specific volume. Contaminations, especially large species, contribute to the scattering.

MWVp
Basic principle: For globular proteins MW is proportional to the excluded volume of hydrated particles, which can be determined from the Porod invariant [3].
Major source of error: Ratio between volume and MW is not always consistent, especially for anisotropic or unfolded/flexible particles. Quality of MW estimates is dependent on the angular range of the collected data.

MW3D
Basic principle: For globular proteins MW is proportional to the volume of hydrated particles, which can be determined from 3D reconstructions.
Major source of error: Ratio between volume and MW is not always consistent. The V3D/2 = MW3D relation becomes inaccurate if a particle has a scattering length density different from that of a protein or comprises a mixture of scattering length densities (e.g., protein/DNA complex).

MWSEC
Basic principle: Comparison of migration through a stationary phase consisting of porous particles.
Major source of error: Separation based on hydrodynamic volume. Influenced by attractive or repulsive interactions between the column matrix and mobile phase.

OD absorption was determined with a Thermo Scientific NanoDrop (ND-1000) and correlated with RI measured with a table-top RUDOLPH Research Analytical J357 refractometer for a number of proteins. The correlation constant was derived by linear regression analysis. Future measurements with "tricky samples" will be required to conclude whether assessing changes in this correlation constant yields valuable information about the sample composition.