Introduction

Membrane proteins (MPs) are crucial for the normal functioning of an organism as they are involved in essential processes including signalling, nutrient and ion transport, maintaining biological membrane structure and integrity1,2,3,4. They are encoded by about 30% of the human genome5 and are major targets for modern therapeutic drugs. This makes structural studies of MP solutions extremely important both scientifically and medically6,7,8,9,10.

The defining structural feature of an integral MP is the presence of a large contiguous apolar surface, facilitating the insertion of the protein into the hydrophobic lipid alkyl chain interior of a biological membrane. In order to characterize MPs in vitro, one needs to substitute the natural phospholipid bilayer with an artificial amphiphilic environment that supports the functional state. At present the majority of investigators studying MPs utilize readily available and inexpensive surfactant systems or detergents (small amphiphilic molecules)6,8. However, a common problem observed is the destabilization of protein structure following exposure to detergent molecules11,12. Fortunately, the development of milder detergents and detergent stabilized bilayer systems (bicelles) has helped to overcome this problem7,8,13,14. Importantly, the solution behaviour of many commercially available detergents has been well characterised using a range of techniques, including small angle X-ray scattering (SAXS)15. Micellar solutions of such detergents have been shown to adopt characteristic sizes, which can be reliably separated from detergent-protein complexes of interest using chromatographic techniques, eg. high performance liquid chromatography (HPLC).

Recently, MPs have become increasingly popular objects for small angle scattering investigations16,17,18,19 thanks to the availability of improved online size-exclusion chromatography (SEC-SAXS) set-ups at dedicated SAXS synchrotron beamlines. Integrated HPLC facilitates simultaneous sample purification and X-ray data acquisition, separating the scattering signal from protein, detergent and protein-detergent complexes (PDCs) that appear in the elution profile. Direct measurement of the mobile phase (ie. buffer) and elution peaks corresponding to the macromolecule(s) of interest enable a rapid and robust sample characterization following a well-defined sequence of data reduction procedures20. These procedures are amenable to automation and can be further coupled to advanced analysis and modeling approaches for MP systems, such as those detailed below.

Reconstruction of protein shape from a detergent solubilised MP is hampered by the strong scattering signal of bound detergent, typically forming a corona around the lipophilic trans-membrane region. The corona consists of hydrophobic and hydrophilic components, with significantly different scattering/electron densities. Thus the results of standard SAXS modelling procedures are, in such cases, non-realistic and ambiguous. Using complementary data from small-angle neutron scattering (SANS) including contrast matching approaches to isolate scattering from each solution component, the ambiguity of models calculated from detergent solubilised MPs can be reduced21,22,23,24. It was recently shown that SANS data recorded at several contrast points together with an a priori estimate of detergent aggregation number, allow one to reliably reconstruct the shapes of MPs in detergent solutions24. In this publication it was suggested that a minimum of two SANS curves, collected under different contrast conditions, are required to obtain a stable solution. This requirement is necessary to isolate the signals from the two principal contributions to the scattering intensity, from the protein itself and from the hydrophillic part of the detergent corona. For this purpose the author developed an ab initio algorithm with a four-phase bead model for a protein: (1) protein, (2) detergent heads, (3) detergent tails, and (4) solvent24. An additional set of geometrical and symmetrical constraints was also required.

In comparison to shape reconstruction from multi-contrast SANS data, reconstruction of MPs from a single SAXS curve is significantly more challenging as the contributions from different phases are difficult to separate. Nevertheless, in view of the relative ease of access to X-ray facilities and lower sample consumption, compared to that of neutron sources, SAXS-based approaches to MP characterization are increasingly popular. A recent work on the tetrameric water-channel protein Aquaporin025 showed that if an atomic structure of the protein is available, it is possible to reconstruct the entire complex by combining online SEC-SAXS with refractometry measurements. In that case, it was essential to use the available a priori information about the PDC in order to provide sufficient restraints on the model for a meaningful reconstruction. In the present paper we further formalize this approach and develop an automatic pipeline to build models of PDC utilizing a priori information taken from a dedicated database.

Given the rapidly growing interest in the applications of SAXS to study MPs, systematic storage of the relevant information is urgently required. The ISPyB information system26, a database dedicated to biological SAXS (BioSAXS) and crystallographic experiments on synchrotron beamlines, offers an appropriate framework. ISPyB is already employed at major European synchrotrons (Petra-III, ESRF, SOLEIL, DIAMOND, MAX IV) to facilitate the data flow beginning with the sample tracking and ending with its structural characterization and visualization of the generated models.

In order to be used in this project, the ISPyB database was extended to store information on the available high-resolution structures of MPs (e.g. homologs or crystallographic models), amino-acid sequences in FASTA format and chemical formulae of the given detergent tails and heads. The latter are used for the calculation of electron densities and approximate sizes of the detergent corona in the modelling. The electron density values can also be provided directly by the user for the cases when the automatic calculations are impossible or the values are not considered to be sufficiently accurate. Depending on the amount of a priori information, the pipeline decides whether it is possible to automatically reconstruct the model and, if so, builds it utilizing either the multiphase ab initio program MONSA27 or the hybrid modelling program MEMPROT28. The starting parameters for the relevant fitting procedures are evaluated based on the available data stored in ISPyB, with the hybrid approach25 executing upon detection of a high resolution protein component. In the absence of a high resolution protein structure the ab initio reconstruction is executed utilizing information about the amino-acid sequence of the MP, chemical formula of the detergent and additional data on the other solvent/solute components to assess the shape of the PDC.

The automated membrane protein pipeline (AMPP) described herein, is implemented at the P12 beamline of the EMBL at Petra-III storage ring (synchrotron DESY, Hamburg, Germany) as an extension of a standard SEC-SAXS pipeline SASFLOW. It is integrated into the control software BECQUEREL29 operating the beamline. If the MP-related fields in the ISPyB database are lacking, BECQUEREL runs a standard pipeline mode for soluble proteins, using the program DAMMIF30 for a single-phase ab initio shape determination. The MP-related extension starts by default if the connection with ISPyB is established by the control software, and the membrane-specific fields in the ISPyB database are filled. The system has been optimized through extensive testing on different MPs in detergent solutions, and several examples of its applications are presented below.

Materials and Methods

Materials

The published SAXS data of the tetrameric α-helical membrane channel, Aquaporin0 solubilized by n-Dodecyl β-D-Maltoside25,28 was used for the pipeline testing. This sample corresponds to a reliably reconstituted PDC with known geometrical parameters of a detergent corona providing a test case with the modeling program MEMPROT28. The model of the Aquaporin0-detergent complex24,25,28 was used as a reference for the models generated by AMPP. Additional pipeline tests were conducted using the experimental SEC-SAXS data for the mechanosensitive T2 channel solubilized n-Dodecyl β-D-Maltoside31 downloaded from the SASBDB database (www.sasbdb.org, ID SASDCY6)32.

Computational methods

The AMPP is implemented as an extension of SASFLOW33, an automated data reduction and analysis pipeline currently running at P12. AMPP utilizes several modules and modeling programs available in the ATSAS package34 and, in addition, calls the program MEMPROT28. The primary data analysis is executed directly upon completion of a SEC-SAXS data acquisition and proceeds in the following workflow (Fig. 1):

  1. (1)

    Radial integration of 2D SAXS images by RADAVER33 (typically 1000–3000 frames are collected for a standard SEC-SAXS run set with one second exposure per image/frame).

  2. (2)

    Determination of the buffer and sample ranges from the elution profile using CHROMIXS35, an important analysis component of the pipeline specifically designed for SEC-SAXS analysis. CHROMIXS automatically identifies both buffer and sample regions in the elution profile, and performs frame averaging and subtraction to produce the SAXS curve of the purified component for subsequent analysis (Fig. 2). It is worth noting that CHROMIXS is capable of identifying multiple elution peaks, providing the users with a set of distinct SAXS curves from individual fractions, if present. Here we consider the simple case, where a single peak corresponding to PDC is identified automatically by CHROMIXS.

  3. (3)

    Assessment of the overall quality of the SEC-SAXS data and useability of the sample. Reliable SEC-SAXS data from a PDC suitable for further analysis should fullfil several requirements: i) the main peak in the CHROMIXS elution profile should correspond to a monodisperse fraction only, yielding similar scattering profiles across the entire sample region of the chromatogram; ii) the final buffer subtracted 1D SAXS profile from the PDC must satisfy the standard Guinier criteria (eg. 0.5 < sRg < 1.3)36,37. These important points have been integrated into a single figure-of-merit parameter K:

    $$K=100 \% \cdot {c}_{v}^{aver}\cdot \langle {\chi }^{2}\rangle $$
    (1)

    Here cvaver is the coefficient of variation (\({c}_{v}=\frac{\sigma }{\mu }\), where µ is the mean value and σ is the standard deviation) of the Rg value derived from the curve generated by CHROMIXS (averaged and buffer subtracted) employing Guinier approximation; 2 > denotes the mean square weighted deviation of the frames across the chromotographic peak with respect to the peak frame with maximum integral intensity:

    $$\langle {\chi }^{2}\rangle =\frac{1}{m}\mathop{\sum }\limits_{l=1}^{m}\frac{1}{n-1}{\mathop{\sum }\limits_{k=1}^{n}\left(\frac{{I}_{p}({s}_{k})-{c}_{l}\cdot {I}_{l}({s}_{k})}{\sqrt{{\sigma }_{p}^{2}({s}_{k})+{c}_{l}^{2}\cdot {\sigma }_{l}^{2}({s}_{k})}}\right)}^{2}$$
    (2)

    where n is the number of points in the frame and m is the number of frames in the peak. Ip(s) and σp(s) denote the intensity and standard deviation of the frame with maximal integral intensity, and Il(s) and σl(s) are intensities and standard deviations of all the other frames across the candidate monodisperse region; cl is the scaling coefficient; \(s=\frac{4\pi \,\sin \,\theta }{\lambda }\,\)(where θ is the scattering angle and λ is the wavelength) is the scattering vector. For the statistically identical scattering profiles, the 2 > value tends to unity, whereas the cvaver value for the good-quality averaged SAXS profiles tends to zero. The K value integrates the two contributions into a single criterion to characterize the overall quality of a SEC-SAXS dataset. In practice, K values < 10% indicate reliable data quality; K ≥ 30% indicates that the data is of low quality and should be treated with caution. AMPP generates a corresponding notification and/or warning message based on this analysis to aid the user in data interpretation. In the case of the elution of multiple peaks, for example those corresponding to the PDC and also that of empty detergent micelles, a K value is determined for each SAXS profile generated.

  4. (4)

    Calculation of the overall parameters (radius of gyration Rg, maximum size Dmax, excluded volume VPorod, molecular weight MW) and the generation of relevant plots (Guinier plot: ln I(s) vs s2; Kratky plot: I(s)∙s2 vs s; and the real-space distance distribution function: P(r) vs r) for all subtracted curves. An example of these plots for the mechanosensitive T2 channel data is shown in the generated summary table (Fig. 3). These parameters are employed for the evaluation of the search volume to conduct the ab initio reconstruction using MONSA or for the hybrid MEMPROT fitting procedures.

  5. (5)

    Querying and collecting the available a priori information from the ISPyB database via dedicated Simple Object Access Protocol (SOAP) web services. Depending on the results, one of the three modeling paths is executed for the subsequent modeling (Fig. 1).

Figure 1
figure 1

Workflow of the pipeline extension for MPs. The pipeline utilizes a priori data from the ISPyB database, calculates overall SAXS parameters and decides, which software to employ for the subsequent analysis.

Figure 2
figure 2

An elution profile of T2 membrane protein, built by CHROMIXS from 1600 consequent SEC-SAXS data sets (average intensity versus frame number). The frames corresponding to the buffer and sample are automatically identified (marked by red and green, respectively), averaged and subtracted one from the other. The resulting curve is displayed in PRIMUS interface (inset).

Modeling Path 1: High resolution model of the protein and chemical formulae of the detergent heads and tails are available

Initial estimates of the length of the detergent tail and head components are done using the Tanford formula for the detergent tail length: lc = 1.5 + 1.265nc, where nc is the total number of carbon atoms. The starting parameters for the evaluation of the corona geometry using MEMPROT28 are determined as follows:

For a spherical (ellipticity, e = 1) detergent torus with the height (a) and cross-sectional axes (minor = b/e, major = b∙e) equal to the length of a detergent tail (Tanfordtail), the geometrical parameters are set as: a = b = t = Tanfordtail28 (Fig. 4), where t is the thickness of the hydrophilic detergent head group. Although the hydrophobic tail of a detergent is typically slightly longer than its hydrophilic head, this assumption appears to be a reasonable first approximation for the further refinement.

Figure 3
figure 3

An example of the automatic SAXS analysis of T2 in detergent solution at the P12 beamline (PETRA-III). The sample and buffer ranges of frames on SEC-SAXS chromatogram as determined automatically by CHROMIXS are shown in the Frame# field. In this study, a high resolution model was available in the ISPyB database and the modeling was performed by MEMPROT. A thumbnail of the obtained model is displayed in the DAM field.

Figure 4
figure 4

Starting geometrical parameters of a corona for MEMPROT modeling are calculated based on the SAXS curve and a priori information from ISPyB. The hydrocarbon tails of the detergent are modelled as an elliptical hollow torus of the height a and cross-sectional minor and major axes b/e and be, where e is an ellepticity of the torus (adapted from [Perez et al., 2015]). The figure is generated using the PyMol Molecular Graphics System, version 1.7.4. Schrödinger, LLC (www.pymol.org).

AMPP employs the MEMPROT minimization procedure, refining the starting parameters within a limited range, estimated empirically based on several tests conducted on different MPs. The a, b and t parameters are searched in the ±1/3(Tanfordtail) range, while ellipticity and rotation parameters are kept fixed in order to speed up the computations. The hydrophobic radius of the model is taken as a + t = 2∙Tanfordtail, which is in a good agreement with the statistical mean value of transmembrane thickness DTransmemb ≈ 30 ± 3 Å, obtained from the OPM database38 (https://opm.phar.umich.edu/types/1). The maximum value of DTransmemb is checked during the modelling such that it does not exceed 45 Å.

Modeling Path 2: Electron densities of a protein and chemical formulae of detergent heads/tails are known, but the high resolution model of the protein is not available

The electron densities can be either provided directly via the ISPyB interface or calculated from the protein primary sequence (in FASTA format) or from the chemical formula using the built-in look-up tables39,40. Knowledge of the detergent chemical formula is mandatory as it provides additional information to constrain the geometry of the MP model. When the high resolution models are not available, ab initio low resolution bead modeling is performed by MONSA27, allowing for three phases with distinct contrasts corresponding to the protein (phase 1), detergent tails (phase 2) and detergent heads (phase 3). For each phase the contrast is calculated and the search space confined to a cylinder and two coaxial spherical hemi-tori (Fig. 5A), thus avoiding contacts between the hydrophobic regions and solvent in the final model. A spherical region is fixed for the protein phase in the vicinity of the origin of coordinates, while the rest of the cylindrical region is free to become either protein or solvent. Penalties for the model discontinuity and looseness are applied during the simulated annealing procedure as described in the original paper27. Three additional boundary regions between the phases (Fig. 5A) have a thickness of a few Angstrom allowing MONSA to select the optimum phase assignment during the fitting. The boundary beads (eg. between phases 1 and 2) are allowed to be assigned to either phase 1 or phase 2 (Fig. 5A). The model is sought within a spherical search volume of diameter d = Dmax, where the protein can be surrounded by two-stacked tori (detergent tails and heads) with the lengths evaluated from the Tanford formula (Fig. 5B). The transmembrane diameter is calculated as:

$${{\rm{D}}}_{{\rm{memb}}}={{\rm{D}}}_{{\rm{\max }}}-2\cdot {{\rm{Length}}}_{({\rm{head}})}-2\cdot {{\rm{Tanford}}}_{({\rm{tail}})}$$
(3)

where Length(head) is considered to be the same as the Tanford(tail), given that for the most common detergent molecules it is typically smaller or equal to that of the tail. The expected values for toroidal volumes and Rg values of detergent phases are calculated on the fly from geometrical considerations41 and serve as additional constrains while modelling. The total computed volume and Rg of the particle is then compared to the experimental VPorod and Rg, obtained from step 2. If available, information from complementary methods (e.g. phase volumes or phase Rg) can be used to improve the reliability of the final model. Using a spherical 40 beads-per-diameter MONSA grid, together with the above mentioned geometrical restrictions, low resolution models were obtained in a reasonable processing time about 10–15 minutes on a desktop PC (Intel Xeon CPU 3.7 GHz 8 Cores, 12.5 GB RAM). This algorithm is generally applicable to the reconstruction of both transmembrane and membrane-associated proteins. In the case of membrane-associated proteins one can simply shift the fixed “protein” phase to the edge/surface of the detergent disc and extend the protein/solvent phase cylinder outside the entire detergent disc region.

Figure 5
figure 5

(A) Three-phase grid with boundary regions between phases (marked by cyan) and a model that is sought taking into account phase volumes and radii of gyration. (B) Dmax is considered to be orthogonal to transmembrane region as depicted at the picture. Radius of sphere, exterior and interior radii of tori are calculated from Dmax and the Tanford length. The figure is generated using the PyMol Molecular Graphics System, version 1.7.4. Schrödinger, LLC (www.pymol.org).

Modeling Path 3: A priori information from ISPyB is not sufficient to generate the required constraints

In this case neither MONSA nor MEMPROT modeling is employed. The pipeline works in the standard SEC-SAXS mode and the low-resolution model is built using a single-phase approximation (phase corresponding to either PDC or solvent). The pipeline reconstructs low-resolution models from the low-s region (typically smax < 1.0 nm−1) of the experimental SAXS data ab initio, minimizing the contribution from the inhomogeneous internal structrure and describing only the overall particle shape. The value of smax for the initial reconstruction is automatically estimated from the first local minimum of the experimental SAXS curve. A standard ab initio modeling procedure is then executed using DAMMIF (as described e.g. in33,34): DAMMIF generates 10 independent models and the normalized spatial discrepancy (NSD) is computed to find the model with the lowest variance in NSD with respect to the other members of the ensemble. This model is then used for both alignment/superposition of the ensemble and for the computation of an average model volume; finally, the pipeline employs this averaged model as input for a final refinement in DAMMIN. Therefore, the output of the pipeline in this regime is a single DAMMIN model that fits the data up to an automatically determined smax value. We would like to stress that this approach has its limitations and the model represents a low-resolution approximation of the molecule’s overall shape and possible high-resolution details on the surface should not be taken as experimental evidence. With these caveats, such ab initio modelling still offers a good low resolution estimate of the overall size and shape of the PDCs of interest in the absence of additional information.

  1. (6)

    As soon as the overall SAXS parameters and MP models have been computed, the pipeline builds a final summary file in XML format, gathering together all the results in a compact and human/machine-readable form (Fig. 3). For the models generated, if the quality of the achieved fit is poor (χ2 > 2.0), the pipeline flags this with a warning. This provides the user with a measure of the reliability of the on-line model(s) generated.

Upon completion the pipeline compresses the calculated 1D SEC-SAXS profiles into a single HDF5 file and reports them back together with the final model into the ISPyB database. The data from the buffer and sample regions, subtracted curves and overall SAXS parameters for each detected peak in the elution profile are displayed in a convenient tabular form (Fig. 3). It is worth noting that the summary table generated is applicable to all SEC-SAXS runs conducted at the P12 beamline and is not restricted to the measurement and analysis of MPs. For example, it is often the case that the sample exists in solution as a mixture of different states (eg. multiple oligomeric states of a protein or complex), which are separated chromatographically. In this case, AMPP will determine the shapes of individual components in the mixture using the modeling path 3. When chromatographic peaks corresponding to unloaded/empty detergent micelles are analyzed, their overall hollow appearance will be reconstructed by the DAMMIF single-phase modeling procedure, whereas the multi-phase distance-restricted MONSA model will fail to fit the data (see Supplementary Fig. S1). The pipeline results can therefore provide a hint to users who mistakenly measure empty/unloaded micelles at the beamline.

The results generated can be inspected online using the ISPyB web-server26. The ISPyB database and web services were extended to accommodate the information on chemical formulae and electron densities of detergents, as well as FASTA amino acid sequences and electron densities of the MPs and buffers. The SEC-SAXS elution profile and the final model can be visualized in the existing ISPyB GUI, as the pipeline output files are compatible with the required format.

Results and Discussion

To test the modelling components of the AMPP, we performed an automatic structural characterization of a tetrameric α-helical membrane channel Aquaporin0 and mechanosensitive channel T2 in DDM solution. The AMPP was utilized in all three pipeline regimes: i) with the high resolution model available ii) with detergent composition and the protein primary structure (FASTA sequence) known iii) plain ab initio modelling (no a priori information). The estimated K-factor (Eq. 1) for the T2 sample was 8.7% <10% indicating a good data quality. The subtracted published SAXS curve was taken for Aquaporin0 and thus no K-factor was determined. The models generated from these three modelling trajectories are displayed in Fig. 6 (as the protein was known to be tetrameric, P4 symmetry was utilized in MONSA and DAMMIF).

Figure 6
figure 6

Examples of automatically reconstructed Aquaporin0 PDC models by the AMPP pipeline utilizing MEMPROT (hybrid modeling with known protein model (PDB:2B6P) and unknown detergent belt) (A), MONSA (multi-phase ab initio, the model was built from precalculated searching volume; fourfold symmetry around z-axis is imposed) (B) and DAMMIF (C) (single phase ab initio also with P4 symmetry applied). The first row shows the side view cross-section of the PDC components that is represented with a section inside the detergent corona; protein beads are green for ab initio models and colored according to atom type for the hybrid model, detergent tails and heads are blue and red, respectively. The second and third rows represent the top and the side view of the models. The figure is generated using the PyMol Molecular Graphics System, version 1.7.4. Schrödinger, LLC (www.pymol.org).

Expectedly, the automatically generated Aquaporin0 models display more adequate structural details with increasing amount of a priori information. The single phase based model generated ab initio by DAMMIF (Fig. 6C) provides only the overall shape roughly approximating that of the protein-detergent complex. This reflects the limitation of the ab initio shape determination procedure, which must assume constant density inside the particle. The more information-rich multi-phase MONSA and MEMPROT models (Fig. 6B,A) display a good level of similarity, with the hybrid model generated using the latter program being more detailed and arguably having a higher resolution. For the multi-phase MONSA modeling path the starting cylindrical search volume used for protein-solvent phases was considerably larger than the final model and contains a spherical core with the fixed protein phase (green area shown in Fig. 5B). However, after the minimization procedure the volume occupied by the protein phase becomes only slightly larger than the central fixed spherical volume, and the MONSA models do display protrusions resembling the four α-helical “legs” expected for Aquaporin0. These models are similar to those generated using the hybrid modeling path with MEMPROT, where the protein structure is assumed to be known.

The automatically generated models using the three modeling paths of the pipeline provide good fits to the experimental data at low angles, thus the overall particle shapes coincide and are well described by the data. A numeric evaluation of the proximity of the obtained models was performed using the measure of normalized spatial discrepancy (NSD)42, and the NSD comparisons indicate that the models are similar at low resolution (Table 1). The deviations observed at higher angles (Fig. 7) reflect contributions from the detergents including the complex contrast situation, which is not fully captured by the simplified modeling procedures used. In particular, one of the simplifications we implemented for the MEMPROT modeling was to assume a horizontal detergent torus with a circular geometry, and a fixed ellipticity e = 1. Overall, the models generated by the pipeline do adequately describe the gross appearance of PDCs but caution is of course required regarding the internal structure. Further analysis is recommended in order to obtain more accurate and detailed models, and the initial parameters suggested by the pipeline can serve as a good starting point.

Table 1 Standard normalized spatial discrepancy (NSD) between the reconstructed models with differing levels of a priori information.
Figure 7
figure 7

SAXS data from Aquaporin0 (A) and T2 (B) samples and the corresponding DAMMIF, MONSA and MEMPROT model fits.

Similar results were obtained from the mechanosensitive channel T2 experimental data (Figs. 7b and 8). There is a high degree of similarity between the three types of models generated (Table 1), and interestingly, MONSA does reconstruct models with a void in the protein phase (green beads in Fig. 8). This void is maintained also when an increased penalty for the bead discontinuity during minimization is introduced. Upon inspection, the high resolution model of a homologous structure, the MscS ion channel of T. tengcongensis (PDB: 3T9N) also displays a void in the extracellular domain of the structure. Thus the multiphase ab initio reconstruction reliably generates features consistent with those expected for the T2 ion channel. The MONSA model of the T2 channel protein fits the data reasonably well up to s = 0.25 Å−1. The on-the-fly generated MEMPROT model provides a satisfactory fit up to s = 0.1 Å−1, corresponding to a real space resolution 2π/s ~ 60 Å.

Figure 8
figure 8

Automatically reconstructed T2 models by the AMPP pipeline utilizing MEMPROT (hybrid modeling: 3T9N pdb model with found shape of detergent corona) (A), MONSA (multi-phase ab initio with P7 symmetry) (B) and DAMMIF (single phase ab initio, P7 symmetry) (C). The first row shows the side view cross-section of the PDC components that is represented with a section inside the detergent corona; protein beads are green for ab initio models and colored according to atom type for the hybrid model, detergent tails and heads are blue and red, respectively. The second and third rows represent the top and the side view of the models. For T2, the K value was calculated from the reduced SEC-SAXS chromatographic data as follows: K = 100% ∙ Cvaver∙<χ2 > = 100∙ (0.2/5.65) ∙ (2.46) = 8.7 < 10% (Eq. 1). K was not determined for Aquaporin0 in the present case, as the published data does not include the chromatographic SEC-SAXS data frames. The figure is generated using the PyMol Molecular Graphics System, version 1.7.4. Schrödinger, LLC (www.pymol.org).

It is also worth noting that atomistic models are often incomplete, missing eg. density for flexible loops which could not be resolved at high resolution. In general, to reliably reconstruct a PDC it is recommended to utilize both methods (full ab initio with MONSA, and hybrid with MEMPROT) and compare the results. Thus, the ab initio modelling is an important component of the automated MP modelling procedure, facilitating a direct comparison with the hybrid modelling and giving an insight into the possible structural organization of the PDC in solution.

In summary, an automated SAXS-based data analysis pipeline for multi-contrast systems has been developed and implemented at the P12 beamline of the EMBL (Petra-III, Hamburg). The pipeline facilitates the determination of the overall parameters and automated construction of tentative low-resolution models of solubilized PDCs. Given that the interpretation of the SAXS data from membrane proteins is by far non-trivial task, these results and models are expected to be further refined interactively. However, as demonstrated in the provided examples, AMPP offers at least a good starting point for the subsequent analysis. The relevant data and workflows are fully integrated into the ISPyB data curation system for SAXS. Future development of the pipeline is planned to include handling of other more complicated membrane assemblies such as nanodiscs19,43 and amphipols44 and also analysis of inhomogeneous samples such as mixed micelles.