ATP binding cassette (ABC) transporters play critical roles in maintaining sterol balance in higher eukaryotes. The ABCG5/ABCG8 heterodimer (G5G8) mediates excretion of neutral sterols in liver and intestines1,2,3,4,5. Mutations disrupting G5G8 cause sitosterolaemia, a disorder characterized by sterol accumulation and premature atherosclerosis. Here we use crystallization in lipid bilayers to determine the X-ray structure of human G5G8 in a nucleotide-free state at 3.9 Å resolution, generating the first atomic model of an ABC sterol transporter. The structure reveals a new transmembrane fold that is present in a large and functionally diverse superfamily of ABC transporters. The transmembrane domains are coupled to the nucleotide-binding sites by networks of interactions that differ between the active and inactive ATPases, reflecting the catalytic asymmetry of the transporter. The G5G8 structure provides a mechanistic framework for understanding sterol transport and the disruptive effects of mutations causing sitosterolaemia.
Cholesterol is an essential component of vertebrate cell membranes. Animals maintain sterol balance by limiting dietary sterol uptake from the gut and promoting sterol secretion from hepatocytes into bile. These physiological processes are mediated by a heterodimeric ABC transporter, consisting of G5 and G8 (refs 1, 2, 3) polypeptides, which is embedded in apical membranes of bile ducts and intestinal enterocytes4,5. Mutations in G5 or G8 that block sterol secretion into bile and the gut lumen cause sitosterolaemia, an autosomal recessive disorder in which sterol accumulation leads to premature coronary atherosclerosis.
ABC transporters constitute a ubiquitous protein superfamily that utilizes energy derived from ATP hydrolysis to translocate substrates across membranes6. Family members share a common architecture that comprises two transmembrane domains (TMDs) and two nucleotide-binding domains (NBD; specified herein as the contiguous polypeptide domain contributed from each subunit, whereas NBS denotes a composite nucleotide-binding site made up of elements from both subunits). Humans have 48 ABC transporters that are classified into seven subfamilies (A–G)7. In the G5 and G8 half-transporters, the NBD is amino (N)-terminal to the TMD, which consists of six transmembrane helices (TMHs) (Fig. 1a). The molecular mechanism by which G5G8 effluxes sterol from plasma membranes remains poorly defined.
Lipid-driven three-dimensional crystallization is a powerful method to determine structures of integral membrane proteins8,9. The only ABC transporters that have been crystallized in lipid bilayers are the bacterial polypeptide processing and secretion transporter (PCAT1, an ABCB homologue)10 and the maltose transporter–EIIA complex11. No ABCG family member has been structurally characterized. To obtain diffraction-quality crystals, human G5 and G8 were coexpressed in Pichia pastoris12, and tandem affinity chromatography was used to purify stable, monodisperse G5G8 heterodimers that retained ATPase activity (Extended Data Fig. 1). The protein was reconstituted into di-myristoyl-phosphatidylcholine (DMPC) bicelles13, and growth of optimal bicelle crystals required the presence of cholesterol to obtain diffraction higher than 3.9 Å resolution (Extended Data Fig. 2).
The G5G8 structure (Fig. 1b) was solved using tungsten-derived single-wavelength anomalous dispersion (Extended Data Table 1 and Extended Data Fig. 3a). The G5G8 crystals comprise two-dimensional layers in which two heterodimers in the asymmetric unit pack in an anti-parallel fashion related by twofold non-crystallographic symmetry (Extended Data Fig. 3b). Native diffraction data were averaged from 19 crystals to 3.9 Å resolution, and the structure was refined to R/Rfree = 0.242/0.328 (representative electron density maps for selected regions are shown in Extended Data Fig. 3c). G5 and G8 share 28% amino-acid identity, and show a high degree of structural conservation with a root mean squared deviation (r.m.s.d.) of 2.0 Å (Fig. 1c).
Our G5G8 structure adopts an inward-facing conformation, which is analogous to some other nucleotide-free ABC exporters10,14,15 and importers16,17. The packing of the TMHs and interfacial contacts of G5 and G8 differ from other ABC transporter structures, including the type I (for example, ModBC-A)16 and type II (for example, BtuCD-F)17 importers, and type I exporters (for example, TM287/288)14 (Fig. 1d). No TMH from either subunit crosses over into the other half-transporter’s TMD. In the extracellular domain (ECD), the regions between TMH5 and TMH6 form distinct α-helical structures (Fig. 1b). Three missense mutations causing sitosterolaemia, R419P and R419H in G5 and G574R in G8 (refs 2, 3), are located near the apices of TMH2 and TMH5, respectively (Extended Data Fig. 4a), and both residues are involved in contacts with the ECDs (for example, R419 forms hydrogen bonds with E578 on the G5 ECD). These mutations would be predicted to interfere with the native positions of the ECD helices, suggesting their importance for sterol exit from the TMDs.
The lack of electron density for nucleotide and the spatial separation between opposing Walker A and Signature motifs indicate that the G5G8 structure represents a nucleotide-free state (Extended Data Fig. 4b, c). Nonetheless, the two NBDs contact each other at the extreme cytoplasmic end to form a closed conformation through a pair of NPXDF motifs (G5: NPFDF; G8: NPADF; Extended Data Fig. 4d), which are conserved in the ABCG family and are required for cholesterol efflux by ABCG1 (ref. 18).
How does G5G8 move sterol out of the energetically favourable environment of the plasma membrane? We identified features in our electron density map that may represent cholesterol. These features are located at symmetrical ‘vestibules’ on opposing faces of the TMD dimer, which open to the bilayer and extend into the centre of the dimer interface (Extended Data Fig. 5a, b). Each vestibule is flanked by TMH1-2 of one TMD and TMH4-6 of the other TMD, with a ‘ceiling’ formed by an α-helix from the ECD extending into the membrane. Several residues in these vestibules are conserved throughout eukaryotic evolution and may represent binding surfaces or entryways for sterols to access the core of the heterodimer interface. To test this hypothesis, we performed an in vivo functional reconstitution assay using adenoviruses to express recombinant G5 and G8 in G5G8 knockout mice5,19. Expression of wild-type (WT) G5 together with WT G8 resulted in a ~30-fold increase in cholesterol transport into bile. In contrast, expression of G8 with G5 containing a substitution (A540F) that occludes the putative cholesterol-binding site failed to restore biliary cholesterol transport despite forming WT levels of the mature G5G8 heterodimer (Extended Data Fig. 5c).
ATP-dependent sterol translocation across membranes requires allosteric communication between catalytic NBSs and substrate-exporting TMDs. In models derived from bacterial transporters, TMD conformational changes result from engagement of the NBDs with ‘coupling helices’ (CpH) in the intracellular loops of the TMDs20. The TMDs of G5 and G8 each contain a prototypical CpH in the linker between TMH2 and TMH3 that is analogous to the position of coupling helices in other ABC exporters (Fig. 2). Each half-transporter also contains an orthogonal α-helix that we have named the ‘connecting helix’ (CnH), which is interfacial to the membrane bilayer and connects the NBD to the TMD. The CnH packs against a short cytoplasmic helix containing a conserved glutamate (designated the E-helix). The CpHs and CnHs are in proximity (~10–15 Å) to the consensus Signature motifs (orange) of the same half-transporters, which form the NBSs with the Walker A (red) and Walker B (green) motifs of the opposing half-transporters (Fig. 2a, b). In G5, R374 on the CnH forms a buried salt bridge with E452 on the CpH, while R381 interacts with the conserved glutamate (E146) that defines the E-helix. A disease-causing missense mutation at this residue (E146Q)3 is predicted to alter crosstalk between the TMD and NBD. The CnH, CpH, and E-helix elements on G5 form a stabilized three-helix bundle. In contrast, the CpH in G8 is rotated such that E481 (corresponding to E452 in G5) points away from R405 (homologous to G5-R374 on the CnH), instead packing against the G8 carboxy (C) terminus that loops back into the heterodimer interface.
The distinct architectures for the two TMD/NBD interfaces reflect the asymmetry in catalytic sites of G5G8. The inactive G5 Signature motif (part of NBS1)21, which binds but does not hydrolyse ATP, is adjacent to the CnH/CpH/E-helix bundle of G5, whereas the active G8 Signature motif (part of NBS2) is near the three-helix bundle of G8 (Fig. 2). We hypothesize that the stable three-helix bundle in G5 acts as a rigid body, whereas the three-helix bundle of G8 exhibits greater flexibility and transitions between different conformations. With the proximity of G8-CpH to the catalytically active NBS2, these conformational changes could allosterically link ATP hydrolysis to sterol transport.
What are the likely consequences of ATP binding and hydrolysis on TMD conformation and sterol binding? A network of conserved polar residues in both G5 and G8 forms hydrogen bonds and salt bridges that extend from the CnH and CpH to the proximal part of the TMD interface (Fig. 3). The hydrogen bonds connecting the TMD polar relay may render this network more deformable than a buried hydrophobic core, allowing this region to serve as a flexible hinge for subunit motions with a low energy barrier. Involvement of a TMD polar relay may be a general feature of ABC transporters: the bacterial PCAT1 (ref. 10) exporter and maltose importer22 structures also contain clusters of conserved polar residues in the TMD, which rearrange in the transition to a nucleotide-bound state (Extended Data Fig. 6a, b). We performed a 100-ns molecular dynamics simulation and vibrational mode analysis of our G5G8 structure in an explicit POPC/cholesterol bilayer and water. In these calculations, we found that the CnH and CpH elements of both subunits move inwards, bringing opposing Walker A and Signature motifs into closer contact (Extended Data Fig. 6c). Inward movement of the NBDs was coupled to upward movement of a subset of TMD elements (Extended Data Fig. 6d). R543, which is embedded in the core of the G8 TMD (part of the TMD polar relay), interacts with E503; both residues participate in the upward movement during the molecular dynamics simulation. The sitosterolaemia-causing mutation R543S3 would be predicted to disrupt interaction with E503 and thereby destabilize the TMD polar relay.
We performed coevolution analysis23 on ABCG TMD sequences to predict positions that could form close contacts at the TMD interface during the transport process (Extended Data Fig. 7a, b). The highest scoring co-evolving positions highlighted three potential interactions between subunits. As these surfaces are distant (>8 Å apart) in the current structure, we infer that they may come into contact at another stage of the transport cycle with inward conformational changes of the TMDs. Y432 on G5 (TMH2) and N568 on G8 (TMH5) are among these pairs, and Y432 is connected to the NBDs through the TMD polar relay (Fig. 3). Accordingly, the mutation G5-Y432A disrupted cholesterol transport in our in vivo assay without affecting heterodimer maturation (Extended Data Fig. 7c). Together, these results support a model in which recruitment of ATP stabilizes inward movement of the NBDs and inward/upward movement of the TMDs. We propose that these movements, which would reshape the TMD interface and outer membrane-facing surfaces of the transporter where we observe electron density for cholesterol (Extended Data Fig. 5), contribute to sterol transport across the membrane.
The TMDs of G5 and G8 adopt a tertiary fold that differs from those of known ABC transporter structures20 (Fig. 1d), which have different topologies, longer cytoplasmic extensions of TMH segments and differing placement of coupling helices. Previous classification of ABC exporters suggests that TMDs have arisen at least three independent times, giving rise to the ABC1, ABC2, and ABC3 superfamilies24. G5G8 is a member of the ABC2 exporter superfamily25, which includes the ABCA and ABCG eukaryotic subfamilies and a diverse group of prokaryotic transporters for substrates ranging from polysaccharide-containing teichoic acids and lipo-oligosaccharides (components of bacterial cell walls and outer membranes) to the antibiotic peptides mutacin and bacitracin (Fig. 4a). The ABCG subfamily includes the largest family of ABC transporters in the plant kingdom (Fig. 4b)26 and the historically important pigment transporters that determine eye colour in Drosophila (white, brown, and scarlet)27. Using our G5G8 structure as a template, we used the program Modeller28 to create a homology model of the white/brown heterodimer that confers brown eye colour to Drosophila (Extended Data Fig. 8). Residues at the end of TMH5 of white (G588, G589, and F590) have been identified as potentially interacting with the dye substrate29,30. These amino acids are conserved in G5G8 at the interface between the TMDs, at a site homologous to the G574R sitosterolaemia mutation in G8. This model indicates a possible conserved site of substrate exit for ABCG transporters and shows that the structure of G5G8 can serve as a powerful platform for structure–function studies of this large and functionally diverse set of membrane proteins.
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Cloning and expression of recombinant human ABCG5 and ABCG8
P. pastoris expression vectors (pSGP18 and pLIC) were derived from pPICZB (Invitrogen) as described12,31. The cDNAs for human ABCG5 (NCBI accession number NM_022436) and ABCG8 (NCBI accession number NM_022437) were obtained from the National Institutes of Health collection (M. Dean). A tag encoding a rhinovirus 3C protease site followed by a calmodulin binding peptide (CBP) was added to the C terminus of G8 (pSGP18-G8-3C-CBP). A tandem array of six histidines separated by glycine (His6GlyHis6) was added to the C terminus of G5 (pLIC-G5-H12). The plasmids were linearized using PmeI and co-transformed into Pichia strain KM71H by electroporation. To select for plasmid integration, cells were plated on YPD plates containing sorbitol (1.0 M) and Zeocin (0.5 or 1 mg ml−1), and incubated at 30 °C for 2–5 days. A total of 10–20 yeast colonies from each plate were selected and grown in minimal glycerol yeast nitrogen base (MGY) medium (10 ml), and then induced by adding methanol (0.5%) in minimal methanol (MM) medium. Crude microsomal membranes were prepared, and 30 μg of protein was resolved by SDS–PAGE. Protein expression was analysed by immunoblotting using monoclonal anti-RGSH4 antibodies (Qiagen) to detect G5 and polyclonal anti-hABCG8 antibodies (see below) to detect G8. The clones expressing the highest level for both G5 and G8 were selected and stored in 15–20% glycerol at −80 °C.
Cell culture and microsomal membrane preparation
A starter yeast culture was prepared by growing transformed yeast in MGY medium (10 ml) to an absorbance A600 nm of 10. The culture was used to inoculate a litre of MGY medium in a 2.8-l Fernbach flask and grown in a refrigerated Innova shaker (New Brunswick) at 250 rpm for 24 h (28–30 °C). To maximize yield, the acidity of the culture was monitored with the pH maintained at 5–6 using 10% (w/w) ammonium hydroxide (NH4OH). To induce protein expression, cells were incubated with 0.1% (v/v) methanol for 6–12 h. The methanol concentration was increased to 0.5% (v/v) by adding methanol every 12 h for 36–48 h. Cell pellets were collected and re-suspended in lysis buffer (0.33 M sucrose, 0.3 M TrisCl, pH 7.5, 0.1 M ε-aminocaproic acid, 1 mM EDTA, and 1 mM EGTA) to a concentration of 0.5 g ml−1, and stored at −80 °C. Approximately 30 ± 5 g of cell mass was typically obtained from 1 l of cultured cells.
To prepare microsomal membranes, frozen cells were thawed and reducing agent and protease inhibitors were added (final concentration: DTT (10 mM), leupeptin (2 μg ml−1), pepstatin A (2 μg ml−1), and PMSF (2 mM)). Cells were passed through an ice-chilled microfluidizer (Microfluidics) three to five times at 25,000–30,000 psi. The unbroken cells, nuclei and organelles were spun down at 3,500–4,000g for 15 min followed by 15,000g for 30 min, all at 4 °C. Microsomal membrane vesicles were pelleted by ultracentrifugation using a Beckman 45Ti rotor at 40,000–45,000 rpm (maximum 200,000g) at 4 °C for 2 h and re-suspended in buffer A (50 mM Tris pH 8.0, 100 mM NaCl, and 10% glycerol) using a dounce homogenizer, and stored at −80 °C.
Protein purification and pre-crystallization treatment
Frozen microsomal membranes were thawed and the protein concentration was adjusted to 4–6 mg ml−1 using solubilization solution (50 mM Tris-HCl, pH 8.0, 100 mM NaCl, 10% glycerol, 1% (w/v) β-dodecyl maltoside (β-DDM, Inalco Pharmaceuticals), 0.5% (w/v) sodium cholate (Sigma-Aldrich), 0.25% (w/v) cholesteryl hemisuccinate Tris (CHS-Tris, Anatrace), 5 mM imidazole, 5 mM β-mercaptoethanol (β-ME), 2 μg ml−1 leupeptin, 2 μg ml−1 pepstatin A, and 2 mM PMSF). Insoluble membranes were removed by centrifugation using a Beckman 45Ti rotor at 30,000 rpm (100,000g) for 30 min at 4 °C, and imidazole and tris(2-carboxyethyl)phosphine (TCEP) were added to the solubilized supernatant to final concentrations of 20 mM and 0.1 mM, respectively.
Tandem affinity chromatography was performed. First, the soluble membrane proteins were bound to a nickel-nitrilotriacetic acid (Ni-NTA) column (Qiagen; 1°Ni-NTA) that was pre-equilibrated with buffer A, and washed with 10 column volumes of buffer B (50 mM HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), pH 7.5, 100 mM NaCl, 0.1% (w/v) β-DDM, 0.05% (w/v) cholate, 0.01% (w/v) CHS (Steraloids), 0.1 mM TCEP) with 25 mM imidazole. The column was then washed with 10 column volumes of buffer B with 50 mM imidazole, and eluted with buffer C (buffer B with 200 mM imidazole, 1 mM CaCl2, 1 mM MgCl2). Peak fractions from the Ni-NTA eluates were mixed with equal volume of buffer D1 (buffer B plus 1 mM CaCl2, 1 mM MgCl2), and loaded onto a CBP column (Agilent; 1°CBP) that was pre-equilibrated with buffer D1. To exchange the detergent, the CBP column was washed serially with buffer D1 and buffer D2 (buffer B plus 1 mM CaCl2, 1 mM MgCl2, 0.1% (w/v) decyl-maltose neopentyl glycol (DMNG, Anatrace), but no β-DDM) in a step-wise fashion: 3 column volumes of D1, 3 column volumes of D1:D2 (3:1, v/v), 3 column volumes of D1:D2 (1:1, v/v), 3 column volumes of D1:D2 (1:3, v/v), and 6–10 column volumes of D2. Finally, the G5G8 heterodimers were eluted with buffer E (50 mM HEPES, pH 7.5, 300 mM NaCl, 2 mM EGTA, 0.1% (w/v) DMNG, 0.05% (w/v) cholate, 0.01% (w/v) CHS, 1 mM TCEP). Divalent cations were added to the CBP eluates to a final concentration of 10 mM MgCl2 and 10 mM CaCl2 to quench residual EGTA. No detergent exchange was performed for proteins used for ATPase assays. The N-linked glycans and the CBP tag were cleaved by endoglycosidase H (Endo H, ~0.2 mg per 10–15 mg purified protein) and HRV-3C protease (~2 mg per 10–15 mg purified proteins) for 6–12 h at 4 °C. The CBP tag-free proteins were collected from the flow-through fraction of a second CBP column (2°CBP). 
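The step-wise detergent exchange on the CBP column can be followed with a short calculation. The sketch below is illustrative only: it assumes ideal volumetric mixing of buffers D1 and D2 and uses only the detergent concentrations stated above (0.1% (w/v) β-DDM in D1; 0.1% (w/v) DMNG and no β-DDM in D2). The names D1, D2, STEPS, mix and exchange_profile are hypothetical and not part of the published protocol.

```python
# Detergent content of the two wash buffers in % (w/v), taken from the text.
D1 = {"beta-DDM": 0.1, "DMNG": 0.0}
D2 = {"beta-DDM": 0.0, "DMNG": 0.1}

# Volume parts (D1, D2) for each wash step: D1, 3:1, 1:1, 1:3, D2.
STEPS = [(1, 0), (3, 1), (1, 1), (1, 3), (0, 1)]


def mix(parts_d1, parts_d2):
    """Detergent concentrations after mixing D1 and D2 (ideal mixing assumed)."""
    total = parts_d1 + parts_d2
    return {
        name: (parts_d1 * D1[name] + parts_d2 * D2[name]) / total
        for name in D1
    }


def exchange_profile():
    """Concentrations of each detergent at every wash step, in order."""
    return [mix(a, b) for a, b in STEPS]


for (a, b), conc in zip(STEPS, exchange_profile()):
    print(f"D1:D2 = {a}:{b}  beta-DDM {conc['beta-DDM']:.3f}%  DMNG {conc['DMNG']:.3f}%")
```

Under these assumptions the total detergent concentration stays at 0.1% (w/v) throughout, while β-DDM is replaced by DMNG in four equal steps.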
G5G8 was concentrated to a volume of 1–2 ml and separated from aggregates, impurities, and enzymes by gel filtration chromatography using an ÄKTA Purifier and a Superdex 200 10/300 GL column (GE Healthcare Life Sciences) in buffer F (10 mM HEPES, pH 7.5, 100 mM NaCl, 0.1% (w/v) DMNG, 0.05% (w/v) cholate, 0.01% (w/v) CHS). The peak fractions were pooled together, and additional HEPES and TCEP were added to final concentrations of 50 mM and 1 mM, respectively.
Purified G5G8 dimers were treated by reductive methylation. Briefly, proteins were incubated twice with 20 mM dimethylamine borane (DMAB) and 40 mM formaldehyde for 2 h at 4 °C on an oscillatory shaker and then 10 mM DMAB was added. After 12 h, the reaction was stopped by 100 mM TrisCl, pH 7.5. For protein relipidation, the methylated proteins were loaded onto a second Ni-NTA column (2°Ni-NTA) that was pre-equilibrated with 100 mM TrisCl, pH 8.0, and 100 mM NaCl. The column was washed slowly with 10 column volumes of buffer G (10 mM HEPES, pH 7.5, 100 mM NaCl, 0.5 mg ml−1 DOPC:DOPE (3:1, w/w), 0.1% (w/v) DMNG, 0.05% (w/v) cholate, 0.01% (w/v) CHS), and the lipidated proteins were eluted using buffer H (10 mM HEPES, pH 7.5, 100 mM NaCl, 200 mM imidazole, 0.5 mg ml−1 DOPC:DOPE = 1:1 (w/w), 0.1% (w/v) DMNG, 0.05% (w/v) cholate, 0.01% (w/v) CHS). TCEP (1 mM) and MgSO4 (10 mM) were added to the 2°Ni-NTA eluates, and the protein was treated with 5 mM and 2 mM iodoacetamide for 1 h on ice and then passed through a PD-10 desalting column (GE Healthcare Life Sciences) that was equilibrated with buffer I (10 mM HEPES, pH 7.5, 100 mM NaCl, 200 mM imidazole, 0.1% (w/v) DMNG, 0.05% (w/v) cholate, 0.01% (w/v) CHS). Finally, the precipitants were removed by ultracentrifugation using a Beckman TLA120.2 rotor (150,000g) for 10 min at 4 °C. The supernatants were concentrated to a final protein concentration of 25–50 mg ml−1 using a 100 kDa cutoff Vivaspin concentrator (Sartorius), and used within 1 week of purification for crystallization.
Protein crystallization and crystal sample preparation
All crystals were obtained by reconstituting G5G8 proteins into DMPC/cholesterol/CHAPSO or DMPC/cholesterol/DHPC (Anatrace) bicelles. First, 10% bicelle stock solution was prepared by mixing lipids and detergents (CHAPSO or DHPC) in a ratio of 3:1 (w/w), where the lipids contained 5 mol % cholesterol (Sigma-Aldrich) and 95 mol % DMPC. Immediately before preparing the protein/bicelle mixture, the concentrated proteins were incubated with 10 mM ATP (sodium salt) for 30 min at 4 °C. The proteins and 10% bicelles were then gently mixed in a ratio of 1:4 (v/v), such that the final protein concentration was 5–10 mg ml−1.
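The dilution implied by the 1:4 (v/v) protein-to-bicelle ratio can be checked with a few lines of arithmetic. This is a minimal sketch that assumes simple additive volumes; the function name final_concentration and its parameters are hypothetical, and the stock range of 25–50 mg ml−1 is the concentrated protein described in the purification section.

```python
def final_concentration(stock, sample_parts=1.0, diluent_parts=4.0):
    """Concentration after mixing sample and diluent by volume parts.

    Assumes additive volumes; the units of the result match those of 'stock'.
    """
    return stock * sample_parts / (sample_parts + diluent_parts)


# Protein: 25-50 mg/ml stock mixed 1:4 (v/v) with bicelles.
low = final_concentration(25.0)
high = final_concentration(50.0)
print(f"final protein concentration: {low:.1f}-{high:.1f} mg/ml")

# Bicelles: the 10% stock makes up 4 of the 5 volume parts.
bicelle = final_concentration(10.0, sample_parts=4.0, diluent_parts=1.0)
print(f"final bicelle concentration: {bicelle:.1f}%")
```

With these inputs the protein range comes out at 5–10 mg ml−1, matching the stated final concentration, and the bicelles end up at 8% under the same assumption.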
The protein/bicelle mixture was incubated on ice for 30 min. The crystallization was set up in a hanging-drop vapour diffusion format at 20–22 °C by using VDX48 trays and mixing protein/bicelle preparation with equal-volume crystallization reservoir solution containing 1.7–2.0 M ammonium sulfate ((NH4)2SO4), 100 mM MES pH 6.5 (or 100 mM HEPES pH 7.0), 2–5% PEG400 (or 2–5% PEG350 MME), and 1 mM TCEP. Crystals suitable for data collection appeared in 3 days to 2 weeks and reached a maximum size of 75–150 μm × 40–60 μm × 10–20 μm in 1–2 months, and decayed after 3 months of crystallization. Crystals used for structural analysis were harvested within 1–2 months. Crystals of similar morphology were also obtained when nucleotide was omitted or when a non-hydrolysable analogue of ATP (either AMPPNP or TNP-ATP) was added. We also tried to grow crystals of a catalytically deficient mutant consisting of WT G5 and G8 mutation (G216D)32, attempting to solve a nucleotide-bound structure. No crystal growth was observed, and further optimization in crystallization will be necessary.
For experimental phasing, crystals were derivatized by adding 1 mM sodium phosphotungstate (PW12O403−, Sigma-Aldrich) to the crystallization drops for 12 h before harvesting. Crystals were cryo-protected in a solution containing 25% glycerol, 2 M (NH4)2SO4, 100 mM MES, pH 6.5, and 2–4% PEG400. The crystals were harvested in cryoloops (Mitegen), flash-frozen and stored in liquid nitrogen.
Data collection, structure determination and refinement, final model validation, and uncertainty
X-ray diffraction data sets were collected at the Advanced Photon Source beamlines 19-ID and 23-ID-D. HKL3000 was used to process both the partial and full diffraction data sets used for the ABCG5/ABCG8 heterodimer structure solution33,34. Computational corrections for absorption in a crystal and for imprecise calculations of the Lorentz factor resulting from a minor misalignment of the goniostat were applied35,36. Anisotropic diffraction was corrected to adjust the error model and to compensate for a radiation-induced increase of non-isomorphism within the crystal37,38,39. Typical, well-behaving native crystals diffracted anisotropically to a resolution of ~3.9 Å in the x direction, ~4.0 Å in the y direction, and ~4.5 Å in the z direction, while sodium phosphotungstate (PW12O40Na3, Sigma-Aldrich) derivatized crystals diffracted to a resolution of ~5.5 Å in the x and y directions, and ~6.5 Å in the z direction. The data processing statistics are presented in Extended Data Table 1.
Initial phases were obtained in a single-wavelength anomalous diffraction experiment with two crystals derivatized with sodium phosphotungstate, with data collected at the L-III edge of tungsten (λ = 1.21 Å). The estimated level of anomalous signal was ~6.5% of the native intensity. The diffraction data set was processed to a resolution of 5.0 Å, while the search for heavy atom positions was performed to a resolution of 7.0 Å. The eight positions of the tungstate cluster were identified using SHELXC/D40, run within HKL3000, with correlation coefficients CCAll = 45.88%, CCWeak = 22.52%, and PATFOM = 21.56. The handedness of the best solution was determined with SHELXE, in which the radius of the sphere of the variance map calculation was redefined from 2.42 to 4.84 Å. The eight tungstate cluster positions were refined anisotropically to 5.2 Å with MLPHARE41, with the final FOM reaching 0.220 for all observations. Twofold NCS was identified by PROFESS from CCP4 (ref. 42). NCS averaging and solvent flattening were performed by DM43 and later with PARROT44. The procedure produced a clean electron-density map to a resolution of about 6.5 Å, which showed α-helical features but had insufficient resolution for model building. Therefore, we attempted to improve the phases by combining the phasing signal of the tungstate derivative with the phasing signals of a lead derivative (trimethyl lead chloride, Pb(CH3)3Cl, Sigma-Aldrich) and tantalum derivative (hexatantalum tetradecabromide, Ta6Br14, Jena Bioscience), for which the estimated levels of anomalous signal were below 1% of the native intensity, but which diffracted to better resolution. Three positions of Pb2+ were identified using anomalous difference Fourier maps phased with a solvent-flattened tungstate derivative. They were then introduced to MLPHARE together with the previously identified tungstate cluster positions, and refined together to a resolution of 4.2 Å, while the tantalum derivative served as a native data set.
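For readers who think in photon energies rather than wavelengths, the data-collection wavelength quoted for the tungstate derivative can be converted with E = hc/λ. The sketch below is only a unit conversion; the constant HC_KEV_ANGSTROM is the standard value of hc in keV·Å, and the function names are hypothetical.

```python
# Planck constant times speed of light, expressed in keV * angstrom.
HC_KEV_ANGSTROM = 12.398419843


def wavelength_to_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def energy_to_wavelength_angstrom(energy_kev):
    """X-ray wavelength in angstroms for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev


energy = wavelength_to_energy_kev(1.21)
print(f"1.21 angstrom corresponds to {energy:.2f} keV")
```

The quoted wavelength of 1.21 Å corresponds to roughly 10.2 keV, which is in the region of the tabulated tungsten L-III absorption edge; the wavelength in the text is given to two decimal places, so the converted energy carries the same limited precision.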
Although the multiple isomorphous replacement phase combination improved the maps, the quality and resolution were still insufficient to build and refine an atomic model automatically. Therefore, we first located the NBDs using a homologous model (3D31.pdb) and positioned them manually. Then, we placed the α-helices using the ‘Place Helix Here’ option in the ‘Other Modeling Tools’ in Coot45. The topology of the transmembrane domains was identified manually and that defined the directionality of the α-helices, which at this resolution were initially placed in two possible directions by Coot. The resulting assembly was used to redefine the solvent mask to improve NCS averaging and then the model was further rebuilt and corrected by iterative application of BUCCANEER46, Coot and REFMAC47. To proceed with the model building, we had to improve the resolution of the data, which was done by merging together 19 full and partial native data sets to a final resolution of 3.9 Å. Although the data were essentially complete in terms of Bragg’s law, the anisotropy of diffraction resulted in an uneven distribution of information in the reciprocal space. Correcting for anisotropy retained informative observations of intensity in ellipsoidal resolution shells; however, all downstream procedures reported completeness in the spherical shells. Therefore, the completeness in refinement is lower than in scaling. The merged native data set was combined with the PARROT-modified phases obtained during an earlier step and was used in the model building with BUCCANEER and in the refinement with REFMAC. The initial model served as the starting point for BUCCANEER, which rebuilt it to ~65% of completeness and partly assigned it to the sequence, with ~60% of its side chains docked. Intermediate models from different cycles of BUCCANEER were combined into a more complete model and rebuilt manually.
The restrained, phase-stabilized refinement was performed with the Hendrickson–Lattman coefficients from PARROT down-weighted by factor of 0.5 and blurred with a B-factor of 200 Å2. Additionally, the ProSMART48 option ‘to generate H-bond restraints (e.g. secondary structure restraints)’ was used to stabilize the model geometry during refinement, together with REFMAC’s local NCS restraints, jelly body refinement, and B-factor values restrained with a weight of 0.2 and to a range of allowable values 20–400. The resolution cutoff of 3.94 Å in the refinement was selected on the basis of multiple refinements in which the Rfree values in the last resolution shell were inspected and the features of electron density maps were analysed. All these parameters were chosen to stabilize the refinement and reduce bias. The model quality was validated with Molprobity49 and assessed as satisfactory with a Molprobity score of 3.47, which corresponds to the 71st percentile in comparison with the set of 342 Protein Data Bank deposits solved at resolutions from 3.25 to 4.18 Å.
Although nominal resolution of diffraction data extends to 3.94 Å, high anisotropy results in uneven quality of maps in real space, with some electron density features being very well defined while other features have less definition. Specifically, the amino-acid register of the NBDs, having predominantly β-sheet structure, is highly uncertain when based on electron density alone. We used Robetta models50, sequence homology, and analysis of plausibility of chemical interactions to validate the amino-acid register, but we expect that registry errors are possible in these domains, in particular in regions that do not have reliable alignments to homologous structures. In the case of transmembrane domains, we do not expect major registry problems; however, minor mistracings of loops are possible owing to an uncertainty that is related to the C-caps of helices, which can have multiple alternative conformations that are not discernible at this resolution. Finally, although the fragment that connects the NBD with the transmembrane domains (from residues ~320 to ~395 both in ABCG5 and in ABCG8) can be traced for ABCG5 (chains A and C), it is less well defined in the ABCG8 subunits (chains B and D). For chain B, a part of this fragment is stabilized in the α-helical conformation by crystal lattice interactions, but the linker sequences connecting this fragment to the NBD on one side and the transmembrane domain on the other are not visible in the electron density, and therefore the register for this fragment is highly uncertain.
The ATPase activity of purified G5G8 was determined as described12,51. Briefly, 4–10 μg proteins (final concentration 27–67 μg ml−1) were mixed with 100 μg liver polar lipids (Avanti) and 20 mM DTT for 10 min at room temperature (22 °C), and then left on ice (used within 2 h). Reactions were performed in a final volume of 150 μl containing 50 mM Tris/MES (pH 7.0), 30 mM KCl, 5 mM MgSO4, 2 mM 32P-γ-ATP, 4.5 mM sodium azide, and 1% sodium cholate at 37 °C. Released inorganic phosphate was extracted by molybdate and monitored by 32P radioactivity to measure specific activity in three independent experiments. As a negative control, we paired a catalytically deficient G8 mutation (G216D)32 with WT G5 in the assay.
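The protein concentrations quoted for the ATPase reactions follow directly from the protein mass and the 150 μl reaction volume. The sketch below reproduces that conversion; the function name concentration_ug_per_ml is hypothetical, and it assumes the stated protein mass is present in the final reaction volume.

```python
def concentration_ug_per_ml(mass_ug, volume_ul):
    """Protein concentration in ug/ml from mass (ug) and volume (ul)."""
    return mass_ug / volume_ul * 1000.0  # 1000 ul per ml


REACTION_VOLUME_UL = 150.0  # final reaction volume stated in the text

for mass in (4.0, 10.0):
    conc = concentration_ug_per_ml(mass, REACTION_VOLUME_UL)
    print(f"{mass:g} ug in {REACTION_VOLUME_UL:g} ul = {conc:.1f} ug/ml")
```

Rounded to whole numbers, 4 μg and 10 μg of protein in 150 μl give 27 and 67 μg ml−1, in agreement with the stated range.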
Generation of anti-human G5 and G8 antibodies
A DNA fragment encoding the N-terminal region of human G8 (residues 2–400) was PCR-amplified from the G8 cDNA and cloned into the pET30a+ vector (Novagen). The peptide was expressed in BL21 (DE3) competent Escherichia coli cells (Novagen) and then isolated from inclusion bodies, solubilized in 8 M urea, and purified using a Ni-NTA column in the presence of 8 M urea. Rabbits were injected every 2 weeks with 0.1 mg of the peptide to generate anti-G8 polyclonal antibodies. To generate monoclonal anti-human G5 antibodies, splenic B lymphocytes were isolated from female BALB/c mice (n = 2) that had been immunized eight times with 50 μg of purified G5G8 (see above). Cells were incubated with SP2/mIL-6 mouse myeloma cells and the resultant hybridomas were screened using an ELISA assay. Positive clones that recognized denatured G5 by immunoblotting were selected for subcloning. Class 1 immunoglobulin-γ (IgG1) was purified from the supernatant of cultured hybridoma cells using gravity-flow affinity chromatography with Protein-G Sepharose-4 Fast Flow beads.
In vivo functional reconstitution cholesterol transport assay
Point mutations were introduced into the human G5 cDNAs using QuickChange II site-directed mutagenesis kits (Agilent). The recombinant adenoviruses expressing human WT or mutant G5 and G8 were generated using an AdenoVator adenoviral vector system (QBioGene). Total knockout (KO) (Abcg5/Abcg8−/−)19 and liver-specific KO (L-Abcg5/Abcg8−/−)52 mice were maintained on a regular chow diet. Adenoviral particles (5 × 10^12 particles per kilogram), containing no external gene (RR5), WT, or mutant human G5G8, were injected into the tail veins of the mice. After 72 h, the mice were fasted for 4 h, anaesthetized with halothane, and killed by exsanguination. Bile was collected, and neutral sterol levels were measured using gas liquid chromatography and mass spectrometry as described4. Liver tissue was snap frozen in liquid nitrogen and stored at −80 °C.
G5G8 coevolution analysis
Information about the covariance of residue substitution patterns in multiple sequence alignments (MSAs), or co-evolution, can provide a basis for predicting structure contacts53,54, and has been successfully applied to predicting residue–residue interactions across protein interfaces23. We employed the GREMLIN server53 to predict the top L/3 co-evolving residue pairs in the TMD of ABCG5 and ABCG8 (see Supplementary Information). We mapped the co-evolving pairs that were separated by at least four residues in the primary sequence to the corresponding structures and compared them with the ABCG5 and ABCG8 TMD structure residue–residue contact maps using CMview55. GREMLIN co-evolving residue pairs from the ABCG5 and ABCG8 TMD that were closer than an all-atom cutoff of 8 Å represent the majority of co-evolving residue pairs and are depicted as lines connecting Cα atoms of the residues in contact in the respective TMD structures. The remaining pairs, whose residues are spatially distant within the G5 or G8 molecules, are candidates for forming interfacial contacts between G5 and G8 (ref. 23). For those remaining GREMLIN pairs in TMH1, TMH2, and TMH5, we used the GREMLIN alignment to map residues from G5 (or G8) to the corresponding residues on the opposite molecule. The highest scoring interfacial GREMLIN pairs (three pairs with > 0.95 probability of co-evolving, corresponding to six potential symmetry-related pairs) were between residues in TMH2 (G8 L459, F461, and Y465 or the corresponding G5 T430, Y432, and L436) and residues in TMH5 (G5 A535, I539, S538 or the corresponding G8 N564, N568, and Y567, respectively) (Extended Data Fig. 7).
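The filtering step described above can be illustrated with a minimal sketch. It assumes that predicted pairs are given as residue-number tuples and that each residue maps to a list of atom coordinates; the function name, data layout, and toy coordinates are hypothetical and are not taken from GREMLIN or CMview. Pairs closer than four residues in sequence are skipped, pairs whose closest atoms lie within 8 Å are treated as intramolecular contacts, and the rest are kept as candidate interfacial pairs.

```python
import math

def split_coevolving_pairs(pairs, coords, min_seq_sep=4, cutoff=8.0):
    """Split predicted pairs into intramolecular contacts and interface candidates.

    pairs  : iterable of (i, j) residue numbers (hypothetical input format)
    coords : dict mapping residue number -> list of (x, y, z) atom positions in Å
    """
    contacts, candidates = [], []
    for i, j in pairs:
        # Ignore pairs that are too close in the primary sequence.
        if abs(i - j) < min_seq_sep:
            continue
        # All-atom distance: the shortest distance between any two atoms.
        d = min(math.dist(a, b) for a in coords[i] for b in coords[j])
        if d < cutoff:
            contacts.append((i, j))
        else:
            candidates.append((i, j))
    return contacts, candidates

# Toy coordinates (not from the G5G8 structure).
toy_coords = {
    10: [(0.0, 0.0, 0.0)],
    12: [(3.0, 0.0, 0.0)],
    20: [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    40: [(30.0, 0.0, 0.0)],
}
toy_pairs = [(10, 12), (10, 20), (10, 40)]
contacts, candidates = split_coevolving_pairs(toy_pairs, toy_coords)
```

In this toy case the pair (10, 12) is dropped for sequence proximity, (10, 20) is an intramolecular contact, and (10, 40) remains as a candidate for an interfacial contact.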
Molecular dynamics simulation
CHARMM-GUI (http://www.charmm-gui.org) was used to add the DMPC lipid bilayer, cholesterol, counter ions, and water molecules. The entire system consisted of one copy of the G5G8 biological unit (chains A and B), 16 cholesterol (CHL), 304 DMPC, 128 Na+, 180 Cl−, and 47,789 TIP3P water molecules56. In total, there were 163,088 atoms in the simulation box for the whole system. For force field parameters, the partial atomic charges of CHL were derived by RESP57 to fit the HF/6-31G electrostatic potentials generated using the GAUSSIAN 09 software package (revision D.01). The other force field parameters came from GAFF in AMBER12 (ref. 58). The residue topology of CHL was prepared using the ANTECHAMBER module in AMBER12 (ref. 59). The force fields of AMBER FF12SB60 and LIPID14 (ref. 61) were used to model proteins and lipids, respectively.
All molecular dynamics simulations were performed with periodic boundary conditions to produce isothermal–isobaric ensembles using the PMEMD.CUDA program in AMBER12. The particle mesh Ewald method62 was used to calculate the full electrostatic energy of a unit cell in a macroscopic lattice of repeating images. All bonds were constrained using the SHAKE algorithm63 in molecular dynamics simulations. Temperature was regulated using Langevin dynamics64 with a 5 ps−1 collision frequency. Pressure was regulated using the isotropic position scaling algorithm with the pressure relaxation time set to 1.0 ps. The integration of the equations of motion was conducted at a time step of 1 fs for the relaxation and equilibrium phases and 2 fs for the sampling phase. Before molecular dynamics simulations, the systems were relaxed to remove any possible steric clashes by a set of 10 thousand-step minimizations with the main chain atoms restrained. The harmonic restraint force constants decreased from 20 to 10, 5, and 1 kcal mol−1 Å−2, progressively. Finally, the systems were further relaxed by a 10,000-step minimization without any constraint or restraint. There are three phases in a molecular dynamics simulation: the relaxation phase, the equilibrium phase, and the sampling phase. In the relaxation phase, the system was gradually heated up from 50 K to 300 K in steps of 50 K. At each temperature, molecular dynamics simulation was run for 1 ns. In the following equilibrium phase, the system was further equilibrated for 2 ns at 298 K. In the sampling phase, 50,000 snapshots were collected at an interval of 2 ps for post-analysis. All the post-analysis was performed using the CPPTRAJ module of AMBER12.
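As a bookkeeping aid, the simulated time and number of integration steps implied by the protocol above can be worked out as follows. This is only an arithmetic sketch: it assumes that the heating ramp includes both the 50 K and 300 K endpoints (six stages of 1 ns each), which the text does not state explicitly, and all variable names are illustrative.

```python
# Heating stages, assuming both endpoints (50 K and 300 K) are simulated.
heating_temps_k = list(range(50, 301, 50))   # [50, 100, 150, 200, 250, 300]

heating_ps = len(heating_temps_k) * 1000     # 1 ns per temperature stage
equilibration_ps = 2000                      # 2 ns at 298 K
sampling_ps = 50_000 * 2                     # 50,000 snapshots, one every 2 ps

# Time steps: 1 fs for relaxation and equilibration, 2 fs for sampling.
relax_equil_steps = (heating_ps + equilibration_ps) * 1000 // 1
sampling_steps = sampling_ps * 1000 // 2
steps_between_snapshots = 2 * 1000 // 2      # 2 ps interval at a 2 fs step

sampling_ns = sampling_ps / 1000
total_ns = (heating_ps + equilibration_ps + sampling_ps) / 1000
```

Under these assumptions the sampling phase covers 100 ns (50 million steps, one snapshot every 1,000 steps), preceded by 8 ns of heating and equilibration.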
The stability of the molecular dynamics trajectory was monitored with the backbone r.m.s.d. along the simulation trajectory, showing that the current crystal structure was very stable along the whole simulation course with an overall r.m.s.d. of 3.0 Å, which is reasonable for a structure resolved at 3.9 Å. A subset of 10,000 snapshots was evenly selected for the solvent-accessible surface calculation and cluster analysis. We found no obvious trend for the solvent-accessible surface areas of TMHs. We then performed fixed radius clustering analysis (radius was set to 2.0 Å) for the subset using the clustering toolkit of MMTSB (http://www.mmtsb.org). The representative structure of the largest cluster (5,793 members), namely the member with the shortest distance to the cluster centre, was selected for comparison with the crystal structure. Only slight narrowing in the TMD could be observed, but the NBD had larger deviations, probably owing to the missing residues not resolved in this study.
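The idea of fixed radius clustering followed by selection of a representative structure can be sketched as below. This is a simple leader-style scheme written for illustration and is not the MMTSB implementation; it assumes snapshots are already superposed, so the r.m.s.d. is computed without fitting, and all function names and toy coordinates are hypothetical.

```python
import math

def rmsd(a, b):
    """r.m.s.d. between two pre-superposed coordinate lists of equal length."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def leader_cluster(snapshots, radius=2.0):
    """Assign each snapshot to the first cluster whose leader lies within radius."""
    clusters = []  # lists of snapshot indices; the first index is the leader
    for idx, snap in enumerate(snapshots):
        for members in clusters:
            if rmsd(snap, snapshots[members[0]]) <= radius:
                members.append(idx)
                break
        else:
            clusters.append([idx])  # no leader close enough: start a new cluster
    return clusters

def representative(snapshots, members):
    """Return the member index closest to the mean structure of the cluster."""
    n_atoms = len(snapshots[members[0]])
    centre = [
        tuple(sum(snapshots[m][k][d] for m in members) / len(members) for d in range(3))
        for k in range(n_atoms)
    ]
    return min(members, key=lambda m: rmsd(snapshots[m], centre))

# Toy one-atom "snapshots" (coordinates in Å), not simulation data.
toy_snapshots = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(0.4, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
toy_clusters = leader_cluster(toy_snapshots, radius=2.0)
largest = max(toy_clusters, key=len)
best = representative(toy_snapshots, largest)
```

Here the first three snapshots form the largest cluster, and the third snapshot is chosen as representative because it lies closest to the cluster mean.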
The quasi-harmonic analysis was performed for the main chain atoms using all 50,000 snapshots collected every 2 ps. First, an average molecular dynamics structure was calculated by aligning the molecular dynamics snapshots to the crystal structure using the main chain atoms; each snapshot was then realigned to the average structure to generate the mass-weighted covariance matrix. Vibrational frequencies and modes, which are the eigenvalues and eigenvectors of the covariance matrix, were then obtained. For each vibrational mode, the contribution of each residue was calculated by adding up the vectors of the main chain atoms. The low-frequency modes usually correspond to the global movements of a protein related to its biological function65. Among the lowest 20 vibrational modes, modes 9 and 10 were selected for further examination (Extended Data Fig. 6). Under current quasi-harmonic analysis, we were not able to observe a mode describing opening or closing at the TMD interface. Experimental structures of different catalytic or substrate-bound states will be necessary.
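A minimal sketch of the core of a quasi-harmonic calculation is given below, assuming that the frames are already aligned and flattened into coordinate vectors. It builds the mass-weighted covariance matrix and extracts only its dominant eigenvector by power iteration, a stand-in for the full diagonalization used in practice. In the standard quasi-harmonic relation the frequency of a mode scales as sqrt(kB T / λ), so the largest eigenvalue λ corresponds to the lowest-frequency motion. Function names and toy numbers are illustrative and do not reproduce the CPPTRAJ implementation.

```python
import math

def mass_weighted_covariance(frames, masses):
    """Covariance of sqrt(m)-weighted deviations from the average structure."""
    n_frames, dim = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n_frames for k in range(dim)]
    dev = [[math.sqrt(masses[k]) * (f[k] - mean[k]) for k in range(dim)] for f in frames]
    return [[sum(d[a] * d[b] for d in dev) / n_frames for b in range(dim)]
            for a in range(dim)]

def dominant_mode(cov, n_iter=500):
    """Largest eigenvalue and its eigenvector, obtained by power iteration."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim))
    return eigenvalue, v

# Toy trajectory with two coordinates; motion is mostly along the first one.
toy_frames = [[-2.0, 0.1], [2.0, -0.1], [-1.0, -0.1], [1.0, 0.1]]
toy_masses = [1.0, 1.0]
cov = mass_weighted_covariance(toy_frames, toy_masses)
eigenvalue, mode = dominant_mode(cov)
```

For the toy data the dominant mode points almost entirely along the first coordinate, which carries nearly all of the variance.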
The homology model of the white/brown heterodimer that confers brown eye colour to Drosophila was generated using Modeller28 with the G5G8 crystal structure as the template. The sequence alignment for homology modelling was generated by PROMALS3D66. One hundred homology models were generated and the one with the best DOPE score was selected as the final model. The final model was further evaluated using the PROTABLE module of SYBYL software (https://www.certara.com) and no severe violation was identified in the Ramachandran plot.
G5G8 sequence analysis
To collect ABCG family sequences, we ran PSI-BLAST (three iterations, E value cutoff 0.001) against the NCBI NR database using the G5 TMD as a query sequence (gi|11692800, residue range 368–651). Collected sequences were clustered using CLANS26. Sequences that did not cluster with the eukaryotic ABCG family were excluded to make a subset of ABCG sequences. To establish residue conservations for the G5 and G8 subfamily subsets, sequences from our initial PSI-BLAST against the NR database that clustered together with the human G5 and G8 sequences were kept, and an MSA was generated for each subfamily using MAFFT67. MSA results of G5 and G8 are shown in the Supplementary Information. Redundant and incomplete sequences were manually removed from the MSA. The MSA was used to generate positional conservations for each subfamily and they were mapped to the last line of the MSA using a scale ranging from variable to conserved (scale 0–9), to the structure B-factors in G5 (scale −1.82 to 1.52), and in G8 (scale −2.04 to 1.41) using the program AL2CO68. Residues with conservation values of 0.6 or greater in the B-factors (MSA conservation value ≥ ~7) were considered highly conserved.
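To illustrate how a positional conservation index can be computed and thresholded, the sketch below uses negative Shannon entropy per alignment column, normalized to zero mean and unit variance. This is one simple choice made for illustration; it is an assumption and not necessarily the scoring scheme or normalization that AL2CO applied here. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_conservation(msa):
    """Negative Shannon entropy per column; gap characters are ignored."""
    scores = []
    for col in zip(*msa):
        residues = [c for c in col if c != "-"]
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
        scores.append(-entropy)  # higher score = more conserved column
    return scores

def zscore(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

# Toy alignment (not ABCG sequences): column 1 invariant, column 3 fully variable.
toy_msa = ["ACD", "ACE", "AGF", "A-G"]
z = zscore(column_conservation(toy_msa))
highly_conserved = [i for i, value in enumerate(z) if value >= 0.6]
```

With the toy alignment only the invariant first column passes the 0.6 threshold on the normalized scale.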
To gain a broader view of all sequences related to the helical domain of TMD, we collected all related eukaryotic sequences by searching the SWISSPROT sequence database with the same G5 query. We also collected related prokaryotic sequences by initiating PSI-BLAST with representative queries from all COGs that belong to the ABC2 exporter superfamily clan (COG1277, COG1682, COG2386, COG559, COG3694, COG4200, and COG4587). To cut down on redundancy of the NR database, we searched a subset of 122 high-quality curated bacterial genomes designated by NCBI as reference genomes (http://www.ncbi.nlm.nih.gov/genome/browse/reference/). The sequences identified in the SWISSPROT database were combined with those identified in the prokaryotic reference genomes and were clustered in two dimensions using CLANS26 to visualize the ABC2 exporter superfamily sequence relationships. Export substrates of prokaryotic transporters in the ABC2 exporter superfamily include polysaccharide teichoic acids, which are found within cell walls of Gram-positive bacteria (TagG)69; lipo-oligosaccharides, which are located in outer membranes of Gram-negative bacteria (NodJ)70; haem, which is required for cytochrome c biogenesis (CcmB)71; and bacterial-killing lantibiotic peptides such as mutacin (MutG)72 and bacitracin (BcrB)73.
Protein Data Bank
Atomic coordinates and structure factors for the reported crystal structure have been deposited in the Protein Data Bank under the accession number 5DO7.
We thank the University of Texas Southwestern Structural Biology Laboratory and the staff of the Advanced Photon Source (beamlines 19ID and 23ID) for support during data collection. We thank C. Zelasko, L. Donnelly, F. Xu, L. Nie, Z. Wang, Y. Ma, and C. Zhao for technical assistance. We also thank S. Wilkens for comments, and L. Rice, Y. Jiang, and R. Hibbs for providing reagents and equipment. This project was supported by grants from the American Heart Association South Central Affiliate- (0825285F; J.-Y.L.), the American Heart Association Texas Affiliate Beginning Grant-in-Aid (0463130Y; I.L.U.), the Welch Foundation (I-1770; D.M.R.), the Packard Foundation (D.M.R.), the Howard Hughes Medical Institute (H.H.H., N.V.G.), and the National Institutes of Health (HL72304 and P01-HL20948 (H.H.H., J.C.C., X.-S.X.), GM094575 (N.V.G.), GM053163 and GM117080 (Z.O.), and GM113050 (D.M.R.)). The Advanced Photon Source is a US Department of Energy Office of Science User Facility operated for the Department of Energy Office of Science by Argonne National Laboratory (DE-AC02-06CH11357).
This file contains the uncropped gels and blots.
This file contains multiple sequence alignments (MSAs) of ABCG5 (page 1) and ABCG8 (page 2). The MSAs of ABCG5 and ABCG8 subfamily members are coloured by Jalview in bluescale according to identity. The sequences are labelled to the left according to the NCBI GI and accession number, and numbers to the left and right correspond to the first and last residue number in the corresponding sequence line. The AL2CO conservation index for each position is mapped below the alignment. The first sequences are human ABCG5 (gi|11967969) and ABCG8 (gi|11967971). The file also lists the top GREMLIN L/3 co-evolving residue pairs in the TMD of ABCG5 (pages 3-6) and ABCG8 (pages 7-10).