Structure of putative tumor suppressor ALDH1L1

Putative tumor suppressor ALDH1L1, the product of natural fusion of three unrelated genes, regulates folate metabolism by catalyzing NADP+-dependent conversion of 10-formyltetrahydrofolate to tetrahydrofolate and CO2. Cryo-EM structures of tetrameric rat ALDH1L1 revealed the architecture and functional domain interactions of this complex enzyme. Highly mobile N-terminal domains, which remove formyl from 10-formyltetrahydrofolate, undergo multiple transient inter-domain interactions. The C-terminal aldehyde dehydrogenase domains, which convert formyl to CO2, form unusually large interfaces with the intermediate domains, homologs of acyl/peptidyl carrier proteins (A/PCPs), which transfer the formyl group between the catalytic domains. The 4′-phosphopantetheine arm of the intermediate domain is fully extended and reaches deep into the catalytic pocket of the C-terminal domain. Remarkably, the tetrameric state of ALDH1L1 is indispensable for catalysis because the intermediate domain transfers formyl between the catalytic domains of different protomers. These findings emphasize the versatility of A/PCPs in complex, highly dynamic enzymatic systems. Tsybovsky et al. report cryo-EM structures of tetrameric rat ALDH1L1, elucidating its architecture and the domain interactions important for its function.

ALDH1L1 (10-formyltetrahydrofolate dehydrogenase), an enzyme of folate metabolism, regulates the availability of one-carbon groups for folate-dependent biochemical reactions 1 . The importance of this regulation is emphasized by the high abundance of the enzyme in the liver, the main organ of folate metabolism, as well as by tight control of ALDH1L1 expression during embryonic development and by the role of the protein as a pan-astrocyte marker [1][2][3] . The regulatory role of ALDH1L1 is linked to its catalytic reaction, the NADP + -dependent conversion of 10-formyltetrahydrofolate (10-fTHF) to tetrahydrofolate (THF) and CO 2 , which irreversibly removes one-carbon groups from the folate pool, thus diminishing the anabolic capacity 1,4 (Fig. 1a). It has been proposed that this reaction interferes with rapid cellular proliferation, but at the same time supports homeostasis in non-proliferating cells by supplying THF for the conversion of serine to glycine and for formate metabolism 1,4,5 . The role of the enzyme in supporting glycine production has been recently demonstrated in the Aldh1l1 knockout mouse model, with livers of ALDH1L1-deficient mice having decreased levels of THF, glycine and glycine conjugates 5 . Lately, the enzyme's function has been linked to NADPH production and oxidative stress 6 . ALDH1L1 is also considered a putative tumor suppressor 4 . This role is supported by findings that the protein is strongly and ubiquitously downregulated in malignant tumors and cancer cell lines 7,8 , an effect associated with hypermethylation of the ALDH1L1 promoter [9][10][11][12][13] . Of note, expression of ALDH1L1 in cancer cell lines produces strong antiproliferative effects by activating specific apoptotic pathways 7,[14][15][16][17][18] . 
In further support of the suppressive effect of ALDH1L1 on proliferation, the enzyme is strongly downregulated in S-phase of the cell cycle through proteasomal degradation but is elevated in quiescent cells 19 . Although the knockout of Aldh1l1 in mice did not cause the initiation of malignant lesions, it promoted the growth of larger liver tumors initiated by a chemical carcinogen 20 .
ALDH1L1 originated from a natural fusion of three unrelated genes, the phenomenon defining the structural organization of the ALDH1L1 enzyme 1 . The protein exists as a homotetramer, with each 902 amino acid-long protomer organized in three distinct functional domains (Fig. 1b, c). The N-terminal domain (N t , aa 1-310) carries the folate-binding site and has sequence and structural similarity to methionyl-tRNA Met -formyltransferase (FMT), the enzyme involved in translation initiation in mitochondria 21 . FMT formylates the initiator Met-tRNA Met by transferring the formyl group from 10-fTHF, thus using the same substrate as ALDH1L1 [22][23][24] . The C-terminal domain (C t , aa 405-902) belongs to the family of aldehyde dehydrogenases (ALDHs), the group of enzymes catalyzing the conversion of a large variety of aldehydes to corresponding acids using NAD + or NADP + as the electron acceptor 25 . The C t domain shares up to 50% sequence similarity with members of this family and has a typical ALDH fold, which includes NAD(P) + -binding, catalytic, and oligomerization sub-domains 26,27 . The C t domain contains all critical catalytic residues conserved in ALDHs, including Cys707, which plays the role of the catalytic center nucleophile 27,28 . Accordingly, the C t domain catalyzes the conversion of short-chain aldehydes to corresponding acids in vitro, but it is not known whether ALDH1L1 participates in aldehyde oxidation in vivo 1 . Finally, the intermediate domain (Int, aa 314-397) linking the N t and C t domains is a homolog of a group of small, structurally closely related carrier proteins involved in fatty acid, polyketide, and non-ribosomal peptide biosynthesis 29 . A characteristic feature of these acyl/peptidyl carrier proteins (A/PCPs) is the 4′-phosphopantetheine prosthetic group (4′-PP) covalently attached to a serine residue through a phosphoester bond 30,31 . 
This prosthetic group serves as a flexible arm enabling the transfer of building blocks between subunits of multi-enzyme complexes 30,31 .
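The domain boundaries quoted above can be captured in a small lookup, which is convenient when relating residue numbers mentioned in the text (for example Cys707 or Ser354) to the domain they belong to. This is only an illustrative sketch: the boundaries are the ones stated in the text for the rat protein, while the function name `domain_of` and the label "linker" for residues falling between the listed ranges are assumptions made for this example.

```python
# Domain boundaries of rat ALDH1L1 as quoted in the text (inclusive residue ranges).
DOMAINS = [
    ("Nt", 1, 310),     # folate-binding domain, similar to FMT
    ("Int", 314, 397),  # carrier-protein-like domain carrying the 4'-PP arm
    ("Ct", 405, 902),   # aldehyde dehydrogenase domain
]

PROTOMER_LENGTH = 902  # amino acids per protomer, as stated in the text


def domain_of(residue: int) -> str:
    """Return the domain name for a residue number.

    Residues that fall between the listed ranges are labeled "linker"
    (a label chosen for this sketch, not taken from the paper).
    """
    if not 1 <= residue <= PROTOMER_LENGTH:
        raise ValueError(f"residue {residue} is outside 1-{PROTOMER_LENGTH}")
    for name, start, end in DOMAINS:
        if start <= residue <= end:
            return name
    return "linker"


if __name__ == "__main__":
    for res in (100, 354, 400, 707):
        print(res, domain_of(res))
```

With these ranges, Ser354 maps to the Int domain and Cys707 to the C t domain, consistent with their described roles.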
Functional studies of ALDH1L1, its numerous mutants and engineered constructs together with structural and functional characterization of the individual domains provided insight into the enzyme catalytic machinery 21,[26][27][28][29][32][33][34][35][36][37][38][39][40][41] . Overall, in the ALDH1L1 catalysis, the 4′-PP arm of the Int domain transfers the formyl group cleaved from 10-fTHF in the folate-binding N t domain to the C t domain, where it is oxidized to carbon dioxide 1 (Fig. 1a). To execute this mechanism, in addition to the flexibility of the 4′-PP moving arm, sufficient mobility of ALDH1L1 domains relative to each other is necessary. This complicates the structural analysis of the full-length ALDH1L1 protein. Indeed, while several crystal structures of N t and C t domains were reported 21,27,[38][39][40][41] , and an NMR structure of a synthetic Int domain devoid of the 4′-PP arm is available (PDB 2cq8), the structure of the full-length enzyme has not been resolved so far.
Here, we report the structures of ligand-free and NADP + -bound full-length ALDH1L1 at resolutions of 3.7 Å and 2.9 Å, respectively, obtained by cryo-electron microscopy (cryo-EM). This study provides insights into the ALDH1L1 structure and function by (i) demonstrating high mobility of the N t domains, which form transient complexes with other structural units; (ii) describing the unusual mode of interaction between the Int and C t domains, with a large contact interface atypical for A/PCPs, and (iii) revealing unique pairing of the Int and C t domains, which requires the tetrameric organization for catalysis.

Results
Overall architecture of full-length ALDH1L1. To reveal the domain organization of ALDH1L1 we performed negative-stain EM (NS-EM) of the full-length ligand-free protein (Rattus norvegicus ALDH1L1 produced in insect cells using a baculovirus expression system) as well as of its individually expressed C t domain, which forms the rigid tetrameric core of the full-length enzyme. The C t core was clearly visible in the 2D class averages (Fig. 2a, b), enabling the identification of all four Int domains. All Int domains in full-length ALDH1L1 sat closely to the C-terminal core and were arranged in an apparently symmetrical manner. The high contrast provided by negative staining also allowed, in many cases, to resolve the N t domains both in raw micrographs (Supplementary Fig. 1) and 2D class averages (Fig. 2b). The N t domains assumed variable positions with respect to the rest of the protein, indicating their high mobility. Focused 2D classification revealed that the N t domain sampled the entire range of positions between the one oriented away from the protein central core and the one in contact with the C t domain (Fig. 2c). We did not observe any obvious coordination in positioning of the four N t domains in the ALDH1L1 tetramer in NS-EM experiments, a finding suggesting asynchronous movement of these domains in the full-length protein. However, we noticed that a conformation in which the N t domain is tightly packed against the C t core was encountered repeatedly (Fig. 2b). The high mobility of the N t domain illustrated by the NS-EM data is likely required for the multi-step catalytic mechanism of the enzyme. Subsequently, we used cryo-EM to characterize the structure of ligand-free ALDH1L1 at high resolution (Fig. 2d). A 3.7-Å map was obtained from a dataset of 86,276 particles when D2 symmetry was imposed. The rigid C t core and four Int domains were clearly resolved in this structure, while the highly mobile N t domains were not visible in the symmetrical map. 
To elucidate the potential effect of NADP + on the structural organization of ALDH1L1, we also prepared cryo-EM grids after adding 1 mM NADP + to the full-length protein. Single-particle analysis of the new dataset containing 202,398 particles produced a 2.9-Å resolution map (Fig. 2e). The arrangement of the C t core and Int domains in this structure was identical to that in the ligand-free protein and corresponded well to the configuration of the C t and Int modules revealed by NS-EM (Fig. 2f).
Structures of the Int and C t domains. Both the Int and C t domains were clearly resolved in the cryo-EM maps (Fig. 3), which permitted the building of atomic models for ligand-free and NADP + -bound ALDH1L1. The entire 4′-PP prosthetic group was also well defined in the cryo-EM density (Fig. 3a). As expected based on a previous study 29 , the Int domain exhibits a fold typical of A/PCPs (Fig. 3a, b) 42 . Accordingly, its structure consists of three major α-helices (I, II, and IV) forming a loose bundle, with another short helix (III) bridging helices II and IV. Helices I and II are connected by a long 19-residue linker that forms a loop and a helical turn. Serine 354, the site of the 4′-PP group attachment, is located at the beginning of helix II. Each C t domain contains the catalytic, NADP + -binding and oligomerization sub-domains (Fig. 3c). The deep substrate entrance tunnel is located between the catalytic and NADP + -binding sub-domains and leads to the catalytic cysteine 707. The NADP + binding site is situated on the side opposite to the substrate entrance tunnel. In the cryo-EM maps of ALDH1L1, the structure of the C t domain was very similar to the crystal structures of individual C t domains previously reported 27 . The root mean square deviation (r.m.s.d.) between the protomer of the C t domain solved by cryo-EM and the protomer of the corresponding X-ray structure (PDB 2o2p or 2o2q 27 ) was 0.46 Å and 0.75 Å, respectively, for ligand-free and NADP + -bound proteins. The r.m.s.d. between the entire tetramers was slightly higher (0.621 Å and 1.08 Å, respectively), suggesting slight differences in protomer positions in full-length ALDH1L1, which may be a consequence of the interactions between the C t and Int domains. It is also possible that the conformation of the protein in the crystal structures was affected by crystal packing.
We found that the cryo-EM structures of ligand-free and NADP + -bound ALDH1L1 were highly similar (r.m.s.d. = 0.86 Å) (Fig. 3d). A small difference was observed in the NADP + -binding site, where the C-terminal end of the helix formed by residues 652-666 was better ordered in the presence of the coenzyme. Only the AMP portion of bound NADP + was clearly visible in the cryo-EM map, while there was no density for the rest of the cofactor. Of note, weak density for the nicotinamide riboside of NAD + and NADP + is commonly observed in aldehyde dehydrogenases 27,43,44 .
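For readers unfamiliar with the r.m.s.d. values used above to compare structures, the quantity itself is simple to compute once two models have been superposed and their atoms matched one-to-one. The sketch below is a generic illustration, not the procedure or software used in this study; it assumes the coordinates are already optimally superposed (real comparisons first apply a least-squares fit) and that both lists contain equivalent atoms in the same order. The toy coordinates are invented for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched, pre-superposed
    lists of (x, y, z) coordinates, in the same units as the input."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: every atom displaced by exactly 1 unit along z.
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(rmsd(model_a, model_b))  # 1.0
```

Because the deviations are squared before averaging, a few poorly matching atoms can dominate the value, which is one reason r.m.s.d. over whole tetramers tends to exceed that over single protomers.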
Unique pairing of Int and C t domains. In both ligand-free and NADP + -bound ALDH1L1, each Int domain formed contacts primarily with one of the four C t domains (Fig. 4a). The Int and C t domains of the same protomer were separated by a 20-Å-long extended linker consisting of residues 397-404 (Fig. 4b) and did not interact with each other. Instead, the Int domain of chain A was docked at the substrate entrance tunnel of the C t domain of chain C, whereas the Int domain of chain C was paired to the C t domain of chain A (Fig. 4c, d). An identical arrangement was found for protomers B and D. Of note, the tetrameric C t core of ALDH1L1 is composed of two homodimers formed by protomers A/B (dimer 1) and C/D (dimer 2). Therefore, each Int domain of ALDH1L1 is paired to a C t domain of the opposite dimer ( Fig. 4d). Out of the eight residues composing the linker in rat ALDH1L1, five are negatively charged amino acids, which makes the linker highly hydrophilic. Sequence alignments showed that this property of the linker is preserved across species as well as between ALDH1L1 and ALDH1L2 proteins (Fig. 4b), suggesting that hydrophilicity and negative charge of the linker are important for the function of the enzyme. Of note, although there was continuous cryo-EM density for the linker in the unsharpened maps ( Fig. 4a), sharpening weakened this density to the extent where reliable placement of the main chain was not possible, indicating that the linker retains some flexibility.
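The pairing scheme described above can be written down as a small data structure, which makes its two defining properties easy to check: every Int domain docks onto a C t domain of the opposite homodimer, and the pairing is reciprocal. The sketch below simply encodes the chain assignments given in the text; the names `DIMER_OF`, `INT_DOCKS_ON_CT`, and the helper functions are hypothetical and introduced only for illustration.

```python
# Homodimer membership of each protomer in the tetrameric Ct core (from the text).
DIMER_OF = {"A": 1, "B": 1, "C": 2, "D": 2}

# Which Ct domain each Int domain is docked onto (from the text):
# Int of A -> Ct of C, Int of C -> Ct of A, and likewise for B and D.
INT_DOCKS_ON_CT = {"A": "C", "C": "A", "B": "D", "D": "B"}


def pairs_across_dimers(int_chain: str) -> bool:
    """True if the Int domain of this chain docks onto a Ct of the other dimer."""
    return DIMER_OF[int_chain] != DIMER_OF[INT_DOCKS_ON_CT[int_chain]]


def is_reciprocal(int_chain: str) -> bool:
    """True if the partner's Int domain docks back onto this chain's Ct."""
    partner = INT_DOCKS_ON_CT[int_chain]
    return INT_DOCKS_ON_CT[partner] == int_chain


if __name__ == "__main__":
    for chain in "ABCD":
        print(chain, "->", INT_DOCKS_ON_CT[chain],
              "cross-dimer:", pairs_across_dimers(chain),
              "reciprocal:", is_reciprocal(chain))
```

No Int domain maps onto its own chain or onto its homodimer partner, which is why this arrangement cannot be realized in a monomer or an isolated A/B or C/D dimer.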
Interactions between Int and C t domains. We found that the Int and C t domains form a relatively large contact interface (Fig. 5). The base of the Int domain (including the end of helix I, the beginning of the loop connecting helices I and II, and a large part of helix II) fits into the orifice of the substrate entrance tunnel of its partner C t and is simultaneously flanked by the tip of the oligomerization sub-domain of a second C t domain, which contacts the end of the loop between helices I and II and, to a lesser extent, helix III (Fig. 5a, b). This secondary interaction also occurs with a C t domain of the opposite dimer (e.g., Int of protomer A is docked into C t of protomer C and interacts with the oligomerization sub-domain of C t from protomer D). The total contact area between the Int domain (excluding the 4′-PP prosthetic group) and the two C t protomers is 653 Å 2 . Calculation of electrostatic potentials revealed that the surface of the Int domain is mostly negatively charged, including the contact interface (Fig. 5c). This is in agreement with the known acidic nature of many A/PCPs 42 . In contrast, the corresponding contact area of the C t domain that accommodates 4′-PP is charged predominantly positively. The Int and C t domains form multiple interactions, with the closest contacts between the main chain carbonyl of Gly351 (loop connecting helices I and II) and the amide of Gln693 (2.8 Å), the guanidinium group of Arg359 (helix II) and the side-chain oxygen of Asn745 (3.2 Å), as well as a side-chain oxygen of Glu366 (helix II) and the guanidinium group of Arg742 (3.3 Å). Alignment of the cryo-EM structures of ALDH1L1 and available X-ray structures of the C t domain revealed no significant alterations in positions of amino acid side chains in the regions that contact the Int domain. Similarly, structural superposition of the Int domain with the NMR structure of individual human Int without the 4′-PP prosthetic group (PDB 2cq8) revealed no major differences (r.m.s.d. = 0.83 Å). Therefore, complex formation between the Int and C t domains relies on the shape and charge complementarity.
Figure caption (fragment): Bound NADP + is shown in sphere representation. d Superposition of the C t domains of ligand-free (gray) and NADP + -bound (light blue) ALDH1L1. The binding of NADP + induces local structural changes in one helix forming the binding cleft for the adenine moiety (dashed circle). Cryo-EM density for NADP + is shown as semi-transparent surface.
Interactions of the 4′-phosphopantetheine prosthetic group. The 4′-PP group covalently linked to serine 354 of the Int domain was found in the fully extended conformation, penetrating deep into the substrate entrance tunnel of the C t domain and making multiple contacts with residues forming the tunnel (Fig. 6a). Two lysine residues of the C t domain, Lys520 and Lys865, formed ion pairs with the phosphate of 4′-PP. The main chain carbonyl oxygen of Asn864 and the hydroxyl of Thr521 were within the hydrogen-bonding distance from the hydroxyl and carbonyl oxygens, respectively, of the pantothenic acid moiety. The amide group of asparagine 706 was positioned 3.5 Å from the carbonyl of the β-alanine moiety of 4′-PP.
Curiously, we found that the sulfur atom of the catalytic nucleophile Cys707 in the ALDH active center was positioned close to the sulfur atom of the 4′-PP, and the cryo-EM density between the two atoms appeared continuous (Fig. 6b). This suggested that a disulfide bond formed between the two atoms. To verify the presence of such a bond, we conducted trypsin digestion of ALDH1L1 followed by liquid chromatography-mass spectrometry (LC/MS). Analysis of the LC/MS data revealed readily detectable ions at m/z = 1094.5 ([M + 2H] 2+ ) and m/z = 730.0 ([M + 3H] 3+ ) that corresponded to a tryptic digestion product of nominal mass 2187 Da, identical to the theoretical mass of 4′-PP-crosslinked peptides 350 S-R 359 and 704 G-R 712 (Fig. 6c). The subsequent collision-induced dissociations of these ions resulted in a pattern of MS peaks, the interpretation of which allowed unequivocal identification of the chemical structure of the parent ions (Fig. 6d). The most intense peaks in the MS 2 spectra resulted from the break of the labile phosphate moiety followed by the neutral loss of the phosphate group, either as phosphoric (H 3 PO 4 ) or meta-phosphoric (HPO 3 ) acid (Δ mass 98 and 80 Da, respectively) 45 . The presence of a series of b- and y-ions unambiguously confirmed the amino acid sequences and sites of the 4′-PP covalent modification in the crosslinked peptides. While it was not possible to quantify the prevalence of the crosslinked peptide, its presence confirmed that the disulfide bond formed between the 4′-PP prosthetic group and Cys707 in a population of ALDH1L1 molecules. The inclusion of the disulfide bond in the molecular models improved the fit of the two sulfur atoms to the cryo-EM density.
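The relationship between the two observed ions and the nominal mass of the crosslinked product can be verified with a short calculation. The sketch below uses the standard relation for positive-mode ions, m/z = (M + z × m_p) / z, where m_p is the proton mass; the function names and the rounded proton mass constant are choices made for this illustration and are not taken from the study's analysis software.

```python
PROTON_MASS = 1.00728  # Da, mass of a proton (rounded)


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion for a given neutral mass and charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass M back-calculated from an observed [M + zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    # Ions reported in the text: m/z 1094.5 (2+) and m/z 730.0 (3+).
    for mz, z in ((1094.5, 2), (730.0, 3)):
        print(f"m/z {mz} ({z}+) -> M = {mass_from_mz(mz, z):.1f} Da")
```

Both charge states return a neutral mass that rounds to 2187 Da, in line with the nominal mass quoted for the crosslinked tryptic product.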
To investigate the contribution of the 4′-PP arm to the interactions between the Int and C t domains, we designed a shorter version of ALDH1L1, containing only the Int and C t domains (termed int-C t ), and expressed it in E. coli. It was shown previously that ALDH1L1 produced in bacteria lacks the 4′-PP prosthetic group 29 , and therefore only protein-protein contacts could contribute to the interactions between the Int and C t domains in int-C t . NS-EM of int-C t resolved the C t core and, in some cases, Int domains adjacent to it (Fig. 5d). However, the sites of the C t core that had been invariably occupied by the Int domains in full-length ALDH1L1 (Fig. 2a) were predominantly vacant in the int-C t protein lacking 4′-PP.
In agreement with such an arrangement, only the tetrameric C t core was resolved in the 2.7-Å crystal structure of int-C t expressed in bacteria, with no electron density present for the Int domains. This indicates that the interactions of the 4′-PP arm with residues of the substrate entrance tunnel are critical for the formation of a stable complex between the Int and C t domains of ALDH1L1.
Int domains are highly mobile when not docked into C t domains. During the initial step of the ALDH1L1 catalysis, 4′-PP arms must be accessible to interact with the N t domains. With the arm placed outside of the C t domain active site, the Int domain is expected to disengage the C t domain, leaving the substrate entrance tunnel vacant. However, global 3D classification did not produce classes with vacant substrate entrance tunnels of the C t domains. To determine the fraction of free (not occupying substrate entrance tunnels) Int domains, we performed local 3D classification within a mask encompassing each Int domain and calculated the total number of particles that contributed to empty versus occupied classes (Supplementary Fig. 2). This analysis produced estimated average occupancies of 76% and 83% for the Int domain in the ligand-free and NADP + -bound structures, respectively. This indicates that a fraction of Int domains is present in the free form and is available to shuttle the substrate between the N t and C t catalytic centers. However, since neither NS-EM nor cryo-EM experiments resolved Int in any other state than docked at the C t substrate entrance tunnel, the Int domains do not seem to assume strictly defined positions while shuttling between the two catalytic domains in the ligand-free and NADP + -bound enzyme. These results also indicate that the state with the Int domains docked at the C t substrate entrance tunnels and with the 4′-PP arms reaching into the ALDH active sites is the most favorable conformation for the resting (substrate-free) enzyme. 
The partial disulfide bond between the 4′-PP arm and Cys707 could serve to support this conformation.
[...] enforcing symmetry. The resulting 4.4-Å-resolution map was very similar to its symmetrical counterpart, except that weak density for a single N t domain became visible (Fig. 7a). This N t domain appeared to interact with the NADP + -binding sub-domain of one of the C t domains. Low r.m.s.d. values between the chains of this structure (0.67-0.74 Å) indicated that this interaction did not induce large structural rearrangements in the protein core. A subsequent local 3D classification of the same dataset isolated a conformation with two clearly visible N t domains in the same orientation, corresponding to 5517 particles. 3D refinement of this smaller dataset resulted in a 6.8-Å map with strong density for the two N t domains (Fig. 7b). The middle section of each N t domain was positioned within 8 Å from several secondary structure elements of the C t domain, and the amino-terminal portion of the N t domain sat directly above the α-helix (residues 653-664) of the C t domain that forms one side of the cleft accommodating the adenine moiety of NADP + 27 . Notably, although most α-helices of the C t core were well resolved in the map, there was no cryo-EM density for this key helix.
In another 3D class, containing 17,499 particles and refined to 7.0 Å resolution, a single N t domain was found to straddle the C t core between two Int domains, with both ends of N t in contact with the ALDH1L1 core (Fig. 7c). As expected, the carboxyl-terminal region of the N t domain was positioned close to the amino-terminal end of the Int domain of the same protomer. Interestingly, the amino-terminal portion of the N t domain interacted with the linker connecting the Int and C t domains of a different protomer. Although the resolution of the map is insufficient for interpreting this interaction in atomic detail, it is clear that residues 55-61 of the N t domain, composing a loop and a short beta-strand, were supported by the Int-C t linker. Of note, while the linker is negatively charged (Fig. 4b), the complementary interface of N t contains a positively charged patch along the contact interface (Fig. 7c).
To confirm the formation of transient complexes between N t domains and the ALDH1L1 core, we performed chemical crosslinking of the full-length protein with 0.1% glutaraldehyde followed by NS-EM and 2D classification ( Supplementary Fig. 3a). This treatment resulted in gradual disappearance of 2D classes displaying N t moieties not in contact with the protein core ( Supplementary Fig. 3b), indicating that glutaraldehyde crosslinking stabilized the transient complexes formed by the N-terminal domains. Of note, N t domains attached to the core were reliably resolved in the 2D class averages even after prolonged glutaraldehyde treatment, suggesting that crosslinking occurred at specific positions.

Discussion
ALDH1L1 has two catalytic centers located in separate domains and utilizes a carrier protein (evolutionarily incorporated as a domain) to transfer the reaction intermediate between these centers. This carrier protein domain is highly similar to A/PCPs employed in the biosynthesis of fatty acids, non-ribosomal peptides and polyketides, reactions performed by large and complex multi-enzymatic molecular machines [46][47][48] . This type of modular organization implies extensive domain movements accompanying the transport of substrate between the active sites. Likewise, we found that the tetrameric aldehyde dehydrogenase module of ALDH1L1, located at the C terminus, forms the rigid core of the enzyme, whereas the N-terminal hydrolase domains assume a continuum of positions apparently constrained mainly by the length of the inter-domain linkers. In our cryo-EM structures of ALDH1L1 in the resting state (i.e., in the absence of substrate), the Int (carrier) domains were resolved docked at the substrate entrance tunnels of the C t core, but the incomplete occupancy of these anchored carriers indicates that they operate as highly mobile units. The complex between the N t and Int domains was [...] Fig. 4). Of note, in vitro both the N t and C t domains, either expressed individually or within the full-length enzyme, are capable of independent catalysis, 10-formylTHF hydrolysis or short-chain aldehyde oxidation, respectively 26,34 . It is not clear whether such independent activities take place in the cell, since the hydrolase catalysis in vitro requires high concentrations of nonphysiological sulfhydryls while putative substrates for the ALDH reaction are unknown. Thus, the complex mechanism enabled by the merging of the three domains is likely the only catalytic function of ALDH1L1. 
Most aldehyde dehydrogenases are known to exist as either homodimers, typified by the members of the ALDH3 family, or homotetramers, represented mainly by ALDH1/2 families, which also include ALDH1L1 50 . Such a homotetramer is organized as a dimer of homodimers formed by protomers A/B and C/D, as schematically presented in Fig. 4d. Although the enzymatic mechanism of ALDH1L1 does not dictate a specific quaternary organization of the enzyme, the three-dimensional structure revealed that the tetrameric state of the C-terminal ALDH module is indispensable for the enzyme function. In our cryo-EM structures, the carrier domains of protomers A and B were paired with the ALDH domains of protomers C and D, respectively, while the carrier domains of protomers C and D interacted with the C t domains of protomers A and B. This pairing scheme can only be realized in a tetrameric enzyme. Furthermore, we found that the length and composition of the linker connecting the Int and C t domains are preserved in cytosolic (ALDH1L1) and mitochondrial (ALDH1L2) enzymes as well as across multiple species, suggesting that this intricate domain pairing is a universal characteristic of 10-formyltetrahydrofolate dehydrogenases. Of note, although multiple studies analyzed the oligomeric state of ALDH proteins (recently reviewed in Shortall et al. 51 ), the physiological significance of oligomerization is unclear for most of these enzymes. One possible exception is tetrameric ALDH from Thermus thermophilus, which has a ~30-amino-acid-long C-terminal extension that interacts with the N-terminal region of a protomer from a different homodimer 52 . Other ALDHs, including fatty aldehyde dehydrogenase (FALDH) and ALDH7A1, also feature short C-terminal extensions, but they interact with the protomer within the same homodimer 53,54 . 
In contrast, ALDH1L1 is the example of an ALDH with two additional domains spanning 400 amino acids at the N-terminus of the enzyme, with the tetrameric state being a prerequisite for its complex function.
The acyl and peptidyl carrier proteins evolved to shuttle catalytic intermediates between reaction centers, which requires interaction with multiple partner proteins. This functional versatility necessitates that the nature of such interactions is transient, which is crucial for the uninterrupted action of molecular machines that employ A/PCPs 49 . Accordingly, the contact area between the A/PCP and the partner protein is usually small, with most interacting residues confined to helix II of the carrier protein and, to a lesser extent, helix III and the part of the linker between helices I and II that is close to helix II 42,[55][56][57] . The small size of the contact interface often requires cross-linking to enable structural investigation [57][58][59] . In contrast, the relatively large contact interface between the Int and C t domains of ALDH1L1 was resolved in its native, non-cross-linked form (Fig. 5a-c). In addition to helix II, this interface also involves the base of helix I. Moreover, distinct to other A/PCPs, the beginning of the loop connecting helices I and II protrudes towards the C t domain to interact with two helices forming the orifice of the substrate entrance tunnel (Fig. 5e). Thus, the structure of ALDH1L1 illustrates, to our knowledge, a new mode of interaction between an A/PCP-like carrier protein and its catalytic partner. Importantly, while this interaction favors the state with Int docked at the ALDH substrate entrance tunnel, the incomplete occupancy of this anchored Int domain indicates that this interaction is reversible and does not preclude the shuttling of free (undocked) Int between the catalytic domains during catalysis. Importantly, the 4′-PP arm of the carrier domain was fully resolved in our cryo-EM maps. It spanned the entire 12-Å-deep substrate entrance tunnel of the ALDH domain, extending towards the catalytic cysteine. 
We found that the contacts formed by the 4′-PP prosthetic group are critical for the formation of a stable complex between the Int and ALDH domains of ALDH1L1. During catalysis, the extended 4′-PP conformation would place the formyl group transported from the N t domain precisely in the ALDH active site, allowing the nucleophilic attack by Cys707. Curiously, in the absence of the substrate, a partial covalent bond formed between the sulfur atoms of 4′-PP and Cys707. While formation of this disulfide link is likely prevented by the formyl group attached to the 4′-PP arm during catalysis, it could be hypothesized that in the resting state of the enzyme such a bond prevents irreversible oxidation of both the catalytic cysteine and 4′-PP sulfur atoms. This disulfide could be reduced by cellular glutathione accessing the active center through the NADP + binding site. Of note, in the individually expressed C t domain, Cys707 was shown to form a transient covalent adduct with the C4 atom of the nicotinamide ring of NADP + 27 , which suggests that this cysteine is highly reactive beyond the immediate catalytic step. It could also be hypothesized that maintaining the 4′-PP arm within the substrate entrance tunnel prevents the entrance of small aldehydes into the ALDH catalytic center in vivo, thus preserving the enzyme for the 10-formylTHF dehydrogenase catalysis. Alternatively, we cannot exclude the possibility that the observed disulfide bond is the result of oxidation in our experimental setting.
The high mobility of the N t domains suggests that ALDH1L1 catalysis is driven primarily by stochastic domain movements. However, the cryo-EM maps of the states with N t domains resolved in fixed positions provide evidence of non-random interactions of these functional modules, which could play a role in the enzymatic mechanism. In one such cryo-EM map, two N t domains were shown to interact with the NADP + binding regions of the ALDH domains (Fig. 7b). Since a key helix forming the NADP + binding site was disordered in each involved ALDH domain, it is possible that in this conformation the N t domain interferes with the binding of NADP + . In support of such a possibility, this conformation was not detected in the ALDH1L1-NADP + dataset. Based on these results, we hypothesize that the hydrolase domains of ALDH1L1 may be involved in regulating the enzymatic reaction performed by the ALDH domains. It has to be noted, however, that full-length ALDH1L1 and the individually expressed ALDH domain displayed similar affinities for NADP + (K d of 0.3 µM versus 0.2 µM, respectively) 26 , suggesting that the proposed effect is likely small. In the second cryo-EM map, the N-terminal moiety of the hydrolase domain rested on the linker connecting the carrier and ALDH domains of a different protomer (Fig. 7c), with a remarkable charge complementarity between the linker, carrying a strong negative charge (Fig. 4b), and the positively charged region of the N t domain contacting it (Fig. 7c). We surmise that in this ALDH1L1 conformation the N t domain may be involved in the extraction of the Int domain from the substrate entrance tunnel of the C t core, with the positively charged patch acting as a hook. Alternatively, this domain arrangement may create a scaffold for the formation of the complex between the N t and Int domains. 
In the latter scenario, a large-scale rotation and shift of the Int domain would be necessary to bring together the sulfhydryl of the 4′-PP arm and the N t active site residues, which are ~50 Å apart. While the above interpretations are speculative, the existence of sparsely populated states with firmly positioned hydrolase domains points to an intricate mechanism of catalysis that may involve various auxiliary inter-domain interactions guiding the overall random domain movements during catalysis.
In summary, in this study cryo-EM revealed the unusual architecture of the multi-domain enzyme ALDH1L1, which enables its complex catalytic mechanism. Protein oligomerization and multidomain organization are common phenomena in eukaryotes [60][61][62][63] . While a modular organization can expand enzyme functionality 61 , oligomerization provides benefits such as efficiency, regulation and stability 63 . In some cases, oligomerization is required because catalytic centers are formed by residues from different protomers or because oligomers enable additional non-catalytic regulatory sites. Metabolic enzymes can also form higher-order structures such as filaments, which might not directly affect catalysis within a single unit 64 . Of note, all such examples have been reported in folate metabolism, to which ALDH1L1 belongs [65][66][67][68] . Here we uncovered another mechanism, in which tetrameric organization allows modular catalysis by bypassing spatial restrictions within a single protomer. Thus, the tetrameric state of ALDH1L1 is indispensable for the enzyme's function, which also involves transient domain interactions and large-scale domain movements. Finally, the complex between the intermediate and aldehyde dehydrogenase domains of ALDH1L1 demonstrates, to our knowledge, a new mode of interaction between an A/PCP-like carrier protein and a catalytic domain, emphasizing the versatility of A/PCPs.

Methods
Protein expression and purification. Full-length rat ALDH1L1 was expressed following a previously developed protocol 69 . Specifically, High Five insect cells (Invitrogen) grown as a monolayer (Grace's insect medium supplemented with 10% fetal bovine serum/175-cm 2 cell culture flasks) at 27°C were infected with a high-titer recombinant baculovirus stock produced as previously described 69 . Five days after infection, the culture medium was collected, and detached cells were removed by centrifugation (10,000 × g, 10 min). To purify ALDH1L1, the cell culture medium was applied to a column containing 5-formyl-THF-Sepharose affinity resin equilibrated with 10 mM Tris-HCl buffer, pH 7.4, containing 10 mM 2-mercaptoethanol (2-ME) and 1 mM NaN 3 (buffer A). The column was washed with buffer A and then with the same buffer containing 1.0 M KCl; the enzyme was eluted with buffer A containing 1.0 M KCl and 20 mM folic acid. The eluate was concentrated and excess KCl was removed using a spin concentrator. Additional purification was then carried out using FPLC/Mono-Q column (GE) chromatography with a linear KCl gradient (0-0.5 M in buffer A) and Sephacryl S-300 (GE) size-exclusion chromatography in buffer A with 0.2 M NaCl. The individual C t domain and Int-C t protein were expressed as 6xHis-tagged constructs in E. coli (Invitrogen) from pRSET vectors. Protein expression was carried out at 22°C, and the soluble cell fraction was separated by sonication and centrifugation. The proteins were purified on Ni-NTA or Co-NTA agarose (Qiagen) using a 5-20 mM imidazole gradient to remove impurities, followed by elution with 100 mM imidazole in buffer A supplemented with 100 mM KCl. Additional purification was done by size-exclusion chromatography on Sephacryl S-300. The purity of all proteins was confirmed by SDS-PAGE with Coomassie staining. Purified full-length ALDH1L1 was tested for the 10-formylTHF dehydrogenase activity as we previously described 33 . 
C t domain and Int-C t protein were tested for the aldehyde dehydrogenase activity using propanal as the substrate and NADP + as the cofactor essentially as we described 26 . After purification, all protein preparations used in the present study had specific activities close to previously reported values 26,33 and were stored at −80°C in the presence of 10 mM 2-ME and 20% glycerol.
Liquid chromatography/mass spectrometry. In total, 30 µg of ALDH1L1 (50 µL of protein solution) was combined with 25 µL of 9 M urea and 10 µL of acetonitrile and incubated for 10 min at 42°C. This mixture was diluted with 250 µL of 100 mM ammonium bicarbonate prior to the addition of 5 µg of sequencing grade trypsin (Promega). The proteolytic digestion was carried out for 4 h at 37°C. The resulting peptides were loaded onto a reverse-phase C4 (2.1 mm × 50 mm) column (Thermo Scientific). Peptides were resolved and eluted with a gradient of acetonitrile in water (from 98% H 2 O with 0.1% (v/v) formic acid (A) and 2% acetonitrile with 0.1% (v/v) formic acid (B) to 100% B) developed over 20 min. Separation was achieved at a flow rate of 0.3 mL/min using an Agilent Technologies 1100 Series HPLC system. The eluent was directed into an LTQ Velos linear trap quadrupole mass spectrometer (Thermo Scientific) equipped with an electrospray ionization source operated in positive ion mode. Parameter settings of the mass spectrometer for peptide detection were as follows: activation type, collision-induced dissociation; normalized collision energy, 35 kV; capillary temperature, 370°C; source voltage, 5 kV; capillary voltage, 43 V; tube lens, 105 V. MS spectra were collected over a 200-2000 m/z range. The raw MS data were analyzed using Qual Browser for Thermo Xcalibur version 2.1.
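As a quick check of the digestion conditions described above, the following sketch computes the urea concentration before and after dilution with ammonium bicarbonate, and the trypsin-to-protein mass ratio. It assumes that the volumes are simply additive and that urea enters the mixture only from the 9 M stock; the function and variable names are illustrative and not part of the original protocol.

```python
def urea_molarity(stock_molar: float, stock_vol_ul: float, total_vol_ul: float) -> float:
    """Urea concentration (M) after mixing, assuming additive volumes."""
    return stock_molar * stock_vol_ul / total_vol_ul

# Volumes stated in the protocol (microliters).
protein_vol_ul = 50.0
urea_vol_ul = 25.0
acetonitrile_vol_ul = 10.0
bicarbonate_vol_ul = 250.0

# Denaturation step: protein + urea + acetonitrile.
denaturation_vol_ul = protein_vol_ul + urea_vol_ul + acetonitrile_vol_ul
# Digestion step: the same mixture after dilution with ammonium bicarbonate.
digestion_vol_ul = denaturation_vol_ul + bicarbonate_vol_ul

urea_denaturation_m = urea_molarity(9.0, urea_vol_ul, denaturation_vol_ul)
urea_digestion_m = urea_molarity(9.0, urea_vol_ul, digestion_vol_ul)

# Trypsin-to-protein mass ratio (5 µg trypsin per 30 µg ALDH1L1).
trypsin_to_protein = 5.0 / 30.0

print(f"Urea during denaturation: {urea_denaturation_m:.2f} M")
print(f"Urea during digestion:    {urea_digestion_m:.2f} M")
print(f"Trypsin:protein (w/w):    1:{1 / trypsin_to_protein:.0f}")
```

Under these assumptions, the urea concentration falls from roughly 2.6 M during denaturation to below 1 M during digestion, at a trypsin-to-protein ratio of 1:6 (w/w).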
Negative-stain electron microscopy. Protein samples were diluted with buffer containing 10 mM HEPES, pH 7, and 150 mM NaCl to ~0.02 mg/ml. A 4.7-µl drop of the diluted sample was placed on a freshly glow-discharged carbon-coated copper grid and left for 15 s. Excess liquid was removed using filter paper, and the grid was washed three times with 4.7-µl drops of the same buffer. After the final wash, the buffer drop was removed in the same manner, and the protein was negatively stained by applying a 4.7-µl drop of 0.75% uranyl acetate for 30 s. Excess negative stain was removed using filter paper, and the grid was allowed to dry. Data were collected using SerialEM 70 on a Tecnai T20 electron microscope (FEI, the Netherlands) equipped with a LaB 6 filament operated at 200 kV and a 2k × 2k FEI Eagle CCD camera. The nominal magnification was 100,000x, which corresponded to a pixel size of 2.2 Å. EMAN2 71 was used to semi-automatically select 249,416 particles from 3920 micrographs. The selected particles were extracted into 128 × 128-pixel boxes and subjected to reference-free 2D classification into 256 classes using Relion 2.1 72 . For separate visualization of the N t domains, 30,186 peripheral domains (arms) of negatively stained ALDH1L1 molecules were selected manually using EMAN2 from 392 micrographs of the same dataset. The selected particles were extracted into 64 × 64-pixel boxes and classified using Relion 2.1 into 256 classes.
Chemical cross-linking and comparative quantification of mobile N-terminal domains. Full-length ALDH1L1 was diluted to 0.01 mg/ml with buffer containing 10 mM HEPES, pH 7, and 150 mM NaCl, followed by the addition of 0.1% glutaraldehyde. Aliquots were taken before the addition of glutaraldehyde and after 1 min, 5 min, 10 min, 30 min, and 60 min of incubation at 4°C, and negative staining and NS-EM data collection were performed as described above. All datasets were subjected to 2D classification in Relion. After discarding 2D classes that did not represent intact ALDH1L1 molecules, the final NS-EM datasets contained 57,888 (control), 36,538 (1 min), 32,722 (5 min), 39,150 (10 min), 47,877 (30 min), and 30,449 (60 min) particles. 2D class averages displaying ALDH1L1 molecules with at least one arm that was not in contact with the protein core were identified by visual inspection, and their fractions were calculated based on the total number of particles that contributed to these classes. These fractions were used solely for the purpose of comparing the cross-linking time points because not all mobile N t domains could be captured by 2D classification due to their dynamic nature.
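The fraction described above can be illustrated with a minimal sketch. It assumes that each 2D class average carries a particle count and a manually assigned flag indicating whether at least one arm is detached from the core, and that the denominator is the total number of particles in the final dataset for that time point. The class name, the function name, and all counts below are hypothetical and are not data from this study.

```python
from dataclasses import dataclass

@dataclass
class ClassAverage:
    n_particles: int    # particles contributing to this 2D class
    has_free_arm: bool  # assigned by visual inspection (hypothetical flag)

def free_arm_fraction(classes: list[ClassAverage]) -> float:
    """Fraction of particles that fall into classes showing a detached arm."""
    total = sum(c.n_particles for c in classes)
    if total == 0:
        raise ValueError("dataset contains no particles")
    free = sum(c.n_particles for c in classes if c.has_free_arm)
    return free / total

# Hypothetical class counts for a single cross-linking time point.
example_classes = [
    ClassAverage(n_particles=1200, has_free_arm=True),
    ClassAverage(n_particles=800, has_free_arm=False),
    ClassAverage(n_particles=2000, has_free_arm=False),
]
print(f"Fraction with a free arm: {free_arm_fraction(example_classes):.2f}")
```

Because many mobile arms are averaged out during 2D classification, a value computed this way is useful only for comparing time points, as noted above, and not as an absolute measure of domain mobility.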
Cryo-electron microscopy specimen preparation and data collection. ALDH1L1 was vitrified at a concentration of 0.4 mg/ml in 20 mM HEPES, pH 7.6 (apo-ALDH1L1) or 40 mM HEPES, pH 7, 1 mM NADP + (ALDH1L1-NADP + complex). Cryo-EM specimens were prepared by plunge-freezing in liquid ethane using a Vitrobot Mark IV (FEI) at room temperature and 90% humidity. The grids (Quantifoil R2/2 with gold support) were glow-discharged for 30 s at a pressure of 37 mBar and with the current set to 30 mA. The drop volume was 3 µl. Data were collected at the National Cryo-Electron Microscopy Facility (NCEF) at the National Cancer Institute on a Titan Krios electron microscope (FEI) operated at 300 kV and equipped with a K2-Summit direct electron detector (Gatan). The detector was used in the super-resolution mode. For apo-ALDH1L1, 2202 movies were collected with a nominal dose of 40 e − /Å 2 equally distributed between 40 frames of a 12-s movie, and the pixel size (super-resolution mode) was 0.66 Å (magnification: 105,000x). The defocus range was −1 to −3 µm. For the ALDH1L1-NADP + complex, 2381 movies were collected with a nominal dose of 40 e − /Å 2 equally distributed between 40 frames of a 14-s movie, and the pixel size (super-resolution mode) was 0.532 Å (magnification: 130,000x). The defocus range was −1 to −2.5 µm (Table 1).
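The exposure scheme described above translates into per-frame quantities by simple division. The sketch below derives the dose per frame, the exposure time per frame, and the average dose rate from the stated totals, assuming that the dose is spread evenly over the frames as described; the function name and the returned keys are illustrative.

```python
def per_frame_exposure(total_dose: float, n_frames: int, movie_seconds: float) -> dict:
    """Derive per-frame values from the total dose (e-/A^2), frame count, and movie length (s)."""
    return {
        "dose_per_frame": total_dose / n_frames,        # e-/A^2 per frame
        "seconds_per_frame": movie_seconds / n_frames,  # s per frame
        "dose_rate": total_dose / movie_seconds,        # e-/A^2 per second
    }

# Totals stated for the two datasets.
apo = per_frame_exposure(total_dose=40.0, n_frames=40, movie_seconds=12.0)
nadp = per_frame_exposure(total_dose=40.0, n_frames=40, movie_seconds=14.0)

for name, values in (("apo-ALDH1L1", apo), ("ALDH1L1-NADP+", nadp)):
    print(
        f"{name}: {values['dose_per_frame']:.2f} e-/A^2 per frame, "
        f"{values['seconds_per_frame']:.2f} s per frame, "
        f"{values['dose_rate']:.2f} e-/A^2 per s"
    )
```

Both datasets therefore received about 1 e − /Å 2 per frame; they differ only in frame duration (0.30 s versus 0.35 s).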
Single-particle analysis of cryo-electron microscopy data. Motion correction and dose weighting were performed using MotionCor2 73 . For local motion correction, frames were divided into 25 tiles. Images were binned 2x (apo-ALDH1L1) and 1.5x (ALDH1L1-NADP + ) during motion correction, resulting in pixel sizes of 1.32 Å and 0.76 Å, respectively. Contrast transfer function parameters were estimated using ctffind 4.1 74 . All other image processing steps were performed in Relion 3.0 72 unless stated otherwise. Particles were picked automatically using projections of an X-ray structure of the tetrameric C-terminal domain of ALDH1L1 (PDB 2o2p 27 ) low-pass filtered to 40 Å, resulting in datasets containing 1,082,600 (apo-ALDH1L1) and 1,050,740 (ALDH1L1-NADP + ) particles. The particles were extracted, with 2x binning, into 80 × 80 (apo-ALDH1L1) or 140 × 140 (ALDH1L1-NADP + ) pixel boxes and subjected to reference-free 2D classification into 128 classes, followed by selection of high-resolution classes corresponding to a complete, undistorted tetramer of the C t domain that appeared symmetrical. This selection reduced the size of the datasets to 424,239 and 640,945 particles, respectively. The corresponding particles were re-extracted, without binning, into 160 × 160 (apo-ALDH1L1) or 280 × 280 (ALDH1L1-NADP + ) pixel boxes, and reference-free 2D classification into 128 classes was repeated. Selection of the best-resolved classes resulted in datasets of 147,837 (apo-ALDH1L1) and 594,883 (ALDH1L1-NADP + ) particles. 3D classification into 10 classes was performed next, with the above-mentioned X-ray structure of the tetramer of the C t domain low-pass filtered to 40 Å serving as the initial model. No symmetry was imposed at this stage. 
The presence of additional density at the substrate entrance tunnel of the C t domain was obvious in the resulting 3D classes, and in all 3D classes with sufficiently high resolution this density consisted of four α-helices and an arm protruding deep into the substrate entrance tunnel. Additional density consistent with the size and shape of the N t domain of ALDH1L1 (PDB 1s3i 21 ) was observed in several 3D classes. High-resolution 3D classes, as well as 3D classes with density for one or more N t domains, were subjected to 3D auto-refinement with D2 or C2 symmetry imposed as well as without imposing symmetry. Post-processing included automatic B-factor sharpening and detector modulation transfer function correction, and the gold-standard resolution was determined within a soft mask using a 0.143 FSC threshold. Local resolution was estimated using ResMap 75 . Representative micrographs and 2D class averages, FSC curves, and local resolution data are presented in Supplementary Figs. 5 and 6. Supplementary Fig. 7 illustrates the cryo-EM density for 4′-PP and NADP + .
Table 1 Cryo-EM data collection and single-particle analysis statistics.
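The 0.143 FSC criterion mentioned in the single-particle analysis above can be illustrated with a short sketch that finds where a Fourier shell correlation curve first drops below the threshold and converts that spatial frequency to a resolution. This is not the Relion implementation; it assumes a curve sampled at increasing spatial frequencies and uses linear interpolation between samples. The function name and the example curve are hypothetical and are not data from this study.

```python
from typing import Optional, Sequence

def fsc_resolution(
    spatial_freqs: Sequence[float],  # 1/A, strictly increasing and positive
    fsc_values: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Resolution (A) at which the FSC curve first falls below the threshold."""
    points = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(points, points[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation of the crossing frequency between two samples.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None  # the curve never crosses the threshold

# Hypothetical FSC curve for illustration only.
freqs = [0.05, 0.10, 0.20, 0.30, 0.40]
fsc = [1.0, 0.98, 0.80, 0.243, 0.043]
print(f"Estimated resolution: {fsc_resolution(freqs, fsc):.2f} A")
```

In practice the curve is computed between two independently refined half-maps within the soft mask, which is what makes the estimate a gold-standard one.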
Model building. The crystal structure of the tetrameric C-terminal domain of ALDH1L1 (residues 405-902) in the apo form (PDB 2o2p 27 ) or in complex with NADP + (PDB 2o2q 27 ) and four instances of a homology model of the Int domain of rat ALDH1L1 (residues 306-402) obtained using the SWISS-MODEL server 76 were fitted into the corresponding cryo-EM density using UCSF Chimera 77 . This was followed by one round of real-space refinement in PHENIX 78 and alternating rounds of model building in Coot 79 and restrained model refinement in Refmac 80 . Molprobity 81 was used to assess the quality of the atomic models. Map-model correlations were evaluated using phenix.mtriage 82 .
Estimation of occupancy of Int domains. A soft mask was prepared for each of the four Int domains by segmenting the symmetrical ALDH1L1 map in UCSF Chimera (Supplementary Fig. 2). Before 3D classification, both apo-ALDH1L1 and ALDH1L1-NADP + maps were refined without symmetry imposed. 3D classification into 8-10 classes without particle alignment was then performed in Relion 3.0 for each Int domain separately using the final map low-pass filtered to 40 Å as the reference. The resulting 3D classes were examined visually, and total particle counts for classes with occupied and vacant Int domain binding sites were determined. Int domain occupancy was calculated as the fraction of the particles contributing to the classes representing occupied sites, averaged across the four sites within the tetramer.
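The occupancy calculation described above can be sketched as follows. The code assumes that, for each of the four Int domain binding sites, the focused 3D classification yields a list of per-class particle counts together with a visually assigned occupied/vacant label for each class. All names and numbers below are hypothetical and are not data from this study.

```python
from typing import Sequence

def site_occupancy(class_counts: Sequence[int], occupied: Sequence[bool]) -> float:
    """Fraction of particles assigned to classes with an occupied binding site."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("no particles assigned to this site")
    in_occupied = sum(n for n, is_occ in zip(class_counts, occupied) if is_occ)
    return in_occupied / total

def mean_occupancy(sites: Sequence[tuple[Sequence[int], Sequence[bool]]]) -> float:
    """Average the per-site occupancies across all sites of the tetramer."""
    return sum(site_occupancy(counts, flags) for counts, flags in sites) / len(sites)

# Hypothetical per-site classification results: (particle counts, occupied flags).
example_sites = [
    ([600, 300, 100], [True, True, False]),  # site 1: 0.90
    ([500, 500], [True, False]),             # site 2: 0.50
    ([700, 300], [True, False]),             # site 3: 0.70
    ([250, 750], [False, True]),             # site 4: 0.75
]
print(f"Mean Int domain occupancy: {mean_occupancy(example_sites):.4f}")
```

Averaging per-site fractions, rather than pooling all particles, gives each binding site equal weight regardless of how many classes it was split into.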
Other methods. Protein structure similarity search was performed with the mTM-align server 35 . Figures were prepared in UCSF Chimera, UCSF ChimeraX 83 , and Coot.
Statistics and reproducibility. LC/MS experiments were repeated four times. The cross-linked peptides were detected in all these experiments.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Cryo-EM maps of ligand-free ALDH1L1 and ALDH1L1 in complex with NADP + have been deposited to the EMDB with accession codes EMDB-24540 and EMDB-24547, respectively. Fitted coordinates have been deposited to the PDB with accession codes 7RLT and 7RLU, respectively. All other data are available from the corresponding authors upon request.