Introduction

Arabinose is distinct among the monosaccharides because it is found in nature in both the d- and l-enantiomers. l-Arabinose is a major component of plant cell walls as a component of pectin and is biosynthesized by pathways via UDP-d-glucose1. However, the biosynthetic pathway of d-arabinose is unique and has been identified only in several bacteria2,3,4,5. d-Arabinofuranosides (Arafs) are one of the predominant cell wall components of actinomycetes, such as Mycobacterium, Rhodococcus, and Nocardia species2. Regarding immune modulation and pathogenesis in humans, researchers have been studying the biosynthetic pathways and carbohydrate structures of the cell wall components of lipoarabinomannan (LAM) and arabinogalactan (AG) in acid-fast bacteria, including Mycobacterium tuberculosis, Mycobacterium leprae (the causative pathogen of Hansen’s disease or leprosy), and Mycolicibacterium smegmatis (formerly Mycobacterium smegmatis)6,7,8,9,10,11. While we have successfully synthesized the docosasaccharide d-arabinan motif12, the complex d-arabinan structures of LAM and AG, which comprise multiple branches and both α- and β-d-Araf bonds, have been challenging targets for total organic synthesis13,14,15.

Glycoside hydrolases (GHs) that cleave α-l-Araf bonds have been extensively studied, and many families have been classified in the Carbohydrate-Active enZyme (CAZy) database16, including GH43, GH51, GH54, and GH62. However, until recently, the only classified GH known to hydrolyze the α-d-Araf bond was the α-d-fructofuranosidase (Fruf-ase) of GH172, which does so as a side reaction on a structurally similar glycosidic bond17. A bacterial endo-d-arabinanase that degrades mycobacterial LAM and AG was initially reported in the early 1970s18,19,20, and similar activity was later reported in Cellulomonas21,22 and My. smegmatis23,24. However, the genes encoding these endo-d-arabinanases long remained unidentified. Recently, exo- and endo-d-arabinanases were identified from Dysgonomonas gadei25.

The extracellular endo-d-arabinanase producer18, Aureobacterium sp. strain M-2, isolated by Kotani et al., is currently named Microbacterium arabinogalactanolyticum JCM 9171T (ref. 26). Here, we report the identification of the genes encoding enzymes involved in mycobacterial LAM and AG degradation and show that these enzymes degrade the d-arabinan region cooperatively and completely. Our analysis of endo-d-arabinanases from Mi. arabinogalactanolyticum revealed that they belong to a new GH family (GH183). Among the neighboring genes, we found an exo-α-d-arabinofuranosidase (exo-α-d-Araf-ase) belonging to GH172 and an exo-β-d-arabinofuranosidase (exo-β-d-Araf-ase) belonging to GH116. We examined the substrate specificity of these enzymes using mycobacterial LAM and AG and synthetic oligo-d-arabinofuranoside substrates. Furthermore, we elucidated the reaction processes and substrate specificities of endo-d-arabinanases, exo-α-d-Araf-ase, and exo-β-d-Araf-ase by determining their crystal structures.

Results

Identification and characterization of endo-d-arabinanases from Mi. arabinogalactanolyticum

My. smegmatis AG (MsAG) has been previously reported to induce secretion of endo-d-arabinanase by Mi. arabinogalactanolyticum (EndoMA)19. The endo-d-arabinanase-active fractions were separated and purified by two column chromatography steps until a single band representing a native EndoMA (nEndoMA) protein remained (Figs. 1b and 2). During purification, enzymatic activity was evaluated by the degradation of My. smegmatis LAM (MsLAM) into lipomannan (LM), detected using SDS-PAGE with silver periodic acid-Schiff (PAS) staining. High-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) analysis indicated that nEndoMA exhibited time-dependent degradation of MsLAM and released soluble arabinooligosaccharides (Supplementary Fig. 1). nEndoMA was also active towards My. tuberculosis LAM (MtLAM) (Fig. 1a, c, d), which has a capping structure different from that of MsLAM6 (Fig. 2a). Our attempts to establish the internal amino acid sequence of the isolated protein using MS/MS analysis were unsuccessful. Therefore, the draft genome of Mi. arabinogalactanolyticum was utilized to identify candidate genes for endo-d-arabinanase (Supplementary Table 1) based on protein size and isoelectric point (pI) (Supplementary Fig. 2). From 3458 coding sequences, we identified three putative extracellular proteins with 300–400 amino acids and theoretical pIs of 4.3–4.6. We focused on MIAR_33220, which has a domain of unknown function27, DUF4185, because the other two candidates were probably non-enzymatic proteins (OmpA and PBP2). Homologous genes of MIAR_33220 were conserved in the genomes of My. smegmatis and Cellulomonas sp., from which endogenous endo-d-arabinanase activity has been reported22,23 (discussed below). Furthermore, DUF4185 proteins are also present in the order Corynebacteriales (i.e., Corynebacteria, Rhodococcus, Gordonia, and Nocardia), which have cell wall d-arabinan (https://pfam.xfam.org/family/DUF4185). 
Three additional DUF4185 genes, a putative ABC transporter, two putative GH genes (GH172 and GH116), and putative downstream metabolic enzymes were identified close to the MIAR_33220 gene (Fig. 1e). MIAR_33200 encodes a putative enzyme belonging to GH172, for which we previously found exo-α-d-Araf-ase activity17. Very recently, the CAZy database created a new GH183 family for DUF4185 enzymes, according to a report for d-arabinan-degrading enzymes from D. gadei25. The substrate-binding protein (SBP) component gene of the putative ABC transporter (MIAR_33310) belongs to SBP_bac_8 family (PF13416) in the InterPro (Pfam) database. SBP_bac_8 contains maltodextrin binding protein and many other SBPs of sugar transporters. In the Protein Data Bank, MIAR_33310 showed highest sequence similarity to a sugar transporter ATU4361 (PDB ID: 4QRZ, sequence identity = 26%) and cyclic alpha-maltosyl-1,6-maltose binding protein (PDB ID: 7BVT, sequence identity = 26%)28, suggesting that the putative ABC transporter (MIAR_33290–MIAR_33310) functions as a sugar importer.
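The candidate selection described above (putative extracellular proteins of 300–400 amino acids with theoretical pIs of 4.3–4.6) amounts to a simple filter over annotated coding sequences. The following is a minimal sketch of such a filter, not the procedure actually used in this study. The record fields, the use of a signal-peptide flag as a proxy for extracellular localization, and the toy entries are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    """One annotated coding sequence (all fields are illustrative)."""
    locus_tag: str
    length_aa: int             # protein length in amino acids
    theoretical_pi: float      # isoelectric point computed from the sequence
    has_signal_peptide: bool   # assumed proxy for "putative extracellular"


def select_candidates(proteins, min_len=300, max_len=400, min_pi=4.3, max_pi=4.6):
    """Keep putative extracellular proteins inside the size and pI windows."""
    return [
        p for p in proteins
        if p.has_signal_peptide
        and min_len <= p.length_aa <= max_len
        and min_pi <= p.theoretical_pi <= max_pi
    ]


# Hypothetical toy records, not taken from the genome described in the text.
toy_proteins = [
    Protein("toy_0001", 350, 4.5, True),    # passes all three criteria
    Protein("toy_0002", 350, 4.5, False),   # no signal peptide
    Protein("toy_0003", 520, 4.4, True),    # too long
    Protein("toy_0004", 310, 6.8, True),    # pI outside the window
]

hits = select_candidates(toy_proteins)
print([p.locus_tag for p in hits])
```

In practice, the thresholds would be set from the properties of the purified native protein, and the shortlisted candidates would still need manual inspection of their annotations, as was done here to exclude OmpA and PBP2.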

Fig. 1: Endo-d-arabinanases identified from Mi. arabinogalactanolyticum.

a A structural model of LAM from My. tuberculosis (MtLAM). The d-arabinan domain consists mainly of α-links, but there are β-links on the non-reducing terminal side. Mannose residues cap the β-linked d-Araf in MtLAM. LM, lipomannan domain. b SDS-PAGE of purified native endo-d-arabinanase from Mi. arabinogalactanolyticum (nEndoMA) and stained by Coomassie brilliant blue. M, protein molecular weight marker. c, d Degradation of MtLAM analyzed using SDS-PAGE with silver-PAS staining (c) and HPAEC-PAD (d). MtLAM was incubated with each enzyme in 50 mM sodium phosphate (pH 6.0) at 37 °C overnight. A1-A20 in (d) indicates substrate-released arabinooligosaccharides. Numbers represent the degree of polymerization. e The d-arabinan degradation PUL of Mi. arabinogalactanolyticum.

Fig. 2: Purification steps of native endo-d-arabinanase.

a A structural model of LAM from My. smegmatis (MsLAM) which was used as the substrate for enzyme purification. b, c Chromatographies using Toyopearl HW55 (b) and DEAE-Toyopearl 650 (c) columns. Upper panel: Protein elution conditions. Middle: silver-PAS staining of the reaction product following SDS-PAGE (activity assay). Lower panel: silver staining of protein samples following SDS-PAGE. The crude, flow-through, and eluted fractions are shown. M: protein molecular weight marker.

Gene expression of the putative polysaccharide utilization locus (PUL) was investigated (Supplementary Fig. 3). Mi. arabinogalactanolyticum was cultured in an induction medium containing either Glc, d-Ara, or mycobacterial cell wall extracts (MCE) as the sole carbon source, and gene expression of the cells harvested at the middle exponential growth phase was measured by quantitative real-time PCR (qRT-PCR). The four DUF4185 (GH183) genes (MIAR_33220, MIAR_33230, MIAR_33270, and MIAR_33320) and a GH116 gene (MIAR_33170) were strongly induced by d-Ara and MCE (>10-fold increase compared with the Glc condition). The gene for an SBP of the putative ABC transporter (MIAR_33310) was also significantly induced by d-Ara and MCE. Induction of the GH172 gene (MIAR_33200) was relatively mild, but it increased 3.5- and 6.0-fold by d-Ara and MCE, respectively, compared with the Glc condition. This result suggests that the gene cluster containing putative enzymes and an ABC transporter constitutes a PUL for LAM and AG degradation. The d-arabinan-degradation PUL of Mi. arabinogalactanolyticum seems to be typical for gram-positive bacteria, which is different from that for gram-negative bacteria such as Bacteroides species29.
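Fold changes such as those reported above are commonly derived from qRT-PCR threshold cycles by the comparative 2^(−ΔΔCt) method. The sketch below illustrates that calculation under the assumption that this method and a single reference gene apply; the quantification procedure is not specified in this section, and the Ct values used in the example are hypothetical.

```python
import math


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the comparative 2^(-ddCt) method.

    Assumes ~100% amplification efficiency and one reference gene.
    """
    d_ct_test = ct_target_test - ct_ref_test    # normalize the test condition
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl    # normalize the control (Glc)
    return 2.0 ** (-(d_ct_test - d_ct_ctrl))


def dd_ct_for_fold(fold):
    """ddCt (in cycles) that corresponds to a given fold change."""
    return -math.log2(fold)


# Hypothetical Ct values: the target crosses threshold 3.5 cycles earlier
# in the test condition, with an unchanged reference gene.
example = fold_change(22.0, 18.0, 25.5, 18.0)
print(f"fold change = {example:.2f}")

# Cycle shifts implied by the fold changes quoted in the text.
for fold in (3.5, 6.0, 10.0):
    print(f"{fold:>4}-fold  ->  ddCt = {dd_ct_for_fold(fold):.2f} cycles")
```

Under these assumptions, the 3.5- and 6.0-fold inductions of the GH172 gene correspond to shifts of roughly two to three cycles, whereas a >10-fold induction requires a shift of more than three cycles.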

We heterologously expressed four DUF4185-containing genes that share only moderate amino acid sequence similarity (22–35%) (Supplementary Fig. 4). Although two proteins (MIAR_33220 and MIAR_33320) did not express as soluble proteins in Escherichia coli, pure recombinant proteins MIAR_33230 (EndoMA1) and MIAR_33270 (EndoMA2) were obtained (Supplementary Fig. 5a, b). Similar to that of the native enzyme, EndoMA1 and EndoMA2 showed degradation activity towards MtLAM and MsAG (Figs. 1c, d and 3). HPAEC-PAD analysis revealed that EndoMA1 released shorter fragments compared to EndoMA2 and nEndoMA. EndoMA1 and EndoMA2 showed optimum activity at neutral pH for the MtLAM substrate (Supplementary Fig. 5c, d). EndoMA1 had a higher optimal temperature compared with EndoMA2. We synthesized four compounds of d-arabinan substructures with an acetonide tag to investigate the substrate specificities of EndoMA1 and EndoMA2 (Fig. 4 and Supplementary Methods). A22BβT is the largest docosasaccharide (22-mer) that has Arabinan motif with Branches, β-linkage, and an acetonide Tag12. Linear nonasaccharides, branched octasaccharides, and branched pentasaccharides were designated as A9LT, A8BT, and A5BT, respectively. The degradation patterns of the synthetic substrates by EndoMA1 and EndoMA2 were analyzed using HPAEC-PAD and high-performance liquid chromatography (HPLC) after fluorescent reducing end-labeling (Supplementary Fig. 6). EndoMA1 produced smaller fragments than EndoMA2. For example, EndoMA1 could release d-arabinose by the further hydrolyzation of A8Bβ from A22BβT, A4L from A9LT, and A5B from A8BT. In contrast, these larger products persisted in the EndoMA2 digests, and little d-arabinose was released. These results suggested that EndoMA1 has a loose recognition for subsite +1. The products of A22BβT degradation by EndoMA1 were also analyzed using electrospray ionization time-of-flight mass spectrometry (ESI-TOF MS) (Supplementary Fig. 7). 
MS analysis detected trace levels of product fragments that were not detected by HPAEC-PAD and HPLC analysis. EndoMA1 could attack the central linear region of A22BβT, but not the non-reducing end heptasaccharide containing the β-linkage (Fig. 4a). The α-methyl glycoside was synthesized from A5BT with EndoMA1 by transglycosylation activity in the presence of methanol (Fig. 5a, Supplementary Fig. 8, and Supplementary Table 2), demonstrating that DUF4185 (GH183) endo-d-arabinanases are anomer-retaining GHs.

Fig. 3: Degradation of mycobacterial arabinogalactan by the native and recombinant d-arabinan degrading enzymes.

a Structural model of arabinogalactan from My. smegmatis (MsAG). b, c Analysis of arabinooligosaccharides released from the substrate. b TLC analysis. c HPAEC-PAD analysis. MsAG (0.50 mg/mL) in 50 mM sodium acetate (pH 6.0) was incubated with each enzyme (1.0 μg/mL) at 37 °C for 20 h. A1–A3 indicate arabinooligosaccharides with DPs 1–3.

Fig. 4: Synthetic oligo-d-arabinofuranoside substrates used in this study.

a Branched docosasaccharide (A22BβT). ESI-TOF MS analysis revealed that the branched heptasaccharide at the non-reducing end (A7Bβ) is resistant to EndoMA1 (dotted box, Supplementary Fig. 7). b Linear nonasaccharide (A9LT). c Branched octasaccharide (A8BT). d Branched pentasaccharide (A5BT). These oligosaccharides are modified with an acetonide tag. The letters A, B, L, β, and T in the substrate names represent arabinose, branched, linear, β-Araf bond, and acetonide tag, respectively. The main cleavage sites for EndoMA1 and EndoMA2 analyzed by HPAEC-PAD are indicated on the left and right, respectively. Names of the degradation products with and without an acetonide tag are indicated below and above the arrows, respectively.

Fig. 5: TLC analysis of transglycosylation and substrate specificity of the d-arabinan degrading enzymes.

a Transglycosylation activity of EndoMA1 in the presence of methanol. b ExoMA1 activity towards pNP-α-d-Araf and pNP-α-l-Araf. c Activity of GH172 enzymes (ExoMA1 and αFFase1) towards Me-α-d-Araf and Me-α-d-Fruf. d Activity of ExoMA2 towards pNP-β-d-Araf. e, f Transglycosylation activities of ExoMA1 (e) and ExoMA2 (f) in the presence of methanol. Detailed assay conditions are described in the “Methods” section.

Identification and characterization of exo-d-arabinofuranosidases

After the discovery of arabinooligosaccharide-releasing endo-d-arabinanases, we speculated that the putative d-arabinan degradation PUL contains exo-d-arabinofuranosidases for both the α- and β-bonds (Fig. 1e and Supplementary Fig. 4a). Therefore, we investigated two putative GH genes in the PUL. Recombinant MIAR_33200 (ExoMA1, GH172) and MIAR_33170 (ExoMA2, GH116) proteins were expressed in E. coli and purified (Supplementary Fig. 9a, b). Using pNP-α-d-Araf and pNP-β-d-Araf as substrates, ExoMA1 and ExoMA2 were identified as exo-α-d-Araf-ase and exo-β-d-Araf-ase, respectively (Fig. 5b, d). ExoMA1 had an optimal pH of 5.5 (Supplementary Fig. 9c) and was specific for the α-d-Araf bond (Supplementary Fig. 10a). The Km and kcat values of ExoMA1 at 37 °C were 2.55 mM and 44.4 s−1, respectively, (Supplementary Fig. 10b), which were comparable to those of GH172 αFFase1 (Km = 2.71 mM and kcat = 127.5 s−1)17. Compared with αFFase1, ExoMA1 showed higher specificity for the α-d-Araf bond and weaker α-d-Fruf-ase activity (Fig. 5c). As expected from the anomer-retaining GH172 protein, ExoMA1 exhibited transglycosylation activity (Fig. 5e). ExoMA2 was specific for the β-d-Araf bond and showed no activity against pNP-β-d-Glcp or other pNP-substrates (Supplementary Fig. 11a). ExoMA2 had an optimum pH of 6.0 for pNP-β-d-Araf (Supplementary Fig. 9d), with Km and kcat values of 2.73 mM and 12.0 s−1, respectively, at 37 °C (Supplementary Fig. 11b). ExoMA2 catalyzed transglycosylation of pNP-β-d-Araf with methanol producing Me-β-d-Araf (Fig. 5f and Supplementary Fig. 12a). Additionally, the stereochemistry of glycosidic bond hydrolysis was monitored using 1H NMR (Supplementary Fig. 12b). Under equilibrium conditions, pNP-β-d-Araf was hydrolyzed to d-arabinose as a mixture of α/β-furanoses and pyranoses via mutarotation. After 1 min of hydrolysis, a signal of H-1 of β-d-Araf appeared as the initial furanose without the appearance of the H-1 signal of α-d-Araf. 
In contrast, GH172 αFFase1 (anomer-retaining α-d-Araf-ase)17 initially produced α-d-Araf. This finding indicated that ExoMA2 is an anomer-retaining GH. We then examined the synergistic action of EndoMA1, ExoMA1, and ExoMA2 on the natural and synthetic d-arabinan polysaccharides MsLAM and A22BβT (Supplementary Fig. 13). ExoMA1 hydrolyzed the terminal α-d-Araf structures of arabinooligosaccharides released by EndoMA1 and EndoMA2 but did not act on MsLAM, A22BβT, and A7Bβ. ExoMA2 hydrolyzed the terminal β-d-Araf structures of MsLAM, A22BβT, and A7Bβ. In summary, ExoMA1 and ExoMA2 synergistically and completely degraded natural and synthetic d-arabinan substrates into monomers when combined with EndoMA1.
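The kinetic constants quoted above can be compared more directly through the catalytic efficiency kcat/Km. The following sketch computes it from the values given in the text and evaluates the Michaelis–Menten turnover rate at a chosen substrate concentration. It assumes simple Michaelis–Menten behavior; the function and variable names are illustrative only.

```python
# Kinetic constants quoted in the text (Km in mM, kcat in 1/s).
# ExoMA1 and ExoMA2 were measured at 37 °C; alphaFFase1 is from ref. 17.
KINETICS = {
    "ExoMA1": {"km_mM": 2.55, "kcat_per_s": 44.4},
    "alphaFFase1": {"km_mM": 2.71, "kcat_per_s": 127.5},
    "ExoMA2": {"km_mM": 2.73, "kcat_per_s": 12.0},
}


def catalytic_efficiency(km_mM, kcat_per_s):
    """kcat/Km in s^-1 mM^-1."""
    return kcat_per_s / km_mM


def turnover_rate(substrate_mM, km_mM, kcat_per_s):
    """Michaelis-Menten rate per enzyme molecule, v/[E] = kcat*[S]/(Km+[S])."""
    return kcat_per_s * substrate_mM / (km_mM + substrate_mM)


efficiency = {name: catalytic_efficiency(**p) for name, p in KINETICS.items()}
for name, value in efficiency.items():
    print(f"{name}: kcat/Km = {value:.1f} s^-1 mM^-1")

# At [S] = Km, the rate is half of kcat by definition.
half_max = turnover_rate(2.55, **KINETICS["ExoMA1"])
print(f"ExoMA1 at [S] = Km: {half_max:.1f} s^-1")
```

With these values, the three enzymes have nearly identical Km values, so the differences in catalytic efficiency arise almost entirely from kcat.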

Crystal and solution structures of EndoMA1

The crystal structures of EndoMA1 were determined at resolutions of 1.60 and 1.80 Å for the apo and ligand complex forms, respectively (Supplementary Table 3). The complex-form crystal was prepared by co-crystallizing a catalytic residue mutant (D51N) with A9LT. We focused on the complex structure because the apo and complex structures are nearly identical (Cα root mean square deviation (RMSD) < 0.27 Å for all chain pairs). In the catalytic domain, the active site on the front side of a 5-bladed β-propeller fold contained a linear tetrasaccharide with an acetonide tag (A4LT) (Fig. 6a and Supplementary Fig. 14a). Unexpectedly, a linear tetrasaccharide without the tag (A4L) was found bound to a putative carbohydrate-binding module (CBM) domain that adopts a β-sandwich fold (Supplementary Fig. 14b). The EndoMA1 crystal contains four chains (A–D) in the asymmetric unit, and the A4L molecules on the putative CBM domain were only observed at the interface of chains A and C (Supplementary Fig. 14c). The PISA server30 predicted that the biological assembly of the protein was a dimer of A-B or C-D chains.

Fig. 6: Three-dimensional structures of EndoMA1.

a–c The crystal structure in complex with linear oligo-d-arabinofuranosides. In EndoMA1 co-crystallized with A9LT, A4LT (tetrasaccharide with acetonide tag) was observed in the active site, whereas A4L (tetrasaccharide without acetonide tag) was observed in the putative CBM domain. a Overall structure. A ribbon model (left) and the molecular surface (right) are shown. The catalytic residues are represented in magenta. b The active site with bound A4LT. c The binding site of A4L within the putative CBM domain. d, e SAXS analysis. d Superimposition of the theoretical (red) and experimental (black) SAXS profiles. The theoretical profile was calculated from the crystal structure in the apo form. The residuals of both profiles are drawn at the bottom of the graph. e The outer shape of the bead model for the solution structure is superimposed on the dimeric crystal structure.

Due to elution retardation, size-exclusion chromatography (SEC) using a Superdex column, a dextran-agarose matrix, could not accurately determine the molecular mass of EndoMA1 (Supplementary Fig. 15). Therefore, to clarify its quaternary structure in solution, we analyzed EndoMA1 by SEC combined with multi-angle static light scattering and refractive index detection (SEC-MALS/RI) and by SEC combined with small-angle X-ray scattering (SEC-SAXS). The molar masses estimated by SEC-MALS/RI and SEC-SAXS were 105,200 Da and 110,914 Da, respectively (Supplementary Fig. 16a and Supplementary Data 1). The molar mass of monomeric EndoMA1 was 53,678 Da as calculated from the amino acid sequence and 52.4 kDa as measured by SDS-PAGE (Supplementary Figs. 5a and 15b), suggesting that EndoMA1 formed a dimer in solution. The experimental SAXS profile (Fig. 6d, black open circle) and theoretical SAXS profile calculated from the dimeric crystal structure (Fig. 6d, red line) were in good agreement. The Rg values obtained from the Guinier analysis and the pair distance distribution function (PDDF) were 31.2 ± 0.1 and 31.1 ± 0.1 Å, respectively, closely consistent with that calculated from the crystal structure (31.0 Å) (Supplementary Fig. 16b–d). The maximum dimension of the molecule (Dmax) derived from the PDDF was 104 Å, which was also close to that of the crystal structure (98 Å). The normalized Kratky plot in Supplementary Fig. 16e shows a bell-shaped peak with a height of 3/e at QRg = √3, along with a shoulder peak around QRg = 4. The profile in the high-Q region is also asymptotic to zero with gradual oscillation, suggesting that EndoMA1 is a globular protein consisting of two domains31. The outer shape of the bead model for the solution structure obtained by ab initio modeling is displayed in Fig. 6e and superimposed on the high-resolution crystal structure. These two structural models and their respective SAXS profiles shown in Fig. 6d and Supplementary Fig. 16f were almost identical. These results revealed that EndoMA1 forms a dimer in solution.
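Two pieces of arithmetic underlie the conclusion above: the ratio of the measured molar mass to the monomer mass, and the peak position of the normalized Kratky plot expected for a compact globular particle. The sketch below reproduces both from the values quoted in the text. The Kratky function uses the Guinier approximation, I(Q)/I(0) = exp(−(QRg)²/3), which is an idealization introduced here for illustration rather than the analysis performed in this study.

```python
import math

MONOMER_MASS_DA = 53678.0  # calculated from the amino acid sequence (text)
MEASURED_MASS_DA = {"SEC-MALS/RI": 105200.0, "SEC-SAXS": 110914.0}


def subunit_ratio(measured_mass_da, monomer_mass_da=MONOMER_MASS_DA):
    """Apparent number of subunits per particle."""
    return measured_mass_da / monomer_mass_da


def guinier_kratky(q_rg):
    """Normalized Kratky value (QRg)^2 * I(Q)/I(0) for a Guinier particle."""
    return q_rg ** 2 * math.exp(-(q_rg ** 2) / 3.0)


ratios = {method: subunit_ratio(mass) for method, mass in MEASURED_MASS_DA.items()}
for method, ratio in ratios.items():
    print(f"{method}: {ratio:.2f} subunits (nearest integer {round(ratio)})")

# For an ideal globular particle the Kratky peak sits at QRg = sqrt(3)
# with height 3/e.
peak_position = math.sqrt(3.0)
print(f"Kratky peak at QRg = {peak_position:.3f}, "
      f"height = {guinier_kratky(peak_position):.3f}")
```

Both mass estimates fall within about 7% of twice the monomer mass, which is the basis for assigning a dimer. A peak near QRg = √3 with height close to 3/e is the signature of a compact, folded particle, and deviations such as the shoulder around QRg = 4 reflect features beyond this single-sphere idealization.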

In the crystal structure, the active site cleft of the catalytic domain of EndoMA1 adopts the shape of a typical endo-type enzyme (Fig. 6a, right). The bound A4LT molecule in the complex crystal form spans subsites −3 to +1, and the arabinofuranose ring at subsite −1 adopts a 3E conformation to hold the α-anomeric scissile bond in the pseudoaxial position (Fig. 6b and Supplementary Fig. 14a). The side-chain oxygen of Asp33 is located 2.8 Å from the anomeric C1 atom, making it suitable for in-line nucleophilic attack. Asn51(Asp) forms a hydrogen bond with the glycosidic bond oxygen between subsites −1 and +1, suggesting that it is the catalytic acid/base residue. Pro141, Arg198, and Leu292 form hydrogen bonds with A4LT, while Tyr293 and Trp138 form stacking interactions above and below the cleft. Glu243 is the third Asp/Glu residue in the active site cleft and supports the side chain position of Arg198, which plays a pivotal role in holding sugars at subsites −3 and −2 (Supplementary Fig. 17a). The activity of the D33N, D51N, E243A, and E243Q mutants toward A22BβT was nearly abolished (Supplementary Fig. 17b), confirming that all three Asp/Glu residues are catalytically essential.

The A4L molecule is bound to the shallow surface of the putative CBM domain (Fig. 6a, right). Among the four subsites identified in the crystal structure (A-D), subsite A has the strongest interactions (Fig. 6c). The arabinose unit at subsite A is sandwiched between the aromatic side chains of Phe379 and Trp468, and hydrogen bonds from Asp365, Arg381, and Gln470 are formed. Asn361, Asn383, and Asp466 form hydrogen bonds with the sugars in subsites B, C, and D, and Tyr382 forms a stacking platform for subsites C and D. Asn366 from the neighboring molecule (chain C) is involved in the interaction with A4L, but this seems to be a crystal packing artifact (Supplementary Fig. 14c).

A Dali structural search showed that the catalytic domain of EndoMA1 was similar to the hypothetical protein BACOVA_04882, N-acetylgalactosamine deacetylase, and GH43 α-l-Araf-ases (Supplementary Table 4). A structural comparison with a GH43 α-l-Araf-ase from Cellvibrio japonicus32 indicated that the nucleophile of EndoMA1 (Asp33) is located in the same topological position as the catalytic base of the inverting GH43 enzyme (Asp41), whereas the acid/base (Asp51) is differently located from the catalytic acid (Glu215) of GH43 (Supplementary Fig. 18a, b). The putative CBM domain is structurally similar to CBM4-2 in GH10 xylanase33 and CBM61 in GH31 α-1,6-glucosyltransferase34 (Supplementary Table 4). The d-arabinan binding site of EndoMA1 is not on the concave side of the β-sandwich fold, where the xylan binding site of CBM4-2 is located (Supplementary Fig. 18c, d). CBM61 has dual binding sites for maltooligosaccharides, one of which (the B61-2 site) corresponds to the putative CBM binding site in EndoMA1. Currently, we lack sufficient evidence to classify this domain as a new CBM. The available data only includes the crystal structure prepared under a high ligand concentration condition, which may have caused potential crystal packing artifacts. Further study is necessary to provide biochemical evidence and to establish this domain as a new CBM family.

Crystal structures of ExoMA1 and ExoMA2

The crystal structure of ExoMA1 was determined at 2.42 Å resolution in the apo form (Supplementary Table 5). ExoMA1 assembles four trimers to form a dodecamer (Fig. 7a). A SEC experiment suggested that an assembly larger than a heptamer formed in solution (Supplementary Fig. 19a). The ExoMA1 monomer has a double jelly-roll fold, which has been found only in GH172 among over 160 GH families (Fig. 7b)17,35. The active site is located at the interface between the two subunits of the basal trimer. One phosphate molecule per monomer was bound to the interface of ExoMA1, which is responsible for dodecamer assembly (Supplementary Fig. 19b). The dodecamer assembly of ExoMA1 was also visible in a negatively stained transmission electron micrograph (Supplementary Fig. 19c). Other GH172 members (αFFase1 and a hypothetical protein BACUNI_000161 from Bacteroides uniformis) have been shown to form distinct quaternary structures depending on the mode of contact of the basal trimer (Supplementary Fig. 20). The dodecameric structure of ExoMA1 is formed by the interactions of four loops above the double jelly-roll fold, whereas the C-terminal helix is mainly involved in the hexamer (dimer of trimers) interactions in αFFase1 and BACUNI_000161.

Fig. 7: Crystal structure of ExoMA1.

a Overall structure of the dodecamer in the asymmetric unit. b Monomer structure. One protomer within the trimer unit is depicted in rainbow color. c, d The active sites of ExoMA1 (c) and GH172 αFFase1 (d). A ribbon model (upper panel) and the molecular surface (lower panel) are shown. In (c), the β-d-Araf molecule bound to αFFase1 is superimposed on the ExoMA1 structure and depicted as thin yellow sticks. In (d), β-d-Fruf and β-d-Araf molecules bound to αFFase1 are superimposed and represented by yellow and orange sticks, respectively. Residues from a neighboring protomer are labeled with prime.

Because ExoMA1 and αFFase1 are specific for α-d-Araf and α-d-Fruf, respectively (Fig. 5c), we compared their active sites. The anomer-inverted monosaccharides β-d-Araf and β-d-Fruf have been observed bound in the active site of the α-d-Fruf-ase αFFase1 (Fig. 7d). The β-anomer side of the active site of αFFase1 can accommodate the 1-hydroxymethyl group of Fruf because of the presence of the small amino acid Gly299. The corresponding residue in ExoMA1 is Asn253 (Fig. 7c), and its side chain blocks α-d-Fruf binding, making the enzyme specific for α-d-Araf.

The crystal structures of ExoMA2 were determined at 1.75 and 1.35 Å resolutions for the complex forms with Tris (buffer molecule) and β-d-Araf, respectively (Supplementary Table 5 and Supplementary Fig. 21a, b). The two structures are almost identical (Cα RMSD < 0.19 Å for all chain pairs), and we mainly describe the complex structure with β-d-Araf. A dimer is present in the asymmetric unit of the ExoMA2 crystal, consistent with the SEC measurement in solution and the PISA server prediction (Supplementary Fig. 21c, d). ExoMA2 has a two-domain structure consisting of N-terminal β-sandwich and C-terminal (α/α)6 barrel domains, the latter being the catalytic domain (Fig. 8a). The two-domain structure resembles that of GH116 β-glucosidase from Thermoanaerobacterium xylanolyticum (TxGH116, Dali Z score = 42.4 and Cα RMSD = 2.5 Å, Fig. 8b)36, although the amino acid sequence identity is very low (21%). Compared with TxGH116, ExoMA2 has an additional long loop and a small β-sandwich domain in the N-terminal and C-terminal domains, respectively. The β-d-Araf molecule is bound to subsite −1 of ExoMA2, and the catalytic nucleophile (Glu431) and acid/base (Asp557) residues are in the appropriate positions for anomer-retaining hydrolysis (Fig. 8c). All hydroxy groups of β-d-Araf are hydrogen-bonded to the amino acid side chains of the protein, explaining the strict substrate specificity of this enzyme. Compared with GH116 β-glucosidase, the two catalytic residues and several other residues are conserved, but substrate recognition at subsite −1 of ExoMA2 is different. Specifically, residues corresponding to Arg483 and His446 in ExoMA2 are His507 and Asp452 in β-glucosidase TxGH116 (Fig. 8d), which is the key difference in recognition of the O2 hydroxy group of Araf and the O3 hydroxy group of glucopyranose.

Fig. 8: Crystal structures of ExoMA2 and GH116 β-glucosidase.

a, b Overall structures of ExoMA2 (a) and GH116 β-glucosidase TxGH116 from Thermoanaerobacterium xylanolyticum (b). The N-terminal β-sandwich and C-terminal catalytic (α/α)6 barrel domains are represented by a rainbow-colored ribbon model, with the N-terminus in blue and the C-terminus in red. c, d The active sites of ExoMA2 (green) complexed with β-d-Araf (c) and TxGH116 (cyan) complexed with β-d-Glcp (d). Magenta and yellow sticks, respectively, represent the catalytic residues and the ligands. Hydrogen bonds are shown as yellow dotted lines, and the distance between the anomeric carbon of the ligand and the nucleophile is represented as a cyan dotted line.

Discussion

In this study, we identified the genes encoding DUF4185 (GH183) endo-d-arabinanases in Mi. arabinogalactanolyticum. The GH183 family in the CAZy database currently lists 2569 ORFs, all of which have the DUF4185 domain (http://www.cazy.org/GH183.html). DUF4185 (PF13810) in the Pfam (InterPro) database currently includes approximately 5,000 members and is distributed among 2225 species of bacteria and fungi. The majority of these species (1476) belong to the phylum Actinobacteria, which includes Microbacterium, Mycobacterium, Corynebacterium, Rhodococcus, Nocardia, and Cellulomonas. Not only Mycobacterium species but also other bacteria belonging to the order Corynebacteriales (i.e., Corynebacteria, Rhodococcus, Gordonia, and Nocardia) have d-arabinan in their cell walls2,37. Interestingly, My. tuberculosis H37Rv and My. smegmatis MC2 155 have two and five DUF4185 genes in their genomes, respectively38,39. Therefore, DUF4185 genes are possibly responsible for producing the endogenous endo-d-arabinanase in My. smegmatis23,40. In addition to the previously described endo-d-arabinanase21,22, Cellulomonas sp. 73–145 also encodes a gene set of d-arabinan degradative enzymes (DUF4185, GH172, and GH116 candidates) (assembly ASM189818v1, Fig. 9). Putative d-arabinan degradation gene clusters were also present in other bacteria in the phylum Actinobacteria. Recently, Al-Jourani et al. reported GH172 exo-α-d-Araf-ases (DgGH172a and DgGH172c) and DUF4185 (GH183) endo-d-arabinanases (DgGH4185a and DgGH4185b) from Dysgonomonas gadei ATCC BAA-286, which belongs to the phylum Bacteroidota25. While they did not find an exo-β-d-Araf-ase, the D. gadei gene cluster does contain a putative GH116 exo-β-d-Araf-ase gene (HMPREF9455_02468).

Fig. 9: Gene clusters of d-arabinan-degrading enzymes in bacteria.

a d-Arabinan-degrading PUL from Mi. arabinogalactanolyticum and related gene clusters in other bacteria. b Scheme for degradation of the d-arabinan structures of LAM and AG to d-arabinose, which is then converted to d-ribulose-5-phosphate and metabolized further via the pentose phosphate pathway and Calvin cycle.

A phylogenetic analysis of DUF4185 proteins in the genomes of Mi. arabinogalactanolyticum, mycobacteria, and other species is shown in Supplementary Fig. 22. EndoMA1 and EndoMA2 are located in different clades, along with homologs from My. tuberculosis and My. smegmatis, suggesting that members of these clades possess endo-d-arabinanase activity. The C-terminal putative CBM domain in EndoMA1 is conserved only in its closest homologs. Two endo-d-arabinanases from D. gadei (DgGH4185a and DgGH4185b) are present in the same clade as EndoMA2. MIAR_33320 and MIAR_33220, which are not characterized in this study, were located near the DUF4185 proteins in Cellulomonas species. The enzymatic function of MIAR_33220 is elusive because its homolog is not present in the putative d-arabinan degradation PULs of other Microbacterium species (Fig. 9 and Supplementary Fig. 22).

Phylogenetic analysis of ExoMA1 and other GH172 enzymes is shown in Supplementary Fig. 23. The distinct substrate preference for α-d-Araf and α-d-Fruf bonds suggests that ExoMA1 belongs to a clade distinct from αFFase1. DgGH172a has α-d-Araf-ase activity25 and is placed in the same clade as ExoMA1. Although DgGH172c is highly similar to αFFase1 (α-d-Fruf-ase), there are no reports of α-d-Fruf-ase activity of D. gadei enzymes. A phylogenetic analysis of ExoMA2 and related GH116 enzymes is shown in Supplementary Fig. 24 according to the subfamily classification proposed by Ferrara et al.41. Most GH116 members are currently characterized as β-glucosidase (EC 3.2.1.21) and acid β-glucosidase/β-glucosylceramidase (EC 3.2.1.45) in bacteria and eukaryotes, including humans, mice, and Arabidopsis, and they are classified in subfamily 1. The archaeon Saccharolobus solfataricus has two bifunctional enzymes: β-glucosidase/β-N-acetylglucosaminidase (EC 3.2.1.52) SSO3039 in subfamily 2 (ref. 41) and β-glucosidase/β-xylosidase (EC 3.2.1.37) SSO1353 in subfamily 3 (ref. 42). Ferrara et al. classified bacterial homologs of ExoMA2 in the large subfamily 2, which also included many uncharacterized protein sequences41. However, in the phylogenetic tree of the current study (Supplementary Fig. 24), the large clade, including ExoMA2, is separated from the archaeal enzyme clade. Therefore, we designate the large clade including ExoMA2 as a new subfamily 4. Amino acid sequence alignment indicated that all residues that recognize the substrate are conserved in a subfamily 4 member from Thermobaculum terrenum ATCC BAA-798 (Tter_0211, GenBank accession number: ACZ41133.1) (Supplementary Fig. 25). Furthermore, all GH116 candidates in the putative d-arabinan-degrading gene cluster belong to the same subfamily 4, suggesting exo-β-d-Araf-ase activity. 
Although the nucleophile residues of subfamilies 1 and 4 did not align in the amino acid sequence alignment owing to the very low homology in this region, the catalytic residues of ExoMA2 and TxGH116 occupy the same location in the three-dimensional structure (Fig. 8).

We propose a d-arabinan-degrading pathway for Mi. arabinogalactanolyticum based on its gene cluster and four characterized enzymes (Fig. 10). EndoMA1 and EndoMA2, two new GH family endo-d-arabinanases with signal sequences, depolymerize mycobacterial lipoarabinomannan and arabinogalactan extracellularly. The putative ABC transporter in PUL (Fig. 1e) possibly imports arabinooligosaccharides. The two exo-enzymes (GH172 ExoMA1 and GH116 ExoMA2) without a signal sequence further degrade arabinooligosaccharides into monosaccharides intracellularly. Furthermore, the PUL contains a putative isomerase and kinase for d-arabinose metabolism (Fig. 9b).

Fig. 10: Schematic drawing of the degradation pathway of d-arabinan by Mi. arabinogalactanolyticum.

The two extracellular endo-acting enzymes (EndoMA1 and EndoMA2) depolymerize mycobacterial lipoarabinomannan and arabinogalactan, and the ABC transporter presumably imports arabinooligosaccharides. The two intracellular exo-acting enzymes (ExoMA1 and ExoMA2) further degrade arabinooligosaccharides into d-arabinose monosaccharides, which are metabolized by an isomerase and a kinase.

We also determined the crystal structures of the three enzymes. The substrate-complex structure of EndoMA1 clarified the substrate recognition and catalytic mechanism of a novel GH family (GH183) endo-d-arabinanase. ExoMA1 forms a dodecamer that is unique among the various quaternary structures of the GH172 enzymes35 and is distinct from the hexameric assembly of other GH172 members: αFFase1, BACUNI_000161, and DgGH172c from D. gadei (ref. 25). The discovery of ExoMA2 will lead to the establishment of a new subfamily of exo-β-d-Araf-ase in GH116, and its crystal structure with β-d-Araf revealed substrate recognition distinct from GH116 β-glucosidases.

The presence of DUF4185 (GH183) homologs in mycobacteria suggests the importance of degradative enzymes in the remodeling and recycling of cell wall polysaccharides. The antimycobacterial drug ethambutol, which targets the biosynthesis of d-arabinan, has been widely studied as a potential therapy for tuberculosis and other mycobacterial infections43. Data from our genetic, functional, and structural studies on d-arabinan-degrading enzymes will contribute to developing enzymatic tools for the structural and functional analysis of LAM and AG remodeling and recycling in mycobacterial cell walls.

Methods

Materials

MtLAM from My. tuberculosis Aoyama-B was purchased from Nacalai Tesque, Inc. (Kyoto, Japan). MsLAM from My. smegmatis ATCC 700084 was prepared as previously described44. AG from My. smegmatis ATCC 700084 was prepared using the methods outlined in the previous study20. A22BβT was synthesized as reported previously12. The synthesis and schemes of other oligosaccharides and NMR spectra are described in the Supplementary Methods and Supplementary Data 2.

Purification of native endo-d-arabinanase from Mi. arabinogalactanolyticum

Crude endo-d-arabinanase was produced from Mi. arabinogalactanolyticum JCM 9171 by induction with mycobacterial cell wall extracts, as described previously19. The culture supernatant was adjusted to 1.0 M ammonium sulfate and loaded onto a Toyopearl HW55 column (12 × 120 mm; Tosoh Corp., Tokyo, Japan) that had been equilibrated with 1.0 M ammonium sulfate in 50 mM acetate buffer (pH 6.0), as described previously45. Elution was performed with step gradients of 1.0, 0.8, 0.6, and 0.4 M ammonium sulfate, followed by a linear gradient of 0.4–0 M ammonium sulfate, in 50 mM acetate buffer (pH 6.0). Each fraction (43 μL) was mixed with 7 μL MsLAM (4 mg/mL) and 50 μL McIlvaine buffer (pH 5.0) and incubated at 37 °C for 16 h. Degradation of LAM to LM was analyzed by SDS-PAGE with silver-PAS staining, as described below. The active fractions eluting at 0.4–0.2 M ammonium sulfate were combined and then applied to a DEAE Toyopearl 650M column (18 × 160 mm; Tosoh Corp.). The column was washed with 1.0 M NaCl and equilibrated with 50 mM acetate buffer (pH 6.0). The protein was eluted with a linear gradient of 0–15% n-propanol in 50 mM acetate buffer (pH 6.0), followed by 15% n-propanol in 50 mM acetate buffer (pH 6.0). The active fractions were combined and lyophilized. The molecular mass and the isoelectric point of the purified protein were analyzed using SDS-PAGE and two-dimensional electrophoresis on an Auto 2D BM-100 (SHARP Corp., Osaka, Japan).

Draft genome sequencing

Mi. arabinogalactanolyticum JCM 9171 was grown under aerobic conditions in NBRC 802 medium, and genomic DNA was extracted using the FastPure DNA kit (Takara Bio Inc., Shiga, Japan). The genomic library was prepared using the TruSeq DNA Sample Prep Kit and sequenced using the Illumina HiSeq2000 platform at Hokkaido System Science Co., Ltd. (Hokkaido, Japan). Thirteen contigs were assembled using Velvet (version 1.2.08).

Quantitative real-time PCR

Mi. arabinogalactanolyticum JCM 9171 was cultured at 30 °C under aerobic conditions in endo-d-arabinanase induction medium prepared with 0.2% casamino acids, 0.005% K2HPO4, and 0.005% MgSO4·7H2O containing 0.02% mycobacterial cell wall extracts (MCE), d-arabinose, or glucose. Cells from 2 mL of culture were harvested as pellets at the mid-exponential growth phase and stored at −80 °C. The frozen cells were disrupted using a Multi-Beads Shocker MB2200 (Yasui Kikai, Osaka, Japan) operated at 2,000 rpm for 10 s and were homogenized again under the same conditions after the addition of 1 mL RNAiso Plus (Takara Bio Inc.). The upper phase of the chloroform-extracted samples was purified using an SV Total RNA Isolation System (Promega, Madison, WI). After the removal of DNA contamination with Deoxyribonuclease (RT Grade) for Heat Stop (Nippon Gene, Tokyo, Japan), the total RNA was reverse-transcribed with ReverTra Ace qPCR RT Master Mix (TOYOBO, Osaka, Japan) to synthesize cDNA. Quantitative real-time PCR (qRT-PCR) was performed with Luna Universal qPCR Master Mix (New England BioLabs, MA, USA) using StepOnePlus and StepOne Software version 2.3 (Applied Biosystems, Foster City, CA, USA). The sequences of the specific primer sets used are listed in Supplementary Data 3. The relative expression levels of the genes were calculated using the ΔΔCt method, with normalization to the RNA polymerase β subunit gene (rpoB; MIAR_19880).
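
As an illustration of the ΔΔCt calculation named above, the following minimal Python sketch computes a fold change as 2^(−ΔΔCt). It assumes an amplification efficiency of 100% for both the target and the reference gene (rpoB), and it assumes that one culture condition is chosen as the calibrator; the text does not state which condition served this role. The function name and the Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_*_test: Ct values in the condition of interest (e.g. MCE-induced cells)
    ct_*_cal:  Ct values in the calibrator condition (an assumption here)
    The reference gene would be rpoB in the setting described in the text.
    Assumes 100% amplification efficiency for both genes.
    """
    delta_ct_test = ct_target_test - ct_ref_test   # normalize to reference gene
    delta_ct_cal = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_test - delta_ct_cal  # compare with calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(
    ct_target_test=22.0, ct_ref_test=18.0,
    ct_target_cal=27.0, ct_ref_cal=18.0,
)
print(f"fold change: {fold:.1f}")
```

With these illustrative inputs, ΔCt is 4 in the test condition and 9 in the calibrator, so ΔΔCt is −5 and the fold change is 2^5 = 32.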

Expression and purification of recombinant enzymes

Genes of MIAR_33170, MIAR_33200, MIAR_33220, MIAR_33230, MIAR_33270, and MIAR_33320 were amplified from genomic DNA using PrimeSTAR HS DNA Polymerase (Takara Bio Inc.). The primers were designed to amplify the entire gene without a putative signal sequence. Primers used for gene cloning are listed in Supplementary Data 3. The amplicons were cloned into the pET-23d vector (Novagen, Madison, WI, USA) using an In-Fusion HD cloning kit (Clontech Laboratories Inc., Palo Alto, CA, USA), and a C-terminal His-tag was attached. A SKIK peptide tag was fused to the N-terminus of ExoMA2 to facilitate its expression46. E. coli BL21 (DE3) harboring the plasmids were then grown at 25−37 °C using the Overnight Express Autoinduction System (Novagen). After centrifugation, the pellets from the cell culture were resuspended in BugBuster protein extraction reagent (Novagen). The recombinant proteins used for biochemical assays were purified by immobilized metal affinity chromatography (IMAC) using TALON Metal Affinity Resin (Clontech Laboratories Inc.) and desalted using a dialysis membrane.

Enzyme assays

MtLAM, MsLAM, and MsAG were used as substrates for d-arabinan-degrading enzymes. The reaction products were separated using Amicon Ultra YM-10 (10 kDa cut-off, Merck Millipore, Burlington, MA, USA). The retained high-molecular-weight fraction was loaded onto a 15% polyacrylamide gel, and LAM and LM were visualized by silver-PAS staining using Sil-Best Stain-Neo (Nacalai Tesque Inc.)47. The filtrate was used to analyze the liberated oligosaccharides by HPAEC-PAD and TLC. A CarboPac PA-1 column (4 mm internal diameter × 250 mm; Dionex Corp., Sunnyvale, CA, USA) was used for HPAEC-PAD. The fractions were eluted at a flow rate of 1.0 mL/min using the following gradient: 0–5 min, 100% eluent A (0.1 M NaOH); 5–45 min, 0–80% eluent B (0.5 M sodium acetate and 0.1 M NaOH); and 45–50 min, 100% eluent B. For TLC analysis, the reaction products were spotted on a silica gel 60 aluminum plate (Merck Millipore) and developed with a 7:1:2 (v/v/v) n-propanol/ethanol/water solvent mixture. The sugars were visualized by spraying the orcinol-sulfate reagent (10:1 mixture of 1% FeCl3 in 10% H2SO4 and 6% ethanolic orcinol) on the plate. The plate was then dried and heated at 130 °C for approximately 1 min. Diphenylamine-aniline-phosphoric acid reagent (1 mL of 37.5% HCl, 2 mL aniline, 10 mL of 85% H3PO4, 100 mL ethyl acetate, and 2 g diphenylamine) was used for the detection of the pNP-β-d-GlcNAc hydrolysate.
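
The HPAEC-PAD gradient described above can be written as a piecewise function of time. The Python sketch below is only an illustration of that program: it assumes the 5–45 min segment is a linear ramp from 0 to 80% eluent B and that the change to 100% B at 45 min is an immediate step, neither of which is stated explicitly in the text. The function names are hypothetical.

```python
def percent_eluent_b(t_min):
    """Percentage of eluent B at time t_min (minutes) in the 50-min program."""
    if t_min < 0 or t_min > 50:
        raise ValueError("time is outside the 0-50 min gradient program")
    if t_min < 5:
        return 0.0                              # 0-5 min: 100% eluent A
    if t_min < 45:
        return 80.0 * (t_min - 5.0) / 40.0      # 5-45 min: 0 -> 80% B (assumed linear)
    return 100.0                                # 45-50 min: 100% B (assumed step)


def sodium_acetate_molar(t_min):
    """Sodium acetate concentration (M) delivered at t_min.

    Eluent B contains 0.5 M sodium acetate; both eluents contain 0.1 M NaOH,
    so the NaOH concentration stays constant at 0.1 M throughout the run.
    """
    return 0.5 * percent_eluent_b(t_min) / 100.0


for t in (0, 5, 25, 45):
    print(t, percent_eluent_b(t), sodium_acetate_molar(t))
```

Under these assumptions, the midpoint of the ramp (25 min) corresponds to 40% eluent B, that is, 0.2 M sodium acetate in 0.1 M NaOH.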

The reducing power of the reaction products from the enzymatic reaction of EndoMA1 and EndoMA2 was quantified at various temperatures and pH values using MtLAM as the substrate. Under standard assay conditions, 0.50 mg/mL MtLAM in 50 mM sodium phosphate (pH 6.5) was incubated with the enzyme at 37 °C for 20 min. After the solution was mixed with an equal volume of a bicinchoninic acid solution48, the mixture was heated at 95 °C for 15 min. After cooling for 5 min at room temperature, absorbance was measured at 560 nm. The pH profiles were measured in 50 mM sodium acetate (pH 3.5–6.0) or 50 mM sodium phosphate (pH 6.0–8.0).

The substrate specificities of EndoMA1 and EndoMA2 were analyzed by HPAEC-PAD using synthesized substrates with an acetonide tag. Enzymatic hydrolysis results in the formation of reducing-end products, which were analyzed by HPLC after labeling with p-aminobenzoic acid ethyl ester (ABEE)49. The ABEE derivatives were separated using a Cosmosil SugarD column with a mobile phase of CH3CN/water (70/30, v/v) at a constant flow rate (1.0 mL/min) at 30 °C. The elution was monitored using a fluorescence detector (FP-202, JASCO) with excitation and emission wavelengths of 305 and 360 nm, respectively.

Transglycosylation reactions of EndoMA1 were performed using A5BT as the donor and methanol as the acceptor. A5BT (0.5 mg/mL) was incubated at 50 °C for 4 h with EndoMA1 (12 μg/mL) in 50 mM sodium phosphate buffer (pH 6.5) and 10–30% methanol. Subsequently, the reaction products were analyzed using TLC, as described above. For structural analysis, the transglycosylation product from the reaction in 30% methanol was purified using activated carbon (Autoprep Fiber; Resonac Corp., Tokyo, Japan).

The substrate specificities of ExoMA1 and ExoMA2 toward pNP-substrates were analyzed as follows. For ExoMA1, each pNP-substrate (5 mM) was incubated in the presence or absence of the enzyme in 50 mM sodium acetate (pH 5.5) at 37 °C overnight (22 h). For ExoMA2, each pNP-substrate (5 mM) was incubated in the presence or absence of the enzyme in 50 mM sodium phosphate (pH 6.0) at 37 °C overnight. The reaction products were analyzed by TLC as described above.

ExoMA1 and ExoMA2 were subjected to transglycosylation reactions using pNP-α-d-Araf and pNP-β-d-Araf as donors, respectively, and methanol as the acceptor. For ExoMA1, 5 mM pNP-α-d-Araf was incubated with the enzyme (40 μg/mL) and 10–30% methanol in 50 mM sodium acetate (pH 5.5) at 37 °C overnight. For ExoMA2, 5 mM pNP-β-d-Araf was incubated with the enzyme (13.6 μg/mL) and 10–30% methanol in 50 mM sodium phosphate buffer (pH 6.0) at 37 °C overnight. Subsequently, the reaction products were analyzed using TLC. The transglycosylation product from the reaction in 10% methanol was purified using activated carbon (Autoprep Fiber) and then subjected to structural analysis.

The pH and temperature profiles and kinetic parameters of ExoMA1 and ExoMA2 were measured by quantifying the released pNP using a UV-visible spectrometer. The absorbance at 400 nm was measured after the assay solution (25 μL) was mixed with 120 mM Na2CO3 (30 μL) to stop the reaction. The standard assay conditions for ExoMA1 were as follows: 1 mM pNP-α-d-Araf as the substrate in 50 mM sodium acetate (pH 5.5) at 37 °C. The standard assay conditions for ExoMA2 were as follows: 4 mM pNP-β-d-Araf as the substrate in 50 mM sodium phosphate (pH 6.0) at 37 °C. The pH profiles were measured in 50 mM sodium acetate (pH 4.0–6.0) or 50 mM sodium phosphate (pH 6.0–8.0).
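
The conversion of an absorbance reading into released pNP can be sketched with the Beer–Lambert law, taking into account the dilution by the stop solution (25 μL of assay solution plus 30 μL of Na2CO3). In the Python sketch below, the molar absorption coefficient and the optical path length are not given in the text and are therefore explicit inputs; the value of 18,000 M⁻¹ cm⁻¹ used in the example is only an illustrative placeholder, and in practice a pNP standard curve measured under the same conditions would replace it. The function name is hypothetical.

```python
def pnp_in_assay_mM(a400, epsilon_per_M_cm, path_cm=1.0,
                    assay_uL=25.0, stop_uL=30.0):
    """Released pNP (mM) in the original assay solution.

    a400:             blank-corrected absorbance at 400 nm after stopping
    epsilon_per_M_cm: molar absorption coefficient of pNP under the
                      measurement conditions (assumption supplied by the user)
    path_cm:          optical path length (assumption; 1 cm by default)
    assay_uL/stop_uL: volumes of assay solution and Na2CO3 stop solution
    """
    measured_M = a400 / (epsilon_per_M_cm * path_cm)  # Beer-Lambert law
    dilution = (assay_uL + stop_uL) / assay_uL        # 55/25 = 2.2-fold
    return measured_M * dilution * 1000.0             # M -> mM


# Illustrative numbers only; epsilon is a placeholder, not a value from the study.
conc = pnp_in_assay_mM(a400=0.36, epsilon_per_M_cm=18000.0)
print(f"{conc:.3f} mM pNP released")
```

Dividing the resulting concentration by the reaction time and the enzyme concentration would give an initial rate suitable for estimating kinetic parameters.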

NMR and ESI-TOF MS analysis

The transglycosylation products, Me-α-A4B for EndoMA1 and Me-β-d-Araf for ExoMA2, were analyzed using NMR on an ECX400 spectrometer (JEOL) at 400 MHz in D2O. NMR analysis of the transglycosylation product Me-α-d-Araf has been previously reported17. Positive ion mode ESI-TOF MS (microTOF II, Bruker Daltonics) analysis was performed on the mixture of crude products resulting from the hydrolysis of Araf22BT with EndoMA1. The hydrolysis of 20 mM pNP-β-d-Araf by ExoMA2 in D2O was monitored using NMR. A portion (650 μL) of the substrate pNP-β-d-Araf (a 20 mM solution in phosphate buffer at pH 6.0, exchanged from H2O to D2O via lyophilization) was mixed with 8.2 μL of the enzyme solution (531 μg/mL in D2O). 1H NMR spectra of the reaction mixtures were recorded at 37 °C using an ECX400 spectrometer (JEOL) operating at 400 MHz. 1H NMR spectra of authentic samples, namely the initial pNP-β-d-Araf and d-arabinose that had reached equilibrium, were obtained without adding the enzyme.
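
The final concentrations in the NMR reaction mixture follow from the mixing ratio of the substrate and enzyme solutions (650 to 8.2 by volume). The short Python sketch below performs this dilution arithmetic; it assumes simple additive volumes, and the function name is hypothetical. Because only the volume ratio enters the calculation, the result does not depend on the volume unit.

```python
def diluted_concentration(stock_conc, stock_vol, added_vol):
    """Concentration of a stock component after mixing with another solution.

    stock_vol and added_vol must share the same unit; volumes are assumed
    to be additive.
    """
    return stock_conc * stock_vol / (stock_vol + added_vol)


# Values taken from the text: 20 mM substrate, 531 ug/mL enzyme, ratio 650:8.2.
substrate_mM = diluted_concentration(20.0, 650.0, 8.2)
enzyme_ug_per_mL = diluted_concentration(531.0, 8.2, 650.0)
print(f"substrate: {substrate_mM:.2f} mM, enzyme: {enzyme_ug_per_mL:.1f} ug/mL")
```

Under this assumption, the mixture contains approximately 19.75 mM pNP-β-d-Araf and approximately 6.6 μg/mL enzyme, so the substrate concentration is essentially unchanged by the addition of the enzyme.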

Crystallography

Protein samples of EndoMA1 (WT and D51N mutant), ExoMA1, and ExoMA2 used for crystallization experiments were expressed in E. coli and purified using IMAC, anion exchange chromatography, and SEC columns. Details of this procedure are described in the Supplementary Methods section. All proteins were crystallized using the sitting-drop vapor-diffusion method by mixing the protein and reservoir solutions (typically 0.5 μL each). The EndoMA1 D51N mutant co-crystallized with A9LT was prepared using the microseeding method by mixing reservoir, protein (32 mg/mL EndoMA1 and 10 mM A9LT), and seed solutions at a ratio of 6:4:2 and grown at 30 °C. The seed solution was prepared by crushing EndoMA1 crystals with Seed Bead (Hampton Research, Aliso Viejo, CA, USA) and diluting them 1000-fold with the reservoir solution. Crystals of the other proteins were grown at 20 °C without the microseeding method. The conditions of the protein solution, reservoir solution, and cryoprotectant are listed in Supplementary Tables 3 and 5. The crystals were cryo-cooled by dipping them in liquid nitrogen. X-ray diffraction data were collected at 100 K on the beamlines of the Photon Factory of the High Energy Accelerator Research Organization (KEK, Tsukuba, Japan). Preliminary diffraction data were collected at SPring-8 (Hyogo, Japan). The software and servers used for protein crystallography and analysis are described in Supplementary Methods.

Mutant analysis

Site-directed mutants of EndoMA1 were constructed using PrimeSTAR Max DNA Polymerase (Takara Bio Inc.) and the primers listed in Supplementary Data 3. The mutant proteins were prepared, purified, and assayed using the same procedure as for WT EndoMA1 (recombinant protein), as described above.

SEC-MALS/RI and SEC-SAXS of EndoMA1

SEC-MALS/RI was performed using a DAWN HELEOS II (Wyatt Technology) and an HPLC system (Alliance 2695, Waters)50 equipped with a 2414 online differential refractometer (Waters). A KW403-4F column (Shodex) was used for SEC-MALS/RI and SEC-SAXS analyses. The measured data were analyzed using the ASTRA 6.1 software (Wyatt Technology). The SEC-SAXS experiment was conducted at the beamline BL-10C of the Photon Factory (Tsukuba, Japan)51. The SEC of SEC-SAXS was performed using the same column and buffer set up for SEC-MALS/RI on an HPLC system, Prominence-i (SHIMADZU). Data reduction processes were performed using the SAngler software52, and the whole SAXS and UV-visible absorption data were analyzed using the software MOLASS53. The EndoMA1 apo structure was used as the reference crystal structure. The experimental conditions and analysis of the SEC-SAXS experiments are summarized in Supplementary Data 1. Details of the SEC-MALS/RI and SEC-SAXS experiments and analyses are described in Supplementary Methods.

Electron microscopy

Purified ExoMA1 (0.1 mg/mL) in 10 mM Tris-HCl (pH 7.5) and 500 mM NaCl was applied to carbon-coated copper grids (Nisshin-EM, Tokyo, Japan) glow-discharged by HDT-400 (JEOL). The grids were negatively stained three times with 2% uranyl acetate. Electron micrographs were taken using a JEM-1400 electron microscope (JEOL) operated at 120 kV.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.