High-throughput discovery of organic cages and catenanes using computational screening fused with robotic synthesis

Supramolecular synthesis is a powerful strategy for assembling complex molecules, but to do this by targeted design is challenging. This is because multicomponent assembly reactions have the potential to form a wide variety of products. High-throughput screening can explore a broad synthetic space, but this is inefficient and inelegant when applied blindly. Here we fuse computation with robotic synthesis to create a hybrid discovery workflow for discovering new organic cage molecules, and by extension, other supramolecular systems. A total of 78 precursor combinations were investigated by computation and experiment, leading to 33 cages that were formed cleanly in one-pot syntheses. Comparison of calculations with experimental outcomes across this broad library shows that computation has the power to focus experiments, for example by identifying linkers that are less likely to be reliable for cage formation. Screening also led to the unplanned discovery of a new cage topology—doubly bridged, triply interlocked cage catenanes.


Supplementary
The 29 cages that did not form in this study, shown in their targeted topology. The cage A7 is not shown, as it was not modelled. Cages A13, A14 and A17 were predicted to not be shape-persistent. Tri 2 Di 3 cages are shown with orange carbons, Tri 4 Di 6 in maroon and Tri 4 Tri 4 in teal. Remaining atom colouring is as follows: oxygen (red), bromine (brown), boron (pink), silicon and sulfur (yellow), chlorine (green) and nitrogen (blue). Hydrogens are omitted.

Figure 4: The 16 cages that formed in this study, but for which the mixture was impure or there was incomplete conversion of starting material. The two cages B7 and C7 with precursor 7 are not shown, as these were not modelled. Tri 2 Di 3 cages are shown with orange carbons, Tri 4 Di 6 in maroon and Tri 4 Tri 4 in teal. Remaining atom colouring is as follows: oxygen (red), bromine (brown), boron (pink), silicon and sulfur (yellow), and nitrogen (blue). Cages synthesised with 10 are simplified with the external alkyl groups replaced with a methyl group. Hydrogens are omitted.

Figure 5: Summary of the overall high-throughput (HT) workflow for the discovery, synthesis and characterisation of organic cages. (a) Overall workflow used for the design, synthesis, and characterisation of organic cages; (b) High-throughput synthetic workflow: i) The 78 possible combinations for cage formation were screened on a Chemspeed Accelerator SLT-100 robotic platform using liquid dispensing, with the reactions heated for 3 days at 65 °C with vortexing at 800 rpm; ii) Aliquots of all 78 reactions were taken directly from the high-throughput screen (pre-isolation) and analysed by 1H NMR spectroscopy, high-resolution mass spectrometry (HRMS), and FTIR using a thin film deposited on a 96-well silica wafer. For a selection of reactions where clear cage formation was visible, blind diffusion NMR was carried out. This allowed the stoichiometry and size of the formed cages to be determined; iii) Combinations which indicated cage formation were filtered to remove any insoluble precipitate, and isolated by removal of the solvent under reduced pressure on a Combidancer evaporator; iv) For the isolated solid samples, the crystallinity was investigated by powder X-ray diffraction (PXRD), and they were re-analysed by 1H NMR spectroscopy to determine stability to isolation; (c) Visual comparison of results from the high-throughput screen.

Figure 8: Displacement ellipsoid plots from the single crystal structure, 3(B2)•6(CH2Cl2); shown for one B2 cage in entirety (left); and for the three crystallographically distinct cages shown in entirety (right). Ellipsoids are displayed at the 50% probability level and CH2Cl2 solvent has been omitted for clarity. The general colouring scheme for the displacement ellipsoid plots is as follows: C = grey; H = white; N = blue; O = red; Br = brown; Cl = green; and S = yellow.

3 and Tri 4 Di 6 cages for the systems where a Tri 4 Di 6 cage was targeted, but a Tri 2 Di 3 was found to form. A-cages have orange carbons, B-cages have maroon carbons and C-cages have teal carbons. Remaining atom colouring is as follows: oxygen (red), bromine (brown), boron (pink), silicon and sulfur (yellow), chlorine (green) and nitrogen (blue). Hydrogens are omitted.

We investigated the gas sorption properties of B11 and B23 after isolation by solvent exchange, which showed that cages can indeed be porous 'as-made'. In comparison with the PXRD patterns from the HT screen, which showed amorphous material on solvent removal, isolation of the cages by solvent exchange leads to more crystalline material (Supplementary Figs. 33 and 34). The porosity was comparable to other organic cages reported previously in the literature, with both cages adsorbing significant quantities of H2 and N2 at 77 K and 1 bar (B11: N2 2.55 mmol g⁻¹, H2 2.31 mmol g⁻¹, SABET = 131 m² g⁻¹; B23: N2 8.18 mmol g⁻¹, H2 3.15 mmol g⁻¹).

Supplementary Figure 32: Nitrogen (a) and hydrogen (b) adsorption (filled shapes) / desorption (empty shapes) isotherms for cages B11[6+4] (red, circles) and B23[4+4] (black, squares) at 77 K and 1 bar.

Table 15) showing conversion of terephthalaldehyde (11)* to cage (B11)*.

NB. A/B/C16 and B/C19: possible cage formation was apparent in the 1H NMR spectra, but stoichiometry could not be determined by HRMS or diffusion NMR, so these were classified as red in this study.

To compare the geometry of the calculated models to the SCXRD structures, overlays of the models and single molecules extracted from the SCXRD were generated using MacroModel and the root-mean-square deviation (RMSD) between all non-hydrogen atoms was calculated. To ascertain the contribution of crystal packing effects to the geometry of the molecules, we also geometry optimised, in the gas phase, single molecules from the SCXRD structure using OPLS3, as described in the Supplementary Methods, and calculated the RMSD between these optimised structures and the computed structures. The latter should help remove the effects of crystal packing on the molecular conformations of the cages.

(2,4,6-trimethylbenzene-1,3,5-triyl)trimethanamine (B) and terephthalaldehyde (11).

General cage formation screen: Terephthalaldehyde 11 (9.7 mg, 0.072 mmol, 3.0 eq.) and (2,4,6-trimethylbenzene-1,3,5-triyl)trimethanamine B (2.0-3.0 eq.) were dissolved in CDCl3 (1-13 mL) and stirred at room temperature (RT) or 60 °C for 1-6 days. Reaction progress was monitored by 1H NMR spectroscopy (see Supplementary Fig. 43) to determine whether any cage formation had occurred.

This nomenclature avoids confusion, frequent in the literature, resulting from naming topologies based on polyhedra such as "tetrahedron", since the latter is subjective and also depends on the shape that the structure adopts (which can be different in solution and the solid state). Several cages with the same underlying topology can have different shapes and therefore resemble different polyhedra.
Here, each structure is labelled as $\mathrm{X}^{m}_{p}\mathrm{Y}^{n}$, where X and Y are the two different component precursors that constitute the cage. X and Y are Di if they are di-topic (two reactive functionalities) and Tri if they are tri-topic (three reactive functionalities). By convention, the first precursor, X, has the highest number of reactive end groups (unless X = Y) and, if the underlying topology relates to a polyhedron, X will lie at the vertices. The second precursor, Y, can have a number of reactive end groups less than or equal to that of X. When X = Y, the choice of which functional unit is denoted as X and which as Y is arbitrary. The superscripts m and n denote the number of each precursor incorporated into the topology for X and Y, respectively. Most of the time, X-type precursors are connected to other X-type precursors through only one Y-type precursor; in this case, no subscript p is given. However, if two X-type precursors are directly connected through links with two distinct Y-type precursors, then p = 2. Hence, the subscript p gives the number of double connections between precursor pairs within a topology. These multiple links result in topologies with multiple ring sizes. For some of the smaller topologies with only two X-type building blocks (BBs), there can be triple or quadruple linking of the precursors; in this case, no subscript is given as there is no alternative connectivity for that topology. An example of a topology with these multiple links is the $\mathrm{Tri}^{4}_{2}\mathrm{Di}^{6}$ topology, in which each of the tri-topic BBs is doubly connected to one of the neighbouring tri-topic BBs through two di-topic BBs. This contrasts with the more common $\mathrm{Tri}^{4}\mathrm{Di}^{6}$ topology, which is related to a tetrahedron and does not have the multiple links (shown in Fig. 2a).

Supplementary Note 2: A priori prediction of cage topologies
Models of the most likely topologies were assembled to determine the intrinsic structural preference for a topology in the representative set of three systems (B1, B11, B23): for B1 and B11, Tri 2 Di 3, Tri 4 Di 6, Tri 6 Di 9 and Tri 8 Di 12; and for B23, Tri 1 Tri 1, Tri Tri 2 and Tri 4 Tri 4. The approach described in the Supplementary Methods was applied. Their likelihood of formation was then determined by comparing relative energies per [2+3] or [1+1] unit. This energetic comparison was possible for these systems because these topology series contain multiples of the same building unit. The equivalent comparison is not possible between molecules from different series (e.g. between a Tri 4 Di 6 and a Tri 4 Tri 4 molecule); hence the formation energies were used in those instances, as described in the Supplementary Methods.
As an example, to compare the energies for B1, we compared ETri2Di3 to (ETri4Di6/2) to (ETri6Di9/3) to (ETri8Di12/4). This allows us to compare the relative internal energy of a single [2+3] unit in each of the molecules. We use the word "internal" in reference to the fact that we do not include a description of any other factors such as the solvent effect.
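The per-unit comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `rank_topologies` and the energies in `example_energies` are hypothetical placeholders in arbitrary units, not values from this study.

```python
# Number of [2+3] units contained in each topology of the Tri/Di series.
UNITS_PER_TOPOLOGY = {"Tri2Di3": 1, "Tri4Di6": 2, "Tri6Di9": 3, "Tri8Di12": 4}


def rank_topologies(total_energies, units_per_topology=UNITS_PER_TOPOLOGY):
    """Rank topologies by internal energy per [2+3] unit.

    Returns (name, relative energy per unit) pairs, lowest first, with the
    most favourable topology set to zero.
    """
    per_unit = {
        name: energy / units_per_topology[name]
        for name, energy in total_energies.items()
    }
    lowest = min(per_unit.values())
    ranked = [(name, energy - lowest) for name, energy in per_unit.items()]
    return sorted(ranked, key=lambda pair: pair[1])


# Hypothetical total energies (arbitrary units), for illustration only.
example_energies = {
    "Tri2Di3": -100.0,
    "Tri4Di6": -204.0,
    "Tri6Di9": -303.0,
    "Tri8Di12": -396.0,
}
ranking = rank_topologies(example_energies)
```

Dividing by the number of units is what makes molecules of different size comparable; it does not account for solvent or entropic effects, in line with the "internal" energy caveat above.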
Excluded Topologies: An alternative topology to a $\mathrm{Tri}^{4}\mathrm{Di}^{6}$ outcome for a [4+6] reaction is $\mathrm{Tri}^{4}_{2}\mathrm{Di}^{6}$. We did not include this possibility in the simulations because this topology contains two doubly connected tri-topic building blocks (indicated in the nomenclature by the subscript 2) as well as two singly connected building blocks. The double connections result in two small windows in the $\mathrm{Tri}^{4}_{2}\mathrm{Di}^{6}$ topology, and any such molecule with highly symmetric tri-topic precursors is likely to be highly strained, as found previously in similar systems. 1 The same applies to the Tri Tri 3 topology, which we also excluded from the simulations.

Supplementary Note 3: Model cage formation, reaction optimisation and extension to other topologies
Before translating the synthesis of organic cages onto an automated robotic platform, we first set out to determine robust reaction conditions for each of the three aldehyde types used in this study. Our aim was to develop reaction conditions that would afford reasonable conversion to the cage molecule while keeping it in solution without any crystallisation or precipitation. There were two reasons for this: first, some cages are unstable to isolation in the solid state, especially if they are not shape-persistent, and hence good solubility would allow high-throughput solution analysis to determine whether a cage had formed prior to isolation. Second, working under homogeneous conditions avoids any potential problems associated with the handling of suspensions on the robotic platform.
Previously, the reaction of (2,4,6-trimethylbenzene-1,3,5-triyl)trimethanamine (B) with terephthalaldehyde (11) was reported to produce a complex precipitate. 2 However, by carrying out this reaction at a lower concentration than we have used previously for other imine cages, 3,4 and heating, clean albeit incomplete conversion to a cage species was observed by 1 H NMR spectroscopy (see Supplementary Table 15, entries 1 and 2, and Supplementary Fig. 43). Full conversion of the terephthalaldehyde (11) to a cage species was achieved by using a one-molar equivalent excess of triamine (B), although this was limited to an additional equivalent per predicted cage stoichiometry (entry 3), i.e., for a Tri 4 Di 6 topology, which contains 4 molecules of triamine, 5 equivalents were used in the reaction. Using a larger excess of triamine led to the formation of impurities (entry 4). Also, increasing the concentration of the reaction led to increasing amounts of a precipitate and this was avoided to facilitate transfer of the reactions to the robotic platforms (entries 5-7).
Scale-up of the optimised reaction conditions (Supplementary Table 15

With the successful formation of the new Tri 4 Di 6 cage (B11[4+6]), we needed to determine whether the reaction conditions were translatable to other systems before transferring them to an automated platform. Typically, the formation of different organic cages has required small alterations to tune the reaction conditions to successfully synthesise them in reasonable yields; for example, variations in the concentration, temperature, solvent, addition of a catalyst, removal of water during the reaction, rate and order of reagent addition, and layering have all been utilised previously, even for structurally analogous cages. 5 This is difficult to implement when transitioning the synthesis of cages onto an automated robotic platform where, ideally, a single set of robust conditions is required that is applicable to a range of cage molecules. We therefore investigated the expansion to other aldehyde building blocks (isophthalaldehyde (1) and tris(4-formylphenyl)amine (23)

Supplementary Note 4: Modelling of cages in the high-throughput screen
All molecules were assembled into their targeted topologies, Tri 2 Di 3 for aldehydes 1-10, Tri 4 Di 6 for aldehydes 11-21 and Tri 4 Tri 4 for aldehydes 22-26. We did not include precursor 7 in the computational study because OPLS3 did not reproduce the nitro-group on the aromatic ring well. After initial assembly showed that the alkyl group on precursor 10 was external to the cage, we did not include the alkyl group in subsequent modelling as it was not deemed to be important for cage formation and yet would require significant additional computational cost.

Supplementary Note 5: Structural analysis of the computational models
For the samples that were investigated using diffusion NMR, the inner pore sizes, average diameters, and maximum outer diameters were calculated from the lowest energy modelled structures to allow comparison with the experimentally measured solvodynamic diameters, with Δ(CALC-EXP) showing the difference between the calculated average diameter (CALC) and the measured solvodynamic diameter (EXP):
i) Pore size: Calculated as the distance between the pore centre, assumed to be at the centre of mass of the molecule, and the closest atom. The obtained distance is then corrected for the appropriate van der Waals radius, and multiplied by 2.
ii) OPT pore size: The pore size in (i) assumes the centre of mass of the cage is also the central point of the cavity, which is true for highly symmetric cage molecules, but not in all systems, for example those with elongated, non-spherical cavities. To reflect the latter, we also calculated the pore size by finding the true centre of the cavity and then calculating the largest sphere that could fit in the cavity. We name this the OPT pore size. In practice, to find the centre of the cavity, we used an optimisation step in our in-house software that finds the coordinates that maximise the pore size.
iii) Maximum outer diameter: Defined as the distance between the two furthest atoms, corrected by the appropriate van der Waals radii.
iv) Average diameter: The steps required to calculate the average diameter are as follows:
a) A molecule is taken and the maximum outer diameter is determined. Then, a set of sampling points is distributed evenly on a sphere with radius equal to the maximum outer diameter, using Vogel's golden vector approach for an even distribution of points on a disc, modified for a sphere. 6
b) Each sampling point is connected to the centre of mass of the molecule by a vector. The overlap of the vector paths and the van der Waals spheres of the molecule's atoms is then analysed.
c) The molecule's outline, as a set of points determined by the vectors crossing through the van der Waals spheres, is created.
d) The average of the distances of the molecular outline points from the centre of mass yields the molecule's average diameter ($\bar{d}$).
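The simpler geometric definitions above (pore size from the centre of mass, maximum outer diameter, and the golden-angle sampling of a sphere) can be sketched as follows. This is a minimal illustration rather than the in-house software: all function names are hypothetical, the OPT pore size and the outline averaging are not covered, and the test geometry is an idealised octahedron of six identical atoms with an assumed van der Waals radius of 1.7 Å.

```python
import math


def centre_of_mass(coords, masses):
    """Mass-weighted mean position of the atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )


def pore_diameter(coords, masses, vdw_radii):
    """Twice the vdW-corrected distance from the centre of mass to the closest atom."""
    com = centre_of_mass(coords, masses)
    return 2.0 * min(math.dist(c, com) - r for c, r in zip(coords, vdw_radii))


def max_outer_diameter(coords, vdw_radii):
    """Largest interatomic distance, extended by both van der Waals radii."""
    best = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j]) + vdw_radii[i] + vdw_radii[j]
            best = max(best, d)
    return best


def golden_angle_sphere(n_points, radius=1.0):
    """Near-even points on a sphere using a golden-angle (Vogel-type) spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n_points):
        z = 1.0 - (2.0 * k + 1.0) / n_points  # evenly spaced heights in (-1, 1)
        rho = math.sqrt(1.0 - z * z)          # radius of the circle at height z
        phi = golden_angle * k
        points.append(
            (radius * rho * math.cos(phi), radius * rho * math.sin(phi), radius * z)
        )
    return points


# Idealised test geometry (hypothetical): six identical atoms on an octahedron.
coords = [(5, 0, 0), (-5, 0, 0), (0, 5, 0), (0, -5, 0), (0, 0, 5), (0, 0, -5)]
masses = [12.0] * 6
radii = [1.7] * 6
outer = max_outer_diameter(coords, radii)
samples = golden_angle_sphere(100, radius=outer)
```

The sampling points would then be connected to the centre of mass and intersected with the atomic van der Waals spheres to build the molecular outline, as in steps (b) and (c).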

Supplementary Note 6: Modelling of covalently bridged cage catenane structures
To compare the relative energy of the covalently bridged cage catenane structure to the related Tri 4 Di 6 topology, we took the structure of a single [8+12] molecule from the SCXRD structure and geometry optimised at the DFT level, as described above in the Supplementary Methods.

Supplementary Note 7: Representative worked example of high-throughput characterisation
We include here a representative worked example of determining cage formation for B15, one of the 'hits' from the high-throughput screen:

Supplementary Figure 115: Stacked 1H NMR spectra (CDCl3) of the precursors used, and the crude reaction, for B15, taken directly from the high-throughput screen (pre-isolation) indicating conversion of aldehyde to an imine, accompanied by a 'clean' aromatic region that indicates cage formation. Comparison of the 1H NMR spectra (lower) after removal of solvent (post-isolation) shows cage still present, with a small amount of decomposition visible.

Supplementary Figure 117: Stacked FTIR spectra (transmission / a.u. against wavenumber / cm⁻¹) of the precursors used (triamine B and aldehyde 15), and the crude reaction (cage B15), taken directly from the high-throughput screen (pre-isolation) showing loss of the C=O stretch, and the formation of an imine C=N stretch.

General methods for computational structure generation
When a cage model was required, the structures were assembled using in-house computational software that works as follows. First, geometries for the pair of precursor molecules are generated from their SMILES codes using the RDKit software 7 and then optimised with the Universal Force Field (UFF). 8 The precursors are then placed on the vertices and edges of the relevant polyhedron based on the underlying topology of the cage; the reactive end groups of the precursors are linked and the chemistry corrected such that the aldehyde and amine pairs form imine bonds. The assembled molecules are first geometry optimised using UFF to produce feasible structures, and then geometry optimised with the OPLS3 forcefield 9 using MacroModel (version 10.3.015, Schrödinger, LLC, New York, NY, 2015-3), which we have previously found to produce reliable structures for porous organic cages. 1 At the next stage, global optimisation searches are carried out to find the lowest energy conformations for the cage and to ensure there is no dependence upon the initial starting geometry.

Global optimization calculations
After trialling both low-mode conformer search calculations and sampling molecular dynamics (MD) simulations as approaches to finding low energy conformations, we found that the latter was more efficient for this work. For each cage molecule, we carried out multiple high temperature MD simulations to locate the lowest energy conformers, and simulations were repeated until no new low energy conformers were found in at least 3 MD runs. Each MD run was performed with OPLS3 in MacroModel at 1000 K for 100 ns with a time step of 0.7 fs and sampling every 10 ps. Each sampled structure was geometry optimised with the Polak-Ribière conjugate gradient method and a convergence criterion of 0.05 kJ mol⁻¹ Å⁻¹. Each individual MD simulation generated 10000 conformers. From the multiple MD simulations, the lowest energy conformation, plus a few selected conformations to represent the key conformations observed within a 20 kJ mol⁻¹ energy gap, were chosen for subsequent DFT refinement. A single conformation was chosen if the lowest energy conformation was symmetric; more were chosen if the molecule was more flexible or asymmetric. Given the large number of conformations generated, it was not possible to visualise or refine every single conformer at the Density Functional Theory (DFT) level. In the future, software that analyses symmetry and cavity size could be applied to extract a selection of the best candidates within an energy range.
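As a minimal sketch of the bookkeeping implied above, the snippet below counts the frames sampled per MD run and shortlists conformers within a 20 kJ mol⁻¹ window of the lowest energy. The function names and the example energies are hypothetical; in the actual workflow the final choice within the window was made by inspection, which this sketch does not attempt to reproduce.

```python
def frames_per_run(run_length_ns=100.0, sampling_interval_ps=10.0):
    """Number of structures sampled in one MD run (1 ns = 1000 ps)."""
    return round(run_length_ns * 1000.0 / sampling_interval_ps)


def shortlist_conformers(energies_kj_mol, window_kj_mol=20.0):
    """Indices of conformers within the energy window, lowest energy first."""
    lowest = min(energies_kj_mol)
    inside = [
        index
        for index, energy in enumerate(energies_kj_mol)
        if energy - lowest <= window_kj_mol
    ]
    return sorted(inside, key=lambda index: energies_kj_mol[index])


# Hypothetical optimised energies (kJ/mol) for five sampled conformers.
example_energies = [5.0, 0.0, 19.9, 20.1, 30.0]
shortlist = shortlist_conformers(example_energies)
```

With the stated settings, `frames_per_run()` reproduces the 10000 conformers per simulation quoted in the text.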

Refinement of structures with DFT calculations
For more reliable energetic rankings of the molecules, DFT calculations were conducted on the selected low energy conformations. DFT calculations were run in CP2K 10 with the PBE functional, 11 combined with the TZVP basis set, 12 Grimme D3 dispersion corrections, 13 and a plane-wave cut-off of 350 Ry. We have found previously that this approach gives a reasonable balance between computational cost and accuracy. 1 The convergence criterion was an energy difference of less than 1 × 10⁻³ Hartree.

Calculation of formation energies
The formation energies of the cages were calculated at the DFT level using the following equation:

$$E_\mathrm{f} = \left(E_\mathrm{cage} + x\,E_\mathrm{water}\right) - \left(m\,E_\mathrm{aldehyde} + n\,E_\mathrm{amine}\right)$$

where Ecage is the energy of the cage formed, Ewater is the energy of the water produced in the reaction, Ealdehyde is the energy of the aldehyde and Eamine is the energy of the amine. The latter values were calculated at the same level as described in 3, on the lowest energy conformers of the molecules. The number of aldehyde reactants is m, n is the number of amine precursors and x is the number of water molecules produced in the reaction, which is equal to the number of imine bonds formed.
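A short sketch of this calculation is given below. It assumes the usual products-minus-reactants form, E_f = (Ecage + x Ewater) - (m Ealdehyde + n Eamine), and that every amine precursor is a triamine, so that each contributes three imine bonds. The function names and the energies used in the example are hypothetical placeholders in arbitrary units.

```python
def waters_released(n_triamines):
    """Each tri-topic amine forms three imine bonds, releasing three waters."""
    return 3 * n_triamines


def formation_energy(e_cage, e_water, e_aldehyde, e_amine, m, n):
    """Formation energy as products minus reactants (assumed sign convention).

    m is the number of aldehyde precursors and n the number of triamines.
    """
    x = waters_released(n)
    return (e_cage + x * e_water) - (m * e_aldehyde + n * e_amine)


# Hypothetical energies (arbitrary units) for a Tri4Di6 cage:
# 6 dialdehydes + 4 triamines -> cage + 12 water.
example = formation_energy(
    e_cage=-10.0, e_water=-1.0, e_aldehyde=-2.0, e_amine=-3.0, m=6, n=4
)
```

For a Tri 2 Di 3 cage the same function would be called with m = 3 and n = 2 (6 waters), and for a Tri 4 Tri 4 cage with m = 4 and n = 4 (12 waters).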

General synthetic and analytical methods
Materials: Chemicals were purchased from TCI UK, Fluorochem, or Sigma-Aldrich. Solvents were reagent or HPLC grade purchased from Fisher Scientific, with the exception of chloroform-D, which was purchased from Apollo Scientific and used for the high-throughput screen. All chemicals and solvents were used as received, unless specified.
Synthesis: All reactions requiring anhydrous or inert conditions were performed in oven-dried apparatus under an inert atmosphere of dry nitrogen, using anhydrous solvents introduced into the flask using disposable needles and syringes. All reactions were stirred magnetically using Teflon-coated stirring bars. Where heating was required, the reactions were warmed using a stirrer hotplate with heating blocks with the stated temperature being measured externally to the reaction flask with an attached probe. Removal of solvents was done using a rotary evaporator.
High-throughput cage discovery: High-throughput automated synthesis was conducted using a Chemspeed Accelerator SLT-100 automated synthesis platform, and removal of solvents was carried out using a Combidancer evaporator.
TLC and column chromatography: Reactions were monitored by thin layer chromatography (TLC), conducted on pre-coated aluminium-backed plates (Merck Kieselgel 60 with fluorescent indicator UV254). Spots were visualized either by quenching of UV fluorescence or by staining with potassium permanganate. Flash column chromatography was performed on a Biotage Isolera with KP-Sil Normal Phase disposable columns.
Melting points: Obtained using Griffin melting point apparatus and are uncorrected.
IR spectra: Infra-red (IR) spectra were recorded on a Bruker Tensor 27 FT-IR using ATR measurements for oils and solids as neat samples, or for the high-throughput screening, in transmission mode on a 96-well silica wafer deposited as a thin film.
NMR spectra: 1 H Nuclear magnetic resonance (NMR) spectra were recorded using an internal deuterium lock for the residual protons in CDCl3 (δ = 7.26 ppm) at ambient probe temperature using either a Bruker Avance 400 (400 MHz) or Bruker DRX500 (500 MHz) instrument.
NMR data are presented as follows: chemical shift, integration, peak multiplicity (s = singlet, d = doublet, t = triplet, q = quartet, m = multiplet, br = broad, app = apparent) and coupling constants (J / Hz). Chemical shifts are expressed in ppm on a δ scale relative to δCDCl3 (7.26 ppm) and coupling constants, J, are given in Hz.

(10⁻⁵ bar) before analysis, followed by degassing on the analysis port under vacuum, also at 90 °C. Isotherms were measured using a Micromeritics 2020 or 2420 volumetric adsorption analyzer. Gas uptake measurements (for N2, H2) were taken at a temperature of 77 K, stabilized using a circulating water chiller/heater.

Diffusion NMR:
All measurements were carried out non-spinning on a 400 MHz Bruker Avance 400 spectrometer, using a 5 mm indirect detection probe, equipped with a z-gradient coil producing a nominal maximum gradient of 34 G/cm. Diffusion data were collected using the Bruker longitudinal eddy current delay (LED) pulse sequence (ledgp2s). A diffusion encoding pulse δ of length 1-7 ms and a diffusion delay Δ of 0.1-0.25 s were used. Gradient amplitudes were equally spaced between 1.70 and 32.4 G/cm. Each FID was acquired using 16 k data points. All experiments were carried out at a nominal probe temperature of 298 K, with an air flow of 800 L/h to minimise convection. All diffusion coefficients were calculated using measurements from multiple peak areas in the 1H NMR spectra and the numbers quoted represent the mean values, but each measurement was only carried out once to keep the methodology high-throughput.
Diffusion coefficients were calculated from signal intensities using the Stejskal-Tanner equation: 14

$$I = I_0 \exp\left[-D\,\gamma^{2} g^{2} \delta^{2} \left(\Delta - \frac{\delta}{3}\right)\right]$$

where I is the signal intensity, I0 is the signal intensity at a gradient strength of zero, γ is the gyromagnetic ratio of the observed nucleus, g is the gradient strength, δ is the length of the diffusion encoding pulse, Δ is the diffusion delay, and D is the diffusion coefficient (in m² s⁻¹). Solvodynamic radii, RS (nm), of solution-phase species were calculated from the Stokes-Einstein equation assuming molecules have a spherical geometry:

$$R_\mathrm{S} = \frac{k_\mathrm{B} T}{6 \pi \eta D}$$

where kB is the Boltzmann constant, T is the temperature and η is the viscosity of the solvent.

Single crystal X-ray diffraction: Single crystal X-ray data sets were measured using a Rigaku MicroMax-007 HF rotating anode diffractometer (Mo-Kα radiation, λ = 0.71073 Å, Kappa 4-circle goniometer, Rigaku Saturn724+ detector); at beamline 11.3.1, Advanced Light Source, Berkeley, USA, using silicon monochromated synchrotron radiation (λ = 0.7749 Å or 1.0332 Å, PHOTON100 CMOS detector); at beamline I19, Diamond Light Source, Didcot, UK, using silicon double crystal monochromated synchrotron radiation (λ = 0.6889 Å, Dectris Pilatus 2M, or Rigaku Saturn724+ detector); or using a Bruker D8 Venture Advance diffractometer equipped with an IμS microfocus source (Cu-Kα radiation, λ = 1.54178 Å, Kappa 4-circle goniometer, PHOTON100 CMOS detector). Unless stated in the refinement details section (Supplementary Methods), solvated single crystals, isolated from the crystallization solvent, were immersed in a protective oil, mounted on a MiTeGen loop, and flash cooled under a dry nitrogen gas flow. Rigaku frames were converted to Bruker compatible frames using the programme ECLIPSE. 15 Empirical absorption corrections, using the multi-scan method, were performed with the program SADABS. 16,17 Structures were solved with SHELXD, 18 SHELXT, 19 or by direct methods using SHELXS, 20 and refined by full-matrix least squares on |F|² by SHELXL, 21 interfaced through the programme OLEX2. 22 Unless stated, all non-H-atoms were refined anisotropically, and H-atoms were fixed in geometrically estimated positions and refined using the riding model. Supplementary CIFs, which include structure factors and responses to checkCIF alerts, are available free of charge from the Cambridge Crystallographic Data Centre (CCDC) via www.ccdc.cam.ac.uk/data_request/cif.
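Returning to the diffusion NMR analysis described earlier in this section, the sketch below shows one way the diffusion coefficient and solvodynamic radius could be extracted. It assumes the standard Stejskal-Tanner form, I = I0 exp[-D γ² g² δ² (Δ - δ/3)], fitted as a straight line in ln(I), followed by the Stokes-Einstein relation. The function names, the 16 gradient steps, the chosen δ and Δ, the viscosity and the synthetic data are all illustrative assumptions, not values from this study.

```python
import math

GAMMA_1H = 2.6752218744e8  # proton gyromagnetic ratio / rad s^-1 T^-1
K_B = 1.380649e-23         # Boltzmann constant / J K^-1


def b_value(g_gauss_per_cm, delta_s, big_delta_s, gamma=GAMMA_1H):
    """Attenuation factor b such that I = I0 * exp(-D * b)."""
    g = g_gauss_per_cm * 1.0e-2  # 1 G/cm = 0.01 T/m
    return (gamma * g * delta_s) ** 2 * (big_delta_s - delta_s / 3.0)


def fit_diffusion_coefficient(gradients, intensities, delta_s, big_delta_s):
    """Least-squares slope of ln(I) against b; returns D in m^2/s."""
    b = [b_value(g, delta_s, big_delta_s) for g in gradients]
    y = [math.log(i) for i in intensities]
    mean_b = sum(b) / len(b)
    mean_y = sum(y) / len(y)
    sxy = sum((bi - mean_b) * (yi - mean_y) for bi, yi in zip(b, y))
    sxx = sum((bi - mean_b) ** 2 for bi in b)
    return -sxy / sxx


def solvodynamic_radius_nm(d_m2_per_s, temperature_k, viscosity_pa_s):
    """Stokes-Einstein radius of a sphere, in nm."""
    radius_m = K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * d_m2_per_s)
    return radius_m * 1.0e9


# Synthetic, noise-free data generated from an assumed diffusion coefficient.
D_TRUE = 5.0e-10                   # m^2/s (hypothetical)
DELTA, BIG_DELTA = 2.0e-3, 0.1     # s, chosen within the stated ranges
gradients = [1.70 + k * (32.4 - 1.70) / 15 for k in range(16)]  # G/cm
intensities = [math.exp(-D_TRUE * b_value(g, DELTA, BIG_DELTA)) for g in gradients]
d_fit = fit_diffusion_coefficient(gradients, intensities, DELTA, BIG_DELTA)
radius = solvodynamic_radius_nm(d_fit, 298.0, viscosity_pa_s=5.4e-4)  # assumed viscosity
```

Fitting ln(I) with a free intercept avoids needing an explicit measurement of I0; the radius depends directly on the viscosity supplied, so that value should be taken from a reliable source for the solvent and temperature used.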

5-Bromoisophthalaldehyde, 2
To a round bottomed flask, equipped with stirrer bar, was added isophthalaldehyde (50.00 g, 372.77 mmol, 1.0 eq.) followed by concentrated sulphuric acid (200 mL). The resulting mixture was heated to 65 °C, before direct heating was removed for the portionwise addition of N-bromosuccinimide (72.98 g, 410.05 mmol, 1.1 eq.) over 20 min. After complete addition, heating was resumed and the reaction was stirred at 65 °C for 19 h. The reaction was allowed to cool to room temperature, and was poured into ice (~1 L) and stirred. The mixture was left for 1 h before the resulting precipitate was collected by filtration. The collected solid was dissolved in DCM (1 L) and washed with water (2 × 200 mL). The organic layer was dried (MgSO4) and hexane (500 mL) added, before the DCM was carefully removed in vacuo to afford a beige precipitate which was collected by filtration. The resulting solid was washed with a 1:2 methanol:hexane mixture (300 mL) and dried in vacuo to afford 5-bromoisophthalaldehyde 2 which was used without further purification (43.14 g, 202.5 mmol, 54%).

2,5-Dihydroxyterephthalaldehyde, 12
To an oven dried round bottomed flask equipped with stirrer bar was added 2,5-dimethoxybenzene-1,4-dicarbaldehyde (0.50 g, 2.57 mmol, 1.0 eq.) before the flask was evacuated and refilled with N2 (×3). Anhydrous DCM (74 mL) was added and once fully dissolved, the solution was cooled to 0 °C in an ice-bath. To the cooled pale yellow solution was added a solution of boron tribromide (26 mL, 1 M in DCM, 25.75 mmol, 10.0 eq.) dropwise, at which point a colour change to bright orange was observed. After complete addition, the reaction was allowed to warm to room temperature and stirred for 21 h before any residual BBr3 was quenched by the dropwise addition of water whilst cooling in an ice-bath. Once quenched, water (100 mL) was added and the organic layer separated. The aqueous layer was extracted with DCM (2 × 200 mL) and the combined organic layers dried (MgSO4) and concentrated in vacuo to afford the desired product 12 as an orange crystalline solid which was used without further purification (427 mg, quant.).

B11[4+6]
Method A: A solution of terephthalaldehyde 11 (194 mg, 1.45 mmol, 6.0 eq.) and (2,4,6-trimethylbenzene-1,3,5-triyl)trimethanamine B (250 mg, 1.21 mmol, 5.0 eq.) dissolved in CHCl3 (260 mL) was heated at 60 °C for 3 days. The reaction was allowed to cool to room temperature and the solution was filtered to remove any insoluble precipitate before concentration in vacuo. The resulting solid was re-dissolved in DCM (100 mL) and hexane (100 mL) was added. The DCM was carefully removed in vacuo to give a cream precipitate which was collected by filtration and dried in vacuo to afford the cage product B11 as an off-white solid (254 mg, 0.18 mmol, 74%).

Method B:
A solution of terephthalaldehyde 11 (194 mg, 1.45 mmol, 6.0 eq.) and (2,4,6-trimethylbenzene-1,3,5-triyl)trimethanamine B (250 mg, 1.21 mmol, 5.0 eq.) dissolved in DCM (260 mL) was heated at 40 °C for 2 days. The reaction was allowed to cool to room temperature and the solution was filtered to remove any insoluble precipitate before the addition of hexane (200 mL). The DCM was carefully removed in vacuo to give a colourless precipitate which was collected by filtration and dried in vacuo to afford the cage product B11 as a cream solid (224 mg, 0.16 mmol, 65%).

B1[2+3]
Method A: A solution of isophthalaldehyde 1 (80 mg, 0.60 mmol, 3.0 eq.) and (2,4,6-trimethylbenzene-1,3,5-triyl)trimethanamine B (125 mg, 0.602 mmol, 3.0 eq.) dissolved in CHCl3 (130 mL) was heated at 60 °C for 4 days. The reaction was allowed to cool to room temperature before concentration in vacuo. The resulting solid was re-dissolved in DCM (30 mL) and any insoluble precipitate removed by filtration before the addition of hexane (30 mL) to the filtrate. The DCM was carefully removed in vacuo to give a cream precipitate which was collected by filtration and dried in the vacuum oven at 50 °C overnight to afford the cage product B1 as a cream solid (44 mg, 0.062 mmol, 31%).

Method B:
A solution of isophthalaldehyde 1 (161 mg, 1.205 mmol, 3.0 eq.) and (2,4,6-trimethylbenzene-1,3,5-triyl)trimethanamine B (250 mg, 1.205 mmol, 3.0 eq.) dissolved in DCM (260 mL) was heated at 40 °C for 3 days. The reaction was allowed to cool to room temperature and the solution filtered to remove any insoluble precipitate before the addition of hexane (200 mL). The DCM was carefully removed in vacuo to give a precipitate which was collected by filtration and dried in vacuo to afford the cage product B1 as a cream solid (139 mg, 0.196 mmol, 49%).
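
The quoted yields can be cross-checked against the cage stoichiometry. The following is a minimal sketch, assuming that [4+6] denotes four triamines and six dialdehydes per cage, that [2+3] denotes two triamines and three dialdehydes, and that the yield is referenced to the limiting precursor; the function name is illustrative only.

```python
def cage_yield_percent(aldehyde_mmol, amine_mmol, cage_mmol,
                       aldehydes_per_cage, amines_per_cage):
    """Percent yield of cage relative to the limiting precursor.

    Assumption: the theoretical amount of cage is set by whichever
    precursor can form the fewest complete cages.
    """
    theoretical_mmol = min(aldehyde_mmol / aldehydes_per_cage,
                           amine_mmol / amines_per_cage)
    return 100.0 * cage_mmol / theoretical_mmol


# B11 [4+6], Method A: 1.45 mmol dialdehyde, 1.21 mmol triamine, 0.18 mmol cage
b11_method_a = cage_yield_percent(1.45, 1.21, 0.18,
                                  aldehydes_per_cage=6, amines_per_cage=4)

# B1 [2+3], Method B: 1.205 mmol of each precursor, 0.196 mmol cage
b1_method_b = cage_yield_percent(1.205, 1.205, 0.196,
                                 aldehydes_per_cage=3, amines_per_cage=2)

print(f"B11, Method A: {b11_method_a:.0f}%")
print(f"B1, Method B: {b1_method_b:.0f}%")
```

Under these assumptions the dialdehyde is limiting in both cases, and the computed values agree with the reported 74% and 49%.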

High-Throughput Screen
All triamine and aldehyde precursors were dissolved in CDCl3 to make stock solutions (2.5-5 mg/mL) for use in high-throughput screening (Supplementary Table 16). Over 3 runs on a Chemspeed Accelerator SLT-100 platform, the required volume of each triamine stock solution, followed by the required volume of each aldehyde stock solution, was added to jacketed reactors (27 mL maximum volume) via liquid dispensing, followed by additional CDCl3 to make each total volume up to 13 mL (Supplementary Tables 17-19). The resulting solutions were vortexed at 800 rpm and heated to 65 °C for 3 days before being allowed to cool to room temperature. All reactions were removed from the reactor vessels and filtered through a small cotton wool plug to remove any insoluble precipitate prior to analysis.
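
The dispensing arithmetic for a single reactor can be sketched as follows. This is an illustration only: the precursor masses and stock concentrations in the example call are hypothetical placeholders (the actual values are in the Supplementary Tables), and the function name is not taken from the robot software. Only the 13 mL total volume and 27 mL reactor limit come from the text.

```python
def dispense_plan(amine_mg, aldehyde_mg,
                  amine_stock_mg_per_ml, aldehyde_stock_mg_per_ml,
                  total_ml=13.0, reactor_max_ml=27.0):
    """Return (amine stock, aldehyde stock, CDCl3 top-up) volumes in mL."""
    if total_ml > reactor_max_ml:
        raise ValueError("total volume exceeds the reactor capacity")
    amine_ml = amine_mg / amine_stock_mg_per_ml
    aldehyde_ml = aldehyde_mg / aldehyde_stock_mg_per_ml
    top_up_ml = total_ml - amine_ml - aldehyde_ml
    if top_up_ml < 0:
        raise ValueError("stock volumes already exceed the target volume")
    return amine_ml, aldehyde_ml, top_up_ml


# Hypothetical example: 10 mg triamine from a 5 mg/mL stock and
# 8 mg aldehyde from a 2.5 mg/mL stock, made up to 13 mL.
amine_ml, aldehyde_ml, top_up_ml = dispense_plan(10.0, 8.0, 5.0, 2.5)
print(f"triamine stock: {amine_ml:.1f} mL, "
      f"aldehyde stock: {aldehyde_ml:.1f} mL, "
      f"CDCl3 top-up: {top_up_ml:.1f} mL")
```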

Scale-up of cage hits
General procedure: A solution of aldehyde and triamine (A-C) was stirred in either DCM or CHCl3 (4.6 mM with respect to mmol of triamine) at 40 °C or 65 °C respectively, for 1-4 days. The reaction was allowed to cool to room temperature and filtered to remove any insoluble precipitates, before 40 mL was removed for direct use in crystallisation screens. For DCM: to the remainder of the filtrate was added an equivalent volume of hexane and the DCM carefully removed in vacuo. For CHCl3: to the remainder of the filtrate was added an excess of hexane and the solution concentrated to ~100 mL, before an additional excess of hexane was added and the solution re-concentrated to ~100 mL. The resulting precipitate was collected by filtration, dried in vacuo and the cage product characterised.
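
The solvent volume implied by the 4.6 mM triamine concentration can be estimated with a short calculation. This is a minimal sketch; the function name is illustrative, and rounding to a convenient volume is assumed to be left to the experimentalist.

```python
def solvent_volume_ml(triamine_mmol, conc_mM=4.6):
    """Solvent volume (mL) giving the target triamine concentration.

    mmol divided by mM (mmol/L) gives litres; multiply by 1000 for mL.
    """
    return 1000.0 * triamine_mmol / conc_mM


# Triamine amounts taken from the procedures in this section.
for label, mmol in [("A1 scale-up", 0.61), ("B1 Method B", 1.205)]:
    print(f"{label}: {solvent_volume_ml(mmol):.0f} mL")
```

The results (about 133 mL and 262 mL) are consistent with the 130 mL and 260 mL of solvent quoted for these reactions.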

For cages that were not scaled up, proved unstable to isolation, and/or gave a mostly or completely insoluble product, the cage was characterised in solution and the data presented are those from the robot screen where the reaction had proceeded cleanly. For all other data, see Supplementary Tables 21-23.

A1[2+3]
Synthesised according to the general scale-up procedure using benzene-1,3,5-triyltrimethanamine A (100 mg, 0.61 mmol, 3.0 eq.) and isophthalaldehyde 1 (81 mg, 0.61 mmol, 3.0 eq.) in DCM (130 mL) for 3 days to afford A1 as a colourless solid (75 mg mass recovery). The isolated product proved to be poorly soluble, suggesting that some decomposition of the cage product had occurred, while the 1H NMR spectrum indicates that residual triamine A is present.

B1•0.82(C4H8O2)•0.18(CH2Cl2)•0.2(H2O)
The single crystal structure, B1•0.82(C4H8O2)•0.18(CH2Cl2)•0.2(H2O), crystallised from a CH2Cl2/EtOAc solution in the triclinic space group P1̄. For this phase, the asymmetric unit comprises one complete B1 cage and disordered solvent molecules. The solvent was modelled as a mixture of EtOAc, CH2Cl2, and H2O. In the structure, EtOAc and CH2Cl2 were disordered over one position and the site occupancy of each solvent was determined using a free variable. The disordered CH2Cl2 in the structure was refined with C-Cl bond distance restraints (DFIX in SHELX). The occupancy of the disordered H2O molecule was determined using a free variable. For this H2O molecule, it was not possible to accurately determine H atom positions; H atoms were therefore placed in estimated positions (PLAT415_ALERT_2_B checkCIF alert). For a displacement ellipsoid plot of the asymmetric unit, see Supplementary Fig. 7.

3(B2)•6(CH2Cl2)
The single crystal structure, 3(B2)•6(CH2Cl2), crystallised from a CH2Cl2/Et2O solution in the hexagonal space group P6 and the structure was refined with the TWINLAW [010 100 001̄], BASF [0.340(5)]. For this structure, the asymmetric unit comprises three 1/6 cage fragments of three crystallographically distinct B2 cage molecules. The X-ray data quality was poor and the B2 cage molecules are disordered in the crystal structure (PLAT341_ALERT_3_B checkCIF alert). As a result, the structure was refined with a rigid bond restraint (RIGU in SHELX). In addition, due to solvent disorder, the CH2Cl2 molecules were refined with C-Cl bond distance restraints (DFIX in SHELX). For a displacement ellipsoid plot, see Supplementary Fig. 8.

B9•0.75(CDCl3)•1.99(H2O)
The single crystal structure, B9•0.75(CDCl3)•1.99(H2O), crystallised from a CDCl3/MeOH solution in the triclinic space group P1̄. For this phase, the asymmetric unit comprises one complete B9 cage and disordered solvent molecules. Due to disorder, a suitable resolution limit of 0.9 Å was applied during refinement (THETM01_ALERT_3_B checkCIF alert). In the structure, S atoms of the three thiophene groups were disordered over two positions. C-S and C-C bond distance restraints were used to model the disordered parts (DFIX in SHELX). For the disordered parts, free variables were used to determine site occupancies and atoms that shared almost equivalent coordinates were refined with constrained displacement parameters (EADP in SHELX). The disordered, partially occupied solvent was modelled as a mixture of CDCl3 and H2O. It was necessary to refine some of the disordered solvent with bond distance restraints (DFIX, SADI and DANG in SHELX) and rigid bond restraints (RIGU in SHELX). For a displacement ellipsoid plot, see Supplementary Fig. 9.

B11•4.5(C6H14)•4.5(CH2Cl2)
The single crystal structure, B11•4.5(C6H14)•4.5(CH2Cl2), crystallised from a CH2Cl2/hexane solution in the tetragonal space group P41. For this phase, the asymmetric unit comprises one complete B11 cage. Chirality in the structure arises from the cage molecules packing helically around screw axes; however, the absolute configuration could not be meaningfully determined. Due to disorder, X-ray data quality was poor and solvent molecules could not be resolved in the large pores (PLAT602_ALERT_2_A and PLAT049_ALERT_1_B checkCIF alerts). Hence, the SQUEEZE routine in PLATON was used during the final refinement cycles. 36,37 SQUEEZE found a 6133 Å3 void with a disordered electron count of 1681 (e-). As a result, 18 CH2Cl2 and 18 hexane solvent molecules were tentatively added to the unit cell atom count (CHEMW03_ALERT_2_A, PLAT043_ALERT_1_A, and PLAT051_ALERT_1_A checkCIF alerts). During refinement a suitable resolution limit of 0.9 Å was applied (PLAT027_ALERT_3_A and THETM01_ALERT_3_B checkCIF alerts) and the B11 cage molecule was refined with a rigid bond restraint (RIGU in SHELX). In addition, 1,2 and 1,3 bond distance restraints were used during refinement (DFIX, SADI, and DANG in SHELX). Due to disorder, and the limited resolution of the diffraction data, it was not possible to accurately determine H atom positions. H atoms were therefore placed in estimated positions and refined using the riding model. The estimated positions of the disordered H atoms are unlikely to be correct, resulting in close intramolecular H-H contacts (PLAT412_ALERT_2_B checkCIF alert). For a displacement ellipsoid plot, see Supplementary Fig. 10.
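
The tentative solvent assignment can be compared with the SQUEEZE electron count by summing atomic numbers. The sketch below is a bookkeeping check only, not the procedure used in PLATON; the dictionary and function names are illustrative.

```python
# Atomic numbers for the elements present in the solvents considered here.
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8, "Cl": 17}


def electrons(composition):
    """Total electron count of a neutral molecule given element counts."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in composition.items())


CH2CL2 = {"C": 1, "H": 2, "Cl": 2}   # dichloromethane
HEXANE = {"C": 6, "H": 14}           # n-hexane

# 18 CH2Cl2 and 18 hexane molecules per unit cell, as stated in the text.
assigned = 18 * electrons(CH2CL2) + 18 * electrons(HEXANE)
squeeze_count = 1681                 # electrons reported by SQUEEZE
residual = squeeze_count - assigned

print(f"assigned: {assigned} e-, SQUEEZE: {squeeze_count} e-, "
      f"difference: {residual} e-")
```

The assigned solvent accounts for 1656 of the 1681 electrons found in the void, so the tentative assignment reproduces the masked electron density to within about 2%.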

B11•1.16(H2O)
The single crystal that refined to B11•1.16(H2O) was initially crystallised from a CHCl3/MeCN solvent mixture and then thermally desolvated at 400 K under a dry N2 gas flow. Prior to data collection, the single crystal was left standing in air and, during refinement, residual electron density located in the pores was modelled as disordered, partially occupied H2O. There was no clear evidence of any notable solvent molecules in the structure, despite the single crystal data being collected at 100 K, which indicates that the majority, if not all, of the solvent was removed from the pores during thermal desolvation (PLAT601_ALERT_2_A). Due to the limited resolution of the single crystal X-ray data, a resolution limit of 1.0 Å was applied during refinement (PLAT601_ALERT_2_A alert). For a displacement ellipsoid plot, see Supplementary Fig. 11a, and for a diagram of the crystal packing, see Supplementary Fig. 11b.

SQUEEZE found a 23138 Å3 void with a disordered electron count of 8162 (e-). As a result, 93 CHCl3 and 93 THF solvent molecules were tentatively added to the unit cell atom count (CHEMW03_ALERT_2_A, PLAT043_ALERT_1_A, and PLAT051_ALERT_1_A checkCIF alerts). In the structure, it was not possible to accurately determine H atom positions. H atoms were therefore placed in estimated positions and refined using the riding model.

C1•(C8H10)•0.25(CH2Cl2)•0.25(H2O)
The single crystal structure, C1•(C8H10)•0.25(CH2Cl2)•0.25(H2O), crystallised from a CH2Cl2/m-xylene solution in the triclinic space group P1̄. The asymmetric unit for this phase comprises one complete C1 cage and disordered solvent molecules. In the crystal structure, 1-D channels are full of disordered solvent that was modelled as partially occupied m-xylene. These m-xylene molecules were refined isotropically with constrained geometries (AFIX 66 in SHELX), bond distance restraints (DFIX in SHELX), and planarity restraints (FLAT in SHELX). It was not possible to accurately resolve all disorder around the m-xylene molecules, resulting in large unassigned q-peaks. Due to disorder, the CH2Cl2 molecule was refined with C-Cl bond distance restraints (DFIX in SHELX) and a rigid bond restraint (RIGU in SHELX). For a displacement ellipsoid plot, see Supplementary Fig. 18.

C2•3(CH2Cl2)
The single crystal structure, C2•3(CH2Cl2), crystallised from a CH2Cl2/MeOH solution in the triclinic space group P1̄. The asymmetric unit for this phase comprises one complete C2 molecule, and four CH2Cl2 molecules. In the structure, one CH2Cl2 is disordered over two positions. This molecule was refined with bond distance restraints (SADI in SHELX) and a rigid bond restraint (RIGU in SHELX). In addition, two CH2Cl2 molecules were refined with C-Cl bond distance restraints (DFIX in SHELX) and rigid bond restraints (RIGU in SHELX). For a displacement ellipsoid plot, see Supplementary Fig. 19.

C7•2.49(CDCl3)•1.33(C4H8O2)•1.6(H2O)
The single crystal structure, C7•2.49(CHCl3)•1.33(C4H8O2)•1.6(H2O), crystallised from a CDCl3/EtOAc solution in the monoclinic space group C2/c. The asymmetric unit for this phase comprises ½ a C7 cage molecule and disordered solvent molecules. Due to disorder, a suitable resolution limit of 1.0 Å was applied during refinement (CHEMW03_ALERT_2_A, THETM01_ALERT_3_A, and PLAT027_ALERT_3_A checkCIF alerts).

checkCIF alerts), and D-H groups without acceptors (PLAT420_ALERT_2_B checkCIF alert). In the structure, one of the dimethoxy aromatic groups of C13 was disordered and modelled over two positions, resulting in a high U(eq) for one of the methyl group C atoms (C71) (PLAT340_ALERT_3_B checkCIF alert). Due to additional disorder, three phenyl rings of C13 were refined with constrained geometries (AFIX 66 in SHELX), 1,2 and 1,3 bond distance restraints were used to model C13 and the disordered solvent molecules (DFIX and DANG in SHELX), and the structure was refined with a rigid bond restraint (RIGU in SHELX). Despite the disorder in the crystal structure, a solvent mask was not used during refinement. For a displacement ellipsoid plot, see Supplementary Fig. 22.

C14•2(C4H8O)
The single crystal structure, C14•2(C4H8O), crystallised from a CH2Cl2/THF solution in the triclinic space group P1̄. The asymmetric unit for this phase comprises one complete C14 cage and two THF molecules. Due to disorder, diffuse scatter beyond 0.85 Å was omitted during refinement (PLAT027_ALERT_3_A checkCIF alert), and two -CH2=N-CH- groups were modelled over two positions. In the crystal structure, two reasonably well ordered THF molecules were located between C14 cages. Due to slight disorder, one of these THF molecules was refined with a rigid bond restraint (RIGU in SHELX) and for this molecule a -O-CH2- group was modelled over two positions. For a displacement ellipsoid plot, see Supplementary Fig. 23.

C21-Tri2Di3•0.5(CH2Cl2)•0.25(C4H10O)
The single crystal structure, C21-Tri2Di3•0.5(CH2Cl2)•0.25(C4H10O), crystallised from a CH2Cl2/Et2O solution in the triclinic space group P1̄. Due to slight disorder of the cage structure, the C-S bond distances were restrained during refinement (DFIX in SHELX), and one methyl group was modelled over two positions. In the structure, the solvent was disordered and modelled as a mixture of partially occupied CH2Cl2 and Et2O. The CH2Cl2 molecule was refined with C-Cl bond distance restraints (DFIX in SHELX). During refinement, all H atoms were placed in estimated positions and refined using the riding model.

C26•12.25(CHCl3)•12.25(C4H10O)
The single crystal structure, C26•12.25(CHCl3)•12.25(C4H10O), crystallised from a CHCl3/Et2O solution in the tetragonal space group I41/a. The asymmetric unit for this phase comprises ¼ of a C26 cage molecule. Due to disorder, C26 was refined with a rigid bond restraint (RIGU in SHELX), and a 1.1 Å resolution limit was applied during refinement (THETM01_ALERT_3_A and PLAT027_ALERT_3_A checkCIF alerts). In addition, the seven aromatic rings were refined with constrained geometries (AFIX 66 in SHELX). Two of these were modelled over two positions, with the severely disordered parts refined isotropically.
Additional disorder was modelled using 1,2 and 1,3 bond distance restraints (DFIX, DELU, and SADI in SHELX), and planarity restraints (FLAT in SHELX). Solvent molecules were too disordered to accurately model in the large pores (PLAT602_ALERT_2_A and PLAT049_ALERT_1_B checkCIF alerts). Hence, the SQUEEZE routine in PLATON was used during the final refinement cycles. 36,37 SQUEEZE found a 19258 Å3 void with a disordered electron count of 4908 (e-). As a result, 49 CHCl3 and 49 Et2O solvent molecules were tentatively added to the unit cell atom count (CHEMW03_ALERT_2_A, PLAT043_ALERT_1_A, and PLAT051_ALERT_1_A checkCIF alerts). Due to disorder, and the limited resolution of the diffraction data, it was not possible to accurately determine H atom positions. H atoms were therefore placed in estimated positions and refined using the riding model. The estimated positions of the disordered H atoms are unlikely to be correct, resulting in close inter- and intramolecular H-H contacts (PLAT410_ALERT_2_A and PLAT411_ALERT_2_B checkCIF alerts). For a displacement ellipsoid plot, see Supplementary Fig. 27.