Biopharmaceuticals are medical products that are manufactured or extracted from biological sources. Biopharmaceutical-related research and industry have been steadily growing over the last decades1,2, and the recent SARS-CoV-2 outbreak has advanced the field substantially. Protein-based products, including therapeutics, vaccines and diagnostics, are key in addressing medical challenges, and their controlled production and engineering lead towards more efficient products compared to naturally derived compounds.

Therapeutic human proteins are typically produced in (fermenter-based) mammalian cell cultures, with Chinese hamster ovary (CHO) cells being the lead manufacturing platform. However, despite an increase in production facilities and expression levels of biopharmaceuticals, the proposed annual need for biotherapeutics, which ranges in double-digit tonnes, cannot yet be met by current manufacturing facilities1,2. Alternatively, plants can be exploited to produce biopharmaceutically relevant proteins, a process known as molecular farming. As higher eukaryotes, plants can generate functionally active, multi-component human proteins with complex post-translational modifications.

In comparison to mammalian cell-based platforms, molecular farming is simple to scale up, cheaper in manufacturing and intrinsically safe. In particular, the advent of transient expression technologies has enabled protein expression within days post-DNA-construct delivery. Moreover, yields are usually in grams per kilogram of leaf material, with a straightforward production scale-up. In addition, plants can be cheaply grown and, in principle, only need water and an energy source such as the sun (or artificial light). Such beneficial features cannot be met by any other eukaryotic system. Various plant-produced recombinant proteins are currently being investigated in clinical trials; however, only a small number are reaching the market partly owing to the time it takes to adapt industrial production workflows to plant-based production. Nevertheless, the recent viral epidemic and pandemic outbreaks (Ebola and SARS-CoV-2), which demanded rapid pharmaceutical measures, proved that molecular farming is an efficient provider of high-quality products3,4.

In this Review, we discuss the design and potential of plant-based biopharmaceutical engineering, describing the evolution of expression tools, with a focus on virally derived elements. In addition, we highlight the engineering of biopharmaceutically relevant products and expression hosts, exemplified by nanoparticles, plant-based antibody production and engineering of post-translational modifications.

General plant-based production

The plant-based expression of proteins was first conducted ~25–30 years ago, and various plant host species, expression vectors and approaches have been explored, including whole plants, cell cultures, transient and stable expression, and nuclear and plastid expression in different tissues and organs, including seeds. These manifold approaches have hampered the development and optimization of common tools. However, common processes and techniques for plant production platforms include the generation or selection of expression vectors (which are transferred into expression hosts by agrobacteria or biolistic tools), harvest and extraction of tissue, and purification and analyses of the product (Fig. 1 and Box 1). Upscaling is then achieved by greenhouse expansions (vertically or horizontally) or by increasing fermenter capacity (in the case of cell cultures). Purification and analysis techniques are similar across expression platforms; however, expression vectors and tissue extraction are plant specific.

Fig. 1: Plant-based production of recombinant proteins.

Expression vectors (based on binary plasmids) are transformed into Agrobacterium tumefaciens, which delivers DNA into plants for stable expression by agro-transformation (for example, using the leaf-disc method) or biolistic tools, or through transient expression by agro-infiltration. In transient expression, suspensions of bacteria (carrying the plasmids) are delivered by infiltration into leaves either manually by a syringe (small scale) or by vacuum infiltration (large scale). Following infiltration, plant expression machinery drives the expression of the genes within days. Plant tissues or cells are then harvested and extracted, followed by purification and analyses of the product.

Expression hosts and vectors

N. benthamiana

Among the more than 100 plant species that have been used for recombinant protein expression, N. benthamiana is particularly suitable for transient expression and is thus, by far, the most used expression host. This wild relative of tobacco has a fast growth rate and large biomass and is particularly amenable to expression vector delivery by agrobacteria owing to a defective RNA silencing system5,6. Protein yields can reach gram levels of product per kilogram of leaves within 5–7 days post-DNA delivery, a key benefit over transgenic procedures and mammalian cell-based expression systems, for which such techniques are not well established7. Good manufacturing practice (GMP)-assigned large-scale facilities, for example, at the Fraunhofer Institute for Molecular Biology and Applied Ecology in Aachen, allow processing of up to 200 kg of biomass per batch. Commercial-scale production facilities based on transient expression are currently mainly operated in North America (for example, iBio Bryan (Texas, USA), Medicago (Quebec, Canada), Raleigh-Durham (North Carolina, USA), and Kentucky Bioprocessing Owensboro (Kentucky, USA)) and Europe (Leaf Expression Systems (Norwich, UK), Nomad Bioscience, Icon Genetics GmbH (Halle an der Saale, Germany)). Commercial-scale production facilities have also been established in medium-income and low-income regions (for example, Baiya Phytopharm (Thailand), Cape Bio Pharms (South Africa)). These industrial-scale production facilities encompass a three-digit hectare area, which would allow the rapid production of a billion doses of vaccines or other valuable biopharmaceutical products3,4.

The production speed of transient expression approaches depends on the desired protein yield; for example, Icon Genetics reports a manufacturing time of 12–14 weeks for an individualized cancer vaccine, from biopsy to product delivery8. Medicago Inc. has demonstrated the development of a virus-like particle-based vaccine within 3 weeks of antigen sequence availability, estimating the generation of at least 30 million doses over a 3-month period in a single 5,000-m² glasshouse facility9,10.
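To put the capacity figures quoted above into perspective, the following minimal Python sketch works through the arithmetic. The 200 kg batch size, the 30 million doses, the 3-month period and the 5,000 m² area are taken from the text; the yield of 1 g per kilogram of leaves is only an illustrative assumption standing in for "gram levels", and the function names are hypothetical.

```python
def batch_yield_grams(biomass_kg: float, yield_g_per_kg: float) -> float:
    """Crude product mass expected from one batch of leaf biomass."""
    return biomass_kg * yield_g_per_kg


def doses_per_m2_per_month(total_doses: float, months: float, area_m2: float) -> float:
    """Average dose output per square metre of glasshouse per month."""
    return total_doses / months / area_m2


# 200 kg of biomass per batch (from the text) at an ASSUMED yield of 1 g/kg.
print(batch_yield_grams(200, 1.0))  # 200.0 g of product per batch

# 30 million doses over 3 months in a 5,000 m2 glasshouse (from the text).
print(doses_per_m2_per_month(30_000_000, 3, 5_000))  # 2000.0
```

Under these numbers, the estimate corresponds to roughly 10 million doses per month, or about 2,000 doses per square metre per month; downstream losses during extraction and purification are not modelled here.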

The tools underlying plant-based expression approaches, including expression in various whole plants, different tissues and suspension cultures, are interchangeable (Box 1). Many binary vectors compatible with Agrobacterium can be used for both stable and transient expression as well as to introduce sequences into either whole plants or suspension cultures. Small-scale transient expression in leaves is often used to express a new molecule before embarking on the production of transgenic lines. However, here, we do not discuss stable expression in tissues and organs such as seeds.

Engineering N. benthamiana

Although originally restricted to single genes, transgenic delivery can now be achieved by the simultaneous transfer of multiple genes and sequences using high-throughput vector assembly systems. In plants, multiple genes can be integrated through time-consuming sequential crossing of pre-existing, independent, transgenic lines11. Alternatively, multi-gene vectors are being developed12,13; for example, the GoldenBraid iterative assembly approach provides a versatile modular system to establish a standard for the connection of genetic parts in the design of building circuits, applicable to both plant genome editing of multiple genomic targeting sites and transgene expression12,13.

To facilitate coordinated expression of multiple sequences, genetic circuits are typically composed of transcription units, which, in eukaryotic cells, consist of three standard biological parts: promoters, coding sequences and terminators. However, synchronized gene expression remains challenging because each expression cassette has to be controlled by different regulatory elements to avoid co-suppression events. Alternatively, polycistronic expression from a single expression cassette allows uniform gene expression and synchronized regulation of multiple transgenes14,15.
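The three-part structure of a transcription unit, and the constraint that cassettes in one circuit should not share regulatory elements, can be sketched as a small data structure. This is an illustrative Python model only: the class and function names are hypothetical, the part names are borrowed from the figure legends of this Review, and the check compares part names rather than sequence homology, which is what actually drives co-suppression.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionUnit:
    """One expression cassette: promoter, coding sequence, terminator."""
    promoter: str
    coding_sequence: str
    terminator: str


def reused_regulatory_elements(units):
    """Return regulatory parts that appear in more than one cassette."""
    counts = Counter()
    for unit in units:
        counts[f"promoter:{unit.promoter}"] += 1
        counts[f"terminator:{unit.terminator}"] += 1
    return sorted(label for label, n in counts.items() if n > 1)


# Hypothetical two-gene circuit in which both cassettes reuse the 35S promoter.
circuit = [
    TranscriptionUnit("35S-P", "heavy chain", "Nos-T"),
    TranscriptionUnit("35S-P", "light chain", "Ocs-T"),
]
print(reused_regulatory_elements(circuit))  # ['promoter:35S-P']
```

Swapping the second promoter for a different one (for example, Act2-P) would make the check return an empty list, mirroring the design rule stated above.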

Following the delivery of multi-gene circuits into plants, expression levels of the different genes in the circuit must be quantitatively balanced to achieve the intended functions. This is achieved by promoters that quantitatively control gene expression levels and typically serve as the linking piece between the inputs (for example, ligand) and the production of an output (for example, response). In addition, promoters can be designed to fine-tune the expression levels of genetic components to achieve a specific behaviour of circuits16,17. For example, the sialylation pathway can be transferred to N. benthamiana by stable multi-gene delivery.

Viral elements in expression vectors

Adequate expression vectors and efficient host delivery play a crucial role in all production systems. Plant expression vectors are typically derived from binary plasmids that can be used by Agrobacterium tumefaciens to transfer genetic material to host cells, where the target genes are correctly processed. Both nuclear and plastomic (chloroplastic) expression have been established for recombinant protein production in plants. Although nuclear-based expression is more common, plastid transformation offers some advantages, in particular regarding expression levels and the use of edible plant species18,19. Nuclear-based expression involves the delivery of foreign DNA to plant cells using Agrobacterium or biolistic methods, of which the latter is mainly used for stable transformation.

The design of efficient transient expression vectors (based on binary transfer DNA plasmids) has greatly advanced nuclear expression, largely replacing the stable transgenic approach. In transient expression, A. tumefaciens delivers recombinant DNA (Fig. 2a) into leaves, a procedure called agro-infiltration (Fig. 2b). Following DNA transfer to the plant cell nucleus, the plant expression machinery takes over, driving mRNA and subsequent protein production within days. By contrast, stable transformation depends on the generation of homozygous plants that permanently integrate the transgene into the genome, which is a time-consuming process, particularly for the production of multi-component proteins11. However, stable transformation is useful if well-defined products are required for a longer period such as HIV-neutralizing microbicides produced in rice20.

Fig. 2: Transient expression in plants.

a, Plant transient expression vectors. A binary plasmid is the standard tool for nuclear expression in higher plants. Binary plasmids are composed of transfer DNA (T-DNA) borders (right border (RB), left border (LB)), cloning sites, a selection marker and replication functions for Escherichia coli and Agrobacterium tumefaciens. Transient expression vectors, including pMIDAS, pEAQ-HT, magnICON and Geminivector bean yellow dwarf virus (BeYDV), use such a plasmid backbone for foreign sequence insertion. However, the vectors differ by regulatory elements that drive gene of interest (GOI) expression. b, Nuclear-based transient expression. A. tumefaciens is used as a vehicle to deliver T-DNA to the plant cell nucleus (agro-infiltration), where mRNA is transcribed (single transcripts or multiple transcripts, depending on the expression cassettes). Prior to transcription, BeYDV-based vectors undergo a DNA amplification step by rolling circle replication in the nucleus (DNA amplification). Once transcripts leave the nucleus, they are either directly translated into proteins (pMIDAS, pEAQ, BeYDV) or undergo a (viral) RNA amplification step (magnICON) prior to translation by a specific RNA-dependent RNA-polymerase (RdRP). Engineered expression vectors, based either on potato virus X or tobacco mosaic virus sequences, carry a gene coding for RdRP. Recombinant proteins (green triangles) are transported to final subcellular compartments (in case of secreted proteins to the apoplast). 35S-P and 35S-T, cauliflower mosaic virus 35S promoter and terminator; ER, endoplasmic reticulum; Ex-T, Nicotiana tabacum extensin terminator; Lac-O, lac operon; LIR, long intergenic region; Nos-T, A. tumefaciens nopaline synthase terminator; P, actin 2 promoter or 35S promoter (depending on vector); p19, tomato bushy stunt virus p19 silencing suppressor; SAR, scaffold attachment region (of diverse origins); Select M, selection marker; SIR, short intergenic region; T, either Nos terminator or no regulatory element (depending on vector); TU, transcription unit; UTR, untranslated regions (of diverse origins).

Efficient (transient) expression vectors are, with few exceptions, based on viral sequences (Fig. 2a). Viruses have co-evolved with prokaryotic and eukaryotic genomes, resulting in a substantial portion of the eukaryotic genome being of viral origin21. Therefore, the eukaryotic genome contains viral regulatory genomic elements with strong activity in their host cells, a feature intensively used in plant biology and biotechnology.

35S Promoter

The 35S promoter from the cauliflower mosaic virus (CaMV) has been used for foreign gene expression since the mid-1980s, when the first engineered CaMV 35S promoter (5′-deletion) fragments were applied to direct transgene expression in tobacco22. The modular architecture of the promoter results in synergistic activities. In particular, a ‘minimal 35S promoter’ has been explored for molecular farming. This minimal promoter, which consists of a ~90 bp fragment that contains a TATA-box23, can be used as a duplication (2 × 35S), combined with enhancer sequences or adapted for ligand-activated gene expression23. This minimal 35S promoter has been applied in the majority of transgene expression approaches, making it a key genetic element in plant biotechnology. The Modular Idempotent DNA Assembly (MIDAS) system is one of the most advanced expression vectors, relying on 35S-driven expression (Fig. 2a) and carrying various elements for rapid assembly of multiple genes with yields of several grams per kilogram of leaf material24. The MIDAS system was originally derived from pTRA vectors and has been optimized to achieve high yields in recombinant protein expression25. These vectors carry a 2 × 35S promoter and expression enhancer sequences. Of note, non-viral, plant-derived promoters also seem to work efficiently (for example, the plastocyanin promoter from Medicago sativa); however, these vectors have been developed by companies and are not commercially available, limiting their widespread use9,26.

5′ and 3′ UTRs

Viral sequences, such as translational enhancers in the form of 5′ and 3′ untranslated regions (UTRs), typically derived from RNA viruses, can be applied to maximize protein yield. For example, a 5′ leader sequence of the tobacco etch virus or tobacco mosaic virus (TMV)27 is part of many expression modules in molecular farming24,25,28. Such elements also provide the basis of a cowpea mosaic virus (CPMV)-based expression system (pEAQ) (Fig. 2a) that uses 5′ and 3′ UTRs of RNA2 of CPMV to direct expression of target genes placed between a 35S promoter and nopaline synthase terminator29. Moreover, expression systems can be improved by rationally designing new synthetic 5′ and 3′ UTRs30. Interestingly, combining terminators in tandem leads to synergistic effects, with an over 25-fold increase in expression31.

p19 Silencing suppressor

Foreign gene expression in plants is limited by the onset of RNA silencing, which negatively affects target gene expression. Plant viruses encode suppressors of RNA silencing and can thus be used as tools to counteract RNA silencing; for example, the tomato bushy stunt virus (TBSV)-encoded p19 element suppresses RNA silencing of foreign genes by sequestering short-interfering RNAs32. This element and engineered versions thereof may also work in mammalian systems33 and are present in many plant expression approaches, either integrated into the vector backbone or separately co-expressed24,34. The p19 elements can achieve a 10–20-fold increase in transgene expression; however, the effect seems to be protein dependent35.

Transient expression vectors

The development of efficient transient expression vectors mainly relies on viral sequences. Here, viral backbones of the RNA viruses TMV, potato virus X (PVX), CPMV and the DNA geminivirus bean yellow dwarf virus (BeYDV)36 are commonly used. Cloned TMV cDNA can generate infectious in vitro transcripts in plants, enabling the expression of foreign sequences; however, to allow economic-scale production, the expression vector had to be optimized, including the design of prototypes of industrial processes that provide an economic yield as well as rapid scale-up and manufacturing cycles. In addition, GMP-certified manufacturing facilities had to be established36,37,38. Here, TMV- and PVX-based magnICON and CPMV-based pEAQ vectors dominate in industrial applications (Fig. 2a).

These viral vectors have been optimized through the sequential elimination of viral elements that are not required for recombinant protein expression and by engineering the remaining viral sequences. For example, the engineered TMV/PVX vector magnICON carries an optimized sequence backbone for nuclear expression (Fig. 2a), allowing the production of up to 80% of total soluble proteins39. The rational design is based on elimination or replacement of nucleotides that may negatively interfere with nuclear expression (such as potential splice sites) because positive-strand RNA viral genomes are optimized for cytoplasmic rather than nuclear replication. Moreover, the co-delivery of engineered non-competing viruses (PVX and TMV) enables the efficient expression of multi-component proteins such as antibodies40. Similarly, in the CPMV vector pEAQ-HT, only the 5′ and 3′ UTRs of CPMV–RNA2 are used to directly express a gene of interest (Fig. 2a). The two regulatory sequences are placed between the 35S promoter and an A. tumefaciens nopaline synthase terminator, resulting in recombinant expression at economically feasible levels (that is, milligrams of product per kilogram of tissue)29,41.

Viral elements can be engineered or replaced by synthetic elements to improve expression levels30. Transient expression vectors are typically based on RNA viruses. In addition, a vector derived from the DNA geminivirus BeYDV (Fig. 2a) has been designed by module combination, resulting in non-competing viral replicons for simultaneous co-expression of up to four proteins at milligram levels per kilogram of biomass28,31,42. Such vectors carry engineered 5′ and 3′ UTRs and multiple expression cassettes arranged as a single large replicon or multiple replicons (here, the gene cassettes are separated by intergenic regions). These vectors can also carry targeting signals, for example, signal peptide sequences for correct subcellular deposition. Correct protein targeting is important because it defines the nature of the final product, including many post-translational modifications. Of note, secretory proteins deposited in the apoplastic space carry the most elaborated post-translational modifications but are often prone to extensive proteolytic degradation43.

Transient expression modules enable the rapid production of new, high-quality biopharmaceutical products, for example, during a viral outbreak, including therapeutics (monoclonal antibodies), vaccines (protein subunits, virus-like particles) and diagnostics (antigens, antibodies)3,4. For example, during the Ebola outbreak in West Africa in 2014–2016, a monoclonal antibody cocktail was rapidly produced in plants and was granted temporary authorization by the FDA. In addition, personalized vaccines and therapeutics, such as cancer vaccines, which are difficult to produce because the molecular causes and antigens are often patient specific, can be produced in plants by transient expression. Whole-exome or RNA sequencing allows the rapid isolation of patient-specific antigens for the creation of specialized vaccines. For example, a personalized IgG-based vaccine has been developed against follicular lymphoma using the magnICON expression approach (NCT01022255)8,44. Here, 21 recombinant, personalized immunogens were generated, each consisting of a tumour-derived, plant-produced idiotypic antibody hybrid in which the hypervariable regions of the tumour-associated light and heavy antibody chains were genetically grafted onto a common human IgG1 scaffold. The immunogens were produced in N. benthamiana, expressing the light and heavy chains of the idiotypic antibody. The purified antibodies were then chemically linked to carrier molecules to form a conjugate vaccine, demonstrating that the magnICON platform provides the robustness, yield, speed, cost-effectiveness and quality needed for the production and administration of individualized tumour-targeted vaccines in humans8,44.

In conjunction with the development of rapid expression systems, plant cell packs (PCPs) have been established that are prepared from plant cell suspension cultures. PCPs are based on microtitre plates and provide a versatile and scalable screening tool for recombinant protein production, enabling initial formulation and functional testing within 3–4 months post-DNA-construct delivery45. Moreover, the development of transient expression modules in plant cell suspension cultures has allowed the automated, high-throughput testing of expression constructs, including immediate scalability of expression46,47.

Plant-based biopharmaceuticals

Antibody engineering

Various pharmaceutically relevant proteins (>100), including vaccines, hormones, cytokines and growth regulators, have been produced in plants3,4,48,49,50. In addition, plants can be used to produce monoclonal antibodies and antibody-related products (for example, Fc fusions and conjugates), which are among the most relevant products in biopharmaceutical research and industry2; indeed, the top-five bestselling recombinant proteins are monoclonal antibodies1, with demand in the range of tonnes per year2.

Monoclonal antibody production in plants involves the expression of at least two types of polypeptides (heavy and light chain), which need to be correctly folded and assembled. Importantly, antibodies carry complex post-translational modifications such as disulfide bridges and glycosylation. For higher-order molecular forms (such as multimeric IgAs and IgMs), an additional peptide, the joining chain, is required to promote assembly11,51. Monoclonal antibodies can be stably and transiently expressed using established expression modules in various plant species50,52. Plant-based heavy chain and light chain open reading frames are typically inserted between appropriate promoter and terminator sequences, either in individual vectors or in tandem modules in a singular vector53,54. To direct monoclonal antibodies to the secretory pathway, where they are processed, open reading frames need to carry signal sequences, such as the barley α-amylase signal sequence, which enables the secretion of recombinant proteins55,56. Upon co-expression of the heavy and light chains, they assemble into Y-shaped heterodimeric structures (encompassing four polypeptides). IgA antibodies, mainly involved in mucosal immunity, often assemble into dimeric variants that contain nine polypeptides (four heavy chains, four light chains and one joining chain)53,57,58. Of note, the co-expression of IgM heavy chains, light chains and joining chains results in the formation of functionally active pentameric and hexameric variants51, the dominant molecular forms circulating in human plasma. The assembly of such hetero-multimeric IgM requires correct assembly of up to 24 polypeptides, carrying more than 50 glycans and more than 100 disulfide bonds. In mammalian plasma cells, IgMs are assembled in a multistep process along the secretory pathway and require the action of various chaperones and specialized organelles. 
In this respect, correct in planta production of IgM, which is one of the largest known human proteins, with a molecular weight of >700 kDa, is remarkable51.
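The polypeptide counts mentioned above follow from simple arithmetic: each antibody monomer contributes two heavy and two light chains, and some multimers carry one additional joining chain. The Python sketch below reproduces the numbers from the text (4 for IgG, 9 for dimeric IgA, up to 24 for IgM). Which IgM form carries the joining chain is not stated in this section; the assignment used here (pentamer with, hexamer without) is an assumption based on general immunology, and the function name is hypothetical.

```python
def polypeptide_count(monomers: int, joining_chain: bool) -> int:
    """Each monomer = 2 heavy + 2 light chains; add 1 for a joining chain."""
    return monomers * 4 + (1 if joining_chain else 0)


# (number of monomers, joining chain present?)
ASSEMBLIES = {
    "IgG": (1, False),
    "dimeric IgA": (2, True),
    "pentameric IgM": (5, True),   # ASSUMPTION: pentamer includes the joining chain
    "hexameric IgM": (6, False),   # ASSUMPTION: hexamer lacks the joining chain
}

for name, (monomers, has_jc) in ASSEMBLIES.items():
    print(f"{name}: {polypeptide_count(monomers, has_jc)} polypeptides")
```

With these assumptions, the hexameric form accounts for the upper bound of 24 polypeptides quoted in the text.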

Plants can produce correctly folded and functionally active monoclonal antibodies and derivatives, such as camelid nanobodies, tetravalent monoclonal antibodies, bifunctional monoclonal antibodies, single-domain fragments, single-chain variable fragments, diabodies and Fc fusions52,59, for potential applications in anticancer and antiviral drugs, diagnostics, and others. For example, ZMapp, a cocktail of three monoclonal antibodies against the Ebola virus, has been produced in plants using the magnICON expression system60,61. ZMapp was granted temporary authorization by the FDA and was administered to humans in various West African states, after preclinical testing in animals61. The ZMapp antibodies were generated in glycoengineered N. benthamiana plants62 and, thus, carried a targeted (fucose-free) glycosylation profile63. Therefore, ZMapp antibodies show higher efficacy compared to mammalian cell-produced orthologues60. Impressively, ZMapp production, from plant Agrobacterium infection to the final product, can be completed within 10 days38, which has enabled GMP-quality antibody production for emergency use and clinical trials within 1 and 3 months, respectively, after DNA-construct delivery38. The speed and efficiency of ZMapp production demonstrate the power of molecular farming to rapidly produce antibody-based therapeutics.

Monoclonal antibodies have also been produced against SARS-CoV-2, including by transient expression in N. benthamiana4,64, with some having received emergency use authorization by the FDA and the EMA65. The plant-produced antibodies showed functional potencies equivalent to those of mammalian cell-produced orthologues, and some even have additional features; for example, a cocktail of two plant-derived SARS-CoV-2 monoclonal antibodies exhibits synergistic effects66, and an anti-IL-6 receptor monoclonal antibody blocks SARS-CoV-2-induced IL-6 signalling, a typical feature of severe COVID-19 (ref. 67). The majority of SARS-CoV-2 monoclonal antibodies are of the IgG1 format. Systematic serological profiling of patients with COVID-19 and of convalescent sera showed a distinct pattern of IgG subtypes, with unusually high levels of specific IgG3, an IgG subtype known for its potent induction of effector functions. In addition, IgA antibodies, typical for mucosal immunity, are significantly increased in COVID-19 convalescent sera68. These antibody variants are difficult to express recombinantly owing to their multimeric forms, extended O-glycosylated hinge regions and proneness to aggregation and proteolytic degradation. Nevertheless, diverse SARS-CoV-2 antibody formats have been engineered using isotype and subtype switches69,70,71 (Fig. 3).

Fig. 3: Antibody engineering in plants.

A modular cloning approach using magnICON plasmids facilitates the rapid engineering and expression of antibody variants, exploiting the largely independent nature of constant and variable regions (Fab). The N-terminally located Fab is generated by variable heavy and light chains (VH, VL). The C-terminally located constant region is generated by constant heavy and light chains (CH, CL) which, in part make up the Fc region. Two plasmids are generated, carrying the CH, CL, respectively. Both plasmids carry a cloning site for rapid insertion of variable sequences (VH or VL). This design allows the rapid cloning and expression of various monoclonal antibody formats with identical antigen binding but a different constant region69,70. 3′ UTR, 3′ untranslated region of potato virus X; 35S-P, cauliflower mosaic virus 35S promoter; Act2-P, Arabidopsis thaliana actin 2 promoter; Nos-T, Agrobacterium tumefaciens nopaline synthase terminator; seq, sequences.

To study antibody activities, it is interesting to disconnect the two regions, the N-terminally located variable region (Fab) responsible for antigen binding, and the C-terminally located Fc region (fragment crystallizable region), which involves cellular activities, such as antibody-dependent cellular cytotoxicity and induction of phagocytosis. A modular system has been applied for the flexible and rapid expression of recombinant SARS-CoV-2 antibody formats in N. benthamiana. Viral-based transient expression vectors, carrying regulatory elements (promoter and terminator sequences and targeting signals) can be connected with different constant regions representing IgG subtypes and other isotypes69,70,72, allowing the rapid exchange of variable and constant regions and the subsequent transient expression of various monoclonal antibody formats with identical antigen binding but different constant regions (Fig. 3). Using this approach, a series of SARS-CoV-2 monoclonal antibodies, representing all four IgG subclasses (IgG1–IgG4) and IgA in its monomeric and dimeric forms, were generated. Comparative functional activity assays revealed a significantly higher neutralization potency of IgA dimers and IgG3 formats than of the IgG1 orthologue69,70, which serves as the gold standard in therapeutic antibody development, highlighting the ability of plants to rapidly produce difficult-to-express mammalian therapeutic proteins.
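The modular logic described above, in which one pair of variable regions is combined with several constant regions, can be illustrated with a short Python sketch. It is a conceptual model only, not a description of the actual magnICON cloning workflow: the class, the function and the placeholder part names (such as "VH-x" and "CH-IgG1") are hypothetical, and the list of formats simply mirrors the six formats named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AntibodyFormat:
    name: str
    heavy_chain: str     # variable heavy region + constant heavy region
    light_chain: str     # variable light region + constant light region
    joining_chain: bool  # co-expressed joining chain (dimeric IgA)


# (format name, heavy-chain constant region, joining chain co-expressed?)
CONSTANT_REGION_OPTIONS = [
    ("IgG1", "CH-IgG1", False),
    ("IgG2", "CH-IgG2", False),
    ("IgG3", "CH-IgG3", False),
    ("IgG4", "CH-IgG4", False),
    ("monomeric IgA", "CH-IgA", False),
    ("dimeric IgA", "CH-IgA", True),
]


def build_formats(vh: str, vl: str, cl: str = "CL"):
    """Pair one variable-region set with every constant-region option."""
    return [
        AntibodyFormat(name, f"{vh}+{ch}", f"{vl}+{cl}", jc)
        for name, ch, jc in CONSTANT_REGION_OPTIONS
    ]


for fmt in build_formats("VH-x", "VL-x"):
    print(fmt.name, fmt.heavy_chain, fmt.light_chain, fmt.joining_chain)
```

All six formats share the same variable regions and therefore the same antigen binding, and differ only in the constant region, which is the property that makes side-by-side functional comparison possible.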

Plant glycosylation mutants62,73,74 and recombinant glycosylation enzymes75,76,77 can be applied to engineer post-translational modifications of recombinant glycoproteins, including monoclonal antibodies56,70,72,73,74,78, in plants. Furthermore, rare antibody post-translational modifications, such as tyrosine sulfation, can be engineered to increase antibody activities76. In addition, by combining interchangeable modular tools, glycoproteins with a targeted glycosylation profile and other post-translational modifications can be produced in a relatively short time period (a few weeks)79, allowing the side-by-side comparison of unknown structural and functional features mediated by post-translational modifications. Importantly, such engineered proteins often exhibit enhanced activities60,79.

Protein sialylation in plants

Post-translational modification biosynthetic pathways, such as protein sialylation, can be transferred into N. benthamiana. Protein sialylation is a complex human glycan modification, carrying essential roles in various biological processes, such as cell–cell interactions and protein stability, and is thus a desired feature in a biopharmaceutical production platform80. Although plant cells can perform human-type, complex protein N-glycosylation, these structures usually terminate with N-acetylglucosamine residues (Fig. 4a,b), lacking human-typical diversifications such as galactosylation and sialylation. However, the rather simple plant glycosylation repertoire provides a biotechnological advantage because it typically leads to high glycan homogeneity of recombinant proteins. Such glycan consistency is difficult to achieve in mammalian cell production. Notwithstanding, typical plant N-glycans carry β-1,2-xylose and α-1,3-fucose, which are not found in mammals. Therefore, plant species have been glycoengineered to obtain a humanized N-glycosylation pattern79, including N. benthamiana62,73,74 and Nicotiana tabacum suspension cells81,82, Physcomitrella83, and Oryza sativa84. Such mutants produce the conserved minimal complex N-glycan structure (so-called GnGn structures) (Fig. 4b), well suited for further engineering to obtain a well-defined glycosylation79,85.

Fig. 4: Engineering post-translational modifications in plants.

a, Constructs for antibody expression and glycan engineering. The expression of antibodies requires the simultaneous delivery of two (IgG) or three (multimeric antibodies) genes into plants. Although genes of interest (GOIs) can be expressed by highly potent vectors, glycan engineering uses low-expressing to medium-expressing modules, with appropriate regulatory elements16,17,55,87,130. The modules may be co-delivered in single-gene or multi-gene constructs, together with the antibody genes. b, Human sialylation pathway engineering in plant cells. Starting from the sugar nucleotide uridine diphosphate (UDP)-GlcNAc, which is abundantly present in plants, the recombinant expression of six foreign proteins is needed for in planta generation of sialylated N-glycans. The foreign proteins act in the cytoplasm, nucleus and Golgi. Mouse UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase (GNE), human N-acetylneuraminic acid phosphate synthase (NANS) and human cytidine monophosphate (CMP)-N-acetylneuraminic acid synthase (CMAS) are required to produce the activated sugar nucleotide precursor CMP-Neu5Ac. Mouse CMP-sialic acid transporter (CST) transports CMP-Neu5Ac to the Golgi, where it is transferred to the acceptor substrate, galactosylated N-glycans, by rat α-2,6-sialyltransferase (ST) to form sialylated structures. Galactosylated structures are generated by human β-1,4-galactosyltransferase (GalT), an enzyme that transfers galactose from UDP-galactose to GlcNAc-terminating structures. GlcNAc-terminated glycans and UDP-galactose are plant-intrinsic elements; however, GalT has to be expressed ectopically. c, Model of pentameric IgM. The protein contains complex N-glycans, which are represented in red and blue, and oligomannosidic glycans, which are shown in orange and red (top view51).
35S-P and 35S-T, cauliflower mosaic virus 35S promoter and terminator; 3′ UTR, 3′ untranslated region of potato virus X; 5′ UTR, 5′ untranslated region from tobacco etch virus; Act-P, Arabidopsis thaliana actin promoter; Act2-P, A. thaliana actin 2 promoter; Ags-T, Agrobacterium tumefaciens agropin synthase terminator; g7-T, A. tumefaciens gene 7 terminator; GnGn, N-glycan structure; Ig-HC and LC, immunoglobulin heavy and light chain; JC, joining chain; Mas-P, A. tumefaciens mannopine synthase promoter; Mas-T, A. tumefaciens mannopine synthase terminator; Nos-T, A. tumefaciens nopaline synthase terminator; Ocs-P, A. tumefaciens octopine synthase promoter; Ocs-T, A. tumefaciens octopine synthase terminator; Select M, selection marker; seq, sequences.

Engineering of in planta protein sialylation is a complex procedure. A priori theoretical analysis (Fig. 4a,b), based on the human sialic acid pathway, suggested that plants lack not only important enzymatic elements for the transfer of sugars from sugar nucleotides to the appropriate substrate (for example, transfer of galactose and sialic acid by β-1,4-galactosyltransferase (GalT) and α-2,6-sialyltransferase) but also the entire biosynthetic pathway for the synthesis of the sugar precursor CMP-5-N-acetylneuraminic acid86,87. Analysis of metabolic capabilities further suggested that six foreign genes must be simultaneously delivered into each plant cell to establish protein sialylation86,87 (Fig. 4a,b). Moreover, the model indicated that the resulting proteins have to act in a highly coordinated manner in different subcellular compartments, including the cytoplasm, the nucleus and the Golgi apparatus. Thus, the genetic parts and analytical methods needed to be carefully selected to facilitate detailed standardized characterization of each individual part. Standardized expression vectors were designed that carry consistent combinations of individual sequences (promoter, coding sequence and terminator) for genetic modules and circuits (Fig. 4a). By placing the coding sequence of each gene between identical promoter–terminator backbone sequences, the sequences were individually tested for expression and activity86,87. Subsequently, the genes were co-expressed in N. benthamiana from these single unit vectors, demonstrating coordinated activity, as shown by the synthesis of sialylated reporter proteins (Fig. 4b,c). However, the concurrent synthesis of unusual glycan structures indicated an undesired interaction of the foreign genetic elements and products with the intrinsic pathway.
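The modular logic of this design (one promoter, coding sequence and terminator per module, with all six pathway genes required in the same cell) can be illustrated with a small data structure. This is a minimal sketch and not the tooling used in the cited studies: the class `ExpressionModule`, the function `missing_pathway_genes` and the choice of a 35S backbone for every module are hypothetical, and only the six gene names are taken from Fig. 4b.

```python
from dataclasses import dataclass

# The six foreign genes named in Fig. 4b for in planta sialylation.
SIALYLATION_GENES = ("GNE", "NANS", "CMAS", "CST", "GalT", "ST")


@dataclass(frozen=True)
class ExpressionModule:
    """One genetic module: promoter, coding sequence, terminator."""
    promoter: str
    coding_sequence: str
    terminator: str


def missing_pathway_genes(modules):
    """Return the pathway genes that are not covered by the given modules."""
    delivered = {m.coding_sequence for m in modules}
    return [gene for gene in SIALYLATION_GENES if gene not in delivered]


# Hypothetical single-gene test set: every coding sequence sits in the same
# promoter-terminator backbone, so differences between modules reflect the
# gene itself. The 35S backbone is an illustrative assumption.
single_gene_set = [
    ExpressionModule("35S-P", gene, "35S-T") for gene in SIALYLATION_GENES
]

# Hypothetical incomplete delivery: only the first four genes are present.
partial_set = single_gene_set[:4]

print(missing_pathway_genes(single_gene_set))  # []
print(missing_pathway_genes(partial_set))      # ['GalT', 'ST']
```

A completeness check of this kind says nothing about expression strength or subcellular targeting; it only captures the requirement that all six genes reach the same cell.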

Choosing the adequate promoter is key to optimizing gene product interactions and activities; for example, the enzyme GalT acts best when its expression is driven by medium–strong promoters, whereas high expression (as obtained by strong 35S-mediated expression) may negatively impact galactosylation16. Importantly, the use of multi-gene vectors, in which the coding sequence of the six genes is placed between different promoter–terminator sequences, allowed the generation of stably transformed plants that can efficiently sialylate proteins16,17 (Fig. 4b,c). Additional optimization strategies, that is, engineering of glycosylation enzymes, resulted in up to 80% sialylation of target proteins16,17. Notably, transgenic, glycoengineered N. benthamiana lines grew and expressed recombinant proteins in a similar manner to wild-type plants. However, an unwanted side effect was a substantial reduction in seed production. A similar strategy can be applied to other plant-based expression systems, such as Physcomitrella patens and N. tabacum but with less success, indicating the complexity of the process and the importance of proper circuits85,88.

Sialic acid can also form polymeric structures, with its most complex form, polysialic acid (polySia), reaching a degree of polymerization of up to 400 (ref. 89). This sugar polymer fulfils different, important functions across different species, such as neural cell regeneration and anti-inflammatory activity90,91. Thus, the design of such sugar polymers is of great interest for disease diagnosis and therapeutic applications (such as treatment of neurodegenerative diseases or cancer). To engineer polySia into N. benthamiana, two human enzymes that catalyse polysialylation (that is, α-2,8-polysialyltransferase II and IV) can be individually delivered with the six genes required for mono-sialylation, together with genes encoding reporter peptides. Here, a combination of different expression modules, including single-gene and multi-gene vectors, results in the expression of high-quality polySia56. Therefore, polySia can be engineered into plants by combining stable and transient expression, single and multiple expression modules, and the delivery of up to nine foreign genes that need to act in a highly synchronized mode in a single plant cell56. This is an outstanding example demonstrating the power of bioengineering to introduce new complex traits into plants.


Biological nanoparticles, such as virus-like particles and protein bodies, are naturally occurring particles with a diameter of ≤100 nm. They can have diverse structures (mostly highly repetitive) and various biological roles, ranging from intracellular storage of substances to intercellular communication. Such biological nanoparticles can be engineered with a specific composition, size and shape to enable a range of biological applications such as drug delivery and vaccine development.

Virus-like particles

Virus-like particles are formed by the spontaneous interaction between one or more viral structural proteins, resulting in particles with icosahedral, spherical or rod-like symmetry, depending on which virus they were derived from. Virus-like particles are typically assembled from viral proteins but lack viral genetic material and are therefore non-infectious. They also contain an internal cavity and may thus serve as delivery vehicles for biological material, including DNA, peptides, proteins and small drugs49,92. Importantly, virus-like particles retain the native antigenic conformation of the viral immunogenic proteins and the repetitive structure of the original virion, displaying epitopes in a dense, repetitive array, and can therefore elicit a strong immune response.

Virus-like particles can be produced by molecular farming using non-enveloped or enveloped viruses. Among non-enveloped viruses that have been explored for targeted virus-like particle formation49, tobamoviruses (for example, TMV) and comoviruses (for example, CPMV) have most commonly been used thus far. CPMV can be applied to produce virus nanoparticles based on the complete CPMV virion, including genomic RNA, and virus-like nanoparticles based on the empty CPMV virion. RNA-free empty virus-like particles can be generated by transient co-expression of the precursor of two viral coat proteins and a viral proteinase in N. benthamiana using the pEAQ-HT expression system, as confirmed by atomic resolution cryo-electron microscopy, with yields comparable to native particles achieved through infection93. Such CPMV virus-like particles can be used as nanocarriers with a variety of payloads, including small-molecule drugs, nucleic acids, therapeutic proteins, contrast agents and photosensitizers for drug delivery, vaccines, diagnosis and other applications48,92,94. In addition, tobamovirus-based or comovirus-based virus-like particles are excellent epitope delivery platforms for antigens. Antigens displayed on the plant virus-like particle surface can interact with antigen-presenting complexes, resulting in the activation of the innate immune system95,96,97. Moreover, CPMV virus-like particles are able to induce high immunomodulatory activities in various animal models48,49.

Plant systems have also been used to generate virus-like particles based on animal and human viruses, such as the Norwalk virus and African horse sickness virus98,99, by transiently expressing capsid proteins in N. benthamiana. The spontaneously formed virus-like particles mimic the structure of the native viruses and can induce a strong immune response in animal models98,99. Remarkably, the features of the viruses can be conserved at the molecular level, even across phyla and kingdoms.

Virus-like particle assembly of enveloped viruses substantially differs from the assembly of non-enveloped viruses. In contrast to non-enveloped viruses, the formation of enveloped viruses is induced by the insertion of virus proteins in cellular membranes, followed by release of the synthesized viral particles through a sophisticated budding process. This particle formation process is often initiated in specialized organelles, such as endoplasmic reticulum–Golgi intermediate compartments100. Enveloped viruses can also form virus-like particles in mammalian cells upon recombinant expression of viral membrane proteins, which has been explored for vaccine design101. Membrane proteins from enveloped mammalian viruses can also be expressed in plants; for example, hepatitis virus surface antigens can be ectopically expressed in various plant species, including edible plants — a strategy currently investigated in clinical trials for vaccination4. Hepatitis B virus surface antigen, produced in tobacco, forms spherical particles similar to those found in serum and recombinant yeast102; however, detailed biophysical studies will need to be performed to confirm the similarity of the particles.

The membrane protein haemagglutinin of the enveloped influenza virus can be expressed in plants for vaccine development. Interestingly, haemagglutinin can assemble in planta and bud from the membrane to form independent enveloped virus-like particles even in the absence of viral coat protein components10. Moreover, the particles are immunogenic and protective in animal models9. Therefore, vaccine candidates have been explored based on carrier-free virus-like particles, including vaccines against seasonal influenza viruses and SARS-CoV-2 (refs. 103,104). As for influenza viruses, SARS-CoV-2 membrane spike proteins alone are sufficient, even preferred over co-expression of other viral membrane proteins, to induce virus-like particles in N. benthamiana103,104. Based on this approach, Medicago Inc. developed a vaccine candidate against SARS-CoV-2 that carries a highly engineered spike protein version, including a plant leader peptide, a heterologous C-terminal transmembrane domain and stabilizing mutations104. Clinical studies have confirmed the safety and efficacy of the vaccine104,105, followed by approval by Health Canada105,106. Of note, Medicago Inc. produced ten million doses of current GMP-level influenza vaccines within 1 month of receiving the sequence of the virus9 and has announced a virus-like particle-based COVID-19 vaccine candidate, which was produced in just over 20 days after receiving the spike protein gene sequence104,105,106, highlighting the power of plant-based virus-like particle expression in emergency cases.

Protein bodies

Biopharmaceuticals can be encapsulated in microparticles or nanoparticles to generate therapeutics and vaccines. Protein body-based encapsulation exploits the natural property of plant cells of cellular or subcellular sequestration and protein-based assembly. Protein bodies naturally occur as storage organelles and can also be induced through the overexpression of recombinant proteins. This endogenous encapsulation mechanism allows the long-term storage of recombinant proteins without degradation or loss of activity and offers a platform for drug delivery. For example, zein, which is a storage protein in maize seeds, forms edible films that are resistant to microbial degradation, making them suitable as food and pharmaceutical coatings107. In particular, γ-zein can induce protein bodies; here, the truncated version of γ-zein, corresponding to the first 112 N-terminal amino acids, including a 19-amino-acid signal peptide, is sufficient to induce protein body formation, not only in the vegetative tissues of plants but also in fungi and mammalian cells108. This truncated version of γ-zein has been commercialized as Zera® (by Era Biotech, Spain), relying on the fusion of the N-terminal sequence of γ-zein to other proteins to induce the formation of protein bodies. Such γ-zein-based protein bodies can be applied to encapsulate recombinant proteins and show immunostimulatory effects, which may be beneficial for vaccine delivery109,110,111. In principle, the production of recombinant γ-zein-fusion peptides can be achieved by any (transient) expression approach.

Of note, recombinant protein production based on γ-zein induces endoplasmic reticulum-derived protein bodies. Although the recombinant proteins remain stably accumulated within these protein bodies, and the high density of these organelles permits recovery and purification processes of the protein product, the products carry features typical of the endoplasmic reticulum, including incompletely processed N-glycan structures such as high mannose glycans. Such structures might induce unwanted immunological reactions or rapid serum clearance of target proteins112.

Regulation and approval

Although the ability of plants to produce biopharmaceutically relevant proteins at high quality has been demonstrated in multiple cases, only a few plant-based biopharmaceuticals have been investigated in clinical trials or have reached the market thus far (Table 1). Key reasons may be insufficient yields (at least for first-generation recombinant proteins generated by transgenic approaches), high purification costs and regulatory barriers. Yields can be increased by using potent expression vectors that allow the generation of products in milligram amounts within several days post-DNA-construct delivery. In addition, high productivity can be achieved by plastomic and seed-based expression, opening the road to new applications, particularly in oral delivery18,19,113. However, plastomic and seed-based production may not be suitable for secretory proteins with complex post-translational modifications because these production approaches usually do not provide all required modifications, such as complex glycosylation.

Table 1 Plant-produced recombinant proteins on the market

A crucial step in the production procedure is downstream processing, which, although optimized for whole plants37,114,115, remains challenging owing to the accumulation of recombinant proteins within cellular networks (with the exception of less efficient systems such as hydroponic cultivation and rhizosecretion). Purification typically involves homogenization of plant tissue to release the target protein, which also causes the release of large quantities of host proteins and other plant components. The subsequent clarification of plant homogenates requires additional steps, increasing the costs of downstream processing; for example, the production of a purified monoclonal antibody using transgenic tobacco costs approximately € 1,000 per gram, for which the downstream process costs represent >80%37,114. The calculations are in line with techno-economic analyses (TEAs) demonstrating that purification steps take the main share116.
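The cost split quoted above can be made explicit with a short calculation. This is a rough sketch that uses only the two figures given in the text (approximately € 1,000 per gram, of which downstream processing represents >80%); the function name `split_cost` is hypothetical, and because the share is a lower bound, the results are bounds rather than point estimates.

```python
# Figures quoted in the text: about EUR 1,000 per gram of purified antibody,
# with downstream processing representing more than 80% of that cost.
TOTAL_COST_EUR_PER_G = 1000.0
DOWNSTREAM_SHARE_MIN = 0.80  # lower bound; the text states ">80%"


def split_cost(total_cost, downstream_share):
    """Split a total cost into its downstream and upstream parts."""
    downstream = total_cost * downstream_share
    upstream = total_cost - downstream
    return downstream, upstream


downstream_min, upstream_max = split_cost(TOTAL_COST_EUR_PER_G, DOWNSTREAM_SHARE_MIN)
print(f"downstream processing: more than EUR {downstream_min:.0f} per gram")
print(f"upstream production:   less than EUR {upstream_max:.0f} per gram")
```

Under these figures, even eliminating upstream costs entirely would reduce the total by less than a fifth, which is why purification dominates cost-reduction efforts.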

TEAs can define a manufacturing process for the evaluation of commercial viability, which is particularly important for a non-traditional process such as plant-based manufacturing. However, owing to the breadth of methodology (for example, different production platforms, production hosts, indoor and field propagation), only a limited number of TEAs are currently available (Table 2), which cover only a few plant-based manufacturing methodologies117. For example, plant-based manufacturing costs for a lectin-based drug are estimated to be approximately € 100 per gram, corresponding to <€ 1 per dose118. Similarly, the production costs for recombinant antibacterials, enzymes and antibodies are estimated to be within the range of mammalian cell-based production costs or even cheaper119. In particular, TEAs predict significant reductions in capital investment and >50% reductions in the cost of goods compared with other publicly available values120. Although these simulations are highly relevant, large-scale processes must be evaluated to confirm theoretical projections.
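The relation between cost per gram and cost per dose for the lectin example can be checked with simple arithmetic. This is a minimal sketch under stated assumptions: it uses only the two approximate figures quoted above (€ 100 per gram and <€ 1 per dose), the function names are hypothetical, and the 5 mg dose in the usage line is an illustrative value rather than a figure from the cited analysis.

```python
COST_EUR_PER_G = 100.0       # approximate figure quoted in the text
MAX_COST_EUR_PER_DOSE = 1.0  # the text states "<EUR 1 per dose"


def cost_per_dose(cost_per_gram, dose_mg):
    """Cost of one dose, given the cost per gram and the dose mass in mg."""
    return cost_per_gram * dose_mg / 1000.0


def max_dose_mg(cost_per_gram, max_cost_per_dose):
    """Largest dose mass (mg) that stays within the per-dose cost limit."""
    return max_cost_per_dose / cost_per_gram * 1000.0


# Dose mass implied by the two quoted figures (an upper bound, in mg).
implied_limit_mg = max_dose_mg(COST_EUR_PER_G, MAX_COST_EUR_PER_DOSE)
print(f"implied dose mass: below {implied_limit_mg:.0f} mg")

# Hypothetical 5 mg dose, for illustration only.
print(f"cost of a 5 mg dose: EUR {cost_per_dose(COST_EUR_PER_G, 5.0):.2f}")
```

The two quoted figures are therefore mutually consistent for any dose below about 10 mg of active protein.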

Table 2 Plant-based manufacturing techno-economic analyses117

A major barrier to the industrial uptake of molecular farming products is the uncertain intellectual property and regulatory landscape compared with that of well-established cell expression systems. Molecular farming companies tend to own intellectual property portfolios for their expression systems, which should, in theory, provide confidence to industrial partners. However, the restriction to individual proprietary technologies limits industry partners to individual platforms, which restricts their freedom to operate. Moreover, regulatory guidelines are uncertain because molecular farming often subverts industry norms, in which biopharmaceutical production is an entirely GMP-compliant cleanroom-based process. However, the upstream process of molecular farming, which uses whole plants, is typically only classified in the framework of good agricultural and collection practices rather than GMP. Although well-designed processes have been developed for plant-made biopharmaceuticals121, the regulatory procedures should be globally unified and streamlined122. In particular, suspension cultures of recombinant agrobacteria typically involve culture maintenance and preparation steps for agro-infiltration outside GMP processes. In addition, agrobacteria-based transient expression does not have a classical cell bank because the product is not present in Agrobacterium cells, and GMP usually starts with the product or structural precursor of the product, that is, 3–4 days after infiltration. Of note, the emergency regulations adopted by the FDA and EMA to facilitate COVID-19 drug development could also help facilitate the market transition of molecular farming products.


Plant-based bioengineering approaches could address the growing demand for new, innovative, protein-based biopharmaceuticals, particularly because mammalian cell-based systems may see a capacity shortage for several applications1. The currently limited translation of plant-produced proteins may be more related to industrial and regulatory inertia than to product inadequacy, given the plentiful evidence of functional equivalency or even superiority60,123. Potent, plant-based, modular transient expression systems have been developed that are scalable and could supply the world population with up to 7.5 billion vaccine doses per year in facilities with a capacity of >100 ha. For example, Medicago, Inc. announced a new production facility, spreading out over 9 ha (opening 2023/2024), which will be capable of providing up to 1 billion vaccine doses annually. The scalability of plants has recently been complemented by PCPs — a rapid high-throughput screening technology that facilitates the screening of more than 1,000 candidate variants per week, allowing expression results to be scaled to intact plants46. Importantly, TEAs of transient plant-based platforms predict a significant reduction in capital investment and cost of goods compared with published values at similar production scales120; however, these projections will need to be confirmed by industrial-scale productions.
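The two capacity figures in this paragraph can be compared on a per-hectare basis. The sketch below is a sanity check only: it takes the quoted "up to" values at face value, treats the stated areas as directly comparable facility footprints (an assumption, because the text does not say how area is defined), and uses a hypothetical helper `doses_per_hectare`.

```python
# "Up to" figures quoted in the text; both are projections, not measured output.
GLOBAL_DOSES_PER_YEAR = 7.5e9    # projected global supply, doses per year
GLOBAL_AREA_HA = 100.0           # the text states ">100 ha" (lower bound on area)
FACILITY_DOSES_PER_YEAR = 1.0e9  # announced single facility, doses per year
FACILITY_AREA_HA = 9.0           # announced facility footprint


def doses_per_hectare(doses_per_year, area_ha):
    """Annual doses produced per hectare of facility area."""
    return doses_per_year / area_ha


# Because the global area is a lower bound, this density is an upper bound.
global_density = doses_per_hectare(GLOBAL_DOSES_PER_YEAR, GLOBAL_AREA_HA)
facility_density = doses_per_hectare(FACILITY_DOSES_PER_YEAR, FACILITY_AREA_HA)

print(f"global projection:  at most {global_density / 1e6:.0f} million doses per ha per year")
print(f"announced facility: about {facility_density / 1e6:.0f} million doses per ha per year")
```

Both values are of the order of 100 million doses per hectare per year, so the global projection and the announced facility are broadly consistent with each other.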

Several viral outbreaks in the twenty-first century (for example, Ebola and coronaviruses) have demanded the rapid and global supply of new pharmaceutical products. This demand cannot be met by mammalian cell-based manufacturing facilities, which are nearly exclusively located in North America, Europe and Asia1. Plant-based manufacturing can easily be established in various settings and has minimal cold-chain requirements, and its products can be administered through needle-free aerosol application (using nanoparticles and IgA antibodies). The approach may thus substantially contribute to managing disease outbreaks globally.

Two prominent examples serve as a roadmap for the speed, flexibility and scalability of plant-based production: the generation of vaccines against emerging pathogens and modular design for individualized cancer vaccines8,44. Moreover, research on the ability of plants to generate self-assembled nanoparticles that may serve as self-adjuvanting vaccines or drug delivery vehicles is just at the beginning and may advance drug development substantially48. Plant production also facilitates high-throughput screening, rapid translation to a multi-tonne biomass scale, and cost-efficient production of virus-like particles (and protein bodies) within 12 months, as demonstrated for influenza and SARS-CoV-2 vaccines9,105. This is substantially faster than animal cell-based expression systems. Therefore, plant-based platforms could ensure the rapid and global-scale deployment of biopharmaceuticals, promoting equitable access to pharmaceuticals.

Citation diversity statement

We acknowledge that papers authored by scholars from historically excluded groups are systematically under-cited. Here, we have made every attempt to reference relevant papers in a manner that is equitable in terms of racial, ethnic, gender and geographical representation.