Question: how many different peptides composed of nine amino acids can a cell produce from a protein made up of 500 amino acids? Answer: 492. The first peptide consists of amino acids 1 to 9 of the protein; the second contains amino acids 2 to 10; and so on to the last peptide, which comprises amino acids 492 to 500.

At least, that's what we thought. But the implication of the paper on page 252 of this issue, by Hanada and colleagues1, is that protein splicing allows for a much higher number. In other words, by slicing a protein into pieces, stitching different portions together, and then cutting out strings of nine sequential amino acids from the melded pieces, cells can manufacture entirely new sets of peptides from the original protein.
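To get a feel for the numbers, the counting can be sketched in a few lines of Python. This is only an illustration, not a calculation from Hanada and colleagues' paper: the function names are made up here, and the spliced count assumes a single splice, fragments kept in their original order, and at least one residue removed between them. It counts choices of positions, not distinct sequences.

```python
def contiguous_windows(n: int, k: int) -> int:
    """Number of length-k runs of consecutive residues in an n-residue chain."""
    return max(0, n - k + 1)


def spliced_combinations(n: int, k: int) -> int:
    """Count ways to assemble a k-mer from two separate pieces of one chain.

    Illustrative assumptions: exactly one splice, pieces kept in their
    original order, and at least one residue excised between them.
    """
    total = 0
    for left_len in range(1, k):               # residues taken from the first piece
        right_len = k - left_len               # residues taken from the second piece
        for left_start in range(n - left_len + 1):
            left_end = left_start + left_len   # exclusive end of the first piece
            first_right = left_end + 1         # skip at least one residue
            last_right = n - right_len         # last start that still fits
            if last_right >= first_right:
                total += last_right - first_right + 1
    return total


if __name__ == "__main__":
    print(contiguous_windows(500, 9))     # 500 - 9 + 1 sequential nine-mers
    print(spliced_combinations(500, 9))   # position choices under the assumptions above
```

Under these assumptions the spliced count is larger than the contiguous count by several orders of magnitude, although nothing is yet known about how many of those combinations a cell actually makes.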

Hanada and colleagues' work centres on the use of peptides by the immune system. Almost every cell in the body is covered with peptides composed of eight to ten amino acids, glued to receptors known as major histocompatibility complex (MHC) class I molecules. The peptides represent every protein that is being made in the cell, and are crucial in allowing the immune system to detect and eliminate intruders. If, for instance, a virus has infiltrated a cell, then the evidence of its presence will be displayed on the cell surface. The immune system's 'killer' T cells will detect that tell-tale sign and take steps to destroy the infected cell.

Previously, Hanada and colleagues had cloned a human killer T cell that they discovered infiltrating a patient's kidney cancer2. They found that the T cells recognized a peptide derived from a particular cellular protein, fibroblast growth factor-5 (FGF-5), that is overproduced in the tumour. In this case, the peptide was presented on a type of MHC class I molecule called HLA-A*03, which is known to display nine-amino-acid peptides with a tyrosine residue at position 3 and a lysine or arginine at position 9.

But what is the precise FGF-5-derived peptide that is recognized by the killer T cells — in other words, what is the T-cell 'epitope'? To find out, Hanada et al.1 prepared target cells expressing truncated FGF-5 genes. The aim was to make it easier to identify the peptide by narrowing down the region of FGF-5 to search. They discovered that cells expressing a 60-amino-acid chunk of FGF-5, amino acids 161–220, were recognized by the T cells. Knowing the requirements of the MHC molecule, it should then have been a matter of routine to identify the T-cell epitope. But the T cells recognized none of the possible strings of eight, nine or ten sequential amino acids from the 60-amino-acid fragment.
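The routine search described here amounts to listing every sequential window and checking it against the MHC molecule's preferences. A minimal sketch follows, assuming the anchor rule exactly as quoted above for nine-residue peptides (tyrosine at position 3, lysine or arginine at position 9). The sequence is a random placeholder, not the real FGF-5 region, and the function names are hypothetical.

```python
import random


def candidate_peptides(fragment, lengths=(8, 9, 10)):
    """Yield (start, peptide) for every contiguous window; start is 1-based."""
    for k in lengths:
        for i in range(len(fragment) - k + 1):
            yield i + 1, fragment[i:i + k]


def fits_anchor_motif(peptide):
    """Anchor rule as quoted in the text: Y at position 3, K or R at position 9."""
    return len(peptide) == 9 and peptide[2] == "Y" and peptide[8] in "KR"


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder 60-residue sequence, NOT the real FGF-5 161-220 region.
    fragment = "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(60))
    candidates = list(candidate_peptides(fragment))
    print(len(candidates))  # 53 + 52 + 51 windows of length 8, 9 and 10
    print([(s, p) for s, p in candidates if fits_anchor_motif(p)])
```

A 60-residue fragment yields only 156 such windows, which is why the search should have been routine, and why finding none that the T cells recognized was so puzzling.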

A clue to the mystery came from the authors' finding that (accidentally?) omitting several amino acids from within this 60-amino-acid fragment did not prevent T cells from recognizing their targets. But cells expressing fragments that were shortened at either end were not recognized. This serendipity prompted the authors to analyse a series of more extensive deletions. The smallest construct that allowed T-cell recognition consisted of amino acids 172–176 and 199–220 of FGF-5, suggesting that what the T cells recognized was actually a patchwork, containing bits from two separate parts of the protein. Indeed, Hanada et al. found that a peptide consisting of five residues from one end of this construct and four from the other was the T-cell epitope in question.

How could a cell make such a cut-and-paste peptide? The authors first wondered whether RNA splicing was the answer. This is a process in which, after a gene has been copied into messenger RNA but before that mRNA is translated into protein, segments of the RNA are excised and the remaining fragments are spliced together. But Hanada et al. judged this to be unlikely, as none of the known 'signatures' of RNA splicing are present in the gene sequence. Could the answer instead be sloppy translation, with the protein-making machinery skipping stretches of mRNA and starting again later? Experiments ruled this possibility out. So a post-translational mechanism seemed likely.

Could the explanation be protein splicing? This would involve the cells in cutting the protein into pieces and sticking them back together in a different order. Similar processes of protein surgery have been observed before, in single-celled organisms and some plants. Hanada and colleagues — two of whom are, coincidentally, from a surgery department — find that it also occurs here. They show that cells can use a synthetic 49-amino-acid fragment, in which the two ends of the T-cell epitope are separated by 40 amino acids, to construct the intact nine-amino-acid epitope. This could only happen if the cells were to cut the protein fragment into pieces, join two of the bits with a peptide bond, and funnel the new piece into the normal epitope-generating pathway (Fig. 1).
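The arithmetic of that experiment (five residues, a 40-residue spacer, then four residues) can be written out as a toy example. The letters below are placeholders that mark the three parts of the fragment, not real amino acids, and `splice` is a hypothetical helper that says nothing about how the cell carries out the reaction.

```python
def splice(fragment, left_len, right_len):
    """Join the first left_len and last right_len residues, dropping the middle."""
    if left_len + right_len > len(fragment):
        raise ValueError("the two end pieces would overlap")
    cut = len(fragment) - right_len
    excised = fragment[left_len:cut]           # the stretch that is removed
    product = fragment[:left_len] + fragment[cut:]
    return product, excised


if __name__ == "__main__":
    # Placeholder fragment: L = left end of the epitope, x = spacer, R = right end.
    fragment = "L" * 5 + "x" * 40 + "R" * 4
    product, excised = splice(fragment, 5, 4)
    print(len(fragment), product, len(excised))  # 49 LLLLLRRRR 40
```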

Figure 1: Protein splicing and the immune system.

Killer T cells scrutinize short peptides displayed on MHC class I molecules on the surface of other cells. Hanada et al.1 have discovered that the process of peptide generation from larger proteins can occur by protein splicing. a, A protein is cleaved by an (as yet unidentified) endopeptidase enzyme. b, Two of the pieces are sewn together (also by an unknown mechanism). c, The stitched-together intermediate is probably then sliced into shorter pieces by the proteasome. d, After further processing, nine-amino-acid peptides are glued to MHC class I molecules, to be displayed on the cell surface. Details such as cellular compartments, peptide transport and trimming 'exopeptidases' have been omitted for clarity.

So far, two categories of natural protein splicing have been described. In single-celled organisms, controlled protein splicing is an important means of generating functional proteins. It is regulated by 'inteins' (intervening sequences) that catalyse their own excision out of a protein; the flanking fragments, or 'exteins', are then ligated3,4. Inteins must have a particular structural domain if they are to catalyse this reaction, and the smallest intein that can have such a domain consists of 134 amino acids5,6. So it is unlikely that the events described by Hanada et al. are regulated in the same way. More similar is the process, seen in jack beans, of enzyme-mediated protein splicing — the ligation of polypeptide stretches to result in a functional protein7. In addition, protease enzymes, which generally slice up proteins, have been engineered to work in reverse8. Whatever the mechanism, this is, to my knowledge, the first time that protein splicing in human cells has been reported.

What are the implications of these findings? First, the potential number of different proteins and protein derivatives produced from our 30,000 genes increases enormously. Processes such as DNA recombination and RNA splicing were already known to increase this number; the discovery of protein splicing adds to the toolkit. Second, although we do not yet know how frequently protein splicing occurs, we have to consider the possibility that cells can produce non-continuous T-cell epitopes not only from tumour proteins but also from infectious agents and our own normal proteins — with implications for vaccine development and autoimmune diseases, as well as for cancer research.

Do mammalian cells use protein splicing to produce functional proteins or peptides that have any physiological role apart from the one leading to its discovery? We have no idea. The enzymes involved in protein splicing are also unknown. Those responsible for cutting might be conventional 'endopeptidases', including the cell's garbage-disposal unit, the proteasome. But what entity can join two protein fragments together? Can proteases work in reverse naturally, as well as being engineered8 to do so? Could the whole process of cutting and ligation happen inside the proteasome? Also, do sequence motifs dictate where within a protein splicing occurs? Here at least we might know the answer: given that the T-cell epitope described by Hanada et al.1 is identical in every one of the kidney-tumour cells (and in other cells that are engineered to overproduce FGF-5), this seems likely.

More questions asked than answered? Take it as an indication of an unexpected discovery.