Sir

The term ‘gene’ is widely used by biologists and is also in the public vocabulary, yet it has no precise universally accepted molecular definition.

Some authors consider that a gene includes the regulatory sequences required for its expression (ref. 1). But genes do not have to be expressed to be present. The somatic cells of a multicellular organism all have the same genes, but particular cell types express only some of them.

Inclusion of regulatory sequences expands the term ‘gene’ from a specification of ‘what it is’ to indicate also ‘how it is used’. It may be useful to separate these two concepts. It is thought that major evolutionary changes in the morphology of organisms have resulted from altered patterns of synthesis of the same gene products (ref. 2). It seems reasonable to consider that the genes themselves have not changed, but rather the ways in which they have been used.

Inclusion of regulatory sequences introduces considerable complexity to the term ‘gene’. There are many different types of regulatory elements, and they generally operate in complex combinations. Some of them, such as enhancers, are nonspecific and influence any compatible promoter within their range. Others, such as the promoters of polycistronic mRNAs, control the synthesis of several gene products. Furthermore, there is already an acceptable term (‘operon’) for a unit of gene expression (page 339 of ref. 3).

Should introns be considered as parts of genes? When first discovered, genes with introns were described as ‘interrupted’. This implies that a gene consists only of its exons; otherwise it would be continuous. An ‘exons only’ definition highlights the evolutionary theory that new genes can be created by shuffling existing exons into new combinations. There is already an acceptable term (‘transcription unit’) for the entire region, including introns, between a promoter and a transcriptional terminator (page 366 of ref. 4). In cases of alternative splicing of primary transcripts, it has been suggested that, unless the protein products are very different, one gene can encode a series of protein isoforms (page 457 of ref. 4).

In my opinion, the single best molecular definition of the term ‘gene’ is the following: it is the nucleotide sequence that stores the information which specifies the order of the monomers in a final functional polypeptide or RNA molecule, or set of closely related isoforms. This definition is simple and concise. Geneticists can readily find other names for more complex genetic entities such as ‘operons’ and ‘transcription units’.