The complexity of the human transcriptome remains a mystery because up to 99% of the human genome consists of non-coding regions that are still to be explored. A new analysis by Cheng et al. provides significant insight into this non-coding part of the transcriptome. The findings have important implications for how we define a gene and how we interpret the regulation of gene expression.

Although it has been shown that non-coding regions are transcribed into stable cytosolic RNAs during development, little is known about how many such transcripts exist. The report by Cheng et al. now reveals that unannotated, non-polyadenylated transcripts comprise a large proportion of the transcriptional output of the human genome.

To map the sites of transcription for cytosolic RNAs with polyadenylation (poly-A) tails, the authors used arrays of 25-mer oligonucleotides spaced every 5 bp, which provided a seven-fold increase in resolution over previous studies. With these arrays they identified, on average, 18,694,360 nucleotides that were transcribed as cytosolic polyadenylated RNAs in 8 cell lines. Taking this a step further, they combined the rapid amplification of cDNA ends (RACE) technique with high-density arrays to take a closer look at the 'junk' genomic regions of 10 human chromosomes.
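The tiling geometry described above can be illustrated with a minimal sketch. It assumes 0-based, half-open coordinates and a hypothetical helper named `tile_probes`; it is not the authors' array-design procedure, only a picture of what "25-mers spaced every 5 bp" implies: neighbouring probes overlap by 20 nucleotides, and each interior nucleotide is interrogated by five probes.

```python
def tile_probes(region_length, probe_length=25, step=5):
    """Return (start, end) half-open intervals for probes tiled across a region.

    Hypothetical helper for illustration: coordinates are 0-based and only
    probes that fit entirely inside the region are kept.
    """
    last_start = region_length - probe_length
    return [(s, s + probe_length) for s in range(0, last_start + 1, step)]


def probes_covering(position, probes):
    """Count the probes whose interval contains the given position."""
    return sum(1 for start, end in probes if start <= position < end)


probes = tile_probes(100)                       # a toy 100-nt region
overlap = probes[0][1] - probes[1][0]           # shared bases between neighbours
interior_depth = probes_covering(50, probes)    # probes covering a mid-region base
print(len(probes), overlap, interior_depth)
```

On this reading, a seven-fold gain in resolution would correspond to an earlier probe spacing of about 35 bp (7 × 5 bp); that figure is inferred from the numbers given here rather than stated in the text.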

Their results show that the number of transcribed sequences associated with transcripts that lack poly-A tails is twofold greater than the number of sequences transcribed as polyadenylated RNAs. Interestingly, this is also the case for cytosolic transcripts, which raises as yet unanswered questions about their potential function. Moreover, the authors found that transcripts encoded on opposite strands use the same sequences in almost 50% of the investigated cases, and they suggest that overlapping transcription must be taken into consideration when analysing genomic regions that have been mapped as disease loci.

Although this study examines only 30% of the human genome, it provides useful insights into the organization of the human transcriptome and highlights the importance of systematically identifying and characterizing the regions that we are less and less justified in referring to as 'junk'.