When the complete sequence of human chromosome 22 was first published in 1999 (ref. 4), John Rinn, an assistant professor at Beth Israel Deaconess Medical Center and an associate member of the Broad Institute in Cambridge, Massachusetts, got very excited. He was not interested in looking at the map of known protein-coding genes on the chromosome, but rather everything else. “We wanted to see if we could find biologically active molecules in the human genome that no one previously knew about,” he says.

Armed with the sequence of an entire chromosome — and a year later the whole human genome — researchers and developers began to create genome-wide tiling microarrays. “By probing these tiling arrays we found out that there are tonnes of biologically active regions by proxy of RNA being made,” says Rinn — results he and his colleagues reported in 2003 (ref. 5). Since then, Rinn has focused his efforts on understanding a collection of these RNAs known as large intervening non-coding RNAs (lincRNAs).

“Initially many people thought that this had to be an artefact of the technology: how could there be so many RNA molecules that we have never seen before?” says Rinn. Arguments against a true biological purpose for lincRNAs came largely from the lack of evolutionary conservation within their sequences — conservation implies function, whereas lack of conservation can often imply noise.

As so few functional lincRNAs had been described, Rinn and his colleagues set out to find more. In 2007 they reported the identification of a new 2.2-kilobase large non-coding RNA, which they called HOTAIR, that plays a role in guiding chromatin complexes within the cell (ref. 6). Although it was only a single new functional lincRNA, and still only one of four known to be functional at the time, the discovery gave Rinn an idea of how to enrich for functional lincRNAs from the genome.

“What we did next was to go after things that looked like HOTAIR,” he explains. Instead of using an RNA-based approach, the group decided to look at chromatin structure. Histone marks clearly indicate where active genes start and stop. Using high-throughput chromatin immunoprecipitation (ChIP) sequencing on the Illumina Genome Analyzer to look for these marks, Rinn and his colleagues at the Broad Institute developed genome-wide chromatin state maps. Then, just as with his analysis of chromosome 22 almost ten years earlier, Rinn says he threw out the known protein-coding genes and looked at what was left. He identified 1,600 other RNAs located by themselves in the middle of nowhere in the genome that look just like HOTAIR (ref. 7).
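The filtering step Rinn describes, discarding chromatin domains that coincide with known protein-coding genes and keeping the rest, can be pictured as a simple interval comparison. The sketch below is only an illustration under stated assumptions: the function names, the half-open coordinate convention and the toy coordinates are all hypothetical, and the group's actual pipeline is not described in this article.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intergenic_domains(domains: List[Interval], genes: List[Interval]) -> List[Interval]:
    """Keep only the chromatin domains that overlap no known protein-coding gene."""
    return [d for d in domains if not any(overlaps(d, g) for g in genes)]


# Toy data, invented for illustration only.
domains = [("chr12", 100, 500), ("chr12", 900, 1400), ("chr22", 50, 300)]
genes = [("chr12", 400, 800), ("chr22", 1000, 2000)]

candidates = intergenic_domains(domains, genes)
print(candidates)
```

In this toy case the first domain is dropped because it overlaps a gene, and the other two remain as candidates. A genome-scale analysis would probably use an indexed interval structure rather than this all-against-all comparison.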

HOTAIR is one of an increasing number of functional non-coding RNAs identified from the human genome. Credit: J. RINN

To determine if some of their newly discovered RNAs were functional, the team took a 'guilt by association' approach, using microarrays to profile a number of the newly identified lincRNAs in 21 different tissue samples while at the same time profiling protein-coding genes in the same tissue samples.

Then they asked the question: which RNAs had similar profiles to protein-coding genes of known function? Their initial analysis was followed by further validation using independent systems. “This has turbo-charged the field, as not only can we identify these things now but we can get a good idea of what they might be doing to test functional relationships,” says Rinn.
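The 'guilt by association' idea can be sketched as a comparison of expression profiles across tissues: a lincRNA whose profile closely tracks that of a gene of known function becomes a candidate for a related role. The example below is a minimal illustration, not the team's method. The article does not say which similarity measure was used, so Pearson correlation is an assumption here, and the names and expression values are invented.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar_gene(linc_profile: List[float], gene_profiles: Dict[str, List[float]]) -> str:
    """Name of the gene whose profile correlates best with the lincRNA profile."""
    return max(gene_profiles, key=lambda name: pearson(linc_profile, gene_profiles[name]))


# Hypothetical expression levels across five tissues (the study profiled 21).
linc = [1.0, 2.0, 8.0, 9.0, 1.0]
gene_profiles = {
    "gene_A": [2.0, 3.0, 9.0, 10.0, 2.0],  # rises and falls with the lincRNA
    "gene_B": [9.0, 8.0, 2.0, 1.0, 9.0],   # moves in the opposite direction
}

best = most_similar_gene(linc, gene_profiles)
print(best, round(pearson(linc, gene_profiles[best]), 3))
```

A match of this kind is only a hypothesis about function, which is why the initial analysis was followed by validation in independent systems.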

For Rinn and his colleagues it is now time to muster all the force they can to explore these RNAs. “We are going to throw the Broad kitchen sink at them,” says Rinn, who is teaming up with a number of scientific platforms at the Broad Institute to look at the effects of knocking down each newly discovered lincRNA.

N.B.