Box 1. Frequently used abbreviations in this paper

From the following article:

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

The ENCODE Project Consortium

Nature 447, 799-816 (14 June 2007)

doi:10.1038/nature05874

AR Ancient repeat: a repeat that was inserted into the early mammalian lineage and has since become dormant; the majority of ancient repeats are thought to be neutrally evolving.

CAGE tag A short sequence from the 5' end of a transcript

CDS Coding sequence: a region of a cDNA or genome that encodes proteins

ChIP-chip Chromatin immunoprecipitation followed by detection of the products using a genomic tiling array

CNV Copy number variants: regions of the genome that have large duplications in some individuals in the human population

CS Constrained sequence: a genomic region associated with evidence of negative selection (that is, rejection of mutations relative to neutral regions)

DHS DNaseI hypersensitive site: a region of the genome showing a sharply different sensitivity to DNaseI compared with its immediate locale

EST Expressed sequence tag: a short sequence from a cDNA, indicating expression at that position in the genome

FAIRE Formaldehyde-assisted isolation of regulatory elements: a method to assay open chromatin using formaldehyde crosslinking followed by detection of the products using a genomic tiling array

FDR False discovery rate: a statistical method for setting thresholds on statistical tests to correct for multiple testing

GENCODE Integrated annotation of existing cDNA and protein resources to define transcripts with both manual review and experimental testing procedures

GSC Genome structure correction: a method to adapt statistical tests to make fewer assumptions about the distribution of features on the genome sequence. This provides a conservative correction to standard tests

HMM Hidden Markov model: a machine-learning technique that can establish optimal parameters for a given model to explain the observed data

Indel An insertion or deletion; two sequences often show a length difference within alignments, but it is not always clear whether this reflects a previous insertion or a deletion

PET A short sequence that contains both the 5' and 3' ends of a transcript

RACE Rapid amplification of cDNA ends: a technique for amplifying cDNA sequences between a known internal position in a transcript and its 5' end

RFBR Regulatory factor binding region: a genomic region found by a ChIP-chip assay to be bound by a protein factor

RFBR-Seqsp Regulatory factor binding regions that are from sequence-specific binding factors

RT–PCR Reverse transcriptase polymerase chain reaction: a technique for amplifying a specific region of a transcript

RxFrag Fragment of a RACE reaction: a genomic region found to be present in a RACE product by an unbiased tiling-array assay

SNP Single nucleotide polymorphism: a single base pair change between two individuals in the human population

STAGE Sequence tag analysis of genomic enrichment: a method similar to ChIP-chip for detecting protein factor binding regions but using extensive short sequence determination rather than genomic tiling arrays

SVM Support vector machine: a machine-learning technique that can establish an optimal classifier on the basis of labelled training data

TR50 A measure of replication timing corresponding to the time in the cell cycle when 50% of the cells have replicated their DNA at a specific genomic position

TSS Transcription start site

TxFrag Fragment of a transcript: a genomic region found to be present in a transcript by an unbiased tiling-array assay

Un.TxFrag A TxFrag that is not associated with any other functional annotation

UTR Untranslated region: part of a cDNA either at the 5' or 3' end that does not encode a protein sequence
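
The FDR entry above describes thresholding statistical tests to correct for multiple testing. The box does not say which procedure is meant, so the following is only a minimal sketch that assumes the widely used Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the parameter `alpha` are illustrative and do not come from the article.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Mark which tests are called significant at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure; the glossary
    entry itself does not name a specific method.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the tests, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is called significant,
    # even if its own p-value exceeded its own rank threshold.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant
```

Because the procedure is step-up, a test can be called significant even when its own p-value exceeds its own rank threshold, as long as a higher-ranked test passes. This is what distinguishes it from a fixed per-test cutoff.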
