Box 1. Frequently used abbreviations in this paper
From the following article:
The ENCODE Project Consortium
Nature 447, 799-816 (14 June 2007)
doi:10.1038/nature05874
AR Ancient repeat: a repeat that was inserted into the early mammalian lineage and has since become dormant; the majority of ancient repeats are thought to be neutrally evolving.
CAGE tag Cap analysis of gene expression tag: a short sequence from the 5' end of a transcript
CDS Coding sequence: a region of a cDNA or genome that encodes protein
ChIP-chip Chromatin immunoprecipitation followed by detection of the products using a genomic tiling array
CNV Copy number variants: regions of the genome that have large duplications or deletions in some individuals in the human population
CS Constrained sequence: a genomic region associated with evidence of negative selection (that is, rejection of mutations relative to neutral regions)
DHS DNaseI hypersensitive site: a region of the genome showing a sharply different sensitivity to DNaseI compared with its immediate locale
EST Expressed sequence tag: a short sequence of a cDNA, indicative of expression at the corresponding genomic position
FAIRE Formaldehyde-assisted isolation of regulatory elements: a method to assay open chromatin using formaldehyde crosslinking followed by detection of the products using a genomic tiling array
FDR False discovery rate: a statistical method for setting thresholds on statistical tests to correct for multiple testing
GENCODE Integrated annotation of existing cDNA and protein resources to define transcripts with both manual review and experimental testing procedures
GSC Genome structure correction: a method to adapt statistical tests to make fewer assumptions about the distribution of features on the genome sequence. This provides a conservative correction to standard tests.
HMM Hidden Markov model: a machine-learning technique that can establish optimal parameters for a given model to explain the observed data
Indel An insertion or deletion: two aligned sequences often differ in length, but it is not always clear whether the difference reflects an insertion in one sequence or a deletion in the other
PET Paired-end tag: a short sequence that contains both the 5' and 3' ends of a transcript
RACE Rapid amplification of cDNA ends: a technique for amplifying cDNA sequences between a known internal position in a transcript and its 5' end
RFBR Regulatory factor binding region: a genomic region found by a ChIP-chip assay to be bound by a protein factor
RFBR-Seqsp Regulatory factor binding regions that are from sequence-specific binding factors
RT–PCR Reverse transcriptase polymerase chain reaction: a technique for amplifying a specific region of a transcript
RxFrag Fragment of a RACE reaction: a genomic region found to be present in a RACE product by an unbiased tiling-array assay
SNP Single nucleotide polymorphism: a single base pair change between two individuals in the human population
STAGE Sequence tag analysis of genomic enrichment: a method similar to ChIP-chip for detecting protein factor binding regions but using extensive short sequence determination rather than genomic tiling arrays
SVM Support vector machine: a machine-learning technique that can establish an optimal classifier on the basis of labelled training data
TR50 A measure of replication timing corresponding to the time in the cell cycle when 50% of the cells have replicated their DNA at a specific genomic position
TSS Transcription start site
TxFrag Fragment of a transcript: a genomic region found to be present in a transcript by an unbiased tiling-array assay
Un.TxFrag A TxFrag that is not associated with any other functional annotation
UTR Untranslated region: part of a cDNA either at the 5' or 3' end that does not encode a protein sequence
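
The FDR entry describes thresholding many tests at once without naming a specific procedure, and the box does not say which one the ENCODE analyses used. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the example p-values are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Flag which hypotheses are rejected at false discovery rate q.

    Illustrative Benjamini-Hochberg step-up procedure (an assumption: the
    glossary does not state which FDR method was actually applied).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the LARGEST rank k (1-based) with p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (invented for illustration).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))
# → [True, True, False, False, False, False, False, False, False, False]
```

Only the two smallest p-values fall under their rank-scaled thresholds (0.005 and 0.010), so only those two tests are declared significant even though several others lie below 0.05.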

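The TR50 entry defines replication timing as the point in the cell cycle when 50% of cells have replicated a given position. A minimal way to estimate that point from a sampled replication profile is linear interpolation between the two time points that bracket 50%, sketched below. The function name `tr50`, the time units (hours), and the sample profile are hypothetical; the actual ENCODE analysis may have fitted a smooth curve rather than interpolating linearly.

```python
from typing import Optional, Sequence


def tr50(times: Sequence[float], fraction_replicated: Sequence[float]) -> Optional[float]:
    """Estimate the time at which the replicated fraction at one locus reaches 0.5.

    Assumes `times` is sorted ascending and the fraction is sampled at each time.
    Linearly interpolates between the first pair of samples that rises from
    below 0.5 to at least 0.5; returns None if no such pair exists.
    """
    if len(times) != len(fraction_replicated):
        raise ValueError("times and fraction_replicated must have equal length")
    points = list(zip(times, fraction_replicated))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            # f1 > f0 here, so the denominator is never zero.
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None


# Hypothetical profile: hours into S phase vs. fraction of cells replicated.
hours = [0, 2, 4, 6, 8]
fraction = [0.0, 0.125, 0.25, 0.75, 1.0]
print(tr50(hours, fraction))  # → 5.0
```

Here the 50% mark lies between hour 4 (25% replicated) and hour 6 (75% replicated), and linear interpolation places it at hour 5.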