BAYESIAN [METHOD] A statistical method of combining the likelihood with additional information to produce an overall estimate of the strength of a piece of evidence.
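As a brief illustration of the definition above, Bayes' theorem is the standard rule for combining a likelihood with prior ("additional") information. The symbols $H$ (a hypothesis, for example that a sequence is a functional binding site) and $D$ (the observed data) are notation introduced here for illustration and do not come from the original text.

$$
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}
$$

Here $P(D \mid H)$ is the likelihood, $P(H)$ is the prior information and $P(H \mid D)$ is the resulting overall estimate of how strongly the evidence supports $H$.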
CAGE (Cap analysis of gene expression). The high-throughput sequencing of concatamers of DNA tags that are derived from the initial nucleotides of 5' mRNA.
FUTILITY THEOREM The authors' assertion that essentially all predicted transcription-factor (TF) binding sites that are generated with models for the binding of individual TFs will have no functional role.
GLOBAL ALIGNMENT The alignment of two sequences over their full length.
HIDDEN MARKOV MODEL (HMM). A probabilistic model for the recognition of patterns in DNA or protein sequences. HMMs represent a system as a set of discrete states and as transitions between those states. Each transition has an associated probability, which can be readily derived from training sets, such as alignments of known examples of a pattern. HMMs are valuable because they enable a search or alignment algorithm to be built on firm probabilistic bases.
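The idea of states, transitions and associated probabilities can be sketched with the forward algorithm, which sums over all state paths to give the probability that a model emits a sequence. This is a minimal sketch only: the two states (`background`, `site`) and every probability below are hypothetical values chosen for illustration, not parameters derived from any training set.

```python
# Toy two-state HMM. All names and numbers are illustrative assumptions.
STATES = ("background", "site")
START = {"background": 0.9, "site": 0.1}
TRANS = {
    "background": {"background": 0.9, "site": 0.1},
    "site": {"background": 0.2, "site": 0.8},
}
EMIT = {
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "site": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def forward_probability(seq):
    """Return P(seq | model) for a non-empty DNA string, summed over all state paths."""
    # forward[s] = P(symbols seen so far, current state is s)
    forward = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        forward = {
            s: EMIT[s][symbol] * sum(forward[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(forward.values())


if __name__ == "__main__":
    print(forward_probability("ACGCGT"))
```

In practice the transition and emission tables would be estimated from alignments of known examples, as the definition describes, and long sequences would be scored in log space to avoid numerical underflow.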
HOMOTYPIC CLUSTER A cluster of similar transcription-factor (TF) binding sites, often binding the same TF.
INFORMATION CONTENT A measure of nucleotide conservation at a position, based on information theory.
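One common way to compute this measure for a single column of aligned binding sites is as the maximum possible entropy (2 bits for DNA) minus the observed Shannon entropy. The sketch below assumes a uniform background of 0.25 per nucleotide and applies no small-sample correction. The function name and input format are illustrative choices.

```python
import math


def information_content(column_counts):
    """Information content in bits of one alignment column.

    column_counts maps each nucleotide to its observed count, for example
    {"A": 8, "C": 1, "G": 1, "T": 0}. A uniform background is assumed.
    """
    total = sum(column_counts.values())
    entropy = 0.0
    for count in column_counts.values():
        if count > 0:  # a zero count contributes nothing to the entropy
            p = count / total
            entropy -= p * math.log2(p)
    return math.log2(4) - entropy


if __name__ == "__main__":
    print(information_content({"A": 10, "C": 0, "G": 0, "T": 0}))  # fully conserved
    print(information_content({"A": 5, "C": 5, "G": 5, "T": 5}))   # no conservation
```

A fully conserved position scores 2 bits, and a position where all four nucleotides are equally frequent scores 0 bits.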
LOCAL ALIGNMENT The detection of local similarities between two sequences.
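A widely used method for detecting local similarities is the Smith-Waterman algorithm, which the glossary does not name and which is used here only as an example. The sketch below returns the best local alignment score. The scoring values (match 2, mismatch -1, linear gap penalty -1) are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere, which is what
            # makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTTGATTACATTTT", "CCCGATTACACCC"))
```

With these scores, the shared core `GATTACA` is found even though the flanking sequences are unrelated.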
MACHINE LEARNING The ability of a program to learn from experience, that is, to modify its execution on the basis of newly acquired information. In bioinformatics, neural networks and Monte Carlo Markov chains are well-known examples.
NEEDLEMAN-WUNSCH ALGORITHM A commonly used algorithm in bioinformatics that produces a global alignment of two sequences. The term 'global' refers to alignments across the entirety of the sequences. The algorithm returns an optimal alignment, in which 'optimal' refers to the highest possible score under a specific scoring system. The algorithm is computationally demanding, restricting its direct application to sequences of modest length.
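The dynamic-programming idea behind the algorithm can be sketched as follows. This is a minimal version under an assumed scoring system (match 1, mismatch -1, linear gap penalty -1). Real applications typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The full matrix needs time and memory proportional to the product of the two sequence lengths, which is why direct application is limited to sequences of modest length.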
NEURAL NETWORK A machine-learning technique that simulates a network of communicating nerve cells.
ORTHOLOGY Two sequences are orthologous if they share a common ancestor and are separated by speciation.
PHYLOGENETIC FOOTPRINTING An approach that seeks to identify conserved regulatory elements by comparing genomic sequences between related species.
PSEUDOCOUNT A correction that is added to the observed counts when estimating a probability, to compensate for small sample sizes (that is, few binding sites).
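A minimal sketch of how such a correction is commonly applied to a nucleotide count in a binding-site alignment is shown below. The pseudocount value of 1.0 and the uniform background of 0.25 are assumptions made for illustration. Other choices are in use, and the function name is hypothetical.

```python
def corrected_probability(count, total_sites, pseudocount=1.0, background=0.25):
    """Estimate the probability of a nucleotide at one position.

    count        observed occurrences of the nucleotide at this position
    total_sites  number of binding sites in the alignment
    pseudocount  total pseudocount, shared out according to the background
    """
    return (count + pseudocount * background) / (total_sites + pseudocount)


if __name__ == "__main__":
    # With only four sites, an unobserved nucleotide still gets a non-zero
    # probability instead of being treated as impossible.
    print(corrected_probability(0, 4))
    print(corrected_probability(4, 4))
```

Without the correction, a nucleotide that happens to be absent from a small sample would receive probability zero, and any log-based score for a site containing it would be undefined.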
SAGE (Serial analysis of gene expression). A method for quantitative and simultaneous analysis of a large number of transcripts; short sequence tags are isolated, concentrated and cloned; their sequencing reveals a gene-expression pattern that is characteristic of the tissue or cell type from which the tags were isolated.
SELEX (Systematic evolution of ligands by exponential enrichment). A set of laboratory procedures for the identification of representative sets of ligands for a protein. In the case of DNA-binding proteins, the protein is mixed with a pool of double-stranded oligonucleotides that contain a random core of nucleotides flanked by specific sequences. The protein in complex with bound DNA is recovered and the ligands are subsequently amplified by PCR. The recovered oligonucleotides are sequenced and analysed to reveal the binding specificity of the protein.