The field of genomics has expanded enormously and witnessed many
exciting developments during the past few decades. Some of the most
significant advances have been the sequencing of the genomes of many
organisms and the production of tools for the functional annotation and
analysis of these datasets. As sequencing methodologies become faster
and cheaper, genomic data have accumulated at an ever-increasing rate,
raising the question of how best to process and interpret such a large
body of information efficiently. Data visualization helps humans absorb
and understand this information.
Several groups have responded to this need by developing web-based
programs that allow users to view genomic data graphically. The first
of these applications was the Saccharomyces Genome Database (SGD),4,5
which was developed for the Saccharomyces cerevisiae genome assembly.
A similar model was adopted by the WormBase project,6–8 which hosts
data for Caenorhabditis elegans and related worm species, and by
FlyBase, which hosts data for Drosophila species.9,10 The human genome,
roughly thirty times larger than that of C. elegans, required a more
integrated approach: one that provided external data resources and
links to many other relevant databases, as well as faster ways of
viewing annotations. With the goal of visualizing the human genome,
three major genome browsers were developed:
- The Genome Browser at the University of California Santa Cruz (UCSC),11–17
- Ensembl18 at the Wellcome Trust Sanger Institute (WTSI)/European Bioinformatics Institute (EBI), and
- MapViewer19 at the National Center for Biotechnology Information (NCBI).
Additionally, GBrowse (developed as part of the Generic Model
Organism Database project)20 is the genome browser used by WormBase and FlyBase.
The UCSC Genome Browser hosts genomes from a variety of organisms. As
of September 2009, these included 24 vertebrates (14 of them mammals),
3 other deuterostome species, 13 insect species (11 of them
Drosophila), 6 species of nematode worms, and one yeast. The UCSC
Genome Browser is part of a package of tools accessible from the UCSC
Genome Bioinformatics website. Together, these integrated tools give
researchers powerful methods for large-scale mining of genome-related
biological data. The UCSC Genome Browser provides users with:
- visualization of results from genome-wide screens such as single nucleotide polymorphism (SNP) association studies, linkage studies, and homozygosity studies;
- definitions of candidate disease loci (i.e., chromosomal positions of genes or DNA regions that are implicated in a medical condition);
- browsing of annotation features at loci of interest;
- access to the evolutionary history of bases through multiple genome alignments;
- a tool for viewing polymorphisms;
- access to a rich variety of information about genes (for some organisms), including links to external databases, expression data, and orthologs; and
- a gateway to a variety of other annotation features and biological information.
Scientists can also view and query their own data in the context of
these annotations. Finally, power users can install the UCSC Genome
Browser locally and configure it extensively.
The UCSC Genome Browser also includes sequence data from the Mammalian
Gene Collection (MGC) project1–3 at the National Institutes of Health
(NIH). MGC provides sequence-validated complementary DNA (cDNA) clones
of human, mouse, rat, and cow protein-coding genes for use as
expression tools in biological and medical research. cDNA clones are
DNA sequences complementary to transcripts produced from genes in the
genome; they are synthesized from mature mRNA transcripts by the
enzymes reverse transcriptase and DNA polymerase. For any gene of
interest, the UCSC Genome Browser can be used to analyze annotated
features and determine whether an MGC clone is available. The
researcher can then follow an external link to order the cDNA clone
for experimental characterization of the gene.
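As a simplified illustration of how a cDNA relates to its mRNA template, the Python sketch below derives both strands of a cDNA from an mRNA sequence written 5'->3'. Reverse transcriptase builds the first strand complementary and antiparallel to the mRNA. DNA polymerase then builds the second strand complementary to the first. The sketch assumes an idealized, full-length copy of the transcript and ignores the priming, partial extension, and cloning adapters of real library construction. The function names are hypothetical and are not part of any MGC or UCSC tool.

```python
# Base-pairing rules used in this simplified model of cDNA synthesis.
# RNA template base -> DNA base added by reverse transcriptase (U pairs with A).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
# DNA template base -> DNA base added by DNA polymerase.
DNA_TO_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def _complement_antiparallel(template: str, pairing: dict) -> str:
    """Read the template in reverse (new strands are antiparallel) and pair each base."""
    try:
        return "".join(pairing[base] for base in reversed(template.upper()))
    except KeyError as err:
        raise ValueError(f"unexpected base in template: {err.args[0]!r}") from None


def first_strand_cdna(mrna: str) -> str:
    """Hypothetical helper: first-strand cDNA (5'->3') for an mRNA given 5'->3'."""
    return _complement_antiparallel(mrna, RNA_TO_DNA_COMPLEMENT)


def second_strand_cdna(first_strand: str) -> str:
    """Hypothetical helper: second-strand cDNA (5'->3'), complementary to the first strand."""
    return _complement_antiparallel(first_strand, DNA_TO_DNA_COMPLEMENT)


if __name__ == "__main__":
    mrna = "AUGGCCUAA"
    first = first_strand_cdna(mrna)
    second = second_strand_cdna(first)
    print(first)   # TTAGGCCAT
    print(second)  # ATGGCCTAA
```

Because the second strand is complementary to the first, it reproduces the mRNA sequence with T in place of U. This double-stranded DNA copy of the transcript is what a cDNA clone carries.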