1.1 Introduction

The field of genomics has expanded enormously and witnessed many exciting developments during the past few decades. Some of the most significant advances have resulted in the sequencing of the genomes of many organisms and the production of tools for the functional annotation and analysis of these datasets. As sequencing methodologies have become faster and cheaper, genomic data have amassed at an ever-increasing rate, raising the question of how best to process and interpret such a large body of information. Data visualization helps researchers absorb and understand information at this scale.

Several groups have responded to this need by developing web-based programs that allow the user to view genomic data graphically. The first of these applications was the Saccharomyces Genome Database (SGD),⁴,⁵ which was developed for the Saccharomyces cerevisiae genome assembly. A similar model was adopted by the WormBase project,⁶–⁸ which hosts data for Caenorhabditis elegans and related worm species, and by FlyBase, which hosts data for Drosophila species.⁹,¹⁰ For the human genome, a more integrated approach was required to provide external data resources and links to many other relevant databases, as well as faster ways of viewing annotations of this much larger genome (roughly thirty times larger than that of C. elegans). With the goal of visualizing the human genome, three major genome browsers were developed:

The Genome Browser at the University of California Santa Cruz (UCSC),¹¹–¹⁷
Ensembl¹⁸ at the Wellcome Trust Sanger Institute (WTSI)/European Bioinformatics Institute (EBI), and
MapViewer¹⁹ at the National Center for Biotechnology Information (NCBI).

Additionally, GBrowse (developed as part of the Generic Model Organism Database project)²⁰ is the genome browser used by WormBase and FlyBase.

The UCSC Genome Browser hosts genomes from a variety of organisms. As of September 2009, these included 24 vertebrates, including 14 mammals; 3 deuterostome species; 13 insect species, including 11 species of Drosophila; 6 species of nematode worms; and a yeast. The UCSC Genome Browser is part of a package of tools accessible from the UCSC Genome Bioinformatics website. The integration of these tools provides researchers with powerful methods for large-scale mining of genome-related biological data. The UCSC Genome Browser provides users with: visualization of results from genome-wide screens such as single nucleotide polymorphism (SNP) association studies, linkage studies, and homozygosity studies; definitions of candidate disease loci (i.e., chromosomal positions of genes or DNA regions that are implicated in a medical condition); browsing of annotation features at loci of interest; access to the evolutionary history of bases through multiple alignments of genomes; a tool for viewing polymorphisms; access to a rich variety of information about genes (for some organisms), including links to external databases, expression data, and orthologs; and a gateway to a variety of other annotation features and biological information. Scientists can also view and query their own data in the context of these annotations. Finally, for power users, the UCSC Genome Browser can be installed locally and configured extensively.
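As a rough illustration of what viewing one's own data alongside the browser's annotations can involve, the Python sketch below writes a small annotation set as text in BED format, which the browser accepts for user-supplied custom tracks. The function name, track name, region labels, and coordinates are all hypothetical placeholders chosen for this example and do not refer to real loci.

```python
def make_bed_custom_track(name, description, features):
    """Return custom-track text: one 'track' line followed by BED rows.

    features: iterable of (chrom, start, end, label) tuples using
    0-based, half-open coordinates, as the BED format expects.
    """
    # The track line carries display settings; visibility=2 requests "full" display.
    lines = [f'track name="{name}" description="{description}" visibility=2']
    for chrom, start, end, label in features:
        if not (0 <= start < end):
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        # BED rows are tab-separated: chromosome, start, end, feature name.
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical candidate regions; the coordinates are placeholders only.
candidate_regions = [
    ("chr7", 1000000, 1050000, "regionA"),
    ("chr7", 2000000, 2010000, "regionB"),
]

track_text = make_bed_custom_track("myRegions", "Candidate regions", candidate_regions)
print(track_text)
```

The resulting text could be saved to a file or pasted into the browser's custom track submission form. The sketch uses only the first four BED columns; additional columns such as score and strand are optional.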

The UCSC Genome Browser also includes sequence data from the Mammalian Gene Collection (MGC) project¹–³ at the National Institutes of Health (NIH). MGC provides sequence-validated complementary DNA (cDNA) clones for human, mouse, rat, and cow protein-coding genes for use as expression tools in biological and medical research. The cDNA clones are DNA sequences that are complementary to transcripts produced from genes in the genome; they are synthesized from mature mRNA transcripts using the enzymes reverse transcriptase and DNA polymerase. For any gene of interest, the UCSC Genome Browser can be used to analyze annotated features and determine the availability of an MGC clone. Subsequently, via an external link, the researcher can order the cDNA clone for use in the experimental characterization of the gene.
