User's guide
Nature Genetics 35, 29–32 (2003)
doi:10.1038/ng1192

Question 4
A user wishes to find all the single nucleotide polymorphisms that lie between two sequence-tagged sites. Do any of these single nucleotide polymorphisms fall within the coding region of a gene? Where can any additional information about the function of these genes be found?

The starting point for this search would be the web site for the Database of Single Nucleotide Polymorphisms (dbSNP) at the NCBI (ref. 13), which is located at http://www.ncbi.nlm.nih.gov/SNP. A series of links on the page allows the user to search using either information about the database submission itself or information regarding genes and gene loci.

For this particular search, assume that the region of interest is known and defined by two STS markers, RH79657 and RH45644. Begin by scrolling to the section labeled Between Markers at the bottom of the page. Enter the STS marker names 'RH79657' and 'RH45644' into the two text boxes, and click on Submit STS Markers. This will produce a display showing SNPs 1−25 out of the total of 56 within the region of interest.
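Conceptually, the Between Markers query is an interval selection followed by paging of the results. The following is a minimal Python sketch of that logic, not dbSNP's actual implementation: the helper names `snps_between` and `page`, the rs numbers and all coordinates are hypothetical, and the page size of 25 is inferred from the "1−25 out of 56" display.

```python
from typing import Dict, List, Tuple


def snps_between(snp_positions: Dict[str, int],
                 marker_a: int, marker_b: int) -> List[Tuple[str, int]]:
    """Return (rs_id, position) pairs lying between two marker positions."""
    lo, hi = sorted((marker_a, marker_b))  # markers may be given in either order
    hits = [(rs, pos) for rs, pos in snp_positions.items() if lo <= pos <= hi]
    return sorted(hits, key=lambda item: item[1])  # order along the chromosome


def page(items: list, number: int, size: int = 25) -> list:
    """Return the 1-based page `number` of `items`, `size` entries per page."""
    start = (number - 1) * size
    return items[start:start + size]


# Hypothetical data: 60 SNPs spaced 10 bases apart, starting at position 100.
snps = {f"rs{1000 + i}": 100 + 10 * i for i in range(60)}

# Hypothetical marker coordinates; the real positions of RH79657 and RH45644
# are not given in the text.
region = snps_between(snps, marker_a=655, marker_b=100)
first_page = page(region, 1)
print(f"SNPs 1-{len(first_page)} out of the total of {len(region)}")
```

With these made-up coordinates the region happens to contain 56 SNPs, so the first page shows entries 1 to 25, mirroring the display described above.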

The resulting page (Fig. 4.1) illustrates most of the possible types of result one would find on a typical dbSNP results page. In the table, starting from the left, the first column gives the individual dbSNP cluster IDs (all starting with 'rs'). The second column, labeled Map, shows whether a particular SNP has been mapped to a unique position in the genome (illustrated by a single green arrow, as in the first row of the example) or to multiple positions (not shown here).

Figure 4.1

The next set of columns, labeled Gene, indicates whether these SNPs are associated with particular features, such as genes, mRNAs or coding regions. The three columns (L, T and C) are either lit up or appear gray in every row. Taking each in order:

If the L (for locus) appears in blue, part or all of the marker position lies either within 2 kilobases (kb) of the 5' end of a gene feature or within 500 bases of the 3' end of a gene feature.

If the T (for transcript) appears in green, part or all of the marker position overlaps with a known mRNA. This does not mean, however, that the SNP marker necessarily falls within a coding region.

If the C (for coding) appears in orange, part or all of the marker position overlaps with a coding region.
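The three flags can be read as interval-overlap tests. The sketch below is one possible interpretation in Python, under stated assumptions rather than dbSNP's own code: the `Gene` record, the `ltc_flags` function and all coordinates are hypothetical, the mRNA is modeled as a list of exon intervals, and positions inside the gene itself are assumed to count toward L along with the 2-kb upstream and 500-base downstream flanks.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) with start <= end


@dataclass
class Gene:
    start: int
    end: int
    strand: str            # "+" or "-"
    exons: List[Interval]  # exon intervals of a known mRNA
    cds: List[Interval]    # coding segments only (no UTRs)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def ltc_flags(marker: Interval, gene: Gene,
              upstream: int = 2000, downstream: int = 500) -> Dict[str, bool]:
    """Compute the L, T and C flags for one marker against one gene."""
    if gene.strand == "+":
        # 5' end is at the lower coordinate on the forward strand.
        locus = (gene.start - upstream, gene.end + downstream)
    else:
        # On the reverse strand the 5' end is at the higher coordinate.
        locus = (gene.start - downstream, gene.end + upstream)
    return {
        "L": overlaps(marker, locus),
        "T": any(overlaps(marker, exon) for exon in gene.exons),
        "C": any(overlaps(marker, seg) for seg in gene.cds),
    }


# Hypothetical gene: three exons, with untranslated regions at both ends.
gene = Gene(start=10000, end=20000, strand="+",
            exons=[(10000, 10500), (15000, 15400), (19000, 20000)],
            cds=[(10200, 10500), (15000, 15400), (19000, 19300)])

for pos in (7000, 8500, 10100, 15200):
    print(pos, ltc_flags((pos, pos), gene))
```

The position 10100 illustrates the caveat in the text: it overlaps the mRNA (T) but lies in the untranslated region, so C stays off.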

The next column, labeled Het, indicates the average heterozygosity observed for this marker, on a scale of 0−100%. A reading of zero means that no information is available for that particular marker, whereas the pink bars show a 95% confidence interval for the marker. The Validation column indicates whether the marker has been validated (shown by a star) or is unvalidated (shown by light blue boxes). Validated markers have been verified by independent re-analysis of the sequence. All of the unvalidated markers shown in Fig. 4.1 are denoted by three blue boxes, which, according to the scale at the top of the column, means that there is a >95% success rate in validation. This figure indicates the probability that this marker is real. (The success rate is defined as 1 − false-positive rate.)

For some entries, the penultimate column contains the symbol TT (not shown in this example) indicating that individual genotypes are available for this marker. Finally, the Linkout Avail column indicates which markers are linked to other databases; a P in this column indicates that the variation has been mapped to a known protein structure. For a complete description of all the features within this display, click on any part of the header above the columns.

Returning to the original question, one of the SNPs displayed on this page does indeed fall within a coding region, as indicated by an orange C. To obtain more information on any particular SNP, simply click on the hyperlinked SNP Cluster ID. Clicking on rs1801973, for example, produces a new page with all available information on that SNP (Fig. 4.2). Under the header marked Submitter records for this RefSNP Cluster is a list of the individual SNPs (in this case, only one SNP) that have been clustered together to form this single reference SNP. The sequence of the SNP is shown under the next header. Under the header marked NCBI Resource Links are the GenBank and NCBI RefSeq entries that are associated with this SNP. Further down on the SNP page (Fig. 4.3), the LocusLink Analysis section indicates the gene within whose coding region this SNP falls (ADAM10, a disintegrin and metalloproteinase domain 10). The SNP allele is G/T, a non-synonymous change leading to replacement of the Gly residue in the reference sequence by an unspecified residue. Links are also provided to the NCBI Map Viewer, Ensembl map and UCSC genome assembly in the section labeled Integrated Maps. The sections labeled Variation Summary and Validation Summary (not shown) give the raw data on this particular SNP.
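Whether a substitution is synonymous depends on which codon position it hits. As a rough illustration, the Python sketch below translates a codon before and after a single-base change using the standard genetic code. The codon GGA and the chosen offsets are hypothetical: the text does not give the actual codon affected by rs1801973 or the residue that replaces Gly, and `classify_substitution` is an illustrative helper name.

```python
# Standard genetic code, written compactly: codons are enumerated with each
# base cycling through T, C, A, G (first base slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon: str, offset: int, new_base: str):
    """Return (residue_before, residue_after, kind) for a single-base change.

    `offset` is the 0-based position within the codon; '*' denotes a stop.
    """
    codon = codon.upper()
    mutated = codon[:offset] + new_base.upper() + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "non-synonymous"
    return before, after, kind


# Hypothetical glycine codon GGA with a G-to-T change at the second position.
print(classify_substitution("GGA", 1, "T"))
# The same codon with T at the third position still encodes glycine.
print(classify_substitution("GGA", 2, "T"))
```

All four GGN codons encode glycine, so under this model a change at the third position of a Gly codon is always synonymous, whereas a G-to-T change at the first or second position alters the residue or introduces a stop.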

Figure 4.2

Figure 4.3

Answering the final part of this question requires jumping from dbSNP to LocusLink (ref. 10). To do so, click on the ADAM10 link in the line marked LocusLink at the top of the page (Fig. 4.3). This brings the user to the LocusLink page for ADAM10, which provides numerous jumping-off points to the NCBI and affiliated resources through the boxed links at the top of the page. More information on these resources can be found by following the LocusLink FAQ link in the left-hand column of the page. By simply examining the LocusLink page itself, one sees that the ADAM10 protein belongs to a family of cell surface proteins that have potential adhesion and protease domains, and that this particular member of the family proteolytically processes pro-TNF alpha.

One often-overlooked source of information on genes and gene products is OMIM (ref. 14). This is an electronic version of the catalog of human genes and genetic disorders developed by Victor McKusick at The Johns Hopkins University. OMIM provides the user with concise textual information from the published literature on most human disorders with a genetic basis, and links back to the primary literature as appropriate. An OMIM entry includes the gene symbol, alternate names for the disease, a description of the disease (including clinical, biochemical and cytogenetic features), details of the mode of inheritance (including mapping information) and a clinical synopsis. These entries are manually curated, ensuring that the 'executive summary' is up to date and accurate. Although OMIM can be searched directly, many LocusLink entries also link to the OMIM record for the gene. The OMIM entry page for the ADAM10 protein is shown in Fig. 4.4. The page is fully hyperlinked to PubMed, GenBank and other related databases.

Figure 4.4

REFERENCES
  1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
  2. Collins, F.S. & McKusick, V.A. Implications of the Human Genome Project for medical science. J. Am. Med. Assoc. 285, 540–544 (2001).
  3. Watson, J.D. & Crick, F.H.C. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171, 737–738 (1953).
  4. Green, E.D. Strategies for the systematic sequencing of complex genomes. Nature Rev. Genet. 2, 573–583 (2001).
  5. Ouellette, B.F.F. & Boguski, M.S. Database divisions and homology search files: a guide for the perplexed. Genome Res. 7, 952–955 (1997).
  6. Bairoch, A. & Apweiler, R. The SWISS-PROT Protein Sequence Database and its supplement TREMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
  7. Hubbard, T. et al. The Ensembl Genome Database Project. Nucleic Acids Res. 30, 38–41 (2002).
  8. Kent, W.J. BLAT—the BLAST-like Alignment Tool. Genome Res. 12, 656–664 (2002).
  9. Stein, L. Genome annotation: from sequence to biology. Nature Rev. Genet. 2, 493–503 (2001).
  10. Pruitt, K.D. & Maglott, D.R. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137–140 (2001).
  11. Burge, C.B. & Karlin, S. Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8, 346–354 (1998).
  12. Schuler, G.D. Electronic PCR: bridging the gap between genome mapping and genome sequencing. Trends Biotechnol. 16, 456–459 (1998).
  13. Sherry, S.T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
  14. Hamosh, A. et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30, 52–55 (2002).
  15. Baxevanis, A.D. & Ouellette, B.F.F. (eds.) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (John Wiley & Sons, New York, 2001).
  16. Solovyev, V.V., Salamov, A.A. & Lawrence, C.B. Identification of human gene structure using linear discriminant functions and dynamic programming. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 367–375 (1995).
  17. Yeh, R.F., Lim, L.P. & Burge, C.B. Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803–816 (2001).
  18. Marchler-Bauer, A. et al. CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 30, 281–283 (2002).
  19. Apweiler, R. et al. InterPro—an integrated documentation resource for protein families, domains and functional sites. Bioinformatics 16, 1145–1150 (2000).
  20. Rebhan, M., Chalifa-Caspi, V., Prilusky, J. & Lancet, D. GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 14, 656–664 (1998).
  21. Blake, J.A., Richardson, J.E., Bult, C.J., Kadin, J.A. & Eppig, J.T. The Mouse Genome Database (MGD): the model organism database for the laboratory mouse. Nucleic Acids Res. 30, 113–115 (2002).
  22. Hudson, T.J. et al. A radiation hybrid map of mouse genes. Nature Genet. 29, 201–205 (2001).
  23. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 30, 276–280 (2002).
  24. Letunic, I. et al. Recent improvements to the SMART domain–based sequence annotation resource. Nucleic Acids Res. 30, 242–244 (2002).
  25. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
  26. Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge Univ. Press, Cambridge, 1998).
  27. Peri, S., Ibarrola, N., Blagoev, B., Mann, M. & Pandey, A. Common pitfalls in bioinformatics-based analyses: look before you leap. Trends Genet. 17, 541–545 (2001) [erratum Trends Genet. 18, 218 (2002)].
  28. Ponting, C. Issues in predicting protein function from sequence. Brief. Bioinform. 2, 19–29 (2001).
  29. Aparicio, S.A.J.R. How to count ... human genes. Nature Genet. 25, 129–130 (2000).
  30. Beadle, G.W. & Tatum, E.L. Genetic control of biochemical reactions in Neurospora. Proc. Natl Acad. Sci. USA 27, 499–506 (1941).
  31. Jeffery, C.J., Bahnson, B.J., Chien, W., Ringe, D. & Petsko, G.A. Crystal structure of rabbit phosphoglucose isomerase, a glycolytic enzyme that moonlights as neuroleukin, autocrine motility factor, and differentiation mediator. Biochemistry 39, 955–964 (2000).
  32. Wistow, G. & Piatigorsky, J. Recruitment of enzymes as lens structural proteins. Science 236, 1554–1556 (1987).
  33. Jeffery, C.J. Moonlighting proteins. Trends Biochem. Sci. 24, 8–11 (1999).
  34. Chothia, C. Proteins. One thousand families for the molecular biologist. Nature 357, 543–544 (1992).
  35. Hegyi, H. & Gerstein, M. The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. 288, 147–164 (1999).
  36. Jansen, R. & Gerstein, M. Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res. 28, 1481–1488 (2000).
  37. Brenner, S.E. Errors in genome annotation. Trends Genet. 15, 132–133 (1999).
  38. Smith, R.F. Perspectives: sequence data base searching in the era of large-scale genomic sequencing. Genome Res. 6, 653–660 (1996).
 
Nature Genetics
ISSN: 1061-4036
EISSN: 1546-1718
© 2003 Nature Publishing Group