Nucl. Acids Res., published online 26 October 2012; doi:10.1093/nar/gks993

Bacterial sequencing has led to a wealth of genetic information regarding natural product biosynthesis pathways, but accessing that data in a robust and straightforward way has been more challenging. Several databases focusing on aspects of polyketide synthase or nonribosomal peptide synthetase function have been developed, but Conway and Boddy hoped to create a more comprehensive and current platform by enabling the biosynthesis community to contribute directly. The result is ClusterMine360, a new database that integrates existing genetic, chemical and bioinformatic tools to speed cluster deposition and analysis while ensuring standardization and limiting the potential for user error. In particular, the database draws information from NCBI nucleotide records to assign species, linked papers and annotations. Chemical structures are queried against ChemSpider and PubChem to collect synonyms, avoiding duplicate entries, and are compared across database entries to create linked compound families. AntiSMASH provides a bioinformatic analysis of each cluster, with more than 10,000 individual domains currently documented in the repository, enabling fast retrieval of relevant sequences for homology comparisons and bioprospecting. For example, the authors were able to quickly extract and analyze 106 heterocyclization domains, showing phylogenetic clustering of the enzymes according to the identity of the amino acid involved in cyclization. This cluster of technologies should lead to new opportunities in biosynthetic research.