Systematic study of natural product biosynthetic gene clusters (NPGCs) encoded by actinobacteria has been hampered by the complexity and repetitive nature of these clusters. Here, the authors assemble a data set of NPGCs identified in 830 actinomycete genomes and apply similarity metrics to group these clusters into gene cluster families (GCFs). By correlating mass spectrometric detection of known small molecules with the presence or absence of their established biosynthetic gene clusters, they verify the GCF designations and demonstrate their use in the de novo correlation of natural products with biosynthetic genes.
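
The presence/absence correlation described above can be illustrated with a minimal sketch. It is not the authors' scoring scheme: the agreement fraction used here is a simple stand-in, and the strain, GCF, and ion names (`s1`, `GCF_A`, `ion_X`) as well as the functions `cooccurrence_score` and `rank_pairs` are hypothetical, introduced only for illustration.

```python
from itertools import product


def cooccurrence_score(gcf_presence, ion_detected):
    """Fraction of shared strains in which GCF presence and ion detection agree.

    Both arguments map strain name -> bool. This agreement fraction is an
    assumed, simplified score, not the metric used in the study.
    """
    strains = gcf_presence.keys() & ion_detected.keys()
    if not strains:
        raise ValueError("no strains shared between the two tables")
    agree = sum(gcf_presence[s] == ion_detected[s] for s in strains)
    return agree / len(strains)


def rank_pairs(gcf_table, ion_table):
    """Score every (GCF, ion) pair and sort from best to worst agreement."""
    scores = [
        (cooccurrence_score(gcf_table[g], ion_table[i]), g, i)
        for g, i in product(gcf_table, ion_table)
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical toy data: which strains encode each GCF,
# and in which strains each ion was detected by mass spectrometry.
gcf_table = {
    "GCF_A": {"s1": True, "s2": True, "s3": False, "s4": False},
    "GCF_B": {"s1": False, "s2": True, "s3": True, "s4": False},
}
ion_table = {
    "ion_X": {"s1": True, "s2": True, "s3": False, "s4": False},
}

for score, gcf, ion in rank_pairs(gcf_table, ion_table):
    print(f"{gcf} ~ {ion}: {score:.2f}")
```

In this toy example `GCF_A` matches `ion_X` in all four strains, while `GCF_B` agrees in only two, so `GCF_A` ranks first. A real analysis would need a score that accounts for silent clusters and detection limits, since a cluster can be present without its product being observed.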