Weirauch, M.T. et al. Cell 158, 1431–1443 (2014).

Of the approximately 170,000 eukaryotic transcription factors (TFs) believed to exist, only about 1% have characterized binding sequences. Weirauch et al. address this knowledge gap by using two protein-binding microarrays to determine the binding motifs of 1,032 cloned TFs, which come from more than 130 species and represent 54 of the 80 known DNA binding-domain classes. On the basis of sequence similarity, the researchers infer motifs for another 58,000 TFs, with 89% accuracy according to cross-validation. The motifs are enriched in promoters, in fragments recovered by chromatin immunoprecipitation and sequencing, and in Arabidopsis thaliana expression quantitative trait loci. Weirauch et al. also develop an algorithm that generates ranked lists of human TFs whose binding can be altered by disease-associated genetic variants. The binding motif information is available in the catalog of inferred sequence preferences of DNA-binding proteins (CIS-BP) (http://cisbp.ccbr.utoronto.ca/).
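
To make the idea of ranking TFs by the effect of a variant concrete, the following is a minimal sketch and not the algorithm of Weirauch et al. It assumes each motif is a simple position weight matrix, scores the reference and variant sequences by their best log-odds match against a uniform background, and ranks TFs by the absolute change in that score. The motif values, the TF names (TF_A, TF_B), the sequences and the function names are all hypothetical and are not taken from CIS-BP.

```python
import math

# Hypothetical toy motifs (one dict of base probabilities per position).
# These values are illustrative only and do not come from CIS-BP.
MOTIFS = {
    "TF_A": [
        {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
        {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
        {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    ],
    "TF_B": [
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        {"A": 0.40, "C": 0.20, "G": 0.20, "T": 0.20},
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    ],
}
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds(motif, site):
    """Log2-odds score of one site that has the same width as the motif."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(motif, site))


def best_score(motif, seq):
    """Best log-odds score over all windows of the sequence (forward strand only)."""
    width = len(motif)
    return max(log_odds(motif, seq[i:i + width]) for i in range(len(seq) - width + 1))


def rank_tfs(motifs, ref_seq, alt_seq):
    """Rank TFs by the absolute change in best score between reference and variant."""
    deltas = {
        name: best_score(motif, alt_seq) - best_score(motif, ref_seq)
        for name, motif in motifs.items()
    }
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical variant: C -> T at the fourth base of the reference sequence.
ranking = rank_tfs(MOTIFS, "TTACGTT", "TTATGTT")
for name, delta in ranking:
    print(f"{name}\t{delta:+.2f}")
```

In this toy case the variant disrupts the best TF_A site and leaves the best TF_B site unchanged, so TF_A ranks first. A realistic analysis would at least scan both strands, use empirically derived background frequencies and assess significance, none of which is attempted here.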