The mechanisms by which genetic variation affects transcription regulation and phenotypes at the nucleotide level are incompletely understood. Here we use natural genetic variation as an in vivo mutagenesis screen to assess the genome-wide effects of sequence variation on lineage-determining and signal-specific transcription factor binding, epigenomics and transcriptional outcomes in primary macrophages from different mouse strains. We find substantial genetic evidence to support the concept that lineage-determining transcription factors define epigenetic and transcriptomic states by selecting enhancer-like regions in the genome in a collaborative fashion and facilitating binding of signal-dependent factors. This hierarchical model of transcription factor function suggests that limited sets of genomic data for lineage-determining transcription factors and informative histone modifications can be used for the prioritization of disease-associated regulatory variants.
Gene Expression Omnibus
Data are available in the Gene Expression Omnibus (GEO) under accession GSE46494.
We thank A. J. Lusis for providing access to eQTL data (http://systems.genetics.ucla.edu/) and for productive conversations. We thank D. Pollard for discussions and suggestions, and L. Bautista for assistance with figure preparation. These studies were supported by National Institutes of Health (NIH) grants DK091183, CA17390 and DK063491 (C.K.G.). M.U.K. was supported by the Foundation Leducq Career Development award and grants from the Academy of Finland, the Finnish Foundation for Cardiovascular Research and the Finnish Cultural Foundation, North Savo Regional Fund. C.E.R. was supported by the American Heart Association Western States Affiliates (12POST11760017) and the NIH (5T32DK007494).
Supplementary Table 1 - HOMER-formatted motif files for the motifs used for strain-specific motif finding listed in Extended Data Figure 3a,b
The header rows, which begin with ">", list the consensus motif, the motif name, and the log-odds threshold above which a given sequence is considered to be positive for the motif. Below each header is the position weight matrix that lists the frequency of each nucleotide (A, C, G, T in the columns from left to right, respectively) at each position (rows) of the motif from top to bottom.
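The layout described above can be read with a few lines of code. The following is a minimal sketch, not the authors' tooling: it assumes the header fields are tab-separated and that only the first three (consensus, name, log-odds threshold) are needed. The names `Motif`, `parse_homer_motifs`, `most_likely_sequence` and the `EXAMPLE` motif are hypothetical and for illustration only; the example values are not taken from Supplementary Table 1.

```python
from dataclasses import dataclass, field
from typing import List

BASES = "ACGT"  # column order stated in the table legend


@dataclass
class Motif:
    consensus: str
    name: str
    threshold: float  # log-odds threshold for calling a sequence positive
    matrix: List[List[float]] = field(default_factory=list)  # rows = positions


def parse_homer_motifs(text: str) -> List[Motif]:
    """Parse motif text: '>' header rows followed by one matrix row per position."""
    motifs: List[Motif] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: tab-separated header; extra columns are ignored.
            fields = line[1:].split("\t")
            motifs.append(Motif(fields[0], fields[1], float(fields[2])))
        elif motifs:
            # Matrix row: frequencies of A, C, G, T at this position.
            motifs[-1].matrix.append([float(x) for x in line.split()])
    return motifs


def most_likely_sequence(motif: Motif) -> str:
    """Return the highest-frequency nucleotide at each position."""
    return "".join(BASES[row.index(max(row))] for row in motif.matrix)


# Hypothetical four-position motif, invented for this example.
EXAMPLE = (
    ">GGAA\texample_motif\t5.0\n"
    "0.1\t0.1\t0.7\t0.1\n"
    "0.1\t0.1\t0.7\t0.1\n"
    "0.7\t0.1\t0.1\t0.1\n"
    "0.7\t0.1\t0.1\t0.1\n"
)

if __name__ == "__main__":
    for m in parse_homer_motifs(EXAMPLE):
        print(m.name, m.threshold, most_likely_sequence(m))
```

Keeping the matrix as one row per position mirrors the file layout directly, so a row index corresponds to a motif position and a column index to a nucleotide.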
Loci are shown in rows. The number of variants between C57BL/6J and BALB/cJ at each region is shown in column 4. The number of variants with alleles matching the binding pattern observed across NOD, C57BL/6J, and BALB/cJ is shown in column 5.
The genomic location, variant information, strain-specific motif information, and primer sequences used to clone strain-similar loci are shown in columns for the 9 loci tested (data in Extended Data Figure 10a).