
A Conserved Structural Signature of the Homeobox Coding DNA in HOX genes

  • Scientific Reports 6, Article number: 35415 (2016)
  • doi:10.1038/srep35415

Abstract


The homeobox encodes a DNA-binding domain found in transcription factors regulating key developmental processes. The most notable examples of homeobox-containing genes are the Hox genes, arranged on chromosomes in the same order as their expression domains along the body axis. The mechanisms responsible for the synchronous regulation of Hox genes and the molecular function of their colinearity remain unknown. Here we report the discovery of a conserved structural signature of the 180-base-pair DNA fragment comprising the homeobox. We demonstrate that the homeobox DNA has a characteristic 3-base-pair periodicity in the hydroxyl radical cleavage pattern. This periodic pattern is significant in most of the 39 mammalian Hox genes and in other homeobox-containing transcription factors. The signature is present in segmented bilaterian animals as evolutionarily distant as humans and flies. It remains conserved despite the fact that it would be disrupted by synonymous mutations, which raises the possibility of evolutionary selective pressure acting on the structure of the coding DNA. The homeobox coding DNA may therefore have a secondary function, possibly as a regulatory element. The existence of such an element may have important consequences for understanding how these genes are regulated.

Introduction


Hox genes encode a group of transcription factors responsible for developmental processes and establishment of the body plan1,2,3,4. All Hox genes and many other developmental transcription factors contain the homeobox, a DNA sequence encoding the functional DNA-binding domain. Hox genes are known for their colinearity: their conserved arrangement on chromosomes is the same as their order of activation along the body axis. The regulation is very precise: for example, the regions of activity of Hox genes are tightly confined to specific rhombomeres5,6,7,8,9 or to segments of the vertebrate anteroposterior body axis10. The vertebrate Hox genes are also synchronized: the expression domains of paralogs from the A, B, C and D clusters are virtually identical11,12,13.

Despite 35 years of active research, the mechanisms of Hox gene regulation have remained elusive. Hox genes tend to be inhibited by more posterior ones, but this process appears not to be universal outside of vertebrates and is likely secondary to the yet unknown original mechanism of regulation14,15. It has been argued that chromatin structure16,17 and histone demethylation18,19,20,21 play important roles in activation of Hox genes, but the mechanism that precisely directs chromatin modifications to specific loci at the right time remains unknown. Ultraconserved regions and regulatory elements have been found within the coding sequences of Hox genes22,23, but the key questions remain unanswered. It is unknown what mechanism could be responsible for the exceptional synchronous colinearity of Hox gene clusters and the conserved synteny of other pairs or groups of homeobox-containing genes; however, the topology of chromatin has been proposed to play a role in regulation of these genes24. Chromatin topology may depend on CTCF binding sites or long non-coding RNAs; however, neither has been confirmed to play a primary role in regulation of chromatin in the Hox clusters25,26. It is therefore possible that discovering a new DNA element will lead towards deciphering the regulation of genes within the Hox clusters.

Here, we report the discovery of a conserved feature of the DNA coding for Hox genes and certain other developmental transcription factors. The function of the feature and the associated mechanisms remain unknown, and no conclusive statements concerning their specific role can be made without targeted genetic studies. Nevertheless, statistical arguments point to the significance of the motif and to its possible association with developmental processes and with regulation of chromatin structure.

Given the coincidence between the presence of the homeobox domain and the unique evolutionary and regulatory properties of a gene, it is reasonable27,28 to hypothesize that the homeobox may be directly involved in regulation of Hox genes, or more specifically that the homeobox DNA sequence itself plays a role in regulating the gene it is contained in. While the DNA sequence of the homeobox is not ultraconserved22,29,30, a direct link may exist between the coding sequence of the homeobox and certain structural properties of DNA in this region. The local structure of chromatin is known to depend on the GC content, which can also mark transcriptionally active regions31, and on other quantitative characteristics, e.g. the Hydroxyl Radical Cleavage (HRC). In this work, we show that the HRC pattern in the homeobox coding DNA displays a significant structural property that is conserved not only between Hox genes within a species but also between distant species. The mechanistic role of the structural feature remains unknown; however, one possible explanation of its conservation is a putative function in regulation of the Hox genes. The possibility of regulatory elements embedded in coding sequences has been explored, among others, by ref. 32, who identified thousands of possible examples of such sites and argued that dual encoding of amino acid and regulatory information may be a fundamental feature of genome evolution. A possible role in transcriptional regulation may be either direct, as a transcription factor binding site, or indirect, as a locus important for directing epigenetic modifications or affecting chromosomal conformation. Other explanations of the phenomenon may include a role in subsequent transcriptional or post-transcriptional processes.

Results


The GC content of the homeobox in Hox genes

We calculated the average GC content within mouse and human homeobox-containing genes, as well as eight Hox genes of the fruit fly D. melanogaster (Table 1). In each of the three species, the homeobox-coding DNA is flanked by regions of significantly higher GC content. This observation raises the possibility that the properties of DNA or chromatin within the homeobox coding sequence may serve a conserved biological function. While the hypothetical function is unknown, it is possible that it will be related to recognition of the Hox genes by a regulatory process.

Table 1: The average GC content of the coding sequences in the different regions of Hox genes.

The Hydroxyl Radical Cleavage motif (HRC3)

To investigate the structure of the homeobox DNA in more detail, we analysed the predicted HRC pattern of genes containing these sequences. The HRC is an important parameter correlated with the local structure of chromatin, corresponding to the width of the minor groove of DNA33. HRC provides information on the local shape and structure of the DNA helix and has been shown to correlate with functional non-coding regions in the genome34. The HRC can be reliably estimated from the sequence of DNA35.

The HRC pattern of the homeobox region of the mouse gene HoxB4 is presented as an example in the top panel of Fig. 1. A striking feature of the homeobox is the 3-base pair periodicity (the “HRC3” signature), which is absent outside of the homeobox. For comparison, the predicted HRC pattern of the coding sequence adjacent to the homeobox toward the 5′ end of the gene is shown in panel B of Fig. 1. We quantified the significance of the periodic pattern by computing the periodogram36,37,38 of the data, defined as an estimate of the amplitude of the harmonic oscillation best fitting the signal within the interval [a,b], as a function of the period T:

Figure 1: The periodicity of hydroxyl cleavage pattern within the homeobox.

(A) The pattern of HRC in the mouse HoxB4 homeobox coding sequence (red) shows a period of three base pairs (dotted line represents a harmonic oscillation with a 3 bp period; note that the two plots are consistently in phase with one another). (B) The periodic signature is absent in other regions of the gene. (C) The periodograms of the homeobox HRC (red), the HRC of a coding region adjacent to the homeobox (green) and the HRC of a simulated DNA sequence coding for the same protein sequence of the homeodomain but using different codons (blue). The highly significant peak is present only in the actual homeobox. (D) The HRC3 patterns in homeobox DNA are more prevalent than in other coding sequences. The median periodogram of HRC of the homeoboxes of mouse Hox genes (red), all homeobox genes (green), outside of homeobox in homeotic genes (blue: 180 bp adjacent towards 5′ end, teal: between homeobox and the 3′ end), and randomly chosen coding sequences (dark blue). (E) Histograms of periodicity score at T = 3 base pairs.

The significance p of PHRC at a given period T reads $p = (1 - P/n)^{n-1}$ (refs 36, 39; if sample variance is used as σ in the calculation) and can be approximated as $p = \exp(-P)$ (refs 40, 41). For the homeobox of HoxB4, the periodogram shows a very significant (p < 1e-5) peak corresponding to a period of three base pairs in the homeobox (red graph in Fig. 1C). The periodogram of the HRC in the 180-bp region adjacent to the homeobox in the coding sequence (green curve in Fig. 1C) does not have a significant peak at 3 bp. The periodicity analysis of all mouse Hox genes shows that the same HRC3 feature is significant in 37 out of 39 homeobox-coding sequences of mouse Hox genes, the two exceptions being HoxA13 (p = 0.219) and HoxD8 (p = 0.104); see Table 2. For 32 mouse Hox genes the significance is below 0.01, and for 21 genes p < 0.001. Examples of Hox genes in which the periodicity is significant but less prominent are HoxC13, HoxC8 and HoxD13, close homologs of HoxA13 and HoxD8 (see also Supplementary Figure File SF1). Note that ortholog groups 8 and 13 are arguably the fastest evolving Hox genes, and are expected to have diverged the most from the postulated original Urhox gene14. In general, the HRC3 signature is significantly stronger in anterior Hox genes (Hox 1–7) compared to posterior ones (p = 0.0039 in mouse; p = 0.0048 in human; t-test).
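The periodicity score and its approximate significance can be illustrated with a minimal Python sketch. It assumes a standard variance-normalized periodogram (squared modulus of the Fourier sum of the mean-subtracted signal, divided by the number of points and the sample variance); this normalization and the function names `periodogram_power` and `approx_significance` are assumptions made for illustration, not the authors' Perl implementation. Only the approximation p = exp(−P) is taken from the text.

```python
import cmath
import math


def periodogram_power(signal, period):
    """Variance-normalized periodogram power of `signal` at `period` (in bp).

    Assumed normalization: |sum (h - mean) * exp(-2*pi*i*x/T)|^2 / (N * var),
    where var is the sample variance of the signal.
    """
    n = len(signal)
    mean = sum(signal) / n
    var = sum((v - mean) ** 2 for v in signal) / (n - 1)
    if var == 0:
        return 0.0  # a constant signal has no periodic component
    fourier_sum = sum(
        (v - mean) * cmath.exp(-2j * math.pi * x / period)
        for x, v in enumerate(signal)
    )
    return abs(fourier_sum) ** 2 / (n * var)


def approx_significance(power):
    """Approximate p-value for a single period, p = exp(-P), as quoted in the text."""
    return math.exp(-power)


if __name__ == "__main__":
    # Toy 180-point signal with an exact 3-position period (not real HRC data).
    toy = [math.cos(2 * math.pi * x / 3) for x in range(180)]
    for period in (2, 3, 5, 7):
        p_val = approx_significance(periodogram_power(toy, period))
        print(period, round(periodogram_power(toy, period), 3), p_val)
```

With this normalization a pure noise signal gives powers of order one, so a power of 6 corresponds to p of roughly 0.0025 under the exponential approximation.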

Table 2: The HRC3 signature in mouse and human Hox genes.

The results point to selective pressure favouring the three-base-pair periodicity of HRC within the homeobox. There are two possible explanations for the observed phenomenon: it can be a consequence either of the conserved amino acid sequence or of the selection of codons within the homeobox. In the former case, the HRC3 pattern would be associated with any DNA sequence coding for the protein sequence of the homeodomain. Conversely, if the pressure is indeed on the properties of the chromatin, the pattern would be weakened or disrupted by synonymous mutations. To verify that codon selection does significantly contribute to the HRC pattern in homeoboxes, we simulated 1,000 DNA sequences with synonymous mutations for each amino acid, and we predicted the HRC pattern for the simulated sequences. The simulated DNA sequences were generated so that they would code for the same protein sequence and use codon frequencies that are typical for all mouse CDS (coding DNA sequences); such a comparison may show that evolutionary pressure on codon selection exists within the homeoboxes. We observed that in the simulated DNA sequences coding for the same homeodomain amino acid sequence, the periodicity of the HRC3 pattern is greatly disrupted. The result confirms that a strong bias generally exists towards using codons that maximize the HRC3 signal in the homeoboxes of mouse Hox genes. Specifically, for 36 out of 39 Hox genes, the fraction of random sequences that would have produced a stronger HRC3 signal than the one actually observed in the homeobox of the gene is much lower than the expected value of 0.5. The average fraction is 0.134, and the median is 0.030. For genes with a strong HRC3 signal (PHRC3 > 6), the average and median are 0.031 and 0.008, respectively. This result suggests that the codon selection effect is especially strong in the genes with a highly significant HRC3 feature, although causation cannot be determined based on the presented data.
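The synonymous-codon simulation described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions: the standard genetic code is used, the `codon_weights` argument stands in for the genome-wide mouse codon frequencies (uniform weights are used when it is omitted), and all function names are hypothetical. Scoring each simulated sequence requires an HRC predictor that is not reproduced here.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame DNA string into a protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def simulate_synonymous(protein, rng, codon_weights=None):
    """Draw one DNA sequence encoding `protein`, sampling each codon at random.

    `codon_weights` (codon -> relative frequency) is an assumed input;
    synonymous codons are equally likely when it is None.
    """
    codons = []
    for aa in protein:
        options = SYNONYMS[aa]
        weights = [codon_weights[c] for c in options] if codon_weights else None
        codons.append(rng.choices(options, weights=weights)[0])
    return "".join(codons)


def fraction_stronger(observed_score, simulated_scores):
    """Fraction of simulated sequences scoring strictly above the observed one."""
    return sum(s > observed_score for s in simulated_scores) / len(simulated_scores)


if __name__ == "__main__":
    rng = random.Random(0)
    consensus = "RRRKRTAYTRYQLLELEKEFLFNRYLTRRRRIELAHSLNLTERHIKIWFQNRMKWKEN"
    sims = [simulate_synonymous(consensus, rng) for _ in range(1000)]
    print(all(translate(s) == consensus for s in sims))
```

A fraction close to 0.5 would indicate no codon-level bias towards the signal, while values near zero correspond to the bias reported for most Hox genes.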

The simulation results are presented in Supplementary Table S1 and in Supplementary Figure S1, which also shows the relation between the HRC3 amplitude and the codon selection effect. Supplementary Table S2 shows details of codon usage within the homeoboxes of Hox/Antp-Ubx clusters in mouse, human and fly. Codon bias specific to homeoboxes exists in all three species. Moreover, the homeobox to whole-exome codon usage ratios are virtually identical between mouse and human (Pearson Correlation Coefficient, PCC = 0.97) and also similar in Drosophila (PCC = 0.38, p = 0.018), see Supplementary Figure S2; in most cases GC-rich codons are favoured in the homeoboxes.

The HRC3 motif in mouse homeobox-containing genes outside of the Hox clusters

Analysis of an additional 119 mouse homeobox-containing genes outside of the Hox clusters reveals the presence of the same periodic signature in a large fraction of these genes (see Fig. 1D,E, Supplementary Figure File SF2, and Supplementary Table S3). Homeobox genes in animals are categorized into several classes based on evolutionary relationships and additional domains42. Here, we compared the average amplitudes of the HRC3 pattern for homeoboxes of genes from the ANTP, LIM, POU, ZF, PRD and TALE categories. Intriguingly, while every class contains genes with a significant HRC3 signature in the homeobox, the pattern is most prevalent in the ANTP class, which includes the NK, Hox, and ParaHox groups of genes. The average PHRC3 for mouse ANTP genes is 6.62, while for the other classes it is much lower (2.80, 3.17, 2.65, 2.88 and 3.43 for the LIM, POU, ZF, PRD and TALE categories, respectively). The systematic difference between genes in different classes suggests that the HRC3 signature may be functional in homeobox-containing genes that are at least partially organized in clusters with conserved synteny or in conserved pairs (such as NKX2.1 and NKX2.8, or NKX2.2 and NKX2.4).

If the HRC3 feature in homeobox genes is indeed related to the clustering of those genes on chromosomes, one may expect a general correlation between the HRC3 signature of a gene and the presence of other homeobox genes in its chromosomal neighborhood. The relation between the distance to the nearest homeobox-containing gene and the amplitude of the HRC3 signal is shown as a scatterplot in Supplementary Figure S3. Indeed, the mouse genes that have other homeobox genes in their vicinity (within 300 kbp) tend to display stronger HRC3 signatures. The dependence is statistically significant, with a p-value of 2.16e-7 (Wilcoxon rank sum test), or 5.14e-8 (t-test). Intriguingly, the difference between isolated and clustered homeobox-containing genes remains significant even after removing all Hox genes from the analysis (p = 0.017, Wilcoxon; p = 0.012, t-test).

While the HRC3 pattern is present in most homeoboxes, it is not common outside of the homeobox coding DNA. The difference between homeoboxes and other coding sequences is evidenced by the periodograms of the HRC patterns calculated for 22,882 randomly chosen 180-bp coding sequences from the mouse genome. The results are summarized in Fig. 1D,E, which show, respectively, the median periodograms and the histograms of the most significant periods of HRC in: the homeobox of mouse Hox genes, the homeobox of all homeobox-containing genes, the 5′ and 3′ adjacent regions, and randomly selected coding sequences. Note that the 3 bp period is exceptional: no other significant periodicities (in the range between 2 bp and 7 bp) are common either to the Hox genes or to other coding sequences. The HRC3 signal in randomly chosen coding sequences is very significantly smaller than in all homeoboxes, and than in the ANTP class genes (p-value < $10^{-11}$ in both cases). On the other hand, the histogram of PHRC3 in the coding regions of homeobox-containing genes outside of the homeobox is not remarkably different from that of randomly chosen coding sequences.

The HRC3 patterns of human homeoboxes are virtually identical to the patterns observed in mouse, and so are the periodograms at 3 bp (see Table 2, Table S4, and Supplementary Figure Files SF1 and SF2). To test whether the periodicity in the pattern is conserved even beyond vertebrate animals, we analyzed the HRC in the homeoboxes of the long-germ insect D. melanogaster. The 3 bp period is highly significant in seven out of the eight ANTP/Hox genes (Fig. 2, Supplementary Table S4). It is weaker in the Proboscipedia gene, whose HRC pattern is similar to the homologous mouse genes HoxA2 and HoxB2 (compare Fig. 2 and Supplementary Figure SF1), which may suggest the presence of a functional variant of the structure of the homeobox DNA.

Figure 2: Periodograms of HRC in homeoboxes of fly Hox genes.

The same pattern as in mammalian genes is present, revealing evolutionary conservation of the HRC3 periodic structural signature.

Evolutionary conservation of the HRC3 signature

The results presented for human, mouse and Drosophila suggest that the pattern may also be conserved in the Hox genes of other animal species. To test this hypothesis, we analysed the HRC3 pattern of homeoboxes in Hox gene homologs, as defined by GenBank, of several additional metazoan species. In each organism, we identified the Hox genes and in each gene we computed the amplitude and significance of the HRC3 signature of the 180-bp sequence aligned with the homeobox. The results are presented in Table 3 and Supplementary Table S4. The data suggest that the pattern is generally conserved in vertebrate and invertebrate species with true segmentation of their bodies, while it may be less significant in non-segmented animals, such as molluscs or tunicates.

Table 3: The HRC3 signature in Hox genes of metazoan species.

Functions of other genes with the HRC3 signature

If the HRC3 signature is indeed recognized by an unknown molecular mechanism related to the function of Hox genes and certain other homeobox-containing genes, it is possible that such a mechanism may also be employed by other genes and processes. To address this question, we searched for the HRC3 signature in the sequences of all mouse coding genes. Since this search was not restricted specifically to the 180-bp homeobox sequence but rather included the entire transcribed gene, this analysis required a stricter threshold on the significance of detection to avoid a large number of false positives. We performed functional annotation enrichment analysis for mouse genes with a threshold of PHRC3 = 6 (p < 0.00248) and PHRC3 = 10 (p < 4.54e-5), using the DAVID (Database for Annotation, Visualization and Integrated Discovery) web server. We uploaded the lists of genes containing the HRC3 signature at PHRC3 ≥ 6 (8000 genes) and PHRC3 ≥ 10 (5692 genes) to the server and set the background as the total number of genes present in the mouse genome (version GRCm38.p1) annotation. Moreover, only GO terms containing at least 200 of the input genes and a q-value (Benjamini corrected p-value) < 0.05 were selected. The Biological Processes most significantly enriched among those genes are summarized in Fig. 3; the complete results are presented in Supplementary Table S5 and in Supplementary Figure S4 depicting the top enriched Molecular Functions and Cellular Components. The analysis shows a highly significant enrichment of processes related to development (GO:0009888, GO:0048731, GO:0032502) as well as regulation of gene expression and metabolic processes (GO:0010628, GO:0031325, GO:0044260). One of the molecular functions significantly enriched in genes containing the HRC3 signature is DNA binding (GO:0003677). This observation led us to test whether the HRC3 motif may significantly overlap with sequences coding for binding domains other than the homeobox.
To this end, we searched for HRC3 signatures overlapping with other DNA binding domains, as defined by InterPro, the database of protein families, domains and functional sites43. The overlap enrichments (relative to sequences coding for binding domains positioned randomly in the exome) are presented in Supplementary Table S6: while the interpretation is not straightforward due to different sizes of these domains and different levels of homology, the results may suggest that some domains (Forkhead, bHLH) also tend to overlap with the HRC3 signatures, while others (Ets, Pou) do not have a significant overlap (simulated overlaps equalled or exceeded the actual values). It is therefore possible that a yet unknown function of the HRC3 element exists that is not limited to the homeobox-containing genes but is also affecting certain other classes of transcription factors.

Figure 3: The GO biological processes significantly enriched in mouse genes containing the HRC3 signature (PHRC3 ≥ 10).

Note the high prevalence of processes associated with development.

Relation to nucleosome positions and ultraconserved regions

To test whether any putative function of the HRC3 signature may be related to nucleosome occupancy, we predicted the nucleosome-rich sites in mouse Hox clusters44,45 and found that on average the nucleosome occupancy within the homeobox does not differ from other coding regions in the Hox genes (see Supplementary Figure S5). This suggests that even if the HRC3 feature is indeed functional, its mechanism is not likely to be directly related to the nucleosome occupancy along its sequence.

Intriguingly, in some of the genes (e.g. HOXA5, HOXB5, HOXC4) a second, shorter HRC3 region exists outside of the homeobox that may coincide with an ultraconserved region (UCR) identified by ref. 22 (see Supplementary Figure S6). This observation may suggest that while an HRC3 signature is required in Hox genes, in some cases it has moved outside of the homeobox. However, the observation is based on a small sample of short periodic sequences and is therefore inconclusive.

Functional correlations of genome-wide loci with the HRC3 signature

One possible function of sequences carrying the HRC3 signature could be recruiting transcription factors or other proteins to specific chromosomal loci. To check whether such a function is plausible, we investigated the overlaps between the HRC3 loci and known binding sites inferred from ChIP-seq experiments, obtained by the ENCODE project46. Hox genes and other developmental transcription factors are regulated primarily during embryonic development. Since the data available from ENCODE were not collected in embryonic tissue, we analyzed the genome-wide distribution of HRC3 loci, without restricting it to developmental transcription factors. We analyzed ENCODE data for 161 DNA binding proteins and found that some binding sites are significantly enriched in the HRC3 motifs, while others are not (Table 4, Supplementary Table S7). The enrichments (estimated by comparing with simulated distributions of loci) range from less than 1 (significant depletion) to over 6-fold enrichment. This observation is consistent with the HRC3 sites being involved in specific cellular or systemic functions, and at the same time suggests that HRC3 is not a general mark of a process (such as chromatin accessibility) that would affect all TFs equally. Notably, the DNA binding proteins most significantly coinciding with HRC3 include proteins involved in epigenetic modifications of chromatin, such as SUZ12 (chromatin silencing); KDM5B, KDM5A and PHF8 (histone demethylases); EZH2 and RBBP5 (histone methyltransferases); SAP30, HDAC1 and HDAC6 (histone deacetylases); CHD1 and SMARCB1 (chromatin organization modifiers); and CTCF (a transcriptional repressor and a key regulator of chromatin architecture).
While this result is not sufficient to draw conclusions concerning the role of HRC3, it is consistent with the possibility of the motif being important in regulation of chromatin modifications and control of the epigenetic state of the cell, and it is in agreement with studies in Drosophila showing that histone modifications are responsible for defining the segmental regulatory domains47.

Table 4: Transcription Factor Binding Sites (TFBS) from the ENCODE project with peaks overlapping and non overlapping HRC3 in human genome version hg19.

Discussion


We analyzed the DNA sequences coding for the homeodomains of metazoan Hox genes using a new computational approach that combines sequence alignment, prediction of structural features and spectral analysis. We have discovered a three-base-pair periodic signature (“the HRC3 pattern”) in the hydroxyl radical cleavage profiles of the homeobox DNA. The hydroxyl radical cleavage profile correlates with local structural properties and bendability of the double-stranded DNA. The characteristic periodicity of HRC (HRC3) is present in Hox genes of human, mouse, fly and other segmented bilaterian animals. In human and mouse the signature is also found in other homeobox-containing genes, especially in genes that have other homeobox genes in their chromosomal neighborhoods. The conservation of the HRC3 pattern both between genes within a species and between distant species of metazoans raises the possibility that the structural feature arose early in evolution, although it cannot be determined whether it was present in the postulated ancestral, Pre-Cambrian Urhox or ProtoANTP gene, from which all extant Hox genes are thought to have evolved42. The signature is also not universal: it is absent in homeoboxes of many mammalian genes that are not members of the ANTP class, and it is only marginally significant in the Hox gene orthologs of some non-segmented organisms, such as the mollusk Octopus vulgaris48.

We have shown that even synonymous mutations will disrupt the HRC3 pattern. Its observed persistence, along with the different GC content in the homeoboxes, constitutes evidence that the pattern may play a role in the codon selection within these genes, and suggests that the remarkable conservation is due to evolutionary pressure on the structural properties of the homeobox coding DNA. A similar effect on codon usage has been reported for exonic binding sites of the CTCF and NRSF (REST) transcription factors32. While the biophysical nature of the HRC3 signature remains unknown, it is likely that the HRC3 pattern is characteristic of a DNA structure that serves as a regulatory element within the Hox clusters. Indirect evidence has been presented suggesting that some DNA-binding proteins may indeed be backbone conformation-specific, rather than DNA sequence-specific35,49. Periodic features have also recently been indicated as functionally significant, e.g. in selection of the transcription start site50. If the entire homeobox constitutes a regulatory element, it would thus play a dual role, both in regulating the targets of Hox genes and in regulating expression of the Hox genes themselves. This double function could make the homeobox a perfect material for a logical element that has evolved into the basic building block of the circuitry encoding and executing the complex logic of developmental programs. Our discovery may provide a key step towards understanding the molecular basis for the colinearity and synchronization of Hox genes, the conserved synteny of other homeobox-containing transcription factors, and its relation to the intricately regulated somite clock51,52,53,54,55.
The observed significance of the HRC3 signature is significantly higher in anterior Hox genes than in posterior ones; if the motif is indeed involved in regulation of the Hox genes, this difference may be explained by the stronger conservation of the anterior body plan, compared to the posterior one, across the animal kingdom.

Genome-wide analysis points to highly significant (up to over 6-fold) enrichment of HRC3 signatures among binding sites of proteins involved in chromatin organization and histone modification. These coincidences suggest that if the HRC3 signature indeed plays a role in transcriptional regulation of genes, a possible mechanism of action could involve directing epigenetic modifications to specific genomic loci.

Consequently, studying the HRC3 signature may also lead towards an explanation of why all genes in the collinear Hox clusters contain the homeobox domain. Preliminary enrichment-based analysis suggests that signatures related to HRC3 may be associated not only with homeoboxes, but possibly also with several other classes of DNA binding domains. The HRC3 signature may improve our understanding of certain aspects of gene regulation in developmental biology, and is likely to have an impact on other fields, including the study of cancers in which the regulation of developmental genes is disrupted.

Methods


For human, mouse and Drosophila, we aligned coding DNA sequences obtained from the GenBank CCDS and CDS databases56 with the consensus homeobox sequence RRRKRTAYTRYQLLELEKEFLFNRYLTRRRRIELAHSLNLTERHIKIWFQNRMKWKEN using tblastn57 with an expectation threshold of 0.001, and selected the genes for which the alignment length was at least 140 nucleotides. For other species, the selection was based on the species-specific list of Hox genes present in the Homeobox Database58 that have sequences in GenBank. We predicted the HRC patterns using a modified sliding tetramer window algorithm35. The original sliding tetramer HRC prediction produces four values of HRC for each position, based on the four overlapping tetramers containing each pair of bases. Rather than using only one of them, we calculated their weighted average, with weights of 1/6, 1/3, 1/3 and 1/6 for the consecutive tetramers. The significance of the 3-bp period is calculated based on the value of the periodogram computed for the 180-bp homeobox sequence at T = 3 bp and Fisher’s test for a single frequency36,37,59. The computer programs (written in Perl) to calculate the periodicity of the HRC pattern based on the DNA sequence and to compute the PHRC180(3) amplitude within identified homeoboxes are provided as Supplementary data (Supplementary Program File SP1).
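The weighted averaging over overlapping tetramers can be sketched in Python as below. The sketch assumes a lookup function `tetramer_values` that returns one predicted cleavage value per position of a tetramer; the real lookup table from ref. 35 is not reproduced here, so a toy lookup is used in the demonstration. Renormalizing the weights near the sequence ends, where fewer than four tetramers cover a position, is also an assumption, as the text does not describe edge handling.

```python
# Weights for the four consecutive tetramers covering a position (from the text).
WEIGHTS = (1 / 6, 1 / 3, 1 / 3, 1 / 6)


def smoothed_hrc(seq, tetramer_values):
    """Weighted average of per-tetramer predictions at every position of `seq`.

    `tetramer_values(tetramer)` is a hypothetical lookup returning four numbers,
    one for each position inside the given tetramer.
    """
    n = len(seq)
    if n < 4:
        raise ValueError("sequence must contain at least one tetramer")
    profile = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for k, w in enumerate(WEIGHTS):
            start = i - 3 + k  # k-th consecutive tetramer that contains position i
            if start < 0 or start + 4 > n:
                continue  # tetramer would run off the sequence end
            offset = i - start  # index of position i inside this tetramer
            total += w * tetramer_values(seq[start:start + 4])[offset]
            weight_sum += w
        # Renormalize so that edge positions remain on the same scale (assumption).
        profile.append(total / weight_sum)
    return profile


if __name__ == "__main__":
    # Toy lookup: 1.0 for G/C and 0.0 for A/T at each tetramer position.
    def toy_lookup(tetramer):
        return tuple(1.0 if base in "GC" else 0.0 for base in tetramer)

    print(smoothed_hrc("ACGTGC", toy_lookup))
```

Because the weights are symmetric, the result does not depend on whether the tetramers are enumerated from the 5′ or the 3′ side of the position.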

To compare the HRC3 motifs outside of the homeobox with ultraconserved coding regions (UCRs) (Suppl. Figure 4), we computed the periodogram power at 3 bp, PHRC100(3), over intervals of 100 bp centred on every position in the sequence. We defined periodic HRC intervals as those with PHRC100(3) of at least 1.8 over at least 80 consecutive positions in the sequence. These data are overlaid on the UCRs in HoxA, HoxB and HoxC genes reported by ref. 22. To verify that the HRC3 signature is not simply a consequence of the amino acid sequence encoded by the homeobox, we calculated the HRC patterns in a family of simulated homeobox sequences coding for the same homeodomain as in the actual genes. We generated the simulated sequences by randomly choosing codons for each amino acid from a distribution reflecting the actual codon frequencies in the entire coding genome (Suppl. Table S3). The lists of homeobox genes in the ANTP, LIM, POU, ZF, PRD and TALE classes are derived from the HomeoDB2 database58.
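The interval criterion (window power of at least 1.8 over at least 80 consecutive positions) amounts to a run-length scan, sketched below in Python. The function name `periodic_intervals` and the half-open output convention are illustrative choices; the input is assumed to be a precomputed list of PHRC100(3) values, one per position.

```python
def periodic_intervals(powers, threshold=1.8, min_run=80):
    """Return half-open (start, end) runs where power >= threshold for >= min_run positions."""
    intervals = []
    start = None
    for i, value in enumerate(powers):
        if value >= threshold:
            if start is None:
                start = i  # a new candidate run begins here
        else:
            if start is not None and i - start >= min_run:
                intervals.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(powers) - start >= min_run:
        intervals.append((start, len(powers)))
    return intervals


if __name__ == "__main__":
    # Toy profile: one run of 85 positions (kept) and one of 20 positions (dropped).
    toy = [0.0] * 10 + [2.0] * 85 + [0.0] * 5 + [2.0] * 20
    print(periodic_intervals(toy))
```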

The enrichments of ENCODE binding sites were calculated using the BEDTools suite of utilities for comparing genomic features60. To compute the overlap between features (peaks of HRC3 and chromosomal positions of DBDs for each family), we used the “closest” option of bedtools with the human genome hg19 and chose features for which the distance is zero (at least one overlapping position).
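The overlap criterion and the enrichment ratio can be illustrated with a small Python sketch. It is a stand-in for the bedtools workflow rather than a reproduction of it: intervals are assumed to be BED-style half-open tuples `(chrom, start, end)`, and the enrichment is assumed to be the observed overlap count divided by the mean count over simulated placements, which is one plausible reading of "comparing with simulated distributions of loci". All names are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one position."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(features, peaks):
    """Number of features that overlap at least one peak (quadratic scan, for clarity)."""
    return sum(any(overlaps(f, p) for p in peaks) for f in features)


def enrichment(observed_count, simulated_counts):
    """Observed overlap count relative to the mean of the simulated counts (assumed definition)."""
    mean_simulated = sum(simulated_counts) / len(simulated_counts)
    return observed_count / mean_simulated


if __name__ == "__main__":
    hrc3_loci = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    chip_peaks = [("chr1", 150, 160), ("chr1", 600, 700), ("chr2", 0, 100)]
    observed = count_overlapping(hrc3_loci, chip_peaks)
    print(observed, enrichment(observed, [1, 0, 1, 2]))
```

For genome-scale inputs a sorted sweep or an interval tree would replace the quadratic scan; the simple form is kept here to make the criterion explicit.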

Additional Information

How to cite this article: Fongang, B. et al. A Conserved Structural Signature of the Homeobox Coding DNA in HOX genes. Sci. Rep. 6, 35415; doi: 10.1038/srep35415 (2016).

References


  1. Gene Complex Controlling Segmentation in Drosophila. Nature 276, 565–570 (1978).
  2. Homeodomain Proteins. Annu Rev Biochem 63, 487–526 (1994).
  3. A Conserved DNA-Sequence in Homoeotic Genes of the Drosophila Antennapedia and Bithorax Complexes. Nature 308, 428–433 (1984).
  4. Hox genes specify vertebral types in the presomitic mesoderm. Genes & Development 19, 2116–2121, doi: 10.1101/Gad.338705 (2005).
  5. The vertebrate Hox gene regulatory network for hindbrain segmentation: Evolution and diversification: Coupling of a Hox gene regulatory network to hindbrain segmentation is an ancient trait originating at the base of vertebrates. Bioessays 38, 526–538, doi: 10.1002/bies.201600010 (2016).
  6. et al. Conserved and distinct roles of kreisler in regulation of the paralogous Hoxa3 and Hoxb3 genes. Development 126, 759–769 (1999).
  7. et al. Independent regulation of initiation and maintenance phases of Hoxa3 expression in the vertebrate hindbrain involve auto- and cross-regulatory mechanisms. Development 128, 3595–3607 (2001).
  8. Hox genes and segmental patterning of the vertebrate hindbrain. Am Zool 38, 634–646 (1998).
  9. Hox Genes and the Hindbrain: A Study in Segments. Curr Top Dev Biol 116, 581–596, doi: 10.1016/bs.ctdb.2015.12.011 (2016).
  10. Hox patterning of the vertebrate axial skeleton. Developmental Dynamics 236, 2454–2463 (2007).
  11. et al. Hox patterning of the vertebrate rib cage. Development 134, 2981–2989, doi: 10.1242/Dev.007567 (2007).
  12. Hox genes and the global patterning of the somitic mesoderm. Current Topics in Developmental Biology 47, 155–181 (2000).
  13. Hox genes and regional patterning of the vertebrate body plan. Dev Biol 344, 7–15, doi: 10.1016/j.ydbio.2010.04.024 (2010).
  14. Evolution of the Hox Gene Complex from an Evolutionary Ground State. Hox Genes 88, 35–61 (2009).
  15. Global posterior prevalence is unique to vertebrates: A dance to the music of time? Developmental Dynamics 241, 1799–1807 (2012).
  16. Nuclear organization of the genome and the potential for gene regulation. Nature 447, 413–417 (2007).
  17. Chromatin decondensation and nuclear reorganization of the HoxB locus upon induction of transcription. Genes & Development 18, 1119–1130 (2004).
  18. Epigenetic Temporal Control of Mouse Hox Genes in Vivo. Science 324, 1320–1323, doi: 10.1126/science.1171468 (2009).
  19. et al. Demethylation of H3K27 regulates polycomb recruitment and H2A ubiquitination. Science 318, 447–450, doi: 10.1126/science.1149042 (2007).
  20. et al. UTX and JMJD3 are histone H3K27 demethylases involved in HOX gene regulation and development. Nature 449, 731–U710, doi: 10.1038/Nature06145 (2007).
  21. et al. H3K27 modifications define segmental regulatory domains in the Drosophila bithorax complex. eLife 3, e02833, doi: 10.7554/eLife.02833 (2014).
  22. Ultraconserved coding regions outside the homeobox of mammalian Hox genes. BMC Evol Biol 8 (2008).
  23. A regulatory module embedded in the coding region of Hoxa2 controls expression in rhombomere 2. Proceedings of the National Academy of Sciences of the United States of America 105, 20077–20082 (2008).
  24. Convergent evolution of complex regulatory landscapes and pleiotropy at Hox loci. Science 346, 1004–1006, doi: 10.1126/science.1257493 (2014).
  25. Structural and functional differences in the long non-coding RNA hotair in mouse and human. PLoS Genet 7, e1002071, doi: 10.1371/journal.pgen.1002071 (2011).
  26. Functional Analysis of CTCF During Mammalian Limb Development. Dev Cell 19, 819–830, doi: 10.1016/j.devcel.2010.11.009 (2010).
  27. Quaestiones et decisiones in quattuor libros Sententiarum Petri Lombardi Editione Lugdunensi, 1495 i, dist. 27, qu. 2, K., page 114 (1495).
  28. Ockham’s razor and the anti-superfluity principle (Principle of parsimony). Erkenntnis 53, 353–374, doi: 10.1023/A:1026464713182 (2000).
  29. Evolution of antennapedia-class homeobox genes. Genetics 142, 295–303 (1996).
  30. Conserved elements within open reading frames of mammalian Hox genes. J Biol 8, 17, doi: 10.1186/jbiol116 (2009).
  31. GC/AT-content spikes as genomic punctuation marks. Proceedings of the National Academy of Sciences of the United States of America 101, 16855–16860 (2004).
  32. et al. Exonic Transcription Factor Binding Directs Codon Choice and Affects Protein Evolution. Science 342, 1367–1372, doi: 10.1126/science.1243490 (2013).
  33. et al. A Map of Minor Groove Shape and Electrostatic Potential from Hydroxyl Radical Cleavage Patterns of DNA. ACS Chemical Biology 6, 1314–1320, doi: 10.1021/Cb200155t (2011).
  34. Local DNA Topography Correlates with Functional Noncoding Regions of the Human Genome. Science 324, 389–392 (2009).
  35. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Research 17, 947–953 (2007).
  36. Fisher. Tests of significance in harmonic analysis. Proc. Roy. Soc. Ser. A. 125, 54–59 (1929).
  37. SCEPTRANS: an online tool for analyzing periodic transcription in yeast. Bioinformatics 23, 1559–1561, doi: 10.1093/bioinformatics/btm126 (2007).
  38. Let the data speak. Nature Reviews Molecular Cell Biology 7, C1–C2, doi: 10.1038/Nrm1980-C3 (2006).
  39. Correlation in seasonal variations of weather iii. Mem. Indian Meteor. Dep. 21, 13–15 (1914).
  40. Studies in Astronomical Time-Series Analysis. 2. Statistical Aspects of Spectral-Analysis of Unevenly Spaced Data. Astrophysical Journal 263, 835–853 (1982).
  41. Strimmer, K. Identifying periodically expressed transcripts in microarray time series data. Bioinformatics 20, 5–20, doi: 10.1093/bioinformatics/btg364 (2004).
  42. Evolution of homeobox genes. Wiley Interdisciplinary Reviews-Developmental Biology 2, 31–45, doi: 10.1002/Wdev.78 (2013).
  43. et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40, D306–D312, doi: 10.1093/nar/gkr948 (2011).
  44. et al. The DNA-encoded nucleosome organization of a eukaryotic genome. Nature 458, 362–U129, doi: 10.1038/Nature07667 (2009).
  45. et al. A genomic code for nucleosome positioning. Nature 442, 772–778, doi: 10.1038/Nature04979 (2006).
  46. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74, doi: 10.1038/nature11247 (2012).
  47. The border between the ultrabithorax and abdominal-A regulatory domains in the Drosophila bithorax complex. Genetics 193, 1135–1147, doi: 10.1534/genetics.112.146340 (2013).
  48. et al. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature 524, 220−+, doi: 10.1038/nature14668 (2015).
  49. et al. Human Origin Recognition Complex Binds Preferentially to G-quadruplex-preferable RNA and Single-stranded DNA. Journal of Biological Chemistry 288, 30161–30171, doi: 10.1074/jbc.M113.492504 (2013).
  50. Influence of rotational nucleosome positioning on transcription start site selection in animals promoters. bioRxiv, doi: (2016).
  51. Comparison between Timelines of Transcriptional Regulation in Mammals, Birds, and Teleost Fish Somitogenesis. PLoS One 11, e0155802, doi: 10.1371/journal.pone.0155802 (2016).
  52. The precise timeline of transcriptional regulation reveals causation in mouse somitogenesis network. BMC Dev Biol 13 (2013).
  53. et al. A complex oscillating network of signaling genes underlies the mouse segmentation clock. Science 314, 1595–1598, doi: 10.1126/science.1133141 (2006).
  54. et al. Evolutionary plasticity of segmentation clock networks. Development 138, 2783–2792, doi: 10.1242/Dev.063834 (2011).
  55. et al. Comparison of Pattern Detection Methods in Microarray Time Series of the Segmentation Clock. PLoS One 3, doi: 10.1371/Journal.Pone.0002856 (2008).
  56. et al. GenBank. Nucleic Acids Research 41, D36–D42 (2013).
  57. Basic Local Alignment Search Tool. J Mol Biol 215, 403–410 (1990).
  58. HomeoDB2: functional expansion of a comparative homeobox gene database for evolutionary developmental biology. Evolution & Development 13, 567–568, doi: 10.1111/j.1525-142X.2011.00513.x (2011).
  59. Least-Squares Frequency-Analysis of Unequally Spaced Data. Astrophysics and Space Science 39, 447–462 (1976).
  60. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842, doi: 10.1093/bioinformatics/btq033 (2010).



We thank Zbyszek Otwinowski and Monte Pettitt for valuable discussions. This work was conducted with the support of the Institute for Translational Sciences at the University of Texas Medical Branch at Galveston, supported in part by a Clinical and Translational Science Award (UL1TR000071 and UL1TR001439) from the National Center for Advancing Translational Sciences, National Institutes of Health; by March of Dimes grant 5-FY10-136; and by the National Institutes of Health grant GM112131.

Author information


  1. Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA

    • Bernard Fongang, Fanping Kong, Surendra Negi, Werner Braun & Andrzej Kudlicki
  2. Institute for Translational Sciences, University of Texas Medical Branch, Galveston, TX, USA

    • Bernard Fongang & Andrzej Kudlicki
  3. Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, TX, USA

    • Andrzej Kudlicki




A.K. designed the study; B.F. conducted the functional annotation analysis; A.K. and B.F. wrote the computer programs; F.K. and A.K. analysed GC content profiles; B.F., S.N., W.B. and A.K. analysed DNA hydroxyl radical cleavage patterns; B.F. and A.K. wrote the paper and prepared the tables and figures.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Andrzej Kudlicki.


This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit