Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans

TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated because the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples, fixation of a specific H4 haplotype without duplication is likely due to positive selection. Together, TCAF copy number expansion, selection signals in hominins, differential TCAF2 expression between haplogroups, and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply TCAF diversification among hominins, potentially in response to cold or dietary adaptations.

Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between Haplogroup 5 and the chimpanzee haplotype. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. Additional annotations include segmental duplication (SegDup) tracks and the predicted gene models using FLNC transcripts from the chimpanzee lymphoblast cells and six human tissues for the chimpanzee and Haplogroup 5 sequences, respectively (Methods). Numbers above the overlaps between two BAC clones indicate the percent sequence identity (#identical bases/total bases).
Supplementary Fig. 5. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between Haplogroups 4 and 5. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. The orange bar indicates the location of a 100 kbp inversion between these two haplogroups. Additional annotations include segmental duplication (SegDup) tracks and the predicted gene models using FLNC transcripts from six human tissues for both haplotypes (Methods). Numbers above the overlaps between two BAC clones indicate the percent sequence identity (#identical bases/total bases).
Supplementary Fig. 6. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between Haplogroups 2-1 and 5. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. The vertical black bars indicate the putative breakpoints for a 129 kbp deletion event in Haplogroup 2-1 with respect to Haplogroup 5. Additional annotations include SegDup tracks and the predicted gene models using FLNC transcripts from six human tissues for both haplotypes (Methods). Numbers above the overlaps between two BAC clones indicate the percent sequence identity (#identical bases/total bases).
Supplementary Fig. 7. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between Haplogroups 2-2 and 5. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. The vertical black bars indicate the putative breakpoints for a >130 kbp deletion event in Haplogroup 2-2 with respect to Haplogroup 5. The right panel shows that the TCAF1A1 copy in Haplogroup 2-2 is a fusion of the TCAF1A1 and TCAF1A2 copies in Haplogroup 5. Additional annotations include SegDup tracks and the predicted gene models using FLNC transcripts from six human tissues for both haplotypes (Methods). Numbers above the overlaps between two BAC clones indicate the percent sequence identity (#identical bases/total bases).

Supplementary Fig. 8. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between Haplogroups 3-1 and 4. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. The vertical black bars indicate the putative breakpoints for a ~131 kbp deletion event in Haplogroup 3-1 with respect to Haplogroup 4. Note that the two inferred breakpoints on Haplogroup 4 intersect with two identical (100%) CTAGE sequences. Additional annotations include SegDup tracks and the predicted gene models using FLNC transcripts from six human tissues for both haplotypes (Methods). Numbers above the overlaps between two BAC clones indicate the percent sequence identity (#identical bases/total bases).
Supplementary Fig. 9. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between Haplogroups 3-2 and 4. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. The vertical black bars indicate the putative breakpoints for a >134 kbp deletion event in Haplogroup 3-2 with respect to Haplogroup 4. Additional annotations include SegDup tracks and the predicted gene models using FLNC transcripts from six human tissues for both haplotypes (Methods). Numbers above the overlaps between two BAC clones indicate the percent sequence identity (#identical bases/total bases).
Supplementary Fig. 10. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between Haplogroups 1 and 3-2. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. The vertical black bars indicate the putative breakpoints for a ~130 kbp deletion event in Haplogroup 1 with respect to Haplogroup 3-2. Additional annotations include SegDup tracks and the predicted gene models using FLNC transcripts from six human tissues for both haplotypes (Methods). Numbers above the overlaps between two BAC clones indicate the percent sequence identity (#identical bases/total bases).
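The percent sequence identity quoted for the BAC clone overlaps in the captions above is defined as #identical bases/total bases. A minimal Python sketch of that ratio follows, assuming the two overlapping sequences have already been aligned to equal length; the function name `percent_identity` and the choice to skip gap columns are illustrative assumptions, not details taken from the Methods.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity (#identical bases / total bases) for two pre-aligned sequences.

    Assumption: columns where either sequence carries a gap ('-') are skipped,
    so only single-base matches and mismatches are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    total = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted (assumed convention)
        total += 1
        if a == b:
            identical += 1
    if total == 0:
        raise ValueError("no comparable aligned bases")
    return 100.0 * identical / total


# Toy example: three of four aligned bases agree.
toy_identity = percent_identity("ACGT", "ACGA")
```

Counting gap columns as differences instead would lower the reported value for overlaps that contain indels, so the convention should be stated wherever such numbers are compared.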
Supplementary Fig. 11. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between the chimpanzee and gorilla haplotypes. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. Additional annotations include SegDup tracks and the predicted gene models using FLNC transcripts from the chimpanzee lymphoblast cells for the chimpanzee haplotype (Methods).
Supplementary Fig. 12. Miropeats analysis reveals structural similarity and dissimilarity at the 7q35 TCAF locus between the chimpanzee and macaque haplotypes. Colored arrows are annotated TCAF SDs and lines connecting the sequences show regions of homology. Additional annotations include SegDup tracks and the predicted gene models using FLNC transcripts from the chimpanzee lymphoblast cells for the chimpanzee haplotype (Methods).
Supplementary Fig. 13. Sequence alignment between Haplogroups 4 and 5 TCAF2A amino acid sequences. Only two variants were found between the coding sequences. The blue dashed box indicates the synonymous difference at exon 2, while the red dashed box shows the nonsynonymous change at exon 3, which is beyond the putative inversion breakpoints (Supplementary Fig. 6). The green lightning bolt represents the inversion breakpoints at the 3rd intron between the 2nd and 3rd exons.
Supplementary Fig. 21. Pairwise sequence identity of TCAF SDs from the 15 BAC-assembled haplotypes reported in this study. Sequence identity was computed based on single base changes between two sequences and colored accordingly. For convenience, SDs on individual haplotypes were named from the centromeric to telomeric sides, as indicated by the last number after the underscore in the SD IDs. The coordinates of the SDs can be found in Supplementary Data 6.
Supplementary Fig. 22. Phylogenetic reconstruction of the evolutionary history of TCAF SD DupA using BAC-assembled haplogroups. Phylogeny of the TCAF DupA sequences among haplogroups was inferred using BEAST (v.2.6.2) and five independent runs of 10 million iterations of Markov Chain Monte Carlo (Methods). Each number and horizontal bar at an internal node indicate the point estimate for the divergence (in million years, Myr) and its 95% highest posterior density interval, respectively. The inset shows the putative relationships among TCAF DupA sequences, where different types of circles correspond to groups annotated on the inferred phylogeny.
Supplementary Fig. 23. Phylogenetic reconstruction of the evolutionary history of TCAF SD DupB using BAC-assembled haplogroups. Phylogeny of the TCAF DupB sequences among haplogroups was inferred using BEAST (v.2.6.2) and five independent runs of 10 million iterations of Markov Chain Monte Carlo (Methods). Each number and horizontal bar at an internal node indicate the point estimate for the divergence (in million years) and its 95% highest posterior density interval, respectively. Branches with posterior probabilities <90% are colored in red. The inset shows the putative relationships among TCAF DupB sequences, where different types of circles correspond to groups annotated on the inferred phylogeny.
Supplementary Fig. 24. Phylogenetic reconstruction of the evolutionary history of TCAF SD DupC using BAC-assembled haplogroups. Phylogeny of the TCAF DupC sequences among haplogroups was inferred using BEAST (v.2.6.2) and five independent runs of 10 million iterations of Markov Chain Monte Carlo (Methods). Each number and horizontal bar at an internal node indicate the point estimate for the divergence (in million years) and its 95% highest posterior density interval, respectively. The inset shows the putative relationships among TCAF DupC sequences, where different types of circles correspond to groups annotated on the inferred phylogeny.
Supplementary Fig. 25. Reconstruction of phylogeny for the TCAF haplotypes using the 12.3 kbp single-copy unique sequences embedded within the TCAF SD region (Fig. 1). Phylogenetic inferences were performed using BEAST (v2.5; Methods). (A) Sequences from two nonhuman primate and seven human haplotypes. (B) The same sequences as in the top panel, together with the archaic hominin haplotypes. The numbers are the point estimates for the branch times, while the purple bars show their 95% highest posterior density intervals. Branches colored red have <90% posterior probability support.
Supplementary Fig. 26. Haplotypes of the HGDP panel, four archaic hominins, and eight chimpanzee samples using the 428 SNVs from the single-copy unique region embedded within the TCAF SD region. The rows and columns are haplotypes and SNVs, respectively. Hierarchical clustering based on Ward's minimum variance method was performed. Note that the colors for all HGDP haplotypes are transparent, while those for the archaic and chimpanzee samples are solid.
Supplementary Fig. 27. Haplotype-based PCA for 1,275 SNVs in the three single-copy unique loci at the TCAF SD region. Haplotypes were inferred computationally for the HGDP panel as well as the four archaic genomes (Methods). Each dot is a modern-day human haplotype, colored according to its population origin. Black and blue triangles are the Neanderthal and Denisovan haplotypes. Haplotype clusters were formed using an iterative approach based on the K-means clustering technique. The best K (K=12) was determined by minimizing the within-group sum of squared distances (bottom panel).
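The choice of K described for Supplementary Fig. 27 rests on the within-group sum of squares computed across candidate values of K. The sketch below illustrates that quantity with a plain Lloyd's K-means in Python; it is not the authors' pipeline. The function names, the random initialisation, the fixed seed, and the toy coordinates standing in for principal-component scores are all assumptions made for illustration, and how K=12 was read off the resulting curve is not specified in the caption.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wss(points, k, n_iter=100, seed=0):
    """Run Lloyd's K-means; return (labels, within-group sum of squares)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable, so the centers are already group means
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def wss_curve(points, k_values, seed=0):
    """Within-group sum of squares for each candidate K."""
    return {k: kmeans_wss(points, k, seed=seed)[1] for k in k_values}


# Toy stand-in for haplotype PC scores (hypothetical values, two obvious groups).
toy_pcs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
curve = wss_curve(toy_pcs, [1, 2, 3])
```

Because the within-group sum of squares generally keeps shrinking as K grows, such a curve is usually inspected for the point where further increases in K yield little improvement, rather than taken at its literal minimum.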