MHC-dependent mate choice is linked to a trace-amine-associated receptor gene in a mammal

Major histocompatibility complex (MHC) genes play a pivotal role in vertebrate self/nonself recognition, parasite resistance and life history decisions. In evolutionary terms, the MHC’s exceptional diversity is likely maintained by sexual and pathogen-driven selection. Even though MHC-dependent mating preferences have been confirmed for many species, the sensory and genetic mechanisms underlying mate recognition remain cryptic. Since olfaction is crucial for social communication in vertebrates, variation in chemosensory receptor genes could explain MHC-dependent mating patterns. Here, we investigated whether female mate choice is based on MHC alleles and linked to variation in chemosensory trace amine-associated receptors (TAARs) in the greater sac-winged bat (Saccopteryx bilineata). We sequenced several MHC and TAAR genes and related their variation to mating and paternity data. We found strong evidence for MHC class I-dependent female choice for genetically diverse and dissimilar males. We also detected a significant interaction between mate choice and the female TAAR3 genotype, with TAAR3-heterozygous females being more likely to choose MHC-diverse males. These results suggest that TAARs and olfactory cues may be key mediators in mammalian MHC-dependent mate choice. Our study may help identify the ligands involved in the chemical communication between potential mates.

3.1. Table S4: Names, sampling periods and sample sizes concerning the eleven day roosts that contributed individuals to this study. 9 3.2. Figure S1. Geographic and genetic structure among the eleven day roosts that contributed individuals to this study. 10 3.3. Table S5: Number of individuals genotyped for each allele found in S. bilineata 11 3.4. Figure S2: Names, nucleotide and amino acid sequences of MHC-I alleles 12 3.5. Figure S3: Names, nucleotide and amino acid sequences of MHC-II alleles 13 3.6. Figure S4: Names, nucleotide and amino acid sequences of TAAR alleles 14 3.7. Figure S5. Comparison between males that were available for TAAR3-homozygous and TAAR3-heterozygous females. 15 3.8. Figure S6. Collinearity among the five MHC indices 16 4.

DNA extraction and barcoded amplicon production
All wing punch samples were stored in ethanol. DNA extraction was performed using a commercial kit (First-DNA all tissue kit) by GEN-IAL GmbH (Troisdorf, Germany) according to the supplier's instructions. Amplicon production was carried out for each sample with four dedicated PCR reactions using the primers given in Table S1.
PCR conditions are summarized in Table S2. We used long within-cycle extension times in order to minimize chimera generation. All primer sequences were designed for this work.

Supplemental
*Since TAAR genes have one single exon, the corresponding protein domains of the sequenced DNA are given for them. EL: Extracellular loop; TM: transmembrane domain; IL: intracellular loop. All domains are predicted based on amino acid sequence alignments with human or mouse homologue proteins. The G protein-coupled receptor structures 2Y03 and 1F88 of the PDB (http://www.pdb.org) were used as references.
# Lengths are based on annotated sequences of homologue exons from horse or domestic cat 1 . § These primers were multiplexed in a single PCR for each individual. All PCRs were carried out in 35 reaction cycles with an initial denaturation of 3 min at 98°C, extension time of 60 seconds at 72°C and a final extension of 5 min at 72°C.

Bioinformatics Pipeline for the Amplicon Sequencing Data Processing
Here we aim at giving a detailed description on the bioinformatic pipeline that we employed, from the raw-data stage until allele-calling. This workflow is written in chronological order and in a "cookbook" form in order to facilitate comprehensibility and reproducibility. The comments, metrics and other results which are specific to the sequences produced in this work are shown separately and in italics.
Except for when stated differently, all computations were performed in a dedicated Linux environment 2nd. Quality filter: a. Perform first quality filter using a Python script (all scripts available upon request) with FASTQ parameters "q30" and "p75", meaning that reads with FASTQ quality score < 30 in more than 25% of the bases in both directions are completely deleted.
b. Delete reads with clear artifacts (passages with repeats of A, C, G, T, GC or GT at least 15 bases long).
c. Delete reads with primer or tag errors (no error tolerance a. Compare all sequences in a pool in order to identify duplicated (or redundant) sequences.
This aims at decreasing the sizes of the files and is done with the Tally algorithm 2 .
b. Delete all sequences with frequency of 1 (singletons).
c. Produce FASTA files with identifiers such as ">S12:C45". This identifies a specific sequence as "S12", present with 45 copies.    1, 2: The numbers refer to the methodological filters employed (mother, father and at least one further candidate male had to be known and successfully genotyped) and indicate the individuals originally sampled (1) and those actually assessed for the mate choice analyses (2).
*One female that originally roosted in the CG colony immigrated into the BH colony, thus the number of females per colony adds up to 148 while the real total number of females is 147. Figure S1. Geographic and genetic structure among the eleven day roosts that contributed individuals to this study.

Supplemental
a: Simplified map of the La Selva Biological Station area in Costa Rica. The map was drawn with Inkscape 7 based on a reference map explicitly released into the public domain 8 in 2012. b: ΔK plot comparing different numbers of assumed populations. The K value inferred to best fit the data was 7 (K = 1 up to K = 10 tested). The analysis was carried out with STRUCTURE (v2.3.4) 9 and the plot (Evanno method for detecting the number of K) was created with STRUCTURE Harvester 10 . c: STRUCTURE 9 bar plot in which the ancestry of individuals was inferred after averaging ten STRUCTURE 9 runs with CLUMPAK (Cluster Markov Packager Across K) 11 for K = 7. Each column represents one individual while each block includes all individuals from each colony, which are indicated below each block. Further parameters used were as follows: Burn-in period: 2.5 x 10 5 ; Replicates: 10 5 ; INFERALPHA=1; INFERLAMBDA=0.
The STRUCTURE analysis was performed based on eight microsatellites previously genotyped 12 for 1026 individuals, which include all bats genotyped for the MHC (see Table S4). It provides an idea about the relatively high degree of admixture, likely due to dispersion and gene flow, among the eleven colonies. As expected from their geographic locations, the colonies PH and STR3000 stand out by showing some degree of isolation, while BH seems highly influenced by its neighbors. Importantly, the differences among colonies do not affect mate choice tests, since females, real and candidate fathers are always from the same colony. Additionally, 88.3% of all mate choices investigated in our study took place in BH, the largest colony (Table S4).  Figure S2. Names, nucleotide and amino acid sequences of MHC-I alleles 3.6. Supplemental Figure S4. Names, nucleotide and amino acid sequences of TAAR alleles 3.7. Supplemental Figure S5. The distributions make evident that the two male groups have a nearly identical diversity indices. The slightly higher values among the "hom" group (males available for homozygous females) additionally corroborate the results of the TAAR3/female choice interaction analysis, which, if biased by male sampling, would be expected to be skewed in the opposite direction (higher MALDiv and MAADiv for mating partners of TAAR3-homozygous females).

Supplemental Figure S6. Collinearity among the five MHC indices
The lower panels display Pearson correlations (R²) while the upper panels present scatter plots among each pair of variables (MHC indices) which are given in the diagonal. For both MHC classes, collinearity is high among the diversity indices (MALDiv and MAADiv) but not among the dissimilarity indices (MALDis, CALDis and µAADis). The function for the plots was adapted from the panel.cor function published elsewhere 13 .