Survey of allele specific expression in bovine muscle

Allelic imbalance is a common phenomenon in mammals that plays an important role in gene regulation. An Allele Specific Expression (ASE) approach can be used to detect variants with a cis-regulatory effect on gene expression. In cattle, this type of study has only been done once in Holstein. In our study we performed a genome-wide analysis of ASE in 19 Limousine muscle samples. We identified 5,658 ASE SNPs (Single Nucleotide Polymorphisms showing allele specific expression) in 13% of genes with detectable expression in the Longissimus thoraci muscle. Interestingly we found allelic imbalance in AOX1, PALLD and CAST genes. We also found 2,107 ASE SNPs located within genomic regions associated with meat or carcass traits. In order to identify causative cis-regulatory variants explaining ASE we searched for SNPs altering binding sites of transcription factors or microRNAs. We identified one SNP in the 3’UTR region of PRNP that could be a causal regulatory variant modifying binding sites of several miRNAs. We showed that ASE is frequent within our muscle samples. Our data could be used to elucidate the molecular mechanisms underlying gene expression imbalance.

Gene regulation is a fundamental process in the development and maintenance of organisms. In mammalian genomes the variability of gene expression is a current phenomenon 1,2 . It is therefore important to study this variability in order to understand gene regulation. There are different approaches to such studies: expression quantitative trait loci (eQTLs) and Allele Specific Expression (ASE) analyses. The combination of both approaches is highly effective at locating cis-and trans-regulation of gene expression.
An expression quantitative trait locus (eQTL) is a DNA region with some nucleotide sequence differences (Single Nucleotide Polymorphisms, insertion, deletion) that affects the expression level of a gene in cis or trans. They can be identified by expression genome-wide association studies (eGWAS), an analysis method computing the likelihood of a polymorphism affecting gene expression. Unfortunately this type of analysis needs a large number of samples to minimize false-positives 3 . Many human eQTL mapping studies have been carried out [4][5][6] including the recent Genotype-Tissue Expression (GTEx) project 7 . However in cattle there is a lack of studies. So far, there has been only one performed in dairy cattle, in Holstein-Friesians (HF), Jerseys (J) and HFxJ crossbreeds 8 .
Allele specific expression (allelic expression or allelic imbalance) analysis is a robust approach to quantify expression variation between the two haplotypes of a diploid individual distinguished by heterozygous sites 9 . This approach is complementary to identifying variants affecting gene expression with eQTL studies because we can use a smaller number of samples 10 . Genome-wide studies of ASE have been performed in different species (human 11 , mouse 12 or fruit fly 13 ) including livestock species (pig 14 , chicken 15 or sheep 16 ). In addition, some ASE genes were detected to impact economically important traits 10,17 .
In cattle, only two studies have been performed so far, both in Holstein. In the first study, they discovered 473 ASE SNPs across 5 bovine blastocysts (among 2,524 different heterozygous SNPs) 18 . In the second study, they detected 19,082 ASE SNPs (1,060 on average per tissue) across 18 different tissues from one lactating Holstein dairy cow 19 .
In our study, we performed a genome-wide investigation of ASE using 19 Limousine calf muscle samples. We distinguished between imprinting (parental mono-allelic expression) and allele specific expression (not RNA sequencing and sequence alignment. RNA extraction and sequencing was performed as previously described [24][25][26] . Briefly, after transfer to ice-cold RNeasy RLT lysis buffer (Qiagen), LT tissue samples were homogenized using a Precellys tissue homogeniser (Bertin Technologie). Total RNA was isolated using RNeasy Midi columns (Qiagen) and then treated with RNAse-free DNase I (Qiagen) for 15 min at room temperature according to the manufacturer's protocols. The concentration of total RNA was measured with a Nanodrop ND-100 instrument (Thermo Scientific) and the quality was assessed with an RNA 6000 Nano Labchip kit using an Agilent 2100 Bioanalyzer (Agilent Technologies). All 19 samples had an RNA integrity number (RIN) value greater than eight.
The mRNA-Seq libraries were prepared using the TruSeq RNA Sample Preparation Kit (Illumina) according to the manufacturer's instructions. Briefly, Poly-A containing mRNA molecules were purified from 4 μg total RNA of each sample using oligo (dT) magnetic beads and fragmented into 150-400 bp pieces using divalent cations at 94 °C for 8 min. The cleaved mRNA fragments were converted to double-stranded cDNA using SuperScript II reverse transcriptase (Life Technologies) and primed by random primers. The resulting cDNA was purified using Agencourt AMPure XP beads (Beckman Coulter). Then, cDNA was subjected to end-repair and phosphorylation and subsequent purification was performed using Agencourt AMPure XP beads. These repaired cDNA fragments were 3′-adenylated producing cDNA fragments with a single ' A' base overhung at their 3′-ends for subsequent adapter-ligation. Illumina adapters containing indexing tags were ligated to the ends of these 3′-adenylated cDNA fragments followed by two purification steps using Agencourt AMPure XP beads. Ten rounds of PCR amplification were performed to enrich the adapter-modified cDNA library using primers complementary to the ends of the adapters. The PCR products were purified using Agencourt AMPure XP beads and size-selected (200 +/− 25 bp) on a 2% agarose Invitrogen E-Gel (Thermo Scientific). Libraries were then checked on an Agilent Technologies 2100 Bioanalyzer using the Agilent High Sensitivity DNA Kit and quantified by quantitative PCR with the QPCR NGS Library Quantification kit (Agilent Technologies). After quantification, three different tagged cDNA libraries were pooled in equal ratios and a final qPCR check was performed post-pooling. Each library pool was used for 2 × 100 bp paired-end sequencing on one lane of the Illumina HiSeq2000 with a TruSeq SBS v3-HS Kit (Illumina). After sequencing, the samples were demultiplexed and the indexed adapter sequences were trimmed using the CASAVA v1.8.2 software (Illumina). The quality of the raw sequence reads was assessed using FastQC and Qualimap 27 .
The Bos taurus reference genome sequence was downloaded from Ensembl (release 91, Bos taurus-UMD3.1.dna.toplevel.fa). To align the reads to the assembled reference genome the STAR RNA-Seq (version 2.4.2a) aligner was used 28 . Default values were used for mapping except for the intron alignment (align-IntronMin: 20 and alignIntronMax: 500,000). Reads for each sample were mapped separately to the reference genome sequence. Only paired reads were retained for alignment. The number of paired-reads uniquely aligning to transcribed regions of each transcript was calculated for all genes of the annotated transcriptome. The transcript paired-read count was calculated as the number of unique paired-reads that aligned within the exons of each transcript, based on the coordinates of mapped reads. SNP identification and annotation. SNPs were called following the best practices from GATK (version 3.4-46) with HaplotypeCaller for DNA and RNA sequence data respectively 29,30 . First, reads were subjected to local realignment, coordinate sorting, base quality score recalibration and indel realignment. We then performed SNP and indel discovery and genotyping. In the GATK analysis, we used a minimum confidence score threshold of Q30 with default parameters. We also used multi-sample variant calling in order to distinguish between a homozygous reference genotype and a missing genotype among the analysed samples. SNPs were annotated with VEP 31 using the transcript set from Ensembl 87.
Detection of ASE SNPs. We used ASEReadCounter 9 to calculate read counts per allele. We performed an N-masking (replacing for each identified variant the nucleotide of the bovine genome reference sequence by N) to remove mapping bias and we only kept overlapping heterozygous SNPs from DNA and RNA to remove discordant genotypes, possibly due to imprinting or RNA editing. We only kept candidates with minimum 10 reads for at least one allele. To determine if the imbalance was significant, we used a binomial test against an allelic ratio of 0.5 with a p-value of 5% (Python).
Correlation analysis. The SNP being tested for ASE might not be the variant regulating the expression of the gene. So in order to determine the SNPs within the regulatory regions or potentially the regulatory variant itself, we detected SNPs in linkage disequilibrium with our ASE SNPs using PLINK 1.9 32 (intra-chromosomal analysis and r 2 > = 0.75). We used HTSeq-count 33 to determine the number of reads for each transcript per individual and normalised this using DESeq2 34 . We computed the Spearman's rank correlation coefficient between the genotypes of ASE SNPs or SNPs in LD and expression level of the corresponding transcript. We performed a correction for multiple testing, for the same transcript, using the Bonferroni correction.

ASE SNP validation.
First-strand cDNA was synthesized from 500 ng of DNase I-treated total RNA using the SuperScript III First-Strand Synthesis System kit (Thermo Fisher Scientific) and oligo-dT primers with random hexamers, according to the manufacturer's instructions in a total volume of 20 μl. The resulting cDNA was diluted 1:10.
PCR and Pyrosequencing primers were designed using PyroMark Assay Design 2.0 (Qiagen) with sequences previously masked with RepeatMasker 35 . One of the forward or the reverse PCR primer had a 5′-biotin modification and was HPLC-purified. Primers were synthesized by IDT and are listed in Table S1. Polymerase chain reactions were performed in 50 μl using 1 μl of diluted cDNA or 100 ng of genomic DNA, 1 U GoTaq DNA polymerase (Promega), 1X PCR buffer, 1.5 mM MgCl 2 , 200 μM of each dNTP and 0.3 μM of each PCR primer. The following touchdown cycling protocol was used: 95 °C for 2 min, followed by 13 cycles of 95 °C for 1 min, 1 min of annealing (the annealing temperature was progressively lowered from 68 to 56 °C in steps of 1 °C every cycle) and 72 °C for 1 min 30 s. These initial cycles were followed by 20 cycles of 95 °C for 1 min, 55 °C for 1 min and 72 °C for 1 min 30 s, and a final extension step at 72 °C for 10 min. To check the quality of the amplification 10 μl of PCR products were then analysed by gel electrophoresis with a 1% agarose gel.
Biotinylated PCR products (20 μl) were immobilized on streptavidin-coated Sepharose beads (GE Healthcare), purified, washed and denatured using a 0.2 M NaOH solution and rewashed all using the PyroMark Vacuum workstation (Qiagen) as recommended by the manufacturer. Purified single-stranded PCR product was annealed to the pyrosequencing primer (diluted to 0.3 μM) and then sequenced using the PyroMark Q24 system (Qiagen), following the manufacturer's instructions. For validating candidate ASE SNPs, DNA and RNA (cDNA) from each sample were pyrosequenced simultaneously. The proportions of individual alleles for each SNP were obtained using the PyroMark Q24 software version 1.0.10 (Qiagen). Genomic DNA was examined to confirm the heterozygosity. The final ASE ratio for each SNP of each sample analysed was calculated using the following formula: ASE ratio = (allele 1%/allele 2%) RNA/(allele 1%/allele 2%) genomic DNA.
Prediction of microRNA binding sites. Prediction of microRNA (miRNA) binding sites was done as follows: first, for SNPs within 3′UTR regions, flanking sequences (+/−100 bases) were retrieved using the whole-genome reference sequence (UMD3.1). Then we created two versions of this sequence, one with the reference allele and one with the alternate allele. Next we used miRanda 36 for both sequences with all known bovine miRNAs using the default parameters. Bovine miRNA sequences were retrieved from the miRBase database (version 21). To finish, we selected miRNAs which could bind only one of these two sequences.

Results and Discussion
DNA and RNA sequencing data statistics. Sequencing of all 19 whole-genome sequences generated a total of 5.3 billion of raw paired-end reads corresponding to 537.51 Gb. Approximately, 92 to 400 million pairedend reads were obtained for each library. On average, 83% (56-92%) of the paired-end reads were properly aligned with BWA on the UMD3.1 bovine reference genome (Table S2).
Sequencing of all 19 RNA-Seq libraries generated a total of 1.4 billion raw paired-end reads. Approximately, 35 to 180 million paired-end reads were obtained for each library. On average, 89% (86-91%) of the reads were uniquely mapped (Table S3). In a previous study 26 , 17 of our 19 RNA samples were sequenced and mapping was performed using BWA (version 0.5.9-r16) 21 . 63-76% of the mapped reads were aligned. The increase of the mapping rate (on average 17.8% more reads) indicates that STAR performs best. This is largely because STAR is a splice aware aligner. The mapping performance is comparable to other studies done in cattle with STAR and the same reference genome (UMD3.1). For instance 90% of transcripts from Holstein-Friesian peripheral blood leukocytes were mapped 37 .
The count of transcripts was performed using HTSeq-count 33  We found 67.8% of SNPs from WGS data as intergenic. This percentage is in agreement with the 70.4% of the intergenic part of the bovine genome. This proportion is also similar in others studies done in cattle. For instance 73% of intergenic, 26.2% of intronic, 4.26% of downstream gene and 4.14% of upstream gene variants were found in Hanwoo and Yanbian cattle 38 or 65.6% of intergenic and 33.6% were identified of intronic variants in Qinchuan cattle 39 . Interestingly, we found 20.20% (54,410) of SNPs identified from our RNA-Seq data as intergenic. These SNPs could be located in transcripts of large intergenic non-coding RNAs. Indeed, we found 7,706 (14,16%) intergenic SNPs from our RNA-Seq data within lincRNAs previously identified from six of our samples by Billerey and collaborators 25 . We also found 39.61% of SNPs identified from our RNA-Seq data in intronic regions. These SNPs could be from premature transcripts (before splicing).  (Table 2). We focused on overlapping SNPs identified from WGS and RNA-Seq data and checked the concordance between their genotype. This overlap is on average 90% (75.7% to 96.0%) concordant (69% for both homozygous and 31% for both heterozygous). For the 10% discordant SNPs, 84.3% are homozygous from DNA-Seq and heterozygous from RNA-Seq data. This could be explained by RNA editing. 15.7% are heterozygous from DNA-Seq and homozygous from RNA-Seq; this could be explained by gene imprinting (mono-allelic expression). Alternatively, discrepancies between DNA and RNA genotypes could be due to sequencing errors. To study the allelic imbalance, we only kept the heterozygous concordant SNPs. ASE SNP identification. Using ASEReadCounter we calculated reads count per allele for all heterozygous concordant SNPs from alignment to the UMD3.1 reference genome sequence and the N-masked genome sequence. On average, the N-masking removed 27.1% of the candidate SNPs from ASE detection. We identified 6,908 ASE SNPs (Table S4) in 2,451 genes corresponding to 9.8% of all bovine genes (25,066), 15% of the genes with detectable expression in Longissimus thoraci muscle (16,338) and 20% of the genes with at least one heterozygous SNP (12,269). On average, we detected 574 ASE SNPs per individual (min: 184, max: 991) corresponding to 3.2% of the heterozygous SNPs from RNA-Seq data (Table S5). Last, we removed ASE SNPs within CNV regions previously identified within our Limousine animals 40 and kept 5,658 ASE SNPs located in 2,119 genes. We then checked the distribution of the ASE SNPs across chromosomes. There is a weak correlation between the number of ASE SNPs per chromosome and the size of the chromosomes (ρ = 0:45, p-value = 0.015). However, the number www.nature.com/scientificreports www.nature.com/scientificreports/ of ASE SNPs per chromosome is strongly correlated with the number of coding genes (ρ = 0:84, p-value = 9.13 E-09) and with the number of expressed genes (ρ = 0:85, p-value = 4.81 E-09) (Fig. 1).

RNA-Seq and DNA-
We compared our detected ASE SNPs with ASE SNPs previously identified by Chamberlain and collaborators in a Holstein muscle sample 19 . In their study, ASE detection was performed on one lactating dairy cow using TOPHAT2 41 for the read alignment and a Chi-squared test. We found 118 ASE SNPs in common with the 2,006 ASE SNPs from Holstein muscle representing 5.9% of their detected ASE SNPs. We investigated why we do not detect the remaining ASE SNPs in our results. 684 of these SNPs (34.1%) were not polymorphic in our Limousine animals, 43   www.nature.com/scientificreports www.nature.com/scientificreports/ located on the chromosome X (excluded because we have only males). For the 1,123 remaining ASE SNPs (60.0%) identified in Holstein muscle, we found at least one heterozygous Limousine animal. This discrepancy might be due to differences in ASE detection methods or in breed gene regulation.

Functional annotation of ASE SNPs and of their genes. 4,193 of the detected ASE SNPs were located
within cattle QTL regions reported in Animal QTLdb 42 (Table S6). Interestingly, 1,213 of these ASE SNPs were inside QTL regions found in Limousine and 2,107 of these SNPs were in QTL regions linked to growth or meat traits.
In order to study the impact of genes affected by ASE on specific biological pathways, we performed a Gene Ontology (GO) enrichment. This analysis was carried out by first converting the cow gene list into a human gene list using Biomart 43 . This resulted in a list of 2,143 genes that was tested for enriched GO terms using the GOrilla tool 44 with a background gene list of all expressed genes in Longissimus thoraci muscle (13,998).
In total, the genes showing ASE corresponded to 127 enriched functions (q-value < 0.05), with many of these related to striated muscle development (Table S7). The top 20 most-enriched terms are presented in Fig. 2. Thirteen functions were related to muscle functions or components: contractile fiber part (GO:0044449), Z disc (GO:0030018), actin binding (GO:0003779), actin filament-based process (GO:0030029), cytoskeletal protein binding (GO:0008092), muscle contraction (GO:0006936), muscle system process (GO:0003012), structural constituent of muscle (GO:0008307), actin filament binding (GO:0051015), muscle alpha-actinin binding (GO:0051371), sarcomere organization (GO:0045214) and M band (GO:0031430). The seven GO terms not directly related to muscle were linked to intracellular part and/or organelle and can be associated with contractile fibre part, mitochondrion or nucleus. ASE validation. We used Pyrosequencing in order to validate ASE SNPs. Several filters were applied to narrow down the number of ASE SNPs to test. Firstly, we kept ASE SNPs present in a QTL region associated with growth or meat quality traits reported in Animal QTLdb. Secondly, we removed SNPs absent from dbSNP. Then, we only kept ASE SNPs present in exonic, 5′UTR or 3′UTR regions. Finally, we selected two ASE SNPs located within CAST and we choose randomly four extra ASE SNPs.
We tested these 6 ASE SNPs by Pyrosequencing with replicates (Table 3). Technical replicates obtained from independent experiments show standard deviations ranging from 0-4% indicating that our Pyrosequencing procedure has negligible inter-PCR and Pyrosequencing variations. The allele frequencies determined for genomic DNA samples, which we analysed in duplicate showed an average variation of 2% +/− 1% (n = 4). For the cDNA samples, the average variation between replicates was 2% +/− 2% (n = 4).We could therefore detect allele frequency differences larger than 4%. Five ASE SNPs were validated by Pyrosequencing. For example, we observed for the validated ASE SNPs rs110694123 in PALLD gene 47% for allele G (complementary base of C) and 53% for allele A (complementary base of T) in gDNA and we observed 33% and 67% in cDNA (Fig. 3). We get an ASE ratio of 1.80 showing an allelic imbalance in favour of allele A (it means there is 1.80 more expression of transcripts with the A allele than with the G allele). This is consistent with the ASE ratio computed from the read counts for this SNP (1.52 with 39.67% for G and 60.33% for A).

Cis-regulation of genes showing allele specific expression.
Our detected ASE SNPs are probably not the causative variants, but rather markers in cis with the causative polymorphisms. It is known that the majority of causative SNPs are in regulatory regions instead of coding regions 45 . Therefore, we were looking for a link between ASE SNPs and the putative causative SNPs in cis. With this in mind, we used PLINK to identify all the SNPs in linkage disequilibrium (LD) (r 2 > = 0.75) with our predicted ASE SNPs. We obtained 2,955 SNPs (including ASE SNPs) with genotypes for all the 19 individuals. For each transcript showing allele-specific expression, we calculated the Spearman correlation coefficient score between expression level of genes containing ASE SNPs and www.nature.com/scientificreports www.nature.com/scientificreports/ genotypes of SNPs in LD with ASE SNPs. We computed correlations between 2,794 SNP genotypes and 1,085 unique transcripts, averaging 2.74 SNP genotypes per transcript (min 1, max 37). We found 100 significant correlations with 45 transcripts (ρ > |0.6| and q-value < 0.05) including 42 negative correlations (Table S8). 25 of those correlations involved an ASE SNP.
For example, we found one SNP (C/T, rs41691181) in LD (r 2 = 0.79, distance of 12.5 kb) with a SNP (C/T, rs208256739) in upstream and exonic (synonymous variant) regions of APMAP respectively. The second SNP shows ASE in one individual (LIM8) among the nineteen. The genotypes of the first SNP (8 C/C, 7 C/T, 4 T/T) is significantly correlated (ρ = −0.75 and q-value = 0.000188) to the APMAP level expression. Indeed, we found on average for the 19 animals 404, 323 and 214 transcripts (read counts) for C/C, C/T and T/T animals (Fig. 4a) showing an expression bias in favour of the C allele. We investigated how this SNP (rs41691181) in the upstream gene region could cause this allelic imbalance by testing if the SNP could alter Transcription Factor Binding site (TFBS) using TFBS-match 46 with the SNP flanking sequences (+/−10 bases). None of the allele-specific sequences of these SNPs were located in predicted TFBS.
We extended the TFBS search for 5 other SNPs in 5 different genes (5 S rRNA, LRRC66, ENSBTAG00000026637, GLOD4 and PLK1) with a significant correlation in the upstream region without detecting any TFBS.
In another example, we found one SNP (G/A, rs109763272) in LD (r 2 = 0.86, distance of 274 bases) with a SNP (G/A, rs378125518). Both SNPs are in 3′UTR region of the PRNP gene and show ASE in four individuals among the nineteen. The genotypes of the first SNP (8 G/G, 6 G/A, 5 A/A) is significantly correlated (ρ = 0.61 and q-value = 0.0057966) to the PRNP expression level. On average, the PRNP expression level was 4,641 transcripts for G/G individuals, 4,455 for G/A individuals and 3,324 for A/A individuals (Fig. 4b) showing an expression    www.nature.com/scientificreports www.nature.com/scientificreports/ bias in favour of allele G. Given that this correlated SNP is also an ASE SNP, we looked if allele counts estimated with ASEReadCounter is in agreement with the transcript expression level. Indeed, transcripts with the G allele are 1.54 times more expressed than transcripts with the A allele. We investigated how this SNP (rs109763272) in 3′UTR region could cause this allelic imbalance. It is known that polymorphisms in microRNA (miRNA) binding sites may affect miRNA/target gene interaction 47 . Therefore, we used miRanda to detect miRNA binding sites within this SNP flanking region. We predicted 9 miRNAs which could bind the reference allele (G) and 5 miRNAs which could bind the alternate allele (A) (Table S9). Interestingly, we noticed less expression with the alternate allele (Fig. 4b). This could suggest that some of the 5 detected miRNAs binding with the A allele could reduce the expression of PRNP.
We lack data on miRNA expression in our samples but several studies describing catalogs of miRNAs expressed in bovine muscle or skeletal muscle satellite cells have been published [48][49][50][51][52][53][54][55][56][57][58] . However, no study describes so far miRNAs expressed in Limousin animals. We found that all fourteen miRNAs impacted by the SNP rs109763272 are expressed in muscle [50][51][52][53] including in Longissimus dorsi/thoracis 53 (Table S10). We therefore cannot exclude any of the 5 miRNAs binding to the A allele or any of the 9 miRNAs binding to the G allele, as candidate PRNP regulators. Further work is needed to identify which if any of these candidate miRNAs reduce PRNP expression level.
We extended the miRNA binding sites prediction analysis to all SNPs with a significant correlation and located in a 3′UTR region (Table S9). We analysed 13 additional SNPs present in 6 other genes (1 SNP in ANKRD, 1 in CCDC90B, 2 in FAM32A, 2 in TYK2, 3 in IMP3 and 4 in TTC3). We found no binding sites for 3 of these SNPs and for the remaining 10 SNPs we always found allele-specific binding sites for both alleles (Fig. S1) including 8 SNPs with a lower expression with the alternate allele. This could suggest that some of the detected miRNAs are binding with the alternate allele to reduce the gene expression. We found 2 SNPs with a lower expression of the reference allele. Similar to the alternate allele, the detected miRNAs binding with the reference allele could reduce gene expression. Survey of miRNAs expressed in bovine muscle allowed us to exclude only eleven miRNAs (Table S10). Further work is needed to identify which SNPs impact target sites of the remaining 386 miRNAs.
For most of the 45 genes for which we had a significant correlation between expression level and SNP (ASE SNP or SNP in LD with an ASE SNP) genotypes we couldn't find SNPs altering TFBSs or the binding sites of miRNAs. It is therefore likely that epigenetic mechanisms might also play a role, rather than just cis-regulatory genetic variants (in TFBS or 3′UTR). ASE genes potentially involved in meat quality traits. The aldehyde oxydase 1 (AOX1) gene encodes a homodimeric protein, which produces hydrogen peroxide. In mouse, it is involved in myogenesis 59 . Therefore, it might play a role in muscle development in cattle. We detected eleven ASE SNPs in this gene with six also detected by Chamberlain and collaborators 19 . Among these 6 ASE SNPs, three had genotypes significantly correlated to the expression of this gene. In addition, we found 13 others SNPs in AOX1 with significant correlation (Fig. S2).
The palladin (PALLD) gene encodes a cytoskeletal associated protein, which exists as multiple isoforms 60 . This actin associated protein plays a significant role in regulating cell adhesion and cell motility. It is also important for the early smooth muscle cell differentiation in mouse 61 . In cattle, palladin might play dual roles (positive and negative) in maintaining the proper skeletal myogenic differenciation 62 . We detected two ASE SNPs in this gene including one experimentally validated by Pyrosequencing. Interestingly, these SNPs are within a QTL region associated with average daily gain (ADG) trait in Hereford 63 .
The calpastatin (CAST) gene encodes an inhibitor of protease μ-calpain, which has a known effect on beef muscle tenderness variation 64 . Interestingly, a more recent study confirmed that CAST affected meet tenderness in Longissimus muscle in Limousine crossed-breed animals 65 . We detected seven ASE SNPs in this gene including two experimentally validated.
These 3 genes could be associated with meat quality and carcass traits. Interestingly, one of the ASE SNPs found in AOX1 is a missense variant. This SNP (rs109201304) modifies a glycine residue into a cysteine amino acid and is located within a protein region conserved in mammals (Fig. S3). This residue (p.G1023C) lies within the substrate pocket subdomain IV of the large C-terminal domain which is important for substrate access and positioning but also in the dimerization of the two AOX1 monomeric subunits 66,67 . Several studies performed on AOX1 variants resulting from rat or human missense SNPs have shown that some of these SNPs increased or decreased the rate of superoxide radical production [68][69][70][71] . Further work is needed to investigate whether r109201304 can affect the catalytic activity of bovine AOX1.
We didn't find any missense polymorphisms in PALLD and CAST but we identified several synonymous variants (2 in PALLD and 2 in CAST). They don't alter the primary sequence of the corresponding proteins however it has been shown that codon usage can vary between genes and that this codon bias can affect RNA secondary structure, splicing and translation 72 . Further work is needed to investigate the phenotypic impact of these variants/genes. Biological relevance of allele specific expression in muscle. Overall we identified 5,658 ASE SNPs in 13% of genes (2,119) with detectable expression in Longissimus thoracis muscle. The high number of genes potentially impacted by allele-specific imbalance prompted us to investigate if some of these ASE SNPs could have a major impact on muscle biology.
First we looked if ASE SNPs could induce a gene loss-of-function. We didn't find any ASE SNP that could create or remove stop codons and causing consequently protein truncations or changes in the open reading frame, respectively. However, we identified 14 ASE SNPs that according to the VEP annotation have or could perturb the splicing of the corresponding gene. Further work is needed to check this potential impact.
www.nature.com/scientificreports www.nature.com/scientificreports/ Second we investigated further the 421 missense ASE SNPs. According to the VEP annotation, only 37 of those missense ASE SNPs are predicted to be deleterious. 95% of these deleterious ASE SNPs are found in only one or two animals. Interestingly, we found one T/C deleterious ASE SNP (chromosome10, position 37,912,737) within ZFP106 in one animal (LIM18). ZFP106 encodes a zinc fingered RNA binding protein. Disruption of Zfp106 in mice induces several skeletal muscle phenotypic abnormalities [73][74][75] , such as severe muscle wasting 74 , loss of muscle strength [73][74][75] and degeneration of muscle fibers 75 in homozygous knock out Zfp106 −/− mice. Heterozygous Zfp106 +/− mice are comparable to wild type littermates 74,75 . These results suggest that ZFP106 might not be a dosage-sensitive gene and that haploinsufficiency of ZFP106 (in ASE SNP heterozygous animals) might not impact muscle physiology. We also found a deleterious ASE SNP (rs110365838) within MAP4, a muscle-specific microtubule associated protein which is expressed in early myogenesis 76 and that is required for muscle cell differentiation 77 . This ASE SNP was detected in two animals (LIM2 and LIM15). We didn't find, so far, any information on potential consequences of deleterious variants within this gene. However, because of the critical role of MAP4 in muscle development, it will be interesting to investigate if the two heterozygous animals for this ASE SNP have normal amount of MAP4 protein.
Third, we examined if ASE SNPs could impact genes important for muscle cell development or function. We focused on ASE SNPs located in downstream, upstream, 5′ or 3′ UTR regions, as they might have an effect on the regulation of the transcription of important genes. We found that myogenin (MYOG), a muscle-specific transcription factor required to induce myogenesis 78 , had in total 21 ASE SNPs, including 5 and 7 in downstream and 3′UTR regions, respectively. However, disruption of murine myogenin showed no overt effects in heterozygous Myog +/− mice 79 suggesting that a potential reduction of MYOG in animals heterozygous for those 12 ASE SNPs might not have phenotypic consequences.

Conclusion
We performed a genome-wide survey of ASE using 19 Limousine muscle samples combining WGS and RNA-Seq data. This analysis shows that ASE is pervasive in beef muscle. We identified 5,658 ASE SNPs located in 2,119 genes and 37.2% of these ASE SNPs are found within QTLs associated to meat or carcass traits. We validated 5 out of 6 selected ASE SNPs suggesting that our pipeline identify mostly true ASE SNPs. In addition, we detected SNPs with genotypes significantly associated with gene expression levels.
For example, we identified one SNP in the 3′UTR region of PRNP that could be a causal mutation by modifying binding sites of several miRNAs. We showed that our in silico ASE approach can facilitate the identification of candidate cis-regulatory SNPs. However, further work is needed to validate these candidates. In the future, functional analyses of the impact of polymorphisms within TF or miRNA binding sites will try to elucidate the molecular mechanisms underlying gene expression imbalance.

Data Availability
RNA-Seq data analysed during the current study is available from the European Nucleotide Archive (accession numbers ERP002220, E-MTAB-2646, E-MTAB-4625 and E-MTAB-6947). The ASE SNPs identified in this study are included in the Table S4.