Introduction

Cheese quality depends on many related, interacting factors, ranging from compositional, functional, sensory and safety characteristics to nutritional, psychological, convenience, processing and economic factors1. Consumer acceptability of dairy products is highly dependent on sensory characteristics2, particularly flavour, an important determinant of quality. During the manufacture and ripening of cheese, enzymes from various sources (native milk enzyme, rennet, lactic acid bacteria, secondary microflora and exogenous enzyme preparations) are responsible for the breakdown of macronutrients (fat, proteins and lactose) into fatty acids, amino acids and lactic acid, the major precursors of volatile organic compounds (VOCs), which play a significant role in determining cheese flavour3,4.

Several studies aimed at characterizing the volatile fraction of various cheeses have been conducted5,6,7, most of them using solid-phase micro-extraction (SPME)-GC-MS equipment. Proton-transfer-reaction time-of-flight mass spectrometry (PTR-ToF-MS), however, is a more time-efficient and sensitive method for characterising the cheese VOC fingerprint8,9. Several factors (e.g. dairy system, herd, individual cow characteristics) have been shown to affect the cheese volatilome9,10 and evidence for the existence of an exploitable genetic variation in the cheese VOC profile has also recently been put forward11, suggesting there is potential to modify cheese flavour through selective breeding in order to improve cheese quality.

Genome-wide association studies (GWAS) have been widely used to disentangle the genomic architecture underlying complex traits in dairy cattle12,13,14. It has become common to couple GWAS with biological pathway analysis to extract biological information from the GWAS data and overcome the limitations of this method, such as its reduced ability to detect small-effect loci and its poor replication15,16,17. The genomic and biological information thus acquired makes it possible, on the one hand, to elucidate the genetic basis and molecular mechanisms underlying complex traits and, on the other hand, to increase the accuracy of genomic prediction when incorporated into prediction models18,19.

Herein, we investigated whether the cow’s genetic background contributes to variability in the cheese volatilome and, therefore, might play a role in determining cheese flavour. The potential existence of a genomic control for VOC profile in cheese would be of considerable significance given the economic importance of cheese quality to the dairy industry. To our knowledge, there is no existing information on whether there is a relationship between the cow’s genome and the cheese VOC profile, nor on the biological functions that may be involved in regulating the cheese volatilome. The aims of this study, therefore, were i) to perform GWAS analyses for milk and cheese composition traits in dairy cows, and for cheese VOC profiles determined by PTR-ToF-MS, and ii) to carry out pathway analyses on the SNP markers, in order to identify genomic regions and biological mechanisms that contribute to the variability in the cheese volatilome.

Results

Descriptive statistics and genomic heritability estimates for milk and cheese composition are reported in Table 1. We found milk fat percentage to have a relatively low heritability (0.08), and confirmed protein percentage as being under strong genetic influence (0.40). Lactose percentage was moderately heritable (h2 = 0.22), while heritability estimates were, instead, close to 0 for the milk fat to protein ratio, and cheese fat and protein, which depend mainly on the cheese-making procedure. Table 2 shows the concentrations and heritabilities for some of the spectrometric peaks associated with the VOCs of model cheeses measured by PTR-ToF-MS. Among the tentatively identified spectrometric peaks, those associated with dimethylsulfone m/z 95.017 (0.22), alkyl fragment (terpenes) m/z 81.070 (0.15), butan-1-ol/pentan-1-ol, heptan-1-ol m/z 75.080 (0.14) and hexanal/nonanal m/z 83.086 (0.10) had moderate heritabilities. Among the unknown compounds, the peaks at m/z 85.029 (0.22), m/z 135.134 (0.18), m/z 66.063 (0.18), m/z 48.053 (0.17), m/z 44.980 (0.15), m/z 83.071 (0.15) and m/z 169.044 (0.15) had the highest heritabilities.

Table 1 Descriptive statistics and genomic heritability (h2) for milk and cheese composition.
Table 2 Descriptive statistics and genomic heritability (h2) for some spectrometric peaks from PTR-ToF-MS analysis of model cheeses*.

Results of the GWAS analyses of milk and cheese composition and cheese VOCs are summarised in Table 3 and Supplementary Table S1. Overall, we detected 186 significant SNPs (P < 5E-05) across all Bos taurus autosomes (BTAs), which were associated with 120 traits. One SNP with an unknown position on the genome was significantly associated with m/z 131.107 (P = 1.04E-05). Most of the significant associations (80%) involved one SNP and one trait.

Table 3 Summary results of the genome wide association analysis for spectrometric peaks from PTR-ToF-MS analysis of model cheeses.

We identified significant associations on BTA4, BTA14, BTA23 and BTA27 for milk fat, with the highest peak corresponding to marker rs42435059 (P = 4.43E-06) located at 39,244,447 bp on BTA4. We also identified significant associations for milk protein and milk yield on BTA6, the highest signal being associated with milk protein and corresponding to the marker rs110239739 (P = 1.32E-07) located at 84,689,991 bp. Only 1 SNP (rs109429918) was significant for the fat-to-protein ratio and this was located on BTA15 at 55,488,319 bp. A significant association was found for lactose on BTA16 and corresponded to rs109818696 located at 12,963,666 bp. Significant SNPs for cheese fat were mapped on BTA14 (~26.17 Mbp) and BTA23 (~42.36 Mbp). We detected very high peaks for cheese protein on BTA16 and BTA20, corresponding to markers rs41798196 located at 21,772,991 bp on BTA16 (P = 9.62E-08) and rs41631276 located at 46,296,840 bp on BTA20 (P = 3.76E-07). We found other significant associations for cheese protein on BTA12 at ~15.45 Mbp.

Regarding cheese VOCs, we detected the strongest signals on BTA11 and BTA18. Marker rs41671173 located at 60,150,644 bp on BTA11 was significant for the spectrometric peak at m/z 78.001 (P = 5.30E-07). We detected another strong signal at 16,119,985 bp on BTA18, corresponding to marker rs41867785, which was associated with the peak at m/z 135.134 (P = 1.10E-07). Overall, this marker was significant for 24 spectrometric peaks, three of which were tentatively associated with butan-1-ol/pentan-1-ol, heptan-1-ol9,11 m/z 75.080 (P = 1.97E-06), 3-methyl-1-butanol/3-methyl-3-buten-1-ol/pentan-1-ol m/z 71.086 (P = 1.20E-05) and hexan-1-ol/hexan-2-ol m/z 85.101 (P = 1.61E-05). The largest regions of consecutive SNPs were located on BTA6 (~81.65–88.07 Mbp) and BTA21 (~40.72–45.33 Mbp). The spectrometric peaks with the highest number of significant SNPs were those associated with ethyl pentanoate (ethyl valerate)/ethyl-2-methylbutanoate/ethyl-3-methylbutanoate (ethyl isovalerate)/heptanoic acid m/z 131.107 (6), m/z 149.045 (5), m/z 48.053 (5), m/z 66.063 (5), the peaks associated with butan-1-ol/pentan-1-ol, heptan-1-ol m/z 75.080 (5), m/z 84.942 (5), m/z 105.039 (5), m/z 117.047 (5), m/z 119.072 (5), m/z 121.122 (5) and m/z 169.044 (5). The chromosomes with the highest number of significant associations were BTA1 (10), BTA3 (9), BTA4 (11), BTA6 (16), BTA16 (9) and BTA21 (14).

Based on the similarity matrix generated with ExpressionCorrelation, we identified 8 sub-networks represented by ≥3 nodes (Fig. 1), within which ClusterOne identified 12 densely connected clusters (P < 0.05; Supplementary Table S2). Two clusters were detected in sub-network 1: one with 13 nodes, which included some spectrometric peaks tentatively associated with aldehydes and/or ketones, i.e. hexan-1-one/hexan-2-one/hexanal m/z 101.097, heptan-2-one m/z 115.112, octan-1-one m/z 129.127 and nonan-2-one m/z 143.143; the other with 7 nodes, which included some spectrometric peaks associated with aldehydes, ketones or alcohols, i.e. propan-2-one (acetone) m/z 59.049, 1,2-pentanediol m/z 105.091 and 2-methylbutanal/3-methylbutanal/pentan-2-one m/z 87.080. A cluster of 16 nodes was significant in subnetwork 2, which included the spectrometric peak associated with butan-1-ol m/z 75.080. Sub-network 3 comprised 2 clusters with 7 nodes, which contained some spectrometric peaks associated with esters and/or alcohols, i.e. ethyl hexanoate/octanoic acid m/z 145.123, ethyl butanoate/ethyl-2-methylpropanoate (ethyl isobutyrate) m/z 117.091 and hexanoic acid m/z 99.081. Two clusters were detected in sub-network 4: the first with 4 nodes, including the spectrometric peak associated with acetates/acetic acid m/z 61.028; the second also included the spectrometric peaks tentatively identified as 3-hydroxy-2-butanone (acetoin) m/z 89.060 and butanoic acid m/z 71.049. Two clusters with 5 nodes were significant in sub-network 5, which included spectrometric peaks associated with the alkyl fragment m/z 43.054, m/z 41.039 and m/z 57.070. Sub-network 6 contained a significant cluster which included the spectrometric peak associated with 2,6-dimethyl pyrazine m/z 109.070. Sub-networks 7 and 8 contained clusters of 4 and 3 nodes, respectively, neither of which included any tentatively identified spectrometric peaks.

Figure 1

Similarity network among cheese volatile compounds generated using ExpressionCorrelation. The nodes correspond to cheese VOCs and the edges represent the similarity between vectors of the additive effects of all SNPs. Only correlations with |r| > 0.80 and P < 0.01 are represented. Eight sub-networks of ≥3 nodes were identified, which contained significantly dense clusters of VOCs (P < 0.05) detected by ClusterOne. The width of each edge indicates the value of the correlation; a wider edge corresponds to a higher correlation in absolute value.

Pathway analyses

Of the total 37,568 SNPs used in this study, 17,006 were located within genes or within 15 kb up- or down-stream of them. On average, around 900 genes were significant (P < 0.05) for each of the peaks tentatively associated with cheese VOCs. We carried out pathway analyses to shed light on the biological role of these genes and to identify potentially overrepresented pathways or molecular functions that might help explain the variability in the cheese volatilome.

Overall, pathways of 5 of the 45 tentatively identified compounds were significantly enriched (FDR < 0.05) (Fig. 2, Supplementary Table S3). Results showed that purine metabolism was enriched for the peak associated with phenol m/z 95.049 (FDR = 0.00017), while the tight junction pathway was overrepresented for the spectrometric peaks associated with heptan-2-one20 m/z 115.112 (FDR = 0.00013) and ethyl pentanoate (ethyl valerate)/ethyl-2-methylbutanoate/ethyl-3-methylbutanoate (ethyl isovalerate)/heptanoic acid m/z 131.107. Furthermore, the nitrogen metabolism pathway was significantly enriched for the peak associated with ethyl pentanoate (ethyl valerate)/ethyl-2-methylbutanoate/ethyl-3-methylbutanoate (ethyl isovalerate)/heptanoic acid m/z 131.107 (FDR = 0.00019). Finally, the long-term potentiation pathway was enriched for the peaks associated with octan-1-one m/z 129.127 and nonan-2-one m/z 143.143 (FDR = 0.00023 and FDR = 0.00024, respectively).

Figure 2

Significantly enriched KEGG pathways using genes associated with spectrometric peaks with a tentative identification from PTR-ToF-MS analysis of model cheeses. Only the traits showing significantly enriched terms are reported (FDR < 0.05). EPE_E2MB_E3MB_HA: Ethyl pentanoate (ethyl valerate)-Ethyl-2-methylbutanoate-Ethyl-3-methylbutanoate (ethyl isovalerate)-Heptanoic acid.

Discussion

GWAS analysis

In recent years, there has been growing concern about food quality and safety from both the demand and the supply sides. Given that flavour attributes play a crucial role in cheese quality21, better knowledge of the key flavour components and pathways involved in the development and characterisation of cheese VOCs would provide a useful basis for defining cheese-making procedures more precisely, and improving cheese sensory characteristics. There is also increasing interest in the authentication of traditional cheeses with EU protected designation of origin classification, which are often linked to local breeds and help maintain farm animal biodiversity22. In this study, therefore, we first sought to investigate whether the cow’s genome organisation significantly impacts on the cheese volatilome, and possibly cheese flavour.

Although plenty of GWAS studies for milk production traits13,23,24 and cheese-making properties25,26 in dairy cows have now been published, to our knowledge none has focused on identifying the genomic regions associated with cheese composition and quality traits. Despite the lack of GWAS analyses for cheese VOCs, the estimates of genomic heritability found in this study confirm previous findings supporting the existence of an exploitable genetic variation in cheese VOCs11. The main pathways involved in the formation of cheese VOCs are glycolysis (metabolism of lactose, lactate and citrate), lipolysis (and metabolism of fatty acids) and proteolysis (and catabolism of amino acids)3. Accordingly, our GWAS analyses revealed a contribution of cow’s genes related to protein, fat and carbohydrate metabolism.

Protein metabolism

A region of 9 SNPs on BTA6 covered the cluster of casein genes (~87.14–87.38 Mbp) and showed significant association with 12 traits, including milk protein. In particular, 4 spectrometric peaks (m/z 85.029, m/z 149.045, m/z 163.096 and m/z 169.044) were associated with the marker rs41567942, which was located 0.4 Mb from the gene encoding k-casein (CSN3), which is essential for milk coagulation and therefore largely influences milk coagulation properties27. Moreover, the marker rs29001782 was located on BTA6 at 4 kb from GNRHR, whose signalling pathway has been shown to play a role in controlling milk protein synthesis and metabolism17. Interestingly, the markers rs110300263 and rs111018457, which had significant associations with 10 traits, were located in the region at ~77.29–77.47 Mbp on BTA16, which was close to a quantitative trait locus (QTL) for the milk protein and k-casein percentages28. Two markers, which were associated with m/z 56.045 and m/z 119.107 (rs43096354) and with m/z 40.027 (rs42353243), mapped on BTA20 at ~0.3 Mb from FAM169A, which has been suggested to be a key regulator of milk protein synthesis in dairy cattle17. Additionally, the region of 7 SNPs on BTA21 included a known QTL for milk fat and protein yield and percentage according to the Cattle QTL database28.

Fat metabolism

The contribution of fatty acid metabolism to cheese VOCs is corroborated by several significant associations. For instance, rs43283349, which was significant for 3-methylbutyl butanoate (isoamyl butyrate)/nonanoic acid m/z 159.138, was located on BTA1 at ~0.1 Mb from AGPAT3, a positional candidate gene for milk FA29. The marker rs110986676, which was located on BTA6 and was significant for m/z 116.078, corresponded to an intron variant of SCD5, which has been associated with variation in milk FA composition in dairy cattle16,30. The marker rs110681423, which was associated with m/z 117.047, was located on BTA19 at ~0.2 Mb from GH1, which has been put forward as a candidate gene for milk fat percentage and fat composition12,31. The marker rs110858406, associated with m/z 63.044, mapped on BTA26 at ~0.9 Mb from GPAM, which is involved in the regulation of milk fat synthesis and composition in dairy cattle32,33. Finally, rs110820252, which had significant associations with the spectrometric peaks associated with the alkyl fragment m/z 42.01 and propanoic acid/propanoic ester m/z 75.044, mapped on BTA28 within 2 kb 5′ to AGT, which is the sole precursor of all angiotensin peptides. Interestingly, the renin-angiotensin system is believed to impact body-fat storage as well as lipid and carbohydrate metabolism34,35.

Carbohydrate metabolism

A significant association was found between m/z 117.047 and rs110002748, which mapped on BTA8 at ~1 Mb from B4GALT1. This gene encodes an enzyme that participates in glycoconjugate and lactose biosynthesis, the latter occurring exclusively in the mammary gland36. An increase in the expression of B4GALT1 was observed in transition milk samples, and is reflected in an increase in lactose biosynthesis during the earlier stages of lactation37. The high signal detected on BTA11 (rs41671173) was located at ~0.5 Mb from B3GNT2, which synthesizes a unique structure known as poly-N-acetyllactosamine (polyLacNAc), a linear carbohydrate polymer composed of alternating N-acetylglucosamine and galactose residues38. This SNP explained ~60% of additive genetic variance for m/z 78.001. Finally, the high signals on BTA18 corresponded to the marker rs41867785, which is annotated as an intron variant of PHKB. This gene has been associated with the carbohydrate metabolic process, the generation of precursor metabolites and energy, and energy reserve39.

Correlations among VOCs based on SNP additive effects

A greater level of detail concerning the shared genomic basis of cheese VOCs might form the basis for more accurate prediction models to be developed in the context of genomic selection for possible modulation of cheese flavour. In a previous work, we estimated the genetic relationships among cheese VOCs based on pedigree information11. Here, we used ExpressionCorrelation to calculate pairwise correlations between VOCs based on the SNP additive genetic effects, and we clearly identified groups of VOCs sharing a common behaviour. Having tentatively identified some compounds, we sought to associate the largest sub-networks with biochemical pathways and possibly associated flavour notes. Sub-network 1 contained mostly ketones and aldehydes and might, therefore, represent catabolism of amino acids and fatty acids. Branched-chain aldehydes originate from AA degradation, in particular 2-methylbutanal from isoleucine and 3-methylbutanal from leucine40, while ketones can be produced from β-ketoacids derived from β-oxidation of fatty acids41. Green/fruity/floral notes are mostly associated with the compounds included in this group20,40,42. The reaction between free fatty acids and alcohols from lactose and AA degradation yields esters43, common cheese VOCs, and this pathway might be represented in sub-network 3, including the spectrometric peaks associated with hexanoic acid, ethyl hexanoate/octanoic acid and ethyl butanoate/ethyl-2-methylpropanoate (ethyl isobutyrate). Most esters (e.g. ethyl butanoate, ethyl hexanoate, ethyl-2-methylpropanoate) are associated with the sweet, fruity and floral characteristics of cheese44,45,46. Finally, sub-network 4 might represent the glycolysis pathway, and, in particular, lactate or citrate metabolism, since it included the spectrometric peaks associated with the acetate ester fragment/acetic acid, 3-hydroxy-2-butanone (acetoin) and butanoic acid. Lactose is metabolised by starter bacteria, mostly through the glycolytic pathway, into lactate, which might be further metabolised into acetate by lactococci or into butyrate by Clostridium sp.47. Acetate, like acetoin, is also a main flavour compound originating from citrate metabolism47,48. Cheesy, rancid and sour milk notes are associated with 3-hydroxy-2-butanone (acetoin) and butanoic acid45,49, while acetic acid has a typical vinegar odour50.

Pathway analysis

Standard GWAS analysis allows the identification of individual loci and genes likely to play a role in controlling the investigated traits. However, it lacks the power to establish whether the detected genes act in cooperation as part of a complex network to control specific biological functions. We therefore carried out pathway analyses to prioritize genes in associated loci that are part of the biological pathways and processes potentially contributing to the cheese volatilome.

These pathway analyses confirmed the importance of proteolysis and amino acid metabolism for the formation of cheese VOCs (i.e. nitrogen and purine metabolism). Phenol in cheese originates from the metabolism of protein (casein) and, in particular, from the catabolism of tyrosine3. Besides sugar and fat metabolism, amino acid metabolism also provides substrates for ester formation, which might explain the enrichment of nitrogen metabolism for the spectrometric peak associated with ethyl pentanoate (ethyl valerate)/ethyl-2-methylbutanoate/ethyl-3-methylbutanoate (ethyl isovalerate)/heptanoic acid m/z 131.107. The tight junction pathway was enriched for the spectrometric peaks associated with heptan-2-one and ethyl pentanoate (ethyl valerate)/ethyl-2-methylbutanoate/ethyl-3-methylbutanoate (ethyl isovalerate)/heptanoic acid. In the mammary gland, the state of the tight junctions (TJs) is closely linked to milk secretion51, as TJs are involved in the transcellular transport of lactose and K+ to the extracellular fluid, while Na+ and Cl− are transported to the milk52. TJ integrity is compromised during mammary involution and also as a result of mastitis and periods of mammary inflammation53. Among the genes identified within this pathway, we found three protein kinase C (PKC) family members: alpha (PRKCA), beta (PRKCB) and epsilon (PRKCE). Several PKC inhibitors affect both the assembly and disassembly of TJs, which means that PKCs may regulate the dynamics of TJ formation54. Interestingly, this pathway was previously found to be enriched for the energy of the curd as a percentage of the energy of the milk processed, which is an indicator of cheese-making efficiency55. Finally, enrichment of the long-term potentiation pathway for the spectrometric peaks associated with two ketones, octan-1-one and nonan-2-one, might be connected to their biosynthetic pathway, which is related to fatty acid metabolism; indeed, this pathway was significantly overrepresented in a recent GWAS and pathway-based analysis of milk fatty acids in dairy cows16. Moreover, this pathway contained several genes coding for glutamate ionotropic receptors (GRI), including GRIA1; it is of note that previous findings assigned to this gene a significant SNP for C14:115.

In our study, we exploited the potential of PTR-ToF-MS to provide detailed spectral information to characterise food quality and authentication, and this was integrated with the genomic and biological information provided by GWAS and pathway analyses. Results obtained increase our understanding of the metabolic pathways and biological functions likely involved in the formation of cheese VOCs, providing unprecedented insights into the potential contribution of the cow’s genes to cheese flavour. A more effective approach might be to more accurately identify compounds using PTR-MS and to improve the quality of cattle genome annotations.

Methods

Ethics statement

The cows in the current study belonged to commercial private herds and were not subjected to any invasive procedures. Milk and blood samples were previously collected during routine milk recording coordinated by technicians from the Breeders’ Association of Trento Province (Italy), hence certified by the local authority.

Phenotypes and genotypes

Individual milk samples were collected from 1,075 Italian Brown Swiss cows from 72 commercial herds located in the Alpine province of Trento (Italy). Details of the animals used in this study and the characteristics of the area are reported in Cipolat-Gotet et al.56 and Cecchinato et al.57. Gross milk composition was measured using a MilkoScan FT6000 (Foss Electric A/S, Hillerød, Denmark). Model cheeses were manufactured from the raw milk of individual cows, as described in detail in Cipolat-Gotet et al.56. We used a commercial starter culture at a concentration 8 times higher than recommended in order to reduce the acidification time to 90 min and minimise the role of milk microflora. After ripening (60 d), the model cheeses were weighed and analysed for fat and protein contents using a FoodScan apparatus (Foss Electric, Hillerød, Denmark). The headspace gas of each model cheese (n = 1,075) was measured with a commercial PTR-ToF-MS 8000 instrument supplied by Ionicon Analytik GmbH, Innsbruck (Austria), as described in detail in Bergamaschi et al.10. Internal calibration and peak extraction were performed according to the procedure described by Cappellin et al.58. Absolute headspace VOC concentrations, expressed as parts per billion by volume (ppbv), were estimated using the formula described by Lindinger et al.59. Given that the distribution of all spectrometric peaks showed a strong positive skewness, the data were transformed: the fraction of each peak plus one was multiplied by \({10}^{6}\) and expressed as a natural logarithm to obtain a Gaussian-like data distribution. After filtering out all peaks below a threshold of 1 ppbv and interfering ions, 240 spectrometric peaks remained for the analyses. The fragmentation patterns of 61 relevant compounds, representing 78.0% of the total spectral intensity of the compressed data set without interfering ions, were retrieved from available GC-MS data on the same model cheeses10 and from the literature60,61,62. Isotope removal (r > 0.95, P < 0.001) yielded 173 spectrometric peaks, of which 45 were tentatively associated with VOCs.

The Illumina BovineSNP50 v.2 BeadChip (Illumina Inc., San Diego, CA) was used to genotype 1,152 cows (blood samples were not available for all the phenotyped animals). Quality control retained only markers with call rates >95%, with minor allele frequencies >0.5%, and without extreme deviation from Hardy-Weinberg equilibrium (P > 0.001, Bonferroni corrected). After filtering, 1,011 cows and 37,568 SNPs were retained for subsequent analyses.
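
The marker filters described above can be illustrated with a minimal Python sketch. It is not the routine used in the study: all function names are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with None for missing calls, a 1-d.f. chi-square goodness-of-fit test stands in for whichever Hardy-Weinberg test was actually applied, and the Bonferroni correction is assumed to divide the 0.001 threshold by the number of markers tested.

```python
import math


def call_rate(genotypes):
    """Fraction of animals with a non-missing genotype at this marker."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def hwe_pvalue(genotypes):
    """Chi-square (1 d.f.) test of Hardy-Weinberg proportions.

    Assumption: a simple goodness-of-fit test, used here for illustration only.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)
    q = 1.0 - p
    if p in (0.0, 1.0):  # monomorphic marker: nothing to test
        return 1.0
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(genotypes, n_markers_tested,
              min_call_rate=0.95, min_maf=0.005, hwe_alpha=0.001):
    """Keep a marker only if it satisfies all three filters."""
    hwe_threshold = hwe_alpha / n_markers_tested  # assumed Bonferroni form
    return (call_rate(genotypes) > min_call_rate
            and minor_allele_frequency(genotypes) > min_maf
            and hwe_pvalue(genotypes) > hwe_threshold)
```

For example, a marker with 25, 50 and 25 animals in the three genotype classes is retained, whereas a monomorphic marker or one with heterozygotes only is dropped.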

Genome-wide association study

Genome-wide association analyses (GWAS) were conducted using single-marker regression and the three-step Genome-wide Association using the Mixed Model and Regression-Genomic Control (GRAMMAR-GC) approach63 implemented in the GenABEL R package64. In the first step, an additive polygenic model with a genomic relationship matrix is fitted; secondly, the residuals obtained from this model are regressed on the SNPs to test for associations; in the third step, genomic control corrects for the conservativeness of the procedure65. The polygenic model was:

$${\bf{y}}={\bf{X}}\beta +{\bf{a}}+{\bf{e}},\qquad (1)$$

where y is a vector of the observed response (milk fat, protein and fat-to-protein ratio; cheese fat and protein; cheese VOCs); β is a vector with the fixed effects of (i) days in milk of the cow (classes of 30 days each), (ii) the parity of each cow (classes of 1, 2, 3, ≥4), and (iii) the herd-date effect (n = 72); X is an incidence matrix connecting each observation to specific levels of the factors in β. The two random terms in the model were the animal and the residuals, which were assumed to be normally distributed as \({\bf{a}} \sim N(0,{\bf{G}}{\sigma }_{g}^{2})\) and \({\bf{e}} \sim N(0,{\bf{I}}{\sigma }_{e}^{2})\), where G is the genomic relationship matrix, I is the identity matrix, \({\sigma }_{g}^{2}\) is the additive genomic variance and \({\sigma }_{e}^{2}\) the residual variance. The G matrix was built in GenABEL64 using identity-by-state coefficients. We adopted a threshold of \(P < 5\times {10}^{-5}\) to declare significant SNPs66.
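
The second and third steps of the approach described above (regressing the polygenic-model residuals on each SNP, then applying genomic control) can be sketched as follows. This is an illustrative approximation rather than the GenABEL implementation: the function names are hypothetical, the per-SNP statistic is taken to be the score statistic n·r² with 1 d.f., and the inflation factor is assumed to be applied without a lower bound of 1, so that a conservative test (factor below 1) is corrected upwards.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.


def score_chi2(residuals, genotypes):
    """n * r^2 for the regression of residuals on allele counts (1 d.f.)."""
    n = len(residuals)
    mean_r = sum(residuals) / n
    mean_g = sum(genotypes) / n
    sxy = sum((r - mean_r) * (g - mean_g) for r, g in zip(residuals, genotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((r - mean_r) ** 2 for r in residuals)
    if sxx == 0 or syy == 0:  # no variation: no evidence of association
        return 0.0
    return n * sxy * sxy / (sxx * syy)


def genomic_control(chi2_values):
    """Return the inflation factor and the rescaled test statistics."""
    lam = median(chi2_values) / CHI2_1DF_MEDIAN
    return lam, [c / lam for c in chi2_values]


def chi2_pvalue_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))
```

In use, `score_chi2` would be evaluated once per SNP, the resulting list passed to `genomic_control`, and the corrected statistics converted to P-values and compared with the significance threshold.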

The proportion of genomic variance explained by the SNPs was calculated as \(2pq{a}^{2}\), where p and q were the allele frequencies and a was the allele substitution effect. Model (1) was also used to estimate the variance components and the genomic heritability of the traits based on the genomic relationship matrix. Heritability was estimated as \({h}^{2}=\frac{{\sigma }_{g}^{2}}{{\sigma }_{g}^{2}+{\sigma }_{e}^{2}}\).
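
As a worked illustration of these two formulas, the following Python sketch computes the variance attributable to a single SNP and the genomic heritability. The function names are hypothetical, and expressing the SNP variance as a fraction of the additive genomic variance is an assumption about how the "proportion" was obtained, not a detail stated in the text.

```python
def snp_variance(p, a):
    """Additive variance attributable to one SNP: 2 * p * q * a^2."""
    q = 1.0 - p
    return 2.0 * p * q * a ** 2


def proportion_explained(p, a, sigma2_g):
    """SNP variance as a fraction of the additive genomic variance (assumed)."""
    return snp_variance(p, a) / sigma2_g


def heritability(sigma2_g, sigma2_e):
    """Genomic heritability: sigma_g^2 / (sigma_g^2 + sigma_e^2)."""
    return sigma2_g / (sigma2_g + sigma2_e)
```

For instance, with p = 0.5 and a = 2 the SNP variance is 2.0, and variance components of 1 (genomic) and 3 (residual) give a heritability of 0.25.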

The results of the GWAS analysis without filtering for the P-value threshold were used to build a matrix of row-wise SNPs (n = 37,568) and column-wise phenotypes (i.e. cheese VOCs, n = 173) in which the value in each cell corresponded to the SNP additive effect. This matrix was fed into the ExpressionCorrelation plugin of Cytoscape67 to create a correlation matrix of pair-wise Pearson correlations between phenotypes based on the effects across all the SNPs included in the analysis. Only the high-confidence correlations with P < 0.01 and |r| > 0.80 were selected. A similarity network was generated by ExpressionCorrelation, where the nodes corresponded to the phenotypes and the edges represented the similarity between vectors of the additive effects of all SNPs. This network was analysed with the ClusterOne plugin of Cytoscape68 to identify significantly dense clusters of VOCs (Mann-Whitney test, P < 0.05).
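
The construction of the similarity network can be sketched in plain Python. This is not the Cytoscape workflow itself: the function names are hypothetical, each trait is assumed to be supplied as a list of SNP additive effects in a common SNP order, and sub-networks are taken to be connected components of the thresholded graph. The P < 0.01 filter is omitted because, with tens of thousands of SNPs, any correlation with |r| > 0.80 has a P-value far below that level. The dense-cluster detection performed by ClusterOne is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(effects, threshold=0.80):
    """Edges (trait_a, trait_b, r) for every pair with |r| above threshold."""
    names = sorted(effects)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(effects[a], effects[b])
            if abs(r) > threshold:
                edges.append((a, b, r))
    return edges


def sub_networks(edges, min_nodes=3):
    """Connected components with at least min_nodes traits."""
    neighbours = {}
    for a, b, _ in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_nodes:
            components.append(sorted(component))
    return components
```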

Gene-set enrichment and pathway analyses

Pathway analyses were carried out on the tentatively identified spectrometric peaks (n = 45) to shed light on the biological functions underlying the synthesis and/or metabolism of cheese VOCs. As detailed in Dadousis et al.55, the GWAS results were filtered for significance with a P-value < 0.05 to identify “relevant” and “non-relevant” SNPs. Using the biomaRt R package69,70, we assigned “relevant” SNPs to genes if they were located within the gene or within 15 kb up- or down-stream of the gene71 based on the Ensembl Bos taurus UMD 3.1 assembly. This made it possible to also capture those SNPs that are missed by standard GWAS, due to its stringent significance threshold, but that may help explain the variability in the observed phenotypes and may play a role in organised pathways or biological functions. The Kyoto Encyclopedia of Genes and Genomes (KEGG)72 and the Gene Ontology (GO) databases73 were used to define the functional categories associated with the gene sets. To avoid testing overly broad or narrow functional categories, only GO and KEGG terms with >10 and <1000 genes were considered. A Fisher’s exact test was used to test for overrepresentation of functional categories (FDR < 0.05). The gene-set enrichment analysis was performed with the R package goseq74.
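
The overrepresentation test can be illustrated with a short Python sketch. It is not the goseq implementation: the function names are hypothetical, the one-sided Fisher's exact test is written as the upper tail of the hypergeometric distribution, and the FDR is assumed to be controlled with the Benjamini-Hochberg procedure.

```python
from math import comb


def enrichment_pvalue(k, K, n, N):
    """One-sided Fisher's exact test for overrepresentation.

    N: genes in the background, K: background genes in the category,
    n: "relevant" genes, k: relevant genes that fall in the category.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A category would then be declared enriched when its adjusted P-value falls below 0.05, after restricting the tested terms to those containing more than 10 and fewer than 1000 genes.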