Integrated PTR-ToF-MS, GWAS and biological pathway analyses reveal the contribution of cow’s genome to cheese volatilome

Volatile organic compounds (VOCs) are small molecules that contribute to the distinctive flavour of cheese which is an important attribute for consumer acceptability. To investigate whether cow’s genetic background might contribute to cheese volatilome, we carried out genome-wide association studies (GWAS) and pathway–based analyses for 173 spectrometric peaks tentatively associated with several VOCs obtained from proton-transfer-reaction mass spectrometry (PTR-ToF-MS) analyses of 1,075 model cheeses produced using raw whole-milk from Brown Swiss cows. Overall, we detected 186 SNPs associated with 120 traits, several of which mapped close to genes involved in protein (e.g. CSN3, GNRHR and FAM169A), fat (e.g. AGPAT3, SCD5, and GPAM) and carbohydrate (e.g. B3GNT2, B4GALT1, and PHKB) metabolism. Gene set enrichment analysis showed that pathways connected with proteolysis/amino acid metabolism (purine and nitrogen metabolism) as well as fat metabolism (long-term potentiation) and mammary gland function (tight junction) were overrepresented. Our results provide the first evidence of a putative link between cow’s genes and cheese flavour and offer new insights into the role of potential candidate loci and the biological functions contributing to the cheese volatilome.

Pathway analyses. Of the total 37,568 SNPs used in this study, 17,006 were located 15 kb up-or down-stream of the coding regions. An average of around 900 genes were significant (P < 0.05) for the peaks tentatively associated with cheese VOCs. We carried out pathway analyses to shed light on the biological role of these genes and to identify potentially overrepresented pathways or molecular functions that might help explain the variability in the cheese volatilome.

Discussion
GWAS analysis. In recent years, there has been growing concern about food quality and safety from both the demand and the supply sides. Given that flavour attributes play a crucial role in cheese quality 21 , better knowledge of the key flavour components and pathways involved in the development and characterisation of cheese VOCs would provide a useful basis for defining cheese-making procedures more precisely, and improving cheese sensory characteristics. There is also increasing interest in the authentication of traditional cheeses with EU protected designation of origin classification, which are often linked to local breeds and help maintain farm animal biodiversity 22 . In this study, therefore, we first sought to investigate whether the cow's genome organisation significantly impacts on the cheese volatilome, and possibly cheese flavour.
Although plenty of GWAS studies for milk production traits 13,23,24 and cheese-making properties 25,26 in dairy cows have now been published, to our knowledge none has focused on identifying the genomic regions associated with cheese composition and quality traits. Despite the lack of GWAS analyses for cheese VOCs, the estimates of genomic heritability found in this study confirm previous findings supporting the existence of an exploitable genetic variation in cheese VOCs 11 . The main pathways involved in the formation of cheese VOCs are glycolysis (metabolism of lactose, lactate and citrate), lipolysis (and metabolism of fatty acids) and proteolysis (and catabolism of amino acids) 3 . Accordingly, our GWAS analyses revealed a contribution of cow's genes related to protein, fat and carbohydrate metabolism.
Protein metabolism. A region of 9 SNPs on BTA6 covered the cluster of casein genes (~87.14-87. 38 Mbp) and showed significant association with 12 traits, including milk protein. In particular, 4 spectrometric peaksm/z 85.029, m/z 149.045, m/z 163.096 and m/z 169.044 -were associated with the marker rs41567942, which was located 0.4 Mb from the gene encoding for k-casein (CSN3), which is essential for milk coagulation and therefore largely influences milk coagulation properties 27 . Moreover, the marker rs29001782 was located on BTA6 at 4 kb from GNRHR, which signalling pathway has been shown to play a role in controlling milk protein synthesis and metabolism 17 . Interestingly, the markers rs110300263 and rs111018457, which had significant associations with 10 traits, were located in the region at ~77.29-77.47 Mbp on BTA16, which was close to a quantitative trait locus (QTL) for the milk protein and k-casein percentages 28 . Two markers, which were associated to m/z 56.045 and m/z 119.107 (rs43096354) and m/z 40.027 (rs42353243), mapped on BTA20 at ~0.3 Mb from FAM169A which has been suggested to be a key regulator of milk protein synthesis in dairy cattle 17 . Additionally, the region of 7 SNPs on BTA21 included a known QTL for milk fat and protein yield and percentage from the Cattle QTL database information 28 .
Fat metabolism. The contribution of fatty acid metabolism to cheese VOCs is corroborated by several significant associations. For instance, rs43283349, which was significant for 3-methylbutyl butanoate (isoamyl butyrate)/nonanoic acid m/z 159.138, was located on BTA1 at ~0.1 Mb from AGPAT3, a positional candidate gene for milk FA 29 . The marker rs110986676, which was located on BTA6 and was significant for m/z 116.078, corresponded to an intron variant of SCD5 which was associated to variation in milk FA composition in dairy cattle 16,30 . The marker rs110681423, which was associated with m/z 117.047, was located on BTA19 at ~0.2 Mb from GH1 which has been put forward as candidate gene for milk fat percentage and fat composition 12,31 . The marker rs110858406, associated to m/z 63.044, mapped on BTA26 at ~0.9 Mb from GPAM which is involved in the regulation of milk fat synthesis and composition in dairy cattle 32,33 . Finally, rs110820252, which had significant associations with the spectrometric peaks associated with the alkyl fragment m/z 42.01 and propanoic acid/ propanoic ester m/z 75.044 mapped on BTA28 within 2 kb 5′ to AGT, which is the sole precursor of all angiotensin peptides. Interestingly, the renin-angiotensin system is believed to impact body-fat storage as well as lipid and carbohydrate metabolism 34,35 . Carbohydrate metabolism. A significant association was found between rs110002748 and m/z 117.047, which mapped on BTA8 at ~1 Mb from B4GALT1. This gene encodes an enzyme that participates in glyconjugation and lactose biosynthesis, which occurs exclusively in the mammary gland 36 . An increase in the expression of B4GALT1 was observed in transition milk samples, and is reflected in an increase in lactose biosynthesis during the earlier stages of lactation 37 . The high signal detected on BTA11 (rs41671173) was located on BTA11 at ~0.5 Mb from B3GNT2, which synthesizes a unique structure known as poly-N-acetyllactosamine (polyLacNAc), a linear carbohydrate polymer composed of alternating N-acetylglucosamine and galactose residues 38 . This SNP  explained ~60% of additive genetic variance for m/z 78.001. Finally, the high signals on BTA18 corresponded to the marker rs41867785, which is annotated as an intron variant of PHKB. This gene has been associated with the carbohydrate metabolic process, the generation of precursor metabolites and energy, and energy reserve 39 .  Correlations among VOCs based on SNP additive effects. A greater level of detail concerning the shared genomic basis of cheese VOCs might form the basis for more accurate prediction models to be developed in the context of genomic selection for possible modulation of cheese flavour. In a previous work, we estimated the genetic relationships among cheese VOCs based on pedigree information 11 . Here, we used ExpressionCorrelation to calculate pairwise correlations between VOCs based on the SNP additive genetic effects, and we clearly identified groups of VOCs sharing a common behaviour. Having tentatively identified some compounds, we sought to associate the largest sub-networks to biochemical pathways and possibly associated flavour notes. Sub-network 1 contained mostly ketones and aldehydes and might, therefore, represent catabolism of amino acids and fatty acids. Branched-chain aldehydes originate from AA degradation, in particular 2-methylbutanal from isoleucine and 3-methylbutanal from leucine 40 , while ketones can be produced from β-ketoacids derived from β-oxidation of fatty acids 41 . Green/fruity/floral notes are mostly associated with the compounds included in this group 20,40,42 .
The reaction between free fatty acids and alcohols from lactose and AA degradation yield esters 43 , common cheese VOCs, and this pathway might be represented in sub-network 3, including the spectrometric peaks associated with hexanoic acid, ethyl hexanoate/octanoic acid and ethyl butanoate/ethyl-2-methylpropanoate (ethyl isobutyrate). Most esters (e.g. ethyl butanoate, ethyl hexanoate, ethyl-2-methylpropanoate) are associated with the sweet, fruity and floral characteristics of cheese [44][45][46] . Finally, sub-network 4 might represent the glycolysis pathway, and, in particular, lactate or citrate metabolism, since it included the spectrometric peaks associated with the acetate ester fragment/acetic acid, 3-hydroxy-2-butanone(acetoin) and butanoic acid. Lactose is metabolised by starter bacteria, mostly through the glycolytic pathway, into lactate, which might be further metabolised into acetate by lactococci or into butyrate by Clostridium sp. 47 . Acetate is also the main flavour compound originating from citrate metabolism as well as acetoin 47,48 . Cheesy, rancid and sour milk notes are associated with 3-hydroxy-2-butanone(acetoin) and butanoic acid 45,49 , while acetic acid has a typical vinegar odour 50 .
Pathway analysis. Standard GWAS analysis allows individual loci and genes likely to play a role in controlling the investigated traits. However, it lacks the power to establish whether the detected genes act in cooperation as part of a complex network to control specific biological functions. We therefore carried out pathway analyses to prioritize genes in associated loci that are part of the biological pathways and processes potentially contributing to the cheese volatilome. These pathway analyses confirmed the importance of proteolysis and amino acid metabolism for the formation of cheese VOCs (i.e. nitrogen and purine metabolism). Phenol in cheese originates from the metabolism of protein (casein) and, in particular, from the catabolism of tyrosine 3 . Besides sugar and fat metabolism, amino acid metabolism also provides substrates for ester formation, which might explain the enrichment of nitrogen metabolism for the spectrometric peaks associated with ethyl pentanoate (ethyl valerate)/ethyl-2-methylbutanoate/ ethyl-3-methylbutanoate(ethyl isovalerate)/heptanoic acid m/z 132.109. The tight junction pathway was enriched for the spectrometric peaks associated with heptan-2-one and ethyl pentanoate (ethyl valerate)-ethyl-2-methylbutanoate-ethyl-3-methylbutanoate (ethyl isovalerate)-heptanoic acid. In the mammary gland, the tight junction (TJ) state is closely linked to milk secretion 51 , as they are involved in the transcellular transport of lactose and K+ to the extracellular fluid, while Na+ and Cl− are transported to the milk 52 . TJ integrity is compromised during mammary involution and also as a result of mastitis and periods of mammary inflammation 53 . Among the genes identified within this pathway, we found three protein kinase C (PKC) family members: alpha (PRKCA), beta (PRKCB) and epsilon (PRKCE). Several PKC inhibitors affect both the assembly and disassembly of TJs, which means that PKCs may regulate the dynamics of TJ formation 54 . Interestingly, this pathway was enriched for the energy of the curd as a percentage of the energy of the milk processed, which is an indicator of cheese-making efficiency 55 . Finally, enrichment of the long-term potentiation pathway for the spectrometric peaks associated with two ketones, octan-1-one and nonan-2-one, might be connected to their biosynthetic pathway, which is related to fatty acid metabolism; indeed, this pathway was significantly overrepresented in a recent GWAS and pathway-based analysis of milk fatty acids in dairy cows 16 . Moreover, this pathway contained several genes coding for glutamate ionotropic receptors (GRI), including GRIA1; it is of note that previous findings assigned to this gene a significant SNP for C14:1 15 .
In our study, we exploited the potential of PTR-ToF-MS to provide detailed spectral information to characterise food quality and authentication, and this was integrated with the genomic and biological information provided by GWAS and pathway analyses. Results obtained increase our understanding of the metabolic pathways and biological functions likely involved in the formation of cheese VOCs, providing unprecedented insights into the potential contribution of the cow's genes to cheese flavour. A more effective approach might be to more accurately identify compounds using PTR-MS and to improve the quality of cattle genome annotations.

Methods
Ethics statement. The cows in the current study belonged to commercial private herds and were not subjected to any invasive procedures. Milk and blood samples were previously collected during routine milk recording coordinated by technicians from the Breeders' Association of Trento Province (Italy), hence certified by the local authority.
Phenotypes and genotypes. Individual milk samples were collected from 1,075 Italian Brown Swiss cows from 72 commercial herds located in the Alpine province of Trento (Italy). Details of the animals used in this study and the characteristics of the area are reported in Cipolat-Gotet et al. 56 and Cecchinato et al. 57 Gross milk composition was measured using a MilkoScan FT6000 (Foss Electric A/S Hillerød, Denmark). Model cheeses were manufactured from the raw milk of individual cows, as described in detail in Cipolat-Gotet et al. 56 . We used a commercial starter culture at a concentration 8 times higher than recommended in order to reduce the acidification time to 90 min and minimise the role of milk microflora. After ripening (60d), the model cheeses were weighed and analysed for fat and protein contents using a FoodScan apparatus (Foss Electric, Hillerød, Denmark). The headspace gas of each model cheese (n = 1,075) was measured with a commercial PTR-ToF-MS 8000 instrument supplied by Ionicon Analytik GmbH, Innsbruck (Austria), as described in detail in Bergamaschi et al. 10 Internal calibration and peak extraction was performed according to the procedure described by Cappellin et al. 58 Absolute headspace VOC concentrations, expressed as parts per billion by volume (ppb v ), were estimated using the formula described by Lindinger et al. 59 Given that the distribution of all spectrometric peaks showed a strong positive skewness, the data were transformed: the fraction of each peak plus one was multiplied by 10 6 and expressed as a natural logarithm to obtain a Gaussian-like data distribution. After filtering out all peaks below a threshold of 1 ppb v and interfering ions, 240 spectrometric peaks remained for the analyses. The fragmentation pattern of 61 relevant compounds, representing 78.0% of the total spectral intensity of the compressed data set without interfering ions, were retrieved from available GC-MS data on the same model cheeses 10 and from the literature [60][61][62] . Isotope removal (r > 0.95, P < 0.001) yielded 173 spectrometric peaks, of which 45 were tentatively associated with VOCs. The Illumina BovineSNP50 v.2 BeadChip (Illumina Inc., San Diego, CA) was used to genotype 1,152 cows (blood samples were not available for all the phenotyped animals). Quality control excluded markers with call rates >95%, with minor allele frequencies >0.5%, and without extreme deviation from Hardy-Weinberg equilibrium (P > 0.001, Bonferroni corrected). After filtering, 1,011 cows and 37,568 SNPs were retained for subsequent analyses.
Genome-wide association study. Genome-wide association analyses (GWAS) were conducted using single-marker regression and the three-step Genome-wide Association using the Mixed Model and Regression-Genomic Control (GRAMMAR-GC) approach 63 implemented in the GenABEL R package 64 . In the first step, an additive polygenic model with a genomic relationship matrix is fitted; secondly, the residuals obtained from this model are regressed on the SNPs to test for associations; in the third step, genomic control corrects for the conservativeness of the procedure 65 . The polygenic model was: where y is a vector of the observed response (milk fat, protein and fat-to-protein ratio; cheese fat and protein; cheese VOCs); β is a vector with the fixed effects of (i) days in milk of the cow (classes of 30 days each), (ii) the parity of each cow (classes of 1, 2, 3, ≥4), and (iii) the herd-date effect (n = 72); X is an incidence matrix connecting each observation to specific levels of the factors in β. The two random terms in the model were the animal and the residuals, which were assumed to be normally distributed as σ a N G (0, ) g 2 and σ e N I (0, ) e 2 , where G is the genomic relationship matrix, I is the identity matrix, σ g 2 is the additive genomic variance and σ e 2 the residual variance. The G matrix was built in GenABEL 64 using identity-by-state coefficients. We adopted a threshold of P < 5 × 10 −5 to declare significant SNPs 66 .
The proportion of genomic variance explained by the SNPs was calculated as 2pqa 2 , where p and q were the allele frequencies and a was the allele substitution effect. Model (1) was also used to estimate the variance components and the genomic heritability of the traits based on the genomic relationship matrix. Heritability was estimated as = 2 . The results of the GWAS analysis without filtering for the P-value threshold were used to build a matrix of row-wise SNPs (n = 37,568) and column-wise phenotypes (i.e. cheese VOCs, n = 173) in which the value in the cell corresponded to the SNP additive effect. This matrix was fed into the ExpressionCorrelation plugin of Cytoscape 67 to create a correlation matrix of pair-wise Pearson correlations between phenotypes based on the effect across all the SNPs included in the analysis. Only the high-confidence correlations with P < 0.01 and >|0.80| were selected. A similarity network was generated by ExpressionCorrelation, where the nodes corresponded to the phenotypes and the edges represented the similarity between vectors of the additive effects of all SNPs. This network was analysed with the ClusterOne plugin of Cytoscape 68 to identify significantly dense clusters of VOCs (Mann-Whitney test, P < 0.05).

Gene-set enrichment and pathway analyses.
Pathway analyses were carried out on the tentatively identified spectrometric peaks (n = 45) to shed light on the biological functions underlying the synthesis and/or metabolism of cheese VOCs. As detailed in Dadousis et al. 55 , the GWAS results were filtered for significance with a P-value < 0.05 to identify "relevant" and "non-relevant" SNPs. Using the BiomaRt R package 69,70 , we assigned "relevant" SNPs to genes if they were located within the gene or within 15 kb up-or down-stream of the gene 71 based on the Ensembl Bos taurus UMD 3.1 assembly. This made it possible to also capture those SNPs that are missed by standard GWAS, due to its stringent significance threshold, but that may help explain the variability in the observed phenotypes, which may play a role in organised pathways or biological functions. The Kyoto Encyclopaedia of Genes and Genomes (KEGG) 72 and the Gene Ontology (GO) databases 73 were used to define the functional categories associated with the gene sets. To avoid testing broad or narrow functional categories, only GO and KEGG terms with >10 and <1000 genes were considered. A Fisher's exact test was used to test for overrepresentation of functional categories (FDR < 0.05). The gene-set enrichment analysis was performed with the R package goseq 74 .