
Evaluation of a novel food composition database that includes glutamine and other amino acids derived from gene sequencing data



To determine the content of glutamine in major food proteins.


We used a validated 131-item food frequency questionnaire (FFQ) to identify the foods that contributed the most to protein intake among 70 356 women in the Nurses’ Health Study (NHS, 1984). The content of glutamine and other amino acids in foods was calculated based on protein fractions generated from gene sequencing methods (Swiss Institute of Bioinformatics) and compared with data from conventional (USDA) and modified biochemical (Kuhn) methods. Pearson correlation coefficients were used to compare the participants’ dietary intakes of amino acids by sequencing and USDA methods.


The glutamine content varied from 0.01 to 9.49 g/100 g of food and contributed from 1 to 33% of total protein for all FFQ foods with protein. When comparing the sequencing and Kuhn's methods, the proportion of glutamine in meat was 4.8 vs 4.4%. Among NHS participants, mean glutamine intake was 6.84 (s.d.=2.19) g/day and correlation coefficients between amino acid intakes assessed by sequencing and USDA methods ranged from 0.94 to 0.99 for absolute intake, −0.08 to 0.90 after adjusting for 100 g of protein, and 0.88 to 0.99 after adjusting for 1000 kcal. The between-person coefficient of variation of energy-adjusted intake of glutamine was 16%.


These data suggest that (1) glutamine content can be estimated from gene sequencing methods and (2) there is a reasonably wide variation in energy-adjusted glutamine intake, allowing for exploration of glutamine consumption and disease.


Introduction

Increasing evidence suggests that dietary glutamine decreases insulin levels (Opara et al., 1996; Borel et al., 1998) and weight gain in animal studies (Opara et al., 1996). Therefore, dietary glutamine intake may affect diabetes risk.

Data on amino acids in foods are obtained primarily by ion-exchange chromatography following acid hydrolysis of proteins (Schegg et al., 1997) and more recently by measurement of protein-bound glutamine (Kuhn et al., 1996, 1999). Drawbacks of biochemical measurements include the conversion of glutamine to glutamate, especially in the acid hydrolysis method. As a result, the content of glutamine in food proteins is not available in nutrient databases and that of glutamate is overestimated.

Given the historical lack of accurate measurement and absence from current nutrient databases, the purposes of this study were (1) to estimate the content of glutamine and glutamate in food using data from sequencing methods, (2) to compare the calculated proportion of glutamine in meat and casein with historical data derived from Kuhn's method (Kuhn et al., 1996, 1999), and (3) to include all 20 proteinogenic amino acids derived from the sequencing methods in the Nurses’ Health Study nutrient database for further comparison with data compiled by the USDA.

Materials and methods

Study sample

The Nurses' Health Study is a prospective cohort study of diet and lifestyle factors in relation to chronic diseases among 121 700 female registered nurses aged 30–55 years at enrollment in 1976. We excluded women who did not satisfy the a priori criteria of reported daily energy intake between 2514 kJ (600 kcal) and 14 665 kJ (3500 kcal), BMI between 15 and 49 kg/m2 (=5 s.d.), available data on protein intake, and no previous diagnosis of diabetes, cardiovascular disease, or cancer. The final baseline population consisted of 70 356 women who were 38–63 years old in 1984. This study was approved by the Institutional Review Board at the Channing Laboratory.
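The inclusion criteria above can be read as a simple filter on each participant record. The following is a minimal sketch only: the function name, the argument names, and the choice of inclusive bounds are assumptions made for illustration, not details reported by the study.

```python
def is_eligible(energy_kcal, bmi, has_protein_data, prior_disease):
    """Return True if a participant meets the a priori criteria.

    Assumptions (not from the paper): bounds are treated as inclusive,
    and prior_disease is True for any earlier diagnosis of diabetes,
    cardiovascular disease, or cancer.
    """
    plausible_energy = 600 <= energy_kcal <= 3500  # kcal/day
    plausible_bmi = 15 <= bmi <= 49                # kg/m2
    return plausible_energy and plausible_bmi and has_protein_data and not prior_disease


# Hypothetical records, for illustration only.
print(is_eligible(2000, 24.0, True, False))  # within all limits
print(is_eligible(500, 24.0, True, False))   # energy intake too low
```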

Assessment of diet

In 1976, women completed questionnaires on their medical history and lifestyle. A 131-item food frequency questionnaire (FFQ) was first completed in 1984 and updated in 1986, 1990, 1994, 1998, and 2002. Nutrient intakes were computed from the reported frequency of consumption of each specified unit of food or beverage and from published data on the nutrient content of the specified portions. We used data on amino acids from The Food Composition Handbook 8 series (1976–1992) and serial releases 10, 14, and 16 published by the USDA (2006). This FFQ has been previously evaluated for validity of a variety of nutrients including protein intake (Willett et al., 1985).
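The intake computation described above (reported frequency of each food multiplied by the nutrient content of its specified portion, summed over foods) can be sketched as follows. The food names, frequencies, and protein values are hypothetical placeholders, not values from the NHS nutrient database.

```python
# Hypothetical FFQ responses for one participant: servings per day.
servings_per_day = {"milk": 1.0, "beef": 0.43, "egg": 0.14}

# Hypothetical nutrient table: grams of protein per specified portion.
protein_g_per_serving = {"milk": 8.0, "beef": 22.0, "egg": 6.0}


def daily_intake(servings, nutrient_per_serving):
    """Sum (frequency x nutrient content per portion) over all reported foods."""
    return sum(freq * nutrient_per_serving[food] for food, freq in servings.items())


protein_g_per_day = daily_intake(servings_per_day, protein_g_per_serving)
print(round(protein_g_per_day, 2))
```

The same function would apply to any nutrient, including an individual amino acid, once a per-portion table for that nutrient is available.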

Amino acid data derived from USDA publications

Amino acid values in food items with more than one protein-containing ingredient were derived from the amino acid composition of the various protein-containing ingredients. These USDA amino acid values were included in the protein-containing food items and ingredients of the NHS nutrient database. The amino acids available in the USDA nutrient database include alanine, arginine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine. Cystine, an oxidized dimeric form of cysteine, is available rather than cysteine. Glutamine and asparagine are absent, whereas glutamate and aspartate are overestimated.

Data derived from the EXPASY Web site

Data on the sequence of thousands of proteins from a variety of organisms are compiled by the Swiss Institute of Bioinformatics (Bairoch and Apweiler, 2000). These data are available on the internet from a server in Switzerland and from mirror sites, such as one in Canada. SWISS-PROT is an annotated protein sequence database created at the Department of Medical Biochemistry of the University of Geneva in collaboration with the European Molecular Biology Laboratory. SWISS-PROT is the most complete database on gene expression of amino acids in proteins on the Web that applies rigorous scientific methods including internal and external expertise.

Amino acid values derived from the EXPASY Web site

We calculated the proteinogenic amino acid content of food proteins consumed in the Nurses’ Health Study nutrient database based on (1) the identification of protein fractions of food proteins reported in the literature, (2) the identification of the amino acid composition of protein fractions using the EXPASY protein sequence database server, (3) the weighted sum of the amino acid content of protein fractions in food proteins, and (4) the incorporation of the amino acid content of food proteins in recipes used to create the NHS nutrient database. The amino acid content of food protein fractions was calculated using the entire sequence of the protein fraction examined. We decided to exclude the terminal residues from the amino acid calculations as terminal amino acids have a short lifetime (Bachmair et al., 1986), even though terminal amino acids have little impact on the final composition of proteins given their small contribution to the total sequence.

The formula to calculate the content of amino acids in food protein derived from sequencing data was described in earlier studies (Lacey and Wilmore, 1990; Swails et al., 1992). As an example, the glutamine content of β-casein in milk was calculated as: [% β-casein × number of GLN residues in sequence × molecular weight of GLN (g/mole)]/molecular weight of β-casein (g/mole) = (23.4% × 20 × 146.15)/23583.2 = 2.9%. The same procedure was repeated for each of the 20 amino acids in each of the known protein fractions of a food.
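The β-casein example can be reproduced with a few lines of code. This is a minimal sketch using only the four numbers quoted above; the function and parameter names are illustrative, and it assumes that 23.4% is the share of β-casein in total milk protein.

```python
def amino_acid_percent(fraction_percent, residue_count, aa_mol_weight, protein_mol_weight):
    """Amino acid contributed by one protein fraction, in g per 100 g of food protein.

    fraction_percent   : share of the fraction in the food protein (%)
    residue_count      : number of residues of the amino acid in the sequence
    aa_mol_weight      : molecular weight of the amino acid (g/mole)
    protein_mol_weight : molecular weight of the protein fraction (g/mole)
    """
    return fraction_percent * residue_count * aa_mol_weight / protein_mol_weight


# Values quoted in the text for glutamine in beta-casein.
gln_from_beta_casein = amino_acid_percent(23.4, 20, 146.15, 23583.2)
print(round(gln_from_beta_casein, 1))  # 2.9
```

Summing this quantity over all known protein fractions of a food would give the weighted total for that amino acid, as in step (3) of the procedure.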

The sum of all protein fractions derived from the Expasy Web site contributed 90–98% of the total protein weight of all foods. The 28 foods for which the amino acid content was derived from the Expasy Web site included barley, beef, brown rice, casein, cocoa, coffee, corn, egg yolk, egg white, egg, kidney bean, lentil, milk, oat, oat bran, pea, peanut, potato, sesame, soybean, sweet potatoes, tomato, walnut, wheat, wheat bran, wheat gluten, whey, and white rice. The site was last accessed on 26 February 2004.


Given the low amount of protein and undetectable amounts of measured glutamate in food items such as fruits and vegetables, as well as the lack of sequences in the EXPASY Web site for these plant proteins, the amino acid content of these foods was based on one plant protein (soy). Several assumptions were necessary to calculate the amino acid composition of food proteins. For example, the sequence of chicken serum albumin was available in the EXPASY Web server. The sequence of chicken serum albumin, which is identical to α-livetin, was used to represent the category of livetins. Livetins correspond to about 7% of yolk solids. We also needed to make other assumptions concerning the proportions of food protein in protein fractions. For example, 22 proteins had been sequenced as wheat zeins in SWISS-PROT; we assumed that each of them contributed equally to the total amount of zeins. We also lacked information on the sequence of proteins such as enzymes. However, the contribution of these proteins to the total weight of amino acids in food proteins is probably negligible for nutritional purposes as they are found in very small proportions.
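The equal-contribution assumption for a class of proteins with several sequenced members can be illustrated as an unweighted mean over the members. The sketch below is a simplification: it averages residue-count fractions rather than mass fractions, and the three short sequences are invented toy strings in one-letter code, not real SWISS-PROT entries.

```python
from collections import Counter

# Hypothetical toy sequences (one-letter amino acid code, Q = glutamine).
toy_sequences = ["MQQLPQ", "MAQLQA", "MQAAAL"]


def residue_fraction(sequence, residue):
    """Fraction of positions in the sequence occupied by the given residue."""
    return Counter(sequence)[residue] / len(sequence)


def equal_weight_mean(sequences, residue):
    """Average the per-sequence fractions, giving every sequence the same weight."""
    fractions = [residue_fraction(seq, residue) for seq in sequences]
    return sum(fractions) / len(fractions)


mean_q = equal_weight_mean(toy_sequences, "Q")
print(round(mean_q, 3))
```

If the relative abundance of each member were known, the unweighted mean could be replaced by a weighted one; equal weights are simply the fallback when no such data exist.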

Milk protein is presented as an example to show how we used protein fractions to calculate total protein. The protein fractions in milk include caseins such as α-casein, β-casein, κ-casein, and γ-casein as well as β-lactoglobulin, α-lactalbumin, immunoglobulins (IGGs, IGM, IGA, FSC), albumin, and proteose-peptone. Together, these protein fractions contribute nearly 98% of the total weight of milk protein. As the weight of the remaining thousands of proteins that contribute to milk protein is proportionally small (2%), the contribution of these proteins to the amino acid content of milk protein is probably negligible for dietary purposes. Likewise, free amino acids found in foods were not included in the final calculation of amino acids in food proteins because of their short half-life and small contribution to total nitrogen in food (Schloerb et al., 2002). For example, free amino acids contribute <2% of the total protein in yolk solids (Osuga and Feeney, 1977) and breast milk (Wu et al., 2000). Finally, we used similar food proteins for missing food items. For example, beef amino acids were used for all muscle proteins.

Statistical methods

Values for each amino acid percentage (g/100 g protein) were calculated based on gene expression of protein fractions from the EXPASY Web site. The proportion of glutamine in meat and casein protein was further compared with historical data measured using Kuhn's modified biochemical technique (Kuhn et al., 1996, 1999; Baxter et al., 2004). Group means and standard deviations of amino acids intake were calculated from FFQs (g/day). Participants’ amino acid intakes (g/day) were calculated for 100 g protein and for 1000 kcal. We used the Pearson correlation coefficients to compare the participants’ dietary intakes of amino acids using the sequencing and USDA methods (1994), as they were similar to Spearman correlations (data not shown). Finally, the between-person coefficient of variation (CV) of energy-adjusted and nonadjusted amino acid intake was calculated based on the adjusted or nonadjusted amino acid standard deviation divided by its mean among the NHS participants.
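The three quantities used in this section (nutrient-density adjustment, Pearson correlation between methods, and the between-person coefficient of variation) can be sketched with the standard library alone. The intake and energy values below are hypothetical and serve only to exercise the functions; the use of the sample standard deviation is also an assumption.

```python
import math
import statistics


def per_1000_kcal(intake_g, energy_kcal):
    """Energy-adjusted intake: grams per 1000 kcal for each participant."""
    return [1000 * g / e for g, e in zip(intake_g, energy_kcal)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_variation(values):
    """Between-person CV in percent: standard deviation divided by the mean."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)


# Hypothetical intakes of one amino acid (g/day) by the two methods.
intake_sequencing = [4.1, 5.3, 3.8, 6.0, 4.9]
intake_usda = [4.0, 5.5, 3.6, 6.2, 4.7]
energy = [1600.0, 2100.0, 1500.0, 2400.0, 1900.0]  # kcal/day

r_crude = pearson(intake_sequencing, intake_usda)
r_energy_adjusted = pearson(per_1000_kcal(intake_sequencing, energy),
                            per_1000_kcal(intake_usda, energy))
cv_energy_adjusted = coefficient_of_variation(per_1000_kcal(intake_sequencing, energy))
```

Adjustment per 100 g of protein follows the same pattern as `per_1000_kcal`, with total protein intake in the denominator and a factor of 100.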

The amino acid content of food proteins derived from the EXPASY Web site was calculated using STATA 7.0. The comparisons between amino acid consumption of women assessed by the gene sequencing method and by the conventional biochemical methods were performed using SAS software (version 8; SAS Institute Inc, Cary, NC, USA). All P-values were two-sided.


Results

Glutamine content varied from 0.01 to 9.49 g/100 g of food (from apple juice to wheat germ) and contributed from 1 to 33% (g/100 g) of protein (from kidney bean to whole grain bread). Using data from the sequencing compared with Kuhn's modified biochemical method (Table 1), the amount of glutamine was similar for meat protein (4.8 vs 4.4%) and casein-based formula protein (8.7 vs 9.2%).

Table 1 Glutamine in meat protein and casein-based formula protein (g/100 g protein)

Table 2 shows data on the amino acid composition of selected foods that may play a role in chronic diseases. Using the gene sequencing method, the percentage of glutamine was 4.4% for egg, 4.8% for beef, 8.1% for milk, 9.1% for tofu (soy), 11.1% for white rice, and 16.2% for corn protein. Thus, the total amount of glutamine per 100 g among those six foods increased from 0.28 g in milk protein to 1.23 g in beef protein with intermediate values for white rice protein, corn protein, egg protein, and soy protein (Table 2).

Table 2 Total protein and composition in amino acid (g/100 g food) of selected foods derived from the conventional method (USDA) and from the gene sequencing method (Study)

As shown in other studies, the largest contribution to food protein intake in the Nurses’ Health Study came from animal sources (>70%). Among the top 30 food proteins contributing to protein intake, beef contributed to 15% of total food protein intake among these women and clam chowder contributed to 0.7%. The amount of dietary protein in the top 30 food items varied from 0.7 g/100 g in orange juice to 28.9 g/100 g in chicken without skin. Using the gene sequencing method, the 20 amino acids in the top 30 food proteins contributed by definition to 100% (g/100 g protein) of each of the food proteins. However, the percentage of protein for the 18 amino acids available in the USDA database excluding asparagine and glutamine was already at or above 100% for 8 of the 30 food item proteins listed in Table 3 (American cheese, cottage cheese, whole grain, mashed potato, yogurt, shrimps, English muffin, and peanut butter). Although large differences were observed in the contribution of amino acid data to specific protein foods using conventional methods, the correlation coefficients among specific foods for amino acid composition in 100 g of food were high.

Table 3 Characteristics of the top 30 commonly consumed food proteins among 70 356 study participants including the correlation coefficients for the composition of amino acids (AA in g/100 g food) of each food between methods

Among women in the NHS, the consumption of amino acids (g/day) was comparable between methods (Table 4). The intake of cystine based on USDA food composition data was strongly correlated with that of cysteine calculated from gene sequencing data (0.98). After excluding asparagine, aspartate, glutamine, and glutamate, the correlation coefficients for the 16 protein-adjusted (g/100 g protein) proteinogenic amino acids assessed by the two methods ranged from −0.08 for tryptophan and serine to 0.90 for arginine. After adjusting for energy (g/1000 kcal), the correlation coefficients between the two methods were higher, varying from 0.88 for tryptophan to 0.99 for arginine (Table 4). Using the sequencing method, the between-person CV for intake of the 20 amino acids ranged from 31 to 34% for crude intake, 4–14% after adjusting for protein, and 16–25% after adjusting for 1000 kcal. Glutamine had a CV of 32% for crude intake, 14% for intake per 100 g of protein, and 16% for intake per 1000 kcal.

Table 4 Mean and Pearson correlation coefficients for the AA intake of 70 356 participants comparing gene sequencing and conventional methods


Discussion

The main finding of this study is that the sequencing and modified biochemical methods provide similar proportions of glutamine in proteins from meat and casein-based formulas, within the measurement error of Kuhn's method. Although modified biochemical methods may be accurate, they are expensive, require specialized laboratories, and do not provide information on most amino acids. Thus, a method to estimate the amino acid content of food protein that may be cheaper than and as accurate as a complex biochemical method would be a great advantage.

The gene sequencing technique was extended to the composition of the other proteinogenic amino acids in food proteins. After exclusion of glutamine, glutamate, asparagine, and aspartate, which are absent from the USDA food composition database or overestimated, the participants’ daily intake of amino acids assessed by the two methods was highly correlated before and after adjusting for energy intake. However, after adjusting for 100 g of total protein intake, a large change in the correlation coefficients was observed for some of the amino acids, which may be explained in part by several factors.

First, the amino acid data from the conventional method contributed from 43 to 111% of proteins (g/100 g) for 18 amino acids whereas that from the sequencing method contributed by definition to 100% (Table 3) for all 20 amino acids. Second, adjusting for total protein provides an estimate of the composition of food proteins and reduces the between-person variability of the data, which will also tend to reduce correlation coefficients. Third, measurements of amino acids may vary by type of food or laboratory.

Although the FAO/WHO consultation in 1989 reported an inter-laboratory error of 10% for amino acid analysis (Joint FAO/WHO Expert Consultation, 1989), results from a 1996 multi-laboratory study comparing amino acid measurements in one protein showed a wide range of error from 4.0 to 58.9% for proteinogenic amino acids excluding measurements of asparagine, glutamine, cystine, and tryptophan (Schegg et al., 1997). The assessment of amino acids in a few foods including casein, soy, pea flour, whole-wheat flour, egg white solids, minced beef, and rapeseed concentrate showed that the between-laboratory CVs were better for isoleucine, leucine, lysine, phenylalanine, threonine, and valine (CV <10%) than for cystine, methionine, and tryptophan (CV=10–20%) (Friedman, 1996). In that study, the between-laboratory CVs were also better for casein, soy, and minced beef (CV <7%) compared with the other foods analyzed (CV 10–24%). Overall, these data show that conventional biochemical measurements vary substantially not only between laboratories, but also with type of food and amino acid.

Finally, although the data compiled by the USDA include cooked and processed foods when available, those derived from gene sequencing do not. During processing of foods, protein sources may be treated with heat, oxidizing agents, organic solvents, alkalis, and acids. These treatments may lead to the formation of multiple compounds that result in lower amino acid availability and protein quality. Although decreased amino acid availability is of concern, studies involving the ingestion of 15N-labeled dietary proteins show that the true ileal digestibility of a number of protein sources, including milk, cereals, and legumes, is >90% and that it varies only minimally among the common sources of dietary protein (Reeds and Garlick, 2003). On the basis of that review and the inadequacy of available scoring patterns (Sarwar, 1997), we did not use a factor to account for amino acid availability or protein quality.

Results from the conventional method confirm official recommendations that biochemical methods other than the traditional acid hydrolysis of peptide bonds and separation by chromatography be used. Such methods would allow for a more precise identification of glutamine, asparagine, and tryptophan, which are typically destroyed, and also to some extent of the sulfur amino acids such as methionine, cysteine, or cystine (Schegg et al., 1997). The strength of the gene sequencing method is that the same methodology was applied for all 20 amino acids derived from sequencing data available in the EXPASY Web site.


Limitations

The use of the gene sequencing method requires several assumptions to calculate the amount of amino acids in food proteins. Although a series of assumptions are made for both the biochemical and gene sequencing methods, those used for the gene sequencing method are made without discrimination by amino acid, as all values are calculated from the same sequence rather than by separate biochemical methods. Future updates using more recent values of amino acids compiled by the USDA and additional data on gene sequences to minimize assumptions might improve the correlation between the two methods. We did not account for cooked and processed foods, digestibility, or quality of protein, as valid data are not available. Although new scoring systems to account for food processing may improve the correlation between the methods, the contribution to actual protein absorption of specific food proteins is expected to be small.


Conclusion

These data suggest that the glutamine content of food protein can be estimated from gene sequencing methods. Furthermore, there is a reasonably wide variation in glutamine intake, allowing for examination of glutamine consumption and disease risk after adjustment for energy intake.

Conflict of interest

The authors declare no conflict of interest.


References

  1. Bachmair A, Finley D, Varshavsky A (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179–186.

  2. Bairoch A, Apweiler R (2000). SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28, 45–48.

  3. Baxter JH, Phillips RR, Dowlati L, Johns PW (2004). Glutamine in commercial liquid nutritional products. J Agric Food Chem 52, 4963–4968.

  4. Borel MJ, Williams PE, Jabbour K, Levenhagen D, Kaiser E, Flakoll PJ (1998). Parenteral glutamine infusion alters insulin-mediated glucose metabolism. JPEN 22, 280–285.

  5. Composition of Foods Loose-leaf Agriculture Handbook No 8 Series (1976–1992). Agricultural Research Service. United States Department of Agriculture (USDA). US Government Printing Office. Washington DC.

  6. Friedman M (1996). Nutrition. In: Nakai S, Modler W (eds). Food Proteins: Properties and Characterization, Chapter 6, Wiley-VCH: New York. pp 281–298.

  7. Kuhn KS, Schumann K, Stehle P, Darmaun D, Furst P (1999). Determination of glutamine in muscle protein facilitates accurate assessment of proteolysis and de novo synthesis-derived endogenous glutamine production. Am J Clin Nutr 70, 484–489.

  8. Kuhn KS, Stehle P, Furst P (1996). Quantitative analysis of glutamine in peptides and proteins. J Agric Food Chem 44, 1808–1811.

  9. Lacey JM, Wilmore DW (1990). Is glutamine a conditionally essential amino acid? Nutr Rev 48, 297–309.

  10. Opara EC, Tevrizian A, Feinglos MN, Surwit RS (1996). L-glutamine supplementation of a high fat diet reduces body weight and attenuates hyperglycemia and hyperinsulinemia in C57BL/6J mice. J Nutr 126, 273–279.

  11. Osuga DT, Feeney RE (1977). Egg proteins. In: Whitaker JR, Tannenbaum SR (eds). Food Proteins. AVI Pub. Co., Inc.: Westport, CT. pp 209–221.

  12. Joint FAO/WHO Expert Consultation (1989). Protein quality evaluation: report. Bethesda, MD, USA, 4–8 December. Accessed on 1 December 2008 at:

  13. Reeds PJ, Garlick PJ (2003). Protein and amino acid requirements and the composition of complementary foods. J Nutr 133 (Suppl 9), 2953–2961.

  14. Sarwar G (1997). The protein digestibility-corrected amino acid score method overestimates quality of proteins containing antinutritional factors and of poorly digestible proteins supplemented with limiting amino acids in rats. J Nutr 127, 758–764.

  15. Schegg KM, Denslow ND, Andersen TT, Bao YA, Cohen SA, Mahrenholz AM et al. (1997). Quantitation and identification of proteins by amino acid analysis. In: Marshak DR (ed). Techniques in Protein Chemistry VIII. Academic Press: San Diego. pp 207–216. For more information, please see accessed on 1 December 2008.

  16. Schloerb PR, Cook LT, Hall TJ (2002). Digestive release of glutamine from enteral wheat protein hydrolysate. Am J Clin Nutr 75 (Suppl 2), 403 (abstract).

  17. Swails WS, Bell SJ, Borlase BC, Forse RA, Blackburn GL (1992). Glutamine content of whole proteins: implications for enteral formulas. Nutr Clin Pract 7, 77–80. Erratum in: Nutr Clin Pract 1992;7(3):133–134.

  18. US Department of Agriculture, Agricultural Research Service (2006). USDA. Nutrient Database for Standard Reference, Release 1–16. Nutrient Data Laboratory. For more information, please see and on 1 December 2008.

  19. Willett W, Sampson L, Stampfer MJ, Rosner B, Bain C, Witschi J et al. (1985). Reproducibility and validity of a semi quantitative food frequency questionnaire. Am J Epidemiol 122, 51–65.

  20. Wu ZC, Chijang CC, Lau BH, Hwang B, Sugawara M, Idota T (2000). Crude protein content and amino acid composition in Taiwanese human milk. J Nutr Sci Vitaminol (Tokyo) 46, 246–251.


Acknowledgements

We are indebted to the nurses for their continuous participation in the Nurses’ Health Study. We particularly thank the Swiss Institute of Technology for providing free access to their Web site, the Ross laboratory for teaching CL how to navigate on the Expasy Web site, and Martin Van Denburgh for programming support. Supported by 5T32DK007703-07 and K23 DK082732.




Corresponding author

Correspondence to C M Lenders.


Lenders, C., Liu, S., Wilmore, D. et al. Evaluation of a novel food composition database that includes glutamine and other amino acids derived from gene sequencing data. Eur J Clin Nutr 63, 1433–1439 (2009).



  • food composition
  • gene sequencing
  • amino acids
  • database
  • glutamine
  • food frequency
