This paper describes the ad hoc methodological concepts and procedures developed to improve the comparability of nutrient databases (NDBs) across the 10 European countries participating in the European Prospective Investigation into Cancer and Nutrition (EPIC). This was required because no European reference NDB is currently available.
A large network involving national compilers, nutritionists and experts in food chemistry and computer science was set up for the ‘EPIC Nutrient DataBase’ (ENDB) project. Between 550 and 1500 foods per country, derived from about 37 000 standardized EPIC 24-h dietary recalls (24-HDRs), were matched as closely as possible to foods available in the 10 national NDBs. The resulting national data sets (NDSs) were then successively documented, standardized and evaluated according to common guidelines, using a DataBase Management System specifically designed for this project. The nutrient values of foods unavailable or not readily available in the NDSs were approximated by recipe calculation, weighted averaging or adjustment for weight changes and vitamin/mineral losses, using common algorithms.
The final ENDB contains about 550–1500 foods, depending on the country, and 26 common components. As far as possible, each component value was documented and standardized for unit, mode of expression, definition and chemical method of analysis. Furthermore, the overall completeness of the NDSs was improved (⩾99%), particularly for β-carotene and vitamin E.
The ENDB constitutes a first real attempt to improve the comparability of NDBs across European countries. This methodological work will provide a useful tool for nutritional research as well as end-user recommendations to improve NDBs in the future.
The emergence of large multicentre nutritional studies, in response to the need for more concerted and harmonized diet-related public health actions in the European Union, has raised major methodological challenges for the standardization of dietary assessments across different study populations and geographical regions, both at the food and nutrient levels (Kohlmeier, 1991; Hautvast et al., 1993; Leclercq et al., 2001; Lagiou and Trichopoulou, 2001; Brussaard et al., 2002). Nutrient databases (NDBs), required to calculate nutrient intakes from food consumption data, are important potential sources of random and systematic dietary measurement errors (Bingham, 1987; Cameron and van Staveren, 1988; Greenfield and Southgate, 1992, 2003; Nieman et al., 1992; Guilland et al., 1993; Lee et al., 1995; Garcia et al., 2003; Hakala et al., 2003; Kim et al., 2003; Vaask et al., 2004). These errors, which will be detailed in this paper, can affect individual and population means as well as the distribution of nutrient intakes, and subsequently distort diet–disease associations. This methodological problem is further amplified when different NDBs with great variability in structure and content are to be used for pooled nutrient analyses on an international scale (Kohlmeier, 1991; Slimani et al., 1995; Deharveng et al., 1999; Puwastien, 2002).
A number of international and regional initiatives to improve the harmonization of analytical laboratory methods and of the definition and mode of expression of nutrients have been promoted over the past decades (Chatfield, 1949; Polacchi, 1986; Klensin et al., 1989; Truswell et al., 1991; Greenfield and Southgate, 1992, 2003; Scrimshaw, 1997). The INFOODS organization proposed systems to describe foods and food components to facilitate international food data interchange (Klensin et al., 1989, 1992; Truswell et al., 1991). In Europe, this effort was followed by other regional initiatives and proposals to improve data interchange (Møller, 1992; Schlotke, 1996; Unwin and Becker, 1996) from European networks such as EUROFOODS COST99 and NORFOODS. More recently, an EU concerted action, ‘European Food Consumption Survey Method’ (EFCOSUM), proposed recommendations to harmonize the methodology for monitoring national nutritional surveys in Europe, including the use of NDBs (Brussaard et al., 2002).
However, despite these major contributions, which were principally oriented towards the harmonization of data management and interchange (or towards general recommendations), no standardized European food composition database is yet available. Only recently has a Network of Excellence project on ‘European Food Information Resources’ (EuroFIR) been funded by the European Union (http://www.eurofir.net). The main purpose of this project is to provide comparable nutrient and other food component databases (e.g. bioactive components) across more than 20 participating European countries. However, the final deliverables of this initiative are not expected before 2008–2010.
In the absence of a standardized European NDB for nutritional epidemiology, and as a prerequisite for pooled diet–disease analyses, the European Prospective Investigation into Cancer and Nutrition (EPIC) Nutrient DataBase (ENDB) was developed to harmonize NDBs across the 10 European countries participating in EPIC. It contains a first set of 26 priority nutrients and related components for between 550 and 1500 foods per country (about 10 000 foods in total). This paper describes the methodological concepts and approaches developed to build the ENDB for epidemiological research, using EPIC as a real study context. Furthermore, this project provides qualitative and quantitative insights into the strengths and weaknesses of standardizing NDBs across countries, as well as recommendations to improve them in the future. Indeed, the ENDB is considered an important precedent in standardizing NDBs and is currently used as a prototype in EuroFIR for developing long-term procedures to handle NDBs at the European level.
Rationale for the development of the ENDB
EPIC is one of the largest cohort studies on diet and cancer worldwide. It involves over 520 000 subjects from 23 centres in 10 European countries (Denmark, Sweden, Norway, Germany, the Netherlands, UK, France, Italy, Spain, Greece) (Riboli et al., 2002). Dietary information was collected at baseline from all subjects using different dietary instruments developed and validated locally (Margetts et al., 1997). A calibration approach was implemented (Kaaks et al., 1994; Slimani et al., 2002) to pool dietary information and relate it to different disease outcomes. This involved collecting a single 24-h dietary recall (24-HDR), as the reference calibration measurement, using a highly standardized computerized program (EPIC-SOFT), from a representative sample of each centre (n=37 000) (Slimani et al., 1999, 2000a). Different EPIC papers on food intakes and cancers (Bingham et al., 2003; Norat et al., 2005; Van Gils et al., 2005; Engeset et al., 2006) and other outcomes (Sabaté et al., 2006, in preparation) have already reported results using the standardized EPIC-SOFT 24-HDRs both to correct systematic errors in the baseline dietary questionnaires and de-attenuate the relative risk estimates at the food (sub) group level (Kaaks et al., 1995). However, so far the use of the calibration approach at the nutrient level has not been possible owing to the lack of standardized reference NDBs. The first objective for developing the ENDB was to provide a standardized reference instrument for calibrating the EPIC dietary measurements at the nutrient level. These NDBs can, however, be used in other study contexts or for other research purposes.
Pre-evaluation of the ENDB project and theoretical concepts of standardization
With limited resources available for this project, it was concluded that generating new analytical composition data was not feasible; instead, the data readily available in the 10 national databases were used in the most cost-effective way. The actual development of the ENDB was preceded by a series of methodological steps, including an extensive review of the comparability of the NDBs available in the countries participating in EPIC (Deharveng et al., 1999). The general theoretical concept developed to standardize NDBs in the EPIC setting and an interim report have been published previously (Slimani et al., 2000b; Charrondière et al., 2002). This preliminary work served to evaluate the nature and magnitude of the methodological problems of standardizing selected European NDBs, and to propose the approaches described in this paper to overcome or better control them.
Structure of the ENDB project and partners involved
Several working groups, involving about 30 partners, were created to handle specific tasks within the ENDB project. The ‘Task Force Group’ was in charge of preparing reference guidelines for standardizing the ENDB across countries, whereas the ‘Computer Group’ supervised the development of the DataBase Management System (DBMS) (EnMan, EPIC Nutrient Manager) used in the project. The ‘Compiler Group’ involved selected national compilers in charge of documenting, compiling and evaluating their own national data sets (NDSs), that is, the subset of the national database matching the EPIC foods reported by study subjects in the 24-HDRs. The ‘EPIC Collaborator Group’, involving mainly experienced nutritionists from EPIC, worked on food matching and the preparation of country-specific recipe files. An ‘Expert Group’ of experts in food chemistry and computer science was in charge of revising, commenting on and validating the documents prepared by the ‘Task Force’ and ‘Computer’ groups.
Database management system (EnMan)
A DBMS (called EnMan) was developed from existing software, the Food Table Input (FTI) program (Unwin and Becker, 2002) to support the documentation, standardization, calculation, evaluation and export of the 10 NDSs and ENDB. This software offers facilities for compiling and documenting NDBs, complying with the recommendations for data exchange and management developed by the COST99/EUROFOODS network (Schlotke et al., 2000) and the ENDB guideline notes for preparing and exporting food composition data. To satisfy the ENDB's specific needs, three EnMan releases were developed for (1) documentation, (2) standardization and evaluation of the NDSs, and (3) building the ENDB. Additional functions to view, copy, edit, calculate and export nutrient data were also implemented in the system. An EnMan version containing the NDS (so-called ‘host database’) was prepared for each country. The other NDSs were also available to the compilers as documented data sources for comparing and evaluating their own national data (e.g. copy values from other documented NDSs for imputing missing values in their own data set).
Main steps for standardizing and building up the ENDB
Figure 1 shows the main steps involved in the construction and standardization of the ENDB. These were, successively: (a) construction of the empty standardized ENDB matrices, that is, the definition of the vertical (food lists) and horizontal (nutrient lists) axes of the matrices; (b) matching of the EPIC foods to those available in national databases or, alternatively, defining how to calculate or adjust them (e.g. recipe calculation, weighted averaging or adjustment for skin, fat, sugar, cooking weight changes and mineral/vitamin losses); (c) documentation of the NDSs; (d) data assessment and standardization of the ENDB component values; (e) evaluation of the standardized NDSs by the national compilers; (f) in parallel, development of country-specific recipe files and common algorithms to calculate or average nutrient values for EPIC foods not available in the NDSs; (g) building of the ENDB by merging foods derived from the NDSs with those generated by calculation (e.g. recipe calculation); (h) evaluation of the ENDB; and (i) export of the ENDB and NDS nutrient values and documentation in a common format.
The main steps in the construction of the ENDB are detailed below. For ease of reading, however, some of the steps listed above and depicted in Figure 1 have been combined in the same section. In addition, the calculation processes and the handling of missing values, which occur in several steps, are addressed separately.
Building up empty standardized ENDB matrices
The extensive inventory of the national databases available in the countries participating in EPIC revealed differences in the availability and completeness of nutrients of interest for EPIC (Deharveng et al., 1999). In addition, important differences in the number, level of detail, and type of food items were reported across NDBs (Slimani et al., 2000b). These differences, which are particularly difficult to correct for, explain to a large extent the difficulty of comparing and standardizing nutrient values across tables. Furthermore, the discrepancy between the foods available in NDBs and those the study subjects actually reported constitutes one of the principal sources of food coding/matching errors, and subsequent mis-estimation of nutrient intakes (Hoover, 1983; Franck et al., 1984). To minimize these major problems, the general standardization approach adopted for the ENDB first entailed building up new empty matrices, using specific features of the EPIC dietary data (Slimani et al., 2000b), to define its vertical (food lists) and horizontal (component lists) axes.
ENDB food lists (vertical axis of the matrices)
The vertical axis (i.e. food lists) of the ENDB matrices consists of aggregated foods from 37 000 standardized computerized 24-HDRs used as reference calibration measurements in EPIC (Slimani et al., 2000b). In contrast to foods reported in the national NDBs, the level of description from EPIC 24-HDRs is highly standardized and detailed, ensuring that the same or similar foods are described with equal levels of detail within and between countries. This approach, adapted from the LANGUAL multifactorial coding system (FDA, 1992; Langual, 2005), involves the use of ‘facets’ and ‘descriptors’ in addition to the food name in order to describe foods. In EPIC-SOFT, food ‘facets’ (e.g. cooking method and preservation method) were used as systematic questions to describe similar foods, whereas the ‘descriptors’ were used as their pre-defined potential answers available to the subjects (e.g. boiled or fried for the ‘cooking method’ facet) (Slimani et al., 2000a).
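As a rough illustration of the facet/descriptor idea, the sketch below represents a food as a name plus sorted (facet, descriptor) pairs, so that two identically described foods compare equal regardless of the order in which the facets were answered. The class and function names are hypothetical; this is not how EPIC-SOFT actually stores its data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FoodDescription:
    """Hypothetical record: a food name plus sorted (facet, descriptor) pairs."""
    name: str
    facets: Tuple[Tuple[str, str], ...] = ()

    def key(self) -> str:
        """Canonical string so that identically described foods compare equal."""
        parts = [f"{facet}={descriptor}" for facet, descriptor in self.facets]
        return "|".join([self.name] + parts)


def describe(name: str, **facets: str) -> FoodDescription:
    """Build a description; keyword names stand in for facets (underscores become spaces)."""
    pairs = tuple(sorted((facet.replace("_", " "), descriptor)
                         for facet, descriptor in facets.items()))
    return FoodDescription(name, pairs)


# The order in which facets are answered does not change the description.
a = describe("potato", cooking_method="boiled", preservation_method="fresh")
b = describe("potato", preservation_method="fresh", cooking_method="boiled")
print(a == b)   # True
print(a.key())  # potato|cooking method=boiled|preservation method=fresh
```

Sorting the pairs is the design choice that makes descriptions comparable within and between countries; a real system would also restrict descriptors to a predefined list per facet.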
Depending on the country, between 3500 (Greece) and 15 000 (France) different food occurrences, resulting from the combination of food names and facet/descriptor strings, were initially reported by the EPIC study subjects. These initial EPIC food occurrences were further aggregated to obtain 547–1537 foods per country as entries in the vertical axis of the ENDB, using rules common to all countries to aggregate both foods and their respective facets/descriptors. As the calibration approach described previously required only good estimates of mean population intakes, the final food list consisted of the foods accounting for 80% of the total quantity consumed in each EPIC food subgroup. Foods with lower contributions that were nevertheless important sources of the selected components were also included in the ENDB food lists. For the remaining foods (20% of the subgroup quantities), each contributing little to mean intakes, component contents were averaged at the food (sub)group level.
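The 80% selection rule can be sketched as follows, assuming foods are ranked by quantity consumed and kept until the cumulative share reaches the threshold. The function names are hypothetical, and because the text only says contents were "averaged", the quantity weighting used for the pooled remainder is an assumption; all numbers are illustrative.

```python
def select_foods(subgroup, coverage=0.80, key_sources=()):
    """Split one food subgroup into foods listed individually and a pooled remainder.

    subgroup    -- dict: food name -> total quantity consumed (g) in the 24-HDRs
    coverage    -- share of the subgroup quantity to be covered by listed foods
    key_sources -- foods kept whatever their quantity (important component sources)
    """
    threshold = coverage * sum(subgroup.values())
    ranked = sorted(subgroup, key=subgroup.get, reverse=True)
    kept, cumulative = [], 0.0
    for food in ranked:
        if cumulative >= threshold:
            break
        kept.append(food)
        cumulative += subgroup[food]
    kept += [f for f in key_sources if f in subgroup and f not in kept]
    remainder = [f for f in subgroup if f not in kept]
    return kept, remainder


def pooled_content(contents, quantities, foods):
    """Quantity-weighted mean content per 100 g for the pooled foods (assumed weighting)."""
    total = sum(quantities[f] for f in foods)
    return sum(contents[f] * quantities[f] for f in foods) / total


quantities = {"apple": 500, "banana": 300, "pear": 120, "kiwi": 50, "fig": 30}
content = {"apple": 2.4, "banana": 2.6, "pear": 10.0, "kiwi": 3.0, "fig": 20.0}  # illustrative
kept, remainder = select_foods(quantities, key_sources=("kiwi",))
print(kept, remainder)                                 # ['apple', 'banana', 'kiwi'] ['pear', 'fig']
print(pooled_content(content, quantities, remainder))  # 12.0
```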
ENDB component list (horizontal axis of the matrices)
The nutrients considered in the ENDB as the horizontal axes of the matrices (see Figure 1) were prioritized according to four main criteria: (1) their pre-evaluated availability in the national databases; (2) their relative comparability; (3) their completeness; and (4) their relevance to the cancer research priorities of EPIC. Of the 100 nutrients initially considered, 26 were prioritized in the ENDB (Table 1). Other related components needed for the calculation or standardization of the selected components were also considered, although their completeness varies substantially from country to country. The specific approaches used to harmonize the ENDB components across countries are detailed in Table 3. In this paper, the terms ‘food components’ and ‘nutrients’ are used interchangeably.
Food matching, sometimes referred to as food coding in the literature, is a major step in data processing to derive nutrient estimates from dietary consumption data (Guilland et al., 1993; Price et al., 1995; Welch et al., 2001). This procedure involves linking the foods reported by study subjects to the restricted number of foods actually available in the national databases or other sources of composition data. If no equivalent foods are found in the reference national databases, alternative approaches to approximate nutrient contents are usually considered (e.g. raw-to-cooked conversion, recipe calculation or averaging). In most cases, the procedure for aggregating food lists precedes or is implicitly included in the food matching approach.
In the ENDB project (see Figure 1), the food matching procedure was highly standardized across countries. The facet/descriptor approach used in EPIC-SOFT to standardize food description across countries facilitated this task. Furthermore, a common matching code system was developed in order to define the main food characteristics to be considered per food group when matching. The nature and quality of the food matching was coded to indicate systematically whether the EPIC foods and those available in the national (or other) databases were exactly the same (‘exact match’) or similar (‘similar match’) in definition, description and nutrient content. Two other matching options were available for foods to be obtained by recipe calculation (‘calculated as recipe’) or by weighted average (‘generic item’), if no exact equivalent or similar food was available in the NDSs. When the match was not exact, further information on food characteristics leading to differences in the final nutrient contents such as differences in cooking methods, fat content and visible fat, food with or without skin or peel, source of foods (e.g. chicken sausage vs turkey sausage), sugar content, canning medium, physical state (e.g. dried, powder vs reconstituted or fresh) were always reported. In ENDB, these differences were tentatively corrected using common algorithms as detailed under the section ‘Calculation of reported foods not available in NDSs'. As far as possible, it was recommended to match consumption data to contemporary composition data, especially to take into account possible changes in food fortification. Furthermore, the food matching performed locally was systematically re-checked by two senior nutritionists at the coordinating centre, who supervised and provided assistance on how to use the reference guidelines.
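A minimal sketch of how the four matching codes and the documented differences could be recorded is shown below. The class names and validation rules are hypothetical illustrations of the rules stated above (a matched NDB food for exact and similar matches, and documented differences whenever a match is only similar); they are not EnMan's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MatchType(Enum):
    EXACT = "exact match"
    SIMILAR = "similar match"
    RECIPE = "calculated as recipe"
    GENERIC = "generic item"


@dataclass
class FoodMatch:
    """Hypothetical record linking one EPIC food to its source of composition data."""
    epic_food: str
    match_type: MatchType
    ndb_food: Optional[str] = None  # None for recipe or generic items
    differences: List[str] = field(default_factory=list)  # e.g. "cooking method"

    def __post_init__(self):
        needs_ndb = self.match_type in (MatchType.EXACT, MatchType.SIMILAR)
        if needs_ndb and self.ndb_food is None:
            raise ValueError(f"{self.epic_food}: matched NDB food missing")
        # Non-exact matches must state what differs, so the difference can be adjusted later.
        if self.match_type is MatchType.SIMILAR and not self.differences:
            raise ValueError(f"{self.epic_food}: similar match without documented differences")


m = FoodMatch("potato, fried", MatchType.SIMILAR, "potato, raw", ["cooking method"])
print(m.match_type.value)  # similar match
```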
Figure 2 shows that about 40% (France, Italy, Spain, Denmark and Sweden) to 60% (UK, Germany, NL and Norway) of the EPIC foods were matched exactly to foods available in the 10 NDBs, whereas only a marginal proportion (⩽2%) was matched to similar NDB foods that could not be adjusted. Between 15% (Sweden, UK) and 30% (Spain, Italy) of the EPIC foods were matched to NDB foods requiring further adjustment for weight changes, vitamin/mineral losses or differences in fat or sugar content. Between 17% (Germany, Greece, NL) and 37% (Denmark, Sweden) of the EPIC foods were not readily available in the NDBs and were obtained by recipe calculation. Overall, few EPIC foods were treated as generic items (⩽4%).
Target documentation of the NDSs
The harmonization of NDBs requires a high level of documentation to evaluate comparability across databases accurately and to develop specific procedures to correct for systematic or random differences, when relevant (Greenfield and Southgate, 1992, 2003). The national databases initially available in the EPIC participating countries were either undocumented or not documented in a fully comparable way. The national compilers involved in the ENDB project, except in Germany, were therefore asked to document a subset of their national databases corresponding to foods matching the EPIC foods (the so-called NDSs), according to standard procedures and formats. These procedures were based on the ‘EUROFOODS recommendations for food database management and data interchange’ (Schlotke et al., 2000), further extended and adapted to the ENDB needs. Documentation of the NDSs involved collecting information on the source of the database, component values (unit, mode of expression, value type, type of analytical or calculation method, statistical information), food groups (classification according to national and EPIC food groups), food description (scientific and taxonomic names, national and English names, information on the food samples) and publications (as references for the sources of values or reporting the analytical method). However, given both the limited resources and the limited information available, the documentation was targeted and prioritized to obtain the information actually relevant to data standardization. For example, apart from possible differences in units, which were corrected in the ENDB, the minerals considered (calcium, potassium, iron, magnesium and phosphorus) were assumed to be comparable in terms of analytical methods across countries, whereas fibre, vitamin E and β-carotene required more specific information on units, mode of expression, and methods of analysis and calculation for appropriate standardization across countries.
The documentation enabled systematic coding of whether given expected information was available or not. Thus the status of documentation of the individual NDSs could be evaluated and the procedures of standardization and evaluation adapted accordingly, as summarized in Table 3.
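The systematic coding of whether each expected piece of documentation is available can be sketched as a simple per-field flag, from which the share of documented values per data set follows. The field names below are hypothetical, loosely drawn from the documentation items listed above, and the example records are illustrative.

```python
# Hypothetical field names for the documentation expected per component value.
EXPECTED_FIELDS = ("unit", "mode_of_expression", "value_type", "method_type", "publication")


def documentation_status(value_doc):
    """Code each expected field as available (True) or missing (False)."""
    return {f: value_doc.get(f) not in (None, "") for f in EXPECTED_FIELDS}


def documented_share(docs, field_name):
    """Share of values in a data set for which a given field is documented."""
    flags = [documentation_status(d)[field_name] for d in docs]
    return sum(flags) / len(flags)


docs = [
    {"unit": "mg", "mode_of_expression": "per 100 g edible portion",
     "value_type": "analytical", "method_type": "HPLC", "publication": "ref-001"},
    {"unit": "mg", "mode_of_expression": "per 100 g edible portion",
     "value_type": "estimated"},
]
print(documentation_status(docs[1]))
print(documented_share(docs, "method_type"))  # 0.5
```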
In Germany, documentation and full evaluation of the NDS were not possible within the ENDB project because the institute in charge of the national NDB relocated to another town while the project was ongoing. An emergency solution involving the German EPIC collaborators was set up to obtain at least the nutrient data and the general database policies, in order to identify systematic problems of standardization or comparability.
Table 2a summarizes the level of documentation obtained on the method of analysis, which is crucial for the standardization of method-dependent components. As requested of the national compilers, values resulting from direct or indirect analyses (i.e. analytical results (4–23%), aggregation of analytical results (0–45%), summation of constituent components (2–16%), calculation involving conversion factors (3–15%) and calculation based on component profile (<0.5–2%)) were largely documented in most data sets. These analytical values represent 20% (Italy) to 68% (France) of the NDS values. Overall, 87% of them are documented for analytical method, with large variation across countries, from 51% in France to over 90% in Italy, the UK, Greece, Sweden, Denmark and Norway. However, when all NDS values are considered, documentation on the method of analysis is missing for 70% of the values. To a large extent, this missing information concerns values with unknown method type (30%) or estimated values (23%). This lack of documentation has relatively little impact on the selected ENDB components whose analytical methods are assumed to be comparable, such as fatty acid fractions, cholesterol, sugars, starch, alcohol, B-vitamins, vitamin C and minerals. For fibre (Table 2b), however, whose values are highly method-dependent, only 26% of values were assessed as comparable to the reference AOAC methods, whereas 74% are of indeterminate comparability. Nevertheless, more than half of the latter are logical zeros from non-plant foods or values for foods low in fibre (e.g. sauces, miscellaneous foods), implicitly assumed to be comparable. Furthermore, as fibre values from fruits, vegetables and derived products in the ENDB were also assumed to be comparable (see Table 3), only 8–18% of the total fibre values of indeterminate comparability (fibre from potato-, legume- and cereal-based foods) required visual checks to improve comparability and identify outliers.
Overall, therefore, 81–96% of fibre values can be assumed to be comparable.
Standardization, data assessment and evaluation of the NDSs
Following documentation (see Figure 1), the national compilers re-evaluated their documented NDSs with the main objective of improving their comparability and completeness. The purpose of the ENDB was not to reassess the reliability of the existing national data but to make them comparable and complete enough for use on an international scale.
Strategies and priorities were therefore defined to make the most cost-effective use of the resources available, taking into account the level of information and documentation actually available in the nine documented NDSs. To evaluate the NDSs, the following assumptions and decisions were made:
It was decided to accept the national data, assuming that the national compilers provided the most reliable values available locally, although these might still include obsolete or non-comparable values. Errors or inconsistencies were reported to the national compilers when identified.
It was assumed that the national compilers had provided all available and required documentation during the target documentation phase and that any further attempt to obtain more information was incompatible with the resources available and the time-schedule of this project. Furthermore, although the sampling procedures for analytical values were not always provided, we assumed they were suitable for reflecting local food practices and habits.
When available in the NDSs, the nutrient values of foods cooked without fat were assumed to be reliable, whereas the values for foods cooked with added fat were systematically recalculated using an algorithm common to all countries, because of the internal rules of EPIC-SOFT (see the section ‘Calculation of reported foods not available in the NDSs’).
It was assumed that for nutritional epidemiology research purposes, it is better to approximate nutrient values than to leave them as missing.
The evaluation of the documented NDSs by the national compilers was performed using an evaluation release of the EnMan DBMS. To minimize the inherent differences between data sets, country-specific reports and instructions were given to the national compilers, taking into account the completeness and level of documentation of their own data set. Assessment reports on specific component groups were available to the national compilers, as well as EnMan routine facilities to identify and display values requiring further documentation or to be evaluated, that is, non-comparable or non-standardized values. Most of the standardization of units and mode of expression in the NDSs was completed automatically. However, any values still not standardized (e.g. alcohol values estimated in volume instead of grams with missing specific density factors for recalculation) were automatically flagged by the EnMan system. An ad hoc grading system developed by the expert group was used to label the values as comparable, not comparable or comparability undetermined, according to the documentation available and the ENDB reference analytical methods and component definitions. This grading also takes into account methods that are food-group specific, obsolete or inappropriate. For a large proportion of values reported in the NDSs, no information on the analytical or calculation method was provided, so these values were graded as of indeterminate comparability (∼72%, including no and indeterminate method of analysis). As it is assumed that compilers have provided the best values available, the evaluation of these values was restricted to a visual check for comparison with similar foods, particularly for components known to be highly dependent on the analytical method or definition, such as dietary fibre.
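The three-level grading can be sketched as a function of the documented method and two sets of methods: those matching the ENDB reference and those judged obsolete or inappropriate. The decision order and the example method sets below are simplifying assumptions for illustration, not the expert group's actual rules.

```python
from enum import Enum
from typing import Optional, Set


class Grade(Enum):
    COMPARABLE = "comparable"
    NOT_COMPARABLE = "not comparable"
    UNDETERMINED = "comparability undetermined"


def grade_value(method: Optional[str], reference: Set[str], rejected: Set[str]) -> Grade:
    """Simplified grading of one value by its documented analytical method."""
    if method is None:
        return Grade.UNDETERMINED      # undocumented: restricted to a visual check
    if method in reference:
        return Grade.COMPARABLE
    if method in rejected:
        return Grade.NOT_COMPARABLE    # e.g. obsolete or inappropriate for the food group
    return Grade.UNDETERMINED          # documented, but not clearly (in)comparable


# Illustrative method sets for fibre.
FIBRE_REFERENCE = {"AOAC"}
FIBRE_REJECTED = {"crude fibre"}
methods = ["AOAC", None, "crude fibre", "in-house method"]
grades = [grade_value(m, FIBRE_REFERENCE, FIBRE_REJECTED) for m in methods]
to_check = [m for m, g in zip(methods, grades) if g is Grade.UNDETERMINED]
print(to_check)  # [None, 'in-house method']
```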
Other values from the NDSs were automatically assigned as ‘comparable values’, and the national compilers were therefore not asked to evaluate them. These concern particularly component values that are automatically and systematically calculated by the EnMan system (e.g. protein values re-calculated using the evaluated nitrogen values or conversion factors) or values assumed to be comparable in definitions and methods of analysis used (e.g. selected ENDB mineral values). The missing values to be imputed by the national compilers were also specifically flagged and were retrievable per component and/or food groups. In addition, different facilities were available to enable the national compilers to edit, change and/or adjust an existing value and its related documentation, copy values from the same or similar foods in the national or other data sets, calculate a simple average from several foods or derive a value by profile calculation from a similar food for missing fatty acid fractions or sugar/starch.
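Automatic recalculation of protein from evaluated nitrogen values can be sketched as nitrogen multiplied by a conversion factor. The general factor of 6.25 is the conventional default; the food-specific entries below (6.38 for milk, 5.70 for wheat flour) are commonly cited factors used here only as an example, and the ENDB's actual factor table is not reproduced.

```python
# Nitrogen-to-protein conversion factors. 6.25 is the conventional general factor;
# the food-specific entries are illustrative, not the ENDB's actual table.
DEFAULT_FACTOR = 6.25
SPECIFIC_FACTORS = {"milk": 6.38, "wheat flour": 5.70}


def protein_from_nitrogen(nitrogen_g: float, food_type: str) -> float:
    """Recalculate protein (g/100 g) from evaluated total nitrogen (g/100 g)."""
    factor = SPECIFIC_FACTORS.get(food_type, DEFAULT_FACTOR)
    return round(nitrogen_g * factor, 2)


print(protein_from_nitrogen(0.53, "milk"))  # 3.38
print(protein_from_nitrogen(3.20, "beef"))  # 20.0
```

Recalculating protein this way, rather than accepting national protein values directly, keeps protein consistent with the same evaluated nitrogen value across countries.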
Furthermore, frequency of consumption derived from the EPIC 24-HDRs was also available in EnMan to prioritize the evaluation of foods contributing more to the mean population intakes, as this is the purpose of the EPIC calibration. After evaluation by the national compilers or automatic (back-) calculations performed in the EnMan system, the evaluation status of individual values changed from ‘to be evaluated’ to ‘evaluated’, and the related documentation was updated/provided accordingly whenever original values were changed or missing values imputed.
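Prioritizing evaluation by consumption frequency and tracking the status change from ‘to be evaluated’ to ‘evaluated’ might look like the sketch below. The function names and the change log are hypothetical, and the frequency counts are illustrative.

```python
def evaluation_queue(status, frequency):
    """Foods still 'to be evaluated', most frequently reported first."""
    pending = [food for food, s in status.items() if s == "to be evaluated"]
    return sorted(pending, key=lambda food: frequency.get(food, 0), reverse=True)


def mark_evaluated(status, log, food, change=None):
    """Switch a food's status and record any change to its values or documentation."""
    status[food] = "evaluated"
    if change is not None:
        log.append((food, change))


status = {"bread": "to be evaluated", "butter": "evaluated",
          "pasta": "to be evaluated", "liver": "to be evaluated"}
frequency = {"bread": 9000, "butter": 7000, "pasta": 4000, "liver": 150}  # illustrative counts
log = []
print(evaluation_queue(status, frequency))  # ['bread', 'pasta', 'liver']
mark_evaluated(status, log, "bread", "fibre value imputed")
print(evaluation_queue(status, frequency))  # ['pasta', 'liver']
```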
Depending on the status of comparability across countries, each ENDB component was standardized, evaluated and completed for missing values, as far as possible, according to component-specific procedures summarized in Table 3.
Imputation of missing values
Missing values in NDBs are an important source of imprecision that may affect nutrient intakes to different degrees, particularly if the completeness of the NDBs used in pooled analyses varies. Several authors have suggested that it is better to approximate a missing value than to leave it blank or assign it a zero value (Schakel et al., 1997; Cowin and Emmett, 1999). One of the first objectives of the documentation phase in the ENDB project was to identify the missing values corresponding to logical zeros, so that imputation could concentrate on real missing values only. Facilities were implemented in the EnMan DBMS to flag and display the missing values to be imputed per component and/or food group. The national compilers were asked to reduce the number of missing values in their data set following component-group-specific strategies, prioritizing foods with high consumption (derived from the EPIC 24-HDRs) or considered important sources of the given component. Values could be imputed using the best available alternatives from similar foods in the same NDS or, alternatively, from the same foods in other countries' data sets. For fatty acid fractions (saturated, monounsaturated and polyunsaturated fatty acids) or for sugars and starch, missing values could also be imputed by profile calculation from foods with similar fat (or carbohydrate) content but no missing values, using their relative proportions of fatty acids (or of sugars and starch). The choice of the most appropriate imputation procedure among these restricted options was left to the national compiler's judgement.
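Profile calculation can be sketched as scaling a complete donor food's fractions by the ratio of the target food's total fat (or carbohydrate) to the donor's. The function name and the donor values are illustrative assumptions. Scaling each fraction proportionally, rather than forcing the fractions to sum to total fat, preserves the donor's profile; fatty acid fractions normally sum to less than total fat (8 + 7 + 3 < 20 in the example).

```python
def impute_by_profile(target_total, donor_total, donor_fractions, ndigits=2):
    """Impute missing fractions by scaling a donor food's profile to the target's total."""
    if donor_total <= 0:
        raise ValueError("donor food must have a positive total")
    ratio = target_total / donor_total
    return {name: round(value * ratio, ndigits) for name, value in donor_fractions.items()}


# Donor: similar food with complete fatty acid data (illustrative values, g/100 g).
donor_fat = 20.0
donor_fa = {"SFA": 8.0, "MUFA": 7.0, "PUFA": 3.0}
print(impute_by_profile(10.0, donor_fat, donor_fa))  # {'SFA': 4.0, 'MUFA': 3.5, 'PUFA': 1.5}
```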
The completeness of the NDSs was considerably improved following the specific procedures detailed in Table 3, particularly for certain components and countries (Table 4). Except for Germany (100%), the overall completeness of the initial NDSs ranged from 81% (Sweden, Greece) to ∼94% (France, NL, Norway). After evaluation and imputation of missing values, overall completeness is above 98% in all NDSs. Certain ENDB components were missing from the original NDSs (e.g. magnesium and vitamin B12 in Italy, and sugars and starch in Sweden), and/or had definitions differing from the reference ENDB ones; both problems were corrected in the final evaluated NDSs. Definitional differences concerned particularly β-carotene in France and Italy, vitamin C in Sweden, vitamin E in Sweden and Norway, and carbohydrates in Denmark. Among the missing values in the original NDSs, 17% (Norway) to 57% (NL) were evaluated as logical zeros and assigned zero values during the documentation or evaluation phase, whereas 40% (NL) to above 60% (Denmark, Sweden, Spain, Greece and Norway) were real missing values, for which non-zero values were imputed. Only a minimal proportion of missing values (<1%) remained in the final evaluated NDSs.
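Completeness, as used here, can be read as the share of food × component cells that hold a value. The sketch below computes it and resolves missing cells by assigning logical zeros first and imputing the rest. The function names, the callbacks and all values are hypothetical and illustrative.

```python
def completeness(nds):
    """Share of food x component cells that hold a value (None marks a missing value)."""
    cells = [v for row in nds.values() for v in row.values()]
    return sum(v is not None for v in cells) / len(cells)


def fill_missing(nds, is_logical_zero, impute):
    """Assign logical zeros first, then impute the remaining (real) missing values."""
    for food, row in nds.items():
        for component, value in row.items():
            if value is None:
                row[component] = 0.0 if is_logical_zero(food, component) else impute(food, component)


NON_PLANT = {"chicken"}
nds = {"chicken": {"fibre": None, "iron": 0.7},
       "lentils": {"fibre": None, "iron": 3.3}}  # illustrative values
print(completeness(nds))  # 0.5
fill_missing(nds,
             is_logical_zero=lambda food, comp: comp == "fibre" and food in NON_PLANT,
             impute=lambda food, comp: 7.9)  # e.g. a value copied from a similar food
print(completeness(nds))  # 1.0
```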
Calculation of reported foods not available in the NDSs
Foods not available in the national databases but frequently reported in the EPIC 24-HDRs, or nutritionally important (e.g. foods less frequently consumed but with high concentrations of nutrients of interest, such as liver and liver products for retinol), were obtained by different calculation procedures. These concerned particularly foods cooked by methods not covered in the NDBs, composite foods or mixed recipes not broken down into ingredients during the 24-HDR interviews (e.g. cakes, biscuits, soups, sauces, cream desserts), and generic items (i.e. unspecified foods such as ‘meat n.s.’ or ‘oil n.s.’). Furthermore, when the match between the foods reported by the study subjects and those available in the national databases was imperfect because of differences in fat or sugar content, or in whether skin or visible fat was consumed, appropriate further adjustments were made.
One important source of heterogeneity between European NDBs is that they do not provide sufficient information, or do not provide it systematically, on the different cooking methods reported by study subjects. These differences may introduce systematic or random errors in the estimation of all nutrient intakes, particularly of thermolabile vitamins and of minerals. In the ENDB, the same algorithms, standard coefficients and procedures were used to adjust for cooking. After aggregation of the detailed cooking methods reported in the EPIC 24-HDRs (about 30 different methods), the rules described below were applied, distinguishing between foods cooked with and without fat.
Foods cooked without fat (i.e. boiled, steamed, poached, blanched, etc.) were matched to the same or equivalent cooked foods from the national or foreign data sets when an exact match existed. Otherwise, foods reported as cooked were matched to the corresponding raw foods and adjusted for weight changes and vitamin/mineral losses (e.g. ‘potato, boiled’ was matched to ‘potato, boiled’ if available, or else to ‘potato, raw’). Foods cooked with fat were systematically broken down during the 24-HDR interviews, so that they were available as two separate items (the cooked food and the fat used for cooking) when matched to the NDBs. The fats were matched to the corresponding fats in the national databases, whereas the related foods (cooked food minus cooking fat) were systematically matched to raw foods (e.g. ‘potato, fried’ matched to ‘potato, raw’) and the resulting values adjusted. These adjustments covered water (and fat, for meats), weight changes and mineral and vitamin losses due to cooking and preparation practices, using food-specific raw-to-cooked coefficients and retention factors available in the participating countries or gathered from the literature (Møller, 1994; Bergström, 1997; Bognar and Piekarski, 2000; Bognar, 2002; USDA, 2003).
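A common way to express the raw-to-cooked adjustment is: value per 100 g cooked = value per 100 g raw × retention factor / weight yield factor, where the yield factor is cooked weight divided by raw weight. The sketch below applies that relation under the simplifying assumption that the whole weight change is water; it does not model fat uptake or loss (e.g. for meats). The function name, factors and food values are hypothetical, not taken from the cited retention tables.

```python
from typing import Dict


def raw_to_cooked(
    raw_per_100g: Dict[str, float],
    weight_yield: float,
    retention: Dict[str, float],
) -> Dict[str, float]:
    """Estimate values per 100 g of cooked food from values per 100 g raw food.

    weight_yield: cooked weight / raw weight (e.g. 0.9 for a 10% weight loss).
    retention: retention factor (0-1) per nutrient; unlisted nutrients are
        treated as fully retained (factor 1.0).
    Simplifying assumption: the entire weight change is water.
    """
    if weight_yield <= 0:
        raise ValueError("weight_yield must be positive")
    cooked: Dict[str, float] = {}
    for nutrient, raw_value in raw_per_100g.items():
        if nutrient == "water":
            # Water remaining from 100 g raw food, re-expressed per 100 g cooked food.
            cooked[nutrient] = (raw_value - 100.0 * (1.0 - weight_yield)) / weight_yield
        else:
            factor = retention.get(nutrient, 1.0)
            # Losses reduce the amount; the weight change concentrates or dilutes it.
            cooked[nutrient] = raw_value * factor / weight_yield
    return {k: round(v, 2) for k, v in cooked.items()}


# Hypothetical raw vegetable (g or mg per 100 g) and hypothetical factors.
boiled = raw_to_cooked(
    {"water": 80.0, "protein": 2.0, "vitamin_c": 20.0},
    weight_yield=0.9,
    retention={"vitamin_c": 0.7},
)
print(boiled)  # {'water': 77.78, 'protein': 2.22, 'vitamin_c': 15.56}
```

The formula is unit-agnostic: any component expressed per 100 g (g, mg or µg) is handled the same way.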
Recipe calculation, using standard procedures across countries, was used to estimate the nutrient content of reported multi-ingredient foods not available in the local or foreign NDBs. A standard format was used to create country-specific recipe databases. Information and bibliographic sources on each recipe and its ingredients were systematically collected (e.g. local and English names, cooking method used, homemade vs commercial recipe, reconstitution from powder, the specific type of fat or liquid used, whether ingredient quantities were estimated as cooked or raw, and whether they included the inedible part). Recipe-specific weight change factors for water and fat were also collected and used when appropriate. Ingredient-specific density, edible-portion and raw-to-cooked factors and standard units (e.g. medium onion=110 g) were used to express ingredient weights in grams, as 100% edible and raw. The algorithms implemented in EnMan to calculate the nutrient content of recipes took into account the water, fat and, where relevant, alcohol weight changes of the entire recipe, as well as mineral and vitamin losses at the ingredient level.
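The recipe calculation can be sketched along the same lines: each ingredient quantity is converted to edible raw grams (via standard unit weights, density and edible fraction), ingredient nutrients are summed after applying ingredient-level retention factors, and the total is expressed per 100 g of cooked recipe using a whole-recipe weight yield factor. This is a simplified illustration, not the EnMan algorithm: it attributes the whole-recipe weight change to water only (fat and alcohol changes are not modelled), and the class, field names and figures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Ingredient:
    """One recipe ingredient (hypothetical structure)."""
    name: str
    quantity: float                 # amount as written in the recipe
    unit: str                       # "g", "ml" or "unit"
    per_100g: Dict[str, float]      # composition per 100 g edible raw part
    grams_per_unit: float = 0.0     # standard unit weight, e.g. medium onion = 110 g
    density: float = 1.0            # g/ml, used when unit == "ml"
    edible_fraction: float = 1.0    # < 1 if the quantity includes inedible parts
    retention: Dict[str, float] = field(default_factory=dict)  # per nutrient, 0-1

    def edible_raw_grams(self) -> float:
        if self.unit == "g":
            grams = self.quantity
        elif self.unit == "ml":
            grams = self.quantity * self.density
        elif self.unit == "unit":
            grams = self.quantity * self.grams_per_unit
        else:
            raise ValueError(f"unknown unit: {self.unit}")
        return grams * self.edible_fraction


def recipe_per_100g(ingredients: List[Ingredient], recipe_yield: float) -> Dict[str, float]:
    """Nutrients per 100 g of cooked recipe.

    recipe_yield: cooked recipe weight / total edible raw ingredient weight.
    Simplification: the whole-recipe weight change is attributed to water.
    """
    raw_weight = sum(i.edible_raw_grams() for i in ingredients)
    cooked_weight = raw_weight * recipe_yield
    totals: Dict[str, float] = {}
    for ing in ingredients:
        grams = ing.edible_raw_grams()
        for nutrient, value in ing.per_100g.items():
            # Ingredient-level losses via retention factors (1.0 if not given).
            amount = value * grams / 100.0 * ing.retention.get(nutrient, 1.0)
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    if "water" in totals:
        # Water lost (or gained) by the whole recipe during cooking.
        totals["water"] -= raw_weight - cooked_weight
    return {n: round(v * 100.0 / cooked_weight, 2) for n, v in totals.items()}


# Hypothetical soup: 200 g vegetable (80% edible) simmered in 500 ml water.
soup = recipe_per_100g(
    [
        Ingredient("vegetable", 200, "g", {"water": 80.0, "vitamin_c": 20.0},
                   edible_fraction=0.8, retention={"vitamin_c": 0.5}),
        Ingredient("water", 500, "ml", {"water": 100.0}),
    ],
    recipe_yield=0.9,
)
print(soup)  # {'water': 94.61, 'vitamin_c': 2.69}
```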
Depending on the country, the final recipe files contain between 97 and 580 standard or substitutable recipes (see Table 5). Substitutable recipes are variants of the same standard recipes (e.g. cakes, biscuits, sauces and soups) that take into account the types of fat (e.g. specific vegetable oils, margarines) and liquids (e.g. milk, water, broth) systematically asked about in the EPIC-SOFT 24-HDRs for homemade mixed foods (and, for fat only, also for commercial ones).
Generic or non-specified items are foods reported with too little detail to be properly identified and matched to NDBs (e.g. citrus fruits, red meat, cakes or vegetable oils not further specified). Although they represent only a limited proportion of reported foods, the nutrient content of generic items in the ENDB was calculated as a weighted average, using country-specific internal weighting factors based on the consumption frequencies of the related foods reported in the EPIC-SOFT 24-HDRs (e.g. a weighted average of olive oil, sunflower oil, corn oil, etc. for the generic item ‘vegetable oil’).
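The weighted averaging of generic items amounts to a frequency-weighted mean of the related specific foods. In this minimal sketch the weights are hypothetical counts of how often each related food was reported, and the nutrient values are illustrative placeholders, not values from any NDS.

```python
from typing import Dict


def generic_item(
    specific_foods: Dict[str, Dict[str, float]],
    frequencies: Dict[str, float],
) -> Dict[str, float]:
    """Frequency-weighted average composition of the foods behind a generic item."""
    # Require identical component sets so no food silently drops out of an average.
    component_sets = {frozenset(v) for v in specific_foods.values()}
    if len(component_sets) != 1:
        raise ValueError("all specific foods must report the same components")
    total = sum(frequencies[food] for food in specific_foods)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    result: Dict[str, float] = {}
    for food, composition in specific_foods.items():
        weight = frequencies[food] / total
        for nutrient, value in composition.items():
            result[nutrient] = result.get(nutrient, 0.0) + weight * value
    return {n: round(v, 2) for n, v in result.items()}


# Hypothetical report counts and placeholder values (per 100 g).
vegetable_oil = generic_item(
    {
        "olive oil": {"fat": 100.0, "vitamin_e": 14.0},
        "sunflower oil": {"fat": 100.0, "vitamin_e": 41.0},
        "corn oil": {"fat": 100.0, "vitamin_e": 14.0},
    },
    frequencies={"olive oil": 60, "sunflower oil": 30, "corn oil": 10},
)
print(vegetable_oil)  # {'fat': 100.0, 'vitamin_e': 22.1}
```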
Final construction and content of the ENDB
As indicated in Table 5, the final ENDB was built by combining two sets of food data: the documented national (or foreign) data sets and the country-specific recipe files. A given national or foreign food (or standard recipe) may have been matched to several ENDB foods. Except for Greece, 79–97% of the foods used to build the ENDB came directly or indirectly from the country's own NDS, whereas the remainder came from other EPIC data sets. For Greece, 68% of the foods used to build its ENDB came directly or indirectly from its NDS. Of these, 28% refer to 96 local foods recently analysed chemically, with missing values completed by imputation, whereas 41% refer to 142 UK foods included in the published Greek NDB (Trichopoulou, 2004) and considered part of the Greek NDS. In addition, a further 13% of the foods were UK foods not included in the published NDB, borrowed to complete the ENDB. Consequently, in total, 53% of the foods used to build the Greek ENDB are derived from UK data and were documented by the British compiler.
Overall, the ENDB contains a total of 10 076 foods, ranging from 547 in Greece to 1537 in Sweden, and about 260 000 nutrient values. About 40% (Denmark, Italy, Spain) to about 60% (NL, UK, Germany, Norway) of the foods were copied directly from national or foreign data sets when the match was exact (or close). The remaining ENDB foods (37–64%) were obtained by different calculation processes: 15% (NL, Sweden) to about 30% (Spain, Italy) result from further adjustment of the original NDS foods for weight changes and mineral/vitamin losses, whereas 17% (Germany, Greece, NL) to 37% (Denmark, Sweden) were obtained by recipe calculation. Generic foods obtained by weighted averaging account for less than 4% of ENDB foods.
The lack of comparable food databases is a major obstacle to investigating the wide range of food components of etiological interest in international multicentre epidemiological studies. To a large extent, this situation can be explained by the difficulty and cost of compiling comparable databases. A further, more methodological reason is that standardization between national databases has so far focused essentially on harmonizing nutrients (i.e. modes of expression, units and chemical analytical methods) and on international food data interchange (Klensin, 1992; Schlotke, 1996; Greenfield and Southgate, 2003). The complete standardization of existing NDBs has, however, always come up against the difficulty of standardizing the food lists. Some special features of the ENDB project, resulting from the use of standardized 24-HDRs as food entries in the matrices, helped to partially overcome this problem: the same food aggregation rules were applied to generate new matrices (food lists), independently of differences in the type and number of foods across the national databases. As a major consequence, a relatively large proportion of the ENDB foods (37–64%) had to be approximated by calculation or adjustment to compensate for the lack of equivalent foods in the national or other databases. Thus, in addition to being a first attempt to standardize NDSs for multicentre/population epidemiological research, the ENDB should be seen as a prototype offering a unique opportunity to evaluate the current strengths and limitations of NDB harmonization and to contribute to improving NDBs within the broader context of the EuroFIR project. Furthermore, as the project is nested in EPIC, an existing large multicentre nutritional study, it was possible to address problems relevant not only to the standardization of NDBs but also to their matching to real consumption data.
Despite some lack of documentation, particularly on the definition and chemical method of analysis used for certain components, the ENDB improves the comparability of the NDBs used in the 10 EPIC countries. First, the NDSs were extensively documented by retrieving information that was not always readily available (e.g. from laboratory reports, scientific papers and food surveillance information sheets). Although it was not always possible to obtain all the requested information, this establishes the current documentation status of the individual NDSs, so that appropriate procedures can be adopted to standardize and evaluate them. Furthermore, all component values were (re-)expressed in the same standard units, and modes of expression were harmonized where they differed, particularly for carbohydrates and for all component values of alcoholic beverages in some countries. Protein values were systematically recalculated using 6.25 as the standard nitrogen-to-protein conversion factor, and total energy was recalculated using the Atwater factors of 17, 17, 37 and 29 kJ/g for protein, carbohydrate, fat and alcohol, respectively. Missing values, which were unequally distributed among components and countries, were substantially reduced using common imputation procedures, whereas logical zeros and trace values were identified in each data set and assigned zeros or appropriate numerical values. However, although these advances are important achievements for improving the comparability of NDBs within the ENDB project, certain inherent methodological problems related to the use and standardization of NDBs remain unsolved.
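The standard protein and energy recalculations described above reduce to two formulas: protein = nitrogen × 6.25, and energy (kJ) = 17 × protein + 17 × carbohydrate + 37 × fat + 29 × alcohol, with all components in grams. A minimal sketch follows; the function names and the example food are hypothetical.

```python
NITROGEN_TO_PROTEIN = 6.25  # standard reference factor used in the ENDB

# Energy conversion factors in kJ per g, as listed in the text.
KJ_PER_G = {"protein": 17.0, "carbohydrate": 17.0, "fat": 37.0, "alcohol": 29.0}


def protein_from_nitrogen(nitrogen_g: float) -> float:
    """Protein (g) recalculated from total nitrogen (g) with the standard factor."""
    return nitrogen_g * NITROGEN_TO_PROTEIN


def energy_kj(protein_g: float, carbohydrate_g: float, fat_g: float,
              alcohol_g: float = 0.0) -> float:
    """Total energy (kJ) from the energy-yielding components (g)."""
    return (KJ_PER_G["protein"] * protein_g
            + KJ_PER_G["carbohydrate"] * carbohydrate_g
            + KJ_PER_G["fat"] * fat_g
            + KJ_PER_G["alcohol"] * alcohol_g)


# Hypothetical food per 100 g: 3.2 g nitrogen, 10 g carbohydrate, 5 g fat.
protein = protein_from_nitrogen(3.2)
print(round(protein, 2), round(energy_kj(protein, 10.0, 5.0), 1))  # 20.0 695.0
```

Because every NDS value is passed through the same two functions, differences in national conversion conventions no longer propagate into protein and energy intakes.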
The lack of documentation on definitions and methods of analysis, or the differing levels of documentation across databases, is currently one of the major limitations to evaluating the quality of nutrient data (Holden et al., 2002) and to further improving the comparability of NDBs. The lack of information on sampling procedures made it impossible to evaluate their comparability and their possible impact on the actual values. Furthermore, only 26% of the dietary fibre values carry information on their definition and method of analysis. Similar figures are reported for other major nutrients of research or public health interest whose standardization is definition- and/or method-dependent (e.g. fats, carbohydrates, vitamin D, retinol and β-carotene). Most of the missing information on the method of analysis is, however, due to logical zeros, imputed/estimated values and other values of unknown origin. Because fully adequate documentation was lacking, alternative evaluation procedures combining automatic, food-group-specific and visual checks were developed, as summarized in Table 3. As experienced in certain ENDB countries, tracing the documentation back was laborious because the NDBs were either not documented, or not documented according to the same rules or in the same data medium/format as those proposed in the ENDB interchange guidelines.
Another intrinsic problem when using NDBs is that they cannot easily keep pace over time with the continuous changes in the consumer market, reflecting new foodstuffs and changes in existing ones such as fortification, nutrient enrichment or reformulation (Anderson et al., 2001; Gillanders et al., 2002). Gillanders et al. (2002) reported an annual turnover of 30–50% among the 5000 manufactured food items of the New Zealand Manufactured Food Database. Aggregating foods according to similarities in botanical origin, food classification, food preparation, cooking practices and nutrient content or fortification/enrichment is one possible approach to handling this problem and reducing the number of food entries in NDBs. However, this may also artificially reduce the true within- and between-subject variability in nutrient intakes by combining foods with heterogeneous nutrient contents, particularly if NDBs offer a limited number of food entries. As an indication of the order of magnitude of this problem, the original EPIC food occurrences were aggregated before being matched to national databases, reducing the list of EPIC foods by a factor of 6–12 depending on the country. This high degree of aggregation reflects the high level of detail initially recorded in the EPIC 24-HDRs, which use a facet/descriptor approach to fully describe reported foods according to their source, physical state, method of preservation, packing medium, cooking method, etc., including brand or product names.
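The effect of facet-based aggregation on the length of a food list can be illustrated by grouping food occurrences on a chosen subset of facets and discarding the others (e.g. brand names). The facet names, the choice of retained facets and the occurrences below are hypothetical; the actual EPIC aggregation rules are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

# Hypothetical facets kept for matching; all other facets (e.g. brand) are dropped.
KEPT_FACETS = ("food", "fat_content", "cooking_method")


def aggregate(
    occurrences: Iterable[Dict[str, str]],
    kept: Sequence[str] = KEPT_FACETS,
) -> Counter:
    """Count occurrences per combination of retained facet values."""
    return Counter(tuple(occ.get(f, "n.s.") for f in kept) for occ in occurrences)


reported = [
    {"food": "yoghurt", "fat_content": "low fat", "cooking_method": "none", "brand": "A"},
    {"food": "yoghurt", "fat_content": "low fat", "cooking_method": "none", "brand": "B"},
    {"food": "yoghurt", "fat_content": "full fat", "cooking_method": "none", "brand": "A"},
]
groups = aggregate(reported)
distinct_before = len({tuple(sorted(o.items())) for o in reported})
print(distinct_before, "->", len(groups))  # 3 -> 2
```

Dropping more facets shortens the list further, but, as noted above, at the cost of merging foods whose nutrient contents may differ.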
Despite its limitations, the only alternative left to end users is to approximate the nutrient content by calculation or adjustment procedures. As reported in our study, a large proportion of the ENDB foods (40–60%) come from adjusted values of NDS foods or from other calculation methods (i.e. raw-to-cooked and recipe calculation or weighted averaging), using algorithms common across databases. This approximation was preferable to forced matching, which would ignore important differences in nutrient content or the individual information provided, for example, on cooking methods and types of fat used. The approach adopted in the ENDB also reveals the current lack of reference algorithms and of country- and food-specific conversion factors to refine these approximations. Despite important efforts already undertaken (Møller, 1994; Bergström, 1997; Bognar and Piekarski, 2000; Bognar, 2002; USDA, 2003), more resources are needed to develop common algorithms and related coefficient databases for calculating important reported foods not available in the national databases. Considering the increasing proportion of commercial or processed foods in European diets, the food industry should be involved as a major partner in these efforts (Weiss, 2001). Furthermore, although food aggregation and matching are traditionally considered the end user's task, the errors associated with food coding/matching might be substantially reduced and better controlled if common guidelines were set up across European countries with support from national compilers. Increasing the number of basic foods analysed after different cooking and preparation methods, and defining a systematic approach to mixed recipes and industrial multi-ingredient foods, seems the only way to reduce, in future, the gap between the foods available on the market and reported by study subjects and those actually available in national databases.
A precise assessment of absolute intakes from foods in their ‘final consumed’ form (i.e. accounting for all nutrient losses), as well as a better understanding of the bioavailability of food components, is also becoming crucial owing to the increasing use of biomarkers of diet in epidemiological research (Toniolo et al., 1997; Mayne, 2003).
The ENDB started with a list of 26 priority components, but other nutrients and bioactive compounds are also of research interest, such as folates, individual carotenoids and fatty acids, tocopherols, flavonoids, sulphur compounds, lignans and phytosterols. Most of them should be covered by the EuroFIR project, and it is anticipated that the ENDB will be extended and regularly updated using the EuroFIR outcomes once they are available. In parallel, the EPIC network will investigate whether reliable biomarkers of diet (or dietary status) should be used instead of, or in addition to, nutritional measurements to strengthen the estimation of individual nutritional exposure (Kaaks et al., 1997; Prentice et al., 2002; Potischman and Freudenheim, 2003). This is particularly important for nutrients, bioactive and other components that are not available in NDBs, are of limited reliability and/or are difficult to standardize across countries, for example folates, sodium, selenium, individual fatty acids, contaminants and pesticides (Arab, 2003; Hambidge, 2003; Mason, 2003; Potischman and Freudenheim, 2003).
Another experience of standardizing NDBs across 14 countries worldwide has been reported recently (Merchant and Dehghan, 2006). In this ongoing study, the USDA database was used as the primary data source for estimating nutrient intakes from a semi-quantitative food frequency questionnaire, with adaptations based on local reference food composition tables and recipe calculation for local mixed dishes. This approach assumes that using the same nutrient data source (i.e. USDA) will make the errors associated with NDBs more consistent across countries. Although debatable, it once again reflects the difficulties end users face in standardizing NDBs at the international level. This is particularly true in the context of this large epidemiological study which, in contrast to the ENDB, involves developed and developing countries in Asia, America and Africa, some of which have no national database, nor national compilers to support the documentation, evaluation and standardization of NDBs. Furthermore, as pointed out by the authors, a major limitation of this strategy is the limited number of foods with comparable data, which makes it unsuitable for detailed dietary assessment methods such as diaries and 24-HDRs.
In the context of nutritional studies, the effects of errors and/or inconsistencies in NDBs on nutrient intake values are of crucial importance. They are, however, particularly difficult to evaluate because the conversion of food intakes into nutrients occurs at a late stage of data processing. The net errors/uncertainties observed at the nutrient intake level are therefore not specific to NDBs but also include other errors occurring during data collection, coding, processing and calculation (Bingham, 1987; Cameron and van Staveren, 1988; Greenfield and Southgate, 2003). In EPIC, the aim is to evaluate the specific effects of standardizing NDBs at the population level, as the ENDB was initially developed for between-population calibration. For this purpose, published data comparing centre mean nitrogen (protein) and energy intakes obtained with non-standardized NDSs were re-analysed to compare the results before standardization (i.e. using the non-standardized NDSs) and after it (i.e. using the ENDB, in which protein values were recalculated with the same reference nitrogen factor, 6.25). The same data set was used in both analyses, to control for potential sources of error attributable to food consumption data. It involved a convenience sample of 1103 volunteers from 12 EPIC centres, from each of whom a single EPIC-SOFT 24-HDR and a 24-h urine sample were systematically collected (Slimani et al., 2003). When compared at the ecological level (n=22 gender-specific centres), Pearson's weighted correlation between centre mean urinary nitrogen and mean nitrogen intake from 24-HDRs increased slightly after standardization of the NDBs, from 0.86 to 0.89, whereas it remained unchanged (∼0.91) when mean energy intakes were considered.
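For reference, a weighted Pearson correlation between centre means is the weighted covariance divided by the product of the weighted standard deviations. The sketch below assumes the weights are, for example, the number of subjects per centre; the weighting actually used in the cited analysis is not stated here, and the function name and toy data are hypothetical, not EPIC results.

```python
import math
from typing import Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Pearson correlation of paired values x, y with weights w."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)


# Hypothetical centre means (g nitrogen/day) and subjects per centre; not EPIC data.
urinary_n = [11.0, 12.5, 10.2, 13.1]
diet_n = [10.4, 12.9, 9.8, 12.2]
subjects = [80, 95, 60, 110]
print(round(weighted_pearson(urinary_n, diet_n, subjects), 3))
```

With equal weights the function reduces to the ordinary Pearson coefficient.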
Overall, these preliminary results suggest that errors or differences in NDBs may have no or only modest effects on the ranking of population mean nitrogen (protein) and energy intakes when compared with an independent quantitative marker such as urinary nitrogen. This may be explained by the large variability in dietary consumption across the EPIC centres, confirmed by biomarkers of diet (Southgate et al., 2002; Al-Delaimy et al., 2005), which probably far exceeds the variability due to imprecision in protein values; these are assumed to be relatively comparable across national databases (Deharveng et al., 1999). Another explanation is a relatively systematic increase in the mean values across centres, which is consistent with the absence of significant differences in correlations before and after standardization. However, the positive impact of standardization may be much greater for other nutrients with larger day-to-day variability (e.g. β-carotene, retinol, iron, vitamin C and vitamin E), a greater level of incompleteness (e.g. fatty acids, starch, sugars, retinol and β-carotene) or specific standardization difficulties (e.g. dietary fibre). These important methodological issues will be investigated further. Nevertheless, these preliminary results reinforce the overall statement that the effects of random and systematic errors on dietary exposures could be minimized in study settings that combine large and heterogeneous populations with efforts to standardize dietary instruments and NDBs. This opens encouraging perspectives for joint efforts to conduct pan-European studies or other international nutritional research projects at the European Union level.
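The argument that a relatively systematic change in centre means leaves the correlations unchanged rests on a property of Pearson's correlation: it is invariant when all values of one variable are multiplied by the same positive constant (or shifted by the same constant). The sketch below illustrates this with hypothetical centre means (not EPIC data) and contrasts it with a change affecting a single centre.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Unweighted Pearson correlation of paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical centre means, not EPIC data.
marker = [10.0, 12.0, 11.0, 14.0]
intake = [9.0, 12.5, 10.0, 13.0]

r_before = pearson(marker, intake)
# Systematic change: every centre mean rises by the same 5%.
r_systematic = pearson(marker, [1.05 * v for v in intake])
# Centre-specific change: only the first centre rises by 20%.
r_uneven = pearson(marker, [intake[0] * 1.2] + intake[1:])

print(round(r_before, 3), round(r_systematic, 3), round(r_uneven, 3))
```

The first two coefficients coincide, whereas the centre-specific change alters the correlation, which is why a systematic shift alone would not be expected to change the ecological correlations.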
Abbreviations
EPIC, European Prospective Investigation into Cancer and Nutrition
ENDB, EPIC Nutrient Database
DBMS, DataBase Management System
24-HDR, 24-hour diet recall
Al-Delaimy WK, Slimani N, Ferrari P, Key T, Spencer E, Johansson I et al. (2005). Plasma carotenoids as biomarkers of intake of fruits and vegetables: ecological level correlations in the European Prospective Investigation into Cancer and Nutrition (EPIC). Eur J Clin Nutr 59, 1397–1408.
Anderson E, Perloff B, Ahuja JKC, Raper N (2001). Tracking nutrient changes for trends analysis in the United States. J Food Comp Anal 14, 287–294.
Arab L (2003). Biomarkers of fat and fatty acid intake. J Nutr 133, 925S–932S.
Bergström L (1997). Nutrient Losses and Gains in the Preparation of Foods. National Food Administration: Uppsala, Sweden.
Bingham SA (1987). The dietary assessment of individuals; methods, accuracy, new techniques and recommendations. Nutr Abs Rev (series A) 57, 705–742.
Bingham SA, Day NE, Luben R, Ferrari P, Slimani N, Norat T et al. (2003). Dietary fibre in food and protection against colorectal cancer in the European Prospective Investigation into Cancer and Nutrition (EPIC): an observational study. Lancet 361, 1496–1501.
Bognar A (2002). Naehrstoffverluste bei der haushaltsmaessigen Zubereitung von Lebensmitteln. AID-Verbraucherdienst. Bognar A. Tables on weight yield of foods and retention factors of food constituents for the calculation of nutrient composition of cooked foods (dishes). ISSN 0933-5463, BFE-R--02-03. Available on http://www.bfa-ernaehrung.de/Bfe-Deutsch/Information/bfeber91.htm.
Bognar A, Piekarski J (2000). Guidelines for recipe information and calculation of nutrient composition of prepared foods (dishes). J Food Comp Anal 13, 391–410.
Brussaard JH, Lowik MR, Steingrimsdottir L, Moller A, Kearney J, De Henauw S et al. (2002). A European food consumption survey method, conclusions and recommendations. Eur J Clin Nutr 56 (Suppl 2), S89–S94.
Cameron ME, Van Staveren WA (1988). Manual on Methodology for Food Consumption Studies. Oxford Medical Publications: Oxford.
Charrondière UR, Vignat J, Møller A, Ireland J, Becker W, Church S et al. (2002). The European nutrient database (ENDB) for nutritional epidemiology. J Food Comp Anal 15, 435–451.
Chatfield C (1949). Food Composition Tables for International Use. FAO Nutritional studies No. 3. Nutrition Division, FAO UN: Washington, DC. October 1949.
Cowin I, Emmett P (1999). The effect of missing data in the supplements to McCance and Widdowson's food tables on calculated nutrient intakes. Eur J Clin Nutr 53, 891–894.
Deharveng G, Charrondière R, Slimani N, Riboli E, Southgate DAT (1999). Comparison of food composition tables available in the nine European countries participating in EPIC. Eur J Clin Nutr 53, 60–79.
Engeset D, Alsaker E, Lund E, Welch A, Khaw KT, Clavel-Chapelon F et al. (2006). Fish consumption and breast cancer risk. The European Prospective Investigation into Cancer and Nutrition (EPIC). Int J Cancer (Epub ahead of print).
FDA (1992). Langual Users' Manual. US Food and Drug Administration, Centre for Food Safety and Applied Nutrition: Washington, DC.
Franck GC, Hollatz AT, Webber LS, Berenson GS (1984). Effect of interviewer recording practices on nutrient intake – Bogalusa Heart Study. J Am Diet Assoc 84, 1432–1439.
Garcia V, Rona RJ, Chinn S (2003). Effect of the choice of food composition table on nutrient estimates: a comparison between the British and American (Chilean) tables. Publ Health Nutr 7, 577–583.
Gillanders L, Steeper A, Watts C (2002). Impact of a dynamic food supply on food composition databases. J Food Comp Anal 15, 523–526.
Greenfield H, Southgate DAT (1992). Food Composition Data: Production, Management and Use (1st edn). Elsevier: London and New York.
Greenfield H, Southgate DAT (2003). Food Composition Data: Production, Management and Use (2nd edn). FAO: Rome.
Guilland JC, Aubert R, Lhuissier M, Peres G, Montagnon B, Fuchs F et al. (1993). Computerized analysis of food records: role of coding and food composition database. Eur J Clin Nutr 47, 445–453.
Hakala P, Knuts LR, Vuorinen A, Hammar N, Becker W (2003). Comparison of nutrient intake data calculated on the basis of two different databases. Results and experiences from a Swedish-Finnish study. Eur J Clin Nutr 57, 1035–1044.
Hambidge M (2003). Biomarkers of trace mineral intake and status. J Nutr 133, 948S–955S.
Hautvast JG, van Staveren WA, de Groot LC (1993). Methodologic issues in the EURONUT SENECA study. Aging (Milano) 5 (2 Suppl 1), 37–43.
Holden JM, Bhagwat SA, Patterson KY (2002). Development of a multinutrient data quality evaluation system. J Food Comp Anal 15, 339–348.
Hoover LW (1983). Computerized nutrient databases: I. Comparison of nutrient analysis systems. J Am Diet Assoc 82, 501–505.
Kaaks R, Plummer M, Riboli E, Estève J, Van Staveren WA (1994). Adjustment for bias due to errors in exposure assessments in multicenter cohort studies on diet and cancer: a calibration approach. Am J Clin Nutr 59 (Suppl.), S245–S250.
Kaaks R, Riboli E, Sinha R (1997). Biochemical markers of dietary intake. In: P Toniolo, P Boffetta, D Shuker, N Rothman, B Hulka, N Pearce (eds). Application of Biomarkers in Cancer Epidemiology. International Agency for Research on Cancer: Lyon. (IARC Sci. Publ. No. 142), pp 103–126.
Kaaks R, Riboli E, van Staveren WA (1995). Calibration of dietary intake measurements in prospective cohort studies. Am J Epidemiol 142, 548–556.
Kim E-S, Ko Y-S, Kim J, Matsuda-Inogushi N, Nakatsuka H, Watanabe T et al. (2003). Food composition table-based estimation of energy and major nutrient intake in comparison with chemical analysis: A validation study in Korea. Tohoku J Exp Med 200, 7–15.
Klensin JC (1992). INFOODS Food Composition Data Interchange Handbook. The United Nations University: Tokyo.
Klensin JC, Feskanich D, Lin V, Truswell AS, Southgate DAT (1989). Identification of Food Components for INFOODS Data Interchange. UNU Press: Tokyo.
Kohlmeier L (1991). Problems and pitfalls of food-to-nutrient conversion. In: W Becker and E Helsing (eds). Food and Health Data. Their Use in Nutrition Policy-making European Series No. 34, WHO Regional Publications, WHO: Copenhagen. pp 73–84.
Lagiou P, Trichopoulou A, DAFNE contributors, DAta Food NEtworking (2001). The DAFNE initiative: the methodology for assessing dietary patterns across Europe using household budget survey data. Publ Health Nutr 4, 1135–1141.
LanguaL (2005). Website: www.langual.org.
Leclercq C, Valsta LM, Turrini A (2001). Food composition issues – implications for the development of food-based dietary guidelines. Publ Health Nutr 4, 677–682.
Lee RD, Nieman DC, Rainwater M (1995). Comparison of eight microcomputer dietary analysis programs with the USDA nutrient data base for standard reference. J Am Diet Assoc 95, 858–867.
Margetts BM, Pietinen P, Riboli E (1997). EPIC European Prospective Investigation into Cancer and Nutrition. Validation studies on dietary assessment methods. Int J Epidemiol 26 (Suppl 1), 1–189.
Mason JB (2003). Biomarkers of nutrient exposure and status in one-carbon (methyl) metabolism. J Nutr 133, 941S–947S.
Mayne ST (2003). Antioxidant nutrients and chronic disease: use of biomarkers of exposure and oxidative stress status in epidemiologic research. J Nutr 133, 933S–940S.
Merchant AT, Dehghan M (2006). Food composition database development for between country comparisons. Nutr J 1–8.
Møller A (1992). NORFOODS Computer Group – Food composition data interchange among the Nordic Countries: A report in International Food Databases and Information Exchange. In: Simopoulos AP and Butrum RR (eds). World Rev Nutr Diet. Basel: Karger. pp 94–103.
Møller A (1994). Loss of nutrients during preparation/cooking: a practical, standardised and systematic approach. In: Report of the Third Annual Meeting of the FLAIR Eurofoods-Enfant Project. University of Wageningen: Netherlands. pp 104–108.
Nieman DC, Butterworth DE, Nieman CN, Lee KE, Lee RD (1992). Comparison of six microcomputer dietary analysis systems with the USDA Nutrient Data Base for Standard Reference. J Am Diet Assoc 92, 48–57.
Norat T, Bingham S, Ferrari P, Slimani N, Jenab M, Mazuir M et al. (2005). Meat and fish consumption, and colorectal cancer risk: the European Prospective Investigation into Cancer and Nutrition (EPIC). J Natl Cancer Inst 97, 906–916.
Polacchi W (1986). Standardised food terminology: an essential element for preparing and using food consumption data on an international basis. Food Nutr Bull 8, 66–68.
Potischman N, Freudenheim JL (2003). Biomarkers of nutritional exposure and nutritional status: an overview. J Nutr 133, 873S–874S.
Prentice RL, Sugar E, Wang CY, Neuhouser M, Patterson R (2002). Research strategies and the use of nutrient biomarkers in studies of diet and chronic diseases. Publ Health Nutr 5, 977–984.
Price GM, Paul AA, Key FB, Harter AC, Cole TJ, Day KC et al. (1995). Measurement of diet in a large national survey: comparison of computerised and manual coding of records in household measures. J Hum Nutr 8, 417–428.
Puwastien P (2002). Issues in the development and use of food composition databases. Publ Health Nutr 5, 991–999.
Riboli E, Hunt K, Slimani N, Ferrari P, Norat T, Fahey M et al. (2002). EPIC study populations and data collection. Publ Health Nutr 5, 1113–1124.
Sabaté J, Jenab M, Norat T, Slimani N, Ferrari P, Mazuir M et al. (2006). Nut consumption and risk of death from coronary heart disease in the European Prospective Investigation into Cancer and Nutrition (EPIC). (in preparation).
Schakel SF, Buzzard IM, Gebhardt E (1997). Procedures for estimating nutrient values for food composition databases. J Food Comp Anal 10, 102–114.
Schlotke F (1996). Using Internet services to improve international food data exchange. (The Second International Food Database Conference). Food Chem 57, 137–143.
Schlotke F, Becker W, Ireland J, Moller A, Ovaskainen M, Monspart J et al. (2000). EUROFOODS Basic Recommendations for Food Composition Database Management and Data Interchange. COST report EUR 19538: European Commission.
Scrimshaw N (1997). INFOODS: The international network of food data systems. Am J Clin Nutr 66 (Suppl.), S1190–S1193.
Slimani N, Bingham SJ, Runswick S, Ferrari P, Day NE, Welch AA et al. (2003). Group level validation of protein intakes estimated by 24-hour diet recall and dietary questionnaires against 24-hour urinary nitrogen in the European Prospective Investigation into Cancer and Nutrition (EPIC) calibration study. Cancer Epidemiol Biomarkers Prev 12, 784–795.
Slimani N, Charrondière UR, van Staveren W, Riboli E (2000b). Standardisation of food composition databases for the European Prospective Investigation into Cancer and Nutrition (EPIC): General theoretical concept. J Food Comp Anal 13, 567–584.
Slimani N, Deharveng G, Charrondière RU, Van Kappel AL, Ocké MC, Welch A et al. (1999). Structure of the standardised computerized 24-hour diet recall interview used as reference method in the 22 centers participating in the EPIC project. Comput Methods Programs Biomed 58, 251–266.
Slimani N, Ferrari P, Ocké M, Welch A, Boeing H, van Liere M et al. (2000a). Standardisation of the 24-hour diet recall calibration method used in the European Prospective Investigation into Cancer and Nutrition (EPIC): general concepts and preliminary results. Eur J Clin Nutr 54, 900–917.
Slimani N, Kaaks R, Ferrari P, Casagrande C, Clavel-Chapelon F, Lotze G et al. (2002). EPIC calibration study: Rationale, design and population characteristics. Publ Health Nutr 5, 1125–1145.
Slimani N, Riboli E, Greenfield H (1995). Food composition data requirements for nutritional epidemiology of cancer and chronic diseases. In: H Greenfield (ed). Quality and Accessibility of Food-Related Data. AOAC International: Arlington, VA. pp 209–215.
Southgate DAT, van Staveren WA, Slimani N, Riboli E (eds) (2002). Food consumption, anthropometry and physical activity in the EPIC cohorts from 10 European countries – Food consumption data derived from the calibration study. Publ Health Nutr 5, 1111–1345.
Toniolo P, Boffetta P, Shuker DEG, Rothman N, Hulka B, Pearce N (eds) (1997). Application of Biomarkers in Cancer Epidemiology (IARC Sci. Publ. No 142). International Agency for Research on Cancer: Lyon.
Trichopoulou A (2004). Composition Tables of Foods and Greek dishes 3rd edn. ISBN 960-394-284-7. Parisianos Scientific Publications: Athens.
Truswell AS, Bateson D, Madafiglio KC, Pennington JAT, Rand WM, Klensin JC (1991). Committee report: INFOODS guidelines for describing foods: A systematic approach to describing foods to facilitate international exchange of food composition data. J Food Comp Anal 4, 18–38.
Unwin I, Becker W (1996). The component aspect identifier for compositional values. (The Second International Food Database Conference). Food Chem 57, 149–154.
Unwin ID, Becker W (2002). Software management of documented food composition data. J Food Comp Anal 15, 491–497.
USDA (2003). Table of Nutrient Retention Factors, Release 5. http://www.nal.usda.gov/fnic/foodcomp/Data/retn5/retn5_doc.pdf.
Vaask S, Pomerleau J, Pudule I, Grinberga D, Abaravicius A, Robertson A et al. (2004). Comparison of the Micro-Nutrica nutritional analysis program and the Russian food composition database using data from the Baltic nutrition surveys. Eur J Clin Nutr 58, 573–579.
van Gils CH, Peeters PHM, Bueno-de-Mesquita HB, Boshuizen HC, Lahmann PH, Clavel-Chapelon F et al. (2005). Consumption of vegetables and fruits and risk of breast cancer. J Am Med Assoc 293, 183–193.
Weiss R (2001). Research and industry partnership in nutrient calculation software development. J Food Comp Anal 14, 253–261.
Welch AA, McTaggart A, Mulligan AA, Luben R, Walker N, Khaw KT et al. (2001). DINER (Data Into Nutrients for Epidemiological Research) – a new data-entry program for nutritional analysis in the EPIC-Norfolk cohort and the 7-day diary method. Publ Health Nutr 4, 1253–1265.
The EPIC study was supported by grants from the ‘Europe Against Cancer’ Programme of the European Commission (SANCO); Ligue contre le Cancer (France); Société 3M (France); Mutuelle Générale de l'Education Nationale; Institut National de la Santé et de la Recherche Médicale (INSERM); German Cancer Aid; German Cancer Research Center; German Federal Ministry of Education and Research; Danish Cancer Society; Health Research Fund (FIS) of the Spanish Ministry of Health; the participating regional governments and institutions of Spain; Cancer Research UK; Medical Research Council, UK; the Stroke Association, UK; British Heart Foundation; Department of Health, UK; Food Standards Agency, UK; the Wellcome Trust, UK; Greek Ministry of Health; Greek Ministry of Education; a fellowship honouring Vasilios and Nafsika Tricha (Greece); Italian Association for Research on Cancer; Dutch Ministry of Health, Welfare and Sports; Dutch Ministry of Health; Dutch Prevention Funds; LK Research Funds; Dutch ZON (Zorg Onderzoek Nederland); World Cancer Research Fund (WCRF); Swedish Cancer Society; Swedish Scientific Council; Regional Government of Skane, Sweden; Norwegian Foundation for Health and Rehabilitation; Catalan Institute of Oncology, Barcelona, Spain; Public Health Institute, Navarra, Spain; Andalusian School of Public Health, Granada, Spain; Public Health Department of Gipuzkoa, Health Department of the Basque Country, Donostia-San Sebastian, Spain; Murcia Health Council, Murcia, Spain; Health and Health Services Council, Principality of Asturias, Spain.
This study was also supported by contracts from the US NCI (N02-PC-25023) and the EC (Contract No. SPC 2002332 for ‘EPIC’, and EuroFIR NoE Contract No. 513944).
The Italian compilers and the ENDB network wish to thank INRAN-Rome (Istituto Nazionale di Ricerca per gli Alimenti e la Nutrizione, Dr Emilia Carnovale and Dr Luisa Marletta) and Prof Flaminio Fidanza for providing information about analytical methods applied to nutrient analyses of foods derived from their databases.
The British compilers and the ENDB network wish to thank Ms Wai Heen Lo for her contribution on imputing missing values (saturated, mono-, and poly-unsaturated fatty acids, vitamin C and vitamin E) in the EPIC-Norfolk NDB used to compile the UK data set in ENDB.
Guarantor: N Slimani.
Contributors: NS was the overall coordinator of the ENDB project and in charge of the preparation of the paper in collaboration with the other co-authors. GD, JV, GS, SS, MP, IU, DATS, NS were members of the ‘task force group’ responsible for specific managerial or technical tasks for the project and/or the preparation of the reference ENDB guidelines. IU was also in charge of the development of the DBMS in collaboration with the coordinating centre. SS, MP, PG, AM, JI, WB, AF, SW, EV, JU, SC and AB were involved as the national compilers in charge of documenting, compiling and evaluating the subset of their national nutrient databases used in the ENDB project. AM, JI, WB and IU were also involved as members of the ‘ENDB expert group’ headed by DATS, in charge of revising the reference ENDB guidelines. MN, MCB-R, CS, AT, SN, IM, JR, HB, MO, PHMP, PJ, PA, DE, EL, MS de M, AT, KG, CS, SR, AW, SB were involved as local EPIC collaborators in the supervision and preparation of EPIC-specific databases relevant to the ENDB project (e.g. recipe files). CC and MvB, at the coordinating centre, were involved in tasks relevant to these EPIC databases. AFS has provided long-standing scientific collaboration and support for setting up the ENDB. ER is the overall coordinator of the EPIC study. All co-authors provided comments and suggestions on the manuscript.
Slimani, N., Deharveng, G., Unwin, I. et al. The EPIC nutrient database project (ENDB): a first attempt to standardize nutrient databases across the 10 European countries participating in the EPIC study. Eur J Clin Nutr 61, 1037–1056 (2007). https://doi.org/10.1038/sj.ejcn.1602679
- nutrient databases
- 24-h dietary recall
- food composition tables