Tracing the use and source of racial terminology in representations of genetic research


Purpose: We examined the terminology used to describe populations in genetic research to understand how and when terminology is being used, changed, and framed.

Methods: We compiled 36 complete article sets, which included newspaper articles and corresponding press releases and journal articles. Population terminology was traced from peer-review article, to press release, to newspaper article to determine changes in language and frequency. A qualitative analysis was then conducted on a smaller sample of the article sets to shed further light on the use and source of population terminology in this context.

Results: Results indicated a wide variation in the frequency and terminology of population descriptor language used by genetic researchers and the media. The qualitative textual analysis highlighted differences in the use and the framing of population terminology between scientific literature and media representations.

Conclusions: This study demonstrates how difficult it can be to control terminology use, even within the reporting of a specific study. Further work needs to be done in this area with a focus on accuracy in defining research terms and research populations in both the scientific literature and the media representations of genetic research.


Debates over the use of race in genetic research continue to gain momentum,1,2 including a consideration of the nature and impact of media reporting on research findings.3 There is concern that the use of racial terminology in the context of genetic research will have a number of adverse implications, including legitimizing an inaccurate biological view of race, increasing discriminatory attitudes, and giving rise to poor health outcomes.4 Popular representations of genetic research, such as press releases and newspaper articles, can give rise to issues associated with definitions that can hold different meanings dependent on geographic, social, historical, and personal context. Suggestions have been made to address these challenges, including encouraging greater precision in the use of terminology during the research and reporting process and increased collaboration between scientists and journalists.5

To date, despite the ongoing policy analysis and the publication of recommendations,5 it is not clear where racial and population terminologies emerge in the science communication process. This process begins with the scientific article, authored by scientists who have chosen specific populations, arguably, for specific reasons. The press release is the next step in the communication process, often written by nonscientists who have some knowledge of the research. Studies have shown that the press release is an important step in the science communication process because a large portion of news articles use press releases as a source of information, although some journalists still refer to the original source or scientist for more information.6,7 Next, the media reinterprets the research for a larger audience using information from either or both the peer-reviewed article and press release.

The relationship between these types of texts raises a number of questions about the use of racial and population terminologies in genetic research. Are the same terms used in the press release and media reports? Are the media using the same terminology as the scientists, or are they distilling complex scientific categories into broad or simplified terms? How are racial and population terms being framed in these different publications? Although past research has shown that journalists tend to offer fairly accurate representations of genetic research,8,9 subtle changes in the use of language, such as the introduction of a new term to describe a population, could have significant implications. To delve deeper into the use and source of racial terminology in genetic research, we traced the language use across peer-reviewed articles, press releases, and news articles. In addition, we conducted an in-depth examination of how population descriptors, including “racial” terminology, are being used within the different types of articles. Understanding how racial terms and other population descriptors are used should help to inform science communication policy and efforts to ensure the consistent and accurate use of population terminology.10–12


Data collection

An initial search was made for newspaper articles in the Factiva database and supplemented with the Google News database. The search included English-language articles from high-circulation newspapers in Canada (The Globe and Mail, The National Post, and The Toronto Star), the United Kingdom (The Times of London, The Daily Telegraph, and The Guardian), and the United States (The New York Times, The Los Angeles Times, USA Today, and The Washington Post). The search terms used were as follows: journal and (gene or genealogy or genetic) and (race or ancestry or Caucasian or Hispanic or black or African or Asian or white or population). Terms were chosen based on previous work that addresses the use of population descriptors, including terms often associated with “racial” categories, in genetic research.2,5,13 The goal was to capture newspaper articles that refer to the genetic characteristics of (or research involving) a particular population or group, and any term used to describe a particular research population was viewed as relevant. Our plan was to exclude all articles that only used population terminology in a manner that did not relate to research, such as describing the nationality of a researcher. However, no article needed to be excluded on this basis. The search was limited to articles that were published between January 2003 and May 2009. This search yielded 1749 newspaper articles. Corresponding press releases and journal articles were then found for relevant newspaper articles. We kept only newspaper articles for which corresponding press releases and journal articles were located. The final corpus is 36 complete article sets.
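The Boolean structure of the search string can be sketched as a simple predicate. This is only an illustration: it assumes case-insensitive, whole-word matching on the article text, which may differ from how Factiva or Google News actually interpret the query, and the function name `matches_search` is hypothetical.

```python
import re

# Term groups taken from the search string described above.
TOPIC_TERMS = ("gene", "genealogy", "genetic")
POPULATION_TERMS = ("race", "ancestry", "caucasian", "hispanic", "black",
                    "african", "asian", "white", "population")


def _has_any(words, terms):
    """Return True if at least one of the terms occurs in the set of words."""
    return any(term in words for term in terms)


def matches_search(text):
    """journal AND (gene OR genealogy OR genetic) AND (race OR ... OR population).

    Assumption: exact whole-word matching, so "genes" does not match "gene".
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return ("journal" in words
            and _has_any(words, TOPIC_TERMS)
            and _has_any(words, POPULATION_TERMS))
```

A headline such as "Journal reports gene variant common in Asian population" would satisfy all three clauses, whereas an article that never mentions a journal would not.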

The press releases came from varying sources, including universities (20 press releases), research hospitals (five press releases), private genetic research companies (four press releases), medical agencies or organizations (three press releases), the Wellcome Trust (two press releases), and a pharmaceutical company (one press release). Press releases ranged from 273 to 1697 words, with an average of 863 words.

Quantitative analysis

All the articles in each article set were coded for the types and frequencies of population descriptor terminology that were used. The mean frequency of use for each term was calculated to account for word length. To trace the language use and change across the publication types, the most frequent terms in each publication were compared within article sets. Terms that were introduced in the press releases and newspaper articles were also traced, especially for terms that became the most frequently used within press releases and newspaper articles but did not appear in the peer-reviewed article (Table 1).
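The tracing step can be sketched roughly as follows. This is a minimal sketch, not the authors' coding procedure: it assumes that "accounting for word length" means normalizing term counts by the word count of each text (here, occurrences per 1000 words), and the term list, sample sentences, and function names are all hypothetical.

```python
import re
from collections import Counter

# Illustrative subset of population descriptors (assumption, not the study's full list).
TERMS = ("population", "african", "black", "white", "european")


def term_counts(text, terms=TERMS):
    """Count occurrences of each descriptor and return them with the total word count."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in terms)
    return counts, len(words)


def rate_per_1000(text, terms=TERMS):
    """Descriptor frequency per 1000 words, so texts of different length are comparable."""
    counts, n_words = term_counts(text, terms)
    if n_words == 0:
        return {}
    return {term: 1000 * count / n_words for term, count in counts.items()}


def top_term(text, terms=TERMS):
    """Most frequent descriptor in a text, or None if no descriptor appears."""
    rates = rate_per_1000(text, terms)
    return max(rates, key=rates.get) if rates else None


def introduced_terms(journal_text, later_text, terms=TERMS):
    """Descriptors used in a press release or news article but absent from the journal article."""
    journal_counts, _ = term_counts(journal_text, terms)
    later_counts, _ = term_counts(later_text, terms)
    return set(later_counts) - set(journal_counts)


# Invented sample texts, for illustration only.
journal = "The African population and the European population were sampled"
news = "Black patients and white patients differ, black patients more so"

print(top_term(journal))                 # population
print(top_term(news))                    # black
print(sorted(introduced_terms(journal, news)))  # ['black', 'white']
```

Comparing `top_term` across the three texts of an article set shows whether the dominant term was maintained, and `introduced_terms` flags descriptors that first appear after the peer-reviewed article.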

Table 1 Terminology traced across article types. A sample of the results of tracing racial terminology from journal article, to press release, to newspaper article (for complete data set results, see Supplemental Digital Content 1).

Qualitative analysis

A small sample of seven article sets was selected for qualitative analysis. These article sets included all the articles that were published during 2008 and 2009, which provided a snapshot within a relatively similar historical and social climate. Articles were coded by first using a preliminary list of categories and variables, but new categories were added during the analysis to reflect the patterns appearing in the texts. The qualitative analysis was first applied to the peer-reviewed journal articles and then systematically applied to the press releases and newspaper articles. As new categories and variables emerged at all stages of the analysis, all articles of each type were checked and rechecked to maintain consistency. After all the articles were coded, categories were combined where appropriate, from which themes were developed.


Quantitative results

In the entire corpus of articles, 27 different terms were found (Table 2). Terminology shifts in frequency as the genetic research that is presented in peer-reviewed literature is reinterpreted in press releases and newspaper articles. For example, the term “population” appears most frequently in the peer-reviewed literature. However, “population” is the second most frequent term in the press releases and is only the fourth most frequent term in the newspaper articles. There seems to be a greater focus on terms such as “African” and “black” in the newspaper articles, especially in comparison with the peer-reviewed articles. The term “African” is the second most frequent term in the journal articles, and the term “black” is the eighth most frequent term; however, in the newspaper articles, these are the first and second most frequent terms, respectively.

Table 2 Mean frequency of racial terminology in each article type in descending order

Tracing the terminology from journal article to press release to newspaper article showed that new terms were introduced, that some terms were omitted, and that some terms were used with varying frequency within article sets (Table 1). Only 7 (19.44%) of the 36 article sets maintained the same term as the most frequently used term throughout the types of articles. Overall, 61.11% of the article sets had new terminology that was introduced in either or both the press release and newspaper article. Of the article sets that introduced new terminology in the press release or news article, 40.91% of the articles introduced new terms that became the most frequent term (or 27.78% of all the article sets). It is also interesting to note that 27.78% of the article sets introduced new terms in the press release that were not used in the newspaper article (52.94% of terms that were introduced in the press release are also used in the newspaper article). Of that 27.78%, three press releases introduced terms that were the most frequent term in that press release, but these terms were not used in the newspaper article (i.e., only 17.65% of all press releases that introduced new terms). Only one press release introduced a term that became the dominant term in a newspaper article.
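As a quick arithmetic check, some of the reported proportions can be reproduced from counts. Only the counts 7, 3, and 36 are stated in the text; the counts 22, 10, and 17 below are back-calculated from the reported percentages and should be read as assumptions rather than reported data.

```python
def pct(part, whole):
    """Percentage rounded to two decimals, matching the precision used in the text."""
    return round(100 * part / whole, 2)


TOTAL_SETS = 36  # complete article sets (stated in the text)

print(pct(7, TOTAL_SETS))   # 19.44: sets keeping the same most frequent term (7 is stated)
print(pct(22, TOTAL_SETS))  # 61.11: sets with newly introduced terminology (22 is inferred)
print(pct(10, TOTAL_SETS))  # 27.78: the share reported twice in the paragraph (10 is inferred)
print(pct(3, 17))           # 17.65: three press releases out of an inferred 17
```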

Qualitative results

The qualitative analysis yielded three themes or categories that help explain the differences in the frequency and use of population descriptors, including “racial” terminology, across the different types of publications. They include intertextuality, term definitions across publication types, and exclusion of terms.

Intertextuality


Intertextuality refers to the way in which texts refer to or relate to each other. For the purposes of this study, it specifically means the use of references to and citations of other texts or bodies of knowledge. In scientific and academic writing, the use of citations and references is a common practice that helps to define and establish ideas and communities of practice.14,15 The peer-reviewed articles in our article sets are no exception to this practice. Citations and in-text references (e.g., HapMap or Human Genome Diversity Project) were often used to support, or were associated with, the use of particular population descriptors, categories, and/or methodologies in genetic research. Population terminology, including “racial terminology,” in the peer-reviewed articles from our data is used to describe the research populations and is often treated as a scientific concept that needs to be supported by previous research as part of a research methodology. How well scientists justify their choices is up for debate (and out of scope of this study); indeed, there is at least some evidence that, from a scientific perspective, this is often done poorly.16,17

All journal article authors provided at least one reference (and one article provided 25 references or citations) to another study or organization, such as HapMap or the Human Genome Diversity Project, that outlined how population descriptors were determined and used. In the press releases and newspaper articles, by contrast, population terminology is not accompanied by citations and references, and the terms become open to interpretation (Table 3).

Table 3 Example of intertextuality and racial terminology in different types of articles

Defining scientific terms

It is common in popular representations of science, especially in newspaper articles, for specialized scientific terms and concepts to be interpreted for a wider and more varied audience.18,19 This is often done through, but not limited to, providing simplified explanations or providing alternative terms. An example from the data is “… those with this mutation of the APOC3 gene have higher levels of HDL-cholesterol, the so-called ‘good’ cholesterol, and lower levels of LDL-cholesterol, the ‘bad’ cholesterol” (emphasis added). In this example, the author provided more commonly known terms for the types of cholesterol to aid readers' understanding.

However, despite many attempts to explain complex scientific concepts to readers, in all the press releases and newspaper articles, there were no explicit explanations of the complexities of the population categories, such as terminology associated with race, within genetic research. Because population terminology was not treated as a concept requiring further clarification, readers are able to bring their own definitions to the words. Definitions for terms such as race, ancestry, ethnicity, and even specific population identifiers can vary widely according to geographical location, culture, education, and personal history.20–22

Exclusion of terms

Many of the peer-reviewed articles mentioned multiple groups or populations to justify a focus on a particular population or to draw comparisons. However, many of the press releases and newspaper articles did not mention all the groups or discussed only one particular group. For example, in article set 16, the peer-reviewed journal article clearly justifies a focus on an African American population and includes 11 different terms that appeared to represent three major ethnic groupings. However, the press release narrowed this to four terms representing two major ethnic groupings, and the newspaper article distilled these categories into only “black” and “white.” This may be for brevity or simplicity, or it could also be a result of a “hype-space conflict,”3 which occurs when journalists must balance the need to “hype” a story to remain competitive against maintaining the journalistic values of accuracy and objectivity. Such exclusions have the potential adverse effect of singling out or emphasizing certain groups. In contrast, new terms may be introduced in the press release or newspaper article, which may introduce the concept of race into a study or distill specific research populations into broad categories (Table 3).


As has been found in a variety of other studies, there is little standardization and consistency in the use of population terminology in peer-reviewed literature, especially in the context of the concept of “race.”16,17,23 Our study found that this is true not only between articles but also with respect to the communication of a specific study (e.g., from the peer-review article, through the press release, and into the newspaper article). In addition to the inconsistencies in terminology being used, it seems that a considerable amount of new terminology (more than half of the article sets) is being introduced after the peer-reviewed articles, especially in the newspaper articles (Table 3). It is unclear whether journalists are using new terms for the purposes of brevity and readers' understanding or whether they are (knowingly or unknowingly) “hyping” racial categories in genetic research (i.e., selecting terms that will make the article sound more interesting and newsworthy).

At the stage of peer-reviewed literature, a methodological justification for studying particular groups, and for using the associated terms, is often provided, including the use of references (although how well scientists justify their choices is debatable16,22). However, when the study is discussed in press releases and newspaper articles, only terms and not justification for the terms appear. So, although terminology may be consistent from scientist to journalist, the context for a term can change. This may change how the term is perceived by the reader (especially when mixed with readers' interpretations).4,24 Although many specialized scientific terms are explained and simplified in the media, population terminology is not treated as specialized, and, as such, readers are able to bring their own definitions to the words, which can vary widely according to geographical location, culture, personal history, and education.20–22

Our study has clear limitations. The sample size is relatively small (a result of seeking complete sets that include peer-reviewed article, press release, and media report). Also, we only looked at newspaper articles. Other media sources, such as television, magazines, and the internet, are obviously important sources of information about genetic research. Despite such limitations, the data provide a picture that is consistent with previous work regarding the lack of consistency17,25–27 and offer some insight into the source and nature of that inconsistency. In addition, the study raises some interesting questions that are worth further consideration. This study, which was solely a text analysis, highlights the need for more research on the significance, if any, of the shift in terminology and on how individuals interpret and react to the various terms. For example, even if new terms introduced in the press releases or newspaper articles do not become the most frequently used terminology, certain terms may hold more “weight” for readers. Is “race” a more loaded term than “ancestry”? How do individuals interpret these terms (e.g., does the public view terms such as “ancestry” as proxies for old racial terminology)? It may be that simply avoiding sociocultural categories for populations in genetic research will not exempt terms from still being interpreted within varying social climates as “racial categories” (indeed, this may be why we are seeing slippage between the peer-reviewed literature and the media reports).13,28

In sum, we found a lack of consistency in the use of terminology and that some terminology is introduced or omitted in the different interpretations of genetic research. As terms are presented in a peer-reviewed article and then interpreted by the media, certain groups become emphasized. Given the long-standing concerns about the use of population terminology in the context of genetic research, further work in this area seems essential. This study demonstrates how difficult it can be to control terminology use, even within the reporting of a specific study. If consistency and precision in the use of population terminology are the goal, as suggested by a variety of commentators, then more robust efforts are needed to ensure the use of scientifically justified, and clinically relevant, descriptors. This should include, at a minimum, a more collaborative effort of all involved in the science communication process to ensure the accurate and consistent translation of research results.29


References

  1. Malinowski MJ. Dealing with the realities of race and ethnicity: a bioethics-centered argument in favor of race-based genetics research. Hous L Rev 2009; 45: 1415–1473.

  2. Koenig BA, Lee SS, Richardson SS. Revisiting race in a genomic age. New Jersey, Rutgers University Press 2008.

  3. Lynch J, Condit CM. Genes and race in the news: a test of competing theories of news coverage. Am J Health Behav 2006; 30: 125–135.

  4. Lynch J, Bevan J, Achter P, Harris T, Condit CM. A preliminary study of how multiple exposures to messages about genetics impact on lay attitudes towards racial and genetic discrimination. New Genet Soc 2008; 27: 43–51.

  5. Caulfield T, Fullerton SM, Ali-Khan SE, et al. Race and ancestry in biomedical research: exploring the challenges. Genome Med 2009; 1: 8.1–8.8.

  6. Schwitzer G. How do US journalists cover treatments, tests, products, and procedures? An evaluation of 500 stories. PLoS Med 2008; 5: 700–704.

  7. Woloshin S, Schwartz LM, Casella SL, Kennedy AT, Larson RJ. Press releases by academic medical centers: not so academic? Ann Intern Med 2009; 150: 613–619.

  8. Caulfield T, Harry S. Popular representations of race: the news coverage of BiDil. J Law Med Ethics 2008; 36: 485.

  9. Bubela TM, Caulfield T. Do the print media “hype” genetic research? A comparison of newspaper stories and peer-reviewed research papers. Can Med Assoc J 2004; 170: 1399–1407.

  10. Cho M. Racial and ethnic categories in biomedical research: there is no baby in the bathwater. J Law Med Ethics 2006; 34: 497–499.

  11. Kahn J. Genes, race and population: avoiding a collision of categories. Am J Public Health 2007; 96: 1965–1970.

  12. Lee SS. Racializing drug design: implications of pharmacogenomics for health disparities. Am J Public Health 2005; 95: 2133–2138.

  13. Fullwiley D. Race and genetics: attempts to define the relationship. BioSocieties 2007; 2: 221–237.

  14. Bazerman C. Intertextuality: how texts rely on other texts. In: Bazerman C, Prior P, editors. What writing does and how it does it: an introduction to analyzing texts and textual practices. Mahwah, NJ, Lawrence Erlbaum Associates 2004; 83–96.

  15. Hyland K. Academic attribution: citation and the construction of disciplinary knowledge. Appl Linguist 1999; 20: 341–367.

  16. Ellison G, Smart A, Tutton A, Outram S, Ashcroft R. Racial categories in medicine: a failure of evidence-based practice? PLoS Med 2007; 4: 1434–1436.

  17. Shanawani H, Dame L, Schwartz DA, Cook-Deegan R. Non-reporting and inconsistent reporting of race and ethnicity in articles that claim association among genotype, outcome and race or ethnicity. J Med Ethics 2006; 32: 724.

  18. Myers G. Lexical cohesion and specialized knowledge in science and popular science texts. Discourse Processes 1991; 14: 1–26.

  19. Condit CM. How geneticists can help reporters to get their story right. Nat Rev Genet 2007; 8: 815–820.

  20. Surratt HL, Inciardi JA. Unraveling the concept of race in Brazil: issues for the Rio de Janeiro Cooperative Agreement site. J Psychoactive Drugs 1998; 30: 255–260.

  21. Eschbach K, Supple K, Snipp CM. Changes in racial identification and the educational attainment of American Indians. Demography 1998; 35: 35–43.

  22. Hitlin S, Scott JB, Elder GHJ. Racial self-categorization in adolescence: multiracial development and social pathways. Child Dev 2006; 77: 1298–1308.

  23. Outram S, Ellison G. Improving the use of race and ethnicity in genetic research: a survey of instructions to authors in genetics journals. Sci Ed 2006; 29: 78.

  24. Condit CM, Parrott R, Bates BR, Bevan J, Achter PJ. Exploration of the impact of messages about genes and race on lay attitudes. Clin Genet 2004; 66: 402–408.

  25. Sankar P, Cho MK, Mountain J. Race and ethnicity in genetic research. Am J Med Genet 2007; 143: 961–970.

  26. Ma IW, Khan NA, Kang A, Zalunardo N, Palepu A. Systematic review identified suboptimal reporting and use of race/ethnicity in general medical journals. J Clin Epidemiol 2007; 60: 572–578.

  27. Hunt LM, Megyesi MS. The ambiguous meanings of the racial/ethnic categories routinely used in human genetics research. Soc Sci Med 2008; 66: 349–361.

  28. Sankar P, Cho MK. Genetics: toward a new vocabulary of human genetic variation. Science 2002; 298: 1337–1338.

  29. Lee S, Mountain J, Koenig B. The ethics of characterizing difference: guiding principles on using racial categories in human genetics. Genome Biol 2008; 9: 404.


This work was supported by Genome Alberta and The Alberta Heritage Foundation for Medical Research. The authors thank Robyn Hyde-Lay and Amy Zarzeczny for helpful feedback on the manuscript.

Author information



Corresponding author

Correspondence to Timothy Caulfield.

Additional information

Disclosure: The authors declare no conflict of interest.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Web site (

About this article

Cite this article

Rachul, C., Ouellette, C. & Caulfield, T. Tracing the use and source of racial terminology in representations of genetic research. Genet Med 13, 314–319 (2011).


  • genetics
  • race
  • ethnicity
  • media
  • peer reviewed journals
  • press releases
