Debates over the use of race in genetic research continue to gain momentum,1,2 including a consideration of the nature and impact of media reporting on research findings.3 There is concern that the use of racial terminology in the context of genetic research will have a number of adverse implications, including legitimizing an inaccurate biological view of race, increasing discriminatory attitudes, and giving rise to poor health outcomes.4 Popular representations of genetic research, such as press releases and newspaper articles, can give rise to issues associated with definitions that can hold different meanings dependent on geographic, social, historical, and personal context. Suggestions have been made to address these challenges, including encouraging greater precision in the use of terminology during the research and reporting process and increased collaboration between scientists and journalists.5

To date, despite the ongoing policy analysis and the publication of recommendations,5 it is not clear where racial and population terminologies emerge in the science communication process. This process begins with the scientific article, authored by scientists who have chosen specific populations, arguably, for specific reasons. The press release is the next step in the communication process, often written by nonscientists who have some knowledge of the research. Studies have shown that the press release is an important step in the science communication process because a large portion of news articles use press releases as a source of information, although some journalists still refer to the original source or scientist for more information.6,7 Next, the media reinterprets the research for a larger audience using information from either or both the peer-reviewed article and press release.

The relationship between these types of texts raises a number of questions about the use of racial and population terminologies in genetic research. Are the same terms used in the press release and media reports? Are the media using the same terminology as the scientists, or are they distilling complex scientific categories into broad or simplified terms? How are racial and population terms being framed in these different publications? Although past research has shown that journalists tend to offer fairly accurate representations of genetic research,8,9 subtle changes in the use of language, such as the introduction of a new term to describe a population, could have significant implications. To delve deeper into the use and source of racial terminology in genetic research, we traced language use across peer-reviewed articles, press releases, and news articles. In addition, we conducted an in-depth examination of how population descriptors, including “racial” terminology, are being used within the different types of articles. Understanding how racial terms and other population descriptors are used should help to inform science communication policy and efforts to ensure the consistent and accurate use of population terminology.10–12


Data collection

An initial search was made for newspaper articles in the Factiva database and supplemented with the Google News database. The search included English-language articles from high-circulation newspapers in Canada (The Globe and Mail, The National Post, and The Toronto Star), the United Kingdom (The Times of London, The Daily Telegraph, and The Guardian), and the United States (The New York Times, The Los Angeles Times, USA Today, and The Washington Post). The search terms used were as follows: journal and (gene or genealogy or genetic) and (race or ancestry or Caucasian or Hispanic or black or African or Asian or white or population). Terms were chosen based on previous work that addresses the use of population descriptors, including terms often associated with “racial” categories, in genetic research.2,5,13 The goal was to capture newspaper articles that refer to the genetic characteristics of (or research involving) a particular population or group, and any term used to describe a particular research population was viewed as relevant. Our plan was to exclude all articles that only used population terminology in a manner that did not relate to research, such as describing the nationality of a researcher. However, no article needed to be excluded on this basis. The search was limited to articles that were published between January 2003 and May 2009. This search yielded 1749 newspaper articles. Corresponding press releases and journal articles were then found for relevant newspaper articles. We kept only newspaper articles for which corresponding press releases and journal articles were located. The final corpus is 36 complete article sets.
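The Boolean structure of the search string above can be illustrated with a short sketch. This is only an approximation for readers who want to apply a similar filter to their own text collection: the function name `matches_query` is hypothetical, and the exact-word matching used here is an assumption that does not reproduce how Factiva or Google News actually index and match terms.

```python
import re

# Term groups taken from the search string in the text:
# journal AND (gene OR genealogy OR genetic) AND (race OR ancestry OR ...)
REQUIRED = {"journal"}
GENETIC = {"gene", "genealogy", "genetic"}
POPULATION = {
    "race", "ancestry", "caucasian", "hispanic", "black",
    "african", "asian", "white", "population",
}

def matches_query(text):
    """Return True if the text satisfies the Boolean query.

    Assumption: exact, case-insensitive whole-word matching with no
    stemming, so "genes" or "populations" would not count as matches.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return (
        REQUIRED <= words              # every required term is present
        and bool(GENETIC & words)      # at least one genetics term
        and bool(POPULATION & words)   # at least one population term
    )

if __name__ == "__main__":
    sample = "A study in the journal Nature links a gene to risk in the Asian population."
    print(matches_query(sample))
```

A filter like this only reproduces the keyword step; the relevance screening and the matching of press releases and journal articles described above were separate steps.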

The press releases came from a variety of sources, including universities (20 press releases), research hospitals (five press releases), private genetic research companies (four press releases), medical agencies or organizations (three press releases), the Wellcome Trust (two press releases), and a pharmaceutical company (one press release). Press releases ranged from 273 to 1697 words, with an average of 863 words.

Quantitative analysis

All the articles in each article set were coded for the types and frequencies of population descriptor terminology that were used. The mean frequency of use for each term was calculated to account for word length. To trace the language use and change across the publication types, the most frequent terms in each publication were compared within article sets. Terms that were introduced in the press releases and newspaper articles were also traced, especially for terms that became the most frequently used within press releases and newspaper articles but did not appear in the peer-reviewed article (Table 1).
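The coding and tracing steps described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the procedure the authors used: the term list is a hypothetical subset of descriptors, the normalization (occurrences per 1,000 words) is one possible way to account for differing text lengths, and the function names `term_rates` and `trace` are invented for this example.

```python
import re
from collections import Counter

# Hypothetical subset of descriptors; the study's full term list is not reproduced here.
TERMS = ["population", "african", "black", "white", "european", "ancestry"]

def term_rates(text, terms=TERMS):
    """Return occurrences of each term per 1,000 words (exact word forms only)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in terms)
    return {t: 1000 * counts[t] / len(words) for t in terms if counts[t]}

def most_frequent(rates):
    """Return the term with the highest rate, or None if no term occurs."""
    return max(rates, key=rates.get) if rates else None

def trace(article_set):
    """Compare term use across one article set.

    article_set maps 'journal', 'press_release', and 'newspaper' to text.
    """
    rates = {kind: term_rates(text) for kind, text in article_set.items()}
    journal_terms = set(rates["journal"])
    later = [kind for kind in rates if kind != "journal"]
    return {
        # Most frequent term in each publication type
        "top_terms": {kind: most_frequent(r) for kind, r in rates.items()},
        # Terms that appear downstream but not in the journal article
        "introduced": {k: sorted(set(rates[k]) - journal_terms) for k in later},
        # Journal-article terms that do not appear downstream
        "omitted": {k: sorted(journal_terms - set(rates[k])) for k in later},
    }
```

Applied to each article set in turn, a function like `trace` would flag the sets in which the most frequent term changes between publication types and the sets in which a press release or newspaper article introduces a term absent from the journal article.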

Table 1 Terminology traced across article types. A sample of the results of tracing racial terminology from journal article, to press release, to newspaper article (for complete data set results, see Supplemental Digital Content 1).

Qualitative analysis

A small sample of seven article sets was selected for qualitative analysis. These article sets included all the articles that were published during 2008 and 2009, which provided a snapshot within a relatively similar historical and social climate. Articles were coded by first using a preliminary list of categories and variables, but new categories were added during the analysis to reflect the patterns appearing in the texts. The qualitative analysis was first applied to the peer-reviewed journal articles and then systematically applied to the press releases and newspaper articles. As new categories and variables emerged at all stages of the analysis, all articles of each type were checked and rechecked to maintain consistency. After all the articles were coded, categories were combined where appropriate, from which themes were developed.


Quantitative results

In the entire corpus of articles, 27 different terms were found (Table 2). Terminology shifts in frequency as the genetic research that is presented in peer-reviewed literature is reinterpreted in press releases and newspaper articles. For example, the term “population” appears most frequently in the peer-reviewed literature. However, “population” is the second most frequent term in the press releases and is only the fourth most frequent term in the newspaper articles. There seems to be a greater focus on terms such as “African” and “black” in the newspaper articles, especially in comparison with the peer-reviewed articles. The term “African” is the second most frequent term in the journal articles, and the term “black” is the eighth most frequent term; however, in the newspaper articles, these terms are the first and second most frequent terms, respectively.

Table 2 Mean frequency of racial terminology in each article type in descending order

Tracing the terminology from journal article to press release to newspaper article showed that new terms were introduced, some terms were omitted, and some terms were used with varying frequency within article sets (Table 1). Only 7 (19.44%) of the 36 article sets maintained the same term as the most frequently used term throughout the types of articles. Overall, 61.11% of the article sets had new terminology that was introduced in either or both the press release and newspaper article. Of the article sets that introduced new terminology in the press release or news article, 40.91% introduced new terms that became the most frequent term (or 27.78% of all the article sets). It is also interesting to note that 27.78% of the article sets introduced new terms in the press release that were not used in the newspaper article (52.94% of terms that were introduced in the press release are also used in the newspaper article). Of that 27.78%, three press releases introduced terms that were the most frequent term in that press release, but these terms were not used in the newspaper article (i.e., only 17.65% of all press releases that introduced new terms). Only one press release introduced a term that became the dominant term in a newspaper article.

Qualitative results

The qualitative analysis yielded three themes or categories that help explain the differences in the frequency and use of population descriptors, including “racial” terminology, across the different types of publications. They include intertextuality, term definitions across publication types, and exclusion of terms.


Intertextuality refers to the way in which texts refer to or relate to each other. For the purposes of this study, it specifically means the use of references to and citations of other texts or bodies of knowledge. In scientific and academic writing, the use of citations and references is a common practice that helps to define and establish ideas and communities of practice.14,15 The peer-reviewed articles in our article sets are no exception to this practice. Citations and in-text references (e.g., HapMap or Human Genome Diversity Project) were often used to support, or were associated with, the use of a particular population descriptor, category, and/or methodology in genetic research. Population terminology, including “racial terminology,” in the peer-reviewed articles from our data is used to describe the research populations and is often treated as a scientific concept that needs to be supported by previous research as part of a research methodology. How well scientists justify their choices is up for debate (and out of the scope of this study); indeed, there is at least some evidence that, from a scientific perspective, this is often done poorly.16,17

All journal article authors provided at least one reference (one article provided 25 references or citations) to another study or organization, such as HapMap or the Human Genome Diversity Project, that outlined how population descriptors were determined and used. When this variable is applied to press releases and newspaper articles, population terminology is not accompanied by citations and references, and terms become open to interpretation (Table 3).

Table 3 Example of intertextuality and racial terminology in different types of articles

Defining scientific terms

It is common in popular representations of science, especially in newspaper articles, for specialized scientific terms and concepts to be interpreted for a wider and more varied audience.18,19 This is often done through, but not limited to, providing simplified explanations or providing alternative terms. An example from the data is “… those with this mutation of the APOC3 gene have higher levels of HDL-cholesterol, the so-called ‘good’ cholesterol, and lower levels of LDL-cholesterol, the ‘bad’ cholesterol” (emphasis added). In this example, the author provided more commonly known terms for the types of cholesterol to aid readers' understanding.

However, despite many attempts to explain complex scientific concepts to readers, in all the press releases and newspaper articles, there were no explicit explanations of the complexities of the population categories, such as terminology associated with race, within genetic research. Because population terminology was not treated as a concept requiring further clarification, readers are able to bring their own definitions to the words. Definitions for terms such as race, ancestry, ethnicity, and even specific population identifiers can vary widely according to geographical location, culture, education, and personal history.20–22

Exclusion of terms

Many of the peer-reviewed articles mentioned multiple groups or populations to justify a focus on a particular population or to draw comparisons. However, many of the press releases and newspaper articles did not mention all the groups or discussed only one particular group. For example, in article set 16, the peer-reviewed journal article clearly justifies a focus on an African American population and includes 11 different terms that appeared to represent three major ethnic groupings. However, the press release narrowed this to four terms representing two major ethnic groupings, and the newspaper article distills these categories into only “black” and “white.” This may be for brevity or simplicity, or it could also be a result of a “hype-space conflict,”3 which occurs when journalists must balance the need to “hype” a story to remain competitive against maintaining the journalistic values of accuracy and objectivity. Such exclusions have the potential adverse effect of singling out or emphasizing certain groups. In contrast, new terms may be introduced in the press release or newspaper article, which may introduce the concept of race into a study or distill specific research populations into broad categories (Table 3).


As has been found in a variety of other studies, there is little standardization and consistency in the use of population terminology in peer-reviewed literature, especially in the context of the concept of “race.”16,17,23 Our study found that this is true not only between articles but also with respect to the communication of a specific study (e.g., from the peer-reviewed article, through the press release, and into the newspaper article). In addition to the inconsistencies in the terminology being used, it seems that a considerable amount of new terminology (in more than half of the article sets) is being introduced after the peer-reviewed articles, especially in the newspaper articles (Table 3). It is unclear whether journalists are using new terms for the purposes of brevity and readers' understanding or whether they are (knowingly or unknowingly) “hyping” racial categories in genetic research (i.e., selecting terms that will make the article sound more interesting and newsworthy).

At the stage of peer-reviewed literature, a methodological justification for studying particular groups, and for using the associated terms, is often provided, including the use of references (although how well scientists justify their choices is debatable16,22). However, when the study is discussed in press releases and newspaper articles, only the terms appear, not the justification for them. So, although terminology may be consistent from scientist to journalist, the context for a term can change. This may change how the term is perceived by the reader (especially when mixed with readers' interpretations).4,24 Although many specialized scientific terms are explained and simplified in the media, population terminology is not treated as specialized, and, as such, readers are able to bring their own definitions to the words, which can vary widely according to geographical location, culture, personal history, and education.20–22

Our study has clear limitations. The sample size is relatively small (a result of seeking complete sets that include a peer-reviewed article, press release, and media report). Also, we looked only at newspaper articles. Other media sources, such as television, magazines, and the internet, are obviously important sources of information about genetic research. Despite such limitations, the data provide a picture that is consistent with previous work regarding the lack of consistency17,25–27 and that offers some insight into the source and nature of that inconsistency. In addition, it raises some interesting questions that are worth further consideration. This study, which was solely a text analysis, highlights the need for more research on the significance, if any, of the shift in terminology and on how individuals interpret and react to the various terms. For example, even if new terms that are introduced in the press releases or newspaper articles do not become the most frequently used terminology, certain terms may hold more “weight” for readers. Is “race” a more loaded term than “ancestry”? How do individuals interpret these terms (e.g., do the public view terms such as “ancestry” as proxies for old racial terminology)? It may be that simply avoiding sociocultural categories for populations in genetic research will not exempt terms from still being interpreted within varying social climates as “racial categories” (indeed, this may be why we are seeing slippage between the peer-reviewed literature and the media reports).13,28

In sum, we found a lack of consistency in the use of terminology and found that some terminology is introduced or omitted in the different interpretations of genetic research. As terms are presented in a peer-reviewed article and then interpreted by the media, certain groups become emphasized. Given the long-standing concerns about the use of population terminology in the context of genetic research, further work in this area seems essential. This study demonstrates how difficult it can be to control terminology use, even within the reporting of a specific study. If consistency and precision in the use of population terminology are the goal, as suggested by a variety of commentators, then more robust efforts are needed to ensure the use of scientifically justified, and clinically relevant, descriptors. This should include, at a minimum, a more collaborative effort of all involved in the science communication process to ensure the accurate and consistent translation of research results.29