Racially diverse cohorts are critical to ensuring that findings from exome or genome sequencing (EGS) research have the broadest possible medical, social, and behavioral applicability. However, minorities have historically been underrepresented in medical1 and genetics research2,3 for a variety of reasons, including low recruitment,1,4 concerns about the trustworthiness of researchers and their sponsors,1,5,6 a desire for personal results from research,5,7 and prioritization of their privacy.5,8 This disparity limits scientific discovery and makes the promise of genomic medicine less likely to apply to diverse populations.9 Several funded EGS studies will enroll diverse populations,10 and the All of Us Research Program intends to recruit a study population that reflects the racial diversity of the United States.11 These endeavors face the dual challenges of recruiting for EGS research, which is novel due to its wide scope and potential to generate uncertain and personally identifying results, and enrolling historically underrepresented participants.

This study recruited individuals of African descent, who are members of one of the largest minority communities in the country and may have unique perspectives on genetics research given the troubling history of such research and services in African-American communities.12 Two studies of African Americans’ willingness to participate in hypothetical EGS research found that most individuals were willing to enroll.6,7 However, Halbert et al. found that when study details were provided to approximate the study design for the All of Us Research Program, only about 30% of African Americans surveyed intended to participate.5 They found that distrust was significantly associated with lower likelihood of anticipated participation and suggested that reading specific details about how biospecimens might be shared could be prompting concern about privacy, which is a known barrier to participation. Future studies should address concerns about privacy and other known barriers to maximize participation in genomics research amongst individuals of African descent. Furthermore, we should interpret intended participation rates from hypothetical studies carefully given that differences in study design may influence these rates and that they may not correlate with actual behavior. Understanding the characteristics of African-descended participants in existing EGS research may inform effective recruitment efforts. Several studies enrolling primarily non-Hispanic White participants, such as the HealthSeq,13 MedSeq™,14 NextGen,15 and CanSeq16 projects and the Coriell Personalized Medicine Collaborative (CPMC)®,17 have reported information about the characteristics of participants to improve recruitment strategies, the informed consent process, and return of result policies.

The ClinSeq® study pilots the use of exome sequencing and return of individual testing results with mostly healthy participants. The original cohort comprised 1001 participants recruited with passive strategies (e.g., self-referral after viewing fliers or brochures) from 2007 to 2012 (ref. 18). It was composed predominantly of White, not Hispanic or Latino individuals with at least a college education.19 Subsets of the original cohort have been characterized with regard to their knowledge,20 motivations for enrolling in the study and expectations of sequencing,21 and personality traits.19 We recently completed the targeted recruitment of a new ClinSeq® cohort of 467 individuals who self-identified as African, African American, or Afro-Caribbean. We used strategies that have been reported to improve the recruitment of African-descended participants to genetics research, such as developing targeted recruitment materials,22 hiring a recruiter with demographic characteristics similar to those of the target population,23 focusing on interactive recruitment,24 and offering individual results.5

The primary aim of this study was to describe the knowledge, motivations, expectations, and personality traits of the new cohort. The secondary aim was to compare these data with published data on the same attributes from the original cohort. Because both cohorts were recruited to the same study, differences in their characteristics may help identify factors relevant to the design of recruitment strategies, informed consent processes, and return of result policies in future EGS research.

Materials and methods

ClinSeq® eligibility criteria and recruitment

Participants were eligible for the new cohort of the ClinSeq® study (NCT00410241) if they self-identified as African American, African, or Afro-Caribbean; were 45–65 years old at the time of consent; had not smoked over the past year; lived in the Washington, DC area; and were not enrolled in another sequencing study that returned individual results. Other than racial identity, these eligibility criteria were the same for the original cohort. Informed by published examples of successful research recruitment, a full-time, experienced African-American outreach coordinator (S.E.) was hired to oversee the recruitment, eligibility screening, and retention of the new cohort. A variety of recruitment strategies were used at the outset of the study, including posting fliers in local businesses, staffing tables at community events, and advertising on local radio stations. Over the course of the study, the coordinator increasingly used the most effective recruitment strategies, which were in-person recruitment at health fairs and church groups, and word-of-mouth referrals by enrolled participants. Interested individuals gave their contact information to the outreach coordinator, who answered questions and completed eligibility screening. If a potential participant was interested and eligible, the outreach coordinator obtained verbal consent over the telephone and scheduled the participant's enrollment visit. During their enrollment visits, participants had clinical blood and urine testing, DNA collection for sequencing, an electrocardiogram, an echocardiogram, and a cardiac computed tomography (CT) scan. After completing their visit, they received $100 and a letter with the results of their clinical testing (excluding exome sequencing). This study was approved by the National Human Genome Research Institute Institutional Review Board. All participants provided written informed consent for the parent study.

Survey recruitment

Participants who consented to the new cohort before October 2014 were contacted up to three times by telephone or mail after consenting to the study, but before receiving genetic testing results, and asked to complete a survey either electronically or on paper. Each participant completed the survey once. Participants who consented during or after October 2014 completed the survey verbally during their enrollment visit, and a trained staff member keyed their responses into an electronic platform.

Survey measures

The survey took approximately 50 minutes to complete and included measures of several social and behavioral constructs, many of which were assessed by a similar survey administered to the original cohort.19 The constructs analyzed in the current manuscript included:

  • Knowledge, which was assessed using an established 10-item measure with subscales about the benefits and limitations of sequencing.20 Participants rated each knowledge statement (e.g., “genome sequencing may find variants in people’s genes that they can pass on to their children”) on a 5-point scale (definitely no, probably no, uncertain, probably yes, or definitely yes). Correct responses rated as “definitely” were scored as 2, correct responses rated as “probably” were scored as 1, and all other responses were scored as 0, thus giving an opportunity to evaluate knowledge and certainty via responses to this scale. Responses were summed to create limitations (Cronbach’s α = 0.74) and benefits (Cronbach’s α = 0.76) subscale scores.

  • Motivations for joining the study, which were assessed using a single open-ended question, “What are your reasons for wanting to participate in this study?”21

  • Expectations of sequencing, which were assessed in two ways.21 First, participants responded to a multiple-choice question about “what testing for many genes can do” by checking all responses that applied, including (1) find a genetic risk for a disease that you do not have but could develop in the future; (2) find a genetic cause or contribution for a disease that you have; (3) give you a clean bill of health; (4) give you information not only about you, but also your relatives; (5) none of the above; or (6) don’t know. Second, they were asked, “What else, if anything, could be learned from testing many genes?”

  • Tolerance for uncertainty, which was measured using the modified tolerance for ambiguity (TFA) scale.25 The scale consisted of seven items (e.g., “Before any important task, I must know how long it will take”) rated on a scale from 1 (“Not at all characteristic of me”) to 5 (“Entirely characteristic of me”) and averaged (Cronbach’s α = 0.76).

  • Optimism, which was measured using three items (e.g., “In uncertain times, I usually expect the best”) from the optimism subscale of the Life Orientation Test,26 which were rated on a five-point scale (0 – strongly disagree, 1 – disagree, 2 – neutral, 3 – agree, or 4 – strongly agree) and summed (Cronbach’s α = 0.80).

  • Resilience, which was measured using a revised version of the Connor–Davidson Resilience Scale.27 The scale consisted of ten items (e.g., “I am able to adapt to change”) that were rated on a five-point scale (0 – never true, 1 – seldom true, 2 – sometimes true, 3 – often true, 4 – always true) and summed (Cronbach’s α = 0.89).

  • Big Five personality traits, which were measured using the Big Five Inventory.28 This scale included 44 items rated on a five-point scale (1 – disagree strongly, 2 – disagree a little, 3 – neither disagree nor agree, 4 – agree a little, 5 – agree strongly). Scores for five subscales measuring extraversion (Cronbach’s α = 0.78), agreeableness (Cronbach’s α = 0.72), conscientiousness (Cronbach’s α = 0.77), openness (Cronbach’s α = 0.76), and neuroticism (Cronbach’s α = 0.74) were calculated by averaging a participant’s responses to the relevant items.
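The scoring rules described above can be sketched in a few lines of Python. This is an illustration only, not the study's scoring code: the function names, the `keyed_yes` flag (whether the correct answer to a knowledge statement is "yes"), and the example responses in the usage note are assumptions introduced here. The sketch also does not handle reverse-keyed items, which some personality inventories require.

```python
# Illustrative scoring sketch; all names and example data are hypothetical.
from statistics import mean

# The 5-point response options for the knowledge items, as listed in the text.
RESPONSES = ("definitely no", "probably no", "uncertain",
             "probably yes", "definitely yes")


def score_knowledge_item(response: str, keyed_yes: bool) -> int:
    """Return 2 for a correct 'definitely', 1 for a correct 'probably', else 0."""
    if response not in RESPONSES:
        raise ValueError(f"unknown response: {response!r}")
    if response == "uncertain":
        return 0
    certainty, direction = response.split()
    if (direction == "yes") != keyed_yes:
        return 0  # answered in the wrong direction
    return 2 if certainty == "definitely" else 1


def knowledge_subscale(responses, keys) -> int:
    """Sum item scores for one subscale (benefits or limitations)."""
    return sum(score_knowledge_item(r, k) for r, k in zip(responses, keys))


def averaged_scale(ratings) -> float:
    """Averaged scales (tolerance for uncertainty, Big Five subscales)."""
    return mean(ratings)


def summed_scale(ratings) -> int:
    """Summed scales (optimism, resilience)."""
    return sum(ratings)
```

For example, under these assumptions a respondent who answers "definitely yes" to a statement whose correct answer is "yes" scores 2 on that item, while "probably yes" to a statement whose correct answer is "no" scores 0.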

Data analyses

Responses to the open-ended questions concerning motivations and expectations were analyzed qualitatively. The primary (ARH) and secondary (CLH) coders applied a codebook developed for analyzing responses to the same questions asked of the original cohort21 to responses from 80 participants in the new cohort to facilitate comparison between the cohorts. The coders also took an inductive approach to the data, searching for content that led to the development of four novel codes for the new cohort (e.g., understanding differences between and among racial groups and uncovering ancestry information) and revising the codebook. They then reconciled minor, semantic differences in their application of the codebook and had high levels of agreement for both the expectations (96%) and motivations (91%) data. The primary coder then applied the codebook to the remaining responses and conducted thematic analysis to identify common patterns in the responses.

Responses to the closed-ended item about expectations and the scales for knowledge and personality traits were analyzed using quantitative methods. Descriptive statistics were obtained on the frequency and distribution of responses. Chi-square and t statistics were calculated to compare survey respondents with decliners and the new with the original cohort.19,20,21 Cronbach’s α values were calculated for all scales.
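For readers unfamiliar with the reliability statistic reported for each scale, the standard formula for Cronbach’s α can be sketched as follows. This is a minimal illustration that assumes complete data (no missing item responses); the function name and the toy data are hypothetical and are not taken from the study, and the software the authors used is not specified here.

```python
# Illustrative sketch of Cronbach's alpha; the example data are made up.
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (equal lengths, no missing data)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)  # sum of item variances
    total_var = variance([sum(r) for r in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses from four respondents to a three-item scale.
toy = [[1, 2, 1], [3, 3, 2], [4, 5, 4], [2, 2, 3]]
alpha = cronbach_alpha(toy)
```

Values closer to 1 indicate that the items vary together; when every item gives identical scores, the formula returns exactly 1.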


Recruitment and survey completion

Recruitment of the new cohort was conducted between February 2012 and October 2017 (Fig. 1). The outreach coordinator recorded contact information for 1058 potential participants, 924 of whom were eligible. Of the eligible participants who did not consent, 90% (400/444) passively declined without providing a reason; 8.6% (38/444) provided reasons for their decline, most commonly disinterest (n = 12) or concerns about insurance discrimination (n = 6); and 1.4% (6/444) were deemed ineligible at their enrollment visit. Only 1.4% of participants withdrew after consent (7/505) and 1.2% (6/505) were lost to follow-up four months after recruitment was completed. Most enrolled participants (390/467, 84%) completed the survey.

Fig. 1

New cohort recruitment flow. Targeted recruitment process for enrolling 467 African, African-American, and Afro-Caribbean participants into the new cohort of the ClinSeq® study.

Sociodemographic characteristics

Most participants in the new cohort were female (74.7%), college graduates or beyond (64.2%), and not Hispanic or Latino (99.2%). The average age of participants at enrollment was 56.3 years. Survey respondents were more likely than nonrespondents to have an annual household income greater than $100,000 (χ2 = 10.85; p < 0.01). The new cohort had significantly more participants than the original cohort who were female (χ2 = 94.5; p < 0.01), had less than a college education (χ2 = 59.0; p < 0.01), had an annual household income less than $100,000 (χ2 = 175.9; p < 0.01) and did not have coronary artery disease (χ2 = 78.6; p < 0.01) (Table 1).

Table 1 Comparison of new cohort survey respondents with decliners and new cohort with original cohort


Participants in the new cohort had knowledge about the benefits and limitations of EGS that fell around the subscales’ midpoints (Table 2; mean = 5.1 and range: 0–10 for both; SD = 2.2 and 2.8, respectively). Participants were most likely to correctly agree (either definitely or probably) that genome sequencing can identify heritable variants and risk-increasing variants. They were least likely to correctly respond to statements that sequencing can identify risk-decreasing variants and that genetic diseases can always be prevented or cured (Table S1). The knowledge subscale scores were lower than those reported for the original cohort (mean = 7.5 and 7.7, respectively; SD not reported for either).20

Table 2 Knowledge scores


Most survey respondents (341/390, 87%) answered the question about their motivations for joining the study. The most common theme was learning information about personal health (50%, 171/341), which included general information about health (e.g., “I’d rather know than not know what could be uncovered about my health”) as well as information about personal health risks (e.g., “To find out what genes I may have that may cause diseases I am not yet aware of”). Many participants in the new cohort were motivated by reasons related to their families (33%, 111/341), such as learning about genetic variants that explained a family history of disease (e.g., “Curious about genetic links to autoimmune diseases that run in the family”) or related to future health risks for their families (e.g., “get info [sic] to help my daughters and future grandchildren”). Fewer participants in the new cohort mentioned contributing to scientific discovery (57/341, 17%) or helping others (11%). Finally, 10% of participants cited contributing to knowledge about population genetics as a motivation. Several participants were motivated to offset the underrepresentation of minority populations in previous research studies (e.g., “contributing to being a part of something that my culture doesn’t normally participate in”) whereas others hinted at the precision medicine implications of their participation (e.g., “to better or enhance the knowledge of genetic traits in African Americans”). For additional exemplary quotes, see Table S2. By comparison, the most common motivation amongst original cohort participants21 was also learning personal health information. However, original cohort participants were less likely to cite motivations related to family members (13%, 42/313, χ2 = 33.3; p < 0.01) and more likely to cite altruistic motivations (44%, 141/313) than the new cohort. Contributing to knowledge about population genetics was not identified as a motivation amongst the original cohort.
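The between-cohort comparison of family-related motivations can be checked from the counts given above (111/341 in the new cohort versus 42/313 in the original cohort). The sketch below computes a Pearson chi-square statistic for the resulting 2 × 2 table without a continuity correction. The function name is hypothetical, and the text does not state whether a correction was applied; the uncorrected statistic is shown because it is consistent with the reported value.

```python
# Pearson chi-square for a 2x2 table, without continuity correction.
def pearson_chi2_2x2(a, b, c, d):
    """Rows are cohorts; columns are (cited motivation, did not cite it)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Counts from the text: family-related motivations in each cohort.
new_yes, new_n = 111, 341
orig_yes, orig_n = 42, 313
chi2 = pearson_chi2_2x2(new_yes, new_n - new_yes, orig_yes, orig_n - orig_yes)
print(round(chi2, 1))  # 33.3
```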


In response to the multiple-choice question about expectations of sequencing (Table 3), the majority agreed that it could find a genetic risk for a disease they do not have, but could develop (90%), give them information about not only themselves but also their relatives (81%), or find a genetic cause or contribution for a disease that they have (82%). Thus, most participants in the new cohort had realistic expectations of sequencing, which is similar to the original cohort. However, 29.5% of participants in the new cohort indicated that sequencing could offer them a clean bill of health, whereas only 8.1% of participants in the original cohort did so (χ2 = 50.4; p < 0.01) (ref. 21).

Table 3 Expectations of sequencing

Fifty-four percent of the new cohort participants (211/390) responded to the open-ended question about what else they expected of sequencing. The most common expectations were learning about future health problems (45/211, 21%) and general benefits to personal health (42/211, 20%, e.g., “it will help me in my health”). Participants also expected their results to have implications for their family members (38/211, 18%, e.g., “things that I’m going through—will my family members go through the same?”) and to contribute to science (31/211, 15%, e.g., “Doctors will have much more interpretive [sic] data to work with for the overall health of patient”). Notably, 11% of participants expected that their results would contribute to knowledge about population genetics (24/211) by enhancing knowledge about similarities within and between groups (e.g., “find commonalities between groups of people”) and improving the capacity for precision medicine (e.g., “give more information about African Americans [sic] risk assessment for medical/mental conditions”). By contrast, the most common themes amongst the responses from the original cohort to this question were that it could lead to a better understanding of diseases (39%), advancement of precision medicine (28%), or comprehension of genetic mechanisms (23%) (ref. 21).

Personality traits

Participants in the new cohort were highly optimistic, resilient, agreeable, conscientious, and open. They had moderate tolerance for uncertainty and extraversion and low levels of neuroticism (Table 4). The new cohort participants had significantly higher levels of optimism (t = 8.7; p < 0.01), extraversion (t = 6.4; p < 0.01), agreeableness (t = 9.1; p < 0.01), and conscientiousness (t = 5.0; p < 0.01) than original cohort participants,19 and significantly lower levels of neuroticism (t = 4.1; p < 0.01, Table 4). The participants in both cohorts had similar levels of openness and tolerance for uncertainty (Table 4).

Table 4 Personality traits


This cohort demonstrates that participants of African descent can be recruited to EGS research. Existing literature and results from studies like this inform what researchers can do to address established barriers to recruitment of African-descended participants. For example, our results demonstrate that results with implications for family members’ health may be as motivating as information relevant to personal health. Offering results is one way that researchers can respect participants’ preferences and may be effective in addressing the barrier of mistrust in researchers. Researchers can build upon this value by surveying their target populations about their motivations and concerns and designing studies to address them. For example, our participants were motivated to inform population genetics research, which could be incorporated into our recruitment materials. Including participants’ perspectives in research design may be one way to maximize the potential to find willing participants amongst groups of individuals who may be challenging to recruit.

Participants’ knowledge about the benefits and limitations of sequencing fell near the subscales’ midpoints and was lower than that of the original cohort. This may be partially accounted for by the difference between the education levels of the cohorts.20 However, most participants in the new cohort have at least a college education, so we hypothesize that other factors contribute to the differences in sequencing knowledge scores. Because the original cohort was passively recruited, this may have led to self-selection of participants with a personal or professional interest in genomics. Perhaps using targeted recruitment for the new cohort made a broader range of individuals aware of the study and resulted in recruitment of participants with less knowledge about sequencing. Research has also shown that racial bias may alter the counseling techniques used29 or treatments recommended30 by providers. Racial bias may also affect the education or counseling provided during recruitment and consent, which could contribute to differences in knowledge between the cohorts.

Knowledge has implications for informed choice throughout research participation. An informed decision requires that an individual has sufficient knowledge to make a choice consistent with his or her values.31 Individuals with limited knowledge about genetics may not be aware of genomics research projects or what they offer and thus may enroll at a slower pace than their more knowledgeable peers. The knowledge scale used in this study also captured participants’ certainty about their answers, and it is possible that the lower scores in the new cohort reflect a lack of certainty when compared with the original cohort. This is consistent with the finding that individuals were more likely to respond “I don’t know” when asked risk perception questions if they had less education or did not self-identify as White.32 Individuals who are more certain of their knowledge are more likely to use it,33 so certainty in knowledge may be another barrier to participation. Although our study design does not allow us to investigate the basis of the difference in sequencing knowledge between the original and new ClinSeq® cohorts, our findings highlight the need for research on the correlates of sequencing knowledge in diverse cohorts.

Most participants recruited to this cohort described realistic motivations for participation in the study and expectations of sequencing. One of the most common motivations and expectations was learning information pertinent to their personal health, which was also true for the original cohort21 and other EGS cohorts.13,15 Thirty-three percent of new cohort participants were motivated by the potential relevance of their results to their family members, which is much higher than in the original cohort (13%) (ref. 21), but similar to what was reported in the HealthSeq project (31%, 11/35) (ref. 13). In a multiple-choice question asked by the CPMC®, 67% of potential participants rated information relevant to “health conditions for children or grandchildren” as either very or somewhat important.34 The high rate of endorsement when using a multiple-choice question suggests that participants may require prompting to consciously attend to this motivation. Alternatively, the high endorsement of information relevant to one’s family as a motivation in both the new ClinSeq® and prospective CPMC® cohorts may be attributed to the comparatively high proportion of female participants in both groups. Women are often disseminators of health information within families,35 which may lead them to readily consider the implications of health information for their relatives. Thus, the relevance of EGS results for family members may not be obvious to potential participants, and it may be necessary to list this as a potential benefit on recruitment materials or scripts.

Thirty percent of new cohort participants endorsed the unrealistic expectation that sequencing could give them a “clean bill of health,” which was substantially greater than the proportion in the original cohort who expected this.21 Studies are needed to better understand individuals’ motivations and to promote realistic expectations. Individually, high expectations may reflect dispositional optimism,36 which is a common trait of early adopters of technology. Understanding and promoting realistic expectations are key to ensuring satisfaction amongst study participants and may also influence how sequencing technology is accepted by broader communities. Early adopters play a key role in the diffusion of technology,37 but are unlikely to promote EGS research if their expectations are unmet. Aggregate data on expectations can also provide evidence about whether participants are making informed choices about enrollment and identify misconceptions that can be addressed by improving recruitment materials and informed consent processes.

Finally, participants in the new cohort had personality traits of early adopters of technology, including high levels of optimism, resilience, and openness to experiences, and were similar to the original cohort in this regard.19 This supports the notion that individuals self-select for genetic testing, such that people with traits or attitudes that are likely to help them cope well with their results, such as resilience, are more likely to seek testing.19,38 The trend toward self-selection in the original cohort was the same in the new cohort despite our targeted recruitment strategies. This suggests that targeted recruitment of individuals based on their race, ancestry, or perhaps even disease status will not necessarily result in enrollment of individuals whose traits differ from those already observed.

This study recruited participants to a specific protocol that involved spending one day at the National Institutes of Health (NIH), and the participants were well-educated compared with the general population, which limits the generalizability of our findings. Motivations and expectations were assessed using open-ended questions, so participants may not have mentioned certain factors because they did not think of them. Future studies should assess motivations and expectations quantitatively and directly measure informed choice.

Participants in the new ClinSeq® cohort are similar to other early adopters of EGS in several personality traits, which suggests that individuals who are optimistic, resilient, and open to experiences are more likely to enroll in research. The same characteristics are likely to help individuals cope effectively with their results. However, our data show that very high levels of genomics knowledge, as were present in the original cohort, are not a prerequisite for early adoption of sequencing technology, and that knowledge may be an important need to address in targeted recruitment processes. Future studies should determine whether recruitment and consent procedures are adequate to promote informed choices amongst individuals with less knowledge, and how to ensure realistic expectations of EGS. These goals are vital to the ongoing success of recruiting minority participants to EGS research.