Introduction

Participating in decisions about health care is impossible without adequate information, and yet poor quality information is repeatedly described across a range of health topics.1 Recently, this has become a concern with the delivery of genetic services.2, 3 As more mutations are identified, and the availability and relevance of genetic tests to clinical practice increases, the public will rely on diverse clinical services and mass media sources for information about the use and consequences of genetic technology.4 Criteria to assess the quality of information will provide clinicians with a mechanism for involving patients in decisions about genetic testing,5 make explicit the gaps in available information, and help the public use available resources.

Robust methods for appraising and integrating evidence into clinical decision making are widely available,6 whereas methods for appraising information produced for the public are still being developed.7 Quality criteria or rating schemes exist, but tend to focus on general aspects of quality,8, 9 and have been produced through the consensus of experts or feedback from patients or the public rather than using an empirical approach to test for reliability or validity.10 The DISCERN criteria for appraising information on treatment have good levels of inter-rater agreement and validity, and provide a framework for assessing the evidence base of lay health information.11, 12 The criteria are widely used as a benchmark to appraise13, 14, 15, 16, 17 and guide the production of lay health information on treatment,18, 19 have been used to train health professionals in appraisal skills in a variety of settings,20, 21, 22 and have been translated into five languages. The need for high-quality information that deals with the complex issues raised by genetic testing will increase as genetic knowledge continues to evolve. We followed the DISCERN methodology (described below) to develop criteria to assess the quality of information produced for the public on genetic screening and testing.

Materials and methods

We recruited providers and producers of genetic information, and lay people with and without experience of a genetic condition (see Box 1) to appraise a sample of information on genetic screening and testing.

Box 1 Appraisers

We collected information on the following genetic conditions: cystic fibrosis, Down's syndrome, familial breast cancer, familial colon cancer, haemochromatosis, Huntington's disease, sickle cell disease, and thalassaemia. The conditions were selected to include different populations, different disease pathways, and treatment decisions (see Flow Chart).

Sources of information

During 2003/2004, we collected information in English in a variety of forms (written, online, CD, audio, and video) from voluntary organisations, charities, commercial publishers, professional associations, individual health-care professionals, and NHS Trusts. Organisations were identified through professional associations (Sickle Cell and Thalassaemia Societies and Haemoglobinopathy Centres in the UK), the Genetic Interest Group (http://www.gig.org.uk/) for Clinical Genetic Centres and Voluntary Organisations, GeneWatch UK (for information on manufacturers of gene testing kits), registries and databases (Birth Choice UK for Midwifery Departments, the Popular Medical Index,23 and COPAC (www.copac.ac.uk)), and the internet for online booksellers (www.amazon.co.uk; http://bookshop.blackwell.co.uk/jsp/welcome.jsp; www.thebookplace.com), videos (www.videosforpatients.co.uk/; http://www.emol.ac.uk; http://library.wellcome.ac.uk/), newspapers (http://bubl.ac.uk/link/n/newspapers.htm), and support groups. In addition, meta search engines (http://www.surfwax.com; http://www.ixquick.com), Google, and health information portals were used to identify relevant material. Initially, we searched for information using the terms patient information/genetic testing/genetic screening combined with terms for the specific conditions; we then broadened our search by using terms for each of the conditions. We also contacted the BBC Information and Archives service, Channel 5, and the Digital Discovery Health Channel.

We obtained 431 items of information from these searches; 19 were duplicates and only 118 were relevant, that is, they specifically described an aspect of genetic screening and testing related to one of the conditions. VG, PR, and SS reviewed the 118 items of information and selected 26 to represent each of the conditions in a variety of formats (one book, one book chapter, one video, 13 web pages, 10 leaflets), from different producers (public and commercial) and countries of origin.

First appraisal

We sent copies of the 26 items of information to each of the appraisers to critique using their individual experience and expertise. Having completed the task, they were asked to list and explain the criteria they used; they had 6 weeks to complete the exercise. We (VG/PR/SS) independently sorted the criteria into common themes,24 which were turned into questions. Criteria related to each question were written as hints to help the user apply the questions. This was carried out iteratively until consensus was reached. The appraisers met to discuss the results of the initial analysis; the meeting was chaired by SO, audio-taped, and transcribed.

Testing the questionnaire

Following the meeting, the appraisers independently applied the resulting questionnaire to a new sample of 26 items of information about the same conditions (one book chapter, one interactive CD, two newspaper/magazine articles, 11 web pages, 11 leaflets). They had 6 weeks to complete the exercise. We analysed the data using a measure of inter-rater agreement (see Statistical analysis). The appraisers met again to discuss the results of the analysis and to re-draft the questionnaire for areas where there was poor agreement; as before, the meeting was chaired by SO. Questions were modified or excluded if they produced agreement scores below an acceptable level (κ<0.40) (see Statistical analysis) or if they represented overlapping themes.

Evaluation of the DISCERN-Genetics questionnaire

Thirty participants who dealt with health information in a professional capacity, or were users of health information, applied the revised questionnaire to 12 items of information covering a wider range of conditions requiring genetic screening and testing. The inter-rater agreement was tested (see Statistical analysis).

Statistical analysis

We tested the reliability of the questionnaire at each phase by calculating agreement between raters for each DISCERN item using κ with quadratic weights, a chance-corrected measure of agreement. Weighted κ is appropriate for the analysis of data in ordered categories, such as the five-point Likert scale used to rate each DISCERN item, because it does not treat all disagreements equally. Different weights are given to disagreements between raters according to the magnitude of the discrepancy. In the case of multiple raters, weighted κ is calculated by generating a κ score for each possible pair of raters for each item being rated. An overall κ score is then generated by calculating the average of these individual κ scores, with an appropriate overall standard error. The cutoff point for an acceptable level of agreement with multiple raters was set at κ≥0.4.25
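The calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function names, the dictionary layout of the ratings, and the use of the standard Cohen formulation with weights (i − j)² / (k − 1)² are assumptions made for the example, and the overall standard error mentioned in the text is not computed here.

```python
from itertools import combinations


def quadratic_weighted_kappa(a, b, n_categories=5):
    """Quadratic-weighted kappa for two raters scoring the same items on 1..n_categories.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(a)
    k = n_categories
    # Observed joint proportions for each (rater A score, rater B score) cell.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - 1][y - 1] += 1.0 / n
    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = ((i - j) ** 2) / ((k - 1) ** 2)
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - weighted_observed / weighted_expected


def mean_pairwise_kappa(ratings, n_categories=5):
    """Average the weighted kappa over every possible pair of raters.

    `ratings` maps a rater label to that rater's scores for one question,
    listed in the same item order for every rater (an assumed layout).
    """
    pairs = list(combinations(sorted(ratings), 2))
    if not pairs:
        raise ValueError("at least two raters are required")
    kappas = [
        quadratic_weighted_kappa(ratings[r1], ratings[r2], n_categories)
        for r1, r2 in pairs
    ]
    return sum(kappas) / len(kappas)
```

Under these assumptions, `mean_pairwise_kappa` would be called once per question, with each rater's five-point scores for all rated items, and the result compared with the 0.4 cutoff.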

Sample size

A sample size of 390 rated articles (15 raters × 26 articles) was selected for the appraisers, to produce confidence intervals for weighted kappa with a width of less than 0.1.

Results

The first draft of the questionnaire had 26 questions related to the content of information and 10 to layout and design. Each question was followed by a hint or prompt question, which was taken verbatim from the criteria generated by the appraisers and represented specific aspects of each question.

First meeting of the appraisers

During the first meeting, the 26 questions were refined to 23, each rated on a five-point Likert scale (1=no, the criterion has not been fulfilled, and 5=yes, the criterion has been fulfilled). A question rating the overall quality of the publication was added to the end of the questionnaire, with the instruction that the rating of overall quality should be based on responses to the previous questions.11

Testing of the draft questionnaire

The level of agreement for the questions related to layout and design was poor (κ 0.11–0.24); eight of the content questions achieved κ scores >0.4, including the rating for overall quality (κ=0.44, 95% CI 0.41–0.46) (see Table 1).

Table 1 Summary of agreement from each testing of the DISCERN-Genetics questionnaire

Second meeting of the appraisers

During the second meeting, the questionnaire was re-drafted, incorporating the results of the analysis. Modifications consisted of rewording questions if the level of agreement was poor (κ<0.4) and merging some of the overlapping questions (‘uncertainty in testing’ was combined with ‘test accuracy’; and ‘informed decision making’ with ‘shared decision making’). The wording for the risk criterion was one area where it was difficult to reach agreement. Discussions explored the concepts behind a summary estimate of risk, and increased risk. The initial question asked if a summary of risk was explained; this was changed to ‘Is risk explained in simple terms’ as there was no agreement on the best way to present risk information. The appraisers recommended that no items be dropped, and a ‘not applicable’ box was added to a question about information on the local availability of services. Instructions to guide the user were made clearer, and it was agreed to draft a glossary of genetic terms to accompany the questionnaire. The re-drafted questionnaire consisted of 19 questions plus the overall quality rating. The appraisers strongly advocated that the 10 questions on layout and design be reduced to three questions covering readability, language, and style and structure. These questions were retested by seven of the appraisers and the level of agreement remained poor (κ=0.14; κ=0.26; κ=0.12). It was decided to drop these questions from the overall questionnaire.
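As a small worked illustration of the retention rule, the sketch below applies the 0.4 cutoff to the κ values reported for the three retested layout and design questions. The function name and the question labels are placeholders introduced for the example; the text does not state which κ value belongs to which of the three questions.

```python
ACCEPTABLE_KAPPA = 0.4  # cutoff for an acceptable level of agreement, as stated in the text


def below_threshold(kappas, cutoff=ACCEPTABLE_KAPPA):
    """Return the labels of questions whose kappa falls below the cutoff."""
    return [label for label, kappa in kappas.items() if kappa < cutoff]


# Kappa values reported for the three retested layout and design questions.
# The labels are hypothetical placeholders, not names used in the study.
layout_retest = {"layout_q1": 0.14, "layout_q2": 0.26, "layout_q3": 0.12}

# All three labels are returned, matching the decision to drop these questions.
flagged = below_threshold(layout_retest)
```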

Evaluation of the questionnaire

The results from the evaluation and from the earlier testing are presented in Table 1. The level of agreement improved across the majority of questions, with κ for the overall quality rating increasing from 0.44 (95% CI 0.41–0.46) to 0.61 (95% CI 0.60–0.62). Eighteen of the 20 questions achieved an acceptable level of reliability. Of the two questions falling below the threshold of 0.4, one was dependent on the previous criterion (clear aims) being fulfilled, and the other (information about local services) was not always applicable. The final questionnaire and handbook will be available online at www.discern-genetics.org.

Discussion

The DISCERN-Genetics criteria provide the first standardised method to assess the quality of information for the public on genetic screening and testing. The criteria were developed from information covering a spectrum of genetic screening and testing situations to facilitate application to a wide range of conditions and settings, and were empirically tested by lay people, producers, and providers of health information. Genetic tests are available for all of the genetic conditions selected. For some of the conditions, such as haemochromatosis and cystic fibrosis, the tests are part of standard clinical practice; for others, current policy and provision are being debated. A key concern with all of the conditions is the level of public knowledge in this rapidly evolving field.

We used qualitative methods to obtain the views of a wide range of users of information on genetic screening and testing, and quantitatively tested the reliability of the criteria. By including the views of users of genetic services, we were able to identify and address the complex issues faced by those considering whether to consent to a genetic test, and include aspects of evidence valued by end users. These included concerns about discrimination and privacy, how risks and benefits should be expressed, and variability in test performance. The initial lack of agreement on the wording for the risk criterion reflected variation in the interpretation of risk information, which is consistent with previous research.6, 26 Discussions explored the concepts behind a summary estimate of risk, and increased risk. Once these criteria have been made public, comparisons between different users of information on genetic screening and testing should be made.

The results of the quantitative analysis provided empirical evidence to guide discussion about which criteria should be dropped or changed. Testing for agreement between raters demonstrated that initially some of the criteria were not interpreted in the same way, and changes to the wording were required to remove ambiguity and improve the level of agreement. This is not unusual when measurements rely on some subjective assessment, hence the need for formal testing to avoid confusion and misinformed decision making. Even with concepts that are readily endorsed, such as the nature of the test or layout and design, the meaning of the concept can differ between users.27 This will not only affect the appraisal of information but also the content included in production. Interestingly, the appraisers strongly advocated the inclusion of criteria related to the presentation of information, and despite changes to the wording and format of these criteria, the level of agreement remained poor.

We were surprised, given the investment in genetic research, by the low volume of detailed information available to the public on genetic screening and testing. We searched multiple sources of information on genetic testing and found few articles on the wide range of selected topics, even in settings where some tests are compulsory. This confirms the findings of a recent UK survey reporting that information on newborn bloodspot screening is incomplete and biased;28 elsewhere it has been observed that information is nonexistent.29 Without a sound knowledge base, informed decisions are impossible, particularly in the context of unknown risks and benefits. Recommendations on how to develop information to help informed decision making in the area of population-based research involving genetics have been widely discussed,5, 30 but do not address the range of information needs outside a research setting. DISCERN-Genetics will provide a mechanism for the assessment of high-quality information in this complex area by ensuring that patients and their families receive information about a genetic test in a consistent manner, irrespective of who is providing the information. The application of the criteria to existing information, with support from online training (www.discern-genetics.org), will help users readily identify gaps in information provision.