Introduction

Assessment is widely regarded as a vital component of training. Interest in assessment within the health professions has recently increased dramatically as evidence-based and effective instruments have been developed for use within assessment systems designed to measure generic and clinical competence.2

In keeping with this trend and other quality assurance drivers,3 there has been a move to introduce mandatory assessment of dental vocational training with the aim of ensuring minimum levels of competence. A robust system of assessment will improve the quality of health care and will inform and empower trainees to derive maximum benefit from the training process.4 Although it is a relatively new science, the use of patients in the assessment of health care professionals is a means of obtaining feedback that benefits from the fact that judgements are made on each individual practitioner's performance in real practice.5

The Wanless report, in describing projected long-term patient and public expectations of the NHS, highlighted the need for a 'patient centred service'.6 The use of patients in the evaluation of specific aspects of individual practitioners' competence addresses this issue and also conforms with government initiatives to create an NHS not only designed for but involving the users of the service.7 While many studies have shown that patients have difficulty rating a practitioner's technical ability, patients have a unique perspective which allows them to contribute to the assessment of interpersonal skills.8,9,10

A plethora of confounding factors have been reported as influencing or biasing the decisions of untrained observers such as patients, and this has led to concerns about whether patient evaluations can be sufficiently robust to be of value in the assessment process.11 This study sought to determine whether a specifically designed patient questionnaire could prove an effective tool for evaluating the interpersonal skills of individual vocational dental practitioners (VDPs). Effectiveness would be judged against recognised criteria for a robust assessment and by whether the questionnaire could form part of a system of assessment developed to address all-round competence.

Method

Questionnaire design

The item format and scale structure for the patient assessment questionnaire (PAQ) instrument were based on the American Board of Internal Medicine patient satisfaction questionnaire, which was developed for the evaluation of physicians' interpersonal skills.12

The PAQ is divided into three sections: A, B and C (Fig. 1). Section A consists of 13 items relating specifically to VDPs' communication skills and professionalism. Item content took into account the recommendations and priorities of stakeholders in NHS dentistry, including the public, the profession and the regulatory authority. Items map to training objectives covering the communication and professionalism domains as described in the VT/GPT competency document.13 Patients are asked to rate their dentist's skills on a one to five scale corresponding to the descriptors excellent, very good, good, fair and poor. A sixth 'can't say' option is also included.

Figure 1: Patient assessment questionnaire.

Section B consists of patient 'intention' questions representing known consequences of patient satisfaction, also rated on a five-point scale. These items are included to enable an estimation of the validity of ratings from the PAQ. Section C includes questions on patient socio-demographic details, including gender, age, education and income, which have been identified as potential sources of systematic bias in recent surveys of dental patients.14,15 In order to protect patient anonymity and promote honest responses, the PAQ does not seek identifying information from patients.

The questionnaire was the subject of an initial small-scale pilot study with 22 new graduates in vocational training.16 Evaluation of the data from this pilot resulted in a second draft of the questionnaire, which was piloted with 99 vocational dental practitioners from across Scotland.

Research design

Practice receptionists were asked to distribute 50 questionnaires to consecutive adult patients at each of two time points during the training year. In the event of low patient compliance, practices were sent additional questionnaires for distribution until a minimum of 20 questionnaires had been returned for each VDP for analysis. A similar study showed that a minimum of 20 responses is required to allow a sufficiently reliable interpretation of ratings received.12 Separate detailed guidelines were produced for receptionists and dentists participating in the pilot study. A reply-paid envelope and letter briefly describing the study were distributed with each questionnaire. Patients were given the option of completing the form on-site or in their own time.

Results were fed back to VDPs via their trainers after each data collection point in the form of the number of PAQs returned, the mean score per question and a histogram showing the frequency of responses. Data from the validity and patient demographic sections, while not provided routinely to VDPs, were available on request.

Results

The results describe the performance of the patient assessment questionnaire instrument against accepted criteria for a robust assessment tool, namely reliability, validity and feasibility, followed by VDP performance in the PAQ.

A total of 5767 PAQs were completed during the training year. This equates to a mean of 58 PAQs returned per dentist per year (across two time points). A target of 20 PAQ returns per dentist for each time point was set at the start of the pilot. Ninety-eight percent of VDPs reached this target for at least one data collection point and 79% reached it at both. Eighty-seven VDPs reached the target at the first data collection point and 88 at the second. Our analyses have shown that a minimum of 20 PAQs returned per dentist is required for reliable scores. Based on the returns from this pilot, the feasibility of obtaining such a response was very encouraging, especially considering that the study was conducted on a voluntary basis. Although 50 PAQs were issued to each VDP, not all were distributed to patients, which precluded accurate calculation of the percentage patient response rate.

Reliability

In order to minimise measurement error, the PAQ is designed such that scores for individual VDPs are based on the average ratings from a number of different patients. Reliability/generalisability coefficients are calculated as a means of estimating the error inherent in assessment scores. A coefficient of one denotes perfect reliability (an absence of measurement error) and zero denotes no reliability.17 The degree of acceptable error is dependent upon the purpose for which the assessment was designed. One purpose of the assessment system was to ensure identification of problem areas to allow prompt implementation of any necessary remedial training. It is generally accepted that robust assessment should confer a reliability coefficient of 0.5 or above if it is to be capable of identifying group outliers.

Reliability coefficients were calculated across all dentists for section A questions individually and collectively using estimates of variance components from a one way analysis of variance. These were then used to calculate generalisability coefficients as a function of the number of patients per dentist. Reproducibility of ratings is illustrated in Figure 2. The mean reliability coefficient for individual items in section A based on 20 returns per subject is 0.6 (ranging from 0.53-0.65). This was estimated to increase to 0.82 (range: 0.77-0.86) should 60 PAQs be available per dentist. These results indicate that 20 PAQs per subject would be sufficient to identify VDPs requiring additional training for specific aspects of interpersonal skills. When ratings for individual items were totalled for each subject to give a cumulative PAQ score (section A) 20 questionnaires were sufficient to confer a reliability coefficient of 0.72 (Figure 3). Although not the purpose of our assessment, sufficient reliability to rank order subjects could be achieved (demonstrable by r coefficient of at least 0.8) by taking PAQ cumulative scores for interpersonal skills and ensuring minimum return of 60 PAQs/dentist. Standard error of measurement and 95% confidence intervals were calculated from variance estimates to allow an interpretation of reliability in terms of individual scores.
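The dependence of reliability on the number of patients per dentist can be illustrated with a short sketch. It assumes the standard one-way random-effects model, in which the coefficient for a mean of n ratings is var_dentist / (var_dentist + var_error / n). The study's variance components are not reported here, so the ratio var_error / var_dentist is backed out from the reported mean item coefficient of 0.6 at 20 PAQs; the function names are hypothetical and this is not the authors' analysis code.

```python
def error_ratio_from_reliability(reliability: float, n_patients: int) -> float:
    """Back out var_error / var_dentist from a coefficient observed at n_patients."""
    return n_patients * (1.0 - reliability) / reliability


def projected_reliability(error_ratio: float, n_patients: int) -> float:
    """Generalisability coefficient for a dentist's mean over n_patients ratings."""
    return 1.0 / (1.0 + error_ratio / n_patients)


# Assumption: start from the reported mean item-level coefficient (0.6 at 20 PAQs).
ratio = error_ratio_from_reliability(0.60, 20)

for n in (10, 20, 40, 60):
    print(n, round(projected_reliability(ratio, n), 2))
```

Under these assumptions the projection for 60 PAQs rounds to 0.82, in line with the estimate quoted above.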

Figure 2: Reproducibility of PAQ ratings on individual items based on 20 PAQs/subject.

Figure 3: Reproducibility of PAQ ratings as a function of number of patients/subject (based on cumulative PAQ scores).

Internal consistency across all items in section A was high (Cronbach's alpha of 0.95), perhaps reflecting the fact that all items relate to some aspect of the same construct, and raising the possibility of using a cumulative score for interpersonal skills across all items in section A. Individual items were positively correlated. The average inter-item correlation coefficient (Pearson's) was 0.63 across all of section A, with individual correlations ranging from 0.43 to 0.79. The strongest correlation was between the two items that dealt specifically with 'humanistic' behaviour. A question relating to provision of information on cost of treatment showed the weakest correlation with other items from section A, possibly reflecting that this behaviour can be procedure specific.
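As an illustration of how internal consistency relates to the number of items and their inter-correlation, the sketch below computes Cronbach's alpha from a ratings table and, separately, the standardised alpha implied by a mean inter-item correlation. The function names and the small ratings table are hypothetical, and the standardised form is a swapped-in approximation, not necessarily the calculation used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standardised_alpha(k: int, mean_r: float) -> float:
    """Alpha implied by k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical ratings (4 patients x 3 items), NOT data from the study.
ratings = [[5, 4, 5], [4, 4, 4], [3, 2, 3], [5, 5, 4]]
print(round(cronbach_alpha(ratings), 2))

# 13 items with the reported mean inter-item correlation of 0.63.
print(round(standardised_alpha(13, 0.63), 2))
```

With 13 items and a mean correlation of 0.63 the standardised form gives a value close to, though not identical with, the reported 0.95; a small difference is expected because raw and standardised alpha coincide only when item variances are equal.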

Validity

While the use of mean scores from multiple observers results in more reliable ratings by 'averaging out' the random error component, it does not take into account the threat to validity of scores as presented by systematic error or bias.

The most commonly cited confounding factors likely to influence patients' assessments of primary health care are patient age, gender, ethnicity and socio-economic factors, although reports are often conflicting.14 To examine whether any of the differences found between dentists could be attributed to case mix, five demographic variables (age, gender, income, qualifications and ethnic origin) were tested for association with each item in section A using Pearson's correlation. This describes the degree of correlation on a scale of −1 to +1, with 0 indicating no correlation and +1 and −1 representing perfect positive and negative correlation respectively. The correlations were generally very low, with only one being greater than 0.1. The amount of bias in scores arising from case mix differences is so small that we conclude that patient demographic variations do not need to be taken into account when comparing dentists' mean scores.
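A case mix check of this kind can be sketched as follows. The data are invented for illustration, `pearson_r` is a hypothetical helper, and treating an absolute correlation above 0.1 as worth a closer look is an assumption drawn from the wording above, not a rule stated by the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data, NOT from the study: patient age and rating on one item.
patient_age = [23, 35, 41, 52, 60, 67, 29, 48]
item_rating = [4, 5, 4, 5, 4, 5, 5, 4]

r = pearson_r(patient_age, item_rating)
flag_for_review = abs(r) > 0.1  # assumed threshold for illustration only
print(round(r, 3), flag_for_review)
```

In practice the same calculation would be repeated for each demographic variable against each section A item.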

Validity of ratings was investigated further through analysis of the association between patient ratings for specific communication and interpersonal skills (section A) and responses to the patient intended behaviour questions (section B). This approach to investigating the validity of our assessment tool is based on the premise that, should the PAQ instrument be truly measuring interpersonal skills as intended, the resultant scores should correlate positively with those from related scales.17 Many studies have shown that a health practitioner's interpersonal skills play a role, along with other factors, in influencing aspects of patient behaviour including recommendations, care seeking and provider switching. These patient behaviours provided the basis for the two questions comprising section B of the PAQ.

Items from the two sections showed on average a moderate positive correlation (0.463), with a range of 0.333 to 0.528 depending on the pair of questions being analysed. Ratings from section A items showed a stronger correlation with those from question B1 ('would you recommend this dentist...?') than with those from question B2 ('would you ask to see this dentist again?'). Comparing section A questions, ratings from the item relating to provision of cost for treatment showed the weakest correlation with those in section B, whereas the items regarding (1) dentists' demonstration of sensitivity and understanding towards the patient and (2) ability to inspire trust and confidence had the strongest correlations.

Feasibility

While an assessment tool can be used to generate scores of sufficient validity and reliability to be fit for purpose, it must also be feasible, inasmuch as it must be relatively easy to implement and acceptable to those for whom it was designed. Any assessment can generate unease in those at the receiving end, and patient assessment can be viewed as a particularly daunting prospect by the new graduate. The PAQ was designed to give patients an opportunity to contribute to the health care process, provide VDPs and their trainers with valuable feedback and be considerate of trainer, VDP and administrator time.

All participants in the pilot were sent a survey questionnaire to investigate the acceptability and educational value of the PAQ in vocational training. Eighty-three percent of those responsible for delivering and monitoring training (trainers, advisers, directors) and 75% of VDPs involved in pilot studies responded.

Support for the PAQ among dental professionals was generally very high, and it was encouraging to learn that the majority (73%) of trainers and VDPs agreed with the principle of using patients in the assessment of communication skills and professionalism. Seventy-two percent of VDP respondents found results and feedback useful, 73% reported a heightened awareness of how patients perceived them and 76% stated that they would try to change the manner in which they practise if given negative feedback. Of the respondents involved in training delivery, 74% reported that the PAQ was easily implemented within their practice.

VDP performance

PAQ results were analysed to determine the mean score and range achieved by VDPs as a group. Only scores based on a minimum return of 20 PAQs were used in this analysis, meaning that two VDPs' scores were excluded (n=97). On the whole, patients rated the VDPs highly. Cumulative scores for section A ranged from 46.15 to 63.25 out of a maximum possible score of 65 (mean 56.81). For individual items, scores ranged from 2.44 to 5 out of a possible 5 (mean score 4.39). Standard error of measurement for individual questions ranged from 0.14 to 0.26 (mean 0.17). Using these estimates, the 95% confidence interval around a hypothetical mean score of 3.5 would be ±0.34 (put another way, the true score would fall between 3.16 and 3.84 95% of the time).
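The confidence interval arithmetic can be reproduced with a few lines. This is a sketch under a stated assumption: a multiplier of 2 standard errors is used because it reproduces the quoted half-width of 0.34 from the mean standard error of 0.17, whereas the exact normal multiplier of 1.96 would give roughly 0.33. The function name is hypothetical.

```python
def confidence_interval(score: float, sem: float, multiplier: float = 2.0):
    """Return (lower, upper) bounds as score -/+ multiplier * sem."""
    half_width = multiplier * sem
    return score - half_width, score + half_width


# Hypothetical mean score of 3.5 with the reported mean SEM of 0.17.
low, high = confidence_interval(3.5, 0.17)
print(round(low, 2), round(high, 2))

# Same interval with the exact normal multiplier (assumption: normal errors).
low_exact, high_exact = confidence_interval(3.5, 0.17, multiplier=1.96)
print(round(low_exact, 2), round(high_exact, 2))
```

The same function applied to the largest reported standard error (0.26) shows how much wider the interval becomes for the least precisely measured items.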

Discussion

The purpose of this study was to design a patient questionnaire for evaluating the interpersonal skills of individual VDPs, which would be judged against recognised criteria for a robust assessment.

Refinement of the PAQ was considered with regard to question content. A high level of internal consistency often raises the possibility of decreasing the total number of questions in a questionnaire. In this case the most obvious course of action would be to combine the two most strongly correlated questions. The benefit of such a change would be to enhance feasibility by increasing patient compliance. However, it is likely that the arrangement of questions in section A will remain unchanged, for two reasons: patient compliance in the pilot study was more than adequate, and, as each question effectively represents a separate observation or sample of performance, a decrease in the number of items would negatively affect the reliability of the total PAQ score.17

Whilst the results of these investigations provide some of the necessary evidence for the validity of the PAQ, evidence of construct validity would no doubt prove useful in strengthening the case. One approach to this, used in a similar study, was to have a group of patients rate a number of simulated dentist-patient encounters, each of which features a different level of interpersonal skill or combination of skills.12 Further evidence could be generated through triangulation of data from the PAQ with that from other instruments designed to measure similar traits.

Adaptation of the PAQ for use within the hospital setting is currently underway. It is possible that the PAQ could also be adapted for use in assessing interpersonal skills in other dental arenas or disciplines eg as part of a revalidation process. However, pilot studies and rigorous analysis aimed at evaluating instrument suitability and effectiveness in the new setting would undoubtedly be required before implementation.

With regard to VDP performance, analysis of average scores for each question taking the cohort of 99 VDPs as a whole revealed that one question in particular was rated significantly lower (P=0.001) than other items. This question related to the provision of information on cost of treatment, an issue for NHS and private dental services which has received a great deal of attention in recent years.18

All indications to date are that the PAQ is a useful assessment tool which is fit for its intended purpose as part of an assessment system to address the all-round competence of VDPs. It is increasingly recognised that all assessment tools have their strengths and that a combination of different methods should be used, especially when addressing such complicated constructs as the all-round competence of our health professionals.2 In view of this, final evaluation of the PAQ is to be undertaken in the context of the entire assessment system.