Introduction

Many dentists in the UK enter Dental Core Training (DCT) after the one-year Dental Foundation Training (DFT). DCT provides an opportunity to develop competencies that enhance their skills for general dental practice or prepare them to enter speciality training. DCT helps to consolidate skills acquired during undergraduate and foundation training, and to develop new skills in specific areas of clinical practice across different environments. Entry to DCT is highly competitive and, until recently (2017), the selection process varied by training region. Typically, however, shortlisting has comprised 'white space' application forms scored against the person specification. Successful candidates have then been invited to a selection centre which assesses clinical and academic skills. There is increasing recognition that non-academic attributes are also important to assess as part of the selection process.

A popular method of assessing non-academic attributes as part of selection is the situational judgement test (SJT), the use of which has grown across a range of occupations in recent years.1,2 To the authors' knowledge, no previous study has explored the use of SJTs for entry into dental postgraduate training beyond foundation training. SJTs assess individuals' reactions to a number of hypothetical, job-relevant scenarios that reflect situations candidates are likely to encounter in the role. SJTs are scored using pre-determined keys, developed in consultation with subject matter experts (SMEs).3 Since SJTs are machine-marked, they have the potential to be cost-effective and efficient either as a shortlisting tool or as an element of selection.4 Machine marking also reduces concerns around inter-rater reliability inherent in other selection methods such as references or 'white space' application forms.

An SJT has been successfully implemented for selection into DFT in the UK;5 it performed well psychometrically and received largely positive candidate feedback,5 despite concerns from some stakeholders.6 DFT is a one-year period of training including experiential learning within general dental practice, whereas DCT involves a more advanced level of training. The success of the DFT SJT suggested that an SJT may also be suitable as part of DCT selection. However, since experience level and context differ between DFT and DCT, it would not be appropriate to use the same SJT for both.

SJTs have been used extensively in medical selection, showing good evidence of predictive validity at the point of entry to medical school,7 foundation training8 and speciality training.9,10 The research evidence showing the utility of SJTs in other postgraduate healthcare settings prompted consideration of an SJT for DCT selection to assess non-academic attributes alongside the existing selection centres. With any new selection methodology, various psychometric and legal criteria must be satisfied including standardisation, reliability and fairness.5 This paper reports the development and evaluation of a pilot SJT for DCT and explores the psychometric properties, fairness in relation to demographic differences in test score, and face validity of the SJT for candidates, in order to ascertain its suitability for use in selection for DCT.

Specifically, we addressed the following research questions:

  • Are the psychometric properties of the SJT robust?

  • Does the SJT produce scoring differences based on demographic groups (gender and ethnicity)?

  • Do candidates react to the SJT positively and view it as fair – does the SJT have good face validity?

Method

Materials

Situational judgement test

The SJT targets four key professional attributes which were identified following a thorough job analysis of the role of a DCT in collaboration with SMEs; these are outlined in Table 1, with behavioural indicators listed for each.

Table 1 Professional attributes targeted by the SJT

SJT content was designed in line with best practice principles (see Figure 1 and Patterson et al.4). The SJT scenarios were set in the context of DCT, and were developed by trained psychologists in collaboration with SMEs, such as trainers. SMEs provided scenario material and initial scoring keys during telephone interviews, which were subsequently developed into full SJT scenarios and response options by psychologists. Scenarios were then reviewed in focus groups with SMEs, to ensure their appropriateness and relevance to the DCT role. Following this, a different group of SMEs answered all scenarios in test conditions to finalise scenario material and scoring keys before piloting.

Figure 1 Design stages for SJTs

Two response formats were used, both of which require relative judgements to be made across response options:

  • Ranking response format: candidates are required to rank five possible responses in order of appropriateness

  • Multiple response format: candidates are required to select the three most appropriate response options from eight possible actions.

Response instructions were knowledge-based (candidates respond in terms of what they should do in the given situation), which have been shown to have a higher cognitive loading than behavioural tendency instructions (that is, what they would do),11 and are less susceptible to biases and faking.12,13

The final pilot SJT consisted of 35 scenarios (22 in the ranking response format, with a total possible score of 20 points per scenario, and 13 in the multiple response format, with a total possible score of 12 points per scenario) (Fig. 1).
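The source gives only the maximum score for each format (20 points per ranking scenario, 12 per multiple response scenario) and does not describe how points are allocated. The sketch below shows one allocation scheme that is consistent with those maxima. It is an assumption for illustration, not the scoring key used in the pilot: in the ranking format each of the five options earns 4 points minus its distance from the keyed position, and in the multiple response format each selected option that appears in the key earns 4 points. The function names are hypothetical.

```python
def score_ranking(candidate, key):
    """Score a ranking scenario (assumed scheme, max 20 for five options).

    candidate, key: lists of the same option labels, most appropriate first.
    Each option earns 4 points minus the distance between the candidate's
    position and the keyed position, floored at zero.
    """
    total = 0
    for position, option in enumerate(candidate):
        distance = abs(position - key.index(option))
        total += max(0, 4 - distance)
    return total


def score_multiple_response(selected, key, points_per_option=4):
    """Score a multiple response scenario (assumed scheme, max 12 for three keyed options).

    selected: options chosen by the candidate; key: the keyed options.
    Assumption: selecting more options than the key allows scores zero.
    """
    if len(set(selected)) > len(key):
        return 0
    return points_per_option * len(set(selected) & set(key))


key_ranking = ["A", "B", "C", "D", "E"]
print(score_ranking(["A", "B", "C", "D", "E"], key_ranking))       # identical to key
print(score_ranking(["B", "A", "C", "D", "E"], key_ranking))       # two adjacent options swapped
print(score_multiple_response(["A", "C", "F"], ["A", "C", "F"]))   # all three keyed options chosen
```

Under this assumed scheme, near-miss rankings still earn most of the available points, which fits the relative-judgement nature of both formats.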

Candidate feedback questionnaire

All candidates who completed the SJT were invited to respond to a previously validated feedback questionnaire based on procedural justice theory, presenting eight statements about their experience of completing the SJT. Candidates rated their level of agreement with each statement on a five-point Likert scale (1 = strongly disagree, 5 = strongly agree). We were interested in candidates' responses to individual statements such as 'The content of the situational judgement test was clearly relevant to Dental Core Training' and 'The content of the situational judgement test appeared to be fair', in order to determine the level of agreement for these concepts. As in previous research,13 this enables recruiters to better understand candidates' perceptions of and reactions to the SJT. This, in turn, helps to facilitate any necessary interventions from the candidates' perspective.

Sample and procedure

Data collection took place during the 2015 selection process for DCT years 1, 2, 3 and academic positions in four Health Education England local teams (Yorkshire and the Humber, London, South West and West Midlands) and Scotland. All candidates at the five centres were invited to complete the SJT on the day of the selection in invigilated conditions, and they had 60 minutes to complete it. A total of 386 candidates completed the pilot SJT.

Ethics

All candidates included in the analysis signed a consent form indicating that they understood that their SJT scores and feedback forms would be used for research purposes only, that their participation was voluntary, and that it would not affect their chances of being offered a training place.

Results

Candidates with a large amount of missing data (more than three scenarios, a cut-off identified by analysing mean SJT scores by number of missing responses; N = 37) and those who did not complete consent forms (N = 8) were removed, leaving 341 participants for analysis. Analyses comparing candidates who were and were not included in the final sample found no significant differences. Descriptive and demographic data for candidates and the SJT are provided in Table 2.

Table 2 Descriptive and demographic data

Are the psychometric properties of the SJT robust?

An initial psychometric examination of the scenarios showed that three were performing poorly; these were therefore removed from the pilot analysis. To identify them, we examined each scenario's partial correlation with the total SJT score, which measures the extent to which candidates' performance on that scenario correlates with their overall performance on the pilot SJT, and removed scenarios with partial correlations of r < 0.02. The remaining 32 scenarios had partial correlations ranging from r = 0.06 to r = 0.41, with 20 (62.50%) within the 'good' range (r > 0.17) and a further 7 (21.88%) in the 'moderate' range (r = 0.13–0.16). The remaining 5 scenarios (15.63%) were of limited quality, as assessed by their partial correlations (r < 0.12).15
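The item-level check described above can be sketched as follows. The source does not give the exact computation, so this sketch assumes the 'partial correlation' is the corrected item-total correlation, that is, the Pearson correlation between a scenario's score and the total of all other scenarios. The band boundaries at 0.17 and 0.13 are taken from the reported ranges; how values falling between the reported bands are treated is also an assumption. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_rest_correlations(scores):
    """scores: one row per candidate, one column per scenario.

    Returns, for each scenario, its correlation with the total of the
    remaining scenarios (the scenario itself is excluded from the total).
    """
    n_items = len(scores[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        correlations.append(pearson(item, rest))
    return correlations


def quality_band(r):
    """Assumed banding based on the ranges reported in the text."""
    if r >= 0.17:
        return "good"
    if r >= 0.13:
        return "moderate"
    return "limited"


# Band counts reported for the 32 retained scenarios, as percentages.
for label, count in (("good", 20), ("moderate", 7), ("limited", 5)):
    print(label, round(100 * count / 32, 2))
```

Excluding the scenario from its own total avoids inflating the correlation, which matters most for short tests.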

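The difficulty of each scenario (that is, how achievable it is to gain full marks on it) was calculated as the candidates' mean score on the scenario expressed as a percentage of the total possible score, as follows:

\[ \text{Difficulty (\%)} = \frac{\text{mean candidate score on the scenario}}{\text{total possible score for the scenario}} \times 100 \]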

This ranged from 64.61% to 90.03% across the 32 scenarios. This range is comparable to those found in SJTs with similar test specifications that are used in other postgraduate settings.5,16,17
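As a minimal sketch of the difficulty calculation described above (mean score on a scenario as a percentage of its total possible score), with a hypothetical function name and made-up example scores:

```python
def scenario_difficulty(scores, max_score):
    """Mean candidate score on a scenario as a percentage of the maximum."""
    mean_score = sum(scores) / len(scores)
    return 100 * mean_score / max_score


# Illustrative (made-up) scores for one ranking scenario worth 20 points.
print(scenario_difficulty([20, 16, 18, 14], max_score=20))
```

Note that on this index a higher percentage indicates a scenario on which candidates found it easier to score well.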

SJT total scores showed a close to normal distribution with a slight negative skew (see Figure 2; skewness = −1.86, kurtosis = 7.12).18 This indicates that the test is capable of differentiating between candidates, which is an important requirement in selection contexts.
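The shape statistics reported above can be computed from total scores as sketched below. The source does not state which estimators were used; this sketch assumes the simple moment-based forms (third and fourth standardised central moments, with kurtosis not reduced by 3, so a normal distribution gives roughly 3). Statistical packages often apply bias corrections or report excess kurtosis, so values may differ.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean_value = sum(values) / len(values)
    return sum((v - mean_value) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness; negative when the longer tail is on the low side."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based (non-excess) kurtosis."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Made-up scores: most candidates score high, one scores low.
example_scores = [1, 5, 5, 5, 5]
print(skewness(example_scores) < 0)
```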

Figure 2 Histogram of distribution of SJT scores

Results show the reliability of the SJT to be α = 0.68, comparable to values reported for SJTs in other settings19 and close to the accepted standard of α = 0.70 as an indicator of good internal reliability.20
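Assuming the reported α is Cronbach's alpha (the usual index of internal reliability, although the source does not name it), it can be computed from a candidate-by-scenario score matrix as sketched below. The function name and the example data are illustrative only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per candidate
    and one column per scenario.

    alpha = k / (k - 1) * (1 - sum of scenario variances / variance of totals)
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores in which the three scenarios rise and fall together.
print(cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

Alpha increases with both the number of scenarios and the average correlation between them, which is one reason removing poorly performing scenarios is assessed alongside it.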

Does the SJT produce scoring differences based on demographic groups?

There were no significant differences in total SJT scores between males and females (t(322) = 0.54, P = .59). White candidates scored significantly higher than Black and Minority Ethnic (BME) candidates (t(329) = 2.12, P = .03, d = 0.26), although the effect size was small,21 indicating minimal differences between groups in SJT performance.
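The group comparisons above report a t statistic and Cohen's d. A minimal sketch of both follows, assuming an independent-samples t-test with pooled variance (consistent with the reported degrees of freedom, which equal the combined group size minus two) and d defined as the mean difference divided by the pooled standard deviation. The source does not state the exact variant used, and the example data are made up.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return sqrt(pooled_var)


def cohens_d(a, b):
    """Standardised mean difference between groups a and b."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Independent-samples t statistic with len(a) + len(b) - 2 degrees of freedom."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * sqrt(1 / na + 1 / nb))


group_a = [2, 4, 6]
group_b = [1, 3, 5]
print(cohens_d(group_a, group_b), student_t(group_a, group_b))
```

Reporting d alongside the t-test is useful here because, with several hundred candidates, a difference can reach statistical significance while remaining small in practical terms.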

Do candidates react to the SJT positively and view it as fair – does the SJT have good face validity?

All DCT candidates who completed the SJT also completed the feedback questionnaire (N = 386). The majority of candidates agreed or strongly agreed with the following statements: the SJT content was relevant to DCT (84.20%); the content of the SJT was appropriate for their training level (84.72%); the content of the SJT appeared to be fair (79.79%); and the level of difficulty of the SJT was appropriate (82.12%). These results provide good evidence for the face validity of the SJT and are commensurate with other findings of positive candidate perceptions towards SJTs for use in selection.5,22
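The agreement percentages above combine the top two points of the five-point Likert scale. A minimal sketch of that calculation, using a hypothetical function name and made-up ratings:

```python
def percent_agree(ratings, threshold=4):
    """Percentage of ratings at or above the threshold
    (4 = agree, 5 = strongly agree on the five-point scale)."""
    return 100 * sum(r >= threshold for r in ratings) / len(ratings)


# Made-up ratings for one statement from five candidates.
print(percent_agree([5, 4, 3, 2, 4]))
```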

Discussion

This study provides initial evidence for the suitability of an SJT to assess professional attributes during selection into UK Dental Core Training. The pilot test showed a level of internal reliability close to that widely deemed acceptable,20 in line with similar findings from SJTs in other settings.11 Additionally, partial correlations between scenarios and total SJT scores showed that 84.38% of scenarios had moderate or good associations with total SJT scores, indicating that the vast majority of SJT scenarios were of high quality. Scenarios categorised as being of limited quality will be amended and re-piloted. Similarly, the difficulty of scenarios was within the range we would expect for a postgraduate SJT intended to 'sift out' low-performing candidates.5,16,17 That is, most candidates score fairly well (but not perfectly) across the majority of scenarios. The distribution of pilot SJT scores was slightly negatively skewed, which is a common finding among SJTs in the wider research literature.23

The group difference analyses indicate that the SJT is fair. No gender differences were found for the SJT and although White candidates performed better than BME candidates, the effect was marginal. This is important because ethnic group differences are common among selection methods, and the effect size found in this study was lower than that often reported in other settings.24

Candidates' reactions to the pilot SJT were very positive, indicating good levels of face validity. Over three quarters of candidates rated the SJT as relevant to DCT, appropriate for the training level, suitably difficult and fair.

Implications for practice

The distribution of test scores showed that the pilot DCT SJT can differentiate between test takers. Practically, this is important because a selection method is only valuable if it allows selectors to differentiate between candidates on the basis of key attributes or skills relevant to the target role. Taken together with the findings of the difficulty ranges of the scenarios, this suggests that the SJT provides greatest value to selectors towards the bottom end of the distribution, as most candidates tend to perform relatively well on the test. Emerging evidence in the UK from the medical education literature suggests that SJTs may have greater predictive validity at the lower end of performers.4,8 This suggests that in practice SJTs may be best used to 'select out' candidates in the earlier stages of a selection process and other, more academically-focused, methods may be more suited to 'select in' candidates during the later stages of selection.25 This is in line with the intended use of the DCT SJT, which, if implemented, would be used to select out candidates who do not demonstrate the required non-academic attributes for DCT. It is likely that this will be followed by other selection tools, designed to target clinical and academic skills. While we note that this is a preliminary study, since we did not assess the predictive validity of the SJT, the psychometric evaluation does provide initial support for the use of the SJT to select out lower performing candidates.

Finally, given the minimal group differences found in this pilot study, we can tentatively conclude that the SJT has little adverse impact on candidates based on their group membership. This is important in a high-stakes selection context such as DCT.

Recommendations for future research

Future pilots should be completed to assess whether these findings can be replicated with other cohorts, where psychometric and group difference analyses could be conducted for other contexts. It would also be useful to further examine the model of 'selecting out' candidates scoring at the lower end of the distribution on important non-academic attributes in the early stages of selection in the dental context. Future studies should also examine the longitudinal predictive validity of the SJT with measures of performance at key stages in the postgraduate dental training pathway including during DCT and into qualified practice, and investigate whether relationships between the SJT and outcome measures show the nonlinear trend that may be expected if SJTs are better at predicting performance towards the lower end of the distribution.