Faculty development programs improve the quality of Multiple Choice Questions items' writing

The aim of this study was to assess the utility of long-term faculty development programs (FDPs) in improving the quality of multiple choice question (MCQ) item writing. This was a quasi-experimental study conducted with newly joined faculty members. The MCQ items from tests in the respiratory, cardiovascular and renal block courses were analyzed for difficulty index, discriminating index, reliability, Bloom's cognitive levels, item writing flaws (IWFs) and nonfunctioning distractors (NFDs). Significant improvement was found in the difficulty index values from pre- to post-training (p = 0.003). MCQs with moderate difficulty and higher discrimination were more frequent in the post-training tests in all three courses. Easy questions decreased from 36.7 to 22.5%. An improvement was also reported in the discriminating indices, from 92.1 to 95.4% after training (p = 0.132). More items at higher cognitive levels of Bloom's taxonomy were found in the post-training tests (p < 0.0001). NFDs and IWFs were also less frequent in the post-training items (p < 0.02). MCQs written by faculty who have not participated in FDPs are usually of low quality. This study suggests that newly joined faculty need to participate actively in FDPs, as these programs help improve the quality of MCQ item writing.

Faculty development is defined as the process designed to prepare academic staff for, and enhance their productivity in, their pertinent roles, such as teaching, assessment, research, and managerial and administrative duties 1, as well as the development of resource material and the facilitation required for active, student-centered learning 2. Students' learning is largely driven and enhanced by assessment; thus, developing high-quality tests is an important skill for educators 3. The mode of assessment has been shown to influence students' learning capabilities 4. Usually, educators develop test items themselves or sometimes rely on item test banks as a source of questions. The possibility of error is greater with test banks if their staff members are not sufficiently educated and professionally trained in developing test items 5. Hence, the assessment tool should be valid and reliable, and capable of measuring the diverse characteristics of professional competencies. Multiple choice questions (MCQs) are one of the most frequently used types of competency test. MCQs are appropriate for measuring knowledge and comprehension, and can be designed to measure application and analysis 6. The use of well-designed MCQs has increased significantly owing to their higher reliability, validity, and ease of scoring 7,8. Also, well-constructed MCQs are capable of testing the higher levels of cognitive reasoning and can efficiently discriminate between high- and low-achieving students 9,10. Despite these advantages of well-constructed MCQ items, various studies have documented violations of MCQ construction guidelines 9,11.
Faculty members may generally be asked to perform duties for which they have received no formal training and in which they have no experience 12. Faculty development programs (FDPs) are therefore required to provide academic staff with a wide range of learning opportunities, ranging from conferences on education to informal discussions on the development of assessment materials to support running courses. However, a successful FDP requires more than simple attendance; a degree of reflection and development is also needed to ensure continuity of personal development and the desired outcomes, which improve the teaching, learning and assessment process 13. FDPs can be evaluated using Kirkpatrick's model 14. Kirkpatrick's model describes four levels of outcome: learners' reaction (to the educational experience); learning (the acquisition of new knowledge and skills); behavior (changes in practice and the application of learning to practice); and results (change at the level of the learner and the organization as the main outcome of a program) 15. However, only a few studies have reported the fourth level of Kirkpatrick's model 16-18. Teachers' training in the 21st century needs to widen its focus by using varied learning methods based on established learning theories, fostering partnerships and collaboration, and thoroughly evaluating interventions to keep pace with the changes in medical curricula 19.
To address these needs, the Faculty Development Unit, Department of Medical Education, College of Medicine, King Saud University (KSU), Saudi Arabia conducted two workshop training programs in the academic year 2013-2014 for all newly joined faculty members (demonstrators, lecturers, assistant professors, associate professors and professors) who were involved in teaching various subjects of medicine for the first year in the College of Medicine, Princess Nourah University (PNU), Riyadh, Saudi Arabia. The main focus of the workshop training program was for participants to construct appropriate MCQ items based on sound scientific standards and guidelines. The long-term impacts of FDPs have been evaluated with pre-program, immediate post-program and follow-up studies 14,20. Therefore, the present study aims to evaluate the effect of long-term, well-structured FDPs on improving the quality of MCQ item writing.

Results
The results of the final MCQ-based examinations of all three courses (respiratory, cardiovascular, and renal) for the academic years 2012-2013 (before workshop training) and 2013-2014 (after workshop training) were analyzed separately. The reliability coefficient (Kr-20) of all three examinations, before and after the workshop training program, was more than 0.92. The students' mean score across the three courses improved from before workshop training (mean score 18.29 ± 0.77) to after workshop training (mean score 20.33 ± 0.47) (Table 1). Similarly, the overall passing rate of students increased from 49.2 to 56.3% after workshop training.
The difficulty index (P) and discrimination index (Table 2).
The number of non-functional distractors (NFDs) was also lower in the year 2013-2014 (post-training; n = 13) and was more than double in the year 2012-2013 (pre-training; n = 28) (χ² = 6.0, p = 0.02). The MCQs were further divided into difficult, moderate and easy categories based on their difficulty index (Table 3). Moderately difficult MCQs were more numerous (n = 183) in the year 2013-2014 (post-training) than in the year 2012-2013 (pre-training) (Table 2). The number of easy MCQs was considerably lower.
On the basis of Bloom's cognitive levels, K2 MCQs were more numerous in the academic year 2013-2014 (post-training; n = 123) than in the academic year 2012-2013 (pre-training; n = 84), whereas K1 MCQs decreased from 156 to 117 after training (χ² = 12.91, p < 0.0001). MCQs with item writing flaws (IWFs) were around ten times more numerous before the workshop training program (n = 47) and were reduced significantly after it (n = 5) (χ² = 38.04, p < 0.0001). The item analysis showed significant variation between pre- and post-training in the P-values of the respiratory (F = 4.964, p = 0.027), cardiovascular (F = 6.253, p = 0.013) and renal block examinations (F = 7.852, p = 0.006) (Table 3). Similarly, item writing flaws also varied significantly between pre- and post-training in all three block examinations (Table 3).
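The chi-square values quoted above can be checked from the counts given in the text. The sketch below is only an illustration, not the authors' SPSS procedure: it computes the Pearson chi-square statistic for a 2x2 table without continuity correction. The function name and the figure of 240 items per year are assumptions; the total is inferred from the K1 and K2 counts (156 + 84 and 117 + 123) and is not stated explicitly in the text.

```python
def pearson_chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: pre-training, post-training. Columns: K1 items, K2 items (counts from the text).
bloom_table = [[156, 84], [117, 123]]
bloom_chi2 = pearson_chi_square_2x2(bloom_table)

# ASSUMPTION: 240 items per year, inferred from K1 + K2; not stated in the text.
TOTAL_ITEMS_PER_YEAR = 240
# Rows: pre-training, post-training. Columns: flawed items, unflawed items.
iwf_table = [[47, TOTAL_ITEMS_PER_YEAR - 47], [5, TOTAL_ITEMS_PER_YEAR - 5]]
iwf_chi2 = pearson_chi_square_2x2(iwf_table)

print(round(bloom_chi2, 2), round(iwf_chi2, 2))
```

Under these assumptions the statistics come out at about 12.92 and 38.04, in line with the reported 12.91 and 38.04.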

Discussion
Many untrained tutors are excellent in their academic responsibilities, but earlier findings showed that medical faculty can be more effective in their roles with formal training 21. FDPs build professional development, especially for new faculty members, across their various academic roles. FDP activities therefore appear most valuable and effective when participants' outcomes are measured as changes in learning, behavior and performance 22. Our results show the effectiveness of the MCQ item writing workshop training in terms of item-related outcomes, students' mean score and passing rate (Table 1 and Table 2). The analysis showed a significant positive difference in the measured outcomes, including DI and P-values, of the final MCQ-based examinations of each of the three courses for the academic year 2013-2014 (post-training) over the academic year 2012-2013 (pre-training). The difference between pre- and post-training DI values suggests that the faculty development activity resulted in significant improvement in the quality of the test items developed by the participants. Significant differences were found for DIs between the pre- and post-training examinations: more discriminating MCQs were present in all three courses after the faculty development program, demonstrating that the quality of the MCQs improved after the participants attended the FDP. The overall improvement in MCQ item writing skills from pre- to post-workshop training was reflected in the increased mean score and passing rate of students. Our results are in close agreement with the earlier reports of Naeem et al. and Jozefowicz et al. 18,23. Naeem et al. evaluated the effect of FDPs on the quality of MCQ, short answer question (SAQ) and objective structured clinical examination (OSCE) item writing and reported significant improvement in the quality of test items developed by the participants after the training intervention 18.
They achieved high effect sizes, which indicate a strong effect of the dedicated FDP on item quality 18. Step-1 questions written by trained faculty had a higher mean score than those written by faculty without formal training 23. Other studies also demonstrated the benefit of both peer review and structured training in improving item writing quality 24,25. In contrast, given the utility of and current trend toward MCQs in medical school examinations, our study focused only on in-house MCQ item development. We also evaluated the effect of FDPs on MCQ item writing quality in terms of increased mean score and passing rate of students. By applying rigorous statistical analysis to the training intervention, we found a significant decrease in easy questions, NFDs and IWFs, whereas remarkable improvement was noticed in DIs and Bloom's taxonomy levels, making our findings more reliable and significant from an analytical point of view. Our results indicated that the pre- and post-training reliability (Kr-20) of the examinations was > 0.92, which indicates homogeneity of the test. It has also been confirmed that the reliability of a test depends not only on the quality of the MCQs but also on the number of MCQs, the distribution of the grades and the time provided for the examination 26.
MCQs with a higher number of NFDs are easier than those with a lower number of NFDs and are less discriminating 13,27,28. Distractors usually distract the less knowledgeable student, but they should not result in tricky questions that might mislead knowledgeable examinees. A question with only two good distractors, however, is preferable to one with additional filler options added only to make up some pre-determined number of options 10. We also found fewer NFDs in the MCQs of the post-training examinations. The discriminatory power of an MCQ largely depends on the quality of its distractors. An effective distractor will look plausible to less knowledgeable students and lure them away from the keyed option, but it will not entice students who are well-informed about the topic under consideration. Writing effective distractors can be a challenging job, but helpful guidelines that make the process easier are readily available 29,30. Assessment drives learning 3,31, and should therefore be closely interwoven with higher levels of cognitive ability. The 'learning approach' is a dynamic characteristic and is continuously modified according to students' perception of the learning environment 32. The cognitive level of the assessment should be in line with the cognitive level of the course objectives and with the instructional activities and the materials provided to students, which is known as 'constructive alignment of assessment' 33.
One of the most common problems affecting MCQ quality is the presence of item writing flaws (IWFs). IWFs are violations of accepted, standard item-writing guidelines that can affect students' performance on MCQs, making an item either easier or sometimes even more difficult 11. Various researchers have identified potential reasons for the lack of quality questions and have reported IWFs as one of the major reasons. Vyas and Supe (2008) 34 reported that limited time and lack of faculty training in MCQ preparation contribute significantly to flaws in item writing. Downing (2005) 11 assessed the quality of four examinations given to medical students in the United States of America and found that 46% of MCQs contained IWFs; as a consequence of these IWFs, 10-15% of students who were classified as failing would have been classified as passing if the MCQ items with IWFs had been removed. It has also been reported that if flawed items had been removed from the test, fewer lower-achieving examinees would have passed the test and higher-achieving examinees would have obtained higher scores 35. These results are consistent with our findings of decreased IWFs and an increased passing rate from pre- to post-training. Flawed items may also affect the difficulty and discrimination indices. Low difficulty and poor discrimination in an item favor low achievers, whereas high difficulty and poor discrimination negatively affect high scorers 36. Moreover, flawed items also fail to assess the course learning objectives 36.
The methods of assessment shape students' approach to learning. Students are more likely to adopt a surface approach when assessment emphasizes recall of factual knowledge 37. The present study concludes that faculty should be encouraged and trained to construct MCQs at higher-order cognitive levels to assess trainees appropriately 38. The current study also paves the way for applying suitable faculty development programs to improve the quality of MCQs in other career options and degrees. Improvement in the quality of MCQs will improve the validity of the examination as well as students' deep learning approaches. The outcomes indicate that FDPs should be arranged regularly in a well-organized format and schedule. A flow-chart of the structure of the MCQ item writing training workshop program, according to Kirkpatrick's levels of evaluation, is given as a reference (Figure 1).
Several evaluations of FDPs have occurred immediately or soon after the conclusion of the intervention 14,16-18. Few studies have examined the impact of these programs on the skills and behaviors of participants at a later date 39-41. This is the first report of the sustainability and long-term effect of FDPs on the development of MCQ construction skills among newly joined faculty members.
Alongside these positive findings, the study has certain limitations: (i) it reports the results for a single group of faculty and students' scores in only three courses of the first year of the medical degree; (ii) further workshops will be needed for the other assessment tools that are also included in the assessment, such as 'short answer questions', 'objective structured practical examination' and 'objective structured clinical examinations'; (iii) such workshops must be conducted on a broader scale, in different contexts and for a variety of examinations.
In conclusion, well-constructed faculty development training improves the quality of MCQs in terms of difficulty and discriminating indices and Bloom's cognitive levels, and reduces item writing flaws and non-functioning distractors. Improvement in the quality of MCQs will improve the validity of the examination as well as students' achievement in their assessments. Based on these outcomes, we suggest that faculty development training should be conducted on a regular basis and in a proper manner, with a follow-up process to ensure the continuity of the quality assessment process. Such training will lead to greater effectiveness. Also, the effectiveness of training depends more on the design (which should be aimed at learners' needs) and implementation of the training program.

Methods
Faculty development program workshop intervention. A two-day, full-time workshop was conducted for 25 newly joined faculty members of PNU. The workshop was designed to train faculty members to construct high-quality single-best MCQs for basic science courses. The participants were asked to bring five MCQs from their specialties to be discussed in the workshop.
On the first day, the theoretical background was discussed along with a review of the MCQ construction checklist criteria, and consensus on the checklist items was reached with the members. Whenever a disagreement was raised about a checklist item, its rationale was discussed again and the disagreement was resolved. All pre-workshop MCQs developed by the faculty members were revised based on the agreed checklist criteria and corrected accordingly.
On the second day of the workshop, the participants were divided into small groups of three to four and asked to develop five MCQs in their specialties based on the agreed checklist criteria. The MCQs were then discussed, corrected and edited with the participants' agreement.
Quality measurement of the items of MCQs. Kirkpatrick's model of educational outcomes offers a useful evaluation framework for faculty development workshops. It has been found very useful in evaluating workshops with higher-level outcomes 13,46. The present study lies at the fourth level of Kirkpatrick's model, evaluating the change in the participants' performance in MCQ item writing outcomes at three different levels.
1. MCQs items construction in terms of Bloom's cognitive level and item writing flaws. A well-constructed MCQ consists of a stem (a clinical case scenario), a lead-in (question) and 4-5 options (one correct/best answer and three to four distractors) 7,42. Bloom's taxonomy divides the cognitive domain into six hierarchically ordered categories: knowledge, comprehension, application, analysis, synthesis, and evaluation 47. Tarrant et al. 38 simplified the taxonomy by creating two levels: K1, which represents basic knowledge and comprehension, and K2, which encompasses application and analysis. MCQ items at K2 level are better, more valid and better at discriminating good students from poorly performing students 13. MCQs with IWFs are items that violate the standard item-writing guidelines. Flawed MCQ test items reduce the validity of examinations and penalize some examinees 11. To investigate the effectiveness of the faculty development program, a checklist was used to check the quality of MCQs (Appendix 1).
2. MCQs item analysis (Difficulty index, Discrimination index, Non-functional distractors and Kr-20). The difficulty index, also called the P-value, describes the percentage of students who correctly answered a given test item. This index ranges from 0 to 100% or 0 to 1. An easy item has a higher difficulty index. The cut-off values used to evaluate the difficulty index of MCQs were: > 70% (easy); 20-70% (moderate); < 20% (difficult) 48. Moderate difficulty items (20-70%) in a test have better discriminating ability 13.
The discriminating index is the ability of a test item to discriminate between high- and low-scoring examinees. Higher discriminating indices in a test indicate greater discriminating capability of that test. The cut-off values for the discrimination index (DI) were: discriminating, DI > 0.15; non-discriminating, DI ≤ 0.15 27.
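The two indices can be illustrated with a short sketch. The difficulty index and its cut-offs follow the definitions in the text. The discrimination formula is an assumption: the text does not say how the analysis software computed DI, so the sketch uses one common definition, the difference in proportion correct between the top and bottom 27% of examinees ranked by total score. All function names and the example scores are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (0 to 1)."""
    return sum(item_scores) / len(item_scores)


def classify_difficulty(p):
    """Cut-offs from the text: > 70% easy, 20-70% moderate, < 20% difficult."""
    if p > 0.70:
        return "easy"
    if p < 0.20:
        return "difficult"
    return "moderate"


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower group DI (ASSUMED formula; not specified in the text)."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank examinees by total test score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data: 10 examinees, 1 = correct and 0 = incorrect on one item.
example_item = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
example_totals = [38, 35, 33, 30, 28, 25, 22, 20, 17, 12]

p_value = difficulty_index(example_item)
category = classify_difficulty(p_value)
di = discrimination_index(example_item, example_totals)
is_discriminating = di > 0.15  # cut-off from the text
print(p_value, category, round(di, 2), is_discriminating)
```

In this made-up example the item is of moderate difficulty and counts as discriminating under the 0.15 cut-off.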
A nonfunctioning distractor (NFD) is an option of a question that has been selected by fewer than 5% of the examinees 49. NFDs may have no connection to the correct answer, or may carry clues that are not directly related to it 38. Implausible distractors can be easily spotted even by the weakest examinees and are therefore usually rejected outright. Distractors that are not chosen, or are consistently chosen by only a few participants, are obviously ineffective and should be omitted or replaced 50,51.
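The 5% rule above is simple to apply to raw response data. The following is a minimal sketch under stated assumptions: the function name, the option labels and the response counts are all hypothetical, and every examinee is assumed to have chosen exactly one option.

```python
from collections import Counter


def nonfunctioning_distractors(responses, key, options, threshold=0.05):
    """Return the distractors chosen by fewer than `threshold` of examinees."""
    counts = Counter(responses)
    n = len(responses)
    return [
        option
        for option in options
        if option != key and counts[option] / n < threshold
    ]


# Hypothetical item answered by 40 examinees; "A" is the keyed option.
example_responses = ["A"] * 24 + ["B"] * 9 + ["C"] * 6 + ["D"] * 1
nfds = nonfunctioning_distractors(example_responses, key="A", options="ABCD")
print(nfds)
```

Here option D was chosen by 1 of 40 examinees (2.5%), so it is the only distractor flagged as nonfunctioning.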
Kuder-Richardson Formula 20 (Kr-20) measures the internal consistency and reliability of an examination with dichotomous choices. A high Kr-20 coefficient (e.g., > 0.90) indicates a homogeneous test. A Kr-20 of 0.8 is considered the minimal acceptable value, whereas a value below 0.8 indicates that the exam is not reliable 52. The Questionmark Perception software program (Questionmark Corporation, Norwalk, CT, USA) was used for the item analysis and for the determination of Kr-20.
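For readers who want to see how such a coefficient is obtained, the sketch below implements the standard formula, Kr-20 = k/(k - 1) × (1 - Σ p_i q_i / σ²), where k is the number of items, p_i the proportion answering item i correctly, q_i = 1 - p_i, and σ² the variance of total scores. It is not the software's own routine. Using the population variance (dividing by the number of examinees) is an assumption, since some implementations use the sample variance instead, and the score matrix is hypothetical.

```python
def kr20(score_matrix):
    """Kr-20 for a matrix of 0/1 scores (rows: examinees, columns: items)."""
    n_examinees = len(score_matrix)
    n_items = len(score_matrix[0])
    totals = [sum(row) for row in score_matrix]
    mean_total = sum(totals) / n_examinees
    # ASSUMPTION: population variance of total scores (divide by n, not n - 1).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in score_matrix) / n_examinees
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical scores: 5 examinees, 4 items.
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(example_scores)
print(round(reliability, 2))
```

For this small matrix the coefficient works out to 0.8, which sits exactly at the minimal acceptable value quoted above.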
3. Students' performance. Item writing flaws and plausible distractors in MCQs affect students' performance. Some flaws, such as unfocused or unclear stems, gratuitous or unnecessary information, and negative wording in the stem, can make questions more difficult 35. Similarly, a plausible distractor creates a misconception about the correct option, at least in the average examinee's mind 28.
Statistical analysis. The data were entered into a Microsoft Excel file and analyzed using SPSS software (version 19.0). Pearson's chi-square test was used to evaluate and quantify the correlation, whereas the ANOVA test was used for variance analysis between the categorical outcomes. The statistical significance level was set at p < 0.05 for the entire analysis.
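As a rough illustration of the variance analysis mentioned above, the sketch below computes a one-way ANOVA F statistic comparing item P-values between a pre-training and a post-training examination. It is not the SPSS procedure used in the study; the function name and the six P-values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across the given groups of values."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((value - group_mean) ** 2 for value in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical item P-values (difficulty indices), not data from the study.
pre_training = [0.8, 0.7, 0.9]
post_training = [0.5, 0.6, 0.4]
f_statistic = one_way_anova_f([pre_training, post_training])
print(round(f_statistic, 2))
```

With these made-up values the F statistic is about 13.5 on 1 and 4 degrees of freedom; the corresponding p-value would then be read from the F distribution.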
Ethical considerations. The participants were informed about the study and agreed to participate in the project. The study was approved by the research ethics committees of the respective medical colleges of KSU and PNU, Riyadh, Saudi Arabia. The methods employed in this study were carried out in accordance with the approved guidelines of the respective medical colleges of KSU and PNU, Riyadh, Saudi Arabia.