Comparison of collegial individual and group reviews of general practice multiple choice questions.
MedLine Citation:
PMID:  22916083     Owner:  NLM     Status:  MEDLINE    
AIMS: In most German medical faculties, credits in general practice can be earned via exams using multiple-choice questions (MCQ). Measures such as peer-reviews may help assure the quality of these exams. In order to use time and personnel intensive peer reviews effectively and efficiently, the procedures used are key. Therefore, we wanted to find out whether there are differences between group and individual reviews regarding defined parameters.
METHODS: We conducted a controlled cross-over study with three GP reviewers from each of four German universities. Each reviewed 80 MCQs, 40 individually and 40 within a group; all MCQs were also assessed externally by a panel of experts. Furthermore, all reviewers were asked to evaluate the review process and the time spent carrying out these reviews.
OUTCOMES: We found no significant differences between the reliability and the validity of individual reviews versus group reviews. On average slightly more time was spent on group reviews compared with the individual reviews. The subjective assessments of the study participants regarding their satisfaction with the process and the efficiency and effectiveness of the reviews suggest a preference for group reviews.
CONCLUSIONS: Based on this study, there are no definite recommendations for or against either approach. When choosing between the two, the specific work structures and organisation at the local faculty should be taken into account.
Klaus Böhme; Jörg Schelling; Irmgard Streitlein-Böhme; Katharina Glassen; Jeannine Schübel; Jana Jünger
Publication Detail:
Type:  Comparative Study; Journal Article     Date:  2012-08-08
Journal Detail:
Title:  GMS Zeitschrift für medizinische Ausbildung     Volume:  29     ISSN:  1860-3572     ISO Abbreviation:  GMS Z Med Ausbild     Publication Date:  2012  
Date Detail:
Created Date:  2012-08-23     Completed Date:  2013-03-20     Revised Date:  2013-07-12    
Medline Journal Info:
Nlm Unique ID:  101276035     Medline TA:  GMS Z Med Ausbild     Country:  Germany    
Other Details:
Languages:  eng     Pagination:  Doc57     Citation Subset:  IM    
University Hospital Freiburg, School of General Practice, Freiburg, Germany.
MeSH Terms
Cooperative Behavior
Cross-Over Studies
Educational Measurement / standards*
Faculty, Medical*
General Practice / education*
Interdisciplinary Communication
Licensure, Medical
Peer Review*
Quality Assurance, Health Care / standards

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

Full Text
Journal Information
Journal ID (nlm-ta): GMS Z Med Ausbild
Journal ID (iso-abbrev): GMS Z Med Ausbild
Journal ID (publisher-id): GMS Z Med Ausbild
ISSN: 1860-7446
ISSN: 1860-3572
Publisher: German Medical Science GMS Publishing House
Article Information
This article is available from © 2012 Böhme et al.
Received: 15 July 2011
Accepted: 3 April 2012
Revision received: 30 March 2012
Electronic publication date: 8 August 2012
Collection publication date: 2012
Volume: 29 Issue: 4
E-location ID: Doc57
ID: 3420119
PubMed Id: 22916083
Publisher Id: zma000827
DOI: 10.3205/zma000827
Publisher Item Identifier: Doc57
Article Id: urn:nbn:de:0183-zma0008277

Comparison of Collegial Individual and Group Reviews of General Practice Multiple Choice Questions
Klaus Böhme*1
Jörg Schelling2
Irmgard Streitlein-Böhme3
Katharina Glassen4
Jeannine Schübel5
Jana Jünger6
1University Hospital Freiburg, School of General Practice, Freiburg, Germany
2LMU München, School of General Practice, München, Germany
3University of Freiburg, Medical Faculty, Dean of Studies Office, Freiburg, Germany
4University Hospital Heidelberg, Department for General Practice and Health Services' Research, Heidelberg, Germany
5University Hospital Dresden, Carus GP Surgery, Dresden, Germany
6University of Heidelberg, Competence Centre for Medical Exams, Heidelberg, Germany
Correspondence: *To whom correspondence should be addressed: Klaus Böhme, University Hospital Freiburg, School of General Practice, Elsässerstraße 2m, 79110 Freiburg, Germany, Phone: +49 (0)761/270-27460, Fax: +49 (0)761/270-27480, E-mail:

Introduction and objectives

The Medical Licensure Act of 2002 [1] introduced the requirement for graded student performance records in each subject of the clinical study section in human medicine. For practical reasons, this is usually done in the form of written exams using multiple choice questions (MCQ), which are considered to be of satisfactory reliability and objectivity [2]. The literature contains general rules for creating “good” MCQs, both regarding form and content [3], [4], [5]. To date, many authors of MCQs have not been trained in the application of these rules at German medical schools and, in many cases, no standardised review process for the questions used exists either [6]. Accordingly, there is no quality assurance of the MCQs in use.

To ensure an adequate standard of MCQs in use, a standardised peer-review process in addition to examiner training would appear suitable. The range of possible review processes is wide, including reviews by a single individual or by several individuals, and moderated, un-moderated, face-to-face and virtual reviews [7].

Optimal effectiveness and efficiency of peer reviews, which are rather time and labour intensive, depend on several factors. Here, the validity and reliability of the assessments play a central role. To strengthen the motivation of reviewers, they should be satisfied with the review process and convinced of its effectiveness and importance. Another important factor is the time required.

Since November 2008 the general medicine subject area at the University of Freiburg has been drawing upon a web-based digital exam system in the preparation of MC exams, developed by the “Centre of Excellence for Medical Exams in Baden-Württemberg” at the University of Heidelberg. The pool of questions for the general medicine subject area is held in this item management system (IMS) and used as the basis for creating exams at Freiburg.

In Germany, 15 departments now rely on the IMS as a digital tool in the preparation and evaluation of exams. While the system already significantly simplifies the organisational processes, a much more vital added benefit is that, in theory, a faculty can draw upon the exam questions of other faculties and use them for its own exams. Such access requires, on the one hand, the willingness of each faculty to share its questions with other faculties. On the other hand, all users of the system have agreed that questions may only be placed in the public pool, which other users can access, once they have gone through a defined review process.

In the context of this study, individual reviews were compared with un-moderated face-to-face group reviews to investigate the following:

  • Are there differences between individual and group reviews in (a) the frequency of errors found (consistency of review forms: reliability) and (b) compared to a standard review by designated experts (consistency with a standard: validity)?
  • Does the review process influence the satisfaction of the reviewers with the process, their assessment of the effectiveness and importance of the reviews carried out?
  • Is there a difference in the time required for implementing individual or group reviews?


Four general medical departments at German universities that declared their willingness to participate were selected for the study: Dresden, Freiburg, Heidelberg and LMU Munich, hereinafter referred to in random order as Uni 1-4 for the sake of anonymity. Each site provided three employees who deal with the design and review of MCQs within their departments.

Item Sample

For the study, 2 x 40 MCQs (Group A and B) were randomly selected from the question pool of the general medicine subject area at the University of Freiburg. They were all so-called Type A questions (single positive or negative selection from five answer choices).

Materials and technical prerequisites

One of the functions of the IMS is that all recorded exam questions can be reviewed online using a ten-criteria evaluation form (see Table 1 (Tab. 1))1. An input field for free text comments makes it possible to record criticisms and to make specific suggestions for corrections.

The basis for assessing all reviews in this study was the “Short Guide to Reviewing MC Questions” by the competence centre, which in turn is based on the relevant literature on this subject [3], [4], [5].

For evaluating satisfaction with the review process, its effectiveness and the subjective assessment of its importance, a short questionnaire was created. Answers were given using a 6-point Likert scale (see Figure 1 (Fig. 1)). In addition, there were two open questions: “Which part of the review did I like best?” and “Which part of the review caused me the most problems?”.


The first step was creating a comparative standard for the assessment of study participants through a review of all 80 MCQs by experts. The four-member committee, all of whom had appropriate experience (MME or employee of the Centre of Excellence), was composed of three specialist representatives and a non-specialist colleague. The “Centre of Excellence for Exams in Medicine in Baden-Württemberg” subjected all 80 MCQs which were going to be part of the study to a group review.

The “Short Guide to Reviewing MC Questions”, which explains the checklist criteria for the review (see Table 1 (Tab. 1)), was made available to all study participants. No further training of the reviewers took place. According to the study design (see Table 2 (Tab. 2)), the three reviewers at each site then assessed 40 MCQs individually and 40 MCQs as a review group.

For the group reviews, the study participants agreed dates when all 40 MCQs were assessed in a single session. In dealing with the individual reviews, the reviewers had the opportunity to work to their own timetable.

Following both the individual and the group reviews, the study participants were required to fill in the short questionnaire.


Through a review of all MCQs used in this study by a panel of experts, a reference assessment (a gold standard) was set for each question. The assessments made in this study were compared to this external reference to check their validity. Reliability was tested by comparing the deficiencies identified at the four different sites.

All 3200 assessments made in this study’s reviews (80 MCQs × 10 assessment criteria × 4 sites) were dichotomized in two ways and then subjected to statistical analysis: for validity, 0 = no deviation from the expert review and 1 = deviation from the expert review; for reliability, 0 = no defect and 1 = slight or severe defect. For the individual reviews, the rounded average of the three individual assessments was used for comparison.
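
The coding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw rating codes (0 = no defect, 1 = slight defect, 2 = severe defect), the function names, and the order of operations (dichotomize each reviewer's rating first, then take the rounded average) are hypothetical and are not specified in the article.

```python
from typing import List


def dichotomize_defect(rating: int) -> int:
    """Map a checklist rating to 0 (no defect) or 1 (slight or severe defect).

    Assumed raw coding (hypothetical): 0 = none, 1 = slight, 2 = severe.
    """
    return 0 if rating == 0 else 1


def pooled_individual_rating(ratings: List[int]) -> int:
    """Rounded average of the dichotomized individual ratings.

    With three binary ratings the mean is 0, 1/3, 2/3 or 1, so rounding
    is equivalent to a majority vote and never hits an exact tie.
    """
    binary = [dichotomize_defect(r) for r in ratings]
    mean = sum(binary) / len(binary)
    return int(mean + 0.5)


def deviates_from_expert(site_rating: int, expert_rating: int) -> int:
    """Return 0 if the site agrees with the expert review, 1 otherwise."""
    return 0 if site_rating == expert_rating else 1


# Number of dichotomized assessments entering the analysis
total_assessments = 80 * 10 * 4  # MCQs x criteria x sites

# Hypothetical example: three reviewers rate one criterion of one MCQ
site_value = pooled_individual_rating([0, 1, 2])
deviation = deviates_from_expert(site_value, expert_rating=0)
print(total_assessments, site_value, deviation)
```

In this sketch a site-level value of 1 means that at least two of the three individual reviewers flagged a defect on that criterion.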

The variables “Number of deviations from the gold standard” and “Number of errors found” were analysed using analysis of variance (linear mixed model [8]) with the fixed factors “review form” (single/group), “location” (Uni 1 - Uni 4), “question group” (MCQ Group A/MCQ Group B) and “pass” as well as the random factor “MCQ” (i.e. it is assumed that the MCQs represent a random selection from the MCQ pool). The factors “review form” and “location” were of primary interest; the other factors were used as control variables.

The variables for the individual categories are binary (values 0 and 1) and were therefore analysed with a similar non-linear mixed model with a logistic link function (logistic-normal model [9], [10]).
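
One plausible way to write down the two models described above is given below. The notation is introduced here for illustration only and is not taken from the article: f indexes review form, l location, g question group, p pass and m the MCQ. The sketch assumes main effects only, since no interaction terms are mentioned.

```latex
% Linear mixed model for the per-MCQ counts (main effects only, an assumption)
y_{flgpm} = \mu + \alpha_f + \beta_l + \gamma_g + \delta_p + u_m + \varepsilon_{flgpm},
\qquad u_m \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{flgpm} \sim \mathcal{N}(0, \sigma^2)

% Logistic-normal model for a single binary checklist item
\operatorname{logit} \Pr(y_{flgpm} = 1 \mid u_m) = \mu + \alpha_f + \beta_l + \gamma_g + \delta_p + u_m
```

In both forms the random intercept u_m captures the fact that some MCQs are simply more error-prone than others, so that repeated assessments of the same question are not treated as independent.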

The statistical analysis of the questionnaires accompanying the study was done descriptively.

Sample Composition

Twelve GPs from four German universities, eight women and four men, took part in the study. The participants’ ages ranged from 24 to 61 years, the average age was 41 years (SD=14.6).


The results of the deviations of the study reviews from the expert reviews are presented in Table 3 (Tab. 3). There were no statistically significant differences, either between the type of review (single vs. group review), or between Pass 1 and Pass 2, or between Question Group A and Question Group B. A significant difference was found only between the individual reviewer groups of the different locations.


The numbers of deficiencies identified in the study reviews are shown in Table 4 (Tab. 4). Overall, across all 3200 criteria assessed, no significant differences were found between the types of review (single vs. group review), the passes or the question groups. Here too, the only significant difference was found between the sites.


In response to the question regarding satisfaction with the review process, the study participants responded with an average of 4.92 (SD=0.69) following the individual reviews and 5.17 (SD=0.83) following the group review. The question on effectiveness was answered with an average of 4.92 (SD=0.69) following the individual reviews and 5.58 (SD=0.67) following the group review. The importance of the review was rated 5.75 (SD=0.45) on average following the individual reviews and 6.00 (SD=0) following the group review.

The free text comments on the questions “Which part of the review did I like best?” and “Which part of the review caused me the most problems?” can be summarised as follows:

From the reviewers’ perspective, the most positive aspect of the individual reviews was the freedom in time-planning, whereas for the group reviews it was the collegial exchange of ideas and the associated learning effects. Commonly cited problems in the individual reviews were vague evaluation criteria combined with the lack of opportunities to ask questions. In the group reviews, the problems cited were timing, as it is often not easy to reach consensus in the assessments, and the long sessions associated with this type of review, with the resulting concentration problems.

Time Expenditure

The average processing time for the 40 questions was 113 minutes (SD=44) in individual reviews and 139 minutes (SD=48) in group reviews.


There is already sufficient evidence in German-speaking countries that the systematic collegial review of MCQs improves the quality of exam questions [11], [12], [13]. There are only a few indications as to the advantages and disadvantages of different review processes [7]. This study was intended to help clarify the practical questions about the advantages and disadvantages of individual vs. group reviews.


The results of this study show that both review processes are equally valid. There is therefore no basis upon which we could recommend one over the other.

Statistically significant differences in the assessments of MCQs were found, however, between the reviewers of the four sites. The “Short Guide to Reviewing MC Questions” alone, although it draws on the same foundations as the knowledge of the “experts”, is apparently not sufficient to ensure homogeneous assessments across the four university sites. In terms of deviations from the experts’ review, there was a significant difference across all checklist items. This result suggests that, in addition to the author training called for elsewhere [4] and the wide-scale introduction of a standardised review process, reviewer training should be recommended [13].


Viewed across all checklist items, the deficiencies observed at the four locations did not depend on whether the questions were assessed individually or in groups. There is therefore, again, no basis for preferring a particular version.

In contrast, the comparison of university sites with each other revealed statistically significant differences for all items which could be statistically analysed. The demand for reviewer training can thus be affirmed.


If the goal is to ensure that as many tutors as possible throughout Germany participate in a mutual and sustainable review of MCQs, they must be convinced of the necessity of such an approach, and the methodology must be effective and subjectively satisfactory from their point of view. For this reason, the reviews were accompanied by questionnaires which queried these parameters.

Averaged across all four sites, slightly higher approval was given following group reviews than following individual reviews for all three questions: satisfaction, the effectiveness of the review process and the importance of the review. When analysing the questions for each site separately, the higher approval following group reviews is replicated almost consistently. Only at one site (Uni 1) was the question regarding satisfaction with the review process ranked higher following individual reviews. In this case too, group dynamics should be considered, as the Uni 1 group review was the longest of all review sessions, with a processing time of 210 minutes. This could suggest problems when trying to come to decisions with joint responsibility.

The free text comments on individual and group reviews once again reflect the expected advantages and disadvantages of both methods, juxtaposing the freedom to plan in individual reviews with the lack of feedback opportunities from colleagues. It becomes obvious that in group reviews the timing issues conflict with the often mentioned (and positively seen) collegial exchange of ideas and the associated learning effects.

Time Expenditure

If we consider only the pool of questions of general medicine in Freiburg, which includes approximately 280 MCQs (though other sites may have much larger pools), we can estimate the enormous amount of time necessary for reviewing the entire set for all general medical departments and subject areas. In this respect, the timings found by this study may be considered when discussing which review process is more practical. Group reviews of 40 MCQs took on average 26 minutes longer than individual reviews. If one followed this criterion alone, there would have to be a clear recommendation for individual reviews. Interestingly, this extra need for time is not raised in the free text answers.
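
To make the scale concrete, the reported averages can be extrapolated to the Freiburg pool with simple arithmetic. The sketch below assumes that review time scales linearly with the number of questions and ignores fatigue, scheduling overhead and the fact that a group session occupies several reviewers at once; the variable and function names are illustrative only.

```python
QUESTIONS_PER_BATCH = 40   # MCQs reviewed per condition in the study
INDIVIDUAL_MINUTES = 113   # reported mean time, individual reviews
GROUP_MINUTES = 139        # reported mean time, group reviews
POOL_SIZE = 280            # approximate size of the Freiburg pool


def scaled_minutes(minutes_per_batch: float, pool_size: int,
                   batch_size: int = QUESTIONS_PER_BATCH) -> float:
    """Linearly scale the time for one batch to a whole question pool."""
    return minutes_per_batch * pool_size / batch_size


individual_total = scaled_minutes(INDIVIDUAL_MINUTES, POOL_SIZE)
group_total = scaled_minutes(GROUP_MINUTES, POOL_SIZE)
extra_minutes = group_total - individual_total

print(f"Individual review of the pool: {individual_total:.0f} min")  # 791 min
print(f"Group review of the pool:      {group_total:.0f} min")       # 973 min
print(f"Additional time for groups:    {extra_minutes:.0f} min")     # 182 min
```

Under these assumptions the 26-minute difference per 40 questions grows to roughly three additional hours for a pool of about 280 questions.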


The relatively small sample of reviewers, which for feasibility reasons was not a random sample, limits the generalisability of the study results. Employees of general medical departments and subject areas were deliberately selected (i.e. people who in their daily routine are tasked with creating and/or reviewing exam questions without special additional training), which meant that inevitable discrepancies in medical experience and prior theoretical knowledge had to be accepted. Both factors limit the comparability.

The ten-item checklist against which the MCQs were judged did not reflect all criteria for the creation of “good” MCQs given in the literature [3], [4], [5]. Some items must be considered redundant while other criteria, such as the important question of the relevance of MCQs [11], are missing.


The results of this study do offer some help when trying to reach a decision about which review process may be recommended as the better.

As was shown in the results section and explained in the discussion, no statistically significant difference regarding validity and reliability of both procedures could be found which means that it will not be possible to decide for one or the other solely based on these factors.

Against this background, the subjective assessments of the reviews by the study participants carry more weight. These tended to favour group reviews. It would appear that, for the study participants, the time factor is subordinate to satisfaction with the process and its subjectively assessed effectiveness.

The specifics of the situation on the ground can understandably play a crucial role when selecting a method. If the reviewers are scientific or clinical staff who usually work together in a department, the choice will tend towards group reviews, which are then relatively easy to schedule. If the reviews are mostly carried out by established lecturers (as is common in general medicine), individual reviews at their own workplace may seem more reasonable and practicable.

For practical reasons this study only looked at Type A questions. But essentially, the results should be transferable to other question formats.


1 In the IMS this evaluation form is linked to an algorithm (for which no further details are available) that either releases the questions to the public folder or refers them back to the authors for correction.


My special thanks for support with the statistical aspects of this project go to Dr. Andreas Möltner at the Competence Centre for Exams in Medicine - Baden Württemberg, University of Heidelberg.

Competing interests

The authors declare that they have no competing interests.

1. Bundesministerium für Gesundheit. Approbationsordnung für Ärzte vom 27.06.2002. BGBl. 2002:2405-2435.
2. Möltner A, Schellberg D, Jünger J. Grundlegende quantitative Analysen medizinischer Prüfungen. GMS Z Med Ausbild. 2006;23(3):Doc53.
3. Haladyna TM, Downing SM, Rodrigues MC. A review of multiple-choice item-writing guidelines for a classroom assessment. Appl Meas Educ. 2002;15:309-344. DOI: 10.1207/S15324818AME1503_5
4. Krebs R. Anleitung zur Herstellung von MC-Fragen und MC-Prüfungen für die ärztliche Ausbildung. Bern: Institut für Medizinische Lehre IMS, Abteilung für Ausbildungs- und Examensforschung AAE; 2004.
5. AG Progress Test Medizin. Progress Test Medizin. Leitfaden für Fragenautorinnen und –autoren des Progress Test Medizin. Berlin: Charité Universitätsmedizin; 2003.
6. Jünger J, Möltner A, Lammerding-Köppel M, Rau T, Obertacke U, Biller S, Narciß E. Durchführung der universitären Prüfungen im klinischen Abschnitt des Medizinstudiums nach den Leitlinien des GMA-Ausschusses Prüfungen: Eine Bestandsaufnahme der medizinischen Fakultäten in Baden-Württemberg. GMS Z Med Ausbild. 2010;27(4):Doc57. DOI: 10.3205/zma000694
7. Kazubke E, Schüttpelz-Brauns K. Gruppenleistungen beim Review von Multiple-Choice-Fragen – ein Vergleich von face-to-face und virtuellen Gruppen, mit und ohne Moderation. GMS Z Med Ausbild. 2010;27(5):Doc68. DOI: 10.3205/zma000705. PMID: 21818213
8. Brown H, Prescot R. Applied Mixed Models in Medicine, Second Edition. Oxford/UK: John Wiley & Sons, Ltd; 2006. DOI: 10.1002/0470023589
9. Beitler PJ, Landis JR. A Mixed-effects Model for Categorical Data. Biometr. 1985;41:991-1000. DOI: 10.2307/2530970
10. Wolfinger R. Fitting Nonlinear Models with the New NLMIXED Procedure. SUGI Proceedings. Cary/NC: SAS Institute Inc; 1999.
11. Kropf R, Krebs R, Rogausch A, Beyeler C. Auswirkungen angeleiteter Itemanalysebesprechungen mit Dozierenden auf die Qualität von Multiple Choice-Prüfungen. GMS Z Med Ausbild. 2010;27(3):Doc46. DOI: 10.3205/zma000683
12. Weih M, Harms D, Rauch C, Segarra L, Reulbach U, Degirmenci U, de Zwaan M, Schwab S, Kornhuber J. Qualitätsverbesserung von Multiple-Choice-Prüfungen in Psychiatrie, Psychosomatik, Psychotherapie und Neurologie. Nervenarzt. 2009;80(3):324-328. DOI: 10.1007/s00115-008-2618-8. PMID: 19104765
13. Rotthoff T, Soboll S. Qualitätsverbesserung von MC Fragen: Ein exemplarischer Weg für eine medizinische Fakultät. GMS Z Med Ausbild. 2006;23(3):Doc45.

Article Categories:
  • Article

Keywords: Medical Education, assessment, Multiple-Choice-Questions, Review.
