Changes in participant performance in the "test-taking" environment: observations from the 2006 College of American Pathologists Gynecologic Cytology Proficiency Testing Program.
Context.--Because the consequences of making an interpretive error on a proficiency test are more severe than those of making one on an educational challenge, the same slide may exhibit different performance characteristics in the 2 different settings.
Objective.--The results of the 2006 College of American Pathologists Gynecologic Cytology Proficiency Testing Program (PAP PT) provide the opportunity to compare the performance characteristics of the field-validated slides in the PAP PT environment with those of the same graded slides in the College of American Pathologists Educational Program (formerly known as the PAP Program).
Design.--All participant responses for negative (category B) and positive (categories C and D) validated slides in the 2006 PAP PT were used to determine the error rates of participants. These data were compared with the historical error rates observed on the same validated slides in the graded PAP Program.
Results.--The performance characteristics of the slides in the PAP PT environment were statistically different from those in the Educational PAP Program. In proficiency testing both cytotechnologists (P < .001) and pathologists (P = .002) were more likely to interpret validated category B slides as category C or D and less likely to interpret category C slides as category B (P < .001). These differences were more pronounced among cytotechnologists than among pathologists.
Conclusions.--In the test-taking environment, both cytotechnologists and pathologists appear to use a defensive strategy that results in "upgrading" of category B slides. This trend is more pronounced among cytotechnologists.
(Arch Pathol Lab Med. 2009;133:279-282)
Hughes, Jonathan H.
Bentz, Joel S.
Souers, Rhona J.
Wilbur, David C.
|Publication:||Name: Archives of Pathology & Laboratory Medicine Publisher: College of American Pathologists Audience: Academic; Professional Format: Magazine/Journal Subject: Health Copyright: COPYRIGHT 2009 College of American Pathologists ISSN: 1543-2165|
|Issue:||Date: Feb, 2009 Source Volume: 133 Source Issue: 2|
The College of American Pathologists (CAP) Interlaboratory Comparison Program in Gynecologic Cytology (PAP) has existed for 18 years as an educational and laboratory accreditation activity for pathologists and cytotechnologists. (1-3) When the CAP Gynecologic Cytology Proficiency Testing Program (PAP PT) received approval in 2006 from the Centers for Medicare and Medicaid Services, the best-performing, statistically field-validated slides from the PAP Program were used to assemble the 10-slide proficiency test (PT) slide sets. (4) The slides used for both programs have therefore undergone the same rigorous statistical validation process that has long been used by the CAP to select slides for the graded PAP Program.
Thus, the data from these 2 programs can be compared with each other to determine the impact, if any, that the test-taking environment has on slide and participant performance.
MATERIALS AND METHODS
Slide Selection for the PAP Educational Program and PAP PT Program
The source of slides for the PAP Educational and PAP PT programs is the same, and the slides undergo identical validation processes. Participating laboratories contribute slides to the CAP Cytopathology Resource Committee. Submitted slides with a diagnosis of low-grade squamous intraepithelial lesion (LSIL) or higher must be confirmed by biopsy. The slides and accompanying clinical information are reviewed by a minimum of one supervisory-qualified cytotechnologist and 3 board-certified pathologists from the committee. Each slide must be judged to be of good technical quality and an excellent example of the reference interpretation in order to be accepted into the program as an educational slide. Furthermore, the 4 reviewers must agree on the exact target interpretation, and this must agree with the stated biopsy interpretation if LSIL or higher. Once a slide has been accepted into the educational arm of the PAP Program, it is circulated among program participants for interpretation. The coded answer sheets have interpretive menus using modified Bethesda System terminology. Prior to 2006, referenced slides were placed into 1 of 3 selection series: the 000 series for unsatisfactory slides; the 100 series for negative, infectious, and reparative conditions; or the 200 series for epithelial abnormalities and carcinoma (Table 1).
PAP Criteria for Assessing Participant Responses
In the PAP Educational Program, laboratory responses were graded based on the selection series. Therefore, a correct/concordant response meant that the answer for the slide was in the correct selection series. For example, an interpretation of reparative change on a slide with a target/reference interpretation of negative for intraepithelial lesion or malignancy, not otherwise specified (NILM-NOS) was graded as correct/concordant since both answers are in the 100 selection series.
An incorrect/discordant response, or an error, meant that the answer was in the wrong selection series. For example, a response of reparative change on a slide with a reference interpretation of adenocarcinoma was considered a false-negative interpretation and was registered as a discordant/incorrect response. Similarly, a response of LSIL on a slide with a reference diagnosis of NILM-NOS was considered a false-positive interpretation and was registered as a discordant/incorrect response. Discrepant interpretations within a selection series (for example, LSIL vs high-grade squamous intraepithelial lesion [HSIL] responses) were considered minor discrepancies and were recognized as concordant/correct responses.
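The series-based grading rule described above can be sketched in a few lines. This is only an illustration: the mapping lists just the interpretations named in the text, the numeric keys stand for the 000, 100, and 200 selection series, and the function name `grade` is hypothetical rather than part of any CAP software.

```python
# Selection series as described in the text (pre-2006 PAP Educational Program).
# Only interpretations mentioned in the article are listed; this is not the
# full CAP interpretive menu.
SERIES = {
    "unsatisfactory": 0,       # 000 series
    "NILM-NOS": 100,           # 100 series: negative, infectious, reparative
    "reparative change": 100,
    "LSIL": 200,               # 200 series: epithelial abnormalities, carcinoma
    "HSIL": 200,
    "adenocarcinoma": 200,
}


def grade(reference: str, response: str) -> str:
    """Grade a response by comparing selection series (hypothetical helper)."""
    ref, resp = SERIES[reference], SERIES[response]
    if ref == resp:
        return "concordant"        # same series, including minor discrepancies
    if ref == 200 and resp == 100:
        return "false-negative"    # abnormal slide called negative
    if ref == 100 and resp == 200:
        return "false-positive"    # negative slide called abnormal
    return "discordant"            # any other cross-series mismatch


examples = [
    ("NILM-NOS", "reparative change"),
    ("adenocarcinoma", "reparative change"),
    ("NILM-NOS", "LSIL"),
    ("LSIL", "HSIL"),
]
for reference, response in examples:
    print(reference, "/", response, "->", grade(reference, response))
```

The four example pairs are the ones used in the text: the first and last are graded concordant, the second is a false-negative, and the third is a false-positive.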
PAP Criteria for Slide Grading/Validation
In 1992, the regulations derived from the Clinical Laboratory Improvement Amendments of 1988 (CLIA '88) proposed 4 response categories for scoring PT: unsatisfactory (category A), normal or benign changes (category B), low-grade squamous intraepithelial lesion (LSIL) (category C), and high-grade squamous intraepithelial lesions (HSIL) and carcinoma (category D). (5) The PAP Educational Program had always "field-validated" glass slides for acceptance into the laboratory accreditation PAP Program exercises. This field validation carried over into the PAP PT Program. Validation criteria are as follows: (1) At least 20 participants must submit a correct response on the slide for the following reference diagnoses: negative for intraepithelial lesion or malignancy, LSIL, HSIL, and repair. (2) The percentage of participant responses in the correct selection series must be at least 90%. (3) The standard error of this percentage must be, at most, .05 (SE ≤ .05). (2)
These requirements ensure that for each slide: (1) a large enough group of responses/interpretations is used to determine the field-validated reference response, (2) close agreement exists within that group, and (3) agreement is statistically significant. For example, if a slide has a reference interpretation of NILM (category B) and the number of correct participant responses is exactly 20, then at least 95% of participant interpretations must be in category B to achieve the standard error requirement. If participants agree with the category only 90% of the time, the group must contain at least 36 members to achieve the standard error requirement of .05. Although specific supplemental criteria have been added during the last 11 years, the basic validation criteria require that a slide have a match rate of at least 90% to the exact "series" and a standard error of .05 or less. (2) Slides that have not obtained or retained field-validated status are designated only as educational slides and are not accepted into or utilized for the PAP PT Program.
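The arithmetic behind these examples can be checked with a short sketch. The article does not state the standard error formula; the code assumes the usual binomial form, SE = sqrt(p(1 - p)/n), which reproduces the 36-member figure quoted above. The function names are hypothetical, and exact fractions are used so that the boundary case SE = .05 is not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction

MAX_SE_SQUARED = Fraction(1, 400)   # (.05)^2, the standard error limit squared
MIN_MATCH_RATE = Fraction(9, 10)    # 90% of responses in the correct series
MIN_CORRECT = 20                    # at least 20 correct responses


def is_field_validated(correct: int, total: int) -> bool:
    """Apply the 3 basic criteria, assuming SE = sqrt(p(1 - p)/n)."""
    if total <= 0:
        return False
    p = Fraction(correct, total)
    se_squared = p * (1 - p) / total
    return (correct >= MIN_CORRECT
            and p >= MIN_MATCH_RATE
            and se_squared <= MAX_SE_SQUARED)


def min_group_size(match_rate: Fraction) -> int:
    """Smallest n for which sqrt(p(1 - p)/n) is at most .05."""
    return math.ceil(match_rate * (1 - match_rate) / MAX_SE_SQUARED)


print(min_group_size(Fraction(9, 10)))   # group size needed at 90% agreement
print(is_field_validated(20, 21))        # 20 correct of 21 (about 95%)
print(is_field_validated(20, 22))        # 20 correct of 22 (about 91%)
```

Under this assumption, a slide with exactly 20 correct responses passes with 21 total responses but fails with 22, which is consistent with the roughly 95% agreement requirement described in the text.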
Gynecologic Proficiency Testing Criteria for Assessing Participant Responses
Both pathologists and cytotechnologists must score at least 90 to achieve a passing grade, and the scoring system rewards or penalizes participants in proportion to the degree of variance from the target interpretation. The penalty is also weighted in proportion to the severity of the lesion. However, the penalty system is more severe for pathologists than for cytotechnologists (Table 2). Cytotechnologists are not penalized for responses discordant between categories C and D (LSIL and HSIL, respectively), whereas pathologists must be able to distinguish between these 2 categories in order to achieve full credit.
The performance of pathologists and cytotechnologists in the CAP 2006 PAP PT was compared with historical performance data from the 1996 through 2005 CAP-graded PAP Educational Program by determining the percentages and types of interpretive errors in the 2 data sets (Table 3). In order to determine differences in individual slide performance between the educational and testing environments, a pairwise slide analysis was performed that compared the types of interpretive errors for a given slide in the PAP educational arm with the types of interpretive errors for the same slide in the PAP PT.
Statistical analyses were carried out using SAS version 9.1 (SAS Institute Inc, Cary, NC) and χ² tests or 2-tailed Fisher exact tests. A P level of .05 was used for statistical significance.
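For readers unfamiliar with the test, the following is a minimal sketch of a Pearson chi-square test on a 2 x 2 table, written in plain Python rather than SAS. It is not the authors' analysis: the counts are approximations back-calculated from the rounded aggregate pathologist percentages in Table 3 (negative classified as LSIL/HSIL), whereas the published P values come from a pairwise slide analysis. The function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# ASSUMPTION: counts reconstructed from rounded Table 3 percentages.
edu_total, pt_total = 109856, 43080
edu_errors = round(0.0070 * edu_total)   # 0.70% in the educational program
pt_errors = round(0.0159 * pt_total)     # 1.59% in proficiency testing

chi2, p = chi_square_2x2(edu_errors, edu_total - edu_errors,
                         pt_errors, pt_total - pt_errors)
print(f"chi-square = {chi2:.1f}, significant at .05: {p < 0.05}")
```

Because responses to the same slide are not independent observations, an aggregate test of this kind overstates the evidence; it is shown only to illustrate the mechanics of the statistic.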
RESULTS
The comparative data for the 2 programs are presented in Tables 3 and 4. Table 3 summarizes the percentages of 4 error types: negative classified as LSIL/HSIL, LSIL classified as negative, HSIL classified as negative, and HSIL classified as LSIL. The data in Table 3 suggest that more negative classified as LSIL/HSIL errors occur in the PT environment than in the educational environment and that there is a decrease in the number of LSIL cases classified as negative in the PT environment compared with the educational environment. To test this hypothesis further, a pairwise slide analysis was performed for all of the slides in the 4 error groups (Table 4). The findings show that, among pathologists, there was a statistically significant increase in the percentage of negative cases classified as LSIL/HSIL (P = .002) and statistically significant decreases in the percentages of LSIL cases classified as negative (P < .001), HSIL cases classified as negative (P < .001), and HSIL cases classified as LSIL (P < .001). Similar trends were observed among cytotechnologists, although the magnitude of the percentage increase of negative classified as LSIL/HSIL errors was much greater among cytotechnologists (from 0.50% to 2.15%) than it was among pathologists (from 0.70% to 1.59%) (Table 3).
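The difference in magnitude between the 2 groups is easier to see as an absolute and a relative change. The sketch below uses only the Table 3 percentages for negative classified as LSIL/HSIL errors; the helper name is hypothetical.

```python
# Table 3: negative classified as LSIL/HSIL, % of responses
# (PAP Education, PAP PT)
RATES = {
    "pathologist": (0.70, 1.59),
    "cytotechnologist": (0.50, 2.15),
}


def change(edu_pct: float, pt_pct: float) -> tuple:
    """Return (absolute change in percentage points, fold change)."""
    return pt_pct - edu_pct, pt_pct / edu_pct


for group, (edu, pt) in RATES.items():
    diff, fold = change(edu, pt)
    print(f"{group}: +{diff:.2f} percentage points, {fold:.1f}-fold")
```

On these figures the error rate rises about 2.3-fold among pathologists and about 4.3-fold among cytotechnologists.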
COMMENT
Because the slides used for PAP PT are derived from the PAP Educational Program, the slide performance characteristics of PAP PT slides can be compared to historical data from the PAP Educational Program in order to examine the effect of the test-taking environment on slide and participant performance. The comparative data suggest that a test-taking environment alters performance by pathologists and cytotechnologists. Both pathologists and cytotechnologists experienced increased percentages of negative classified as LSIL/HSIL errors (ie, category B slides incorrectly classified as category C or D) in the test-taking situation. Because the PAP PT slides represent the same pool of validated slides in the PAP Educational Program, and because the population of pathologists and cytotechnologists who take the PAP PT test also participate in the PAP Educational Program, it is likely that the changes observed in the PAP PT data are a consequence of changes in interpretation strategies directly associated with the test-taking experience.
The most severe penalties on the CLIA-mandated, individual PAP PT are associated with underinterpretation (Table 2). Any test taker (pathologist or cytotechnologist) who interprets a single slide as category B (NILM) that has a category D (HSIL or higher) reference diagnosis will fail the PAP PT examination on the basis of that single erroneous response, even if all of the responses for the remaining 9 test slides are correct. Penalties for overinterpretation are not as severe; for example, even the most extreme error in overinterpretation, in which a category B (NILM) slide is interpreted as category D (HSIL or higher), will not by itself result in failure of the PAP PT examinee if the responses for the remaining 9 slides are all correct.
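The asymmetry can be verified directly from the point values in Table 2 and the passing score of 90. In the sketch below the point tables are transcribed from Table 2 (rows are the correct response, columns the examinee response); the 10-slide reference set and the function names are hypothetical and serve only as an illustration.

```python
CATEGORIES = "ABCD"  # column order: examinee response A, B, C, D

# Points from Table 2, keyed by the correct (reference) response.
PATHOLOGIST = {
    "A": (10, 0, 0, 0),
    "B": (5, 10, 0, 0),
    "C": (5, 0, 10, 5),
    "D": (0, -5, 5, 10),
}
CYTOTECHNOLOGIST = {
    "A": (10, 0, 5, 5),
    "B": (5, 10, 5, 5),
    "C": (5, 0, 10, 10),
    "D": (0, -5, 10, 10),
}
PASSING_SCORE = 90


def score(table: dict, references: str, responses: str) -> int:
    """Total points for a slide set, one letter per slide."""
    return sum(table[ref][CATEGORIES.index(resp)]
               for ref, resp in zip(references, responses))


def passes(total: int) -> bool:
    return total >= PASSING_SCORE


# ASSUMPTION: a made-up 10-slide set; real test composition is not given here.
references = "BBBBBCCDDD"
under = "BBBBBCCDDB"   # last slide: category D interpreted as B
over = "DBBBBCCDDD"    # first slide: category B interpreted as D

for name, table in (("pathologist", PATHOLOGIST),
                    ("cytotechnologist", CYTOTECHNOLOGIST)):
    print(name,
          score(table, references, references),
          score(table, references, under),
          score(table, references, over))
```

With every other slide correct, the single D-as-B error yields 85 points for either examinee type, a failing score, whereas the single B-as-D error yields 90 points for a pathologist and 95 for a cytotechnologist, both passing.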
Cytotechnologists who make negative classified as LSIL or negative classified as HSIL errors are not penalized as severely as pathologists who make such errors. The differential scoring schemes are intended to mirror the "real-world" goal of minimizing false-negative interpretations for the Papanicolaou screening test and to emphasize the different roles of cytotechnologists and pathologists for imparting sensitivity and specificity, respectively, to the process. However, in spite of its intentions to mirror "real-world" practice, the CLIA-mandated differential scoring system has the potential to undermine the validity of the PT process because it may cause test takers to replace their normal interpretive approach to the slides with strategies designed to minimize the differential punitive effects of the scoring system and maximize their probability of passing the test.
Our finding that the percentage of negative cases classified as LSIL/HSIL is higher and that the percentage of LSIL cases classified as negative is lower for slides examined in the PT environment than for slides examined in the nonpunitive, educational PAP Program environment (Table 3) suggests that the PAP PT examinees are acutely aware of the differential severity of penalties for underinterpretation and for overinterpretation and that they have adopted test-taking strategies that minimize the chance of failure. If PAP PT examinees encounter a slide for which they are uncertain of the correct diagnosis, they know that an underinterpretation will be penalized more severely than an overinterpretation. Given this, they will (and should, based upon the rules of the examination) always choose a response of category C for a case that they perceive to be borderline between category B and category C; similarly, they will (and should) always choose a response of category D for a case that is perceived to be borderline between category C and category D. The predicted end result of this "gamesmanship" is an increase in negative classified as LSIL/HSIL errors, as some category B slides are overinterpreted as category C and D in an effort by the test taker to decrease the likelihood of a potentially catastrophic underinterpretation. And this predicted result is exactly what is seen in the data in Tables 3 and 4. In addition, the consequences for failure of the PAP PT examination are more severe than those for PAP educational exercises; therefore, heightened vigilance may also be a factor in increased sensitivity for true positive C/D category cases.
The thesis that the differences in participant performance between the PAP Educational and the PAP PT programs are the result of test-taking/survival strategies by the examinee is also borne out by comparing the differential impact of the test-taking process on pathologists versus cytotechnologists. In PAP PT, pathologists are penalized more severely than are cytotechnologists for both underinterpretation and overinterpretation. For example, if a pathologist underinterprets a category D slide as category C (category D classified as C error), he or she will receive a 5-point penalty. In the opposite example of a pathologist overinterpreting a category C slide as category D (category C classified as D error), a 5-point penalty would also be imposed. On the other hand, there is no penalty for cytotechnologists who make a category C classified as D error or category D classified as C error. In light of the fact that pathologists are graded more strictly, with a subsequently greater chance of failure than for cytotechnologists, one would expect defensive test-taking strategies to be more amplified in pathologists than in cytotechnologists.
Again, this hypothesis is borne out by the data in Table 3, which show that increases in category D classified as C errors in the PT environment are observed in cytotechnologists, for whom this type of error is nonpunitive, while there is no change in the D classified as C error rate among pathologists, for whom a D classified as C error results in a loss of 5 points on the PT examination. Put another way, the data in Table 3 suggest that the lack of punitive consequences for a D classified as C error results in a decrease in cytotechnologists' vigilance for making this interpretive distinction; this phenomenon is not observed among pathologists, who are subject to punitive consequences for D classified as C errors. The percent increase in B classified as C errors in the test-taking environment is also much more dramatic among cytotechnologists than among pathologists, presumably because of the differential punitive consequences for this type of error between the 2 groups (ie, a 5-point error for cytotechnologists and a 10-point error for pathologists).
In summary, the data presented herein indicate that participants in the CLIA-mandated gynecologic cytology PT program evaluate the same slide challenges in a different fashion than they do in nonpunitive educational or laboratory accreditation exercises. These differences appear to be a direct result of the artificial testing environment and relate specifically to the overinterpretation of abnormal or potentially abnormal slides. Further evaluation of this phenomenon may be helpful in the future in designing more appropriate measures of cytologist proficiency.
Accepted for publication August 11, 2008.
(1.) Nielsen ML. Cytopathology interlaboratory improvement programs of the College of American Pathologists Laboratory Accreditation Program (CAP LAP) and Performance Improvement Program in Cervicovaginal Cytology (CAP PAP). Arch Pathol Lab Med. 1997;121:256-259.
(2.) Renshaw AA, Wang E, Mody DR, Wilbur DC, Davey DD, Colgan TJ. Measuring the significance of field validation in the College of American Pathologists Interlaboratory Comparison Program in Cervicovaginal Cytology: how good are the experts? Arch Pathol Lab Med. 2005;129:609-613.
(3.) Renshaw AA, Walsh MK, Blond B, Moriarty AT, Mody DR, Colgan TJ. Robustness of validation criteria in the College of American Pathologists Interlaboratory Comparison Program in Cervicovaginal Cytology. Arch Pathol Lab Med. 2006;130:1119-1122.
(4.) Bentz JS, Hughes JH, Fatheree LA, Schwartz MR, Souers RJ, Wilbur DC, for the Cytopathology Resource Committee, College of American Pathologists. Summary of the 2006 College of American Pathologists Gynecologic Cytology Proficiency Testing Program. Arch Pathol Lab Med. 2008;132:788-794.
(5.) Clinical Laboratory Improvement Amendments of 1988 Final Rule, 42 USC §263a(f)(4)(B)(iv) §353(f)(4)(B)(iv) of the Public Health Service Act, 57 Federal Register 7001-7186 (1992).
Jonathan H. Hughes, MD, PhD; Joel S. Bentz, MD; Lisa Fatheree, SCT(ASCP); Rhona J. Souers, MS; David C. Wilbur, MD; for the Cytopathology Resource Committee, College of American Pathologists
From the Department of Pathology, Laboratory Medicine Consultants, Ltd, Las Vegas, Nev (Dr Hughes); the Department of Pathology, University of Utah, Salt Lake City (Dr Bentz); Cytology Surveys (Ms Fatheree) and Statistics Department (Ms Souers), the College of American Pathologists, Northfield, Ill; and the Department of Pathology, Massachusetts General Hospital, Boston (Dr Wilbur).
The authors have no relevant financial interest in the products or companies described in this article.
Reprints: Jonathan H. Hughes, MD, PhD, Laboratory Medicine Consultants, Ltd, 3059 S Maryland Pkwy, Suite 100, Las Vegas, NV 89109.
Table 1. Response Categories Used for the Gynecologic Cytology Proficiency Test and Abbreviations Used for the Scoring Chart *

CLIA '88 Response Category | Abbreviation | Interpretation
A | UNSAT | Unsatisfactory for diagnosis because of scant cellularity, air drying, or obscuring material (blood, inflammatory cells, or lubricant)
B | NEG | Normal or benign changes, includes normal, negative, or within normal limits; infection other than HPV (eg, Trichomonas vaginalis, changes or morphology consistent with Candida spp, Actinomyces spp, or herpes simplex virus); reactive and reparative changes (eg, inflammation or effects of chemotherapy or radiation)
C | LSIL | Low-grade squamous intraepithelial lesion, includes cellular changes associated with HPV and mild dysplasia/CIN 1
D | HSIL | High-grade lesion and carcinoma, includes high-grade squamous intraepithelial lesions with moderate dysplasia/CIN 2 and severe dysplasia/carcinoma in situ/CIN 3; squamous cell carcinoma; adenocarcinoma and other malignant neoplasms

CLIA '88 Response Category | Abbreviation | PAP Series Category
A | UNSAT | 001
B | NEG | 100
C | LSIL | 201 only
D | HSIL | All other 200 series

* CLIA '88 indicates Clinical Laboratory Improvement Amendments of 1988; PAP, Interlaboratory Comparison Program in Cervicovaginal Cytology; UNSAT, unsatisfactory; NEG, normal or benign changes; HPV, human papillomavirus; LSIL, low-grade squamous intraepithelial lesion; CIN, cervical intraepithelial neoplasia; and HSIL, high-grade squamous intraepithelial lesion. Atypical squamous cells of undetermined significance (ASC-US) or atypical squamous cells suspicious for HSIL (ASC-H) cases are not included. All LSIL cases and above must be tissue biopsy confirmed.

Table 2. Grading System by Participant Type, as Mandated by Clinical Laboratory Improvement Amendments Regulations *

Pathologist (Technical Supervisor) 10-Slide Test
Correct Response | Examinee Response: A-UNSAT | B-NEGATIVE | C-LSIL | D-HSIL
A-UNSAT | 10 | 0 | 0 | 0
B-NEGATIVE | 5 | 10 | 0 | 0
C-LSIL | 5 | 0 | 10 | 5
D-HSIL | 0 | -5 | 5 | 10

Cytotechnologist 10-Slide Test
Correct Response | Examinee Response: A-UNSAT | B-NEGATIVE | C-LSIL | D-HSIL
A-UNSAT | 10 | 0 | 5 | 5
B-NEGATIVE | 5 | 10 | 5 | 5
C-LSIL | 5 | 0 | 10 | 10
D-HSIL | 0 | -5 | 10 | 10

* UNSAT indicates unsatisfactory; NEGATIVE, normal or benign changes; LSIL, low-grade squamous intraepithelial lesion; and HSIL, high-grade squamous intraepithelial lesion or higher.

Table 3. Percentages of Interpretive Errors in Interlaboratory Comparison Program in Cervicovaginal Cytology (PAP Education) Versus Gynecologic Cytology Proficiency Testing Program (PAP PT) *

Pathologist Results | PAP Education, % (n = 109 856) | PAP PT, % (n = 43 080)
No error | 97.71 | 97.27
Negative classified as LSIL/HSIL | 0.70 | 1.59
LSIL classified as negative | 0.38 | 0.08
HSIL classified as negative | 0.56 | 0.42
HSIL classified as LSIL | 0.65 | 0.65

Cytotechnologist Results | PAP Education, % (n = 109 470) | PAP PT, % (n = 44 218)
No error | 98.47 | 95.82
Negative classified as LSIL/HSIL | 0.50 | 2.15
LSIL classified as negative | 0.10 | 0.09
HSIL classified as negative | 0.29 | 0.50
HSIL classified as LSIL | 0.64 | 1.43

* Aggregate data. LSIL indicates low-grade squamous intraepithelial lesion; HSIL, high-grade squamous intraepithelial lesion or higher.

Table 4. Pairwise Slide Analysis of Interpretive Errors to Assess Statistical Significance *

Pathologist Results (4930 Slides) | % Error Change From Education to PT | Significant (P < .05)
Negative classified as LSIL/HSIL | Increase | Yes (P = .002)
LSIL classified as negative | Decrease | Yes (P < .001)
HSIL classified as negative | Decrease | Yes (P < .001)
HSIL classified as LSIL | Decrease | Yes (P < .001)

Cytotechnologist Results (4955 Slides) | % Error Change From Education to PT | Significant (P < .05)
Negative classified as LSIL/HSIL | Increase | Yes (P < .001)
LSIL classified as negative | Decrease | Yes (P < .001)
HSIL classified as negative | ... | No (P = .44)
HSIL classified as LSIL | Increase | Yes (P < .001)

* PT indicates proficiency testing; LSIL, low-grade squamous intraepithelial lesion; and HSIL, high-grade squamous intraepithelial lesion or higher.