Contingent magnitude of reward in a human operant IRT>15-s-LH schedule.
|Article Type:||Statistical Data Included|
Conditioned response (Research)
Behaviorism (Psychology) (Research)
Lippman, Louis G.
Leritz, Lyle E.
|Publication:||Name: The Psychological Record Publisher: The Psychological Record Audience: Academic Format: Magazine/Journal Subject: Psychology and mental health Copyright: COPYRIGHT 2002 The Psychological Record ISSN: 0033-2933|
|Issue:||Date: Wntr, 2002 Source Volume: 52 Source Issue: 1|
|Geographic:||Geographic Scope: United States Geographic Code: 1USA United States|
In an IRT>15-s schedule, either the number of points was fixed
or a lesser number was awarded the later a response occurred during the
5-s LH. As an additional means for enhancing feedback about performance,
a response that exceeded the LH yielded minimal reward (vs. no
reinforcer and clock reset), thus allowing differentiation between
excessively long IRTs and anticipatory responses. It was expected that
the "graded" scale of reward magnitude, coupled with delivery
of one point for exceeding the LH, would increase precision of
performance. Providing a point for overestimations did increase the
percentage of IRTs falling within the LH, and graded magnitude of reward
did enhance the proportion of IRTs taking place earlier within the LH.
It was suggested that contingent incentive value, as incorporated in the
present laboratory paradigms, effectively influenced performance and, in
general, represents the nature of contingencies that prevail in the
Human operant laboratory tasks are often designed as counterparts of animal studies, but the typical incentive of food or water reinforcement is replaced by some symbolic secondary reinforcer such as points displayed on a counter or computer screen (Davey, 1981). Despite differences between animal and human forms of motivation (Wearden, 1988), such symbolic reinforcers do seem to carry some incentive value. And despite inconsistent findings for reward magnitude effects in animal studies, it appears that those effects are most stable in within-subject studies and particularly when reward magnitude is response contingent (Bonem & Crossman, 1988).
Lippman (2000) demonstrated that contingent incentive value was effective in modifying human operant performance in a laboratory setting. In keeping with Lippman (1973), incentive value was operationalized as magnitude of reward (number of points earned). With the contingent incentive schedule, one feature of behavior determines delivery of reinforcers whereas a second performance attribute governs reward magnitude each time a reinforcer is to be presented. It was argued, with examples provided, that this type of arrangement allows for a more faithful representation of most of the contingencies that affect humans in the real world (Lippman, 1977, 2000). As a further example, consider a mother eager to foster academic productivity in her young child. Indeed, she will praise the child for bringing home a sample of school work. Although that action should increase the likelihood of the child bringing home more school products, it is the less important contingency. When the child brings home a careless piece of work, the parental attention is likely to be rather minimal relative to the praise that will be lavished when thorough and careful school work is brought home. While delivery of reward is contingent upon bringing home some school product, the magnitude of incentive is being used to foster quality work.
In the laboratory as well as with most formal tests of contingency application, the primary emphasis has been on the behavior requirement for occurrence of reinforcers; any contingency for reinforcer value has been inoperative (i.e., incentive value of reward set and fixed). As Lippman (2000) suggested, however, in many nonlaboratory and "natural" circumstances, the major feature for promoting behavior change is found in the contingency for value, with the behavior requirements for occurrence being of secondary importance. It was also declared that variable reinforcer values can be more informative than simpler presence vs. absence of a reinforcer, and thus should be expected to provide more effective feedback to performers about their behavior.
Lippman (2000) tested a variety of contingencies such as positive and negative correlated reward (Logan, 1966) plus more complex relationships between reinforcer value and performance, as superimposed upon a fixed-interval schedule for occurrence. Those studies demonstrated, as did the human correlated reward studies by Buskist, Oliveira-Castro, and Bennett (1988), that the magnitude contingencies promoted appropriate performance changes. The main purpose of the present study was to provide further testing and demonstration of the informative value of contingent reward magnitude. In a series of early studies dealing with subjects' "discovery" that a contingency was time-based such that large numbers of responses were not required (see Lippman, Leander, & Meyer, 1970, for a summary), it had been observed that once subjects had shifted into a low rate behavior pattern, there would be long delays between the time that an interval had expired and the time that subjects responded in order to "collect" a reinforcer. If fixed interval (FI) performance were regarded as a means for testing timing, then subjects often overestimated by twice the duration of the FI. It may be expected that an interresponse time (IRT)>t contingency, in which anticipatory responses are "punished" by imposing a delay before the next reinforcer becomes available, could certainly foster such overestimation.
A means for curtailing such overestimation would be to impose a limited hold (LH), which would thus create a "window" of interresponse times that would be reinforced; hence IRTs that were both too short and too long would be punished by the next available reinforcer being postponed. Adding a LH to an IRT>15-s schedule would be expected to increase the number of nonreinforced anticipatory responses, and a performer would have the problem of differentiating between responses that were anticipatory and those that occurred after the LH had expired. It should be noted that for the present research, the IRT clock did not reset automatically (which might disrupt performers' timing) but reset only when a response was made (thus cueing the outset of another interval). Because an IRT>t schedule would test subjects' timing, feedback about accuracy and drifts toward overestimation and underestimation would be important.
The main purpose of the present study was to adjust magnitude of reward, in the form of number of points, in order that each reinforcing event indicated when each response took place during LH. For a control or "traditional" contingency, number of points remained fixed. In the graded contingency, maximum points were awarded for responses taking place at the outset of the LH; for each subsequent second, that number was reduced one point. It was expected that this specification would curtail a tendency toward longer IRTs that would eventually exceed the LH but would likely increase anticipatory responding.
In a typical IRT>t with LH schedule, the meaning of a nonreinforced response is ambiguous: The IRT was either too long or too short. An additional purpose of the present research was to provide enhanced feedback via reinforcers to allow learners to make this distinction. In a traditional (control) contingency, all IRTs that failed to fit the IRT window were nonreinforced. In the experimental condition, anticipatory responses were nonreinforced whereas responses that exceeded the LH resulted in delivery of a single point. (Responses taking place during the LH were fixed at 6 or ranged from 2 to 6 for the graded condition described above.) It was expected that acquisition would be facilitated and performance would remain more stable when reward magnitude was made more informative. Specifically, it was expected that more responses would fall within the LH even though overestimates were reinforced.
Participants and Apparatus
Participants were 8 senior psychology students at Western Washington University. Participation was an optional activity for their seminar class.
The apparatus consisted of a microcomputer, color monitor, printer, software for programming contingencies and recording performance data, and a push button switch centered on the top of a 3.9- x 7.3- x 10.1-cm box. The monitor displayed the total number of points earned, number of points just received, and an "A" or "B." For participants' first session the "A" and "B" were not displayed.
At the outset of each day of testing participants were asked to remove watches and leave belongings outside of the testing room. Participants were handed a printed copy of the instructions and asked to follow along as the experimenter read them aloud. Instructions for the initial session were as follows; the underlined sections appeared in bold face:
This is an experiment in performance and not a "psychological test." We are interested in features of performance that are common to all people, and are not concerned with "personal reactions."
Here is a button and here is a computer screen. Your goal in this task is to earn as many points as possible. Points become available on the basis of time. You can earn points by pressing the button.
When we begin the screen will display the total points earned and points just received. There will be one sound whenever the button is pressed. There will be another sound whenever points are earned. It is always possible to gain 6 points whenever points are delivered.
Do not touch anything except the button.
The experimenter will be in the next room. The computer will indicate when the session is finished.
Remember, your goal is to accumulate as many points as possible. It is always possible to add 6 points whenever they are delivered.
Instructions for the subsequent sessions differed only in the second and third paragraphs:
Here is a button and here is a computer screen. Your goal in this task is to earn as many points as possible. As before, points become available on the basis of time. You can earn points by pressing the button.
When we begin an "A" or "B" will appear at the top of the screen. The screen will also display the total points earned and points just received. There will be one sound whenever the button is pressed. There will be another sound whenever points are earned. It is always possible to gain 6 points whenever points are delivered.
After presenting the instructions and answering questions, the experimenter started the session and left the room.
Participants were tested in a series of daily sessions. The initial session was IRT>15 s with a fixed reinforcer value of 6 points; its purpose was to familiarize participants with the contingency. A two-component multiple (MULT) schedule obtained for four subsequent sessions in which the "A" component was the same IRT>15-s schedule with reinforcer value fixed at 6 points. The "B" component consisted of IRT>15-s with a limited hold (LH) of 5 s, but with a different magnitude contingency in effect each day. Those contingencies can be described according to a 2 by 2 factorial arrangement: Presence vs. absence of a single point for responses taking place following expiration of LH and magnitude of reward fixed at 6 points or graded from 6 to 2 points, depending on when a response was emitted during LH. Specifically, a response emitted during the first second of LH yielded 6 points; a response occurring in the fifth second earned 2 points.
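The four "B" contingencies can be made concrete with a short sketch. The Python below is an illustration only, not the software used in the study: the function name `points_for_irt`, its parameters, and the treatment of responses landing exactly on a second boundary are assumptions; the 15-s minimum IRT, the 5-s LH, the 6-to-2 point gradient, and the single point for exceeding the LH come from the description above.

```python
IRT_MIN_S = 15.0   # minimum interresponse time (IRT>15-s)
LH_S = 5.0         # limited hold duration in seconds
MAX_POINTS = 6     # points for the fixed condition and for the first second of LH


def points_for_irt(irt_s, graded, point_after_lh):
    """Points earned by one response in the "B" component (hypothetical helper).

    irt_s          -- time since the previous response, in seconds
    graded         -- True for the graded (6 to 2) magnitude, False for fixed 6
    point_after_lh -- True if a response exceeding the LH earns a single point
    """
    if irt_s <= IRT_MIN_S:
        return 0                       # anticipatory response: never reinforced
    elapsed = irt_s - IRT_MIN_S        # time into (or past) the limited hold
    if elapsed >= LH_S:                # boundary handling is an assumption
        return 1 if point_after_lh else 0
    if not graded:
        return MAX_POINTS
    # Graded: lose one point for each full second elapsed within the LH.
    return MAX_POINTS - int(elapsed)
```

Under these assumptions, an IRT of 17.2 s (the third second of the LH) would earn 4 points in a graded condition and 6 points in a fixed one.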
During the first session participants were trained on IRT>15-s with fixed value reinforcers for familiarization; performance was recorded in 10 segments which advanced following the first reinforcing event after 4 min. On the four subsequent sessions, the MULT alternated between component "A" and one of the four contingencies of "B." The sequence of the "B" contingencies was rotated so that all participants completed one session with each of them but in different orders. Components were switched following the first delivery of a reinforcer once the component had been in effect for 4 min. Each session, at least 40 min in duration, consisted of five 4-minute segments of each component. The number of responses, reinforcing events, reinforcer magnitude, and percentages of IRTs in 1-s bins were recorded in each segment of the session. At the conclusion of each session, participants were asked to describe the contingency (i.e., how points were earned).
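The component-switching rule described above (switch at the first reinforcer delivered once the component has run for 4 min) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `switch_time`, its arguments, and the inclusive 240-s boundary are hypothetical and are not taken from the original software.

```python
def switch_time(component_start_s, reinforcer_times_s, min_duration_s=240.0):
    """Return the session time at which the component would switch.

    The switch is taken to occur at the first reinforcer delivered once the
    component has been in effect for at least min_duration_s (4 min).
    Returns None if no such reinforcer has been delivered yet.
    """
    for t in sorted(reinforcer_times_s):
        if t - component_start_s >= min_duration_s:
            return t
    return None
```

Because the switch waits for a reinforcer, a segment can run longer than 4 min, which is consistent with sessions lasting "at least" 40 min.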
Given the limited amount of data and the instability of information in each segment of the session, each participant's IRT frequencies were totaled across all "B" components for the session and converted to percentages. The bins were further compressed into 5-s blocks of time and subjected to a within-subject ANOVA having factors of nature of reward (fixed vs. graded), point following LH (presence vs. absence), and five IRT blocks (the final block contained all IRTs that surpassed the LH). This analysis revealed a significant main effect of IRT blocks, F(4, 28) = 34.70, p < .0005, MSE = 416.15, and a significant point following LH by IRT block interaction, F(4, 28) = 4.49, p = .006, MSE = 182.46. Data for all four conditions are shown in Figure 1, in which it is evident that providing a single point, relative to nonreinforcement, for responses exceeding the LH increased the proportion of responses that fell within the LH. Although the difference was nonsignificant, the graded reward magnitude led to a slight increase over fixed magnitude in the percentage of responses falling within the block of IRTs preceding the LH (Block 3), as can be seen in the figure. More specifically, 8.31% of responses fell in Block 3 for the fixed magnitude condition, compared to 14.50% for the graded condition. Thus the graded reward contingency led to increased anticipatory responses, as might be expected.
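The compression of IRTs into five 5-s blocks, with the final block collecting every IRT beyond the LH, might be sketched as below. The function name `block_percentages`, the sample IRT values, and the assignment of an IRT that falls exactly on a block edge to the later block are assumptions made for illustration; the raw IRT data are not reproduced here.

```python
def block_percentages(irts_s, block_s=5.0, n_blocks=5):
    """Percentage of IRTs falling in each 5-s block.

    Blocks 1-3 cover 0-15 s (anticipatory), Block 4 covers the 15-20 s
    limited hold, and Block 5 holds every IRT that surpassed the LH.
    """
    counts = [0] * n_blocks
    for irt in irts_s:
        # Anything at or beyond the last block's start is pooled into it.
        index = min(int(irt // block_s), n_blocks - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_blocks
    return [100.0 * c / total for c in counts]


# Hypothetical IRTs (seconds) from one participant's "B" components.
example = block_percentages([3.0, 12.0, 16.0, 17.5, 26.0])
```

In this made-up example, two of the five IRTs fall in Block 4, so 40% of responses would be counted as occurring within the LH.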
There was considerable variation in participants' performance, some of which is attributable to limited experience with each contingency and some to the randomized sequence of exposure to each of the variants. Despite such differences, the general trends described above can be seen in individual data. Total responses, anticipatory responses, responses taking place during LH, and responses falling beyond the LH are shown in Table 1. Inspection of those data shows a trend toward greater response frequency caused by graded reinforcement, and reduced response frequency, especially anticipatory responses, when a point had been awarded for IRTs that exceeded the LH.
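The trends read from Table 1 can be checked by totaling each row type across participants. The sketch below copies the Table 1 frequencies into a dictionary; the names `TABLE_1`, `CONDITIONS`, and `totals_by_condition` are illustrative only, and the tally is a descriptive summary, not one of the analyses reported in this article.

```python
CONDITIONS = ("F,NP", "F,P", "G,NP", "G,P")

# Table 1 frequencies: participant -> (Pre, During, Post) rows,
# each row listing counts in CONDITIONS order.
TABLE_1 = {
    1: ((14, 4, 29, 2), (40, 28, 39, 20), (18, 29, 23, 38)),
    2: ((28, 7, 45, 27), (12, 62, 39, 53), (35, 7, 26, 7)),
    3: ((3, 14, 94, 7), (59, 60, 55, 71), (10, 5, 21, 0)),
    4: ((16, 32, 24, 21), (35, 37, 64, 56), (27, 15, 2, 5)),
    5: ((62, 20, 30, 32), (3, 59, 56, 51), (28, 3, 1, 2)),
    6: ((39, 12, 23, 26), (22, 37, 23, 35), (23, 19, 54, 16)),
    7: ((11, 6, 16, 11), (47, 48, 41, 47), (19, 15, 24, 12)),
    8: ((6, 5, 14, 19), (62, 68, 59, 66), (7, 5, 8, 2)),
}


def totals_by_condition(table, row_index):
    """Sum one row type (0 = Pre, 1 = During, 2 = Post) over participants."""
    sums = [0] * len(CONDITIONS)
    for rows in table.values():
        for i, count in enumerate(rows[row_index]):
            sums[i] += count
    return dict(zip(CONDITIONS, sums))


pre = totals_by_condition(TABLE_1, 0)      # anticipatory responses
during = totals_by_condition(TABLE_1, 1)   # responses within the LH
post = totals_by_condition(TABLE_1, 2)     # responses exceeding the LH
```

Summed this way, anticipatory responses total 179 (F, NP) and 275 (G, NP) without the point, against 100 (F, P) and 145 (G, P) with it, which agrees with the trend described above.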
In order to provide a more detailed examination of performance effects, the percentages of responses falling within each second of the LH were subjected to the same type of ANOVA, having factors of nature of reward, point following LH, and IRT bins (seconds 1-5 of the LH). This analysis revealed significant main effects of point after, F(1, 7) = 6.69, p = .036, MSE = 67.13, and bin, F(4, 28) = 2.08, p = .045, MSE = 209.69, plus the interaction of nature of reward by bin, F(4, 28) = 3.73, p = .015, MSE = 54.88. Data for all conditions are shown in Figure 2. The point after effect, in keeping with the analysis described above, indicates that a greater percentage of responses occurred within LH when overestimates led to a single point rather than absence of reward. The bin effect captures a general trend of decreasing percentages over the duration of LH. However, the interaction of reward by bin points to the steeper slope of this gradient when subjects had been provided graded rather than fixed reward magnitudes, in keeping with predictions.
Finally, an analysis having the same within-subject factors of nature of reward and point following LH was conducted using the total number of intervals in which the LH had been exceeded as the performance measure. Although the difference was nonsignificant, there was a slight tendency for fewer such intervals when reinforcer magnitude had been graded rather than fixed. The only significant outcome of this analysis was a main effect of point following, F(1, 7) = 15.13, p = .006, MSE = 90.25. There were far more intervals where LH had been exceeded when a reinforcer had not been provided (21.94) than when a reinforcer had been presented (8.88). This outcome is in keeping with the point after effect described above for percentage of responses.
As had been noted, typical contingencies specify the occasions in which a fixed-value reinforcer is presented. In the current research, the emphasis was on value, rather than presence or delivery, of a reinforcer--which is in keeping with a limited number of specialized contingencies such as correlated reward (Logan, 1966), a multiple-ratio schedule (Lovitt & Esveldt, 1970), or conjugate reinforcement (Lindsley, 1962).
The present effects of providing a reinforcer for responses exceeding the LH led to a clear increase in the proportion of responses falling within the bounds of the LH. It should be noted that the consequence of exceeding LH in the present experiment was not typical. It is more conventional for the IRT clock to reset automatically once LH has expired. Given the brevity of testing and the added complexity of the graded reinforcement condition, it was decided to maintain the constant conditions that every response reset the IRT clock and that the only way it could be reset was by a response. Hence participants were generating a reliable cue for their timing. Although not the central purpose of the study, it was nevertheless of interest that providing a single point following LH--which allowed differentiation between IRTs that were too early, too late, or within LH--reduced the percentage of IRTs that exceeded the LH. Clearly this provision was serving an informative function in that overestimations decreased when "rewarded."
Graded magnitude of reward led to a modest increase in anticipatory responses, but especially led to an increased percentage of responses taking place early in the LH. It is thus suggested that this differential magnitude of reward provided enhanced feedback about performers' timing and thus led to more precise estimations.
Unlike much operant research in which training is prolonged, there was comparatively little exposure to the four contingencies that were tested in the present study. Despite the limited practice, there was evidence that contingent reinforcer value was able to influence performance. Lippman (2000) also had relied upon a relatively short session without repeated testing and found strong behavioral effects from schedules in which magnitude of reward was contingent upon specific features of performance. Because the present contingencies were considerably more difficult and seem more subtle than general differences in response frequency, the current findings provide rather clear testimony to the effectiveness of contingent incentive value for influencing performance.
Kristine Bennington's assistance with data entry is gratefully acknowledged. A version of this report was presented at the 26th annual convention of ABA, May 26-30, 2000, in Washington DC.
Address correspondence to Louis G. Lippman, Department of Psychology, Western Washington University, 516 High Street, Bellingham, WA 98225-9089.
BONEM, M., & CROSSMAN, E. K. (1988). Elucidating the effects of reinforcement magnitude. Psychological Bulletin, 104, 348-362.
BUSKIST, W., OLIVEIRA-CASTRO, J., & BENNETT, R. (1988). Some effects of response-correlated increases in reinforcer magnitude on human behavior. Journal of the Experimental Analysis of Behavior, 49, 87-94.
DAVEY, G. (1981). Animal learning and conditioning. Baltimore: University Park Press.
LINDSLEY, O. R. (1962). A behavioral measure of television viewing. Journal of Advertising Research, 2, 2-12.
LIPPMAN, L. G. (1973). Contingent magnitude of reward in human fixed-interval performance. Proceedings, 81st Annual Convention, APA, 8, 867-868. (Summary)
LIPPMAN, L. G. (1977). Approximating "real-world" contingencies in the human operant laboratory. Journal of Biological Psychology, 19, 11-19.
LIPPMAN, L. G. (2000). Contingent incentive value in human operant performance. The Psychological Record, 50, 513-528.
LIPPMAN, L. G., LEANDER, J. D., & MEYER, M. E. (1970). Human fixed interval performance as related to response effortfulness and to initial point. Journal of General Psychology, 82, 57-61.
LOGAN, F. A. (1966). Continuously negatively correlated amount of reward. Journal of Comparative and Physiological Psychology, 62, 31-34.
LOVITT, T. C., & ESVELDT, K. A. (1970). The relative effects on math performance of simple- versus multiple-ratio schedules: A case study. Journal of Applied Behavior Analysis, 3, 261-270.
WEARDEN, J. H. (1988). Some neglected problems in the analysis of human operant behavior. In G. Davey & C. Cullen (Eds.), Human operant conditioning and behavior modification. New York: Wiley.
[Figure 1 omitted]
[Figure 2 omitted]
Table 1
Total Frequency of Responses Taking Place Before, During, and Exceeding LH When Reinforcer Magnitude Had Been Fixed (F) or Graded (G), a Point Had (P) or Had Not (NP) Been Delivered for Responses Exceeding LH

Participant            F, NP   F, P   G, NP   G, P
1            Pre         14      4      29      2
             During      40     28      39     20
             Post        18     29      23     38
2            Pre         28      7      45     27
             During      12     62      39     53
             Post        35      7      26      7
3            Pre          3     14      94      7
             During      59     60      55     71
             Post        10      5      21      0
4            Pre         16     32      24     21
             During      35     37      64     56
             Post        27     15       2      5
5            Pre         62     20      30     32
             During       3     59      56     51
             Post        28      3       1      2
6            Pre         39     12      23     26
             During      22     37      23     35
             Post        23     19      54     16
7            Pre         11      6      16     11
             During      47     48      41     47
             Post        19     15      24     12
8            Pre          6      5      14     19
             During      62     68      59     66
             Post         7      5       8      2
|Gale Copyright:||Copyright 2002 Gale, Cengage Learning. All rights reserved.|