A computer-assisted analysis of verbal coaching behavior in soccer.
Abstract: The Coach Analysis Instrument (CAI) uses a form of event recording to systematically examine the effectiveness of verbal coaching behavior in a sport environment. While many of the earlier observation instruments provided information on verbal behavior by identifying the focus of coaching interactions, the CAI more fully describes the instructional style and tendencies a coach may possess. The quantitative data provide a comprehensive profile of coaching methodology and equip the researcher with a full range of coaching behaviors to examine and modify. The utility of the CAI as a supervisory and/or self-assessment tool is central to its design and development. The first section of this paper describes the instrument and discusses its capacity for use within sport pedagogy. The next section reports the findings of a developmental study that measured and subsequently improved the reliability of the CAI. We suggest that the hierarchical structure of the CAI captures most verbal behaviors and that the introduction of a systematic training program will enhance the objectivity of data collection.
Subject: Coaching (Athletics) (Research)
Soccer coaching (Research)
Authors: More, Kenneth G.
McGarry, Tim
Partridge, David
Franks, Ian M.
Pub Date: 12/01/1996
Publication: Journal of Sport Behavior (University of South Alabama). Academic journal; subjects: psychology and mental health, sports and fitness. ISSN: 0162-7341. Copyright 1996 University of South Alabama.
Issue: December 1996, Volume 19, Number 4.
The primary purpose of practice in a sport setting is to improve an athlete's performance. Effective coaching instruction is an essential component in this regard, and a proficient coach needs a variety of pedagogical skills. These include the planning and organization of appropriate learning experiences, the accurate observation of an athlete's performance during these experiences, and the provision of appropriate feedback to the athlete about that performance in a meaningful manner. Feedback is the most important variable for the learning of a motor skill (Schmidt, 1988), and "researchers and teacher educators in sport pedagogy have established guidelines for using feedback in teaching-learning situations" (Lee, Keh, & Magill, 1993, p. 228). While athletes frequently receive feedback from the coach about their performance, the coach rarely receives feedback regarding his or her own coaching effectiveness. Most coaches neither engage in nor have the means to evaluate their own skills and abilities to deliver accurate feedback to their athletes.

In an effort to remedy this problem a number of systematic observation instruments, originally developed for use in the field of teaching, have been adapted to analyze the behavior of coaches in a sport environment (Claxton, 1988; Lacy & Darst, 1985; Markland & Martinek, 1988; Rushall, 1977; Segrave & Ciancio, 1990). Amongst the first to adapt this technique were Tharp and Gallimore (1976) who devised a ten category system to observe U.C.L.A. basketball coach John Wooden. Markland and Martinek (1988) explored the nature and amount of feedback that successful and less successful high school varsity volleyball coaches gave their starting and non-starting players. The growth of this research endeavor is evidenced by Darst, Zakrajsek, and Mancini (1989) who devoted an entire section of their book to describing several instruments, each specifically designed to analyze the behavior of coaches or athletes in a practice environment.

Computer technology has also recently been applied to ease the processes of data collection and analysis (Briggs, 1991; Carlson & McKenzie, 1984; Hawkins & Wiegand, 1989; Johnson & Franks, 1991). This technology offers immediate summary and display of the collected data and allows the timely return of meaningful feedback on the observed coaching performance. Franks, Johnson, and Sinclair (1988) developed the 'Computerized Coaching Analysis System' (CCAS) in an attempt to improve on existing techniques for the systematic observation of coaches in a sport environment. The CCAS has three components, one of which is the 'Coaching Analysis Instrument' (CAI). The CAI is designed to collect data on the verbal behaviors exhibited by coaches as they organize and instruct athletes during a practice session. Initial use of the CAI highlighted some shortfalls in its design (see Partridge & Franks, 1996). As a result, a revised version of the CAI has been developed. (Continued reference to the CAI now refers to the revised version unless otherwise stated.) The CAI has been developed for use within the Canadian National Coaching Certification Program (NCCP) as a coaching evaluation or self-assessment tool that can be used by coaches to examine their own performance on a regular basis. It can be operated via the keyboard of an IBM compatible microcomputer and can thus be used by coaches as part of their ongoing professional coaching development.

There are a number of advantages in using the CAI to examine the verbal behavior of coaches. First, data collection is conducted via the traditional QWERTY keyboard in menu driven form and may be operated from either the hard or floppy disk drives. The system is thus portable and does not require additional computer accessories. Second, the quantitative and objective information that is generated provides a complete description of the organizational and instructional behaviors employed by the coach. The data are amenable to detailed analysis, the results of which can be used as feedback for the coach in attempting to modify their behavior. Third, the potential exists for the CAI to be interactively linked to a video recorder. This technology makes it possible to precisely recall selected examples of a coach's verbal behavior from video that might be of interest (for a detailed account of the design and use of computer interactive video technology, see Franks & Nagelkerke, 1988). The purpose of this study is twofold: (a) to describe the development of the CAI to date, and (b) to report the findings of a study that was undertaken to assess the reliability of the CAI before use within the NCCP.

Method

Data Collection

A practice session in most team sports may be composed of different activity segments. For example, a practice could be divided into any or all of the following segments: (a) warm-up, (b) technical skill, (c) tactical, (d) conditioned game, (e) fitness, and (f) game scrimmages. During each segment a coach can use a number of drills, each designed to improve an athlete's performance in certain skills or techniques. After organizing their athletes to participate in these drills, the coach's primary role is to provide feedback to the athletes about their performance. The CAI has been designed to collect data on the organizational and instructional elements of a coach's verbal behavior as they instruct and provide feedback to their athletes.

The Coaching Analysis Instrument (CAI)

The CAI data are collected in four stages. The first stage includes administrative information (e.g., date, coach, setting, drill type, number of drills). The second stage collects data on the "Organizational" features of the coach's verbal behavior. After viewing the videotape portion of the practice in which the coach explains how a drill will function, the analyst responds 'Yes' or 'No' to three questions: (1) Did the athletes understand the organization of the drill? (2) Were the goals of the drill clearly stated? (3) Was the organization delivered in a clear and concise manner? The third and largest stage of data collection takes place once the athletes are working. Data are collected (via the QWERTY keyboard) on the "Instructional" features of the coach's verbal behavior as they give feedback to the athletes. A comment that is "skill" related is coded at five levels - Direction, Focus, Method, Delivery and Emphasis. A comment that is "non-skill" related is coded at only three levels - Direction, Focus, and Intent.

The fourth stage of data collection occurs at the conclusion of a drill. Once viewing of the videotape and collection of data on the instructional behaviors exhibited by the coach are complete, the analyst responds 'Yes' or 'No' to a further seven questions. Four questions address the "Realism" of the drill: (1) Was the drill representative of game situations? (2) Did the coach use an adequate playing area? (3) Did the coach use an adequate number of athletes? (4) Did the drill match the set goals? Three questions address the "Athlete(s) Performance" during the practice: (1) Did the athletes work enthusiastically throughout the drill? (2) Did the drill challenge the athletes? (3) Did performance improve because of information provided by the coach? We shall now describe the instructional component of the CAI.
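Before doing so, and purely as an illustration, the 'Yes'/'No' responses gathered in the second and fourth stages could be held in simple record structures such as the sketch below (Python; the type and field names are our own and are not part of the CCAS software):

from dataclasses import dataclass

@dataclass
class OrganizationChecklist:
    # Stage 2: coded after the coach explains how the drill will function
    athletes_understood_organization: bool
    goals_clearly_stated: bool
    delivery_clear_and_concise: bool

@dataclass
class DrillRealism:
    # Stage 4: the four "Realism" questions
    representative_of_game_situations: bool
    adequate_playing_area: bool
    adequate_number_of_athletes: bool
    drill_matched_set_goals: bool

@dataclass
class AthletePerformance:
    # Stage 4: the three "Athlete(s) Performance" questions
    worked_enthusiastically: bool
    drill_challenged_athletes: bool
    performance_improved_from_coach_information: bool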

Level 1 (Skill and Non-Skill comments)

Direction. The coach can deliver the comment towards either (a) a particular individual(s) (maximum of two) or (b) a group of athletes (more than two).

Level 2 (Skill and Non-Skill comments)

Focus. The analyst determines if the comment was either "skill" or "non-skill" related. If skill related, the comment is coded as one of the following: (a) Re-instruct, the coach refers to information that is a repeat of previous instructions and the comment is not influenced by the quality of the athlete's performance (e.g., a general statement to reinforce what the athletes are trying to achieve), (b) Correct, the comment is made in reference to a skill or technique performed correctly by the athlete, or (c) Incorrect, the comment is made in reference to a skill or technique performed incorrectly by the athlete. If the comment is non-skill related it is coded as one of the following: (a) Non-specific, the comment has no specific focus and does not direct the athlete's attention to any particular component of their performance (e.g., "Nice try"), (b) Effort, the comment refers to the athlete's work output or rate during a performance (e.g., "Great hustle"), (c) Behavior, the comment refers to the athlete's conduct during a drill and is probably more applicable to certain age groups, or (d) Organization, the comment details how the drill should function in terms of space and athlete positioning (e.g., "John change positions with Eric").

Level 3 (Skill and Non-Skill comments)

Method. If the comment was coded as skill related then the analyst next enters the method of comment delivery. This can be either (a) During, the comment is made at the same time as the athlete(s) is performing (e.g., commentary as the performance is occurring), (b) Post, the comment is made after the athlete(s) has completed a part of, or their role within, the drill, or (c) Stopped, the coach deliberately intervenes in the action to stop the drill ("freezes the action").

Intent. If the comment was coded as non-skill related then the analyst next enters the intent of comment delivery. Thus, a "non-specific," "effort" or "behavior" non-skill comment can be coded as being (a) Affective, the comment is delivered in a manner such that it could have a motivational effect upon the athlete(s), or (b) Non-Affective, the comment has no motivational effect on the athlete(s). If the comment was coded as a Non-Skill Organization comment then no further data are collected.

Level 4 (Skill comments only)

Delivery. The comment can be coded as either (a) Demo, the delivery of the comment was accompanied by a demonstration in real time or at a slower pace or (b) No-Demo, no demonstration or visual aid was used.

Level 5 (Skill comments only)

Emphasis. Prior to the practice the coach provides a brief written outline of the drills to be used and the key coaching points they intend to emphasize or instruct during each drill (see Preparation of the Videotape and Transcript below). If, within a comment, the coach refers to any of these coaching points then the comment is coded as "key factors." If no reference is made to these key coaching points, or the coach refers to a coaching point other than those in their practice outline, then it is coded as "non-key factors."

Additional Data

In addition to these five levels of data collection, there is one key that has been designated as "Inappropriate" (each coded comment is considered to be "Appropriate" by default). If the coach includes erroneous technical information, incorrectly evaluates the athlete(s)' performance, or makes a comment that is irrelevant to the athlete's performance, then the comment is tagged as "Inappropriate." For example, a coach might make a comment that would be coded as skill, incorrect, post, no-demo, and key factors. If, however, the coach had incorrectly diagnosed what had caused the athlete to perform the skill incorrectly, then the comment would be tagged "Inappropriate."
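To make the hierarchy concrete, the sketch below shows one way a single coded comment could be represented (Python; the type and field names are ours for illustration and do not reproduce the CAI keystroke format). A skill comment carries five descriptors plus the appropriate/inappropriate tag, while a non-skill comment carries three:

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Direction(Enum):          # Level 1
    INDIVIDUAL = "individual"   # one or two athletes
    GROUP = "group"             # more than two athletes

class SkillFocus(Enum):         # Level 2 (skill related)
    REINSTRUCT = "re-instruct"
    CORRECT = "correct"
    INCORRECT = "incorrect"

class NonSkillFocus(Enum):      # Level 2 (non-skill related)
    NON_SPECIFIC = "non-specific"
    EFFORT = "effort"
    BEHAVIOR = "behavior"
    ORGANIZATION = "organization"

class Method(Enum):             # Level 3 (skill comments)
    DURING = "during"
    POST = "post"
    STOPPED = "stopped"

class Intent(Enum):             # Level 3 (non-skill comments, except organization)
    AFFECTIVE = "affective"
    NON_AFFECTIVE = "non-affective"

class Delivery(Enum):           # Level 4 (skill comments)
    DEMO = "demo"
    NO_DEMO = "no-demo"

class Emphasis(Enum):           # Level 5 (skill comments)
    KEY_FACTORS = "key factors"
    NON_KEY_FACTORS = "non-key factors"

@dataclass
class SkillComment:
    direction: Direction
    focus: SkillFocus
    method: Method
    delivery: Delivery
    emphasis: Emphasis
    appropriate: bool = True    # tagged False if "Inappropriate"

@dataclass
class NonSkillComment:
    direction: Direction
    focus: NonSkillFocus
    intent: Optional[Intent]    # None for "organization" comments (no intent collected)
    appropriate: bool = True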

Data Analysis

All software for the CAI is menu driven and the data are automatically stored for later analysis. The analysis software processes the recorded data and checks for syntactic (spelling) and logical errors (integrity of the data structure). (Error checking is necessary since the data file might be modified, for whatever reason, following its construction, using any text editor.) If errors are detected then the analyst is presented with an informative error message, including the location of the data error within the file. The data must be error free for the software to proceed with the analysis. Sample output of the analysis is presented in Tables 1, 2, and 3.
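As a minimal sketch of these two kinds of check (Python; the file layout, keyword spellings, and function name are assumptions rather than the CCAS format), a validator might verify each coded line against the descriptor hierarchy and report the line at which it fails:

# Hypothetical validator: flags syntactic errors (misspelled descriptor
# keywords) and logical errors (descriptors that violate the CAI hierarchy),
# reporting the offending line number as described in the text.

SKILL_LEVELS = {
    3: {"during", "post", "stopped"},      # Method
    4: {"demo", "no-demo"},                # Delivery
    5: {"key", "non-key"},                 # Emphasis
}
NON_SKILL_LEVELS = {
    3: {"affective", "non-affective"},     # Intent
}

def check_file(lines):
    """Return a list of (line_number, message) errors; an empty list means the file is clean."""
    errors = []
    for number, line in enumerate(lines, start=1):
        fields = [f.strip() for f in line.lower().split(",")]
        if len(fields) < 2:
            errors.append((number, "too few descriptors"))
            continue
        direction, focus = fields[0], fields[1]
        if direction not in {"individual", "group"}:
            errors.append((number, f"unknown direction '{direction}'"))
        if focus in {"re-instruct", "correct", "incorrect"}:
            expected = SKILL_LEVELS                    # skill comment: levels 3-5
        elif focus == "organization":
            expected = {}                              # no further data collected
        elif focus in {"non-specific", "effort", "behavior"}:
            expected = NON_SKILL_LEVELS                # non-skill comment: intent only
        else:
            errors.append((number, f"unknown focus '{focus}'"))   # syntactic error
            continue
        if len(fields) - 2 != len(expected):                      # logical error
            errors.append((number, "wrong number of descriptors for this focus"))
        else:
            for level, value in zip(sorted(expected), fields[2:]):
                if value not in expected[level]:
                    errors.append((number, f"invalid descriptor '{value}' at level {level}"))
    return errors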

Preparation of the Videotape and Transcript

During a practice session a coach was videotaped using a Panasonic AG-170 VHS Camcorder as they worked with their athletes. The coach also wore a wireless microphone which captured the verbal behaviors expressed. Prior to the practice the coach provided a written outline of the drills they intended to use and the key coaching points within each drill. This outline was used during the data collection phase, as information is collected on how often the coach refers to these key factors. Data are collected on the coach's organizational and instructional behaviors from the videotape recording of the completed practice.

A transcript was prepared from the video practice segment to be analyzed before data collection. When the CAI is being used as a research tool, the requirement of producing a full transcript is a necessary and worthwhile process. However, when the CAI is being used by coaches as a self-assessment tool this stage can be eliminated. Familiarity with the operational definitions should make it possible to bypass the transcript and perform the coding process straight from videotape by using the pause function of the VCR after each definable comment. The transcript contained all the comments made by the coach as they instructed their athletes during the practice. Comments made by the coach in explaining to the athletes how a particular drill was to function were considered "organizational" and were not transcribed. Comments made by the coach once a drill had commenced and the athletes were working were considered "instructional" and were fully transcribed.

During a second viewing of the videotape, the completed transcript was separated into definable instructional comments. The CAI uses a form of event recording, with the units of instruction being the separate comments made by the coach (for a complete definition of a "definable comment" see Johnson & Franks, 1991). A limitation of the instrument is that data are not collected on the nonverbal postures or actions made by the coach during a practice. The videotape was then reviewed a third time and each comment coded.

Reliability and Validity of the Coach Analysis Instrument

The reliability and validity of the instrument and observer were tested using intra- and inter-rater reliability measures. Intra-rater reliability assesses the degree of consistency (or reliability) of the observer over time, and inter-rater reliability the degree of objectivity (or validity) between two independent observers. We used the percentage agreement statistic (Siedentop, 1976), which simply expresses the number of agreements as a percentage of the total number of comparisons, against a criterion of 80% (Rushall, 1977).
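For reference, a minimal sketch of this calculation for a single dimension is given below (Python; the function name and example codes are hypothetical):

def percentage_agreement(codes_a, codes_b):
    """Percentage agreement between two observers' codings of the same comments.

    codes_a and codes_b are equal-length sequences of descriptor codes for one
    dimension (e.g., Method): agreements / total comparisons * 100.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both observers must code the same set of comments")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)

# Example: 6 matching codings out of 8 gives 75%, below the 80% criterion.
print(percentage_agreement(list("DDPPSSDP"), list("DDPPSPDD")))  # 75.0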

Two subjects (X and Y), including the first author, volunteered to participate in this study. Each subject has extensive knowledge and field experience in coaching and instruction. The subjects were provided with a videotaped record of a coaching session, a transcript, a practice plan outline and the CAI. (The transcript was a composite of X's transcript recorded on two separate occasions, where discrepancies between the two records had been resolved.) The observers separated the transcript into comments on two different occasions (Day 1: XT1, YT1; Day 3: XT2 and YT2). The comments were then coded by X on three separate occasions (Day 4: XC1; Day 5: XC2; and Day 8: XC3) and by Y on only one occasion (Day 4: YC1). Reliability for the transcript analysis was assessed by identifying the number of agreements and disagreements between the two records being compared. Reliability for the coding of comments was analyzed using software which produced agreement coefficients between the data being compared.

Analysis of the Transcript

Analysis of the transcript revealed different numbers of comments between the observers and between days (XT1=142, XT2=146, YT1=104, YT2=104). Intra-rater reliability for each observer (XT1-XT2; YT1-YT2) revealed 5 discrepancies for X and 10 discrepancies for Y. These discrepancies did not affect the interpretation of the coaching session for the observer. While the increased consistency for X would suggest a learning effect (X had previous experience with transcript analysis), the low number of discrepancies would suggest reliability in the application of the operational definitions for both observers. Inter-rater reliability analysis (XT1-YT1; XT2-YT2), however, produced 24 and 27 discrepancies respectively. (Note that a discrepancy can account for a difference of more than one comment.) Many of these discrepancies resulted from Y frequently combining comments that X had viewed as distinct. Further analysis showed the instances of disagreement to be similar across the two analyses, indicating reliable individual interpretation of what constituted a comment. This allowed instances of inter-rater discrepancies to be investigated and the different interpretations of the operational definition to be addressed.

First, X split several long coaching interventions into a number of separate comments. For example, the intervention "OK, and relax" / coaching comment / "We'll change Gary over in a second" / coaching comment / "off you go" was interpreted as three organizational comments interspersed with two main comments. Y interpreted such organizational comments as incidental to the intervention and not worthy of separation, which seemed to better reflect the coach's intent. Second, Y failed to recognize changes in "topic," for example, a reinstruction to an evaluation ("One touch, pass it in...That's a better ball"), or changes in target audience, for example, one player to another ("Play it back towards him... First touch, now get it in the area"). This latter point might, in part, be explained by Y's unfamiliarity with the players performing the drill.

Coding of Comments

Intra-rater results exceeded 80% in all dimensions (Table 4). X, who had coded previous coaching practices using the CAI, achieved reliability measures ranging from 86% to 97%. These results support the suggestion that an individual can attain consistent observation once the definitions are rationalized. The higher results associated with codings 2 and 3 support the contention that reliability of coding increases with exposure to the instrument and definitions. Learning (and thus training) is clearly an important component of the observation process. Inter-rater results exceeded 80% in 5 of the 8 dimensions (Table 5), but failed to reach the criterion for the Method, Emphasis and Intent dimensions, with values of 75%, 75%, and 62% respectively. These results indicate problems with observer validity. While the speed at which athletic performance and coaching comments arise can sometimes make coding difficult, this alone cannot account for the poor validity, since such difficulties did not manifest in the intra-rater measures. The low inter-rater values therefore indicate that the specific definitions of the instrument are ambiguous and open to personal and situational interpretation.

Low validity in the dimension Method stemmed from the interpretation of when an athlete's performance was complete. For example, when one player completed his performance and a coaching comment was directed to him, ambiguity arose regarding whether this comment was made "during" (as the drill progressed) or "post" (after performance was completed). It was apparent from analysis of the data files that difficulties in the dimension Emphasis were encountered when the coach made reference to the "key factors" without using the words of the lesson outline. However, there were also instances where, despite the words of the lesson outline being used, disagreement still occurred. The lack of objectivity cannot, then, be explained by the key factors definition alone. A lack of concentration or diligence on the part of the observers, however, might explain the low validity. Reliability of 62% in the dimension Intent could, in part, be attributed to the subjective nature with which such evaluation occurs. Factors such as the comment's nature, timing and affect will influence what is perceived as motivational from one observer to another. Inspection of the data isolated the repetitious use of nonspecific comments such as "Good job" or "Well done" as causing much of the disagreement. Finally, observer familiarity with the individuals involved in the practice may well have impacted the coding process. Changes in direction, method and emphasis can be missed or misinterpreted if the coaching comment does not clearly identify the player(s) to whom it is directed. X's familiarity with the players helped him recognize subtle changes in the direction of certain comments which, in turn, led to alternative coding in subsequent dimensions. Note that a difference in coding in a higher dimension can have a concomitant effect on the lower dimensions.

Revisions

Disagreement between observers in transcript analysis and the failure of the inter-rater coefficients to meet the criterion necessitated a revision of the CAI. Hawkins and Dotson (1975) indicated that sources of error in obtaining accurate data through systematic observation can be attributed to the definition of behavior, the detection of behavior, and/or the competence of the observer. Analysis of the observers' transcripts and data files suggested that the low observer validity was primarily due to poor definitions of behavior. Failures to detect behavioral change, together with questions of observer competence, further reinforce the need for a training program in both observation and the use of the CAI.

Observer disagreement in the analysis of the transcript was considered to be due to a lack of clarity in the operational definition of a comment. Y failed, on a number of occasions, to recognize changes in direction, focus and method that should have signaled the beginning of a new comment. The operational definition was thus revised to more clearly identify those cues which distinguish individual comments. Different interpretations of what constituted an organizational comment were also addressed. Y regarded certain "organizational" comments as incidental to the main intervention comments and not worthy of analysis as individual comments. The "organization" definition was changed to reflect this interpretation.

Ambiguity in several descriptor definitions may have caused some of the observer disagreement when coding comments. The dimension Method was revised to clearly indicate when an athlete's role within a drill was complete, so that the method of the comment could be accurately coded. Some definitions were also revised to better reflect the coach's behavior and intentions. For example, the definition of individual(s) was revised to include any recognizable subset of athletes performing a similar function within a drill. Such interventions (conscious efforts to isolate particular performances) were previously coded as "group" and so failed to acknowledge this more personal coaching behavior. A further revision changed the code "reinstruct" to "instruct" to allow the introduction of new coaching information during a drill to be coded. A follow-up study was undertaken after making the listed revisions to reassess the intra- and inter-rater reliability.

X and Y were provided copies of the new definitions and repeated the data collection for the same coaching session. Y was given a cover page accompanying the new definitions indicating the nature of the changes that had been made. Each observer analyzed the transcript once before coding a "master" transcript prepared by X. No feedback regarding this task was given to either observer, as the purpose was simply to expose them to the new definitions. X and Y were then provided with a videotaped record, transcript, and practice plan outline of a different soccer coaching practice. The data collection process as previously detailed was then repeated and analyzed.

Results

Analysis of the Transcript Revisited

Analysis of the transcript once more revealed different numbers of comments between the observers and between days (XT1=63, XT2=69, YT1=74, YT2=71). Intra-rater reliability for each observer (XT1-XT2; YT1-YT2) now revealed 4 discrepancies for X and 12 discrepancies for Y. Inter-rater reliability analysis (XT1-YT1; XT2-YT2) was again less satisfactory with 18 and 23 discrepancies respectively. (The master transcript from which the comments were coded contained 69 comments.) (Note that a discrepancy can account for a difference of more than one comment.)

The intra-rater reliability results for X supported the earlier finding that repeated coding by the same observer can be highly reliable. X's involvement in the revision process appeared to provide sufficient understanding of the new definitions to allow clear and consistent interpretation. Y, however, was less consistent than in the earlier study, a disappointing finding which can be explained on two counts. First, Y failed to recognize many changes in focus in the earlier study and, despite increasing his overall recognition in the later study, did not manage to do so in a consistent fashion. Second, Y had previously interpreted certain "organizational" comments (e.g., intervention instructions and repositioning of players) as incidental to a main coaching comment - an interpretation which prompted the revision to the definition of "organization" - yet this very change caused subsequent interventions to be inconsistently interpreted. These differences were also observed in the inter-rater measures: 8 disagreements were attributed to X's interpretation of a change in focus and 6 arose when X interpreted long interventions as a single comment. The speed at which some comments were made may also have been a contributory factor. The manner in which some comments ran together, almost without pause, made the recognition of changes in focus additionally problematic.

Coding of Comments Revisited

Intra-rater agreement well exceeded 80% in all dimensions (see Table 6). Two factors contributed to the increase in intra-observer reliability. First, as expected, the observer became more experienced in the use of the instrument and more adept at recognizing the appropriate descriptors to code each comment. Second, revisions made to specific definitions made coding easier and more apparent. Inter-observer reliability coefficients exceeded 80% in five dimensions (see Table 7), with the dimensions Emphasis and Intent once again falling below the 80% criterion. Low reliability in the dimension Emphasis is a particular cause of concern if we are to direct coaches to attend to the key factors of performance before making an appropriate intervention. The definition of "key factors" was not considered ambiguous; therefore greater observer attention to the lesson outline and its detailing of the key factors is warranted. Coding in the dimension Intent was, by far, the least reliable, although this result should be viewed with caution as it is based on a very small sample of possible comparisons (4) owing to the low number of non-skill comments in the coaching session. Nonetheless, it does indicate that the two observers had different perceptions of what would constitute an "affective" comment. These perceptions may be attributed to the characteristics of each observer; characteristics that will necessarily have been shaped by their own experiences in the coaching environment as a coach and/or player.

While objective coding of intent can be problematic, it is important that these data are collected. Studies by Tharp and Gallimore (1976), Lacy and Darst (1985), Claxton (1988), Segrave and Ciancio (1990), and Miller (1992) have all included measurement and analysis of comments in the affective domain. As a result, a database is emerging concerning the use of "praise," "scold," and "hustle" behaviors of successful coaches across a range of sports, age levels, and phases of season. Further, it is envisaged that the data gathered via the CAI will be used to promote collaborative discussion with the coach, rather than as a simple evaluative tool of coaching behavior. Differing perceptions of comments made in this domain can therefore be discussed in a cooperative forum and appropriate modifications in coaching behavior determined.

The most disappointing feature of the inter-observer results was the decrease in agreement in the Focus dimension (84% to 71%), particularly as the revisions had produced an increase in intra-observer agreement in the same dimension. Examination of the data showed three general sources of disagreement. Observer "competence" contributed to six instances of disagreement; detailed viewing of these instances revealed that each comment did in fact fall into one specific descriptor definition. Five instances of contention arose over coding the comment as either a "correct" evaluation of performance or as a "non-specific" comment. For example, the comment "Yes, that's it," with no informational content, would, by definition, be coded as "non-specific." However, because the comment was so clearly linked to its preceding instructional comment, its coding as a "correct" skill related comment (at that level) could be viewed as more reflective of the coach's intentions, although affording observers the discretion to interpret this feature naturally increases the likelihood of disagreement between them. The unique and dynamic nature of the coaching environment thus makes some disagreement inevitable.

The emergence of a third source of disagreement between observers caused the inter-rater reliability for Focus to fall below 80%. Y chose to code particular comments as "effort" in eight instances. Such comments included "Stay tight, now win," and "Be patient, patient." These comments had been coded as "instruct" by X because they referred to how the skill should be performed rather than to the intensity of the athletes' work. Examination of these comments showed that, while the coach often increased his volume and intonation, the nature of the comment remained an instruction.

An encouraging feature of the inter-rater results was the increase in agreement in the dimensions Direction (84% to 96%) and Method (75% to 82%). The increase in Direction seemed to be a direct and positive result of the revisions made to the definitions of "individual(s)" and "group(s)." Comments made to subgroups, for example players performing a similar function (e.g., "wingers," "feeders"), were much easier to isolate and identify within the context of the drill. Similarly, increased reliability in Method could be attributed to the revision of the "post" definition. It is now clear that once an athlete has completed their performance within the drill, any comment made to them thereafter is "post," regardless of how the drill proceeds.

Discussion

The findings of this study clearly indicate that, with limited exposure to the CAI, an observer can reliably collect and analyze data on a coach's verbal behavior. This is an encouraging finding, as one of the proposed uses of the instrument is for "novice" coaches to collect data on their own performance in order to analyze and, subsequently, improve their own coaching performance. However, the objectivity of the data collection process remains a concern because, despite revisions, there were still instances of disagreement between observers in analysis of the transcript and in the coding process. These problems are primarily both definitional (different interpretations of the same event) and functional (differences in familiarity with the collection process and software interface). Training in observational procedures and use of the CAI would be expected to eradicate many of the problems reported in this study.

A major source of disagreement and confusion in the transcript analysis was single coaching interventions that included both incidental "organization" instructions and coaching (technical/tactical) information. It was decided that these interventions should be coded as one coaching comment and that the organizational information (e.g., intervention instructions and the repositioning of players), while appropriate in the context of the drill, be ignored in terms of coding. However, it is clear that the resultant modification (a change in the definition of "organization"), while correct in nature, should have been applied to the operational definition of a comment. Had this been the case, the identification of a single comment would have been more clearly defined and consequently more reliable. This modification, which has since been attended to, would also have preempted many of the problems encountered in subsequent codings. The problems in the dimensions where inter-rater reliability measures for coding failed to reach criterion (Focus, Emphasis, and Intent) lie not with the descriptor definitions but with the ability of the observers to accurately observe and detect coaching behaviors. Many instances of disagreement can be explained only as either observer errors or an inability to reliably detect definable coaching behavior.

We feel that this problem can be attended to by the creation of a videotape program designed to train the observational skills of the observer. This videotape will expose proposed users of the CAI to examples that they may encounter in the transcript analysis and coding processes. Franks and Miller (1991) found that a training regimen designed to improve the observation of soccer coaches did effect an improvement in a coach's ability to recall critical events of team play. Providing examples from the coaching setting, complete with explanations of the appropriate separation and coding, would be expected to enhance observer competence and the ability to detect coaching behavior. The videotape, while providing examples across a range of coaching behaviors representative of a typical practice situation, will also focus on the problematic areas identified in this study. Once the training program is developed, our intention is to conduct a more rigorous study to test the reliability of the CAI. Once acceptable reliability and validity are attained, it is envisaged that the CAI will prove a useful and practical tool for improving coaching effectiveness.

References

Briggs, J.D. (1991). The physical education systematic observation program (PE-SOP): An application oriented, computerized systematic observation program. The Physical Educator. Fall, 151-156.

Carlson, B.R., & McKenzie, T.L. (1984). Computer technology for recording, storing, and analyzing temporal data in physical activity settings. Journal of Teaching in Physical Education, 4, 24-29.

Claxton, B. (1988). A systematic observation of more and less successful high school tennis coaches. Journal of Teaching in Physical Education, 7, 302-310.

Darst, P.W., Zakrajsek, D.B., & Mancini, V.H. (1989). Analyzing physical education and sport instruction (2nd ed.). Champaign, IL: Human Kinetics.

Franks, I.M., Johnson, R.B., & Sinclair, G.D. (1988). The development of a computerized coaching analysis system for recording behavior in sporting environments. Journal of Teaching in Physical Education, 8, 23-32.

Franks, I.M., & Miller, G. (1991). Training coaches to observe and remember. Journal of Sport Sciences, 9, 285-297.

Franks, I.M., & Nagelkerke, P. (1988). The use of computer interactive video technology in sport analysis. Ergonomics, 31, 1593-1603.

Hawkins, R.P., & Dotson V.A (1975). Reliability scores that delude: An Alice in Wonderland trip through the misleading characteristics of interobserver agreement scores for interval recording. In E. Ramp & G. Semb (Eds.), Behavior Analysis: Areas of Research and Application. (pp. 339-376). Englewood Cliffs, N.J.: Prentice-Hall.

Hawkins, A.H., & Wiegand, R.L. (1989). West Virginia University teaching evaluation system and feedback taxonomy. In P.W. Darst, D.B Zakrajsek, V.H. Mancini (Eds.), Analyzing physical education and sport instruction, (pp. 277-293). Champaign, IL: Human Kinetics.

Johnson, R.B., & Franks, I.M. (1991). Measuring the reliability of a computer aided systematic observation instrument. Canadian Journal of Sport Science, 16, 45-57.

Lacy, A.C., & Darst, P.W. (1985). Systematic observation of winning football coaches. Journal of Teaching in Physical Education, 4, 256-270.

Lee, A.M., Keh, N.C., & Magill, R.A. (1993). Instructional effects of teacher feedback in physical education. Journal of Teaching in Physical Education, 12, 228-243.

Markland, R., & Martinek, T.J. (1988). Descriptive analysis of augmented feedback given to high school varsity female volleyball players. Journal of Teaching in Physical Education, 1, 289-301.

Miller, A.W. (1992). Systematic observation behavior similarities of various youth sport soccer coaches. The Physical Educator, 49, 136-143.

Partridge, D., & Franks, I.M. (1996). Analyzing and modifying coaching behaviors by means of computer aided observation. The Physical Educator, 53, 8-23.

Rushall, B.S. (1977). Two observational schedules for sporting and physical education environments. Canadian Journal of Applied Sport Sciences, 2, 15-21.

Schmidt, R.A. (1988). Motor control and learning: A behavioral emphasis (2nd ed.). Champaign, IL: Human Kinetics.

Segrave, J.O., & Ciancio, C.A. (1990). An observational study of a successful Pop Warner Football Coach. Journal of Teaching in Physical Education, 9, 294-306.

Siedentop, D. (1976). Physical education: Introductory analysis (2nd ed.). Dubuque, IA: Wm. C. Brown.

Siedentop, D. (1991). Developing teaching skills in physical education (3rd ed.). Mountain View, CA: Mayfield.

Tharp, R.G., & Gallimore, R. (1976). What a coach can teach a teacher. Psychology Today, 25, 75-78.
Table 1

Sample of CAI Analysis File Administration
Details and Comment Summary Data

Number of Drills Recorded                    3

Total Number of Comments                     108

Number of Skill Comments                     81 (75%)

Number of Non-Skill Comments                 27 (25%)

Number of Inappropriate Comments             16 (15%)

Number of Demonstrations                     8


Table 2

Analysis of Skill Comments
Comment Summary

NUMBER of Skill Comments                                  81 (75%)

DIRECTION of Skill Comments            Individuals        77 (95%)
                                       Groups              4 (5%)

FOCUS of Skill Comments                Instruction        26 (32%)
                                       Correct            34 (42%)
                                       Incorrect          21 (26%)

TIMING of Comment Delivery             During             24 (30%)
                                       Post               49 (60%)
                                       Stopped             8 (10%)

REFERENCE to Key Factors               Key Factors        49 (60%)
                                       Non-Key Factors    32 (40%)

Number of INAPPROPRIATE Comments                          10 (12%)

Number of DEMONSTRATIONS                                   8


Table 3

[Tabular data for Table 3 omitted from source.]


Table 4

Intra-observer (X) reliability (percentage agreement)

                     XC1-XC2      XC2-XC3      XC1-XC3      Mean
Dimension              (%)          (%)          (%)         (%)

Direction               94           98           95          96
Focus                   92           91           86          90
Focus Skill             96           98           96          97
Method                  90           93           90          91
Delivery                96           98           96          97
Emphasis                88           93           88          90
Focus Non-Skill         96           98           98          97
Intent                  90           82           87          86
Table 5

Inter-observer (X and Y) reliability (percentage agreement)

                     XC1-YC1      XC2-YC1      XC3-YC1      Mean
Dimension              (%)          (%)          (%)         (%)

Direction               82           85           86          84
Focus                   82           84           86          84
Focus Skill             83           83           83          83
Method                  74           76           75          75
Delivery                94           92           93          93
Emphasis                71           77           77          75
Focus Non-Skill         93           98           95          95
Intent                  60           55           70          62


Table 6

Intra-observer (X) reliability (percentage agreement) revisited

                     XC1-XC2      XC2-XC3      XC1-XC3      Mean
Dimension              (%)          (%)          (%)         (%)

Direction               99           94           96          96
Focus                   96           99           97          97
Focus Skill             95           93           91          93
Method                  96           98           95          96
Delivery               100          100          100         100
Emphasis                96           91           91          93
Focus Non-Skill        100          100          100         100
Intent                 100          100          100         100


Table 7

Inter-observer (X and Y) reliability (percentage agreement)
revisited

                     XC1-YC1      XC2-YC1      XC3-YC1      Mean
Dimension              (%)          (%)          (%)         (%)

Direction               97           96           96          96
Focus                   71           70           71          71
Focus Skill             80           82           88          83
Method                  83           82           82          82
Delivery                90           90           90          90
Emphasis                76           72           80          76
Focus Non-Skill        100          100          100         100
Intent                 100           50           50          50