Clinical significance of the Outcome Questionnaire (OQ-45.2).
Subject: Mental health
Psychotherapy
Medical research
Medicine, Experimental
Authors: Beckstead, D. Joel
Hatch, Arlin L.
Lambert, Michael J.
Eggett, Dennis L.
Goates, Melissa K.
Vermeersch, David A.
Pub Date: 01/01/2003
Publication: Name: The Behavior Analyst Today Publisher: Behavior Analyst Online Audience: Academic Format: Magazine/Journal Subject: Psychology and mental health Copyright: COPYRIGHT 2003 Behavior Analyst Online ISSN: 1539-4352
Issue: Date: Wntr, 2003 Source Volume: 4 Source Issue: 1
Product: Product Code: 8000200 Medical Research; 9105220 Health Research Programs; 8000240 Epilepsy & Muscle Disease R&D NAICS Code: 54171 Research and Development in the Physical, Engineering, and Life Sciences; 92312 Administration of Public Health Programs SIC Code: 8730 Research and Testing Services
Accession Number: 170020110
Full Text: The Outcome Questionnaire-45.2 (OQ-45.2) is purported to measure important areas of functioning (symptoms, interpersonal problems, social role functioning, and quality of life) that are of central interest in mental health. In recent years research employing the OQ-45.2 has focused on tracking patient change over time and indicating if and when patients return to a normal state of functioning as proposed by criteria for clinically significant change. This study examined the OQ-45.2 cut-off scores for clinical significance by comparing concordance rates with cut-off scores based on other measures of psychotherapy outcome. Instruments measuring each area of functioning were administered to patients undergoing psychotherapy at the beginning and end of treatment. Each patient's degree of success was then classified by each instrument and differences between the measures were examined. The results provided evidence for the construct validity of the concept of clinical significance and the OQ-45.2 cut-off scores demarcating the boundaries for functional/dysfunctional samples. Correspondence between measure estimates for classifying patients as functional or dysfunctional averaged 85%. Estimates of agreement between the measures' classification of patients as meeting criteria for clinically significant change averaged 65%. Implications of these results were discussed in reference to use of the OQ-45.2 and the concept of clinical significance.

**********

Contemporary research focused on applied clinical questions often relies heavily on using operational definitions of meaningful change at the level of the individual patient. This research includes studies that explore the dose-effect relationship, i.e., the amount of therapy needed for recovery (e.g., Anderson & Lambert, 2001; Maling, Gurtman, & Howard, 1995). In such studies a definition of meaningful change (clinically significant change) allows researchers to estimate the number of sessions needed to meet such an event. In addition, studies aimed at improving the quality of services require operational definitions to judge a particular patient's treatment response and the need for additional services (Kordy, Hannover, & Richard, 2001; Lambert, Hansen, & Finch, 2001; Lueger et al., 2001). Finally, the call for presentation of the results of clinical trials research that include estimates of the practical consequences of treatment for individual patients is widespread (e.g., Barlow, 1981; Hugdahl & Ost, 1981; Kendall, 1999; Saunders, Howard, & Newman, 1988).

The most frequently used method for operationalizing clinical significance for the preceding research activities is that described by Jacobson and Truax (1991). They proposed a two-step criterion. The first step entails an evaluation of reliable change by calculating a Reliable Change Index (RCI). As defined by Jacobson and Truax, the RCI is obtained by subtracting a pre-treatment score from a post-treatment score and dividing by the standard error of the measurement (Christensen & Mendoza, 1986 (1); Jacobson, Follette, & Revenstorff, 1984). A particular change is considered to be reliable when it exceeds measurement error at the .05 level of confidence.

The second step consists of defining a cut-off point between functional and dysfunctional samples. This cut-off represents the point at which a person's score is more likely to fall in the distribution of scores characteristic of normal functioning. The use of this social comparison methodology has the advantage of referencing a client's state of functioning against peer functioning rather than demanding that the client be asymptomatic in order to be considered healthy (Kendall & Grove, 1988; Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999). When both the RCI and the normative group comparison criterion are met, the change is regarded as clinically significant according to the Jacobson method.

Despite widespread use of the Jacobson method (Ogles, Lunnen, & Bonesteel, 2001), little research has been conducted on its validity. A review of the literature revealed only two studies that examined the validity of classifying a client's change as clinically meaningful using the Jacobson method. Ankuta and Ables (1993) were the first to address this question by comparing clients who met Jacobson's criteria with their self-rated satisfaction with therapy. They found that those clients who met criteria for clinically significant change were more satisfied with treatment than those who did not meet criteria. Lunnen and Ogles (1998) expanded on the Ankuta and Ables evaluation by performing a multiperspective, multivariable analysis of the RCI component of Jacobson's method. They divided outpatients into three groups based on reliable change: Improvers, No-changers, or Deteriorators. Clients in all three groups rated their perceived change, satisfaction with treatment, and the helping alliance. Their spouses/significant others also rated perceived change and satisfaction with treatment, and their therapists rated perceived change and the helping alliance.

Results indicated that, from both client and therapist perspectives, perceived change and the alliance were significantly higher for those who showed reliable improvement than for those who were No-changers or Deteriorators. Satisfaction with services did not differ across groups. None of the measures distinguished Deteriorators from No-changers from any of the perspectives. They concluded that the RCI is an effective method of evaluating symptomatic improvement, but a less valid indicator of deterioration. Ogles et al. (2001) called for more research into the validity of the Jacobson method as a means of operationalizing the concept of meaningful change.

The current study was undertaken in response to the need for further research on the concept of clinically significant change. In particular, this study was designed to examine the correspondence between the way different outcome measures classify a client, before and after treatment, as being a member of the functional or dysfunctional distribution. Reliable change classification following psychotherapy was also examined. The reference measure for the current study was the Outcome Questionnaire (OQ-45; Lambert & Finch, 1999). This measure has been used in several outcome management studies (Lambert, Whipple et al., 2001; Lambert, Whipple, et al., 2002; Lambert, Whipple, Bishop, et al., 2002) that rely on the concept of clinically significant change for clinical decision making in quality improvement efforts. The OQ was designed to measure four areas considered essential and theoretically related to change: 1) levels of psychiatric symptoms, 2) performance in various roles and activities, 3) interpersonal functioning, and 4) levels of life satisfaction or quality of life. Four comparison measures were chosen based on: (1) their solid psychometric properties; (2) their widespread use and presence in psychological literature (Froyd, Lambert, & Froyd, 1996), and; (3) their ability to effectively tap domains related to those included in the OQ.

The following instruments were chosen for use in the present study: the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1983) for assessing symptomatic improvement; the Social Adjustment Rating Scale-Self Report/and Other Report (SAS-SR or OR; Weissman, Prusoff, Thompson, Harding, & Myers, 1978) for measuring social role performance; the Inventory of Interpersonal Problems-Short Form (IIP-S; Hansen, Umphress, & Lambert, 1998; Horowitz, Rosenberg, Baer, Ureno, & Villasenor, 1988) for assessing interpersonal problems; and the Quality of Life Inventory (QOLI; Frisch, 1988) for assessing life satisfaction. Additionally, the Client Satisfaction Questionnaire-8 (CSQ-8; Larson, Attkisson, Hargreaves, & Nguyen, 1979) was administered to assess the general level of satisfaction with psychotherapeutic services received.

It was hypothesized that there would be statistically significant concordance between the classification of each individual based on the OQ-45 functional/dysfunctional cut-off, and cut-offs derived from each of the four measures. It was further hypothesized that each client's classification with regard to Jacobson's two-step criteria of clinically significant change would also be statistically significantly concordant. Finally, it was hypothesized that clients whose change was classified as clinically significant would be more satisfied with treatment than no-changers and deteriorators.

Methods

Participants

Clients. Clients for the study were drawn from individuals requesting psychotherapy services at a non-profit university training clinic (TC) and a university student-counseling center (CC). Given that the TC was a training clinic for graduate students, the following diagnostic categories are screened out to the extent possible, and are referred to licensed practitioners in the community: 1) acute psychotic disorders, 2) severe eating disorders, 3) primary serious drug (including alcohol) related disorders that might be treated in hospital or day treatment programs, and 4) immediate, high risk for suicide. The TC generally receives referrals from individuals in the community who are unable to afford psychotherapy services due to financial or insurance reasons. Consequently, many of the clients tend to come from lower income families. The TC serves a region of about 270,000 residents.

Unlike the TC, the CC has no restrictions on the type of diagnosis treated. Clientele at the CC are comprised of students enrolled at a large western university. All types of psychological services are offered at the CC, which serves a university population of about 35,000 students. Of the 86 adults who participated in the study, 51 (59%) were female, and 76 (88%) were Caucasian. Their average age was 28.9 years and their average education was 15.9 years.

Therapists. Graduate students who were supervised by licensed clinicians provided the therapy services at TC. The TC student therapists were enrolled in clinical psychology (doctoral), social work (masters), and marriage and family therapy (masters) programs. The CC therapists were licensed psychologists, supervised interns or graduate students (enrolled in clinical and counseling psychology doctoral programs).

Measures

Outcome Questionnaire-45.2. The OQ-45.2 is a 45-item self-report scale designed to track and measure client progress in psychotherapy. The scale is designed specifically with the purpose of being repeatedly administered (e.g., either pre- and post-treatment, or after every psychotherapy session), providing the psychotherapist with an assessment of progress, deterioration, or no change. The items address common symptoms and problems (mostly depressive and anxiety-based) that occur across the most frequently occurring psychiatric disorders. Each item is rated using a 5-point Likert scale (0 = never, 1 = rarely, 2 = sometimes, 3 = frequently, 4 = always), yielding a range of possible total scores from 0 to 180. The OQ-45.2 provides a total score and three subscale scores. The three subscales are operationalizations of the three aspects of a client's life functioning--social role, symptom distress, and interpersonal relationships. Nine of the items measure the presence of positive mental states evenly divided across the three subscales. In this study, the Total Score (as opposed to the subscales) was utilized to estimate clinical significance. Lambert et al. (1996) reported the 3-week test-retest reliability for the total score to be .84. Additionally, internal consistency values were found to be high ([alpha] = .93). Concurrent validity was estimated by correlating the OQ-45.2 Total Score with corresponding total scores on the Symptom Checklist 90-Revised (SCL-90-R; Derogatis, 1983), Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), Zung Self-Rating Anxiety Scale (ZAS; Zung, 1971), Zung Self-Rating Depression Scale (ZSRDS; Zung, 1965), Taylor Manifest Anxiety Scale (TMA; Taylor, 1953), State-Trait Anxiety Inventory (STAI; Spielberger, 1983; Spielberger, Gorsuch, & Lushene, 1970), Inventory of Interpersonal Problems (IIP; Horowitz et al., 1988), and the Social Adjustment Scale (SAS; Weissman & Bothwell, 1976). The concurrent validity coefficients for the total score were significant at the .01 level (ranging from .55 to .85). Sensitivity to change of the OQ-45.2 has been reported by Vermeersch, Lambert, and Burlingame (2000).

Symptom Checklist-90-R. The SCL-90-R is a 90-item self-report measure that is intended to measure current psychiatric symptom status. The SCL-90-R can be administered at pre- and post-treatment in order for the clinician to monitor changes in symptoms. Client disturbance is measured along nine primary symptom dimensions: somatization, obsessive-compulsive, interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, and psychoticism. The SCL-90-R provides the clinician with three global indices of distress. The Global Severity Index (GSI) was the primary index score used in this study and is recognized as the best indicator of pathological symptom disturbance on the SCL-90-R (Ogles, Lambert, & Masters, 1996). The items are based on a 5-point Likert scale (0 = not at all, 1 = a little bit, 2 = moderately, 3 = quite a bit, 4 = extremely).

The SCL-90-R has been demonstrated to have excellent psychometric properties. Derogatis (1983) reported high test-retest reliability with coefficients ranging between .78 and .90. Additionally, Derogatis, Rickels, and Rock (1976) found internal consistency estimates to be satisfactorily high, with alphas ranging from .77 to .90. The ability to measure group change with the SCL-90-R has been extensively studied and reported in numerous settings and with diverse populations (e.g., Baum, Gatchel, & Schaeffer, 1983; Peltz & Merskey, 1982). Research suggests the SCL-90-R has strong concurrent validity (Peveler & Fairburn, 1990).

Social Adjustment Rating Scale-Self Report/Other Report. The SAS-SR is a 54 item self-report questionnaire that measures performance over the past two weeks in six areas of functioning: work; domestic or academic responsibilities; social and leisure activities; relationship with extended family; marital role as a spouse; parental role; and membership in the family unit (Weissman et al., 1978). Questions on this measure generally fall into four domains: client performance at expected tasks, the amount of interpersonal discord, elements of interpersonal relationships, and personal feelings (e.g., feeling shame, upset, worry, or discomfort while fulfilling roles) and satisfactions (Weissman et al., 1978). Each item is rated on a five-point scale, with the higher score being indicative of greater distress or impairment. The scale provides the clinician with a mean score for each of the areas of functioning discussed above.

The SAS-SR has been found to have good psychometric properties. Fischer and Corcoran (1994) found the SAS-SR to have adequate test-retest reliability (r = .72). Additionally, this same report suggests that the SAS-SR has fair internal consistency ([alpha] = .74). There is also evidence to suggest that the SAS-SR has adequate concurrent validity (Weissman et al., 1978). The SAS-Other Report is much less used and its properties are virtually unknown.

Inventory of Interpersonal Problems-Short Form. The IIP-S (Hansen et al., 1998) is an 18-item self-report measure designed to assess stress and difficulty arising from interpersonal relationships and problems. The IIP-S is a shortened version of the 127-item IIP (Horowitz et al., 1988), designed to maintain the original purpose of the longer instrument. The items are rated on a five-point scale (0 = not at all, 1 = a little bit, 2 = moderately, 3 = quite a bit, 4 = extremely), yielding a range of possible scores from 0 to 72. The summed total score is a global score of interpersonal distress or dysfunction.

The IIP-S has adequate psychometric properties (Hansen et al., 1998). The test-retest reliability coefficient (over a three-week period) was .68, significant at the .01 level. Additionally, high alpha coefficients were obtained for the internal consistency of the test, ranging from .88 to .90. Concurrent validity estimates were obtained by correlating the IIP-S with both the IIP and the SCL-90-R. Results indicated that the IIP-S has concurrent validity coefficients comparable to the longer version of the IIP. Additionally, it was reported that the IIP-S showed excellent ability to discriminate between symptomatic and asymptomatic states, suggesting the measure could discriminate between various interpersonal states.

Quality of Life Inventory. The QOLI is a 34-item self-report questionnaire designed to assess general life satisfaction. The scale measures 17 areas of life satisfaction that were empirically derived from previous research on this topic (Frisch, Cornell, Villanueva, & Retzlaff, 1992). Clients rate each of the seventeen areas as to how important it is to their overall happiness (0 = not important, 1 = important, 2 = very important). Current levels of satisfaction are then rated for each life area (-3 = very dissatisfied to 3 = very satisfied).

The QOLI has been demonstrated to have excellent psychometric properties. Frisch et al. (1992) reported that the QOLI has high test-retest reliability with coefficients of .91 and .80. High internal consistency was also reported, with Cronbach's coefficient alphas ranging from .77 to .89. The same study reported strong evidence for the instrument's convergent validity, given that QOLI scores "correlated significantly with seven measures of subjective well-being and life satisfaction, which included five widely used self-report measures, a peer rating measure, and a measure consisting of a clinical rating of interviews" (p. 96).

Additionally, evidence for nomological validity was offered, in that the QOLI "correlates, in the expected ways, with other psychological constructs of theoretical importance to the concept of life satisfaction" (p. 96). Specifically, the QOLI was found to positively correlate with measures of happiness, life satisfaction, and general self-efficacy, and negatively correlate with measures of anxiety, depression, and general psychopathology. Preliminary indications suggest that the QOLI is sensitive to change, but the samples involved were small, and more research is needed.

Client Satisfaction Questionnaire-8. The CSQ-8 is an 8-item self-report questionnaire that measures post-treatment client satisfaction with therapy. The CSQ-8 is the shortened form of a 31-item post-service questionnaire (CSQ-31). The original scale was designed to assess the following nine areas of client satisfaction: physical surroundings, support staff, type of service, treatment staff, quality of service, quantity of service (amount or length), outcome of service, general satisfaction, and procedures. From the original 31 items, 8 items were selected that had the highest loadings on the general satisfaction factor. The client rates each item on a 4-point scale (1 = poor, 2 = fair, 3 = good, 4 = excellent), yielding a global score ranging from 8 to 32. This global score is then compared to the appropriate norm group score.

The CSQ-8 has been demonstrated to have excellent psychometric properties. Larson et al. (1979) reported that the instrument had excellent internal consistency with alphas ranging from .83 to .93. The construct validity of the CSQ-8 has also been established. Research has shown statistically significant correlations between the CSQ-8 and other instruments of client satisfaction (Larson et al., 1979).

Procedures

Prospective participants from the TC were contacted and informed that the study involved both a pre- and a post-test. Individuals agreeing to participate were scheduled to meet with an undergraduate research assistant who had been trained in the test administration and protocol of the instruments in the battery. A similar protocol was followed at the CC, with the exception of how clients were invited to participate in the study. At the CC, information explaining the study and inviting clients to participate was included in client intake packets.

In the pre-test assessment battery, the following tests were included: Outcome Questionnaire 45.2, Symptom Checklist 90-R, Quality of Life Inventory, Inventory of Interpersonal Problems, Social Adjustment Rating Scale-Self Report, and the Social Adjustment Rating Scale-Significant Other. The Social Adjustment Rating Scale-Significant Other (SAS-SO) contains the same questions as the SAS-SR, but is filled out by a significant other. Upon completion of these tests, the client was given an addressed envelope with the instructions that the questionnaire, which would be completed by a significant other, should be returned in the envelope as soon as it had been completed.

A research assistant kept track of when each of the clients attended therapy. As soon as participants terminated therapy or did not attend for a two-week period regardless of the reason, they were contacted and arrangements were made to complete the post-testing. This post-test protocol was followed to increase the likelihood that participants would comply with post-testing, and to maximize the likelihood of observing changes that were made in therapy. Those clients who remained in therapy were asked to complete the post-test battery following their tenth session. The tenth session was selected for post-test given that research suggests that a substantial number of clients have made clinically significant or reliable change by this point in therapy (Anderson & Lambert, 2001). The post-test battery included the same assessments that were given at pre-testing, with the addition of the Client Satisfaction Questionnaire-8.

Establishment of Cutoffs and RCIs.

Prior to beginning the data collection portion of the study, published normative data were gathered for all of the tests used in the assessment battery. These normative data were used to calculate cutoff scores and reliable change indices for each measure. The establishment of the cutoff score was done following the clinical significance methodology outlined by Jacobson and Truax (1991), using the following formula:

$$c = \frac{(SD_1)(\text{mean}_2) + (SD_2)(\text{mean}_1)}{SD_1 + SD_2}$$
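The cutoff formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cutoff_score` and the norm values in the example are hypothetical and are not taken from the published norms used in the study.

```python
def cutoff_score(sd1, mean1, sd2, mean2):
    """Cutoff c between two normative distributions.

    Each group's mean is weighted by the OTHER group's standard
    deviation, as in the formula attributed to Jacobson and Truax (1991).
    """
    return (sd1 * mean2 + sd2 * mean1) / (sd1 + sd2)


# Hypothetical norms, chosen only so the arithmetic is easy to follow:
# group 1 (clinical):  mean 80, SD 20
# group 2 (community): mean 50, SD 10
c = cutoff_score(sd1=20.0, mean1=80.0, sd2=10.0, mean2=50.0)
print(c)  # (20*50 + 10*80) / 30 = 60.0
```

When the two standard deviations are equal, the cutoff reduces to the simple midpoint between the two means; when they differ, the cutoff shifts toward the mean of the group with the smaller spread.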

The establishment of the cutoff score allowed each of the participant's tests to be classified as either being in the functional population or the dysfunctional population. This classification suggested that a given participant obtained a standardized score on each assessment that was either more similar to those who appeared to be in the normal range or the clinical range on that particular test. The reliable change indices (RCI) were also calculated prior to data collection. Using the criteria established by Jacobson and Truax (1991), the following formula was utilized:

$$\text{RCI} = \frac{(\text{pre-test}) - (\text{post-test})}{S_{\text{diff}}} = X$$

If X (from the above RCI formula) is greater than 1.96, this suggests reliable change at the .05 alpha level of confidence.

The standard error of difference was computed by using the internal consistency value of the particular test and a pooled standard deviation, which resulted in an estimate of the test's standard error of measurement. Following procedures recommended by Jacobson and Truax (1991), estimates to complete the RCI and normative cutoff score were obtained from available published norms rather than the participants in the current investigation.
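A minimal Python sketch of the reliable change computation follows. It assumes the usual Jacobson and Truax construction, in which the standard error of measurement is SD * sqrt(1 - reliability) and the standard error of difference is sqrt(2 * SE^2). The function names and the numeric inputs are hypothetical; only the 1.96 threshold and the pre-minus-post direction come from the text.

```python
import math


def standard_error_of_difference(sd, reliability):
    """S_diff from a (pooled) standard deviation and a reliability estimate."""
    se_measurement = sd * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0 * se_measurement ** 2)


def reliable_change_index(pre, post, sd, reliability):
    """RCI = (pre-test - post-test) / S_diff, so improvement on a
    distress scale (a lower post-test score) gives a positive value."""
    return (pre - post) / standard_error_of_difference(sd, reliability)


# Hypothetical inputs: SD = 20, internal consistency = .93
s_diff = standard_error_of_difference(sd=20.0, reliability=0.93)
x = reliable_change_index(pre=85.0, post=65.0, sd=20.0, reliability=0.93)

print(round(s_diff, 3))  # about 7.483
print(round(x, 3))       # about 2.673
print(x > 1.96)          # True: the change exceeds the .05 criterion
```

Multiplying S_diff by 1.96 gives the smallest raw-score change that counts as reliable, which is a convenient way to express the criterion in the scale's own units.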

For the OQ-45.2, the SCL-90-R, and the IIP-S, there existed previous research in which the RCIs and cutoff scores had been calculated. For the OQ-45.2, community samples and outpatient clinic samples were used in the calculation (Lambert et al., 1996). The SCL-90-R RCI and cutoff were calculated using a moderately symptomatic sample and community samples (Tingey, Lambert, Burlingame, & Hansen, 1996). The IIP-S RCI and cutoff were calculated using a community sample, and a sample from an outpatient clinic (Hansen, et al., 1998). The QOLI RCI and cutoff scores were tabulated from a counseling center sample and from a general undergraduate population (Frisch et al., 1992). The SAS-SR and the SAS-SO RCIs and cutoffs were both computed from the same normative data gathered on the SAS-SR, which was comprised of acute depressives and a community sample.

Statistical Analyses

Prior to the statistical analyses, each of the test scores (OQ-45.2, SCL-90-R, IIP-S, SAS-SR, SAS-SO, QOLI) was transformed to a z-score with a cutoff score of 0 and clinically significant change score of 1. This was done by taking the score on the test, subtracting the cutoff score and dividing by the clinical change score (i.e., RCI). Transformations thereby created an average cutoff score of 0 and a clinically significant change score of 1 for each test. For convenience in interpreting the data, each score was multiplied by ten and added to 100. This created a cutoff score of 100 and a clinically significant change score of 10, allowing each of the measures to be analyzed and interpreted on the same scale.

$$\frac{(x - c)(10)}{\text{RCI}} + 100$$
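The rescaling can be sketched as follows. The function name `to_common_scale` is hypothetical; the example uses the OQ-45.2 values reported later in the article (a cutoff of 63 and a reliable change of 14 points), and the raw scores are made up for illustration.

```python
def to_common_scale(x, cutoff, change_score):
    """Rescale a raw score so the cutoff maps to 100 and one
    reliable-change unit corresponds to 10 points."""
    return (x - cutoff) * 10.0 / change_score + 100.0


# OQ-45.2 values from the article: cutoff = 63, reliable change = 14
print(to_common_scale(63, cutoff=63, change_score=14))  # 100.0 (at the cutoff)
print(to_common_scale(77, cutoff=63, change_score=14))  # 110.0 (one change unit above)
print(to_common_scale(49, cutoff=63, change_score=14))  # 90.0  (one change unit below)
```

Because every measure is centered on its own cutoff and divided by its own change score, a 10-point difference on the transformed scale means the same thing (one reliable-change unit) regardless of the instrument.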

Intra-class correlation coefficients were calculated to estimate the reliability of measures across participants. The intra-class correlation was calculated using the formula:

$$\rho = \frac{\rho^2_{\tau}}{\rho^2_{E} + \rho^2_{\tau}}$$

Traditionally, coefficients higher than .70 have indicated that there is adequate inter-rater (in this case measure) reliability (Howell, 1997).
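One way to estimate such a coefficient is sketched below. The article does not say which intra-class model or estimation procedure was used, so this is an assumption: a one-way random-effects estimate in which the between-client component is (MSB - MSW) / k and the error component is MSW. The function name and the data are hypothetical.

```python
def intraclass_correlation(scores):
    """One-way random-effects ICC.

    scores: list of rows, one row per client, one column per measure
    (all measures already on the same transformed scale).
    """
    n = len(scores)     # number of clients
    k = len(scores[0])  # number of measures ("raters")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Mean squares between clients and within clients
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    var_clients = (ms_between - ms_within) / k  # between-client component
    var_error = ms_within                       # error component
    return var_clients / (var_error + var_clients)


# Hypothetical scores for three clients on two measures
icc = intraclass_correlation([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(round(icc, 3))  # 0.882
```

The sketch does not guard against degenerate input (for example, all scores identical), and with noisy data the estimated between-client component can be negative.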

Three intra-class correlations were computed. The first two correlations estimated how highly the scores of the clients on the measures (raters) at pre- and post-test correlated. The third intra-class correlation estimated the correlation of the change scores on the different tests across clients, from pre- to post-test. In addition to the intra-class correlation, concordance rates were calculated. Each client's score was dichotomized into a "1" or a "0" on pre-test, post-test, and with reference to clinical significance. A "1" on pre- or post-test indicated that the client scored in the non-clinical range on the particular test, while a "0" indicated that the client scored in the clinical range. Similarly, a "1," with reference to clinical significance, indicated that the client changed in a clinically significant way (exceeded both criteria) on the test from pre- to post-test. A "0" indicated that clinical significance had not been reached. From each client's dichotomized scores on all of the tests, concordance (i.e., percent agreement) was calculated individually and collectively at pre- and post-test (in terms of whether clients were classified as resembling clinical or non-clinical populations) and based on whether clinically significant change was achieved.
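The concordance calculation described above can be sketched as simple percent agreement on the dichotomized scores. The function names and the 0/1 data are hypothetical and serve only to show the bookkeeping.

```python
def percent_agreement(reference, other):
    """Percentage of clients classified the same way by two measures."""
    matches = sum(1 for r, o in zip(reference, other) if r == o)
    return 100.0 * matches / len(reference)


def count_agreeing(reference_value, other_values):
    """How many comparison measures agree with the reference
    classification for a single client."""
    return sum(1 for v in other_values if v == reference_value)


# Hypothetical dichotomized pre-test scores for four clients
# (1 = non-clinical range, 0 = clinical range)
oq = [1, 0, 0, 1]
scl = [1, 0, 1, 1]
print(percent_agreement(oq, scl))  # 75.0

# One client: reference says 1; the five comparison measures say 1, 1, 0, 1, 0
print(count_agreeing(1, [1, 1, 0, 1, 0]))  # 3, i.e. "at least three of five" is met
```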

Ten chi-square tests for independence were performed (p < .05) comparing the OQ-45.2 with each of the other individual tests at pre- and post-test. The scores at pre- and post-test were dichotomized (0 = dysfunctional range; 1 = functional range). Additionally, 5 chi-square tests for independence were conducted on the OQ-45.2 as it compared to the other tests after each client was classified as having met or not met clinical significance at post-test (1 = clinically significant change; 0 = non-clinically significant change). In cases where the cell size was less than five, the Fisher's Exact Test was calculated. A Bonferroni correction set the alpha level at p < .01.
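A chi-square test for a 2 x 2 classification table can be sketched with the standard library alone. Several details here are assumptions, since the article does not give them: the Pearson statistic is computed without a continuity correction, the small-cell check looks at observed counts, and the Bonferroni alpha of .01 is read as .05 divided by five comparisons. The counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]].

    Returns (statistic, p_value, small_cell). Assumes no empty row or
    column. With 1 degree of freedom the upper-tail probability equals
    erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    small_cell = min(observed) < 5  # flag tables better suited to an exact test
    return statistic, p_value, small_cell


# Hypothetical table: rows = OQ-45.2 (functional / dysfunctional),
# columns = comparison measure (functional / dysfunctional)
stat, p, small = chi_square_2x2(20, 5, 5, 20)
alpha = 0.05 / 5  # Bonferroni-style adjustment across five comparisons

print(round(stat, 2))  # 18.0
print(p < alpha)       # True
print(small)           # False
```

When `small_cell` is true, an exact test would be run in place of the chi-square approximation; that step is not implemented in this sketch.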

A t-test was conducted to determine if individuals who achieved clinical significance on the OQ-45.2 had higher levels of client satisfaction than individuals who did not reach clinical significance. The level of client satisfaction was determined using the Client Satisfaction Questionnaire-8 (CSQ-8).

Results

The results are divided into three analyses. The first considers concordance rates between the OQ-45.2 and the other tests across clients. The second considers intra-class correlations, and the third considers the chi-square analyses.

For the pre-test the mean percent concordance was found to be 75%, at post-test it was 77.5%, and for clinical significance it was determined to be 66.2%, with less than one-half (43%) of the clients being classified perfectly across all six measures at pre- and post-testing. Chance percent concordance between the OQ-45.2 and all five other measures classifying similarly would be 3% (note that, with a .5 probability for each of the other instruments to be classified the same as the OQ-45.2, chance concordance is calculated as $(.5)^5 \approx .03$).

Table 1 provides percent concordance levels (across measures) for the clients at pre-test, post-test, and for clinical significance. Note that at pre-test, at least three out of the five comparative measures agreed with the OQ-45.2 classification as clinical or non-clinical 85% of the time. At post-test, at least three out of the five measures agreed with the OQ-45.2 classifications in 82.2% of the cases. Additionally, 64.6% of the time, at least 3 out of the 5 measurements agreed with the OQ-45.2 classification as meeting or not meeting criteria for clinically significant change.

In terms of classification, the SCL-90-R identified the most individuals as dysfunctional at pre-test (76%), followed by the QOLI (65%), the OQ-45.2 (59%), the IIP (57%), the SAS-SR (57%), and the SAS-SO (49%). At post-test the OQ-45.2 classified the most individuals as functional (68%), followed by the SAS-SO (62%), IIP (57%), SCL-90-R (57%), SAS-SR (52%), and the QOLI (45%). The OQ-45.2 classified the most individuals as having met the criteria for clinically significant change (32%), followed by the SCL-90-R (23%), IIP (20%), QOLI (18%), SAS-SR (11%), and SAS-SO (7%).

The use of the SCL-90-R as the primary instrument of comparison with reference to clinically significant change was also explored. However, the data indicated that although more individuals were classified as dysfunctional at pre-test (76% compared to 59% on the OQ-45.2), fewer individuals met criteria for clinically significant change (23% compared to 32% on the OQ-45.2). These data suggest that the SCL-90-R may be less sensitive to the effects of psychotherapy than the OQ-45.2.

An intra-class correlation was performed on scores at pre-test, post-test and the gain scores. Results from the intra-class correlations indicated high correlations. The pre-test correlation was .835 (n = 86); the post-test correlation was .870 (n = 56) and the gain score correlation was .849 (n = 56). In the post-test and the gain correlations, the SAS-SO was left out to increase the n from 42 and 40 (n of post-test and gain correlations, respectively), to an n of 56. With the SAS-SO, the correlations were .867 (post-test) and .835 (gain). These high intra-class correlations offer support for the OQ-45.2's ability to classify clients (as functional or dysfunctional) and classify change (as clinically significant or non-clinically significant) in a manner fairly commensurate with the other instruments used in the study.

The chi-square analyses at post-test and clinical significance suggest that the SAS-SO classified clients in a manner independent of the OQ-45.2. The chi-square analyses provided general support (i.e., significant tests of independence) for the OQ-45.2, SCL-90-R, QOLI, SAS-SR, and IIP-S classifying clients similarly at pre-test (p-values ranging from .002 to <.001), post-test (p-values ranging from .001 to <.001), and for identifying clinically significant change (p-values ranging from .005 to .001).

Relationship of Satisfaction to OQ-45.2 Categorization

A two-tailed t-test was conducted comparing the mean Client Satisfaction Questionnaire-8 (CSQ-8) scores of two groups: Group 1 (n = 13) comprised clients who met criteria for clinical significance on the OQ-45.2, while Group 2 (n = 12) comprised clients who did not meet the clinical significance criteria on the OQ-45.2. Graphical evaluation (using a Q-Q plot) of the data suggested normality, and Levene's Test for Homogeneity of Variance suggested that equal variance between the two groups could be assumed (F = .176, p = .679). The t-value was 1.938 with 23 degrees of freedom, and was not significant (p = .065).
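
As a rough check on the reported test, the two-tailed p-value can be recomputed from the t-value and degrees of freedom given above. The sketch below is not the authors' analysis; it assumes a pooled-variance t-test (df = n1 + n2 - 2) and integrates the Student t density numerically with Simpson's rule. The function names are illustrative only.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central_area

# Group sizes and t-value as reported in the text.
df = 13 + 12 - 2
p = two_tailed_p(1.938, df)
print(df, round(p, 3))
```

Under these assumptions the result should land close to the reported p = .065, above the conventional .05 threshold.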

Discussion

The concept of clinical significance, as operationalized through the application of Jacobson and Truax's (1991) formulas, can help bridge the gap between clinical research and clinical practice by examining the importance of each individual client's treatment response (e.g., Kendall et al., 1999). Despite the intuitive appeal of this method and its frequent use in recent research (Ogles et al., 2001), little evidence exists supporting the validity of this method. The current study examined the degree to which classifications for clinical significance, based on a brief self-report scale (i.e., the OQ-45.2), would be consistent with classifications based on other frequently used outcome measures.
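
For readers who want the formulas in concrete form, the sketch below expresses the two Jacobson and Truax (1991) quantities: the reliable change index (RCI) and the cutoff point between the dysfunctional and functional distributions. The numeric inputs are hypothetical and chosen only for illustration; they are not the normative values used for any instrument in this study.

```python
import math

def standard_error_of_difference(sd, reliability):
    """S_diff = sqrt(2 * SE^2), where SE = sd * sqrt(1 - reliability)."""
    se = sd * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0 * se ** 2)

def reliable_change_index(pre, post, sd, reliability):
    """RC = (post - pre) / S_diff; |RC| > 1.96 suggests reliable change."""
    return (post - pre) / standard_error_of_difference(sd, reliability)

def cutoff_c(mean_clinical, sd_clinical, mean_normal, sd_normal):
    """Cutoff c: the point between the two means, weighted by the SDs."""
    return (sd_normal * mean_clinical + sd_clinical * mean_normal) / (sd_normal + sd_clinical)

# Hypothetical inputs (assumptions, not values from the study).
s_diff = standard_error_of_difference(sd=10.0, reliability=0.84)
rc = reliable_change_index(pre=80, post=66, sd=10.0, reliability=0.84)
c = cutoff_c(mean_clinical=80.0, sd_clinical=20.0, mean_normal=40.0, sd_normal=10.0)
print(round(s_diff, 3), round(rc, 3), round(c, 2))
```

Multiplying the standard error of the difference by 1.96 gives the smallest raw-score change that would count as reliable at the .05 level, which is how a fixed point threshold for an instrument can be derived.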

The results suggested substantial concordance: many clients were classified identically across all six measures at pre- and post-testing, and concordance was even higher when the standard of agreement was set at three or more of the five other measures. If one test suggests a client is in the dysfunctional range, the other tests (as a group) make the same classification the majority of the time.

Findings were similar with regard to classifying client change over time. When a client: 1) begins treatment in the dysfunctional range; 2) changes by 14 or more points on the OQ-45.2; and 3) passes the OQ-45.2 cutoff score of 63, he or she is regarded as having made clinically significant change. Thirty-four of the clients were included in the percent concordance calculations for clinically significant change, given that the remaining individuals (who completed pre- and post-testing) began treatment in a functional range on the OQ-45.2. Having commenced treatment at a functional level, it was not possible for these individuals to achieve clinical significance as outlined by Jacobson and colleagues (e.g., Jacobson, Roberts, Berns, & McGlinchey, 1999). There was considerable concordance between the OQ-45.2's classification of clients as having reached clinically significant change and the other instruments' classifications of clinically significant change. It was rare (in 3 out of 34 clients) to see a complete lack of agreement between the OQ-45.2 and all other measures. Similar evidence for the agreement in classification of clinical significance was found in the intraclass correlation of the gain score and the chi-square analyses. The OQ-45.2 appeared to be most similar to the SCL-90-R in classifying clinically significant change.
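
The three criteria above can be sketched as a small classification routine. The cutoff of 63 and the 14-point change threshold come from the text; the direction of scoring (higher scores mean more distress) and the handling of a score exactly at the cutoff (treated here as functional) are assumptions made for illustration, and the function names are hypothetical.

```python
OQ_CUTOFF = 63   # cutoff score stated in the text
OQ_RCI = 14      # minimum point change stated in the text

def is_dysfunctional(score, cutoff=OQ_CUTOFF):
    """Assumption: scores above the cutoff fall in the dysfunctional range."""
    return score > cutoff

def clinically_significant_change(pre, post, cutoff=OQ_CUTOFF, rci=OQ_RCI):
    """Return True only when all three criteria from the text are met."""
    started_dysfunctional = is_dysfunctional(pre, cutoff)
    reliably_improved = (pre - post) >= rci      # a drop of 14 or more points
    ended_functional = not is_dysfunctional(post, cutoff)
    return started_dysfunctional and reliably_improved and ended_functional

# Hypothetical pre/post score pairs, not data from the study.
examples = [(80, 60), (80, 70), (60, 40), (90, 70)]
for pre, post in examples:
    print(pre, post, clinically_significant_change(pre, post))
```

The third example shows why clients who begin treatment in the functional range cannot be classified as clinically significantly changed, however large their improvement.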

The above results must be interpreted in the context of factors that undoubtedly influenced the degree of concordance that could be expected. The measures were each constructed to measure different aspects of client functioning. The SCL-90-R was designed to assess client levels of symptom distress, the SAS purports to measure levels of social adjustment and role performance, the IIP-S was intended to tap interpersonal functioning, and the QOLI was constructed to measure overall client satisfaction with life quality. The OQ-45.2 was intended to tap all these domains of functioning but with fewer items devoted to assessing each domain. Concordance could only be expected to be perfect if movement on all domains occurred at similar rates across the relatively brief amount of therapy offered in the current study. Several research studies suggest that different types of problems change at different rates (e.g., Barkham et al., 1996; Hansen, Lambert, & Forman, 2001; Maling et al., 1995). In addition, the cutoff scores for RCI and functional/dysfunctional states were derived from different samples. For example, outpatient and community/college student normative samples were used to calculate the cutoffs and RCIs of the SCL-90-R, the IIP-S, and the OQ-45.2. Norms used for the SAS were comprised of acute depressives and a community sample, while norms for the QOLI came from counseling center clients and non-client undergraduates.

These differences likely made some of the cutoffs and RCIs more or less conservative than others. The current study could not sort out the contribution of different content areas versus sample characteristics that may affect cut scores. For example, the OQ-45.2 and SCL-90-R differed in percent of clients who were classified as functional and dysfunctional at pre-test, with the SCL-90-R identifying more persons in the dysfunctional range. The SAS-SO, in contrast, classified the fewest individuals as dysfunctional at pre-test. At post-test, the QOLI identified the most individuals as being dysfunctional, compared to the other measures used in the study. Additionally, the SAS-SO and SAS-SR had the fewest persons identified as having met clinically significant change from pre- to post-test, while the OQ-45.2 identified the most cases as experiencing clinically significant change. These noteworthy differences of the specific measures for classifying clients and assessing change should be pursued in future research.

Concordance rates were probably raised by the preponderance of self-report measures. Future research could better follow the "tripartite" model for assessing client change (Strupp & Hadley, 1977) by more effectively taking the significant other perspective into account than was done in the present study, as well as including the therapist perspective in evaluating client functioning. It is anticipated that concordance rates between measures that more extensively evaluate the significant other perspective and that consider therapist and expert judge evaluations could be lower than those based on self-report as found in the present study.

Based on the present results, researchers and clinicians who employ the measures used in the present study can expect outcomes to vary as a function of instrument choice. The cutoff scores on the OQ-45.2 found 32% of patients meeting criteria for clinically significant change, while the other measures ranged from 7% to 23%. This suggests that the OQ-45.2 is likely to provide relatively high estimates of change. Whether this is due to item content, cut scores, or actual increased sensitivity to the effects of treatment could not be determined. If these results hold up across replications, it would suggest that, at least with low treatment doses, the OQ-45.2 would be the preferred measure since it attempts to measure domains common across all of the other measures. The SAS measures would be least preferred since they classify a similar number of patients as dysfunctional at pre-test but classify very few patients as achieving clinically significant change at termination. The failure of significant other ratings of the client to correspond better with self-report ratings is not uncommon (e.g., Lunnen & Ogles, 1998). It supports the need to include additional sources for evaluating clinically significant change in future research.

There was a lack of concordance between satisfaction and clinically significant change as measured by the OQ-45.2 (i.e., there was not a significant difference between the means of those clients who met clinical significance criteria on the OQ-45.2 and those who did not). This finding is consistent with other findings that suggest a tendency for high satisfaction ratings regardless of therapy outcome (Lunnen & Ogles, 1998; Pekarik & Wolff, 1996). This tendency may be a result of demand characteristics, or a lack of "appropriate range of items reporting dissatisfaction" (Lunnen & Ogles, 1998, p. 407).

The concept of clinically significant change is important in bridging the gap between research and clinical practice by helping turn attention from group averages back to the individual client. While promising, much more research on the validity of methods for classifying clinically meaningful change is needed before confidence can be placed in these methods.

References

Anderson, E.M., & Lambert, M.J. (2001). A survival analysis of clinically significant change in outpatient psychotherapy. Journal of Clinical Psychology, 57, 875-888.

Ankuta, G.Y., & Abeles, N. (1993). Client satisfaction, clinical significance, and meaningful change in psychotherapy. Professional Psychology: Research and Practice, 24, 70-74.

Barkham, M., Rees, A., Stiles, W.B., Shapiro, D.A., Hardy, G.E., & Reynolds, S. (1996). Dose-effect relations in time-limited psychotherapy for depression. Journal of Consulting and Clinical Psychology, 64, 927-935.

Barlow, D.H. (1981). On the relation of clinical research to clinical practice. Current issues, new directions. Journal of Consulting and Clinical Psychology, 49, 147-155.

Baum, A., Gatchel, R.J., & Schaeffer, M.A. (1983). Emotional, behavioral, physiological effects of chronic stress at Three Mile Island. Journal of Consulting and Clinical Psychology, 51, 565-572.

Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Christensen, L., & Mendoza, J.L. (1986). A method of assessing change in a single client. An alteration of the RC index. Behavior Therapy, 17, 305-308.

Derogatis, L.R. (1983). The SCL-90-R: Administration, Scoring and Procedures Manual-II. Towson, MD: Clinical Psychometric Research.

Derogatis, L.R., Rickels, K., & Rock, A. (1976). The SCL-90 and the MMPI: A step in the validation of a new self-report scale. British Journal of Psychiatry, 128, 280-289.

Edwards, D.W., Yarvis, R.M., Mueller, D.P., Zingale, H.C., & Wagman, W.J. (1978). Test-taking and the stability of adjustment scales: Can we assess patient deterioration? Evaluation Quarterly, 2, 275-292.

Fischer, J., & Corcoran, K. (1994). Measures for Clinical Practice (Vol. 2). NY: Free Press.

Frisch, M.B. (1998). Quality of Life Inventory. Unpublished manuscript.

Frisch, M.B., Cornell, J., Villanueva, M., & Retzlaff, P.J. (1992). Clinical validation of the Quality of Life Inventory: A measure of life satisfaction for use in treatment planning and outcome assessment. Psychological Assessment, 4, 92-101.

Froyd, J.E., Lambert, M.J., & Froyd, J.D. (1996). A review of practices of psychotherapy outcome measurement. Journal of Mental Health, 5, 11-15.

Gladis, M.M., Gosch, E.A., Dishuk, N.M., & Crits-Christoph, P. (1999). Quality of life: Expanding the scope of clinical significance. Journal of Consulting and Clinical Psychology, 67, 320-331.

Hansen, N.B., Lambert, M.J., & Forman, E.M. (2001). Comparisons of clinically significant change in clinical trials and naturalistic settings. Clinical Psychology: Science and Practice,

Hansen, N.B., Umphress, V., & Lambert, M.J. (1998). The reliability and validity of a short form of the Inventory of Interpersonal Problems. Journal of Psychoeducational Assessment, 16, 201-214.

Horowitz, L.M., Rosenberg, S.E., Baer, B.A., Ureno, G., & Villasenor, V.S. (1998). Inventory of Interpersonal Problems: Psychometric properties and clinical applications. Journal of Consulting and Clinical Psychology, 57, 599-606.

Howell, D.C. (1997). Statistical methods for psychology (4th Ed.). Boston, MA: Duxbury Press.

Hsu, L.M. (1989). Reliable changes in psychotherapy: Taking into account regression toward the mean. Behavioral Assessment, 11, 459-467.

Hugdahl, K., & Ost, L. (1981). On the difference between statistical and clinical significance. Behavioral Assessment, 3, 289-295.

Jacobson, N.S., Follette, W.C. & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352.

Jacobson, N.S., Roberts, L.J., Berns, S.B., & McGlinchey, J.B. (1999). Methods for defining and determining the clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical Psychology, 67, 300-307.

Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.

Kendall, P.C. (1999). Clinical significance. Journal of Consulting and Clinical Psychology, 67, 383-384.

Kendall, P.C., & Grove, W.M. (1988). Normative comparisons in therapy outcome. Behavioral Assessment, 10, 147-158.

Kendall, P.C., Marrs-Garcia, A., Nath, S.R., & Sheldrick R.C. (1999). Normative comparisons for the evaluation of clinical significance. Journal of Consulting and Clinical Psychology, 67, 285-299.

Kordy, H., Hannover, W., & Richard, M. (2001). Computer-assisted feedback-driven quality management for psychotherapy: The Stuttgart-Heidelberg model. Journal of Consulting and Clinical Psychology, 69, 173-183.

Lambert, M.J., & Finch, A.E. (1999). The Outcome Questionnaire. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcomes assessment (2nd ed., pp. 831-869). Mahwah, NJ: Erlbaum.

Lambert, M.J., Hansen, N.B., & Finch, A.E. (2001). Patient-focused research: Using patient outcome data to enhance treatment effects. Journal of Consulting and Clinical Psychology, 69, 159-172.

Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisenger, C.W. (1996). Administration and Scoring Manual for the OQ-45.2. Stevenson, MD: American Professional Credentialing Services LLC.

Lambert, M.J., Whipple, J.L., Smart, D.W., Vermeersch, D.A., Nielsen, S.L., & Hawkins, E. J. (2001). The effects of providing therapists with feedback on patient progress during psychotherapy: Are outcomes enhanced? Psychotherapy Research, 11(1), 49-68.

Lambert, M.J., Whipple, J.L., Bishop, M.J., Vermeersch, D.A., Gray, G.V., & Finch, A.E. (2002). Comparisons of empirically derived and rationally derived methods for identifying patients at risk for treatment failure. Clinical Psychology and Psychotherapy, 9, 149-164.

Lambert, M.J., Whipple, J.L., Vermeersch, D.A., Smart, D.W., Hawkins, E.J., Nielsen, S.L., & Goates, M. (2002). Providing Therapists with feedback on patient progress as a method of enhancing psychotherapy outcomes. A replication. Clinical Psychology and Psychotherapy, 9, 91-103.

Larson, D.L., Attkisson, C.C., Hargreaves, W.A., & Nguyen, T. D. (1979). Assessment of client/patient satisfaction in human service programs: Development of a general scale. Evaluation and Program Planning, 2, 197-207.

Lueger, R.J., Howard, K.I., Martinovitch, Z., Lutz, W., Anderson, E.E., & Grissom, G. (2001). Assessing treatment progress of individual patients using expected treatment response models. Journal of Consulting and Clinical Psychology, 69, 150-158.

Lunnen, K.M., & Ogles, B.M. (1998). A multiperspective, multivariable evaluation of reliable change. Journal of Consulting and Clinical Psychology, 66, 400-410.

Maling, M.S., Gurtman, M.B., & Howard, K.I. (1995). The response of interpersonal problems to varying doses of psychotherapy. Psychotherapy Research, 5, 63-75.

Nunnally, J.C., & Kotsch, W.E. (1983). Studies of individual subjects: Logic and methods of analysis. British Journal of Clinical Psychology, 22, 83-93.

Ogles, B.M., Lambert, M.J., & Masters, K.S. (1996). Assessing outcome in clinical practice. Boston: Allyn & Bacon.

Ogles, B.M., Lunnen, K.M., & Bonesteel, K. (2001). Clinical significance: History, definitions and applications. Clinical Psychology Review, 21, 421-446.

Pekarik, G., & Wolff, C.B. (1996). Relationship of satisfaction to symptom change, follow-up adjustment, and clinical significance. Professional Psychology: Research and Practice, 27, 202-208.

Pelz, M., & Merskey, H. (1982). A description of the psychological effects of chronic painful lesions. Pain, 14, 293-301.

Peveler, R.C., & Fairburn, C.G. (1990). Measurement of neurotic symptoms by self-report questionnaire: Validity of the SCL-90-R. Psychological Medicine, 20, 873-879.

Saunders, S.M., Howard, K.I., & Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment, 10, 207-218.

Speer, D.C., & Greenbaum, P. (1995). Five methods for computing significant individual client change and improvement rates: Support for an individual growth curve approach. Journal of Consulting and Clinical Psychology, 63, 1044-1048.

Spielberger, C.D. (1983). Manual for the State-Trait Anxiety Inventory STAI (Form Y). Palo Alto, CA: Consulting Psychologists Press.

Spielberger, C.D., Gorsuch, R.L., & Lushene, R.E. (1970). The State-Trait Anxiety Inventory Self Evaluation Questionnaire. Palo Alto, CA: Consulting Psychologists Press.

Strupp, H.H., & Hadley, S.W. (1977). A tripartite model of mental health and therapeutic outcome: With special reference to negative effects in psychotherapy. American Psychologist, 32, 187-196.

Taylor, J.A. (1953). A personality scale of manifest anxiety. Journal of Abnormal and Social Psychology, 48, 285-290.

Tingey, R.C., Lambert, M.J., Burlingame, G.M., & Hansen, N. B. (1996). Assessing clinical significance: Proposed extension to method. Psychotherapy Research, 6, 109-123.

Vermeersch, D.A., Lambert, M.J., & Burlingame, G.M. (2000). Outcome Questionnaire: Item sensitivity to change. Journal of Personality Assessment, 74, 242-261.

Weissman, M.M., & Bothwell, S. (1976). The assessment of social adjustment by patient self-report. Archives of General Psychiatry, 33, 1111-1115.

Weissman, M.M., Prusoff, B., Thompson, D., Harding, P., & Myers, J. (1978). Social adjustment by self-report in a community sample and in psychiatric outpatients. Journal of Nervous and Mental Disease, 166, 317-326.

Zung, W.W.K. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.

Zung, W.W.K. (1971). A rating instrument for anxiety disorders. Psychosomatics, 12, 371-379.

(1) Numerous other less commonly used statistical methods for determining reliable change exist (Edwards, Yarvis, Mueller, Zingale, & Wagman, 1978; Hsu, 1989; Jacobson & Truax, 1991; Nunnally & Kotsch, 1983). Speer and Greenbaum (1995) can be consulted for a comparative analysis of these less frequently used statistical methods. In their analysis (Speer & Greenbaum), they endorse using the Jacobson and Truax pre-post difference approach for the following three reasons: 1) it circumvents statistical complications related to "residualized true score adjustments" (p. 1047), 2) its calculation is unambiguous, and 3) there is a growing literature base that reports change using this method.

D. Joel Beckstead

Arlin L. Hatch

Michael J. Lambert

Dennis L. Eggett

Melissa K. Goates

David A. Vermeersch

Brigham Young University

Author Note

This paper represents a collaborative effort derived from dissertations done by Arlin L. Hatch and D. Joel Beckstead. Order of authorship for the first two authors was determined by a coin toss.

Correspondence concerning this article should be addressed to Michael J. Lambert, Brigham Young University, Psychology Department, 284 TLRB, Provo, UT, 84602. Electronic mail may be sent via Internet to mike_lambert@byu.edu.

Table 1

Percent Concordance of All Measures with OQ-45.2 at Pre-test,
Post-test and Clinical Significance (CS) (a)

% Agreement   Clients at This Level      Percentage of Clients
   Level

               Pre    Post     CS     Pre    Post    CS

    100       27/86   24/56   13/34   31.4   42.9   38.2
     80       28/86   12/56    6/34   32.6   21.4   17.6
     75        4/86    2/56    (b)     4.7    3.6    (b)
     60       14/86    8/56    3/34   16.3   14.3    8.8
     50        1/86    2/56    3/34    1.2    3.6    8.8
     40        6/86    4/56    1/34    7.0    7.1    2.9
     25        (b)     2/56    (b)     (b)    3.6    (b)
     20        4/86    2/56    5/34    4.7    3.6   14.7
      0        2/86    (b)     3/34    2.3    (b)    8.8

Note. 85.0%, 82.2%, 64.6% of the time, at least 3 out of the 5
measurements agreed with the OQ-45.2 classification as clinical
or non-clinical at pre-, post-test, and CS, respectively.

(a) CS = participants who met or did not meet the clinical
significance criteria as defined by Jacobson and Truax (1991) by
crossing the cutoff and reliably changing as defined by the RCI.

(b) = no data points for this agreement level
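
The summary figures in the table note can be approximately recomputed from the client counts in Table 1. The sketch below assumes that "at least 3 out of the 5" corresponds to agreement levels of 60% or higher, and that the 75% level (3 of 4) also counts, on the assumption that such rows reflect clients with only four comparison measures available. The dictionary and function names are illustrative.

```python
# Client counts per agreement level, transcribed from Table 1.
COUNTS = {
    "pre":  {100: 27, 80: 28, 75: 4, 60: 14, 50: 1, 40: 6, 20: 4, 0: 2},
    "post": {100: 24, 80: 12, 75: 2, 60: 8, 50: 2, 40: 4, 25: 2, 20: 2},
    "cs":   {100: 13, 80: 6, 60: 3, 50: 3, 40: 1, 20: 5, 0: 3},
}

def share_at_or_above(level_counts, threshold=60):
    """Percent of clients whose agreement level meets the threshold."""
    total = sum(level_counts.values())
    agreeing = sum(n for level, n in level_counts.items() if level >= threshold)
    return 100.0 * agreeing / total

for occasion, level_counts in COUNTS.items():
    total = sum(level_counts.values())
    print(occasion, total, round(share_at_or_above(level_counts), 1))
```

Computed from raw counts, the shares come out within about a tenth of a percentage point of the 85.0%, 82.2%, and 64.6% given in the note; the small differences are consistent with the note having summed the already-rounded row percentages.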