Disproportionality at the "front end" of the child welfare services system: an analysis of rates of referrals, "hits," "misses," and "false alarms".
Abstract: Data from NIS-4, NCANDS, and the State of California were used to analyze the front end of the child welfare services system--the referral and substantiation components--in terms of the system's ability to diagnose or detect instances of child maltreatment. The analyses show that Blacks are disproportionately represented in rates of referral into the system. Moreover, the analyses demonstrate that the system is less accurate for Blacks than for other racial or ethnic groups. There is a higher rate of false positives (or "false alarms") for Blacks than for other groups--that is, referrals leading to unsubstantiated findings. There is also a higher rate of false negatives (or "misses") for Blacks than for other groups--that is, children for whom no referral was made but who are in fact neglected or abused. The rate of true positives (or "hits")--children for whom a referral has been made and for whom that allegation has been substantiated--is generally higher for Blacks than for other groups, but this is attributable largely to the higher rate of referral for Blacks. In sum, the system demonstrates lower levels of accuracy for Blacks than for other groups. A model is proposed demonstrating that random error, as opposed to systematic bias, could produce a pattern of results much like that observed in the data.
Subject: Child welfare (Analysis)
Medical care (United States)
Medical care (Analysis)
Administrative agencies (Health policy)
Author: Mumpower, Jeryl L.
Pub Date: 12/22/2010
Publication: Name: Journal of Health and Human Services Administration Publisher: Southern Public Administration Education Foundation, Inc. Audience: Academic Format: Magazine/Journal Subject: Government; Health Copyright: COPYRIGHT 2010 Southern Public Administration Education Foundation, Inc. ISSN: 1079-3739
Issue: Date: Winter, 2010 Source Volume: 33 Source Issue: 3
Topic: Event Code: 970 Government domestic functions Canadian Subject Form: Medical care (Private); Medical care (Private)
Product: Product Code: 8000001 Medical & Health Services; 9105210 Health Care Services NAICS Code: 62 Health Care and Social Assistance; 92312 Administration of Public Health Programs
Geographic: Geographic Scope: United States Geographic Code: 1USA United States
Accession Number: 250033500
Full Text: The disproportional representation of minority children in the child welfare system has been a topic of concern for many years, dating back at least to the work of Billingsley and Giovannoni (1972). Nearly forty years after the issue was raised, neither scholars nor practitioners have reached agreement about the precise nature, extent, or causes of racial and ethnic disproportionality or the most appropriate measures for addressing the problem, despite the substantial body of research that has grappled, and continues to grapple, with the question (e.g., Ards et al., 2003; Barth, 2005; Casey Family Programs, 2006; Chapin Hall Center for Children, 2008; Courtney & Skyles, 2003; Derezotes et al., 2005; Fluke et al., 2003; Hill, 2006; Needell et al., 2003; Shaw et al., 2008; U.S. General Accountability Office, 2007). Many advocates and researchers attribute disproportionality to some form of discrimination, either at the individual or institutional level. In counterpoint, some, such as Bartholet (2009), have argued that if Black children are disproportionately victimized by maltreatment then they should appropriately be removed to foster care at rates proportionate to their maltreatment rates, which will be disproportionate with respect to the overall population.

Theories about the root causes of disproportionality have been categorized into those that emphasize three types of factors (Chibnall et al., 2003; Hill, 2006): parent and family risk factors (giving rise to disproportionate needs), community risk factors (living in high-risk neighborhoods that lead to increased surveillance), and organizational and systemic factors (including biases in decision making, cultural insensitivity, and structural racism). According to Barth (2005), four dominant models have been proposed to explain racial disproportionality in the child welfare system: the risk, incidence, and benefit model; the child welfare services decision making model; the placement dynamics model; and the multiplicative model. Courtney and Skyles (2003) observed that two general types of mechanisms contribute to disproportionality. A racial or ethnic group may enter the child welfare system at a rate that is disproportionate to its presence in the overall population; this is the "front end" (i.e., child maltreatment reporting, substantiation, etc.) of the problem. Similarly, a racial or ethnic group may exit the child welfare system at a slower rate than other groups; this is the "back end" (i.e., family reunification, adoption, etc.) of the problem.

The present paper focuses on decisions that are made at the front end of the problem--disproportionality in reporting and substantiation. It takes a molar perspective; the analyses make use of national level data and data from a single large state, California.

The analysis relies on binary classification techniques based on signal detection theory (Green & Swets, 1966) to address the following questions, among others:

* What is the best estimate of the probability that instances of child maltreatment will be detected by child protective services (CPS) agencies? Are there differences among racial and ethnic groups in the probability of detection?

* What is the overall accuracy of the child welfare screening system? How accurate is the system with respect to maltreatment? How accurate is the system with respect to non-maltreatment? Are there differences among racial and ethnic groups in accuracy?

* What are the error rates in the system? What is the rate of false negatives (failing to detect maltreatment when it is present)? What is the rate of false positives (identifying cases as potentially involving abuse or neglect when they do not in fact involve such maltreatment)? Are there differences among racial and ethnic groups in the rates of false positives and false negatives?

* What is the probability that allegations of child maltreatment will be substantiated? Are there differences among racial and ethnic groups in the probability that allegations will be substantiated?

This paper conceptualizes and analyzes the front end of the child welfare system in a manner analogous to how medical or public health studies analyze screening tests. If referrals to the child welfare system represent a screening tool that is analogous to mammography with relation to breast cancer or PSA tests with relation to prostate cancer, the following questions can be addressed: How well does this screening mechanism function? How accurate is the screening process? What percentage of cases is detected? Does the screening process work the same for all racial and ethnic groups and, if not, in what ways does it contribute to disproportionality?

Aspects of all these questions have been addressed previously by researchers concerned with disproportionality in the child welfare system. The distinctive contribution of this work is to address all these questions simultaneously within an integrated analytic framework that makes explicit their linkages to one another.

METHOD

Simple binary classification analyses, as well as more sophisticated versions of signal detection analysis, have been used in a wide variety of psychological, social, and medical research contexts (e.g., see Swets, 1996; Swets et al., 2000). (1) Such analytic techniques have not been widely used in child welfare research, but Shlonsky and Wagner (2005) have noted that they have been used in some analyses of risk assessment instruments. Ruscio (1998) made use of a conceptual framework similar to that proposed here in the context of attempts to improve clinical decision making in child welfare cases.

The schema for the binary classification analyses presented in this paper appears in Figure 1. Two key variables are included in all analyses. The first is maltreatment, which is defined to have only two possible states--either the presence of maltreatment or its absence. The second variable is simply whether or not a referral--an allegation of neglect or abuse--has been made. There are, then, four possible exhaustive and mutually exclusive outcomes. (1) There are true positives (TP), which are defined as children for whom a referral has been made and for whom that allegation has been substantiated; a true positive is sometimes called a "hit". (2) There are false positives (FP), which are defined as children for whom a referral was made but was later dismissed; false positives are sometimes called "false alarms" (or Type I errors; Neyman & Pearson, 1933, 1936). (3) There are true negatives (TN), which are defined as children for whom no referral was made and who are not maltreated; true negatives are the equivalent to what is sometimes called a "correct rejection." (4) There are false negatives (FN), which are defined as children for whom no referral was made but who are in fact neglected or abused; a false negative is sometimes called a "miss" (or Type II error; Neyman & Pearson, 1933, 1936).

Seven measures of performance can be derived from this simple binary classification schema.

1. The incidence rate is the rate of child maltreatment across the entire population. It is equivalent to the sum of true positives (hits) plus false negatives (misses), divided by the sum of the entire population--(TP + FN)/(TP+TN+FP+FN).

2. The positive predictive value is the probability that a child who is referred will be ascertained to have been mistreated. This is computed by dividing the number of true positives (hits) by the total number of referrals--TP/(TP+FP). The maximum positive predictive value is one, which would signify that every referred child is found to have been mistreated.

3. The negative predictive value is the probability that a child who is not referred is not mistreated. This is computed by dividing the number of true negatives (correct rejections) by the total number of children who were not referred--TN/(TN+FN). The maximum negative predictive value is one, which would mean that every child who was not referred was also not mistreated.

4. Sensitivity is the proportion of maltreatment cases that are referred and substantiated. Sensitivity is sometimes called the true positive rate and is a critically important measure in most diagnostic contexts. In medicine, sensitivity measures the proportion of breast cancers that are detected by mammograms or the proportion of prostate cancers that are detected by a PSA test. In the child welfare context, sensitivity addresses the question of what proportion of child maltreatment cases are referred. Sensitivity is computed by dividing the number of true positives by the sum of true positives plus false negatives--TP/(TP+FN). The maximum sensitivity value is one, which would mean that every maltreated child was referred.

5. Specificity is the proportion of non-maltreated children who were not referred. Specificity is sometimes described as the true negative rate. This is also a critically important measure in most diagnostic contexts. A measure with perfect sensitivity (that is, one that gave a positive test result for all cases of breast cancer, prostate cancer, child maltreatment, etc.) will not be very helpful if it achieves a perfect record of predicting positive cases simply by always predicting a positive value for each and every case. A good predictor also needs to yield a negative result when the target condition is not present. Specificity is the probability that children who are not maltreated are not referred. It is computed by dividing the number of true negatives (correct rejections) by the sum of true negatives (correct rejections) plus false positives (false alarms)--TN/(TN+FP). The maximum specificity value is one, which would mean that no non-maltreated child was referred. The false positive rate is the mirror image of specificity; it is simply one minus specificity. The false positive rate is the probability of a "false alarm," or the probability that a child who is not maltreated will be referred.

6. Accuracy measures the proportion of correct diagnoses, weighting both positive and negative diagnoses equally. It is computed by summing the number of true positives (hits) and true negatives (correct rejections) and dividing by the sum of the entire population--(TP + TN)/(TP+TN+FP+FN). Accuracy is an indicator of overall performance. It penalizes equally both types of errors--false positives (false alarms) and false negatives (misses). In child welfare, however, it may not be appropriate to weight the two types of errors equally. Failing to detect a case of abuse or neglect (a "miss") might be regarded, for instance, as a more serious error than making a referral that ends up being unsubstantiated (a "false alarm").

7. Because false positives and false negatives may not be regarded as equally serious, it is useful to compare the rate of each type of error. The False Positive/False Negative ratio (i.e., the ratio of false alarms and misses) is one way to make such a comparison. A ratio value of one means that both types of errors occur with equal frequency. A value of more than one means that false positives (false alarms) are more frequent than false negatives (misses). A value of less than one means that there are more false negatives (misses) than false positives (false alarms).
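The seven measures above can be sketched in a few lines of Python. This is only an illustration of the definitions given in the list, not the procedure used to produce the tables in this paper; the names Counts and performance_measures are hypothetical, and the cell values in the usage line are made-up numbers chosen for easy arithmetic, not data from any of the sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Cell entries of the 2x2 schema (counts or rates per 1,000 children)."""
    tp: float  # referred and maltreated (hit)
    fp: float  # referred, not maltreated (false alarm)
    tn: float  # not referred, not maltreated (correct rejection)
    fn: float  # not referred, maltreated (miss)


def performance_measures(c: Counts) -> dict:
    """Return the measures defined in items 1-7 of the text."""
    total = c.tp + c.fp + c.tn + c.fn
    specificity = c.tn / (c.tn + c.fp)
    return {
        "incidence_rate": (c.tp + c.fn) / total,
        "positive_predictive_value": c.tp / (c.tp + c.fp),
        "negative_predictive_value": c.tn / (c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
        "accuracy": (c.tp + c.tn) / total,
        "fp_fn_ratio": c.fp / c.fn,
    }


# HYPOTHETICAL cell values, for illustration only (not data from the paper).
example = performance_measures(Counts(tp=20, fp=30, tn=930, fn=20))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty rows or columns (for example, a population with no referrals), which would cause a division by zero.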

DATA AND DEFINITIONS

Direct estimates based on empirical data are available for some cell entries or marginal totals in the binary classification schema. By combining data from several sources, making some unremarkable assumptions, and using simple arithmetic, it is possible to derive reasonable estimates of all remaining values for the complete matrix at the national level and for the State of California. Doing so permits analyses that address questions about disproportionality in a novel and, hopefully, enlightening manner.

Data for the present analyses come from three sources.

Fourth National Incidence Study of Child Abuse and Neglect (NIS-4)

The first source is the Fourth National Incidence Study of Child Abuse and Neglect (NIS-4), a report to Congress from the Administration for Children and Families, U.S. Department of Health and Human Services (Sedlak et al., 2010). NIS-4 is intended to provide estimates of the incidence of child abuse and neglect in the United States, serving as the nation's needs assessment on child abuse and neglect. NIS-4 included children who were investigated by CPS agencies, but also used a sentinel survey methodology to obtain data on other children who were recognized as maltreated by community professionals. NIS-4 estimates therefore include both abused and neglected children who are in the official CPS statistics and those who are not. NIS-4 is based on data from a nationally representative sample collected during a three-month study period that spanned 2005-2006. The NIS uses standard definitions of abuse and neglect so that estimates of the numbers of maltreated children and incidence rates have a calibrated, standard meaning across various sites, sources, and cycles.

National Child Abuse and Neglect Data System (NCANDS)

The second data source is the National Child Abuse and Neglect Data System (NCANDS). In particular, the analyses use data contained in Child Maltreatment 2006 (U.S. Department of Health and Human Services, 2008), which provides national and state statistics about child maltreatment derived from data collected by CPS agencies. National statistics are based primarily on case-level data. The present analysis used data from Child Maltreatment 2006 rather than from more recent reports so that the analyses combining data from NIS-4 and NCANDS would be based on the same time period.

Child Welfare Services Reports for California

The third data source is the series of Child Welfare Services Reports for California (Needell et al., 2010). The Child Welfare Dynamic Report System is part of the California Child Welfare Performance Indicators Project, reflecting a collaborative effort between the California Department of Social Services and the University of California at Berkeley. This data source is used for analyses at the State of California level.

Endangerment and Harm Standards

The present analyses define maltreatment to include both abuse and neglect and rely on the same standard definitions of maltreatment, abuse, and neglect as used in the NIS-4, NCANDS, and California data bases. NIS-4 uses two standards in estimating the incidence of child maltreatment--the Harm Standard and the Endangerment standard. The Harm Standard is relatively stringent in that it classifies a child as maltreated only if he or she has already experienced demonstrable harm as a result of maltreatment. Incidence estimates based on the Endangerment Standard include all the Harm Standard children, but also include children who were not yet harmed by maltreatment, but who experienced abuse or neglect that placed them in danger of being harmed. The two standards lead to substantially different estimates of the incidence of child maltreatment.

Definitional and Data Issues

Definitions of maltreatment, abuse, and neglect are imprecise and imperfect. The associated ambiguity is amplified because both policy makers and case workers are forced frequently to dichotomize along a continuous scale, drawing a line that distinguishes between behaviors that are classified as abuse or neglect and those that fall just short of that threshold. Adding further complication, some key parameter values in the following analyses cannot be directly observed but must be estimated or inferred. Despite the uncertainties, the analytic framework used in this paper makes it possible to address certain significant questions that would be difficult to assess in any other manner. Also important, the framework makes it straightforward for interested parties to re-do the analyses replacing estimated or inferred values with estimates or inferences of their own choosing. Along these lines, the present paper reports several sensitivity analyses in which certain key assumptions were replaced with alternative plausible assumptions in order to evaluate the degree to which the results are sensitive to such changes in estimates of key parameter values.

Another significant factor related to data sources is that, unlike the three previous NIS studies (NIS-1, with data from 1979-1980; NIS-2, with data from 1986; and NIS-3, with data from 1993), the NIS-4 reported statistically significant race differences in the incidence of maltreatment, with higher rates in most cases for Black children than for White or Hispanic children (Sedlak et al., 2010). Supplementary analyses (Sedlak, McPherson, & Das, 2010) lead to the conclusion that the statistically reliable race differences in rates of some categories of child maltreatment found in NIS-4 are at least partly a consequence of (1) the greater precision of the NIS-4 estimates and (2) the enlarged gap between Black and White children in economic well-being.

Discussions about appropriate interpretation of the NIS-4 results are sure to continue for some time. The integrity of the analyses reported in this paper does not depend, however, on whether the NIS-4 data provide evidence of statistically reliable, independent effects for race. The present analyses focus solely on the ability to diagnose or detect instances of child maltreatment. Nothing in the analyses depends on assumptions about the causes of child maltreatment. Because the analyses are concerned exclusively with diagnoses, not causes, it is immaterial whether there are significant effects for race and, if there are, whether such effects are wholly or partially explained by poverty or other socio-economic predictors.

A final question concerns the appropriateness of combining data from NCANDS and NIS-4, as is done in the analyses of national sample data reported below. In support of this procedure, data files from NCANDS were used in the design of NIS-4 (Sedlak, 2010, Acknowledgements page), the NIS-4 report specifically notes parallels between its results and those of NCANDS (Sedlak, 2010, p. 20), NCANDS was used as a basis for computing annualization multipliers for the NIS data (Sedlak, 2010, p. 2-6; 2-17), NCANDS data were used in making other statistical adjustments to the NIS data (Sedlak, 2010, p. A-4), and the NIS is described in terms of its extension beyond (not its differences with) NCANDS (e.g., https://www.nis4.ore/faq.asp). Moreover, the data reported in both NCANDS and NIS are collected from professionals who work in child welfare or in other child services contexts. The biggest difference between the two data sets is that NCANDS relies on an administrative data extraction approach that makes use of the definitions used by state-level child protective service agencies, whereas NIS makes use of a professional survey methodology that employs standardized definitions. A supplementary study comparing NIS-4 with NCANDS is forthcoming from the Administration for Children and Families. A study by Fallon et al. (2010) comparing NIS-3 and NCANDS found no differences that would invalidate the analytic approach used in this paper.

ANALYSES AND RESULTS

National sample statistics

A binary classification analysis of national data regarding referrals and substantiations of child maltreatment for the Endangerment Standard is given in Table 1. This serves as a base case analysis to which subsequent analyses can be compared.

According to NIS-4 (Sedlak et al., 2010), the national incidence of Endangerment Standard Maltreatment is 39.5 per 1,000 children, as shown in the marginal entry for Total Maltreatment. According to Child Maltreatment 2006 (U.S. Department of Health and Human Services, 2008), the national incidence of referrals is 43.7 per 1,000 children, as shown in the marginal entry for Total Referral. According to Child Maltreatment 2006 (U.S. Department of Health and Human Services, 2008), the national incidence of victimization (as indicated by substantiated referrals) is 12.1 per 1,000 children, as shown in the cell entry for Maltreatment/Referral.

All other cell values can be derived by simple arithmetic, based on these three critical cell entries or marginal totals. Specifically, the Total No Maltreatment marginal (960.5) is derived by subtracting the Total Maltreatment marginal (39.5) from the Grand Total (1000). The No Maltreatment/Referral cell (31.6) is derived by subtracting the Maltreatment/Referral cell (12.1) from the Total Referral marginal (43.7). The Maltreatment/No Referral cell (27.4) is derived by subtracting the Maltreatment/Referral cell (12.1) from the Total Maltreatment marginal (39.5). The No Maltreatment/No Referral cell (928.9) is derived by subtracting the No Maltreatment/Referral cell (31.6) from the Total No Maltreatment marginal (960.5). Finally, the Total No Referral marginal (956.3) can be derived by subtracting the Total Referral marginal (43.7) from the Grand Total (1000). The same basic logic is used in constructing all tables used in subsequent analyses.
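The arithmetic in the preceding paragraph can be written out as a short Python sketch. The function name derive_table and its parameter names are hypothetical; the three inputs are the base case values quoted above (39.5, 43.7, and 12.1 per 1,000 children), and the code simply repeats the subtractions described in the text.

```python
def derive_table(total_maltreatment, total_referral, substantiated,
                 grand_total=1000.0):
    """Fill in a 2x2 table (rates per 1,000 children) from three known entries."""
    tp = substantiated                        # Maltreatment/Referral cell
    fp = total_referral - tp                  # No Maltreatment/Referral cell
    fn = total_maltreatment - tp              # Maltreatment/No Referral cell
    total_no_maltreatment = grand_total - total_maltreatment
    tn = total_no_maltreatment - fp           # No Maltreatment/No Referral cell
    total_no_referral = grand_total - total_referral
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        "total_no_maltreatment": total_no_maltreatment,
        "total_no_referral": total_no_referral,
    }


# Base case inputs as quoted in the text (Endangerment Standard).
table = derive_table(total_maltreatment=39.5, total_referral=43.7,
                     substantiated=12.1)
for name, value in table.items():
    print(f"{name}: {value:.1f}")
```

As a consistency check, the two cells in the No Referral column (fn and tn) should sum to the Total No Referral marginal.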

The base case analysis yields the following values, shown in column (1) of Table 2. The positive predictive value is .277; in other words, 27.7% of referrals are true positives, or "hits"--they were substantiated as cases involving child maltreatment. (2) The negative predictive value is .971; an estimated 97.1% of children who were not referred were also not maltreated. These are correct rejections. At the same time, the analysis implies that an estimated 2.9% of those who were not referred are in fact victims of child abuse or neglect; these are "misses".

The sensitivity is .306; in other words, the analysis estimates that 30.6% of all child maltreatment cases during 2006 were substantiated by CPS agencies (and 69.4% were not). The false alarm rate (1 - specificity) is .033; an estimated 3.3% of all children who are not victims of child abuse or neglect were nonetheless referred to a CPS agency. Accuracy is .941; an estimated 94.1% of all cases are correctly classified as true positives (both maltreated and referred) or true negatives (both not maltreated and not referred). In terms of the relative frequency of the two possible types of errors, the false positive/false negative ratio is 1.15, indicating that the rate of false positives is slightly higher than the rate of false negatives.

The other columns in Table 2 present the results of sensitivity analyses which vary certain key assumptions. In column (2), the analysis is re-done using the more restrictive Harm Standard, which estimates that the national incidence of child maltreatment is about half that implied by the Endangerment Standard. The major consequence of changing this assumption is to increase the value of sensitivity to .708. When the Harm Standard rate is used, the analysis estimates that approximately 70.8% of all child maltreatment cases are investigated and substantiated. Using the Harm Standard changes the estimated frequency of false positives so that it becomes more than six times the estimated frequency of false negatives. (For the sake of brevity all analyses in subsequent sections will use the Endangerment Standard.)
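The base case values and the Harm Standard comparison can be reproduced approximately with the following Python sketch. The function name measures is hypothetical. The Harm Standard incidence rate is not quoted in the text, so the sketch backs it out from the reported sensitivity of .708 and the substantiation rate of 12.1 per 1,000; that back-calculation is an assumption made here for illustration, not a figure taken from NIS-4.

```python
def measures(incidence, referrals, hits, total=1000.0):
    """Summary measures from three rates per 1,000 children."""
    tp = hits
    fp = referrals - hits      # unsubstantiated referrals
    fn = incidence - hits      # maltreated but not referred
    tn = total - tp - fp - fn
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / total,
        "fp_fn_ratio": fp / fn,
    }


# Base case (Endangerment Standard), using the rates quoted in the text.
base = measures(incidence=39.5, referrals=43.7, hits=12.1)

# ASSUMPTION: Harm Standard incidence inferred from the reported
# sensitivity (.708), since the rate itself is not given in the text.
harm_incidence = 12.1 / 0.708
harm = measures(incidence=harm_incidence, referrals=43.7, hits=12.1)

for label, result in (("Endangerment", base), ("Harm", harm)):
    print(label, {k: round(v, 3) for k, v in result.items()})
```

Because only the incidence rate changes between the two runs, the positive predictive value is identical in both, while sensitivity and the false positive/false negative ratio shift substantially.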

Columns (3) and (4) of Table 2 re-do the analysis redefining "referral" to limit it to those cases that were screened in by CPS agencies. Restricting the predictor to screened-in referrals improves the positive predictive value considerably, to .449 as compared to .277 when the analysis was based on all referrals regardless of whether or not they were screened-in, investigated, or substantiated. All other performance indices are little changed. (For the sake of brevity, all analyses in subsequent sections will use total referrals.)

National Sample Statistics, By Race And Ethnicity

Binary classification analysis of national data can be used to examine the extent of disproportionality during the referral and substantiation stages of the child welfare entry process. Data for Black, Hispanic, and White populations for the Endangerment Standard appear in Table 3. (3) Certain differences among these populations are readily apparent. As previously discussed, the NIS-4 study estimates different child maltreatment incidence rates for the three groups: 49.6 per 1,000 for Blacks, 30.2 for Hispanics, and 28.6 for Whites. As earlier studies have found (e.g., Yaun et al., 2003), the total estimated rate of referrals is considerably higher for Blacks than for the other two groups (70.7 for Blacks, as compared to 38.6 for Hispanics and 38.2 for Whites). The estimated rate of true positives, or hits, (Maltreatment/Referral cell) is higher for Blacks (19.8) than for Hispanics (10.8) and Whites (10.7), but so is the rate of errors. For Blacks, there are an estimated 50.9 false positives (No Maltreatment/Referral cell) as compared to 27.8 for Hispanics and 27.5 for Whites. There are also more false negatives (Maltreatment/No Referral cell) for Blacks (29.8) than for Hispanics (19.4) and Whites (17.9). In other words, Blacks are referred at a rate more than 80% higher than are Hispanics or Whites and they are about 85% more likely to have that referral substantiated (a true positive, or "hit"). But Blacks are also about 80% more likely to be the subject of an unsubstantiated allegation (a false positive, or false alarm) and roughly 50% more likely not to be referred when abuse or neglect is present (a false negative, or "miss").

Summary statistics for these data, presented in Table 4, clarify the nature and extent of racial and ethnic differences. The negative predictive value is lower for Blacks (.968) than for the other two groups (.980 for Hispanics and .981 for Whites) because of the comparatively higher rate of false negatives ("misses") for Blacks. The sensitivity is higher for Blacks (.399) than for the other two groups because a higher proportion of maltreatment cases are referred and substantiated. The false alarm rate is also higher for Blacks (.054), almost twice as high as the rate for Hispanics (.029) or Whites (.028), indicating that it is more likely for Blacks to be referred in a case that is not subsequently substantiated. The overall accuracy measure is particularly instructive: it is lower for Blacks (.919) than for Hispanics (.953) or Whites (.955). The error rate is roughly twice as high for Blacks as for the other groups, and the errors are of both types. Finally, the False Positive/False Negative ratio (the ratio of false alarms/misses) is higher for Blacks than for the other two groups, which is to say that unsubstantiated referrals are comparatively more frequent for Blacks than for the other two groups.

If conceptualized as a diagnostic system designed to detect child abuse and neglect, the child welfare referral and substantiation system clearly does not perform in the same manner for Blacks as for Hispanics and Whites. Although a comparatively higher proportion of maltreatment cases involving Blacks enters into the child welfare system, the system is less accurate for Blacks than for the other groups, yielding a higher rate of both false alarms and misses. These results provide support for the conclusion that Black children are both over-reported and under-reported in the child welfare system (Barth, 2005).
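The group comparisons in this section can be checked with a short Python sketch that applies the binary classification arithmetic to each group separately. The function and variable names are hypothetical; the inputs are the incidence, referral, and substantiation rates per 1,000 children quoted in the text, and the remaining cells are derived by subtraction as in the base case.

```python
def measures(incidence, referrals, hits, total=1000.0):
    """Summary measures for one group, from rates per 1,000 children."""
    tp = hits
    fp = referrals - hits      # false alarms
    fn = incidence - hits      # misses
    tn = total - tp - fp - fn  # correct rejections
    return {
        "fp": fp,
        "fn": fn,
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / total,
        "fp_fn_ratio": fp / fn,
    }


# (incidence, referrals, substantiated) per 1,000, as quoted in the text.
national = {
    "Black": (49.6, 70.7, 19.8),
    "Hispanic": (30.2, 38.6, 10.8),
    "White": (28.6, 38.2, 10.7),
}

results = {group: measures(*rates) for group, rates in national.items()}
for group, result in results.items():
    print(group, {k: round(v, 3) for k, v in result.items()})
```

Run on these inputs, the sketch shows both error cells (fp and fn) and the false positive/false negative ratio to be largest for Black children, which is the pattern described above.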

State Of California Statistics, By Race And Ethnicity

A similar analysis addressing the issue of disproportionality was conducted using 2008 data for the State of California. Analysis at the state level was conducted for three reasons: First, it was important to see if a similar pattern of disproportionality was observed with the most recent available data. This was particularly important because the recent NIS-4 study suggested that significant changes might be occurring in what heretofore has been a relatively stable picture regarding patterns of child neglect and abuse with respect to race and ethnicity. Second, it was important to evaluate whether a similar analytic approach as used with national data could be applied equally well at a lower level of geographic aggregation. Third, to test the robustness of the findings, it seemed wise to perform additional analyses in which the key elements requiring estimation differed. (4)

Binary classification analysis of State of California data for 2008 was used to examine the extent of disproportionality during the referral and substantiation stages of the child welfare entry process. Overall data as well as data for Black, Hispanic, and White populations appear in Table 5. As in the previous analyses, the NIS-4 study is used as a basis for estimates of child maltreatment incidence rates: an overall rate of 39.5 per 1,000, 49.6 per 1,000 for Blacks, 30.2 for Hispanics, and 28.6 for Whites.

Certain differences among these populations are readily apparent. For the 2008 California data, the total rate of referrals is higher for Blacks than for the other two groups. The differences among groups are even more pronounced in California than for the national sample. For Black children the rate of referral is 115.1 per 1,000, roughly two and a half times the rates for the other groups: 48.4 per 1,000 for Hispanics, 40.2 per 1,000 for Whites, and 48.7 per 1,000 overall. The rate of true positives, or hits, (Maltreatment/Referral cell) is two to three times higher for Blacks (25.0) than for the overall population (9.7), Hispanics (10.1), or Whites (8.4). But, just as in the national sample, the rate of errors associated with Black children is also substantially higher than for the other groups.

The California data exhibit a different pattern of errors from that in the national sample. The estimated rate of false negatives (Maltreatment/ No Referral cell) is somewhat lower for Blacks (24.6) than the overall false negative rate (29.8) although still somewhat higher than for Hispanics (20.1) and Whites (20.2). Differences among groups in the false positive rates, however, are marked. For Blacks, the false positive rate (No Maltreatment/ Referral cell) is 90.1; this compares to an overall rate of 39 and rates of 38.3 for Hispanics and 31.8 for Whites. In other words, Blacks are involved in unsubstantiated referrals at a rate about 2.3 times the overall rate, about 2.4 times the rate for Hispanics, and nearly three times the rate for Whites. In California, Blacks are more likely than Hispanics or Whites to be referred, more likely to be involved in a substantiated referral, and much more likely to be involved in an unsubstantiated referral.
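
The cell values discussed above can be rebuilt from three quantities per group that are reported in the text: the NIS-4 incidence estimate, the total referral rate, and the false positive rate. The following is a minimal sketch, not the study's own procedure. The function and variable names are illustrative, and it assumes that the true positive rate equals the referral rate minus the false positive rate and that the remaining cells follow as residuals.

```python
# Rates per 1,000 children reported in the text for California, 2008.
# Tuple layout (an illustrative choice): (incidence, referral rate, false positive rate)
REPORTED = {
    "Overall":  (39.5, 48.7, 39.0),
    "Black":    (49.6, 115.1, 90.1),
    "Hispanic": (30.2, 48.4, 38.3),
    "White":    (28.6, 40.2, 31.8),
}


def rebuild_cells(incidence, referrals, false_pos, population=1000.0):
    """Rebuild the four cells of the 2x2 table (rates per 1,000 children)."""
    true_pos = referrals - false_pos                # Maltreatment / Referral (assumed residual)
    false_neg = incidence - true_pos                # Maltreatment / No Referral
    true_neg = population - incidence - false_pos   # No Maltreatment / No Referral
    return {
        "TP": round(true_pos, 1),
        "FN": round(false_neg, 1),
        "FP": round(false_pos, 1),
        "TN": round(true_neg, 1),
    }


cells = {group: rebuild_cells(*rates) for group, rates in REPORTED.items()}

# Ratio of the Black false positive rate to each comparison group's rate.
fp_ratio = {
    group: round(REPORTED["Black"][2] / rates[2], 2)
    for group, rates in REPORTED.items()
    if group != "Black"
}

for group, group_cells in cells.items():
    print(group, group_cells)
print(fp_ratio)
```

Under these assumptions the Black cells come out to 25.0 hits and 24.6 misses per 1,000, and the Black false positive rate is about 2.3, 2.4, and 2.8 times the overall, Hispanic, and White rates respectively.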

Summary statistics for these data, presented in Table 6, clarify the nature and extent of racial and ethnic similarities and differences. The analyses indicate that the positive predictive values are virtually identical across all groups, ranging from a low of .199 to a high of .217. This indicates that the percentage of referrals that are substantiated--roughly 20%--is essentially the same for all groups. The analysis thus provides support for the conclusion reached by Fluke and colleagues (2003), who concluded that disproportionality appears to be more pronounced at some decision making points in the process than others. In this instance, while there are substantial differences between groups in the rate of referral there is little difference in terms of the percentage of referrals that are later substantiated.

The sensitivity for Blacks (.504) is much higher than for the overall group (.246), Hispanics (.334), or Whites (.294). This indicates that a much higher proportion of abused or neglected Black children enter the child welfare system than do abused or neglected Hispanic or White children. The analysis estimates that about half of the estimated instances of child maltreatment among Black children resulted in entry into the child welfare system, as compared to only a quarter of such instances for the overall population and a third or fewer of the cases involving Hispanic and White children. The comparatively high value for sensitivity for Blacks is accounted for at least partially by the fact that Black children are two to three times more likely to be referred into the system. The high rate of referral is also a factor, however, in the false alarm rate of .095 for Blacks in California during 2008. This is almost twice as high as the false alarm rate for Blacks in the 2006 national sample. Further, it is two to three times higher than the false alarm rate for the overall California population (.041), for Hispanics (.039) and for Whites (.033). As with the national data, the overall accuracy measure is instructive--it is substantially lower for Blacks (.885) than for the overall population (.931), Hispanics (.942), or Whites (.948). The error rate is roughly twice as high for Blacks as for the other groups. These are primarily false positive errors--false alarms in which Blacks are referred into the system but the allegations are not substantiated. In California, as at the national level, if the child welfare referral and substantiation system is thought of as a diagnostic system designed to detect child abuse and neglect, the system makes far more diagnostic errors for Blacks than for the other major racial and ethnic groups. Finally, the False Positive/False Negative ratio (the ratio of false alarms/misses) differs markedly for Blacks and for other groups. The ratio for Blacks (3.66) shows that the system is far more likely to produce false positive errors as opposed to false negative ones, when compared to the overall population (1.31), Hispanics (1.91), or Whites (1.57).
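
The summary statistics quoted above are simple ratios of the four cells of each group's two-by-two table. The sketch below shows one way to compute them. The function name and data layout are illustrative. The cell values are rates per 1,000 children taken from or derived from the figures in the text: the Black hit cell (25.0) is derived as the referral rate minus the false positive rate, and each true negative cell is derived as the residual from 1,000. Both derivations are assumptions rather than values quoted from Table 5.

```python
def summary_stats(tp, fn, fp, tn):
    """Standard binary classification summaries for one 2x2 table."""
    return {
        "ppv": tp / (tp + fp),               # share of referrals that are substantiated
        "sensitivity": tp / (tp + fn),       # share of maltreatment cases detected
        "false_alarm_rate": fp / (fp + tn),  # 1 - specificity
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fp_fn_ratio": fp / fn,              # false alarms per miss
    }


# (hits, misses, false alarms, correct non-referrals), per 1,000 children.
CALIFORNIA_2008 = {
    "Overall":  (9.7, 29.8, 39.0, 921.5),
    "Black":    (25.0, 24.6, 90.1, 860.3),
    "Hispanic": (10.1, 20.1, 38.3, 931.5),
    "White":    (8.4, 20.2, 31.8, 939.6),
}

stats = {group: summary_stats(*cells) for group, cells in CALIFORNIA_2008.items()}

for group, values in stats.items():
    print(group, {name: round(value, 3) for name, value in values.items()})
```

With these inputs the Black row reproduces the reported sensitivity (.504), false alarm rate (.095), accuracy (.885), and FP/FN ratio (3.66), and the positive predictive values fall between .199 and .217 for all four rows.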

DISCUSSION: COULD RANDOM ERROR ACCOUNT FOR THE OBSERVED RESULTS?

In much the same way that mammograms help to detect and diagnose breast cancer or PSA tests help to detect and diagnose prostate cancer, the system of referrals, investigations, and substantiation that constitute the front end of the child welfare system can be conceptualized as a complex screening system for detecting and diagnosing child maltreatment. If viewed in this manner, how well does the system perform overall? Further, does it function equally well for all racial and ethnic groups? And what role, if any, does it play in the disproportional representation of minorities in the child welfare system? Analyses of a 2006 national sample suggested that just over 30% of all child abuse and neglect cases were identified and substantiated by CPS agencies. This means, of course, that about 70% of all cases were not detected. (5) A little over a quarter of all referrals were substantiated. The false alarm rate was a little over 3%. The rate of false positive errors was somewhat higher than the rate of false negative errors. The question "how good is this level of performance?" goes far beyond the scope of this paper. Clearly, there is substantial room for improvement.

In any case, we can address questions about whether the referral and substantiation components of the child welfare system operated in the same manner for all racial and ethnic groups and whether these components contributed to the disproportional representation of minorities in the child welfare system. The short answer to the first question is "No" and the short answer to the second question is "Yes".

The analysis of both national and California data indicated that Blacks are treated differently from other groups in a number of respects:

The rate of referrals for Blacks is higher than for other groups. This has been widely recognized but the present analysis underscores its fundamental importance. The rate of substantiation of referrals (the positive predictive value) was found to be approximately the same for Blacks as for other groups, suggesting that disproportionality was not appreciably amplified during the screening-in, investigation, or substantiation stages of the process. (For a supporting conclusion concerning the use of standardized risk assessment instruments, see Baird et al., 1999.) This means that a primary driver of disproportionality appears at the earliest stage of the process when referrals are made by mandated reporters and other sources.

The rate of true positives is also higher for Blacks than for other groups. Concretely speaking, this means that Blacks are disproportionally represented in terms of substantiated referrals. This is partially attributable to the fact that the estimated rate of maltreated children is higher for Blacks than for other groups. From this perspective, disproportional representation might be partially attributable to greater levels of need. But the study also found that sensitivity (the proportion of positives identified as true positives) was higher for Blacks than for other groups. This difference is subject to various interpretations. A simple interpretation could be that the differences in sensitivity--the greater likelihood that instances of maltreatment will be referred and substantiated--contribute to the overrepresentation of Blacks in the front end of the child welfare system. An equally plausible interpretation might be that it is a symptom of the underrepresentation of Hispanics and Whites. Yet another interpretation might be that all groups are underrepresented, although Blacks somewhat less so, since the rate of substantiated cases identified by CPS agencies is far lower than the incidence rate suggested by the NIS-4 needs assessment.

The data analyses strongly support the conclusion that the process is simply less accurate for Blacks than for other groups. The accuracy statistic for Blacks was lower than for other groups at both national and California levels. This implies a greater rate of errors for Blacks than for Hispanics or Whites, which is precisely what was found. For the national sample, more errors of both types were found--false negatives (misses), or failures to detect and diagnose cases of maltreatment, as well as false positives (false alarms), or cases involving referrals that were not substantiated. For the California analysis, the level of accuracy for Blacks was even lower than for the national sample and the errors were more likely to be false positives than false negatives.

Researchers in child welfare have repeatedly found evidence of racial and ethnic disproportionality in the child welfare system, just as was found in the present study. Identifying the causes of disproportionality has proved to be a difficult task, however. The pattern of results from the present analysis suggests that one possible and heretofore neglected explanation for disproportionality is random error.

Systematic error in the form of prejudice or bias, leading to discrimination based on either individual or institutional racism, has often been cited as a possible cause of disproportionality or disparity. It is not generally recognized, however, that disproportional representation might arise from lack of accuracy or reliability stemming from random errors, which may be thought of as simply "honest mistakes." A few examples involving a hypothetical mandated reporter help to illustrate this point. Imagine a population of 1000 cases of potential child abuse or neglect with a true incidence rate of 10%. If the reporter were perfectly valid and reliable, we would observe the results in Table 7. The reporter would refer the 100 cases in which maltreatment was present and would not refer the 900 cases in which maltreatment was not present.

Systematic bias would result in a different pattern of results. Imagine that the reporter exhibits systematic bias against minority group families, but not majority group families, such that all positive cases among minority group families are correctly diagnosed, but 10% of the cases in which no maltreatment is present are incorrectly diagnosed as positive, as shown in Table 8. This is precisely the situation that discussions of measurement error relating to disproportionality commonly focus on--a biased test that consistently overestimates pathology, or underestimates positive attributes, in underrepresented minority groups. In this hypothetical example, 190 referrals would be made, an increase of 90 over the 100 referrals that would have been made by a perfectly valid and reliable reporter. Ninety of these 190 referrals, however, would be false positive errors--mistakes involving over-diagnosis of maltreatment among minority group families.
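
The arithmetic behind the perfectly accurate reporter and the systematically biased reporter can be sketched as follows. This is only an illustration of the hypothetical example in the text. The function name and its parameters are assumptions introduced here, and the bias is modeled simply as a fixed share of non-maltreatment cases that are wrongly referred while no maltreatment case is missed.

```python
def biased_reporter(n_cases=1000, incidence=0.10, false_positive_share=0.0):
    """Cells for a reporter who never misses a case but over-refers some families.

    false_positive_share is the fraction of non-maltreatment cases that are
    wrongly referred (0.0 reproduces the perfectly valid and reliable reporter).
    """
    maltreated = round(n_cases * incidence)
    not_maltreated = n_cases - maltreated
    false_pos = round(not_maltreated * false_positive_share)
    return {
        "TP": maltreated,                   # every true case is referred
        "FN": 0,                            # no misses in this scenario
        "FP": false_pos,                    # over-diagnosis caused by the bias
        "TN": not_maltreated - false_pos,
    }


perfect = biased_reporter(false_positive_share=0.0)
biased = biased_reporter(false_positive_share=0.10)

perfect_referrals = perfect["TP"] + perfect["FP"]
biased_referrals = biased["TP"] + biased["FP"]

print("perfect reporter:", perfect, "referrals:", perfect_referrals)
print("biased reporter: ", biased, "referrals:", biased_referrals)
```

The biased reporter makes 190 referrals against 100 for the perfect reporter, and all 90 additional referrals are false positives.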

Alternatively, imagine a reporter who is competent but human--in other words, the reporter is valid (usually gets the right answer and is not systematically biased), but is not perfectly reliable (i.e., this reporter sometimes makes a mistake). Further, imagine that the reporter is more unreliable for minority group families than for majority group families. For example, suppose that for 1000 majority group families, the reporter makes diagnostic misclassifications 10% of the time, with symmetrical error rates for both false positive and false negative errors. This would yield the pattern of results presented in Table 9. The reporter would refer 180 cases and make 100 errors--referring 90 cases that should not have been referred (false positive errors) and failing to refer 10 cases that should have been referred (false negative errors).

Suppose that for minority group families the reporter is also reasonably accurate but makes mistakes more frequently. Assume the reporter makes diagnostic misclassifications 20% of the time, with symmetrical error rates for both false positives and false negatives. For 1000 minority families, this would yield the pattern of results presented in Table 10. The reporter would refer 260 cases and make 200 errors. Note that more minority group families are referred (26%) in this example than in the previous example (Table 9) involving majority group families (18%). Further, twice as many errors would be made for the minority group families as compared to majority group families, and these would be of both types. There would be twice as many false positive errors (making referrals when maltreatment was not present) for minority group families (180) as for majority group families (90). Similarly, there would be twice as many false negative errors (not making referrals when maltreatment was present) for minority group families (20) as for majority group families (10).
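
The two random-error scenarios can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. The function name and parameters are assumptions introduced here, and the symmetric error rate is applied identically to maltreatment and non-maltreatment cases, as in the hypothetical example.

```python
def noisy_reporter(n_cases=1000, incidence=0.10, error_rate=0.10):
    """Cells for an unbiased reporter who misclassifies a fixed share of cases."""
    maltreated = round(n_cases * incidence)
    not_maltreated = n_cases - maltreated
    false_neg = round(maltreated * error_rate)      # true cases not referred
    false_pos = round(not_maltreated * error_rate)  # non-cases wrongly referred
    cells = {
        "TP": maltreated - false_neg,
        "FN": false_neg,
        "FP": false_pos,
        "TN": not_maltreated - false_pos,
    }
    cells["referrals"] = cells["TP"] + cells["FP"]
    cells["errors"] = cells["FN"] + cells["FP"]
    cells["accuracy"] = (cells["TP"] + cells["TN"]) / n_cases
    return cells


majority = noisy_reporter(error_rate=0.10)  # 10% misclassification
minority = noisy_reporter(error_rate=0.20)  # 20% misclassification

print("majority:", majority)
print("minority:", minority)
```

With the same true incidence in both groups, the less reliable judgments yield more referrals (260 versus 180), twice as many errors of each type, lower accuracy (.80 versus .90), and only a modest difference in true positives (80 versus 90).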

These hypothetical analyses illustrate that disproportionality may occur even without systematic bias (for an analysis of the same type of phenomenon in the context of college admissions, see Mumpower, Nath, & Stewart, 2002). More minority group members may be referred in child maltreatment cases (or be arrested and arraigned, or assigned to special education, and so forth) not as a result of systematic bias or even treatment disparities, but because reporters' diagnostic judgments for minority group members involve a greater degree of random error than for majority group members.

At this point it is simply speculative whether random error could help to explain the observed results for Blacks in the child welfare system, as suggested by the above hypothetical example. Note, however, the similarities between the observed data and the hypothetical examples cited above. In comparison to the majority group, one would expect to observe (1) lower rates of accuracy; (2) disproportionately many referrals; (3) disproportionately more errors of both types; and (4) not much difference in the number of true positives. This is just what was found for Blacks in the data analyses reported in earlier sections of this paper.

Further, the contribution of random error to disproportionality could be amplified if reporters lower their threshold for referrals to compensate for lower levels of accuracy for minority group members. Suppose that reporters recognized that they are less accurate for minority group members but wanted to avoid the false negative problem of failing to refer possible cases of child maltreatment. Adopting such a precautionary principle, they might then lower the threshold necessary for referral. Increasing the rate of referrals in this manner will generally lead to an increase in the rate of true positives, but will also further increase the rate of false positives. (For a discussion of the tradeoffs involved in raising or lowering admission thresholds in emergency psychiatry, see Way et al., 1998).
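
The threshold tradeoff described above can be illustrated with a textbook equal-variance Gaussian signal detection model. This is a sketch under stated assumptions, not a model fitted to the data in this paper: the evidence scale, the discriminability value d_prime, and the two threshold values are all hypothetical, chosen only to show the direction of the effect.

```python
from statistics import NormalDist


def rates_at_threshold(threshold, d_prime=1.0):
    """Hit rate and false alarm rate when cases above `threshold` are referred.

    Assumes perceived evidence is normally distributed with unit variance,
    centered at 0 for non-maltreatment cases and at d_prime for maltreatment cases.
    """
    no_maltreatment = NormalDist(mu=0.0, sigma=1.0)
    maltreatment = NormalDist(mu=d_prime, sigma=1.0)
    hit_rate = 1.0 - maltreatment.cdf(threshold)
    false_alarm_rate = 1.0 - no_maltreatment.cdf(threshold)
    return hit_rate, false_alarm_rate


# Hypothetical thresholds: a stricter one and a lowered, more precautionary one.
strict_hits, strict_false_alarms = rates_at_threshold(threshold=1.5)
lowered_hits, lowered_false_alarms = rates_at_threshold(threshold=1.0)

print(f"strict threshold:  hits={strict_hits:.3f}, false alarms={strict_false_alarms:.3f}")
print(f"lowered threshold: hits={lowered_hits:.3f}, false alarms={lowered_false_alarms:.3f}")
```

In this model, lowering the threshold raises the hit rate and the false alarm rate together. Because cases without maltreatment far outnumber cases with maltreatment, even a small rise in the false alarm rate can add many unsubstantiated referrals.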

Clearly, the proposed mechanism does not account in a wholly satisfactory fashion for all the observed data. In particular, the proposed mechanism does not provide a satisfactory account for why the data for Hispanics appear much more similar to those for Whites than to those for Blacks, an outcome that is reminiscent of the so-called Hispanic paradox in public health (Franzini et al., 2001) in which Hispanics have been found generally to have substantially better health than would be predicted on the basis of socioeconomic risk factors. On the other hand, despite repeated efforts to do so, research has yet to uncover clear evidence to support the proposition that disproportionality results largely from systematic discrimination at either the individual or institutional level. Perhaps both systematic and random errors play a role in the disproportional representation of Black children in the child welfare system.

CONCLUSION

The present study demonstrated that the front end of the child welfare services system--the referral and substantiation components--does not function the same for Blacks as it does for other racial and ethnic groups in terms of diagnosing and detecting instances of child maltreatment. Blacks are disproportionately represented in terms of their referral rate into the system. Further, the system is less accurate for Blacks--the rate of correct diagnoses is lower and the rate of errors, especially false positive errors, is higher than for other groups. Instances of child maltreatment for Blacks are generally detected at a proportionally higher rate than for other groups but this is attributable largely to the higher rate of referral. In short, the system does not perform in the same manner for Blacks as it does for other racial and ethnic groups. A series of hypothetical examples was used to demonstrate that random error could produce a pattern of results much like that observed for Black children in the present study.

If random error plays an important role in accounting for the observed results, what can be done to change the situation for the better? From an analytic standpoint the answer is easy--the level of accuracy for Blacks needs to be improved so that it is at least as good as it is for Hispanics, Whites, and other racial and ethnic groups. (For analogous results demonstrating the key role of accuracy in this regard, see the analysis by Mumpower et al. (2002) on affirmative action policies in college admissions and Way et al. (1998) on admissions to psychiatric emergency rooms.) Of course, this raises the question of how to go about improving accuracy. Clearly, this supports the critical importance of education and training, but discussion about how to accomplish this goal goes beyond the scope of the present paper.

Finally, several caveats should be issued about the limitations of the present study. Most notably, there are four.

First, the paper ignored altogether the "back end" of the child welfare system, which is an important contributor to disproportional representation of minorities in the child welfare system (e.g., Courtney & Skyles, 2003; Derezotes et al., 2005). This is not to say that the types of placements that children of different racial and ethnic groups go into, the likelihood of reunification and the likelihood of timely adoption or guardianship are not important contributors to the phenomenon of disproportionality. Clearly, they are, but they lie beyond the scope of the present paper.

Second, the present analysis does not address the question of whether Blacks and other groups are reported for child maltreatment at similar or different rates when controlling for poverty. Using statewide data from Missouri, Drake et al. (2009) did not find high levels of racial disproportionality once poverty was controlled for. Likewise the supplementary analyses of race differences in child maltreatment rates in NIS-4 (Sedlak, McPherson, & Das, 2010) found that race did not have significant independent predictive power for most (but not all) measures of child maltreatment after taking into account poverty and other correlated predictors. Nothing in the present analyses, however, relies on any assumptions about the causes underlying child maltreatment. If race has absolutely no independent predictive power after controlling for poverty and other risk factors, the major conclusion of the present paper would be unchanged: the system makes more diagnostic errors of both types--false positives and false negatives--for Blacks than for Whites or Hispanics. It would likely be quite informative to reanalyze the data in terms of class rather than race, if that were possible, but to date the required data are not available.

Third, the present paper implicitly assumes that the conceptual and operational definitions of child maltreatment used by NCANDS, NIS and similar sources are appropriate for all racial and ethnic groups and that their incidence estimates are accurate and stable. Each of these assumptions is probably on shakier ground than we might hope. The present analysis treats unsubstantiated referrals as if they signified "no maltreatment" and thus classifies them as false alarms.

Some in the child welfare community (e.g., Besharov, 1993; 2000) have argued that there is a substantial problem with over-reporting of child abuse such that cases of inadequate cognitive and social nurturing are inappropriately labeled child neglect or child abuse. Such false alarms, it is argued, lead to inappropriate disruption of families who would have benefited more from supportive intervention. Others have concluded that the empirical evidence demonstrates few, if any, significant clinical differences between substantiated and unsubstantiated referrals in terms of the clinical services that they require or receive (Drake, 1996; Hussey et al., 2005; Kohl et al., 2009). Based on an analysis of data from the National Survey of Child and Adolescent Well-Being, the Administration for Children and Families (U.S. Department of Health and Human Services, N.D.) concluded that children with substantiated cases of maltreatment do not appear to fare more poorly than children in unsubstantiated cases and that children in unsubstantiated maltreatment cases may have as many social service needs as those in substantiated cases. They report, however, that caseworkers perceived greater social service needs among those with substantiated cases than among those with unsubstantiated cases.

The implications of this debate for the present analysis are not clear. If one accepts the point of view that unsubstantiated referrals simply represent mistakes, then it is clearly appropriate to classify these as false alarms. But, even if differences between substantiated and unsubstantiated cases are negligible in terms of service provision, the primary conclusion stands: the front end of the child welfare services system--the referral and substantiation components--does not function the same for Blacks as it does for other racial and ethnic groups. Substantiation is an imperfect proxy for the variable that we are truly interested in--child maltreatment--and it dichotomizes a continuously distributed variable with attendant problems for analysis. Despite its imperfections, substantiation remains a widely reported and analyzed variable in child welfare and the present analysis reveals distinct differences among Blacks and other racial and ethnic groups in terms of typical patterns of referrals and substantiation.

Fourth, and finally, the present analyses have made use of the best available point estimates of relevant rates of referral and substantiation, for both the overall population and for ethnic and racial subgroups. It is important to remember that the sample data from which those point estimates are derived are of imperfect validity and reliability. Moreover, in the type of two-by-two table used for most of the analyses the error terms within cells are necessarily not independent. The present analyses represent a good effort based on the best available data to estimate key parameters relating to hits, misses, and false alarms at the front end of the child welfare system but the results should be interpreted with appropriate caution given the fallibility of the data upon which they are based.

Thanks are due to Prof. Edwina L. Dorch, Prof. Leroy H. Pelton, and an anonymous reviewer for constructive criticism and comments on this paper. Any remaining shortcomings of the paper are the sole responsibility of the author.

REFERENCES

Ards, S.D., Meyers, S.L., Malkis, A., Sugrue, E., & Zhou, L. (2003). Racial disproportionality in reported and substantiated child maltreatment and neglect: An examination of systematic bias. Children and Youth Services Review, 25, 375-392.

Baird, C., Ereth, J., & Wagner, D. (1999). Research-based risk assessment: Adding equity to CPS decision making. Madison, WI: Children's Research Center.

Barth, R.P. (2005). Child welfare and race: Models of disproportionality. In D.M. Derezotes et al. (Eds.) Race matters in child welfare: The overrepresentation of African American children in the system. Washington, D.C.: CWLA Press.

Bartholet, E. (2009). The racial disproportionality movement in child welfare: False facts and dangerous directions. Arizona Law Review, 51, 873-932.

Besharov, D. (1993). Overreporting and underreporting are twin problems. In R. J. Gelles & D. R. Loseke (Eds.), Current controversies on family violence. Newbury Park, CA: Sage, 257-272.

Besharov, D. J. (2000). Child abuse realities: Over-reporting and poverty. Virginia Journal of Social Policy and the Law, 8, 165-203.

Billingsley, A., & Giovannoni, J. M. (1972). Children of the storm: Black children and American child welfare. New York: Harcourt, Brace, Jovanovich.

Casey Family Programs. (2006). Disproportionality in the child welfare system: The disproportionate representation of children of color in foster care. Retrieved on March 2, 2010, from http://www.ncsl.org/print/cyf/fostercarecolor.pdf

Chapin Hall Center for Children. (2008). Understanding racial and ethnic disparity in child welfare and juvenile justice. Chicago: Chapin Hall Center for Children at the University of Chicago.

Chibnall, S., Dutch, N. M., Jones-Harden, B., Brown, A., Gourdine, R., Smith, J., Boone, A., & Snyder, S. (2003). Children of color in the child welfare system: Perspectives from the child welfare community. Washington, DC: U.S. Department of Health and Human Services, Children's Bureau.

Courtney, M., & Skyles, A. (2003). Racial disproportionality in the child welfare system. Children and Youth Services Review, 25, 355-358.

Derezotes, D. M., Poertner, J. & Testa, M.F. (2005). Race matters in child welfare: The overrepresentation of African American children in the system. Washington, D.C.: CWLA Press.

Drake, B. (1996). Unraveling "unsubstantiated." Child Maltreatment, 1, 168-175.

Drake, B., Lee, S. M., & Jonson-Reid, M. (2009). Race and child maltreatment reporting: Are Blacks overrepresented? Children and Youth Services Review, 31, 309-316.

Fallon, B., Trocme, N., Fluke, J., MacLaurin, B., Tonmyr, L., & Yuan, Y.Y. (2009). Methodological challenges in measuring child maltreatment. Child Abuse and Neglect, 34, 70-79.

Franzini, L., Ribble, J. C., & Keddie, A. M. (2001). Understanding the Hispanic paradox. Ethnicity and Disease, 11, 496-518.

Green, D.M., & Swets, J.A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Hill, R. B. (2006). Synthesis of research on disproportionality in child welfare: An update. Casey-CSSP Alliance for Racial Equity in the Child Welfare System. Retrieved on March 2, 2010 from http://www.racemattersconsortium.org/docs/BobHillPaper_FINAL.pdf

Hussey, J.M., Marshall, J.M., English, D.J., Knight, E.D., Lau, A.S., Dubowitz, H., et al. (2005). Defining maltreatment according to substantiation: Distinction without a difference? Child Abuse and Neglect, 29, 479-492.

Kohavi, R., & Provost, F. (1998). Guest editor's introduction: On applied research in machine learning. Machine Learning, 30, 127-132.

Kohl, P. L., Jonson-Reid, M., & Drake, B. (2009). Time to leave substantiation behind: Findings from a national probability study. Child Maltreatment, 14, 17-26.

Mumpower, J. L., Nath, R., & Stewart, T. R. (2002). Affirmative action, duality of error, and the consequences of mispredicting the academic performance of African-American college applicants. Journal of Policy Analysis and Management, 21, 63-77.

Needell, B., Brookhart, M.A., & Lee, S. (2003). Black children and foster care placement in California. Children and Youth Services Review, 25, 375-392.

Needell, B., Webster, D., Armijo, M., Lee, S., Dawson, W., Magruder, J., Exel, M., Glasser, T., Williams, D., Zimmerman, K., Simon, V., Putnam-Hornstein, E., Frerer, K., Cuccaro-Alamin, S., Lou, C., Peng, C., Holmes, A. & Moore, M. (2010). Child Welfare Services Reports for California. Retrieved 3/18/2010, from University of California at Berkeley Center for Social Services Research website. URL: http://cssr.berkeley.edu/ucb_childwelfare

Neyman, J., & Pearson, E.S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society, Series A 231, 289-337.

Neyman, J., & Pearson, E.S. (1936). Sufficient statistics and uniformly most powerful test of statistical hypotheses. Statistical Research Memoirs 1936, 1, 113-137.

Ruscio, J. (1998). Information integration in child welfare cases: An introduction to statistical decision making. Child Maltreatment, 3, 145-156.

Sedlak, A.J., McPherson, K., & Das, B. (2010). Supplementary Analyses of Race Differences in Child Maltreatment Rates in the NIS-4. Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families.

Sedlak, A.J., Mettenburg, J., Basena, M., Petta, I., McPherson, K., Greene, A., & Li, S. (2010). Fourth national incidence study of child maltreatment and neglect (NIS-4): Report to Congress. Washington, DC: U.S. Department of Health and Human Services, Administration for Children and Families.

Shaw, T. V., Putnam-Hornstein, E., Magruder, J., & Needell, B. (2008). Measuring racial disparity in child welfare. Child Welfare, 87, 23-36.

Shlonsky, A., & Wagner, D. (2005). The next step: Integrating actuarial risk assessment and clinical judgment into an evidence-based practice framework in CPS case management. Children and Youth Services Review, 27, 409-427.

Swets, J.A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Swets, J.A., Dawes, R.M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1-26.

Taylor, H.C., & Russell, J.T. (1939). The relationship of validity coefficients to the practical applications of tests in selection. Journal of Applied Psychology, 23, 565-578.

U.S. Department of Health and Human Services, Administration on Children, Youth and Families. (2008). Child Maltreatment 2006. Washington, DC: U.S. Government Printing Office.

U.S. Department of Health and Human Services, Administration on Children, Youth and Families. (ND). Does substantiation of child maltreatment relate to child well-being and service receipt? Findings from the NSCAW Study: Research Brief No. 9. Washington, DC. Retrieved on July 26, 2010 from http://www.acf.hhs.gov/programs/opre/abuse_neglect/nscaw/reports/substan_child/substan_child.pdf

U.S. Government Accountability Office. (2007). African American Children in Foster Care: Additional HHS Assistance Needed to Help States Reduce the Proportion in Care. Washington, DC: U.S. Government Printing Office. (GAO-07-816)

Way, B. B., Allen, M. H., Mumpower, J. L., Stewart, T. R., & Banks, S. M. (1998). Interrater agreement among psychiatrists in psychiatric emergency assessments. American Journal of Psychiatry, 155, 1423-1428.

Fluke, J., Yuan, Y., Hedderson, J., & Curtis, P. (2003). Disproportionate representation of race and ethnicity in child maltreatment: Investigation and victimization. Children and Youth Services Review, 25, 359-373.

JERYL L. MUMPOWER

Texas A&M University

(1) The Taylor-Russell framework (Taylor & Russell, 1939) specifies a similar analysis schema. Likewise, in computer science, a similar approach is referred to as a confusion matrix (Kohavi & Provost, 1998).

(2) There is substantial debate in the literature about the validity of substantiation as an indicator of maltreatment, as discussed further in the concluding section.

(3) NIS restricts its breakdowns by race and ethnicity to Blacks, Hispanics, and Whites. Other groups are too small to permit statistically reliable estimates. For this reason, the present analyses are also restricted to these same three groups. Because NIS does not provide data that breaks down substantiation rates by race, the analyses presume that the overall rate is the same across groups. The validity of this assumption is unverified, but data from the State of California, presented in the subsequent section, suggests that it is not an unreasonable one.

(4) Specifically, unlike the national data, the State of California data provided a direct measure of the ratio of substantiated to unsubstantiated referrals for each of the three major racial and ethnic groups. On the other hand, estimates of the overall incidence of child maltreatment in this analysis had to rely on national level 2006 data from NIS-4.

(5) These estimates are based on the Endangerment Standard. If the Harm Standard is used instead, the percentages are essentially reversed. The analysis would estimate that approximately 71% of all cases are detected and 29% are missed.

Table 1
2006 National Child Welfare Referral and Substantiation Data,
Endangerment Standard (Incidence Rates per 1,000 children)

                  No Referral     Referral      Total

Maltreatment          27.4        12.1 (3)      39.5 (1)
No Maltreatment      928.9        31.6          960.5
Total                956.3        43.7 (2)      1000.0

(1) Source: NIS-4, Table 3-3 (Sedlak et al., 2010)

(2) Source: Child Maltreatment 2006, Table 2-1 (U.S. Dept. of
Health and Human Services, 2008)

(3) Source: Child Maltreatment 2006, Table 3-3 (U.S. Dept. of
Health and Human Services, 2008)

Table 2
Summary Statistics for 2006 National Child Welfare
Referral and Substantiation Data

                       All Referrals          Screened-In Referrals
                                                       Only

                 (1) Base Case:   (2)        (3)            (4)
                 Endangerment     Harm       Endangerment   Harm
                 Standard         Standard   Standard       Standard

Incidence rate        39.5          17.1         39.5         17.1

Positive
predictive
value                0.277         0.277        0.449        0.449

Negative
predictive
value                0.971         0.995        0.972        0.995

Sensitivity          0.306         0.708        0.306        0.708

False
Alarm rate
(1-specificity)      0.033         0.032        0.015        0.016

Accuracy             0.941         0.963        0.958        0.980

FP/FN
ratio                 1.15          6.32         0.54         2.97
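
The "All Referrals" columns of Table 2 can be recomputed from three marginal rates: the incidence rate, the substantiated referral rate, and the total referral rate. The following is a minimal Python sketch, assuming the Table 1 values and the Harm Standard incidence rate of 17.1 shown in Table 2. The function name `summary_stats` and its argument names are illustrative and do not come from the article.

```python
def summary_stats(incidence, substantiated, referred, total=1000.0):
    """Summary statistics for a 2x2 referral table (rates per 1,000 children).

    incidence     -- maltreated children (TP + FN)
    substantiated -- referred and maltreated children (TP)
    referred      -- all referred children (TP + FP)
    """
    tp = substantiated
    fn = incidence - tp            # misses
    fp = referred - tp             # false alarms
    tn = total - tp - fn - fp      # correct rejections
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / total,
        "fp_fn_ratio": fp / fn,
    }

# Column (1): Endangerment Standard; column (2): Harm Standard.
endangerment = summary_stats(incidence=39.5, substantiated=12.1, referred=43.7)
harm = summary_stats(incidence=17.1, substantiated=12.1, referred=43.7)

for name, stats in (("Endangerment", endangerment), ("Harm", harm)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Only the incidence rate differs between the two calls, which is why the positive predictive value is identical in columns (1) and (2) while sensitivity and the FP/FN ratio change sharply.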

Table 3
2006 National Child Welfare Referral and Substantiation Data,
Endangerment Standard (Incidence Rates per 1,000 children), by
Race and Ethnicity

                                Black

                  No Referral   Referral   Total

Maltreatment      29.8          19.8 (2)   49.6 (1)
No Maltreatment   899.5         50.9 (3)   950.4
Total             929.3         70.7       1000.0

                                Hispanic

                  No Referral   Referral   Total

Maltreatment      19.4          10.8 (2)   30.2 (1)
No Maltreatment   942.0         27.8 (3)   969.8
Total             961.4         38.6       1000.0

                                  White

                  No Referral    Referral    Total

Maltreatment      17.9           10.7 (2)    28.6 (1)
No Maltreatment   943.9          27.5 (3)    971.4
Total             961.8          38.2        1000.0

(1) Source: NIS-4, Table 4-4 (Sedlak et al., 2010)

(2) Source: Child Maltreatment 2006, Table 3-11 (U.S. Dept. of
Health and Human Services, 2008)

(3) Source:  Estimate based on Child Maltreatment 2006, Tables
2-1 and 3.3 (U.S. Dept. of Health and Human Services, 2008).

Table 4
Summary Statistics for 2006 National Child Welfare Referral and
Substantiation Data, Endangerment Standard (Incidence Rates per
1,000 children), by Race and Ethnicity

                                   Black   Hispanic   White

Incidence rate                     49.6    30.2       28.6
Positive predictive value          0.280   0.280      0.280
Negative predictive value          0.968   0.980      0.981
Sensitivity                        0.399   0.358      0.374
False Alarm rate (1-specificity)   0.054   0.029      0.028
Accuracy                           0.919   0.953      0.955
FP/FN ratio                        1.71    1.43       1.54
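
To compare groups directly, the Table 4 statistics can be recomputed from the Table 3 cells and each group's false alarm rate expressed relative to a baseline group. The sketch below assumes the Table 3 cell values. The choice of Whites as the baseline and the names `TABLE_3`, `group_stats`, and `relative_fa` are illustrative and are not taken from the article.

```python
# Cells from Table 3 as (FN, TP, TN, FP), in rates per 1,000 children.
TABLE_3 = {
    "Black":    (29.8, 19.8, 899.5, 50.9),
    "Hispanic": (19.4, 10.8, 942.0, 27.8),
    "White":    (17.9, 10.7, 943.9, 27.5),
}

def group_stats(fn, tp, tn, fp):
    """Selected Table 4 statistics for one group's 2x2 table."""
    total = fn + tp + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / total,
        "fp_fn_ratio": fp / fn,
    }

stats = {group: group_stats(*cells) for group, cells in TABLE_3.items()}

for group, s in stats.items():
    print(f"{group:<9}" + "  ".join(f"{k}={v:.3f}" for k, v in s.items()))

# False alarm rate of each group relative to the (assumed) White baseline.
baseline = stats["White"]["false_alarm_rate"]
relative_fa = {g: s["false_alarm_rate"] / baseline for g, s in stats.items()}
print({g: round(r, 2) for g, r in relative_fa.items()})
```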

Table 5
2008 State of California Child Welfare Referral and
Substantiation Data, Endangerment Standard (Incidence Rates per
1,000 children), by Race and Ethnicity (1)

                        Overall (n=10,003,896)

                  No Referral   Referral     Total

Maltreatment         29.8         9.7       39.5 (2)
No Maltreatment      921.5        39.0       960.5
Total                951.3        48.7       1000.0

                        Hispanic (n=4,891,254)

                  No Referral   Referral     Total

Maltreatment         20.1         10.1      30.2 (2)
No Maltreatment      931.5        38.3       969.8
Total                951.6        48.4       1000.0

                          Black (n=585,702)

                  No Referral   Referral    Total

Maltreatment         24.6         25.0      49.6 (2)
No Maltreatment      860.3        90.1      950.4
Total                884.9       115.1      1000.0

                         White (n=3,103,380)

                  No Referral   Referral    Total

Maltreatment         20.2         8.4      28.6 (2)
No Maltreatment      939.6        31.8      971.4
Total                959.8        40.2      1000.0

(1) All data from Needell et al. (2010) unless otherwise noted

(2) Source: NIS-4, Table 4-4 (Sedlak et al., 2010)

Table 6
Summary Statistics 2008 State of California Child Welfare
Referral and Substantiation Data, Endangerment Standard
(Incidence Rates per 1,000 children), by Race and Ethnicity

                                   Overall   Black   Hispanic   White

Incidence rate                     39.5      49.6    30.2       28.6
Positive predictive value          0.199     0.217   0.209      0.209
Negative predictive value          0.969     0.972   0.979      0.979
Sensitivity                        0.246     0.504   0.334      0.294
False Alarm rate (1-specificity)   0.041     0.095   0.039      0.033
Accuracy                           0.931     0.885   0.942      0.948
FP/FN ratio                        1.31      3.66    1.91       1.57
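
The accuracy figures in Table 6 can also be read as error counts: accuracy equals one minus the sum of misses and false alarms per 1,000 children. The sketch below applies that decomposition to the Table 5 cells. The names `TABLE_5` and `errors` are illustrative only.

```python
# Cells from Table 5 as (FN, TP, TN, FP), in rates per 1,000 children.
TABLE_5 = {
    "Overall":  (29.8, 9.7, 921.5, 39.0),
    "Black":    (24.6, 25.0, 860.3, 90.1),
    "Hispanic": (20.1, 10.1, 931.5, 38.3),
    "White":    (20.2, 8.4, 939.6, 31.8),
}

errors = {}
for group, (fn, tp, tn, fp) in TABLE_5.items():
    total = fn + tp + tn + fp
    errors[group] = {
        "misses": fn,                      # false negatives per 1,000
        "false_alarms": fp,                # false positives per 1,000
        "total_errors": fn + fp,
        "accuracy": 1 - (fn + fp) / total,
    }

for group, e in errors.items():
    print(f"{group:<9} errors per 1,000 = {e['total_errors']:.1f}, "
          f"accuracy = {e['accuracy']:.3f}")
```

On these figures the two error cells sum to 114.7 per 1,000 for Blacks and 52.0 per 1,000 for Whites, with false alarms accounting for most of the difference.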

Table 7
Hypothetical Diagnostic Results for a Perfectly Valid and
Reliable Child Welfare Services Reporter

                  Not Referred    Referred    Totals

Maltreatment           0            100        100
Present             (False         (True
                   Negatives)    Positives)

No maltreatment       900            0         900
present              (True         (False
                   Negatives)    Positives)

Referral Status       900           100        1000

Total Errors = False Negatives + False Positives = 0 + 0 = 0

Table 8
Hypothetical Diagnostic Results for a Systematically Biased
Reporter for Minority Group Families

                          Not Referred           Referred        Totals

Maltreatment Present           0                   100            100
                        (False Negatives)    (True Positives)

No maltreatment               810                   90            900
present                 (True Negatives)    (False Positives)

Referral Status               810                  190            1000

Total Errors = False Negatives + False Positives = 0 + 90 = 90

Table 9
Hypothetical Diagnostic Results for an Imperfectly Reliable
Reporter for Majority Group Families

                    Not Referred          Referred        Totals

Maltreatment             10                  90            100
Present           (False Negatives)   (True Positives)

No maltreatment          810                 90            900
present           (True Negatives)    (False Positives)

Referrals                820                 180           1000

Total Errors = False Negatives + False Positives = 10 + 90 = 100

Table 10
Hypothetical Diagnostic Results for an Imperfectly Reliable
Reporter for Minority Group Families

                    Not Referred          Referred        Totals

Maltreatment             20                  80            100
Present           (False Negatives)   (True Positives)

No maltreatment          720                 180           900
present           (True Negatives)    (False Positives)

Referrals                740                 260           1000

Total Errors = False Negatives + False Positives = 20 + 180 = 200
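
The cell values in Tables 7 through 10 are consistent with a reporter characterized by two numbers: the share of maltreated children who are referred (sensitivity) and the share of non-maltreated children who are not referred (specificity), applied to 1,000 children with a base rate of 100 maltreated. The sketch below reproduces the four tables on that reading. The parameter values are inferred from the cells rather than stated in the tables, and the function name `reporter_table` is hypothetical.

```python
def reporter_table(sensitivity, specificity, n=1000, base_rate=0.10):
    """Expected 2x2 counts for a reporter with the given hit and
    correct-rejection rates (an assumed parameterization)."""
    positives = round(n * base_rate)       # maltreatment present
    negatives = n - positives              # no maltreatment present
    tp = round(positives * sensitivity)    # true positives (hits)
    fn = positives - tp                    # false negatives (misses)
    tn = round(negatives * specificity)    # true negatives
    fp = negatives - tn                    # false positives (false alarms)
    return {"FN": fn, "TP": tp, "TN": tn, "FP": fp, "total_errors": fn + fp}

tables = {
    "Table 7 (perfectly valid and reliable)": reporter_table(1.0, 1.0),
    "Table 8 (systematically biased, minority)": reporter_table(1.0, 0.9),
    "Table 9 (imperfectly reliable, majority)": reporter_table(0.9, 0.9),
    "Table 10 (imperfectly reliable, minority)": reporter_table(0.8, 0.8),
}

for label, cells in tables.items():
    print(label, cells)
```

Under this parameterization the biased reporter of Table 8 errs only on non-maltreated children, whereas the less reliable reporter of Table 10 produces more misses and more false alarms at the same time.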

Figure 1
Binary Classification Analysis Schema for the Child Welfare System

                  No Referral       Referral

Maltreatment      False             True Positives/   Total,
                  Negatives/        "Hits" (TP)       Maltreated
                  "Misses" (FN)                       Children (TP +
                                                      FN)

No Maltreatment   True Negatives/   False             Total,
                  "Correct          Positives/        Non-Maltreated
                  Rejections"       "False Alarms"    Children (TN +
                  (TN)              (FP)              FP)

                  Total,            Total, Referred   Grand Total (TP
                  Non-Referred      Children (TP +    + TN + FP + FN)
                  Children (TN +    FP)
                  FN)