Using a five-step procedure for inferential statistical analyses.
Abstract: Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The described procedure can be used for both parametric and nonparametric inferential tests. The example given is a chi-square goodness-of-fit test of a genetics experiment involving a dihybrid cross in corn that follows a 9:3:3:1 ratio. This experimental analysis is commonly done in introductory biology labs.

Key Words: Five-step procedure; statistical inference; chi-square; goodness-of-fit; dihybrid cross; introductory biology.
Article Type: Report
Subject: Biology (Study and teaching)
Mathematical statistics (Study and teaching)
Teaching (Methods)
Author: Kamin, Lawrence F.
Pub Date: 03/01/2010
Publication: Name: The American Biology Teacher Publisher: National Association of Biology Teachers Audience: Academic; Professional Format: Magazine/Journal Subject: Biological sciences; Education Copyright: COPYRIGHT 2010 National Association of Biology Teachers ISSN: 0002-7685
Issue: Date: March, 2010 Source Volume: 72 Source Issue: 3
Full Text: Inferential statistics is an indispensable tool for biological hypothesis testing. Early in their science education, students learn about the scientific method and how inductive rather than deductive reasoning is used to make the logical leap from particular experimental results to one or more general conclusions. However, before any conclusion can be reached, the experimental results must be tested for statistical significance. After all, there is a chance that any difference between two or more experimental treatments or tests is attributable to random events. Therefore, we use statistics "to compare the data with our ideas and theories, to see how good a match there is" (Hand, 2008: p. 10). The five-step procedure presented here was designed to aid in this process.

Science teachers must lead students through a strange new statistical landscape that combines logic, jargon, and mathematical calculations such as variance, standard deviation, sum of squares, and calculated test statistics. Concepts like Type I errors, one-tailed or two-tailed alternative hypotheses, and p value must be defined and related to specific examples. But even in excellent statistics and biostatistics texts, data are given, a value for [alpha] (level of significance) is given, and then, typically, a "What do you conclude?" question is asked. As an afterthought, usually a part B to the problem, students are asked to give the p value for their conclusion. This method of posing statistics problems has always struck me as disjointed.

I believe that the following simple procedure allows the given problem to be stated, viewed, and solved as a stand-by-itself organic whole. This procedure both formalizes and crystalizes student thinking. Another advantage of this five-step procedure is that it can be used for essentially all statistical inference tests--both parametric and nonparametric. I was taught this technique in a graduate-level course in statistics, and I have been using it ever since.


* The Five General Steps in Hypothesis Testing

Step 1

Write down the null and alternative hypotheses in both symbols and words, using complete sentences.

Step 2

Calculate the test statistic to the appropriate number of significant figures.

Step 3

(a) State the given [alpha] (probability of a Type I error).

(b) Calculate the degrees of freedom.

(c) Give the region of rejection both in symbols and in a graph.

Step 4

Draw a conclusion based on the calculated test statistic.

(a) If the test statistic is in the region of rejection (RR), reject the null hypothesis and state the conclusion in one or more complete sentences.

(b) If the test statistic is not in RR, accept the null hypothesis and state the conclusion in one or more complete sentences.

Step 5

Bracket the p value.


A chi-square goodness-of-fit test is quite commonly used to check the appropriateness of a proposed model that uses categorical data. One popular experiment involves checking to see if a cross involving corn plants results in the Mendelian dihybrid phenotypic ratio of 9 purple smooth to 3 purple wrinkled to 3 yellow smooth to 1 yellow wrinkled corn grains. The following example and data are from such an experiment from one of my botany lab groups.


Step 1

[H.sub.o]: The data fit the model of 9 purple smooth to 3 purple wrinkled to 3 yellow smooth to 1 yellow wrinkled corn grains.

[H.sub.a]: [H.sub.o] is false.

Step 2

Step 3

(a) [alpha] = 0.05

(b) df = 4 - 1 = 3

(c) RR = (7.815, [infinity])

Step 4

[[chi square].sub.calc] = 3.218 does not lie in RR; therefore, I accept [H.sub.o] (the null hypothesis) and conclude that the data fit the model proposed in [H.sub.o] above.

Step 5

0.30 < p < 0.40

* Comments

Step 1

For this example, no symbols were used in Step 1, although one could use, for example, [p.sub.1] = 9/16, [p.sub.2] = 3/16, [p.sub.3] = 3/16, and [p.sub.4] = 1/16. In a test for means equality, the null hypothesis might be as follows: [H.sub.o]: [[mu].sub.1] = [[mu].sub.2]; and [H.sub.a] might be [[mu].sub.1] [not equal to] [[mu].sub.2] or [[mu].sub.1] < [[mu].sub.2] or [[mu].sub.1] > [[mu].sub.2], where [mu] refers to the population mean. Regarding [H.sub.a], for this example, one could state that the data do not fit the proposed model or simply that [H.sub.o] is false.
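Written out in symbols, the hypotheses for this example might look like the following sketch. It assumes that [p.sub.1] through [p.sub.4] denote the population proportions of purple smooth, purple wrinkled, yellow smooth, and yellow wrinkled grains, in that order; the wording of the alternative is one reasonable way to say that [H.sub.o] is false.

```latex
H_0:\; p_1 = \tfrac{9}{16},\quad p_2 = \tfrac{3}{16},\quad p_3 = \tfrac{3}{16},\quad p_4 = \tfrac{1}{16}
\qquad
H_a:\; \text{at least one } p_i \text{ differs from its value under } H_0
```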

Step 2

The "expected" counts are calculated under the assumption that [H.sub.o] is true. Thus, the expected count for purple smooth corn grains was calculated as 9/16 x 361 (total of all corn grains) = 203.06. The chi-square statistic is simply the sum of the last column in the table given in Step 2, or [summation] [(Obs-Exp).sup.2] / Exp. For this example, it is 3.218. The chi-square statistic was calculated to the same number of significant figures as in the chi-square table. It is assumed that the instructor has informed students of the conditions for validity of this test, namely that (1) the data represent a random sample from a large population, (2) the data are whole (counting) numbers and not percentages or standardized scores, and (3) the expected count for each class is [greater than or equal to] 5 (Samuels & Witmer, 2003: chapter 10; Mendenhall et al., 1990: pp. 665-666).
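The Step 2 arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name chi_square_gof and the dictionary layout are assumptions made for this sketch, and the counts are the ones from the Step 2 table.

```python
# Observed counts from the Step 2 table (361 corn grains in total).
observed = {
    "purple smooth": 210,
    "purple wrinkled": 74,
    "yellow smooth": 55,
    "yellow wrinkled": 22,
}

# Parts of the 9:3:3:1 ratio proposed in H_o.
ratio = {
    "purple smooth": 9,
    "purple wrinkled": 3,
    "yellow smooth": 3,
    "yellow wrinkled": 1,
}


def chi_square_gof(observed, ratio):
    """Return expected counts, per-class contributions, and the chi-square sum."""
    total = sum(observed.values())
    parts = sum(ratio.values())
    # Expected counts are computed as if H_o were true.
    expected = {k: total * ratio[k] / parts for k in observed}
    # Each class contributes (Obs - Exp)^2 / Exp.
    contributions = {
        k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
    }
    return expected, contributions, sum(contributions.values())


expected, contributions, chi2_calc = chi_square_gof(observed, ratio)

for name in observed:
    print(f"{name:16s} {observed[name]:4d} {expected[name]:8.2f} {contributions[name]:7.4f}")
print(f"chi-square = {chi2_calc:.3f}")

# Validity condition (3): every expected count should be at least 5.
print("all expected counts >= 5:", min(expected.values()) >= 5)
```

Because the sketch keeps the expected counts unrounded, individual contributions can differ from the table in the fourth decimal place (the table values are consistent with expected counts rounded to two decimals), but the sum still rounds to 3.218.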

Step 3

The probability of a Type I error, [alpha], must be given as part of the problem. A Type I error is made when a true null hypothesis ([H.sub.o]) is rejected. The degrees of freedom (df) are calculated as k - 1, where k is the number of data classes. The chi-square statistic ([chi square]) has a domain of zero to infinity. The region of rejection (RR) is obtained from a statistical table of chi-square values.
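As a rough check on Step 3, the sketch below computes the degrees of freedom and confirms that the tabled critical value 7.815 cuts off an upper-tail area of about 0.05. The helper chi2_upper_tail_df3 is a name invented for this sketch; it uses a closed-form expression that holds only for 3 degrees of freedom, so it is not a general substitute for a chi-square table.

```python
import math


def chi2_upper_tail_df3(x):
    """Upper-tail area P(X > x) for a chi-square variable with exactly 3 df.

    For df = 3 the tail area has the closed form
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2); other df need a table or library.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


k = 4                    # number of phenotypic classes
df = k - 1               # Step 3(b)
alpha = 0.05             # Step 3(a), given with the problem
critical_value = 7.815   # Step 3(c), read from a chi-square table for df = 3

tail_area = chi2_upper_tail_df3(critical_value)
print(f"df = {df}")
print(f"RR = ({critical_value}, infinity)")
print(f"area to the right of {critical_value}: {tail_area:.4f}")
```

The printed tail area should agree with [alpha] = 0.05 to about three decimal places, which is what it means for 7.815 to bound the region of rejection.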

Step 4

This is the important "Decision Rule" of many statistics books. By plotting the [[chi square].sub.calc] value of 3.218 on the graph in Step 3, one can see that 3.218 does not lie in the region of rejection (RR) but, rather, lies in the region of acceptance; this means that the null hypothesis is accepted. Since an absolute truth is not known, in the sense that the conclusion could be wrong, most statisticians prefer stating that there is insufficient evidence to reject the null hypothesis. Failing to reject [H.sub.o], under the constraints of committing a Type I or Type II error, is a better decision than simply accepting it, even though the two choices appear to give a similar conclusion. At this point, depending on time and the level of the class, the instructor may wish to discuss Type II errors. A Type II error is made if a false null hypothesis is accepted (not rejected). The probability of a Type II error ([beta]) can be calculated after the fact (Glover & Mitchell, 2006: section 5.3; Schork & Remington, 2000: pp. 174-181), looked up in tables for some tests (Portney & Watkins, 2009: p. 853), or controlled for by calculating the sample size needed for a given [beta] value (Mendenhall et al., 1990: pp. 443-446). The instructor may also wish to explain why, in most cases, a Type I error is more insidious than a Type II error and that most problems thus give the value for [alpha] without ever mentioning [beta].
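The decision rule itself is a single comparison, as the following sketch suggests. The function name decide and the returned wording are assumptions for illustration; the wording follows the "insufficient evidence to reject" phrasing discussed above rather than "accept".

```python
def decide(chi2_calc, critical_value):
    """Apply the Step 4 decision rule for an upper-tail region of rejection.

    RR is the open interval (critical_value, infinity), so the statistic must
    be strictly greater than the critical value to fall inside it.
    """
    if chi2_calc > critical_value:
        return "reject H_o"
    return "fail to reject H_o"


# Values from the corn example: calculated statistic and tabled critical value.
conclusion = decide(3.218, 7.815)
print(f"chi-square = 3.218, RR = (7.815, infinity): {conclusion}")
```

For the corn data this yields "fail to reject H_o"; that is, the data give insufficient evidence against the 9:3:3:1 model.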

Step 5

Most statistics books offer excellent explanations for the concept of "p value." One of the best and simplest explanations I have found is: "The term p-value is used to describe the probability that we would observe a value of the test statistic as extreme or more extreme than that actually observed, if the null hypothesis were true" (Hand, 2008: p. 88). In some statistics books, 0.20 is the largest value for p found in the chi-square table. In that case, Step 5 for this example would be written as p > 0.20.
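Bracketing the p value amounts to finding the two tabled critical values that surround the calculated statistic. The sketch below assumes a short excerpt of a standard chi-square table for df = 3 (the constant TABLE_DF3 and the function bracket_p are names made up for this illustration) and shows how a table that stops at 0.20 leads to p > 0.20.

```python
# (upper-tail area, critical value) pairs for df = 3, as found in a
# typical chi-square table. Only an excerpt is reproduced here.
TABLE_DF3 = [
    (0.50, 2.366),
    (0.40, 2.946),
    (0.30, 3.665),
    (0.20, 4.642),
    (0.10, 6.251),
    (0.05, 7.815),
    (0.01, 11.345),
]


def bracket_p(chi2_calc, table):
    """Return (lower, upper) bounds on p; None means the table gives no bound."""
    # Critical values at or below the statistic: p is no larger than their areas.
    uppers = [area for area, crit in table if crit <= chi2_calc]
    # Critical values above the statistic: p is larger than their areas.
    lowers = [area for area, crit in table if crit > chi2_calc]
    upper = min(uppers) if uppers else None
    lower = max(lowers) if lowers else None
    return lower, upper


full = bracket_p(3.218, TABLE_DF3)
print("full table bracket:", full)

# A table whose largest tabled area is 0.20 gives only a lower bound.
short_table = [(area, crit) for area, crit in TABLE_DF3 if area <= 0.20]
short = bracket_p(3.218, short_table)
print("short table bracket:", short)
```

With the fuller table the bracket is 0.30 < p < 0.40, matching Step 5 of the example; with the shorter table only the lower bound survives, which corresponds to writing p > 0.20.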

* Discussion

The five-step procedure for general hypothesis testing given here allows students to follow a handy template or procedure for statistical inference tests. This procedure formalizes the approach to problem solving and forces the math and logic involved in such tests to form an organic whole. The five steps stand as a unified entity. The problem is stated, a test statistic is calculated, a conclusion is reached based on a given value for [alpha], and the p value is bracketed as the last step (see Step 5 in the Comments section above). The problem and its solution thus stand as a single unit of thought and effort.

DOI: 10.1525/abt.2010.72.3.11


Glover, T. & Mitchell, K. (2006). An Introduction to Biostatistics. Long Grove, IL: Waveland Press.

Hand, D.J. (2008). Statistics: A Very Short Introduction. NY: Oxford University Press.

Mendenhall, W., Wackerly, D.D. & Scheaffer, R.L. (1990). Mathematical Statistics with Applications, 4th Ed. Boston: PWS-Kent.

Portney, L.G. & Watkins, M.P. (2009). Foundations of Clinical Research: Applications to Practice, 3rd Ed. Upper Saddle River, NJ: Prentice Hall.

Samuels, M.L. & Witmer, J.A. (2003). Statistics for the Life Sciences, 3rd Ed. Upper Saddle River, NJ: Prentice Hall.

Schork, M.A. & Remington, R.D. (2000). Statistics with Applications to the Biological and Health Sciences, 3rd Ed. Upper Saddle River, NJ: Prentice Hall.

LAWRENCE F. KAMIN is Professor of Biological Sciences at Benedictine University, 1344 Yorkshire Drive, Carol Stream, IL 60188; e-mail:

Table for Step 2 of the example.

Phenotypic Class    Observed   Expected   [(Obs-Exp).sup.2]/Exp

Purple smooth         210       203.06          0.2372
Purple wrinkled        74        67.69          0.5882
Yellow smooth          55        67.69          2.3790
Yellow wrinkled        22        22.56          0.0139
Totals:               361       361.00          3.2183