Linear models and effect magnitudes for research,
clinical and practical applications.




Abstract: Effects are relationships between variables. The magnitude of an effect has an essential role in sample-size estimation, statistical inference, and clinical or practical decisions about utility of the effect. Virtually every effect in research, clinical and practical settings arises from a linear model, an equation in which a dependent variable equals a sum of predictor variables and/or their products. Linear models allow for the effect of one predictor to be adjusted for the effects of other predictors and for the modeling of nonlinearity via polynomials. Effects and models used to estimate them depend on the nature of the dependent variable (continuous, count, nominal) and the predictor variables (numeric, nominal). A continuous dependent gives rise to a difference in a mean with a nominal predictor and a slope or correlation with a numeric predictor. Default magnitude thresholds for difference in a mean come from standardization (dividing by the between-subject standard deviation): 0.2, 0.6, 1.2, 2.0 and 4.0 for small, moderate, large, very large and extremely large. The same thresholds apply to a slope, provided the slope is evaluated as the difference for 2 SD of the predictor. Thresholds for correlations are 0.1, 0.3, 0.5, 0.7 and 0.9. Many effects and errors are uniform across the range of the dependent variable when expressed as percents or factors, and these should be estimated via log transformation. Nonuniformity of error arising from repeated measurement or from different subject groups should be addressed via within-subject modeling or mixed modeling, which also provide estimates of individual responses to treatments. Effects on nominal variables and counts are analyzed with various generalized linear models, where the dependent is the log of either the odds of a classification, the hazard (incidence rate) of an event, or the mean count.
The effect is estimated initially as a factor representing a ratio between two groups (or per unit or per 2 SD of a numeric predictor) of either odds of a classification, hazards of an event, counts, or count rates. Effects involving common classifications or events can be converted to differences in percent risk and interpreted with magnitude thresholds of 10, 30, 50, 70 and 90; equivalent odds ratios are 1.5, 3.4, 9.0, 32 and 360. Thresholds for common events can also be derived from standardization of the log of time to the event. Both sets of thresholds are similar and correspond to hazard ratios of 1.3, 2.3, 4.5, 10 and 100. For counts and rare events, a consideration of proportions attributable to an effect gives rise to ratio thresholds for counts, hazards, risks or odds of 1.1, 1.4, 2.0, 3.3, and 10. Proportional hazards regression is an advanced form of linear modeling for use with events when hazards change with time but their ratio is constant. KEYWORDS: correlation, count ratio, hazard ratio, minimum clinically important difference, odds ratio, relative risk, risk difference, standardization, transformation.


Article Type:  Report 
Subject: 
Statistical methods (Evaluation); Correlation (Statistics) (Evaluation) 
Author:  Hopkins, Will G. 
Pub Date:  01/01/2010 
Publication:  Name: Sportscience Publisher: Internet Society for Sport Science Audience: Academic Format: Magazine/Journal Subject: Health Copyright: COPYRIGHT 2010 Internet Society for Sport Science ISSN: 1174-9210 
Issue:  Date: Annual, 2010 Source Volume: 14 
Geographic:  Geographic Scope: United Kingdom Geographic Code: 4EUUK United Kingdom 
Accession Number:  297427053 
Full Text: 
After presenting the Magnitude Matters slideshow recently in
several workshops, I realized that it needed more on the role played by
linear modeling in estimation of effects. The additive nature of the
linear model is the basis of adjustment for the effects of other factors
to get pure or unconfounded effects and to identify potential mediators
or mechanisms of an effect. The additive nature of linear models also
explains why we should use the log of the dependent variable to estimate
uniform percent or factor effects. A consideration of the error term in
a linear model provides further justification for the use of log
transformation, along with the use of the unequal-variances t statistic
or mixed modeling in analyses where the error term differs between or
within subjects. Finally, the analyses for counts and binary dependent
variables make little sense without understanding how the underlying
linear models require such strange dependent variables as the log of the
odds of a classification or the log of the hazard of a time-dependent
event. The new slideshow addresses all these issues and more, using
material from the recent progressive statistics article (Hopkins et al.,
2009) and a book chapter on injury statistics (Hopkins, 2009). The
slideshow hopefully represents a useful combination of theory and
practical advice for anyone who wants to understand and estimate effects
in their research. For more on the way we infer causality, deal with confounders, and account for mechanisms in the relationships between variables, see the slideshow/article on research designs (Hopkins, 2008). My article and spreadsheets on understanding stats via simulations (Hopkins, 2007a) are useful for learning more about log transformation, straightforward analyses, and inferential statistics. Follow this link to a slideshow that details the various approaches to repeated measures and random effects; I presented it at a conference in 2003, but it is still up to date.

When it comes to actual data analysis, you will need extra help with the practicalities of using a spreadsheet or stats package. Peruse the article on comparing two group means and play with the associated spreadsheet to come to terms with simple comparisons of means and adjustment for a covariate (Hopkins, 2007b). The article on the various controlled trials and the associated spreadsheets are a little more advanced and also full of useful material (Hopkins, 2006). See my item on Sad Stats for an overview of some of the stats packages and for a set of files that are useful for SPSS users. If you already have some experience with the SAS package but need specific advice on Proc Mixed or Proc Genmod, contact me.

Reviewer's Commentary

The reprint pdf contains this article with a printer-friendly version of the slideshow (six slides per page).

Update 28 Aug 2010. Odds-ratio thresholds of 1.5, 3.4, 9.0, 32 and 360 are now included as an adjunct to proportion-difference thresholds of 10, 30, 50, 70 and 90 percent when modeling and interpreting common time-independent classifications. These odds-ratio thresholds, which I computed directly from the proportion differences centered on 50% (55 vs 45, 65 vs 35, etc.), agree well with a formula devised by Chinn (2000) to convert an odds ratio to a standardized difference in means (ln(odds ratio)/1.81).

References

Chinn S (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 19, 3127-3131
Hopkins WG (2006). Spreadsheets for analysis of controlled trials, with adjustment for a subject characteristic. Sportscience 10, 46-50
Hopkins WG (2007a). Understanding statistics by using spreadsheets to generate and analyze samples. Sportscience 11, 23-36
Hopkins WG (2007b). A spreadsheet to compare means in two groups. Sportscience 11, 22-23
Hopkins WG (2008). Research designs: choosing and fine-tuning a design for your study. Sportscience 12, 12-21
Hopkins WG (2009). Statistics in observational studies. In: Verhagen E, van Mechelen W (editors) Methodology in Sports Injury Research. OUP: Oxford. 69-81
Hopkins WG, Marshall SW, Batterham AM, Hanin J (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise 41, 3-12. Link to PDF.

Published July 2010
(C)2010 Will G Hopkins
Sport and Recreation, AUT University, Auckland 0627, New Zealand. Email.
Reviewer: Alan M Batterham, School of Health and Social Care, Teesside University, Middlesbrough TS1 3BA, UK.

* Importance of Effect Magnitudes
* Getting Effects from Models
* Linear models; adjusting for covariates; interactions; polynomials
* Effects for a continuous dependent
* Difference between means; "slope"; correlation
* General linear models: t tests; multiple linear regression; ANOVA...
* Uniformity of error; log transformation; within-subject and mixed models
* Effects for a nominal or count dependent
* Risk difference; risk, odds, hazard and count ratios
* Generalized linear models: Poisson, logistic, log-hazard
* Proportional-hazards regression

Getting Effects from Models

* An effect arises from a dependent variable and one or more predictor (independent) variables.
* The relationship between the values of the variables is expressed as an equation or model.
* Example of one predictor: Strength = a + b*Age
* This has the same form as the equation of a line, Y = a + b*X, hence the term linear model.
* The model is used as if it means: Strength [left arrow] a + b*Age.
* If Age is in years, the model implies that older subjects are stronger.
* The magnitude comes from the "b" coefficient or parameter.
* Real data won't fit this model exactly, so what's the point?
* Well, it might fit quite well for children or old folks, and if so...
* We can predict the average strength for a given age.
* And we can assess how far off the trend a given individual falls.
* With kids, inclusion of Size would reduce the effect of Age. To that extent, Size is a mechanism or mediator of Age.
* But sometimes a covariate is a confounder rather than a mediator.
* Example: Physical Activity (predictor) has a strong relationship with Health (dependent) in a sample of old folk. Age is a confounder of the relationship, because Age causes bad health and inactivity.
* Again, including potential confounders as covariates produces the pure effect of a predictor.
* Think carefully when interpreting the effect of including a covariate: is the covariate a mechanism or a confounder?
* If you are concerned that the effect of Age might differ for subjects of different Size, you can add an interaction.
* Example of an interaction: Strength = a + b*Age + c*Size + d*Age*Size
* This model implies that the effect of Age on Strength changes with Size in some simple proportional manner (and vice versa).
* It's still known as a linear model.

Background: The Rise of Magnitude of Effects

* Research is all about the effect of something on something else.
* The somethings are variables, such as measures of physical activity, health, training, performance.
* An effect is a relationship between the values of the variables, for example between physical activity and health.
* We think of an effect as causal: more active [right arrow] more healthy.
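As an aside, the one-predictor model Strength = a + b*Age can be fitted by ordinary least squares in a few lines. This is a hypothetical sketch: the ages and strength scores are invented, and the formulas are the standard least-squares estimates rather than any particular stats package's procedure.

```python
# Hypothetical data: strength scores (arbitrary units) for children of
# different ages. All numbers are invented for illustration.
ages = [8, 9, 10, 11, 12, 13, 14]
strength = [21.0, 24.5, 26.0, 30.5, 32.0, 36.5, 38.0]

n = len(ages)
mean_age = sum(ages) / n
mean_str = sum(strength) / n

# Ordinary least squares for Strength = a + b*Age:
#   b = covariance(Age, Strength) / variance(Age)
#   a = mean(Strength) - b*mean(Age)
sxy = sum((x - mean_age) * (y - mean_str) for x, y in zip(ages, strength))
sxx = sum((x - mean_age) ** 2 for x in ages)
b = sxy / sxx
a = mean_str - b * mean_age

# Predict the average strength for a given age, and see how far off the
# trend one individual falls (residual = observed - predicted).
predicted_11 = a + b * 11
residual_11 = 30.5 - predicted_11
print(round(b, 2), round(predicted_11, 2), round(residual_11, 2))
```

The "b" coefficient here carries the magnitude of the effect of Age, exactly as the slide describes.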
* But it may be only an association: more active [left and right arrow] more healthy.
* Effects provide us with evidence for changing our lives.
* The magnitude of an effect is important.
* In clinical or practical settings: could the effect be harmful or beneficial? Is the benefit likely to be small, moderate, large...?
* In research settings:
* Effect magnitude determines sample size.
* Meta-analysis is all about averaging magnitudes of study effects.
* So various research organizations now emphasize magnitudes.
* Example of two predictors: Strength = a + b*Age + c*Size
* Additional predictors are sometimes known as covariates.
* This model implies that Age and Size have effects on strength.
* It's still called a linear model (but it's a plane in 3D).
* Linear models have an incredible property: they allow us to work out the "pure" effect of each predictor.
* By pure here I mean the effect of Age on Strength for subjects of any given Size.
* That is, what is the effect of Age if Size is held constant?
* That is, yeah, kids get stronger as they get older, but is it just because they're bigger, or does something else happen with age?
* The something else is given by the "b": if you hold Size constant and change Age by one year, Strength increases by exactly "b".
* We also refer to the effect of Age on Strength adjusted for Size, controlled for Size, or (recently) conditioned on Size.
* Likewise, "c" is the effect of one unit increase in Size for subjects of any given Age.
* You still use this model to adjust the effect of Age for the effect of Size, but the adjusted effect changes with different values of Size.
* Another example of an interaction: Strength = a + b*Age + c*Age*Age = a + b*Age + c*Age^2
* By interacting Age with itself, you get a nonlinear effect of Age, here a quadratic.
* If c turns out to be negative, this model implies strength rises to a maximum, then comes down again for older subjects.
* To model something falling to a minimum, c would be positive.
* To model more complex curvature, add d*Age^3, e*Age^4 ...
* These are cubics, quartics..., but it's rare to go above a quadratic.
* These models are also known as polynomials.
* They are all called linear models, even though they model curves.
* Use the coefficients to get differences between chosen values of the predictor, and values of predictor and dependent at max or min.
* Complex curvature needs nonlinear modeling (see later) or linear modeling with the predictor converted to a nominal variable...
* Group, factor, classification or nominal variables as predictors:
* We have been treating Age as a number of years, but we could instead use AgeGroup, with several levels; e.g., child, adult, elderly.
* Stats packages turn each level into a dummy variable with values of 0 and 1, then treat each as a numeric variable. Example:
* Strength = a + b*AgeGroup is treated as Strength = a + b1*Child + b2*Adult + b3*Elderly, where Child=1 for children and 0 otherwise, Adult=1 for adults and 0 otherwise, and Elderly=1 for old folk and 0 otherwise.
* The model estimates the mean value of the dependent for each level of the predictor: mean strength of children = a + b1.
* And the difference in strength of adults and children is b2 - b1.
* You don't usually have to know about coding of dummies, but you do when using SPSS for some mixed models and controlled trials.
* Dummy variables can also be very useful for advanced modeling.
* For simple analyses of differences between group means with t tests, you don't have to think about models at all!
* Or you can model change scores between pairs of trials. Example:
* Strength = a + b*Group*Trial, where b has four values, is equivalent to StrengthChange = a + b*Group, where b has just two values (expt and cont) and StrengthChange is the post-pre change scores.
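To make the dummy-coding scheme concrete, here is a hypothetical sketch (the groups and strength values are invented). With one 0/1 dummy per level and no separate intercept, each coefficient is simply the mean of the dependent at that level, so the adult-child difference is b2 - b1:

```python
# Hypothetical illustration of how a stats package dummy-codes a nominal
# predictor: AgeGroup (child, adult, elderly) becomes three 0/1 variables.
subjects = [
    ("child", 20.0), ("child", 22.0),
    ("adult", 40.0), ("adult", 44.0),
    ("elderly", 33.0), ("elderly", 35.0),
]

levels = ["child", "adult", "elderly"]

def dummies(group):
    # One 0/1 indicator per level: e.g. "adult" -> (0, 1, 0)
    return tuple(1 if group == lvl else 0 for lvl in levels)

# With one dummy per level (and no separate intercept), each coefficient
# is the mean of the dependent at that level.
coef = {}
for lvl in levels:
    vals = [y for g, y in subjects if g == lvl]
    coef[lvl] = sum(vals) / len(vals)

diff_adult_child = coef["adult"] - coef["child"]  # b2 - b1 in the slide's notation
print(dummies("adult"), diff_adult_child)
```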
* You can include subject characteristics as covariates to estimate the way they modify the effect of the treatment. Such modifiers or moderators account for individual responses to the treatment.
* A popular modifier is the baseline (pre) score of the dependent: StrengthChange = a + b*Group + c*Group*StrengthPre.
* Here the two values of c estimate the modifying effect of baseline strength on the change in strength in the two groups.
* And c2 - c1 is the net modifying effect of baseline on the change.
* Bonus: a baseline covariate improves precision of estimation when the dependent variable is noisy.
* Modeling of change scores with a covariate is built into the controlled-trial spreadsheets at Sportscience.

Specific Linear Models, Effects and Threshold Magnitudes

* These depend on the four kinds (or types) of variable.
* Continuous (numbers with decimals): mass, distance, time, current; measures derived therefrom, such as force, concentration, volts.
* Counts: such as number of injuries in a season.
* Ordinal: values are levels with a sense of rank order, such as a 4-pt Likert scale for injury severity (none, mild, moderate, severe).
* Nominal: values are levels representing names, such as injured (no, yes), and type of sport (baseball, football, hockey).
* As predictors, the first three can be simplified to numeric.
* If a polynomial is inappropriate, parse into 3-5 levels of a nominal.
* Example: Age becomes AgeGroup (5-14, 15-29, 30-59, 60-79, >79).
* Values can also be parsed into equal quantiles (e.g., quintiles).
* If an ordinal predictor such as a Likert scale has only 2-4 levels, or if the values are stacked at one end of the scale, analyze the values as levels of a nominal variable.
* Linear models for controlled trials
* For a study of strength training without a control group: Strength = a + b*Trial, where Trial has values pre, post or whatever.
* b*Trial is really b1*Pre + b2*Post, with Pre=1 or 0 and Post=1 or 0.
* The effect of training on mean strength is given by b2 - b1.
* For a study with a control group: Strength = a + b*Group*Trial, where Group has values expt, cont.
* b*Group*Trial is really b1*ContPre + b2*ContPost + b3*ExptPre + b4*ExptPost.
* The changes in the groups are given by b2 - b1 and b4 - b3.
* The net effect of training is given by (b4 - b3) - (b2 - b1).
* Stats packages also allow you to specify this model: Strength = a + b*Group + c*Trial + d*Group*Trial.
* Group and Trial alone are known as main effects.
* This model is really the same as the interaction-only model.
* It does allow easy estimation of overall mean differences between groups and mean changes pre to post, but these are useless.
* You can include the change score of another variable as a covariate to estimate its role as a mediator (i.e., mechanism) of the treatment. Example: StrengthChange = a + b*Group + d*MediatorChange.
* d represents how well the mediator explains the change in strength.
* b2 - b1 is the effect of the treatment when MediatorChange=0; that is, the effect of the treatment not mediated by the mediator.
* Linear vs nonlinear models
* Any dependent equal to a sum of predictors and/or their products is a linear model.
* Anything else is nonlinear, e.g., an exponential effect of Age, to model strength reaching a plateau rather than a maximum.
* Almost all statistical analyses are based on linear models.
* And they can be used to adjust for other effects, including estimation of individual responses and mechanisms.
* Nonlinear procedures are available but are more difficult to use.
* The most common effect statistic, for numbers with decimals (continuous variables):
* Difference when comparing different groups, e.g., patients vs healthy.
* Change when tracking the same subjects.
* Difference in the changes in controlled trials.
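A hypothetical numeric sketch of the controlled-trial calculation (all scores invented): the net effect of training is the mean change in the experimental group minus the mean change in the control group, i.e., (b4 - b3) - (b2 - b1) in the notation above.

```python
# Hypothetical pre/post strength scores in a controlled trial.
control = [(30.0, 31.0), (28.0, 28.5), (32.0, 32.5)]       # (pre, post)
experimental = [(29.0, 34.0), (31.0, 35.5), (30.0, 35.0)]  # (pre, post)

def mean_change(group):
    # Reduce the repeated measurements to a single change score per subject,
    # then average: this is the within-subject modeling the slides describe.
    changes = [post - pre for pre, post in group]
    return sum(changes) / len(changes)

# Net effect = (mean change in expt group) - (mean change in cont group)
net_effect = mean_change(experimental) - mean_change(control)
print(round(net_effect, 2))
```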
* The between-subject standard deviation provides default thresholds for important differences and changes.
* You think about the effect (Δmean) in terms of a fraction or multiple of the SD (Δmean/SD).
* The effect is said to be standardized.
* The smallest important effect is ±0.20 (±0.20 of an SD).
[GRAPHIC OMITTED]
* Relationship of standardized effect to difference or change in percentile:
* Can't define smallest effect for percentiles, because it depends what percentile you are on.
* But it's a good practical measure.
* And easy to generate with Excel, if the data are approx. normal.
[GRAPHIC OMITTED]
[GRAPHIC OMITTED]

Measures of Athletic Performance

* For fitness tests of team-sport athletes, use standardization.
* For top solo athletes, an enhancement that results in one extra medal per 10 competitions is the smallest important effect.
* Simulations show this enhancement is achieved with 0.3 of an athlete's typical variability from competition to competition.
* Example: if the variability is a coefficient of variation of 1%, the smallest important enhancement is 0.3%.
* Note that in many publications I have mistakenly referred to 0.5 of the variability as the smallest effect.
* Moderate, large, very large and extremely large effects result in an extra 3, 5, 7 and 9 medals in every 10 competitions.
* The corresponding enhancements as factors of the variability are: trivial 0.3 small 0.9 moderate 1.6 large 2.5 very large 4.0 ext. large
* Example: the effect of a treatment on strength
[GRAPHIC OMITTED]
[GRAPHIC OMITTED]
* Interpretation of standardized difference or change in means. Complete scale: trivial 0.2 small 0.6 moderate 1.2 large 2.0 very large 4.0 ext. large

Cautions with Standardizing

* Choice of the SD can make a big difference to the effect.
* Use the baseline (pre) SD, never the SD of change scores.
* Standardizing works only when the SD comes from a sample representative of a well-defined population.
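Standardization and the 0.2/0.6/1.2/2.0/4.0 thresholds can be sketched as follows. The two groups' values are invented, and the choice of which group's SD to standardize with is a judgment call subject to the cautions above; here the comparison group's SD stands in for the between-subject SD.

```python
# Standardized difference in means and the slideshow's magnitude
# thresholds: 0.2, 0.6, 1.2, 2.0, 4.0. Data are invented.
from statistics import mean, stdev

patients = [52.0, 55.0, 49.0, 58.0, 51.0]
healthy = [60.0, 63.0, 58.0, 66.0, 62.0]

# Standardize with a between-subject SD (here, the patients' SD).
delta_mean = mean(healthy) - mean(patients)
standardized = delta_mean / stdev(patients)

def magnitude(d, thresholds=(0.2, 0.6, 1.2, 2.0, 4.0),
              labels=("trivial", "small", "moderate", "large",
                      "very large", "extremely large")):
    # Walk up the scale until the (absolute) effect no longer exceeds it.
    d = abs(d)
    for t, label in zip(thresholds, labels):
        if d < t:
            return label
    return labels[-1]

print(round(standardized, 2), magnitude(standardized))
```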
* The resulting magnitude applies only to that population.
* Beware of authors who show standard errors of the mean (SEM) rather than SD.
* SEM = SD/[square root of (sample size)]
* So effects look a lot bigger than they really are.
* Check the fine print; if authors have shown SEM, do some mental arithmetic to get the real effect.

Other Smallest Differences or Changes in Means

* Single 5- to 7-pt Likert scales: half a step.
* Visual-analog scales scored as 0-10: 1 unit.
* Athletic performance...
* Beware: smallest effect on athletic performance depends on method of measurement, because...
* A percent change in an athlete's ability to output power results in different percent changes in performance in different tests.
* These differences are due to the power-duration relationship for performance and the power-speed relationship for different modes of exercise.
* Example: a 1% change in endurance power output produces the following changes...
* 1% in running time-trial speed or time;
* ~0.4% in road-cycling time-trial time;
* 0.3% in rowing-ergometer time-trial time;
* ~15% in time to exhaustion in a constant-power test.
* A hard-to-interpret change in any test following a fatiguing preload.
* A slope is more practical than a correlation.
* But the unit of the predictor is arbitrary, so it's hard to define the smallest effect for a slope.
* Example: 2% per year may seem trivial, yet 20% per decade may seem large.
* For consistency with interpretation of correlation, it is better to express the slope as the difference per two SDs of the predictor.
* It gives the difference between a typically low and high subject.
* See the page on effect magnitudes at newstats.org for more.
* Easier to interpret the correlation, using Cohen's scale.
* Smallest important correlation is ±0.1. Complete scale: trivial 0.1 small 0.3 moderate 0.5 large 0.7 very large 0.9 ext. large
* But note: in validity studies, correlations >0.90 are desirable.
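The recommendation to evaluate a slope per two SDs of the predictor can be sketched like this (invented data); standardizing the resulting difference with the SD of the dependent lets the same 0.2-4.0 thresholds apply.

```python
# A slope per unit of the predictor is hard to judge, so express it as
# the difference over 2 SD of the predictor (roughly a typically low vs
# typically high subject). Data are invented.
from statistics import mean, stdev

age = [20, 25, 30, 35, 40, 45, 50, 55]
strength = [48, 47, 45, 44, 42, 41, 40, 38]

mx, my = mean(age), mean(strength)
sxy = sum((x - mx) * (y - my) for x, y in zip(age, strength))
sxx = sum((x - mx) ** 2 for x in age)
slope = sxy / sxx  # change in strength per year of age

# Difference over 2 SD of the predictor, then standardize with the SD of
# the dependent so the 0.2/0.6/1.2/2.0/4.0 thresholds apply.
effect_2sd = 2 * stdev(age) * slope
standardized = effect_2sd / stdev(strength)
print(round(slope, 3), round(standardized, 2))
```

For simple regression this standardized 2-SD effect equals twice the correlation coefficient, which is why the two magnitude scales line up.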
[ILLUSTRATION OMITTED]

The Names of Linear Models with a Continuous Dependent

* You need to know the jargon so you can use the right procedure in a spreadsheet or stats package.
* Unpaired t test: for 2 levels of a single nominal predictor.
* Use the unequal-variances version, never the equal-variances.
* Paired t test: as above, but the 2 levels are for the same subjects.
* Simple linear regression: a single numeric predictor.
* Multiple linear regression: 2 or more numeric predictors.
* Analysis of variance (ANOVA): one or more nominal predictors.
* Analysis of covariance (ANCOVA): one or more nominal and one or more numeric predictors.
* Repeated-measures analysis of (co)variance: AN(C)OVA in which each subject has two or more measurements.
* General linear model (GLM): any combination of predictors.
* In SPSS, nominal predictors are factors, numerics are covariates.
* Mixed linear model: any combination of predictors and errors.
* You characterize the error with a standard deviation.
* It's also known as the standard error of the estimate or the root mean square error.
* In general linear models, the error is assumed to be uniform.
* That is, there is only one SD for the residuals, or the error for every datum is drawn from a single "hat".
* Nonuniform error is known as heteroscedasticity.
* If you don't do something about it, you get wrong answers.
* Without special treatment, many datasets show bigger errors for bigger values of the dependent.
* This problem is obvious in some tables of means and SDs, in scatter plots, or in plots of residual vs predicted values (see later).
* Such plots of individual values are also good for spotting outliers.
* It arises from the fact that effects and errors in the data are percents or factors, not absolute values.
* Example: an error or effect of 5% is 5 s in 100 s but 10 s in 200 s.
* The effect of a nominal predictor can also be expressed as a correlation = [square root of (fraction of "variance explained")].
* A 2-level predictor scored as 0 and 1 gives the same correlation.
* With equal numbers of subjects in each group, the scales for correlation and standardized difference match up.
* For >2 levels, the correlation can't be applied to individuals. Avoid.
* Correlations when controlling for something...
* Interpreting slopes and differences in means is no great problem when you have other predictors in the model.
* Be careful about which SD you use to standardize.
* But correlations are a challenge.
* The correlation is either partial or semi-partial (SPSS: "part").
* Partial = effect of the predictor within a virtual subgroup of subjects who all have the same values of the other predictors.
* Semi-partial = unique effect of the predictor with all subjects.
* Partial is probably more appropriate for the individual.
* Confidence limits may be a problem in some stats packages.

The Error Term in Linear Models with a Continuous Dependent

* Strength = a + b*Age isn't quite right for real data, because no subject's data fit this equation exactly.
* What's missing is a different error for each subject: Strength = a + b*Age + error
* This error is given an overall mean of zero, and it varies randomly (positive and negative) from subject to subject.
* It's called the residual error, and the values are the residuals.
* residual = (observed value) minus (predicted value)
* In many analyses the error is assumed to have values that come from a normal (bell-shaped) distribution.
* This assumption can be violated a lot. Testing for normality is not an issue, thanks to the Central Limit Theorem.
* With a count as the dependent, the error has a Poisson (or the related negative binomial) distribution, which is an issue.
* Address with generalized linear modeling (see later).
* Address the problem by analyzing the log-transformed dependent.
* A 5% effect means Post = Pre*1.05.
* Therefore log(Post) = log(Pre) + log(1.05).
* That is, the effect is the same for everyone: log(1.05).
* And we now have a linear (additive) model, not a nonlinear model, so we can use all our usual linear modeling procedures.
* A 5% error means typically ×1.05 and ÷1.05, or ×/÷1.05.
* And a 100% error means typically ×/÷2.0 (i.e., values vary typically by a factor of 2), and so on.
* When you finish analyzing the log-transformed dependent, you back-transform to a percent or factor effect.
* Show percents for anything up to ~30%. Show factors otherwise, e.g., when the dependent is a hormone concentration.
* Use the log-transformed values when standardizing.
* Log transformation is often appropriate for a numeric predictor.
* The effect of the predictor is then expressed per percent, per 10%, per 2-fold increase, and so on.
Example of simple linear regression with a dependent requiring log transformation.
* A log scale or log transformation produces uniform residuals.
[GRAPHIC OMITTED]
* Nonuniformity also arises with different groups and time points.
* Example: a simple comparison of means of males and females, with different SD for males and females (even after log transformation).
* Hence the unequal-variances t statistic or test.
* To include covariates here, you can't use the general linear model: you have to keep the groups separate, as in my spreadsheets.
* Example: a controlled trial, with different errors at different time points arising from individual responses and changes with time.
* MANOVA and repeated-measures ANOVA can give wrong answers.
* Address by reducing or combining repeated measurements into a single change score for each subject: within-subject modeling.
* Then allow for different SD of change scores by analyzing the groups separately, as above.
* Bonus: you can calculate individual responses as an SD.
* See Repeated Measures and Random Effects at sportsci.org and/or the article on the controlled-trial spreadsheets for more.
* Or specify several errors and much more with a mixed model...
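Back-transformation after log transformation can be sketched as follows; the pre and post values are invented so that every subject shows exactly a 5% effect, which is additive on the log scale.

```python
# Back-transforming an effect estimated on log-transformed data: a 5%
# effect means Post = Pre*1.05, so log(Post) = log(Pre) + log(1.05).
import math

pre = [100.0, 150.0, 200.0]    # e.g. times in seconds (invented)
post = [105.0, 157.5, 210.0]   # each exactly 5% higher

# Mean difference on the log scale...
log_diff = sum(math.log(b) - math.log(a) for a, b in zip(pre, post)) / len(pre)

# ...back-transforms to a factor, and then to a percent effect.
factor = math.exp(log_diff)
percent = 100 * (factor - 1)
print(round(percent, 1))
```

Note that on the raw scale the absolute changes differ (5, 7.5 and 10 s), but on the log scale the effect is the same for everyone, which is the uniformity the slides are after.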
* For time-dependent effects, subjects start "N" but different proportions end up "Y".
* Risk or proportion difference = a - b.
* Example: a - b = 83% - 50% = 33%, so at the time point shown, an extra 33 of every 100 males are injured because they are male.
* Good for common events, but time-dependent.
* Can't model risks and estimate differences directly in linear models.
* Smallest effect: 10% at time when the risk difference is maximum.
* At that time, 1 male in every 10 is injured due to being male.
* Complete scale (for common events, where everyone gets affected): trivial 10% small 30% moderate 50% large 70% very large 90% ext. large
[GRAPHIC OMITTED]
* This scale applies also to time-independent common classifications.
* Rank transformation is another way to deal with nonuniformity.
* You sort all the values of the dependent variable, then rank them (i.e., number them 1, 2, 3,...).
* You then use this rank in all further analyses.
* The resulting analyses are sometimes called nonparametric.
* But it's still linear modeling, so it's really parametric.
* They have names like Wilcoxon and Kruskal-Wallis.
* Some are truly nonparametric: the sign test; neural-net modeling.
* Some researchers think you have to use this approach when "the data are not normally distributed".
* In fact, the rank-transformed dependent is anything but normally distributed: it has a uniform (flat) distribution!
* So it's really an approach to try to get uniformity of effects and error.
* Problems: it doesn't necessarily give uniformity; you lose a lot of information; it's hard to convert the rank effects back to raw values.
* So use ranks as a last resort.
* Mixed modeling is the cutting-edge approach to the error term.
* Mixed = fixed effects + random effects.
* Fixed effects are the usual terms in the model; they estimate means.
* Fixed, because they have the same value for everyone in a group or subgroup; they are not sampled randomly.
* Random effects are error terms and anything else randomly chosen from some population; each is summarized with an SD.
* The general linear model allows only one error. Mixed models allow:
* specification of different errors between and within subjects;
* within-subject covariates (GLM allows only subject characteristics or other covariates that do not change between trials);
* specification of individual responses to treatments and individual differences in subjects' trends;
* interdependence of errors and other random effects, which arises when you model different lines or curves for each subject.
* With repeated measurement in controlled trials, simplify analyses by analyzing change scores, even when using mixed modeling.
* Relative risk or risk ratio = a/b.
* Example: 83/50 = 1.66, or a "66% increase in risk".
* Widely used but inappropriate for common time-dependent events.
* Hazards and hazard ratios are better: see later.
* For rare events, risk ratio is OK, because it is the same as the hazard ratio.
* Can't estimate directly with linear models.
* Risk difference or odds ratio are better for common classifications.
* Magnitude scale: use risk difference, odds ratio or hazard ratio.
For the experts:
[GRAPHIC OMITTED]
* Number needed to treat (NNT) = 100/(a - b).
* = number you would have to treat or sample for one subject to have an outcome attributable to the effect.
* Promoted in some clinical journals, but not widely used?
* Can't estimate directly with linear models.
* Magnitude scale (if you ever use it) is given by 1/(risk difference).
* Odds ratio = (a/c)/(b/d).
* Hard to interpret, but must use to express effects and confidence limits for time-independent classifications, including some case-control designs.
* Use hazard ratio for time-dependent risks.
* Magnitudes for common time-independent classifications:
* Either convert to difference in risk between the reference (comparison or control) group and other group.
Example shown: if 50% of the reference group is affected, and the odds ratio is 4.9, then by simple algebra 83% of the other group is affected. Therefore the risk difference = 33% (i.e., moderate). * Or use this scale for odds ratios, which correspond to risk differences of 10, 30, 50, 70 and 90% "centered" on 50%: trivial | 1.5 | small | 3.4 | moderate | 9.0 | large | 32 | very large | 360 | ext. large * Magnitudes for rare classifications: see later. [GRAPHIC OMITTED] * Ratio of mean time to event = t2/t1. * Easier for an individual to interpret. * If the hazards are constant, it's also the inverse of the hazard ratio. * Example: if the hazard ratio is 2.5, there is 2.5x the risk of injury. But 1/2.5 = 0.4, so injury occurs in less than half the time, on average. * Difference in mean time to event = t2 − t1. * Also easy to interpret, but can't model directly. * Standardization of the log of individual values of time to event leads to another scale for hazard ratios or mean-time ratios of common events: 1.3, 2.2, 4.5, 13, 100. * This scale is similar to that given by consideration of maximum risk difference for common events. Averaging the two and simplifying... * Hazard-ratio thresholds for common events: trivial | 1.3 | small | 2.3 | moderate | 4.5 | large | 10 | very large | 100 | ext. large [GRAPHIC OMITTED] * Derive and interpret the "slope" (a correlation isn't defined here). * As with a nominal predictor, you have to express effects as odds or hazard ratios (for time-independent or time-dependent events) to get confidence limits. * Example shows how chances would change with fitness, and the meaning of the odds ratio per unit of fitness: (b/d)/(a/c). * Odds ratio here is ~(75/25)/(25/75) = 9.0 per unit of fitness. * Best to express as odds or hazard ratio per 2 SD of the predictor. * Magnitude scales are then the same as for nominal predictors. [GRAPHIC OMITTED] * Hazard ratio or incidence-rate ratio = e/f. * Hazard = instantaneous risk rate = proportion per infinitesimal of time. 
* Example: e = 100%/5 wk = 20%/wk = 2.9%/d; f = 40%/5 wk = 8%/wk = 1.1%/d; e/f = 100/40 = 20/8 = 2.9/1.1 = 2.5. * Hazard ratio is the best statistical measure for time-dependent events. * It's the risk ratio right now: male risk is 2.5x the female risk. * Effects and confidence limits can be derived with linear models. * Obviously not dependent on time if hazard rates are constant. * And even if both rates change, it is often OK to assume their ratio is constant, which is the basis of proportional hazards regression. * Magnitude scale depends on whether the event is common or rare. * For common events and constant hazards, maximum risk differences translate into hazard ratios of 1.3, 2.3, 4.4, 10, and 50. * Hazard ratios for rare events: see later. [GRAPHIC OMITTED] Magnitude Thresholds for Rare Events and Classifications * The focus is the affected few and/or those who deal with them. * Hazard ratio = risk ratio = odds ratio for low risks (or short times). * A ratio of 1.1 would produce a 10% increase or decrease in the workload of anyone dealing with the event. * Anything less might go unnoticed, so 1.1 is the smallest effect. * Or in a group of affected individuals, 1 in 10 able to blame the effect represents a defensible smallest effect. Similarly 3, 5, 7 and 9 individuals in 10 represent the other magnitude thresholds. * Corresponding hazard ratios are 10/9, 10/7, 10/5, 10/3, and 10/1. Hence... * Hazard-ratio thresholds for rare events: trivial | 1.1 | small | 1.4 | moderate | 2.0 | large | 3.3 | very large | 10 | ext. large * By a similar argument, this scale applies to count ratios. * Oddly, these thresholds are smaller than those for common events. * We tolerate relatively higher risk for a rare event, but if we end up as an event, we wish the risk had been lower! * Effect of a nominal predictor is expressed as a ratio (factor) or percent difference. * Example: in their sporting careers, women get 2.3 times more tendon injuries than men. 
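The factor-to-percent interconversion used in these examples (a ratio r corresponds to 100·(r − 1) percent "more") can be sketched as follows; the function names are mine:

```python
def ratio_to_percent(r):
    """Percent difference corresponding to a ratio (factor) effect."""
    return 100 * (r - 1)

def percent_to_ratio(pct):
    """Ratio (factor) corresponding to a percent difference."""
    return 1 + pct / 100

print(round(ratio_to_percent(1.26)))     # 26 (%): "1.26 times" = "26% more"
print(round(percent_to_ratio(26), 2))    # 1.26
print(round(ratio_to_percent(2.3)))      # 130 (%): ratios this big read better as factors
```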
* If the ratio is ~1.5 or less, it can be expressed as a percent: men get 26% (1.26 times) more muscle sprains than women. * Effects of a numeric predictor are expressed as factors or percents per unit or per 2 SD of the predictor. * Example: 13% more tackles per 2 SD of repeated-sprint speed. * Magnitude scale for count ratios is the same as for rare events: trivial | 1.1 | small | 1.4 | moderate | 2.0 | large | 3.3 | very large | 10 | ext. large Details of Linear Models for Events, Classifications, Counts * Counts, and binary variables representing levels of a nominal, give wrong answers as dependents in the general linear model. * It can predict negative or non-integral values, which are impossible. * Non-uniformity would also be an issue. * Generalized linear modeling has been devised for such variables. * The generalized linear model predicts a dependent that can range continuously from −∞ to +∞, just as in the general linear model. * You specify the dependent by specifying the distribution of the dependent and a link function. * For a continuous dependent, specifying the normal distribution and the identity link produces the general linear model. * Don't use this approach with continuous dependents, because the standard procedures for general linear modeling are easier. * Easiest to understand the approach with counts first... * For binary variables representing time-independent events (e.g., a classification, such as selected or not), the dependent is the log of the odds of the event occurring. * Odds = p/(1 − p), where p is the probability of the event. * p ranges from 0 to 1, so odds range continuously from 0 to +∞. * So the log of the odds ranges from −∞ to +∞. * So the link function is the log-odds, also known as the logit. * Specify the distribution for binary events, the binomial. * The model is called logistic regression, but log-odds regression would be a better name. * The log of the odds results in effects expressed as odds ratios. 
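The odds and log-odds (logit) transformations above can be sketched numerically (natural logs assumed), using the 75% and 25% chances from the fitness example:

```python
import math

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

def logit(p):
    """Log-odds (logit) link: maps p in (0, 1) onto the whole real line."""
    return math.log(odds(p))

print(odds(0.75))                         # 3.0
print(round(odds(0.75) / odds(0.25), 1))  # 9.0 -- the odds ratio per unit of fitness
print(round(logit(0.5), 2))               # 0.0 -- even odds map to a log-odds of zero
```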
* A log-odds model may be simplistic or unrealistic, but it's got to be better than modeling p or log(p), which definitely does not work. * Some researchers mistakenly use this model for time-dependent events, such as development of injury. But... * If proportions of subjects experiencing the event are low, you can model risk, odds or hazards, because the ratios are the same. Three slides for the experts Other Models for Events and Classifications * All have outcomes modeled as ratios (between levels of nominal predictors) or ratios per unit (or per 2 SD) of numeric predictors. * The magnitude scales for common and rare events and classifications are the same as in previous models. * Summary (with examples): * For counts (e.g., each athlete's number of injuries), the dependent is the log of the mean count. * The mean count ranges continuously from 0 to +∞. * The log of the mean count ranges from −∞ to +∞. * So the link function is the log. * Specify the distribution for counts, the Poisson. * The model is called Poisson regression. * The log link results in effects expressed as count ratios. * If the counts accumulate over different periods for different subjects, you can specify the period in the model as an offset or denominator. * You are then modeling rates, and the effects are rate ratios. * Specify a negative binomial distribution if you think the events for each subject tend to occur in clusters rather than truly randomly. * The model thereby reflects the fact that the counts have a bigger variation for a given predicted count than purely Poisson counts. * For binary variables representing time-dependent events (e.g., un/injured), the dependent is the log of the hazard. * The hazard is the probability of the event per unit time. * For events that accumulate with a constant hazard (h), the proportion of subjects affected at time t is given by p = 1 − e^(−h·t); hence h = −log(1 − p)/t. 
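The constant-hazard relation above can be checked numerically. A sketch (the 50%-affected-at-5-weeks values are illustrative, not from the slides):

```python
import math

def hazard_from_risk(p, t):
    """Constant hazard h from proportion affected p at time t: h = -log(1 - p)/t."""
    return -math.log(1 - p) / t

def risk_from_hazard(h, t):
    """Proportion affected by time t under constant hazard h: p = 1 - exp(-h*t)."""
    return 1 - math.exp(-h * t)

h = hazard_from_risk(0.5, 5)             # 50% affected by week 5
print(round(h, 3))                       # 0.139 (per week)
print(round(risk_from_hazard(h, 5), 3))  # 0.5 -- the round trip recovers the risk
```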
* The hazard ranges continuously from 0 to +∞. * The log of the hazard ranges from −∞ to +∞. * The link function is known, confusingly, as the complementary log-log: log(−log(1 − p)). * I prefer to refer to it as the log-hazard link. * Specify the distribution for binary events, the binomial. * The model has no common name. I call it log-hazard regression. * The log of the hazard results in effects expressed as hazard ratios. * You can specify a different monitoring time for each subject. * When hazards aren't constant, use proportional hazards regression. * When the dependent is a nominal with >2 levels, group into various combinations of 2 levels and use the above models, or... * For the advanced class... * Multinomial logistic regression, for time-independent nominals (e.g., a study of predictors of choice of sport). * Use the multinomial distribution and the generalized logit link (available in SAS in the new Glimmix procedure). * SAS does not provide a link in Glimmix or Genmod for multinomial hazard regression of time-dependent nominals. * Cumulative logistic regression, for time-independent ordinals (e.g., injury severity on a 4-point Likert scale). * Multinomial distribution; cumulative logit link. * Use for <5-pt or skewed Likert scales; otherwise use the general linear model. * Cumulative hazard regression, for time-dependent ordinals (e.g., uninjured, mild injury, moderate injury, severe injury). * Multinomial distribution; cumulative complementary log-log link. * Generalized linear models for repeated or clustered measures are also known as generalized estimating equations. * Proportional hazards (Cox) regression is another and more advanced form of linear modeling for time-dependent events. * Use when hazards can change with time, if you can assume the ratios of the hazards of the effects are constant. * Example: hazard changes as the season progresses, but the hazard for males is always 1.5x that for females. * A constant ratio is not obvious in this kind of figure. 
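Proportional hazards ties the survival curves together as S_male(t) = S_female(t)^HR at every time point, even when the hazards themselves vary. A minimal sketch using the slides' ratio of 1.5 and an assumed (illustrative) constant female hazard of 5% per month:

```python
import math

HR = 1.5         # male hazard is always 1.5x the female hazard (slide example)
h_female = 0.05  # assumed female hazard per month (illustrative, not from the slides)

def surviving(h, t):
    """Proportion still event-free at time t under constant hazard h."""
    return math.exp(-h * t)

for t in (6, 12):
    s_f = surviving(h_female, t)
    s_m = surviving(HR * h_female, t)
    # Proportional hazards: S_male equals S_female raised to the hazard ratio
    assert abs(s_m - s_f ** HR) < 1e-9
    print(t, round(s_f, 2), round(s_m, 2))
```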
* Time to the event is the dependent, but effects are estimated and interpreted as hazard ratios. * The model takes account of censoring: when someone leaves the study (or the study stops) before the event has occurred. [GRAPHIC OMITTED] * Not covered in this presentation: magnitude thresholds for measures of reliability, validity, and diagnostic accuracy. * Counts and nominal dependents (representing classifications and time-dependent events) need various generalized linear models. * Examples: Poisson regression for counts, logistic regression for classifications, log-hazard regression for events. * The dependent variable is the log of the mean count, the log of the odds of classification, or the log of the hazard (instantaneous risk) of the event. * Effect-magnitude thresholds for counts and nominal dependents: * Percent risk differences for classifications: 10, 30, 50, 70, 90. * Corresponding odds ratios for classifications: 1.5, 3.4, 9.0, 32, 360. * Hazard-ratio thresholds for common events: 1.3, 2.3, 4.5, 10, 100. * Ratio thresholds for counts and rare events: 1.1, 1.4, 2.0, 3.3, 10 (apply equally to count, hazard, risk or odds ratios). * Use proportional hazards regression when hazards vary with time but the hazard ratio is constant. Main Points * An effect is a relationship between a dependent and a predictor. * Effect magnitudes have key roles in research and practice. * Magnitudes are provided by linear models, which allow for adjustment, interactions, and polynomial curvature. * Continuous dependents need various general linear models. * Examples: t tests, multiple linear regression, ANOVA... * Within-subject and mixed modeling allow for non-uniformity of error arising from different errors with different groups or time points. * Effects for continuous dependents are mean differences, slopes (expressed as differences per 2 SD of the predictor), and correlations. 
* Thresholds for small, moderate, large, very large and extremely large standardized mean differences: 0.2, 0.6, 1.2, 2.0, 4.0. * Thresholds for correlations: 0.1, 0.3, 0.5, 0.7, 0.9. * Many dependent variables need log transformation before analysis to express effects and errors as uniform percents or factors.

Will G Hopkins, Sport and Recreation, AUT University, Auckland 0627, New Zealand. Email. Reviewer: Alan M Batterham, School of Health and Social Care, Teesside University, Middlesbrough TS1 3BA, UK.

* As dependents, each type of variable needs a different approach.

Summary of main effects and models (with examples):

Dependent (example)    | Predictor (example) | Effect of predictor
continuous (Strength)  | nominal (Trial)     | difference in means
continuous (Activity)  | numeric (Age)       | "slope" (difference per unit of predictor); correlation
nominal (InjuredNY)    | nominal (Sex)       | differences or ratios of proportions, odds, rates, hazards
nominal (SelectedNY)   | numeric (Fitness)   | "slope" (difference or ratio per unit of predictor)
count (Injuries)       | nominal (Sex)       | ratio of counts
count (Medals)         | numeric (Cost)      | "slope" (ratio per unit of predictor)

Dependent (examples)             | Statistical model
continuous (Strength, Activity)  | (un)paired t test; (multiple) regression; ANOVA; ANCOVA; general linear; mixed linear
nominal (InjuredNY, SelectedNY)  | logistic regression; log-hazard regression; generalized linear
count (Injuries, Medals)         | Poisson regression; generalized linear

Thresholds for a standardized difference or change in means (continuous dependent, e.g. Strength; nominal predictor, e.g. Trial):

                 | Cohen   | Hopkins
trivial          | <0.2    | <0.2
small            | 0.2-0.5 | 0.2-0.6
moderate         | 0.5-0.8 | 0.6-1.2
large            | >0.8    | 1.2-2.0
very large       | ?       | 2.0-4.0
extremely large  | ?       | >4.0

Effects for the other combinations of dependent and predictor:

Dependent (example)    | Predictor (example) | Effect
continuous (Activity)  | numeric (Age)       | "slope" (difference per unit of predictor); correlation
nominal (InjuredNY)    | nominal (Sex)       | differences or ratios of proportions, odds, rates, hazards, mean event time
nominal (SelectedNY)   | numeric (Fitness)   | "slope" (difference or ratio per unit of predictor)
count (Injuries)       | nominal (Sex)       | ratio of counts
count (Tackles)        | numeric (Fitness)   | "slope" (ratio per unit of predictor)

Dependent (example)                         | Effect of predictor  | Statistical model
multiple proportions (choice of several sports) | odds ratio       | multinomial logistic regression; generalized linear
ordinal (injury severity, 4-pt Likert)      | odds or hazard ratio | cumulative logistic or hazard regression; generalized linear
time to event (time to injury)              | hazard ratio         | proportional hazards (Cox) regression
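The standardized-difference thresholds in the tables above (Hopkins scale: 0.2, 0.6, 1.2, 2.0, 4.0) can be encoded as a simple lookup; the function is my own sketch, not from the article:

```python
def magnitude(d):
    """Qualitative magnitude of a standardized mean difference (Hopkins scale)."""
    thresholds = [(0.2, "trivial"), (0.6, "small"), (1.2, "moderate"),
                  (2.0, "large"), (4.0, "very large")]
    d = abs(d)  # magnitude concerns size, not direction
    for cutoff, label in thresholds:
        if d < cutoff:
            return label
    return "extremely large"

print(magnitude(0.1))   # trivial
print(magnitude(0.5))   # small
print(magnitude(-1.5))  # large
print(magnitude(5.0))   # extremely large
```

The same pattern works for the correlation thresholds (0.1, 0.3, 0.5, 0.7, 0.9) with the labels unchanged.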