
MODELS: IMPORTANT DETAILS continued

What I said on the last few pages about t tests of ordinal variables and t tests of Likert-scale variables applies also to counts: t tests are usually OK, and they will fall over only when you have a small sample size and more than 70% of your subjects score zero counts (because then the sampling distribution of the difference between the means won't be close enough to normal).

When you are fitting lines or curves, you also have to worry about non-uniformity
of residuals. With counts, this worry is very real, because the variation in
a given count from sample to sample depends on how big the count is. For example,
the typical variation (standard deviation) in a count is usually simply the
square root of the count, so a count of about 400 injuries varies typically
by ±20, whereas a count of about 40 injuries varies typically by ±6.
I hope it's obvious that the residuals for injury counts of 400 will therefore
be much larger than those for counts of about 40. Rank transformation would
fix these non-uniform residuals, but better approaches are available: **binomial
regression**, **Poisson regression**, **square-root transformation** and
**arcsine-root transformation**.
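To put numbers on that non-uniformity, here is a minimal Python sketch. It assumes the standard deviation of a count is simply the square root of the count, as described above; the function name is only for illustration.

```python
import math

def typical_variation(count):
    """Return (SD, SD as a percent of the count), assuming SD = sqrt(count)."""
    sd = math.sqrt(count)
    return sd, 100 * sd / count

for count in (40, 400):
    sd, pct = typical_variation(count)
    print(f"count {count}: SD about {sd:.1f} ({pct:.0f}% of the count)")
```

The larger count has the larger standard deviation in absolute terms but the smaller one in percentage terms, so neither the raw counts nor their percent changes give uniform residuals.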

**Binomial and Poisson Regression**

When counts have a smallish upper bound (e.g., the number
of injured players in a squad of 24 is at most 24), the counts from sample to
sample vary according to what is known as a **binomial distribution**. When
the upper bound is very large compared with the observed values of the count
(e.g., the number of spinal injuries in American football each year), the counts
have a **Poisson distribution**. With a good stats program, you can dial
up an analysis that uses either of these distributions. The result is a **binomial regression** or a
**Poisson regression**. In the Statistical Analysis
System, you can do these analyses with Proc Genmod. *Genmod* stands for **generalized linear modeling**, which is an advanced form of general linear modeling that allows for the properties of non-normally distributed variables such as counts and proportions based on counts.
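If you want a feel for what a Poisson regression does, the following Python sketch fits a straight line to the log of the expected count by maximum likelihood, using Newton's method. It is only an illustration under stated assumptions: a single predictor, a log link, made-up injury counts, and a hypothetical function name. It is not a description of how Proc Genmod or any other package works internally, and it gives no confidence limits.

```python
import math

def fit_poisson_line(x, y, max_iter=100, tol=1e-12):
    """Fit log(expected count) = b0 + b1*x by maximum likelihood."""
    b0 = math.log(sum(y) / len(y))  # start at the log of the mean count
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix: each observation is weighted by its expected
        # count, which is how the non-uniform variability is accounted for
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the change in (b0, b1)
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: injuries per season over five seasons
seasons = [0, 1, 2, 3, 4]
injuries = [12, 15, 22, 24, 35]
b0, b1 = fit_poisson_line(seasons, injuries)
print(f"count in season 0: {math.exp(b0):.1f}")
print(f"ratio of counts from one season to the next: {math.exp(b1):.2f}")
```

Because the model is linear on the log scale, the slope back-transforms to a ratio of counts per unit of the predictor rather than a difference in counts.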

Don't feel intimidated by *binomial* and *Poisson.* Are you happy
with the notion that the values of most variables have the bell-shaped normal
distribution? OK, counts or proportions of something
don't have the normal shape when the counts are small, so we need different
mathematics to describe their shapes, and different names for them. As counts
get larger, the shapes of the binomial and Poisson distributions tend towards
the normal shape. You still have the problem of non-uniform residuals, though,
because the variability from observation to observation for larger counts is
greater (in absolute terms) but smaller (in percentage terms) than for smaller counts.
Binomial and Poisson regressions and other forms of generalized linear modeling take care of the non-uniformity. For more on generalized linear modeling, in particular the specification and use of distributions and link functions, see the message I sent to the Sportscience email list in July 2004.

**Square-root and Arcsine-root Transformation**

One way to deal with non-uniform residuals is to transform
the variable. We've seen that log transformation
works for some variables, and rank transformation
works for most variables as a last resort. Is there a transformation for **counts**
that will allow us to use normal analyses instead of binomial or Poisson regression?
Yes, provided you aren't close to some upper bound in the counts, just use the
**square root** of the counts in the usual analyses. When you've derived
the outcome statistic and its confidence limits, assess their magnitudes with
Cohen's or my scale of effect sizes, as I explained
for rank transformation. You can't back-transform an effect (such as a difference
between means) into a count simply by squaring it. Instead, get a feel for its
magnitude as a count by adding the effect to the mean of the square-rooted counts
and squaring the result, then compare that with the square of the mean itself.
Do the same with each confidence limit of the effect in place of the effect to
get a feel for the precision of the magnitude.
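The back-transformation just described can be sketched in Python as follows. The function name and the numbers are hypothetical; the effect and its confidence limits are assumed to come from an analysis of the square-rooted counts.

```python
def effect_as_counts(mean_root, effect, lower, upper):
    """Express an effect on the square-root scale as counts relative to the mean."""
    def back(value):
        # Back-transform a value on the square-root scale to a count
        return value ** 2

    return {
        "reference": back(mean_root),
        "with_effect": back(mean_root + effect),
        "with_lower_limit": back(mean_root + lower),
        "with_upper_limit": back(mean_root + upper),
    }

# Hypothetical values: mean of the root counts 3.0, effect 1.0,
# confidence limits 0.5 to 1.5 (all in square-root units)
result = effect_as_counts(3.0, 1.0, 0.5, 1.5)
print(result)
```

With these made-up values the reference count is 9 and the effect takes it to 16, with confidence limits corresponding to counts of 12.25 and 20.25.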

Read the cautionary note about how the value of a back-transformed mean is not the same as the mean of the raw variable. For a simple example, imagine you have a team with only one injury this season and another team with nine injuries. The mean of the raw number of injuries is (1+9)/2 = 5. But the mean of the root-transformed injury counts is (1+3)/2 = 2, and when you square 2 to back-transform it you get 4!
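The same arithmetic in a few lines of Python, using the two teams from the example (the variable names are only illustrative):

```python
import math

injuries = [1, 9]  # injury counts for the two teams in the example
raw_mean = sum(injuries) / len(injuries)
root_mean = sum(math.sqrt(n) for n in injuries) / len(injuries)
back_transformed_mean = root_mean ** 2

print(raw_mean, back_transformed_mean)  # prints 5.0 4.0
```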

**Proportions** require an exotic transformation called **arcsine-root**.
To use this transformation, express the proportion as a number between 0 and
1 (e.g., 210 Type I muscle fibers in a biopsy of 542 total fibers represents
a proportion of 210/542 = 0.387). Now take the square root and find the inverse
sine (arcsine) of the resulting number; in other words, find the angle whose
sine is the square root of the proportion. (The angle can be in degrees or radians,
where 360 degrees is 2 pi radians.) Use that weird variable in your analysis,
but *weight each observation by the number in the denominator of the proportion,*
to ensure that the residuals in the analysis are uniform. You'll have to read
the documentation for your stats program to see how to apply a weighting factor.
To gauge the magnitude of effects with an arcsine-root transformed variable, apply the Cohen or Hopkins scale,
as explained for rank transformation. The
appropriate standard deviation is the root-mean square error from the analysis
of the transformed variable, because this error should take into account the
weighting factors. As is the case for counts, back-transformation of the observed
effect works only if you add the effect appropriately to the mean before taking
its sine and squaring it. Multiply the result by 100 if you want it as a percent.
Do the same with the confidence limits.
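Here is a minimal Python sketch of the arcsine-root transformation and its back-transformation, working in radians. Only the first biopsy comes from the example above; the other biopsies, the size of the effect and the function names are made up for illustration, and the weighted mean stands in for whatever weighted analysis your stats program provides.

```python
import math

def arcsine_root(numerator, denominator):
    """Angle (radians) whose sine is the square root of the proportion."""
    return math.asin(math.sqrt(numerator / denominator))

def back_transform(angle):
    """Proportion for an angle in radians (valid from 0 to pi/2)."""
    return math.sin(angle) ** 2

def weighted_mean(values, weights):
    """Mean of values, each weighted by the denominator of its proportion."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# (Type I fibers, total fibers); the second and third biopsies are hypothetical
biopsies = [(210, 542), (180, 400), (95, 310)]
angles = [arcsine_root(n, total) for n, total in biopsies]
weights = [total for _, total in biopsies]

mean_angle = weighted_mean(angles, weights)
effect = 0.05  # hypothetical effect, in radians on the transformed scale

print(f"mean: {100 * back_transform(mean_angle):.1f}%")
print(f"mean plus effect: {100 * back_transform(mean_angle + effect):.1f}%")
```

The effect is added to the mean on the transformed scale before taking the sine and squaring, and the same steps apply to each confidence limit.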

The square root and arcsine-root transformations work well even for low counts
or zero proportions. As with ordinal variables, you'll get into trouble only
with small sample sizes when more than 70% of your subjects have a score of
zero or a proportion of zero. Then you *have* to use binomial or Poisson
regression.

Phew! The square-root and arcsine-root approaches are complex. I recommend that you come to terms with a stats package that offers binomial and Poisson regression or generalized linear modeling.


Last updated 19 Aug 2004