

STATISTICAL MODELS continued

weight <= height sex

This model is called an **analysis of covariance** (ANCOVA) when one predictor
variable is numeric (height) and the other is nominal (sex). *Covariance*
refers to the fact that height "co-varies" with the dependent variable,
so height is also known as a **covariate**. Other names for models with two
or more predictor variables include **multiple linear regression** when all
variables are numeric and **two-way analysis of variance** (or three-way
ANOVA etc) when all are nominal. In essence they are all the same. Before we
go into each model in detail, let's understand what it means to have more than
one predictor variable. Let's stay with the above example.

**What the Model Means**
It's easiest to think about the model as a
tool for predicting weight when you know a person's height and sex. If there
IS a relationship between weight and height, then knowing a person's height
will tell you something about his or her weight. Similarly, if there IS a relationship
between weight and sex, then knowing a person's sex will also allow you to say
something about her or his weight. And if you know both height and sex, you'll
be able to be even more specific about weight. So that's the question that the
overall model poses:

Stats programs can calculate the usual goodness-of-fit
R^{2} for the model, which you can interpret as a measure
of *how much* the independent variables tell you about the dependent variable.
In formal terms the R^{2} is the percentage of the variance in the dependent variable
explained or predicted by the independent variables. You can also get a test
statistic for the full model and its associated p value. You could use the p
value to work out confidence limits for the overall R, but otherwise these statistics
aren't worth worrying about. Much more important are effects derived from the
predictor variables, as I will describe now.
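
For reference, the usual definition behind "percentage of the variance explained" can be written out as below. The symbols are notation introduced here for illustration, not taken from the text above: y_i is an observed value of the dependent variable, \hat{y}_i is the value the model predicts for it, and \bar{y} is the mean of the observed values.

```latex
R^{2} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

Multiply by 100 to express it as a percentage.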

**"Controlling" for Something**
The overall relationship is seldom the main
focus when you have more than one predictor variable in the statistical model.
Instead, these models are used to address a much more important question: what
is the effect of something when we

What do we really mean when we control for height or take height into account
in the comparison of the weights of boys and girls? Simply this: if boys and
girls had the same height, what would be the difference in weight? And that's
exactly what the statistical analysis tells us: it gives us **the effect of
a predictor with all other predictors held constant**. When you do your usual
estimates or contrasts for the effects you're
interested in, or inspect the solution
of the model, the answers you get are automatically adjusted for the presence
of all the other predictor variables, as if they were all set to some constant
value. For example, you get the difference in the mean weight of boys and girls
who have the same (mean) height. Note that the analysis automatically controls
for every predictor variable, so you can also address the question: what's the
effect of height on weight when you take sex into account? What you get from
the analysis for this question is the average slope of the lines for the boys
and the girls, as if there were an equal number of boys and girls in the study.
I'll delve into these issues more on the next page.
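
Here is a minimal sketch in Python of what adjusting for height does to the boy-girl comparison. Everything in it is an assumption made for illustration: the eight data points are made up, sex is coded 1 for boys and 0 for girls, and `ols` is a small hypothetical helper that solves the least-squares normal equations, standing in for whatever your stats program does internally.

```python
# Made-up data for illustration only (not from any study):
# height in cm, sex coded 1 = boy and 0 = girl, weight in kg.
height = [150, 155, 160, 165, 160, 165, 170, 175]
sex = [0, 0, 0, 0, 1, 1, 1, 1]
weight = [41, 45, 46, 50, 49, 52, 56, 57]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def mean(values):
    return sum(values) / len(values)


# Unadjusted comparison: difference in mean weight, ignoring height.
boys = [w for w, s in zip(weight, sex) if s == 1]
girls = [w for w, s in zip(weight, sex) if s == 0]
raw_diff = mean(boys) - mean(girls)

# Model: weight <= height sex (columns: intercept, height, sex).
X = [[1.0, h, s] for h, s in zip(height, sex)]
intercept, height_slope, adjusted_diff = ols(X, weight)

print(f"raw difference (boys - girls):      {raw_diff:.2f} kg")
print(f"difference adjusted for height:     {adjusted_diff:.2f} kg")
print(f"slope for height, adjusted for sex: {height_slope:.2f} kg/cm")
```

In this made-up sample the boys are taller, so the raw difference in mean weight should come out much larger than the difference with height held constant. The coefficient for sex is the adjusted difference, and the coefficient for height is the common slope for boys and girls.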

Why do the estimates for a given predictor represent the effect of the predictor
with all other predictors in the model held constant? I'm not sure of the best
way to answer this question. I've satisfied myself by considering that a linear
model with two numeric predictor variables represents a plane in 3-D space.
The stats program finds the least-squares plane of best fit. With a bit of thought
and 3-D doodling I was able to see how the value of the coefficient of each
variable is the "slope" for that variable with the other predictor
variable held constant.
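
One way to put the plane argument into symbols is sketched below. The notation is an assumption introduced here: y is the dependent variable, x_1 and x_2 are the two numeric predictors, and b_0, b_1, b_2 are the coefficients of the fitted plane.

```latex
y = b_0 + b_1 x_1 + b_2 x_2
\quad\Longrightarrow\quad
\left.\frac{\partial y}{\partial x_1}\right|_{x_2\ \text{fixed}} = b_1 ,
\qquad
\left.\frac{\partial y}{\partial x_2}\right|_{x_1\ \text{fixed}} = b_2
```

Moving one unit along x_1 while staying at the same x_2 changes the predicted y by b_1, whatever value x_2 is fixed at.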

**Mechanism Variables and Confounders**
In the above example, suppose we adjust or
control for height and find no substantial difference in the mean weight of
boys compared with that of girls. Is it therefore reasonable to say that differences
in height are responsible for the differences in weight between boys and girls?
Yes! In fact, I call height a

Some researchers also call height a **confounding variable** or a **confounder**
in the relationship between sex and weight. When you use the word *confounder*
to describe height, you are implying that it sort-of makes the difference between
boys and girls seem bigger than it really is. Boys are heavier than girls, of
course, but height is confounding (or even compounding) the difference. There
might be no difference when you take account of height. In fact, girls might
even be heavier than boys. Fair enough, but the word *confounder* should
be reserved for a different kind of covariate, one that has or could have a
causal effect jointly on the predictor and the dependent. Let's consider another
example to make the point clear. Consider the effect of physical activity on
health in a cross-section of the population. Do the analysis without regard
to the age of the subjects and you will find a really strong relationship. Cool,
jobs for exercise professionals! Now control for age and you will find the relationship
gets a lot weaker. Curses! It's likely that age is the real cause of most of
the relationship between activity and health: age reduces physical activity
and age reduces health. We say that the effect of physical activity on health
is confounded by age. It's only when we control for age that we see the effect
of differences in activity on the health of people of the same age.
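
The activity-and-health example can be sketched with simulated data. This is only an illustration under stated assumptions: the sample is generated by the code, every coefficient in it is invented (age lowers activity, age lowers health, and activity has a modest true effect of 0.5 units of health per unit of activity), and `ols` is a small hypothetical helper for least squares.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated cross-section; all numbers below are invented for illustration.
n = 500
age = [rng.uniform(20, 70) for _ in range(n)]
activity = [10 - 0.1 * a + rng.gauss(0, 1) for a in age]
health = [80 - 0.4 * a + 0.5 * act + rng.gauss(0, 2)
          for a, act in zip(age, activity)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# health <= activity (age ignored)
crude_slope = ols([[1.0, act] for act in activity], health)[1]

# health <= activity age (age controlled)
adjusted_slope = ols([[1.0, act, a] for act, a in zip(activity, age)],
                     health)[1]

print(f"effect of activity, age ignored:    {crude_slope:.2f}")
print(f"effect of activity, age controlled: {adjusted_slope:.2f}")
```

The crude slope should come out several times larger than the adjusted one, because it absorbs the effect of age on both variables. The adjusted slope should land near the 0.5 that was built into the simulation.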

What happens in the above example if we make age the predictor variable and
physical activity the covariate? Age on its own will have a strong effect on
health, but control for physical activity and you will find the relationship
gets a lot weaker. So, you would be justified in regarding physical activity
as a possible mechanism for the effect of age on health. Wow, that's cool again!
Whether the effect of physical activity on health is really causal or just coincidental
cannot be resolved with cross-sectional data. You have to do interventions and
a repeated-measures analysis to sort that out. I explain how to include a mechanism
variable as a covariate in such analyses later on.

**Interactions**
I now have to introduce you to another fearful
challenge:

Height has an overall effect on weight, and sex has an overall effect on weight. But maybe the effect of height on weight is a bit different for boys than for girls: maybe being taller has a bigger effect on weight for boys than for girls. We show that in the model with the so-called interaction term, which is represented by multiplying height and sex together:

weight <= height sex height*sex

This will all make sense when we deal with the specific models.
Meanwhile one more bit of jargon. Height and sex are called **main
effects**, to distinguish them from the interaction term. When you
have more than two main effects, you can have more than one
interaction. When you have all the different combinations of the
effects, including the interactions, you have what's called a **full
model**.
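
To make "multiplying height and sex together" concrete, here is a minimal Python sketch of the full model with two main effects and their interaction. The data are made up and deliberately noise-free, so that the girls lie exactly on a line with slope 0.5 kg/cm and the boys on a line with slope 0.7 kg/cm. Sex is coded 1 for boys and 0 for girls, and `ols` is a small hypothetical least-squares helper, not part of any stats package.

```python
# Made-up, noise-free data for illustration only.
# Girls follow weight = -33 + 0.5 * height; boys follow weight = -64 + 0.7 * height.
height = [150, 155, 160, 165, 160, 165, 170, 175]
sex = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = boy, 0 = girl
weight = [42.0, 44.5, 47.0, 49.5, 48.0, 51.5, 55.0, 58.5]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Model: weight <= height sex height*sex
# Columns: intercept, height, sex, and the product height*sex.
X = [[1.0, h, s, h * s] for h, s in zip(height, sex)]
b0, b_height, b_sex, b_interaction = ols(X, weight)

girls_slope = b_height                  # slope when sex = 0
boys_slope = b_height + b_interaction   # slope when sex = 1

print(f"slope for girls:  {girls_slope:.2f} kg/cm")
print(f"slope for boys:   {boys_slope:.2f} kg/cm")
print(f"interaction term: {b_interaction:.2f} kg/cm")
```

With this coding the coefficient of the interaction term is the difference between the boys' slope and the girls' slope. A value of zero would mean height has the same effect on weight for both sexes.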

**A Warning!**
There are several traps for the unwary when
you have more than one predictor variable.


Last updated 22 June 02