Sunday, February 2, 2014

Interpreting interaction terms

I've had the great privilege of developing a class on social epidemiology this semester, and it's been a lot of fun so far. A ton of work, but fun.
There's a problem that keeps cropping up, though. A bunch of the articles I've picked out for my class to read have botched the interpretation of interaction terms. Even well-established leaders in social epi routinely get it wrong.

It may sound like arcane statistical mumbo jumbo, but interpreting interaction terms correctly really matters. Let's say I want to see whether X causes more or less disease (Y) in group A than in group B. That's a classic setting for an interaction term.

You could make B the reference group and model it as:
Y = a + b1X + b2A + b3A*X
Or you could make A the reference group and model it as:
Y = a + b1X + b2B + b3B*X
You should get the same interpretation either way.
You do when you interpret things correctly.
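A lot of people interpret b1 as the effect of X in the referent group (it is) and b3 as the effect of X in the comparison group. It isn't: b3 is the difference between the effect of X in the comparison group and its effect in the referent group, so it equals the effect in the comparison group only in the special case where b1 is zero.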
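
Writing the first model out separately for each value of A shows why. This is plain algebra on the model above, with Y on whatever scale the model is fit on:

```latex
\begin{aligned}
\text{Group B } (A = 0):\quad & Y = a + b_1 X \\
\text{Group A } (A = 1):\quad & Y = (a + b_2) + (b_1 + b_3)\,X \\[4pt]
\text{effect of } X \text{ in group B} &= b_1 \\
\text{effect of } X \text{ in group A} &= b_1 + b_3 \\
b_3 &= (b_1 + b_3) - b_1
\end{aligned}
```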

Here's some real data. Let's say we were looking for the effect of state tax revenues per capita on mortality among Blacks vs. Whites in the 10 most populous states. (Full disclosure: I started out with income inequality, but the data didn't look good. I figured state tax revenues per capita are probably a good indicator of redistributive potential.)

Age-adjusted mortality in the populous states with the highest tax revenues (CA, MI, NY, NC, PA) was 776.9 per 100,000 Whites per year, and 985.2 per 100,000 Blacks per year.
Age-adjusted mortality in the populous states with the lowest tax revenues (FL, GA, IL, OH, TX) was 806.4 per 100,000 Whites per year, and 1,026.0 per 100,000 Blacks per year.
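
Before fitting anything, it helps to compute the within-race effects straight from these four rates. Here is a minimal Python sketch; the dictionary layout and the function name `low_tax_rate_ratio` are mine, for illustration only:

```python
# Age-adjusted mortality per 100,000 per year, copied from the text above.
RATES = {
    ("White", "high"): 776.9,   # high tax revenue states: CA, MI, NY, NC, PA
    ("White", "low"): 806.4,    # low tax revenue states: FL, GA, IL, OH, TX
    ("Black", "high"): 985.2,
    ("Black", "low"): 1026.0,
}


def low_tax_rate_ratio(race: str) -> float:
    """Rate ratio for low vs. high tax revenue states within one race."""
    return RATES[(race, "low")] / RATES[(race, "high")]


for race in ("White", "Black"):
    rr = low_tax_rate_ratio(race)
    print(f"{race}: rate ratio {rr:.3f}, a {(rr - 1) * 100:.1f}% increase")
```

This prints a 3.8% increase for Whites and a 4.1% increase for Blacks. Because no model is involved, these crude rate ratios cannot depend on which group is called the reference.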

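Let's make White the reference group, as is standard practice, and fit a model for the log of the mortality rate. (Strictly speaking, the coefficients below are log rate ratios rather than the log odds ratios a logistic model would give, but with deaths this rare the two are nearly identical.)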

Mortality = -4.85761 + 0.037268*low tax base + 0.237533*Black + 0.0033101*low tax base*Black
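
With two groups and two tax categories, this model has exactly one parameter per cell. Its coefficients are therefore just contrasts of the log rates. The sketch below (variable names are mine) recomputes them from the four published rates and should reproduce the coefficients above up to rounding. That is a handy check that the model is on the log-rate scale and is specified the way you think.

```python
import math

# Age-adjusted deaths per person-year (the per-100,000 rates from the text, divided by 100,000).
rate = {
    ("White", "high"): 776.9e-5,
    ("White", "low"): 806.4e-5,
    ("Black", "high"): 985.2e-5,
    ("Black", "low"): 1026.0e-5,
}
log_rate = {cell: math.log(r) for cell, r in rate.items()}

# Saturated model with White and high tax revenue as the reference categories:
# each coefficient is a contrast of log rates.
intercept = log_rate[("White", "high")]
b_low_tax = log_rate[("White", "low")] - log_rate[("White", "high")]
b_black = log_rate[("Black", "high")] - log_rate[("White", "high")]
# Interaction: the low-tax contrast among Blacks minus the low-tax contrast among Whites.
b_interaction = (log_rate[("Black", "low")] - log_rate[("Black", "high")]) - b_low_tax

print(f"{intercept:.5f} {b_low_tax:.6f} {b_black:.6f} {b_interaction:.7f}")
```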

According to the flawed interpretation, the effect of a low tax base among Whites is exp(0.037268) = 1.038, a 3.8% increase in mortality in low tax base states, and the effect among Blacks is exp(0.0033101) = 1.003, a 0.3% increase. Sloppy interpretation would therefore lead you to think that living in a low tax base state has more impact on Whites than on Blacks.

But what happens when you switch the reference group to Blacks?

Mortality = -4.62008 + 0.040578*low tax base - 0.23753*White - 0.0033101*low tax base*White

Using the flawed approach, the effect of a low tax base among Blacks is exp(0.040578) = 1.041, a 4.1% increase in mortality in low tax base states, and the effect among Whites is exp(-0.0033101) = 0.997, a 0.3% decrease. Now the sloppy interpretation suggests that living in a low tax base state increases mortality more for Blacks than for Whites, and might even be beneficial for Whites (laying aside for the moment the very important issue of the role of stochastic error in the measures).
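
The second model isn't new information. It is the same fitted model with the groups relabeled, so its coefficients follow from the first model's by algebra alone. The sketch below does that algebra. The `InteractionModel` class and `switch_reference` function are hypothetical helpers of mine, not part of any statistics package.

```python
from dataclasses import dataclass


@dataclass
class InteractionModel:
    """Coefficients of a + b1*low_tax + b2*group + b3*low_tax*group on the model's scale."""
    a: float    # intercept: reference group, high tax revenue
    b1: float   # effect of a low tax base in the reference group
    b2: float   # comparison group vs. reference group, in high tax revenue states
    b3: float   # interaction: comparison group's low-tax effect minus reference group's


def switch_reference(m: InteractionModel) -> InteractionModel:
    """Re-express the same model with the other group as the reference."""
    return InteractionModel(
        a=m.a + m.b2,     # old comparison group's high-tax value becomes the intercept
        b1=m.b1 + m.b3,   # old comparison group's low-tax effect becomes the main effect
        b2=-m.b2,         # the group contrast flips sign
        b3=-m.b3,         # the interaction flips sign; its magnitude never changes
    )


white_ref = InteractionModel(a=-4.85761, b1=0.037268, b2=0.237533, b3=0.0033101)
black_ref = switch_reference(white_ref)
print(black_ref)
```

Up to rounding, this reproduces the Black-reference equation. Only the sign of the interaction term changes, because it is the difference between the two groups' effects, not the effect in either group.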


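What's the real answer? Well, it's right there when we look at the two models next to each other. Whites in a low tax base state have a 3.8% increase in mortality, but Blacks have a 4.1% increase. Not much difference, but the effect appears to be slightly stronger among Blacks than Whites. The interaction term doesn't give you either of those effects; it tells you how much they differ. In the first model, exp(0.0033101) = 1.003 is the 4.1% effect divided by the 3.8% effect (1.041/1.038). In the second model, exp(-0.0033101) = 0.997 is the same ratio flipped. There is a way to get both the 3.8% and the 4.1% from only one model, but that's a bit more complicated than I want to get into in a blog post...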
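
The point estimate, at least, needs only one model: in the White-reference model the low-tax effect among Blacks is exp(b1 + b3). The more involved part is presumably the confidence interval, because the variance of a sum of coefficients needs their covariance: Var(b1 + b3) = Var(b1) + Var(b3) + 2 Cov(b1, b3). The sketch below assumes you can read those three numbers off your software's variance-covariance matrix. The post doesn't report them, so the hypothetical `effect_in_comparison_group` function takes them as optional inputs, and no interval is computed for these data.

```python
import math
from typing import Optional, Tuple


def effect_in_comparison_group(
    b1: float,
    b3: float,
    var_b1: Optional[float] = None,
    var_b3: Optional[float] = None,
    cov_b1_b3: Optional[float] = None,
    z: float = 1.96,
) -> Tuple[float, Optional[Tuple[float, float]]]:
    """Exponentiated effect of X in the comparison group, exp(b1 + b3).

    If Var(b1), Var(b3) and Cov(b1, b3) are supplied (taken from the fitted
    model's variance-covariance matrix), also return a Wald confidence interval
    based on Var(b1 + b3) = Var(b1) + Var(b3) + 2 * Cov(b1, b3).
    """
    log_effect = b1 + b3
    if var_b1 is None or var_b3 is None or cov_b1_b3 is None:
        return math.exp(log_effect), None
    se = math.sqrt(var_b1 + var_b3 + 2 * cov_b1_b3)
    interval = (math.exp(log_effect - z * se), math.exp(log_effect + z * se))
    return math.exp(log_effect), interval


# Point estimates from the White-reference model (no variances are reported,
# so no confidence interval is computed here).
rr_white = math.exp(0.037268)
rr_black, _ = effect_in_comparison_group(0.037268, 0.0033101)
print(f"Whites: {rr_white:.3f}, Blacks: {rr_black:.3f}")  # prints "Whites: 1.038, Blacks: 1.041"
```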