Monday, December 15, 2014

Observations on 3-fold interactions

Sorry about how "mathy" this post is. I'm percolating about what to write about gay blood donors, but I need to think on that for another few days.

The last lecture for my epi class was about effect measure modification (interactions). Most people do it completely wrong: they use an interaction term in a statistical model (Y = a + b1X1 + b2X2 + b3X1X2), and then interpret b3 as though it's telling you something interesting. It isn't (except in extremely unusual circumstances).
What you really want to know is the degree to which being exposed to both X1 and X2 produces more disease than you'd expect if all you knew was the effect of X1 in the absence of X2 and the effect of X2 in the absence of X1.
Or, in mathy terms, let Rij be the rate of disease when X1=i and X2=j.
We want to know whether (R11-R00), the difference that both make when working together, is greater or less than (R01-R00) + (R10-R00), the difference each makes in the absence of the other.
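
Here is a minimal sketch of that comparison in Python. The rates are made-up numbers for illustration only (not from any study), and the function name `interaction_contrast` is my own invention; it just subtracts the sum of the single-exposure differences from the joint difference.

```python
def interaction_contrast(rates):
    """Joint effect minus the sum of the single-exposure effects.

    `rates` maps exposure patterns such as (1, 0) to a disease rate.
    A positive result means the exposures together produce more disease
    than you'd expect from adding up their separate effects.
    """
    n = len(next(iter(rates)))       # number of exposures
    baseline = rates[(0,) * n]       # R00 in the two-exposure case
    joint = rates[(1,) * n] - baseline
    singles = sum(
        rates[tuple(1 if k == i else 0 for k in range(n))] - baseline
        for i in range(n)
    )
    return joint - singles


# Hypothetical rates per 100,000, keyed by (X1, X2).
example_rates = {(0, 0): 10.0, (1, 0): 15.0, (0, 1): 20.0, (1, 1): 40.0}
excess = interaction_contrast(example_rates)
```

With these invented numbers the joint difference is 30 and the two separate differences add up to 15, so `excess` is 15: more disease than the separate effects would predict.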

I'm going to skip right to the three-factor effect measure modification - here the idea is whether:
(R111-R000), the effect of all three together,
is comparable to the effect of each of the three in isolation:
(R100-R000) + (R010-R000) + (R001-R000).

First implication: In order to make that assessment, your study needs people with none of the exposures, all of the exposures, and exactly one of the exposures. It does not need anyone with exactly two of the exposures, so including any such subjects would be inefficient. That's bizarre.

Second implication: The fact that those people with two exposures are irrelevant actually points to the fact that there could be four quantities of interest: First, the one comparing each of the three in isolation to the effect of all three together, and then three iterations of comparing one in isolation with the other two in combination, i.e.
(R110-R000) + (R001-R000)
or (R101-R000) + (R010-R000)
or (R011-R000) + (R100-R000)
So, there are actually four interaction terms to compare to the joint effect: (R111-R000).

Third implication: I love how the math and the concepts circle around and inform one another. In this case, the fact that there is one comparison to make when there are two exposures, but four to make when there are three, suggests to me that our brains are not well suited to thinking about the issue of three factor interactions, and the whole idea should ideally not be attempted at all.

But hmmm, what happens when we go to four....
(R1111-R0000) would be the joint effect of all four.
The single factors adding up would be:
(R1000-R0000) + (R0100-R0000)  + (R0010-R0000) + (R0001-R0000)
Three together + one more would be:
(R1110-R0000) + (R0001-R0000)
(R1101-R0000) + (R0010-R0000)
(R1011-R0000) + (R0100-R0000)
(R0111-R0000) + (R1000-R0000)
Two together plus each of the other two alone would be:
(R1100-R0000) + (R0010-R0000)  + (R0001-R0000)
(R1010-R0000) + (R0100-R0000)  + (R0001-R0000)
(R1001-R0000) + (R0100-R0000)  + (R0010-R0000)
(R0110-R0000) + (R1000-R0000)  + (R0001-R0000)
(R0101-R0000) + (R1000-R0000)  + (R0010-R0000)
(R0011-R0000) + (R1000-R0000)  + (R0100-R0000)
And then two together plus the other two together would be:
(R1100-R0000) + (R0011-R0000)
(R1010-R0000) + (R0101-R0000)
(R1001-R0000) + (R0110-R0000)

Ai! 14 terms to keep in mind simultaneously.
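
The counts above (one comparison for two exposures, four for three, fourteen for four) can be checked by brute force. Each comparison is one way of splitting the exposures into two or more groups, so the sketch below simply lists every such split. The function names are hypothetical, and this is just one way to do the enumeration.

```python
def set_partitions(items):
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` in a group of its own...
        yield [[first]] + partition
        # ...or add it to each existing group in turn.
        for i, group in enumerate(partition):
            yield partition[:i] + [[first] + group] + partition[i + 1:]


def comparisons(n_factors):
    """All splits into two or more groups (the one-group split is the joint effect)."""
    factors = list(range(n_factors))
    return [p for p in set_partitions(factors) if len(p) > 1]


def format_comparison(partition, n_factors):
    """Write one split in the (R110-R000) + (R001-R000) notation used above."""
    baseline = "0" * n_factors
    terms = []
    for group in partition:
        pattern = "".join("1" if k in group else "0" for k in range(n_factors))
        terms.append(f"(R{pattern}-R{baseline})")
    return " + ".join(terms)


counts = [len(comparisons(n)) for n in (2, 3, 4)]
for partition in comparisons(3):
    print(format_comparison(partition, 3))
```

The counts are the Bell numbers minus one, so a fifth exposure would bring the total to 51 comparisons.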

Sunday, February 2, 2014

Interpreting interaction terms

I've had the great privilege of developing a class on social epidemiology this semester, and it's been a lot of fun so far. A ton of work, but fun.
There's a problem that keeps cropping up, though. A bunch of the articles I've picked out for my class to read have botched the interpretation of interaction terms. Even well-established leaders in the field of social epi routinely botch interpreting their interaction terms.

It may sound like arcane statistical mumbo jumbo, but interpreting interaction terms is really important in the following context. Let's say I want to see whether X causes more or less disease (Y) in group A or group B. That's a classic setting for an interaction term.

You could make B the reference group and model it as:
Y = a + b1X + b2A + b3A*X
Or you could make A the reference group and model it as:
Y = a + b1X + b2B + b3B*X
You should get the same interpretation either way.
You do when you interpret things correctly.
A lot of people interpret b1 as being the effect of X in the referent group (it is), and b3 as being the effect of X in the comparison group. Sometimes it is, but usually that's not the case.

Here's some real data. Let's say we were looking for the effect of state tax revenues per capita on mortality among Blacks vs. Whites in the 10 most populous states. (full disclosure, I started out with income inequality, but the data didn't look good. I figured state tax revenues per capita are probably a good indicator of redistributive potential)

Age-adjusted mortality in the populous states with the highest tax revenues (CA, MI, NY, NC, PA) was 776.9 per 100,000 Whites per year, and 985.2 per 100,000 Blacks per year.
Age-adjusted mortality in the populous states with the lowest tax revenues (FL, GA, IL, OH, TX) was 806.4 per 100,000 Whites per year, and 1,026.0 per 100,000 Blacks per year.

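Let's make White the reference group, as is standard practice. Then, let's fit the standard log-linear model (the coefficients are on the log scale, which is why they get exponentiated below; the intercept is the log of 776.9 per 100,000).

log(Mortality) = -4.85761 + 0.037268*low tax base + 0.237533*Black + 0.0033101*low tax base*Black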

According to the flawed interpretation, the effect of having a low tax base among Whites is exp(0.037268) = 1.038, or a 3.8% increase in mortality in low tax base states, and the effect among Blacks is exp(0.0033101) = 1.003, or a 0.3% increase in mortality in low tax base states, so sloppy interpretation would lead you to think that living in a low tax base state has more impact on Whites than it does on Blacks.

But what happens when you switch the reference group to Blacks?

log(Mortality) = -4.62008 + 0.040578*low tax base - 0.23753*White - 0.0033101*low tax base*White

Using the flawed approach, we would get that the effect of having a low tax base among Blacks is exp(0.040578) = 1.041, or a 4.1% increase in mortality in low tax base states, and the effect among Whites is exp(-0.0033101) = 0.997, or a 0.3% decrease in mortality in low tax base states, so the sloppy interpretation suggests that living in a low tax base state increases mortality more for Blacks than Whites, and might even be beneficial for Whites (laying aside for the moment the very important issue of the role of stochastic error in the measures).

What's the real answer? Well it's right there when we look at the two models next to each other. Whites in a low tax base state have a 3.8% increase in mortality, but Blacks have a 4.1% increase in mortality. Not much difference, but the effect appears to be slightly stronger among Blacks than Whites. There is a way to get both the 3.8% and the 4.1% from only one model, but that's a bit more complicated than I want to get into in a blog post...
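
For the curious, here is a rough sketch of the one-model approach: the effect in the reference group is exp(b1), and the effect in the comparison group is exp(b1 + b3), not exp(b3). The sketch assumes the coefficients are on the log-rate scale (the intercept above equals the log of 776.9 per 100,000), and it rebuilds them directly from the four rates rather than running regression software, which works here because the model has as many coefficients as there are rates. The function names are mine.

```python
import math

# Age-adjusted deaths per 100,000 per year, from the figures above.
RATES = {
    ("White", "high tax"): 776.9,
    ("White", "low tax"): 806.4,
    ("Black", "high tax"): 985.2,
    ("Black", "low tax"): 1026.0,
}


def fit_saturated(reference, comparison):
    """Return (a, b1, b2, b3) for log(rate) = a + b1*low + b2*group + b3*low*group."""
    ref_high = RATES[(reference, "high tax")]
    ref_low = RATES[(reference, "low tax")]
    cmp_high = RATES[(comparison, "high tax")]
    cmp_low = RATES[(comparison, "low tax")]
    a = math.log(ref_high / 100_000)
    b1 = math.log(ref_low / ref_high)          # low tax effect, reference group
    b2 = math.log(cmp_high / ref_high)         # group difference, high tax states
    b3 = math.log(cmp_low / cmp_high) - b1     # how much the low tax effect differs
    return a, b1, b2, b3


def low_tax_rate_ratios(reference, comparison):
    """Effect of a low tax base in each group, read from a single model."""
    _, b1, _, b3 = fit_saturated(reference, comparison)
    return {reference: math.exp(b1), comparison: math.exp(b1 + b3)}


white_ref = low_tax_rate_ratios("White", "Black")
black_ref = low_tax_rate_ratios("Black", "White")
```

Either choice of reference group gives about 1.038 for Whites and 1.041 for Blacks, which is the whole point: interpreted this way, the answer doesn't depend on who you pick as the referent.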

Saturday, January 11, 2014

Do homophobes really die sooner?

Two weeks ago, I posted in my Research Worth Reading series about an article that found that heterosexuals harboring ill-will towards gays lived shorter lives. It seemed like a methodologically sound article, but one thing nagged at the back of my brain. The un-adjusted results were huge, and after controlling for a few sensible factors, the adjusted results were still impressive, but much smaller.
That always makes me worry about uncontrolled (or poorly controlled) confounding, and I figured I'd look into it. There were a bunch of analytic choices I would have made differently, but none of them seemed like they'd be a big deal.
I got excited by their analysis and writeup, and wanted to play with the same data myself, try out a few different things, maybe look at different sub-groups, that sort of thing. I also thought it was a great approach, looking at the degree to which people harboring hatred may lead shorter lives.
So, I downloaded the same GSS files the authors used and fiddled around with it myself.

The results I got were not quite as impressive as theirs, and suggest that nearly all the main effects can be explained easily by routine confounding factors. Rather than starting out with a 187% increased death rate that is reduced to 25% after adjustment, my analyses showed a 70% increased death rate that was reduced to 8% after adjusting for similar factors.
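
To put those two sets of numbers side by side, here is a small back-of-the-envelope sketch of how much of the crude excess disappears after adjustment. It works on the excess death rate percentages quoted above, which is a crude way to summarize confounding, and the function name is just for illustration.

```python
def share_explained(crude_excess_pct, adjusted_excess_pct):
    """Fraction of the crude excess death rate that goes away after adjustment."""
    return (crude_excess_pct - adjusted_excess_pct) / crude_excess_pct


published = share_explained(187, 25)   # the article's figures
reanalysis = share_explained(70, 8)    # my re-analysis
```

Both come out a little under 90% (roughly 0.87 and 0.89), so in either version the control factors account for most of the apparent effect.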

There are a few important differences between their approach and mine, but it would be a lot more re-assuring to see similar results despite slightly different approaches, and I'm tempted to put this finding on hold until some replication in another dataset comes forward.

Thursday, January 2, 2014

Research Worth Reading - Homophobia Shortens Lives

Mark L. Hatzenbuehler, Anna Bellatorre, Peter Muennig. (2014). Anti-gay prejudice and all-cause mortality among heterosexuals in the United States. American Journal of Public Health. Published online ahead of print, Dec 12, 2013.

I'm so glad someone has finally done this study!
We all know that homophobia is bad for your health. It could be as direct as gay-bashing, or societal disapproval leading to depression, and less directly by causing high blood pressure and that sort of thing.
But what about the haters? What are the ill effects on people who are themselves homophobic?
In this paper, the authors used 20,226 people who answered the General Social Survey to figure out how much anti-gay prejudice people feel, specifically heterosexuals, then followed them for 5 to 20 years after the survey to see whether straight people who harbor anti-gay prejudices die sooner than those who don't.

They found that heterosexuals with a high degree of anti-gay prejudice were much more likely to die, dying at a rate nearly 3 times as fast as heterosexuals with a lower degree of anti-gay prejudice. That may seem implausibly high, and it is. People who harbor anti-gay prejudice tend to have less formal education, and tend to be older, and both of those factors strongly predict mortality.
But even after adjusting for age and educational attainment (and a few other things), they found that heterosexuals with a high degree of anti-gay prejudice died about 25% faster than heterosexuals with lower anti-gay beliefs. That's more reasonable, but still higher than I'd expect. I suspect that at least some of that difference is due to the fact that the General Social Survey is so long and tedious for respondents that there's a fairly high rate of non-sensical responses in there.

Promising work, but when you see over 80% of the apparent effect (an excess hazard ratio of 187% dropping to 25%) "explained" by control factors, what's left has to be treated very skeptically.
I'll be eager to see how this line of inquiry pans out in other datasets, although this is clearly the best dataset to start with, and it may be challenging to find another that could produce comparable results for quite some time.

Well worth reading: the language is pretty accessible even if you're not steeped in the public health world. The methods are a bit challenging, but you can skip the most confusing parts because they don't really make much difference anyway.
Methodologic critique
This study is actually really well done, much better than most public health research these days. Despite the inherent flukiness of the GSS, the authors used methods that should be pretty robust despite the relatively high rates of non-sense that you find in the GSS.
Having given high praise overall, I'll move on to the relatively minor things I'd quibble with... First, the measure of whether a person has a high degree of anti-gay prejudice is based on some questions that are horribly out of date, and were horribly out of date when they were asked, from 1988 to 2002. The items are taken from a series of questions designed to assess general social attitudes about communists, atheists, homosexuals and other "undesireables", so the questions can sound a bit strange to us today, especially the first three, which are probably more about civil liberties than prejudice:
  1. "If some people in your community suggested that a book in favor of homosexuality should be taken out of your public library, would you favor removing this book, or not?"
  2. "Should a man who admits that he is a homosexual be allowed to teach in a college or university, or not?" 
  3. "Suppose a man who admits that he is a homosexual wanted to make a speech in your community. Should he be allowed to speak, or not?"
  4. "Do you think that sexual relations between two adults of the same sex is always wrong, almost always wrong, wrong only sometimes, or not wrong at all?"
If I were doing the study, I'd probably ignore the first three as anachronistic and focus just on the fourth one. But what they did is collapsed the fourth one into a yes/no of "not wrong at all" vs. any of the other responses, and then (as best as I can tell), said that a "yes" to any of the four indicated a high degree of prejudice. It's possible that someone had to say "yes" to any two or more to make it into the high prejudice category. At any rate, it would have been re-assuring to show some kind of dose-response curve from lower endorsement to higher endorsement, and also a check to see if the pattern held when just looking at the fourth item, which is most clearly related to prejudice.
Of course, it would also be nice to have some sort of response codes indicating a positive inclination towards us, rather than assuming that our words and deeds have only the potential to offend.

In terms of potential confounders adjusted for, they used pretty much the same list I would have, but I would have modeled some of them a bit differently. Rather than treating age and education as continuous, I'd want to look at them in categories first to make sure that a linear trend makes a logical fit. And I wouldn't use household income itself, but adjust it first to the size and composition of the household relative to poverty. $20,000 for a single person in 1988 would be a lot more comfortable than $20,000 for a household of four in 2002, and log-transforming the household income doesn't help with those issues at all.
Most importantly, I'd want to explore the year of the survey in a bit more detail. The surveys were conducted from 1988 to 2002, and the follow-up for death ended in 2008, so someone from the early part of the survey could be followed for up to 20 years, while someone interviewed in 2002 might be followed up for as few as five years. They used Cox proportional hazards, which should account for these differences in the length of follow-up, but the fact that anti-gay prejudicial attitudes have shifted rapidly over the same time period makes me less confident that the model did what it was supposed to do. You can probably think of someone who would answer those questions differently in 2002 than they would have in 1988. But the model assumes that they would have answered the same way at both points in time, or at the very least that someone giving a certain answer in 1988 had the same level of prejudice as someone giving the same answer in 2002, despite the fact that it became much less acceptable to express anti-gay attitudes over this time period. It might screw up the model a bit to add year of interview in as a potential confounder, but I'd give it a try anyway, because it's quite possible that what we're seeing is just an artifact of the fact that as the population has developed fewer anti-gay attitudes, they've also been followed for a shorter period of time, and are thus less likely to be seen dying, despite the beauty of the Cox proportional hazards approach in dealing with censored data.

Sunday, December 29, 2013

Paid to Take Another's Punishment

I am, by any rational measure, a product of extraordinary privilege. I have a prestigious job. I own my own house. I can walk into pretty much anywhere and be taken seriously.
And yet, even though I can see that those things are true, it often doesn't "feel" like that.
It's not because as a gay man, I feel like a second-class citizen. I don't.
It goes back to high school. Really before that, but high school makes a better story.

St. John's Chapel, Groton School.
I went to a very prestigious boarding school. The same high school as FDR, and half of JFK's cabinet. When I graduated, I was disappointed because I only got into one Ivy League school, one that I (and most of my compatriots) thought of as a "safety" school.
But I wasn't like most students there, I was the son of a teacher, a "fac brat". My parents paid pennies on the dollar for tuition, and everyone knew my place, including me.
One odd tradition they had there was that if you got caught doing something you weren't supposed to do (like skipping services in the lovely chapel shown here), you got assigned to various work duties. The lowest infractions were "punished" by cleaning up the dining hall, wiping the tables down and straightening up the chairs. One of the most severe punishments was to wash dishes, a messy, hot, wet job that lasted for hours.
In 10th grade, I figured out that you could make a bit of money by doing other people's punishments for them. I used to charge $10 to do a night's worth of dishwashing, then when I figured out you could charge even more than that exorbitant rate, I started raising my rates to $20 and even more if it was a night I didn't want to do it, or if I thought the purchaser was a jerk.
I could (usually) get away with it because these jobs were also ones that everyone had to do on rotation, so the fact that I was washing dishes even though I didn't often break the rules didn't necessarily raise eyebrows. But occasionally, one of the faculty would notice and ask "Hey Bill, didn't I see you washing dishes earlier this week?" and I'd have to lay low, not taking on any more customers until suspicions would no longer be raised.
I loved washing dishes, I loved getting messy and wet, pounding the slop down into a trench where it would become feed for the local pig farm; piling the dishes as efficiently as possible into a washing rack; jamming it into the machine, and then yanking the clean dishes off and stacking them in the appropriate piles, throwing the plates airborne as much as possible to minimize skin contact with their scorching hot surfaces. I did it in college too, as a work-study job.
At the time, it didn't feel the least bit demeaning, I was making money, and having fun while doing it. I even felt a degree of pity for the jerks who paid me to work off their punishments.

When I wonder whether they ever felt bad about it, I doubt it. Maybe a little. But they learned a valuable lesson too, one that you see every time a bank settles rather than accepts blame for screwing people over. Just pay up and move on. Maybe it's even the same guys.

Sunday, December 22, 2013

Firearm-related Deaths, United States, 1968-2010

A few months back, I wrote about trends in motor vehicle accidents, and then about trends in hate crime statistics. Now with all the talk about firearm-related deaths I figured I'd look into those a bit.
So, the first obvious thing from the chart below is that there was a large decrease in the firearm death rate from 1994 or so down to 1999, and it's been pretty level since then. There were ups & downs before that, too.
The next thing I see is that changes in the total firearm-related death rate are closely linked to homicides, although the big drop in the late 1990's was due to a drop in both homicides and intentionally self-inflicted injuries. Trends in homicides and intentionally self-inflicted injuries often follow each other, but not always (especially 2006-2010).
If 1994 sounds familiar, that could be because that's the year the Brady Handgun Violence Prevention Act took effect, requiring background checks for the sale of handguns. I don't know what happened in 1999-2000 to stop that encouraging trend line.

This next chart is a lot busier than it should be, but a couple things stand out clearly when you break the time trends down by age.
First, there are really different trends over time by age. There's an obvious surge in 20-24 year olds dying from 1985 to 1999, but an even more dramatic surge among 15-19 year olds, who start out (and end up) with some of the lowest firearm-related death rates, but really cranked up during the late 80's and early 90's.
All age groups saw a decline during that critical 1994-1999 period.
But when you look a little closer, something else becomes clear: the firearm-related death rates for 35-64 year olds pretty much decline throughout the whole time frame, while 75-84 year olds build up through the 80's, then decline through the 90's, and the 85+ year old group climbed through the 80's, but hasn't really come down as much since then.
You may notice a sudden jump in firearm-related deaths among children in 1979; that's actually a fluke due to a change in the coding system (ICD-8 to ICD-9), but the subsequent rise and dramatic fall in children's firearm-related mortality from that point on is real.

One of the frustrating things about working with US mortality data is that it's always 3-4 years out of date. I don't know why that's the case, because before there were computers, the delays in getting the death data out were measured in months. But that's a topic for another day...

Monday, November 11, 2013

Where are the Food Deserts?

A food desert is an area where healthy food options are out of reach. You know if you're in one, but it's surprisingly difficult for the Ivory Tower crowd (like me) to figure it out. For one thing, there are at least three components in that definition. What's "healthy"? What's "out of reach"? And even what's an "area"?
When I first started thinking about this, I figured, well, you just measure the distance from where a person lives to the nearest supermarket.
[Image from Data Underload]
Turns out, that's a lousy definition. Makes for pretty pictures, though, like the one on the right, from Nathan Yau (love your site by the way, Nathan).
The main problem with this approach is that it doesn't take account of social space. Using this approach, the biggest food deserts are in actual deserts. Which would be fine if you were plopped down in a random part of the country each morning and had to figure out how to eat from scratch.
But we tend to live, work, play, and "get by" in neighborhoods, neighborhoods that are highly structured in physical space in a way that reflects social relations.

When we looked for food deserts in Alameda county, we found that the "deserts" lay beyond the toney hills, in the outlying commuter suburbs, and the places we expected to see low food availability appeared to be chock-a-block full of supermarkets. Geographic space is part of the food desert picture, but somehow we need to get the idea of social distance in there as well to get at the idea of "out of reach".
And then, there's also the idea of "healthy" food. A supermarket may be short-hand for the availability of affordable healthy food options, but there are plenty of supermarkets whose produce aisle looks like the set for a horror movie, and there are also corner stores with gourmet appeal.
At the APHA meeting, I saw a bunch of posters where people had put a lot of work into figuring out food deserts in their communities, including a very ambitious project to describe food availability in great detail in New Orleans.
But I want to come up with a definition of a "food desert" that I can apply across the country, and without having to visit every supermarket, corner store and farmer's market. Lately, I've been thinking about coming up with some sort of relative distance measure, like the distance to the nearest supermarket, divided by the distance to the nearest outlet that sells tobacco or alcohol. So far, I've downloaded all the supermarket locations across the country, but the number of places that sell tobacco is just too huge. Hmmm.
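
Here's a rough sketch of what that relative distance measure might look like. Everything in it is an assumption for illustration: the coordinates are invented, the function names are mine, and it uses straight-line (great-circle) distance, which ignores roads, transit, and all the social distance issues I've been going on about.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_km(home, outlets):
    """Distance from `home` to the closest outlet in the list."""
    return min(haversine_km(home, outlet) for outlet in outlets)


def relative_food_distance(home, supermarkets, tobacco_alcohol_outlets):
    """Nearest supermarket distance divided by nearest tobacco/alcohol outlet distance.

    Values well above 1 mean cigarettes and liquor are much closer than groceries.
    """
    return nearest_km(home, supermarkets) / nearest_km(home, tobacco_alcohol_outlets)


# Invented coordinates, purely for illustration.
home = (37.80, -122.27)
supermarkets = [(37.83, -122.27), (37.70, -122.20)]
tobacco_alcohol_outlets = [(37.81, -122.27), (37.90, -122.30)]
ratio = relative_food_distance(home, supermarkets, tobacco_alcohol_outlets)
```

In this made-up example the nearest supermarket is three times as far away as the nearest tobacco or alcohol outlet, so `ratio` comes out at about 3. A real version would need to handle a household sitting right on top of an outlet (distance zero) and would need the full outlet data that I don't have yet.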

Then, there are other important aspects to the social space that defines a food desert. I've got a job, and I drive about 40 miles to get there, so in the course of my day, I come across many food shopping alternatives. But there are many places along my route where someone without a car would have a great deal of difficulty getting to decent food. I can walk into any food vendor and get great service, even while wearing a hoodie. But not everyone wearing a hoodie gets the opportunity to pay cold hard cash for food, let alone get decent helpful service in the aisles.
I'm also re-thinking the idea of a food desert as something located in a clearly delineated physical space, and instead thinking of it as a condition that an individual or family experiences. I might not be in a food desert, but my neighbor might be.