Monday, November 12, 2012

New to the blog? Skip to the Highlight Reel.
On November 6, the voters of Minnesota rejected a proposed amendment to their state Constitution:
"Only a union of one man and one woman shall be valid or recognized as a marriage in Minnesota." It got 48% support, but that support is not at all evenly spread across the state.
Red is in favor of the amendment, green opposed.
The overall trend is that the lowest levels of support were in Minneapolis/Saint Paul, with support growing farther from the capital. It also looks like support for the amendment tended to be a bit lower near the lakes than in land-locked rural areas.
And yeah, it was a lot of work to put this together.
Sunday, November 4, 2012
41.421356%
So this morning while I was walking the dogs, I was thinking about exposure categorization. When your exposure is continuous (i.e. could be a little higher, a little lower, a lot higher, or anywhere in-between), and you prefer categorical analysis (as I do), then it is always arbitrary where you cut the exposure into different levels. You may have a good rationale for choosing a specific method, but it is always a decision you need to make, explicitly.
At any rate, one of the things I like to do is break my exposure up into three or four categories, to get a sense of whether there is a consistent dose-response (i.e. more exposure -> more disease). And when there's no good reason to pick any particular cut-offs, one of the standard things we do is to cut the exposure into thirds - that is, one third of the sample becomes the lowest exposure group (the reference group), one third becomes the middle exposure group, and one third becomes the highest exposure group. And then you compare the middle group to the reference group, and the highest exposure group to the reference group.
But as I was walking, it occurred to me that that's not the most efficient possible way to break things into three pieces, statistically speaking. That's because the reference group appears in two comparisons, while the middle and highest exposure groups each appear in only one. So if you could have a slightly larger reference group, you would get more statistical power, even with fewer people in the other two groups.
Off the cuff, I guessed that if you chose 40% to be the reference group, and 30% each for the middle and higher exposure groups, that would probably be a bit more efficient.
So, when I got home, I tried out some ideas. The main thing I was looking for was to make the confidence limits around the two comparisons as narrow as possible. To test that out in a particular (purely theoretical) example, I assumed I was estimating differences between proportions, so the standard errors would be simple to calculate. I also assumed that the "event rate" was identical in all three groups (that is, no dose-response whatsoever). That's not really the assumption I want to make, but it's a simple starting point to work from.
Then I calculated the standard errors using one-third/one-third/one-third cut-points, and again using 40% for the reference group and 30% for each of the other two - and voila, the 40%:30%:30% split did have smaller standard errors (red line below) than the 33%:33%:33% split (blue line below). It doesn't look like much of a difference, but when you're trying to squeeze the maximum statistical power out of the data you've got, this is a cheap & simple way to get a little more.
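Here's a minimal sketch in Python of that standard-error comparison (the total sample size and event rate below are made-up numbers, just for illustration):

    import math

    N = 3000   # hypothetical total sample size
    p = 0.10   # hypothetical event rate, assumed identical in all three groups

    def se_diff(frac_ref, frac_comp):
        # Standard error of the difference in proportions between a comparison
        # group and the reference group, when both share the same event rate p.
        n_ref, n_comp = frac_ref * N, frac_comp * N
        return math.sqrt(p * (1 - p) * (1 / n_ref + 1 / n_comp))

    for ref, mid, high in [(1/3, 1/3, 1/3), (0.40, 0.30, 0.30)]:
        print((ref, mid, high), se_diff(ref, mid), se_diff(ref, high))
    # The 40:30:30 split gives slightly smaller standard errors than 33:33:33.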
And then I got to thinking: if 40:30:30 is better than 33:33:33, then what is the optimum size for the reference group, in this example? After a bit of futzing around, I figured out that it is about 41.421356% (that is, √2 - 1), leaving 29.289322% for each of the comparison groups. That's the green line below - imperceptibly more efficient than 40:30:30.
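If you want to check that optimum yourself, one way (still under the same equal-event-rate assumption) is to note that each comparison's variance is proportional to 1/r + 2/(1 - r), where r is the reference fraction, and then just minimize that:

    # Crude grid search over the reference fraction r; the minimum lands at
    # r = sqrt(2) - 1, i.e. about 41.421356%, with (1 - r)/2 = 29.289322%
    # in each of the other two groups.
    best_r = min((i / 100000 for i in range(1, 100000)),
                 key=lambda r: 1 / r + 2 / (1 - r))
    print(best_r)          # ~0.41421
    print(2 ** 0.5 - 1)    # 0.41421356...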
For a four-group categorization, the optimal size for the reference group is 36.60254%, with 21.132487% in each of the three comparison groups.
For a five-group categorization, the optimal reference group size is exactly 1/3, with 1/6 in each of the other four groups; and for an eight-category breakdown (I can't say I recommend splitting so finely), the optimal reference group would be 27.429189%, with 10.367259% in each of the other seven groups.
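All of those numbers follow a single pattern. Under the same simplifying assumptions as above (identical event rates, equal-sized comparison groups), with k comparison groups each comparison's variance is proportional to 1/r + k/(1 - r), which is smallest when the reference fraction is r = 1/(1 + √k). A short sketch that reproduces the figures above:

    import math

    for k in (2, 3, 4, 7):              # i.e. 3-, 4-, 5-, and 8-group schemes
        r = 1 / (1 + math.sqrt(k))      # optimal reference fraction
        print(f"{k + 1} groups: reference {100 * r:.6f}%, "
              f"others {100 * (1 - r) / k:.6f}% each")
    # 3 groups: reference 41.421356%, others 29.289322% each
    # 4 groups: reference 36.602540%, others 21.132487% each
    # 5 groups: reference 33.333333%, others 16.666667% each
    # 8 groups: reference 27.429189%, others 10.367259% each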
I probably won't pursue this any further, because I see stats as a means to an end, and not super interesting in themselves.
If there is a dose-response, these calculations get a bit more complex, and depend on how strong the dose-response is, the distribution of the exposure, and so on. My guess is that in that case, the optimal size for the reference category would be a bit larger, and there might even be a bit of an efficiency gain from making the middle exposure group a tiny bit larger than the highest exposure group.
But enough with the navel-gazing. Time to get back to my paper on how segregation affects how likely one is to experience racially discriminatory events...
Thursday, November 1, 2012
Torture and Truth: Metaphors of Data Analysis
New reader? Skip to The Highlight Reel...
Francis Bacon is credited, rightly or wrongly, with a major turning point in the scientific method - an insistence on empirical, observable evidence, as opposed to reasoning from first principles. He also served in very prominent positions in English politics, and my down-the-hall neighbor, Carolyn Merchant, has done some terrific work tying together his politics and his science, through a lens of how the man thought of women, including the ultimate in feminine mystique - Nature herself. I'm at great risk of mischaracterizing her work, but I'll do my best.
In his writings (in Latin), Bacon frequently used the verb 'vexare' to describe the methods by which Truth could be extracted from Nature. And Carolyn's work shows that how one translates 'vexare' has quite profound implications.
Most modern translations render 'vexare' as "to vex", which sounds direct enough, but according to Carolyn, his meaning was probably closer to another interpretation: "to torture" - and several of his early translators in fact rendered 'vexare' as "torture". At the risk of ridiculous oversimplification, did Francis Bacon see the way to provoke the Truth from Nature as vexing her, or as torturing her? Did he imagine Nature giving up her secrets because he, the scientist, had devised a method of constraining her wild unpredictability into a stress position that required her to give up the answer?
Because in Bacon's day, and in Bacon's own mind, torture was seen as a valid method used to get the Truth. We now know that torture does nothing of the kind - it causes the tortured to say whatever they think the torturer wants to hear.
Well, the reason I bring all this up is that at the APHA conference, I saw some results that looked as though Data herself had been tortured more than interrogated by the analytic methods applied to her. In contrast, I also saw lots of evidence of researchers who had sat down with Data, asked her some questions, and got some answers they didn't expect. Rather than ignoring her, or turning the screws to get her to change her tune, they listened carefully to what Data had to say. The mark of a superb scientist, I think, is knowing the line between interrogation and torture - and figuring out when the answer to a research question should be believed, when it should be ignored, and when one needs to change one's own understanding of the world, especially when the answers contradict what we had hoped to hear.
There is an opposing problem as well - very often we get an answer that is so in-line with our pre-conceptions that we run off to publish without taking the time to check and re-check whether that answer is valid. In other words, our interrogation techniques need not be harsh, but we do need due diligence.
I wish I could say that I never torture Data, that when she speaks, I listen. But the reality is that I often have a very strong pre-conception of what Data should say, and when I don't get the answer I want to hear, my first reaction is to wonder - did I hear her correctly? (i.e. was there mis-coding, or a programming error that substituted the unexpected answer for her true response?) My second reaction is that maybe she mis-understood what I meant to ask, so I ask the question again using different phrasing (use a linear rather than a logistic model; re-classify the exposure cut-points or the outcome characterization; include a different set of control variables, etc.). These methods are usually not torture - they are reasonable reactions to past experiences where I have made programming errors, where classification matters a great deal, where omission of a key control variable does result in mis-leading results. But crossing the line to torture at this second stage is far too easy to justify, especially when I have a lot invested (in reputation, world-view, justifying how grant money was spent, etc.) in getting the answers I want to hear. It is easy at this stage to try out a variety of techniques to transform the answer I don't want to hear into one that I do.
My third reaction to pesky Data is to say she's wack. Maybe Data got high before being dragged into the interrogation room and is just giving weird answers to satisfy her own impenetrable sense of humor. Or, in other words, are there sampling errors and/or systematic biases in the data that generate unreliable results?
It is only after many attempts, in many ways, to discount results I don't want to hear that I take seriously the idea that I may have the wrong idea - that there is a completely different narrative that Data wants to tell. I will have wondered from the start what might explain contrary findings, but I won't replace my pre-conceptions until I'm utterly convinced that I've gotten it wrong. And I think that's the right approach - usually I do ask the wrong question, or if I ask the right question, I may be asking it of a dataset that is not well-equipped to give the right answer. But every once in a while, I listen carefully, and I hear a story that's much more interesting than the one I had in my head from the beginning. And those stories I don't want to hear - it turns out they have happy endings too.
OK, I'll admit it, that last line is pure schmaltz. You got a better way to wrap this ramble up with a tidy bow?