Thursday, November 1, 2012

Torture and Truth: Metaphors of Data Analysis


Francis Bacon is credited, rightly or wrongly, with a major turning point in the scientific method - an insistence on empirical, observable evidence, as opposed to reasoning from first principles. He also served in very prominent positions in English politics, and my down-the-hall neighbor, Carolyn Merchant, has done some terrific work tying together his politics and his science, through a lens of how the man thought of women, including the ultimate in feminine mystique - Nature herself. I'm at great risk of mischaracterizing her work, but I'll do my best.
In his writings (in Latin), Bacon frequently used the verb 'vexare' to describe the methods by which Truth could be extracted from Nature. And Carolyn's work shows that how one translates 'vexare' has quite profound implications.
Most modern translations describe 'vexare' as meaning "to vex", which sounds direct, but according to Carolyn, his meaning was probably closer to another interpretation: "to torture", and that several of his early translators in fact rendered 'vexare' as "torture". At the risk of ridiculous oversimplification, did Francis Bacon see the way to provoke the Truth from Nature by vexing her, or by torturing her? Did he imagine Nature giving up her secrets because he, the scientist, had devised a method of constraining her wild unpredictability into a stress position that required her to give up the answer?
Because in Bacon's day, and in Bacon's own mind, torture was seen as a valid method used to get the Truth. We now know that torture does nothing of the kind - it causes the tortured to say whatever they think the torturer wants to hear.

Well, the reason I bring all this up is that at the APHA conference, I saw some results that looked as though Data herself had been tortured more than interrogated by the analytic methods applied to her. In contrast, I also saw lots of evidence of researchers who had sat down with Data, asked her some questions, and got some answers they didn't expect. Rather than ignoring her, or turning the screws to get her to change her tune, they listened carefully to what Data had to say. The mark of a superb scientist, I think, is knowing the line between interrogation and torture - knowing when the answer to a research question should be believed, when it should be ignored, and when one needs to revise one's own understanding of the world, especially when the answers contradict what we had hoped to hear.
There is an opposing problem as well - very often we get an answer so in line with our pre-conceptions that we run off to publish without taking the time to check and re-check whether that answer is valid. In other words, our interrogation techniques need not be harsh, but we do need due diligence.

I wish I could say that I never torture Data, that when she speaks, I listen. But the reality is that I often have a very strong pre-conception of what Data should say, and when I don't get the answer I want to hear, my first reaction is to wonder - did I hear her correctly? (i.e., was there mis-coding, or a programming error that substituted the unexpected answer for her true response?) My second reaction is that maybe she mis-understood what I meant to ask, so I ask the question again using different phrasing (use a linear rather than a logistic model; re-classify the exposure cut-points or the outcome characterization; include a different set of control variables, etc.). These methods are usually not torture - they are reasonable reactions to past experiences where I have made programming errors, where classification matters a great deal, where omission of a key control variable does produce mis-leading results. But crossing the line into torture at this second stage is far too easy to justify, especially when I have a lot invested (in reputation, world-view, justifying how grant money was spent, etc.) in getting the answers I want to hear. It is easy at this stage to try out a variety of techniques to transform the answer I don't want to hear into one that I do.
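To make that "ask the question again with different phrasing" idea concrete, here is a minimal sketch of a specification check on simulated data (everything here - the variables, the true coefficients, the cut-point - is invented for illustration, not taken from any real study). The same exposure-outcome question is asked three ways: crude, adjusted for a confounder, and with the exposure dichotomized at its median. Honest re-interrogation means reporting how the answer moves across specifications, not picking the one you like.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: a confounder drives both exposure and outcome,
# and the true adjusted effect of exposure on outcome is 0.3.
n = 500
confounder = rng.normal(size=n)
exposure = 0.5 * confounder + rng.normal(size=n)
outcome = 0.3 * exposure + 0.6 * confounder + rng.normal(size=n)

def ols_coefs(y, X):
    """Least-squares coefficients for y ~ intercept + X."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Specification 1: exposure alone (omits the key control variable).
b_crude = ols_coefs(outcome, exposure[:, None])[1]

# Specification 2: adjust for the confounder.
b_adj = ols_coefs(outcome, np.column_stack([exposure, confounder]))[1]

# Specification 3: re-classify the exposure with a median cut-point.
exposed = (exposure > np.median(exposure)).astype(float)
b_binary = ols_coefs(outcome, np.column_stack([exposed, confounder]))[1]

print(f"crude: {b_crude:.2f}  adjusted: {b_adj:.2f}  dichotomized: {b_binary:.2f}")
```

In this toy setup the crude estimate is inflated by the omitted confounder, and adjusting pulls it back toward the true value - the kind of legitimate re-phrasing described above. Running many such variants and reporting only the flattering one would be the torture version of the same procedure.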

My third reaction to pesky Data is to say she's whack. Maybe Data got high before being dragged into the interrogation room and is just giving weird answers to satisfy her own impenetrable sense of humor. Or in other words, are there sampling errors, and/or systematic biases in the data, that generate unreliable results?
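One simple way to ask whether Data is just being whack - whether a striking answer could be sampling noise - is to bootstrap it. Below is a minimal sketch on an invented sample (the sample size and effect size are assumptions for illustration): resample with replacement many times and see how much the estimate wobbles under sampling variability alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small study whose mean looks striking at first glance.
data = rng.normal(loc=0.4, scale=1.0, size=40)

# Bootstrap: resample with replacement and recompute the mean each time.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(2000)
])

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"observed mean {data.mean():.2f}, "
      f"95% bootstrap interval [{lo:.2f}, {hi:.2f}]")
```

A wide interval says the interrogation room is noisy and the answer should be held loosely; bootstrapping does nothing, of course, about systematic bias, which no amount of resampling the same flawed data can reveal.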
It is only after many attempts, in many ways, to discount results I don't want to hear, that I take seriously the idea that I may have the wrong idea, that there is a completely different narrative that Data wants to tell. I will have wondered from the start what might explain contrary findings, but I won't replace my pre-conceptions until I'm utterly convinced that I've gotten it wrong. And I think that's the right approach - usually I do ask the wrong question, or if I ask the right question, I might well ask a dataset that is not well-equipped to give the right answer. But every once in a while, I listen carefully, and I hear a story that's much more interesting than the one I had in my head from the beginning. And those stories I don't want to hear - turns out they have happy endings too.

OK, I'll admit it, that last line is pure schmaltz. You got a better way to wrap this ramble up with a tidy bow?
