Monday, April 11, 2011

TBLG Data Geek Nirvana is Coming - Are We Ready?

The year is 2015. An unprecedented amount of data is now available about the health, socioeconomic conditions, and familial relationships of bisexual-, lesbian-, and gay-identified Americans. We now have the opportunity to describe these populations in great detail.

Routine questions intended to allow people to identify as trans have been tested and deployed in a number of surveys, and our ability to describe the trans-identified populations of the United States is light-years ahead of where we were in the dark ages of 2011 when a tenuous estimate of the size of the trans-identified population was first reported from population-based surveys.

Although a fairly wide variety of national surveys had collected sexual orientation data since the mid-1990s, it was a haphazard process in which surveys would add a question on sexual orientation largely based on lobbying by lesbian and gay researchers with strong personal relationships with survey administrators. Not only that, but the idiosyncratic method meant that there was little, if any, co-ordination between surveys, and each one asked about sexual orientation in its own slightly peculiar way, leading to much methodological navel-gazing and to debates about which results differed because of question wording and which reflected real population differences or some other factor.

Then, in 2011, given a push from a prestigious Institute of Medicine report, the ground shifted.

Rather than the default position being that one had to justify adding questions about sexual orientation to a skeptical research committee, survey administrators would now feel pressure to justify _not_ including sexual orientation questions, and have to get creative about how to reliably assess transgender identity in general population studies.

Furthermore, a growing consensus about the precise wording used to assess sexual orientation and gender identity had developed, leading to a much greater ability to compare results across studies. It even became possible to combine multiple studies in order to overcome the problem of not having sufficient numbers of sexual and gender minority individuals, and to enable analyses of ever more tightly defined sub-populations - such as elderly Asian-American lesbian and bisexual women, bisexuals in their 20s and those in their 50s, heterosexually-identified men who profess same-sex attraction but have not had sex with another man, and so on.

Not only that, but the National Institutes of Health had made substantial investments in supporting analyses of this data bonanza, and had also invested in programs designed to provide up-and-coming students of this field with mentoring and training to an unprecedented degree.

In short, TBLG data geek Nirvana had arrived!

Since that scenario, or one not far from it, is likely to be in our 5-10 year future, it is worth spending a bit of time getting past the salivating, and doing some serious critical thinking about some potential side effects, so that they can be anticipated and prepared for, rather than arriving as a surprise and a source of frustration.
And that's why I'm writing this piece. I'm as eager as the next TBLG researcher to get my hands on the tsunami of data headed our way. We all want to surf that wave, rather than get pummeled by it.

It's going to be great, and...
I want to think about the "and..." part for a minute.

I have been able to anticipate a few side effects of this coming wave, but I don't want to claim that I can see the future. Some of these may not happen. Undoubtedly other things I haven't considered will catch me by surprise.

The more research, the better.
Well, it depends on the research. See my previous posts for more on this. In short, it depends on what the goals of that research are. More than likely, we're going to see a huge expansion of rather thoughtless research. Specifically in terms of health research, we're going to see a lot more "health disparities" research oriented towards only adverse health disparities. We're going to see a lot of "intersectionality" research oriented towards showing cumulative disparity and adversity. Both of these approaches will miss the interesting exceptions: what health advantages do TBLG populations have? When does the assumption of cumulative adversity inherent to much intersectionality work fail to capture the points of resistance and resilience that provide opportunities for effective health promotion and community pride?

Who will be the gatekeepers?
"Nobody" is the short answer. Which we should welcome with open arms. Expanding access to data about TBLG people will be the inevitable result of this process of asking more questions. The thing to watch out for here is that people who have not in the past had any particular interest in TBLG populations will be chiming in for the first time, and making rookie mistakes in interpreting what they look for, what they find, and how they report it. The other phenomenon to watch out for related to this is people who have had an interest in TBLG populations, but will now feel empowered to look at data across a variety of disciplines other than where they have spent most of their time to date. Of course that is to be welcomed and encouraged, but there will be bumps along the way.

A shift in prerogative
Survey administrators will feel the shift from having to justify adding sexual orientation and gender identity items to surveys to including these questions unless there is a strong reason not to. Similarly, people analyzing these surveys and studies will feel an expectation that they should at least try to see if there are differences across categories of sexual orientation.
And, when they do these exploratory analyses, a variety of possibilities may arise. They may see a disparity that is truly there and is significant in a statistical sense. Since the idea of disparity and deficit fits well with assumptions about the social order, these exploratory analyses are more likely to get published than health similarities or health advantages that are truly there and also significant in a statistical sense (technically a similarity can't be significant, but the point is that if no difference is seen in a large sample, that is evidence of little difference). Disparities that are truly there but not significant in a statistical sense may also be commented on, and disparities that are observed as statistically significant but in reality aren't disparities at all are also likely to be reported as adverse health disparities. The probability that a health advantage that is not significant would be reported is, I propose, vanishingly small.
As a result, we are likely to see reports of a wide variety of health disparities, many of which will be "real", and also many of which will be spurious findings that are less likely to be replicated in other groups of people. But we won't be able to tell which is which until replication studies are done. And the way science works, debunking a falsely significant finding that jibes with the overall assumption of TBLG people being deficient can take several, or even dozens, of failed attempts to reproduce that finding in other populations.
Conversely, findings that go against the notion of inherent deficit (health similarities and advantages) will be viewed as tentative, and may require dozens of significant findings before being viewed as worthy of comment.
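
To make that argument a bit more concrete, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration, not an estimate from any real survey: the sample sizes, the 20% prevalence, the number of outcomes tested, and the function names are all hypothetical. It imagines analysts running exploratory comparisons on many adverse health outcomes where, by construction, there is no true difference between the minority and majority groups, and it counts how many comparisons come out "statistically significant" in each direction using a simple pooled two-proportion z-test.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_exploratory_analyses(n_outcomes=1000, n_minority=150,
                                  n_majority=1500, prevalence=0.2,
                                  alpha=0.05, seed=2011):
    """Test many adverse outcomes where the true group difference is zero.

    All parameter values are illustrative assumptions, not real survey data.
    """
    rng = random.Random(seed)
    adverse = advantage = 0
    for _ in range(n_outcomes):
        # Both groups are drawn from the SAME underlying prevalence.
        x_min = sum(rng.random() < prevalence for _ in range(n_minority))
        x_maj = sum(rng.random() < prevalence for _ in range(n_majority))
        p = two_proportion_p_value(x_min, n_minority, x_maj, n_majority)
        if p < alpha:
            if x_min / n_minority > x_maj / n_majority:
                adverse += 1    # looks like a deficit, but is pure noise
            else:
                advantage += 1  # looks like an advantage, also pure noise
    return {"outcomes": n_outcomes,
            "spurious_adverse": adverse,
            "spurious_advantage": advantage}


if __name__ == "__main__":
    print(simulate_exploratory_analyses())
```

With a 0.05 threshold, roughly one comparison in twenty should come out "significant" by chance alone, split more or less evenly between apparent deficits and apparent advantages. If only the apparent deficits get written up, the published record would show a string of adverse disparities even though none exists in this simulated world.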

The upshot of all that is that as we shift from analyses designed to examine TBLG health from a theoretically-oriented perspective to a time when routine, but atheoretical, analyses of these populations become commonplace, we should expect to see a large increase in spurious findings of TBLG deficits.
