Thursday, September 23, 2021

Census Household Pulse Survey - Tips for Analyzing Sexual Orientation and Gender Identity

 Hooray! The US Census has finally provided estimates of the sexual and gender minority populations in the United States!

I am in the process of learning how these numbers work, and am eager to pass along what I've learned to other researchers.

As part of the "Household Pulse Survey", a large, frequently repeated survey of US households designed to gather vital information on the COVID-19 pandemic and related topics, the Census included items on sexual orientation and gender identity starting in week 34. As of this writing, there are 3 weeks of data to work with: a bit over 200,000 respondents, already approaching the sample size of a full year of Behavioral Risk Factor Surveillance System (BRFSS) data! Even more important, respondents are not preselected by which state they live in, or by whether they are answering an out-of-state cell phone (see my recent article in AJPM for more detail on that).


Comparability to BRFSS

Many of the questions are identical to those fielded in BRFSS, or can be easily transformed to a comparable format. The sexual orientation item is nearly identical, requiring only a recode of 5 ("I don't know") to 7, and of -99 (no response) to 9.
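For readers working outside SAS, here is a minimal Python sketch of that recode. It assumes the HPS variable is named SEXUAL_ORIENTATION and that codes 1 through 4 already line up with BRFSS, so only "I don't know" and no-response need remapping. Both points are assumptions to check against the data dictionary. The mapping table is a hypothetical helper.

```python
# Hypothetical lookup: HPS SEXUAL_ORIENTATION code -> BRFSS-style code.
# Codes not listed here (assumed to be 1-4) carry over unchanged.
HPS_TO_BRFSS_SO = {5: 7, -99: 9}


def recode_sexual_orientation(hps_code):
    """Map one HPS sexual orientation response to a BRFSS-style code."""
    return HPS_TO_BRFSS_SO.get(hps_code, hps_code)


print([recode_sexual_orientation(c) for c in (1, 2, 3, 4, 5, -99)])
# [1, 2, 3, 4, 7, 9]
```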

The sex at birth and gender identity questions are not exactly comparable, however. There are some complications that require a bit of finesse before using the gender identity variables.

The raw data files can be downloaded from: www.census.gov/programs-surveys/household-pulse-survey/datasets.html#phase3.2 .


Tip #1: Restrict to AGENID_BIRTH=2.

For both sexual orientation and gender identity, any analysis should be restricted to cases where AGENID_BIRTH=2. AGENID_BIRTH is a flag indicating whether sex at birth was imputed (1) or not (2). Census used a "hot deck" imputation technique to fill in missing values for several key variables, including sex at birth (EGENID_BIRTH) and current gender identity (GENID_DESCRIBE). When sex at birth or current gender identity is imputed, Census replaces the missing value with a value from another respondent, in a (not quite) random fashion. The donor value does not appear to take the respondent's other answer into account. As a result, about half of the respondents whose sex at birth is imputed end up with a sex at birth that does not match their current gender identity (for example, imputed male at birth but currently female), which would indicate that they are transgender. Sex at birth is imputed for about 3% of the total population, so about 1.5% of people (half of 3%) are unintentionally imputed to be transgender when they are in fact cisgender. That is a common enough occurrence that it overwhelms the population of people who are actually transgender.
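The arithmetic above (half of 3% is 1.5%) can be checked with a small simulation. This is a sketch under stated assumptions, not the Census procedure:

- every simulated respondent is cisgender;
- current gender is split 50/50;
- the hot deck is approximated by drawing an imputed sex at birth from a 50/50 donor pool, ignoring current gender identity.

The function name and defaults are hypothetical.

```python
import random


def spurious_transgender_share(n=200_000, imputed_rate=0.03, seed=1):
    """Simulate imputing sex at birth independently of current gender identity.

    Hypothetical assumptions: everyone is truly cisgender, current gender is
    50/50 male/female, and imputed values come from a 50/50 donor pool.
    Returns (mismatch share among imputed cases, mismatch share of everyone).
    """
    rng = random.Random(seed)
    imputed = mismatched = 0
    for _ in range(n):
        current = rng.choice((1, 2))      # current gender: 1 = male, 2 = female
        if rng.random() < imputed_rate:
            imputed += 1
            birth = rng.choice((1, 2))    # donor value ignores `current`
        else:
            birth = current               # reported value matches: cisgender
        if birth != current:
            mismatched += 1               # would be counted as transgender
    return mismatched / max(imputed, 1), mismatched / n


among_imputed, overall = spurious_transgender_share()
print(f"mismatch among imputed cases: {among_imputed:.3f}")
print(f"spurious 'transgender' share of all respondents: {overall:.4f}")
```

With these assumptions, roughly half of the imputed cases and about 1.5% of all respondents come out as spuriously "transgender", matching the back-of-the-envelope figure.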

Most researchers won't want to go to the trouble of a full multiple imputation, in which these variables strongly inform one another rather than being treated as nearly independent (as this particular hot deck imputation technique appears to assume). They should simply take the easy route and restrict the analysis to AGENID_BIRTH=2.
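In code, the restriction is a one-line filter. Here is a minimal Python sketch, assuming each respondent is a dict keyed by HPS variable names. The function name and toy records are hypothetical, not real HPS data.

```python
def restrict_to_reported_sex_at_birth(records):
    """Keep only respondents whose sex at birth was reported, not imputed.

    AGENID_BIRTH: 1 = imputed, 2 = not imputed.
    `records` is any iterable of dicts keyed by HPS variable names.
    """
    return [r for r in records if r.get("AGENID_BIRTH") == 2]


# Hypothetical toy records, not real HPS data
sample = [
    {"AGENID_BIRTH": 2, "EGENID_BIRTH": 1, "GENID_DESCRIBE": 1},
    {"AGENID_BIRTH": 1, "EGENID_BIRTH": 2, "GENID_DESCRIBE": 1},  # imputed: dropped
    {"AGENID_BIRTH": 2, "EGENID_BIRTH": 2, "GENID_DESCRIBE": 1},
]
kept = restrict_to_reported_sex_at_birth(sample)
print(len(kept))  # 2
```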

By implication, anyone looking at sexual orientation should probably also make this restriction. That is especially true when also looking at sex, which one should always do when looking at sexual orientation; otherwise you'll get gay men in your lesbian group, and so on. The error is not as large as for gender identity, but why use analytic groups you know are premixed in a way that minimizes the distinctions between them?
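As an illustration, here is a Python sketch of a weighted breakdown of sexual orientation within sex at birth, with the AGENID_BIRTH=2 restriction applied first. It assumes dict records with EGENID_BIRTH (1 = male, 2 = female), PWEIGHT, and a sexual orientation variable named SEXUAL_ORIENTATION; that last name is an assumption. It produces point estimates only. Proper variance estimates still need survey software such as PROC SURVEYFREQ.

```python
from collections import defaultdict


def orientation_by_sex(records):
    """Weighted % of each sexual orientation code within sex at birth.

    Assumes dict records with AGENID_BIRTH, EGENID_BIRTH (1 = male, 2 = female),
    SEXUAL_ORIENTATION (assumed variable name) and PWEIGHT.
    """
    totals = defaultdict(float)  # sex at birth -> summed weight
    cells = defaultdict(float)   # (sex at birth, orientation) -> summed weight
    for r in records:
        if r["AGENID_BIRTH"] != 2:  # drop imputed sex at birth (Tip #1)
            continue
        sex, so, w = r["EGENID_BIRTH"], r["SEXUAL_ORIENTATION"], r["PWEIGHT"]
        totals[sex] += w
        cells[(sex, so)] += w
    return {key: 100 * w / totals[key[0]] for key, w in cells.items()}


# Hypothetical toy records, not real HPS data
toy = [
    {"AGENID_BIRTH": 2, "EGENID_BIRTH": 1, "SEXUAL_ORIENTATION": 1, "PWEIGHT": 10.0},
    {"AGENID_BIRTH": 2, "EGENID_BIRTH": 1, "SEXUAL_ORIENTATION": 2, "PWEIGHT": 30.0},
    {"AGENID_BIRTH": 1, "EGENID_BIRTH": 2, "SEXUAL_ORIENTATION": 1, "PWEIGHT": 50.0},
    {"AGENID_BIRTH": 2, "EGENID_BIRTH": 2, "SEXUAL_ORIENTATION": 2, "PWEIGHT": 20.0},
]
print(orientation_by_sex(toy))
# {(1, 1): 25.0, (1, 2): 75.0, (2, 2): 100.0}
```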


Tip #2: Use an expansive definition of transgender.

Don't be fooled by the simplicity of the "current gender identity" variable (GENID_DESCRIBE), which looks like it differentiates between people who are transgender and cisgender male or female (and another group "none of these" - I'm holding this group out separately because I haven't yet examined this group in detail).

But GENID_DESCRIBE is about the respondent's current gender identity, and many transgender people prefer to identify as "male" or "female" rather than as "transgender". Therefore, to identify transgender people in the Household Pulse Survey, one should also look for people whose sex at birth was male and whose current gender identity is female (and vice versa).

Here is SAS code to accomplish that recode. It puts the results into a format closer to BRFSS, except that there is no "gender non-conforming" option (BRFSS=3), and "none of these" is held out as a separate category (HPS=4, recoded to 5 for convenience).

/* Run inside a DATA step that reads the HPS public-use file(s) */
if AGENID_BIRTH=2 then do;
  /* Male to female transgender */
  if EGENID_BIRTH=1 and GENID_DESCRIBE in (2,3) then TRNSGNDR=1;
  /* Female to male transgender */
  else if EGENID_BIRTH=2 and GENID_DESCRIBE in (1,3) then TRNSGNDR=2;
  /* Cisgender (a missing current gender identity, -99, is counted as cisgender) */
  else if (EGENID_BIRTH=1 and GENID_DESCRIBE in (1,-99))
       or (EGENID_BIRTH=2 and GENID_DESCRIBE in (2,-99)) then TRNSGNDR=4;
  /* None of these (HPS=4, recoded to 5) */
  else if GENID_DESCRIBE=4 then TRNSGNDR=5;
end;
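For non-SAS users, here is a Python rendering of the same recode logic, as a sketch. It returns BRFSS-like TRNSGNDR codes:

- 1 = male to female
- 2 = female to male
- 4 = cisgender
- 5 = "none of these"

Cases with imputed sex at birth, or with combinations the SAS code does not classify, come back as None, mirroring a missing value in SAS. The function name is hypothetical.

```python
def trnsgndr(agenid_birth, egenid_birth, genid_describe):
    """Python rendering of the SAS TRNSGNDR recode (BRFSS-like codes).

    1 = male to female, 2 = female to male, 4 = cisgender,
    5 = "none of these" (HPS GENID_DESCRIBE=4); None = excluded/unclassified.
    """
    if agenid_birth != 2:  # imputed sex at birth: leave missing (Tip #1)
        return None
    if egenid_birth == 1 and genid_describe in (2, 3):
        return 1
    if egenid_birth == 2 and genid_describe in (1, 3):
        return 2
    if (egenid_birth == 1 and genid_describe in (1, -99)) or (
        egenid_birth == 2 and genid_describe in (2, -99)
    ):
        return 4
    if genid_describe == 4:
        return 5
    return None


print(trnsgndr(2, 1, 2), trnsgndr(2, 2, 3), trnsgndr(2, 1, -99), trnsgndr(1, 1, 2))
# 1 2 4 None
```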

In many surveys, this sort of recoding is not recommended, because any slip-up in coding sex at birth or current gender identity is much too likely to falsely identify cisgender respondents as transgender. However, the Household Pulse Survey (like some other surveys) asks a follow-up question to confirm the answers when people report one sex at birth and a different current gender. This confirmation step cleans the data as it is collected, and is probably sufficient protection against miscoding.


Tip #3: Combine waves, but adjust the weights

While each weekly wave of the Household Pulse Survey is a large survey, breaking the numbers down into subpopulations (e.g. by age, state, health status, etc.) can result in some pretty unstable estimates. Combining multiple waves is a great way to combat this instability. But be warned: the weights should be adjusted, because each week's weights are intended to represent the whole US population. The quick and dirty way to do this is simply to divide the weights by the number of waves you are combining. For instance, I started with 3 waves, so my adjusted weights are simply generated as PWEIGHT/3. Eventually, I'll probably do something a bit more sophisticated with adjusting the weights when combining waves. That will be especially likely if the sample size starts changing dramatically from one wave to another, or if the balance between state-level sampling fractions is fiddled with. I may also want to multiply some sort of "recency bias" into the weights if the outcome is one where up-to-the-minute estimation is more conceptually important (i.e. making more recent observations weigh more than older ones). But all that is in the future. For now, a simple division by the number of waves concatenated is sufficient.
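Here is a minimal Python sketch of the quick and dirty adjustment, PWEIGHT divided by the number of waves. It assumes each wave is a list of dict records. The WAVE and ADJ_WEIGHT keys are hypothetical names added for illustration; WAVE is the kind of identifier one might pass as a stratum.

```python
def combine_waves(waves):
    """Pool several waves, dividing each PWEIGHT by the number of waves.

    `waves` is a list of waves; each wave is a list of dicts with a PWEIGHT key.
    Adds hypothetical keys WAVE (1-based wave index) and ADJ_WEIGHT.
    """
    k = len(waves)
    pooled = []
    for wave_id, wave in enumerate(waves, start=1):
        for r in wave:
            pooled.append({**r, "WAVE": wave_id, "ADJ_WEIGHT": r["PWEIGHT"] / k})
    return pooled


# Toy example: three waves, each weighted to the same (tiny) population of 300
waves = [
    [{"PWEIGHT": 100.0}, {"PWEIGHT": 200.0}],
    [{"PWEIGHT": 150.0}, {"PWEIGHT": 150.0}],
    [{"PWEIGHT": 120.0}, {"PWEIGHT": 180.0}],
]
pooled = combine_waves(waves)
print(round(sum(r["ADJ_WEIGHT"] for r in pooled), 6))  # 300.0
```

Dividing every weight by the same constant leaves weighted percentages unchanged. What it fixes is weighted counts and totals, which would otherwise be inflated k-fold by stacking k copies of the population.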

I have also included the "wave" identifier as a stratum in proc surveyfreq. No strong theoretical basis for doing so, but it seemed like a good idea. Very much open to suggestions from others about how to best utilize the stratum and psu specifications.

more to come...

Saturday, January 9, 2021

Vaccines. Risk Groups. Scarcity. Efficiency.

Vaccinating the high risk people first is (usually) not the most efficient way to get the greatest number of high risk people vaccinated first.

Sounds like the opposite of a tautology, whatever that is called. Let me break it down.

I'm starting from the premise that we all want to see the highest risk people (especially the oldest among us, and particularly the most vulnerable African American, Latinx & Native elders) protected from the ravages of SARS-CoV-2 infections through vaccination, as fast as possible. So, the question I'm addressing is, what is the fastest, most efficient means to meet that goal?

It may seem obvious that morality dictates that we must get the vaccine to the highest risk people first, to minimize the toll of this terrible pandemic. I agree. But thinking about people and populations in terms of "risk" puts some hidden obstacles in the way.

First among these is that there are inefficiencies in identifying "high risk" people, and scheduling them to come to a specific place at a specific time to get vaccinated. In the first few weeks, to maybe a month or two, this inefficiency will not be particularly apparent, so long as the supply of vaccines is a strongly limiting factor. But once the supply of vaccine outstrips our ability to administer them in a heavily calculated risk-first approach (which may already be upon us), it may serve us better to reframe our efforts around getting vaccines out efficiently, rather than focusing as heavily on who "deserves" or "needs" it most. I have to stress that I am still thinking in a social justice frame here - my goal is to get the vaccines to the most vulnerable among us as fast as possible. I understand it may not feel like that yet...

Another hidden obstacle that a risk orientation places before us is that few people like to think of themselves as being "at risk". Ironically enough, putting a lot of emphasis on deciding who is at risk, and who is at highest risk, sets up ways of thinking in most people's minds that run contrary to what we've been taught in public health school.

People who are told that they are "at risk" often go through a thought process like: "I'm a good, honest, careful, responsible person. Maybe I've made a mistake here or there, but I'm doing OK, and I'll be OK. Even though they are telling me I'm 'at risk', I know and do things to considerably lower my risk, and therefore there are other people out there who need the vaccine more than I do." It may be hard for public health types to hear this, but even the people you think of as totally irresponsible and very high risk often have thoughts like that. They push the stigma of risk onto people they see as even further out on the margins of society than themselves.

A third hidden obstacle is that this is not the first time these populations have been identified as being "at risk". "At risk" usually means something in the range of unpleasant to bad news. Otherwise, we would call it "privileged" or some such verbal indicator of elevated, but infrequent, status; or maybe "normal", the blandest of accolades. When you've experienced being treated as "at risk", particularly multiple times, you come to learn that "at risk" is literally stigmatizing. It is bound up with stereotypes of being childish, immature or dim-witted (should know better, or poor thing didn't get the right education); untrustworthy (what else are they hiding from me); even dirty. It may come as no surprise, then, that the very act of identifying particular populations as "at risk" feeds directly into hard-earned insights about prior treatment by healthcare and other parentalist social structures. You may interpret that as being about a "lack of trust in medicine". I agree; I'm just putting it in a slightly different frame to encourage my public health colleagues to see things from a different perspective. In other words, being part of a "target" population for purposes of healthcare outreach sounds different when you are a target population, in a very literal sense, on the streets of America.

Getting to a solution is even more important than laying out a polemic set of (admittedly) hypothetical concerns. I mean, we have a real-life full-blown health crisis to deal with here, and vaccines are a critical juncture in turning the tide. In my opinion, we need to shift as quickly as practicable (i.e. when the pace of vaccine supply outstrips the pace of getting it into people's arms - a point we have already reached in many states) to mass vaccination strategies. We can still encourage older folks to get to the head of the line, but we need to start getting the vaccines out in a more haphazard fashion than scheduling people and hoping they will show up on time, in the right place. Anecdotally, we are already experiencing the tragedy of tossing out otherwise perfectly good vaccine because only 60-80% of the people scheduled for vaccinations showed up in the right place at the right time, or close enough that they could get processed. We need to shift to older models, like what we did for polio and smallpox. Line 'em up and go go go.

Let history guide us. I will cite four examples where risk-oriented vaccination campaigns were less effective at reaching high risk (and particularly the highest risk) individuals with vaccines than mass vaccination approaches.

Hepatitis B vaccine is a true life-saver. The most vulnerable population is infants, who can get the infection from their mothers during (and after?) childbirth. Once hepatitis B vaccines became available, CDC bent itself into knots trying to identify the mothers at highest risk and getting the vaccine to them. The problem was that the highest risk populations, including Alaska Native mothers, had persistently low vaccination rates. Those rates were lower than in populations with a lower prevalence of hepatitis B infection, and much lower than in moderate risk groups like doctors and other healthcare workers, in whom vaccination was nearly universal. How did they fix this? By abandoning the "high risk" approach, and making hepatitis B vaccination routinely recommended for all infants. This recommendation made it logistically easier to get the vaccine out into clinics, rather than requiring a separate visit on a particular day, when the vaccine would be available and the correct cold chain could be guaranteed. Vaccination rates grew dramatically higher among the babies of the highest risk mothers once the vaccine was simply recommended for all children. I wrote more detail about this story 13 years ago in a post called "New camera & national vaccine strategy".

Human papillomavirus (HPV) vaccine is now widely adopted, finally. But at first, it was targeted towards girls in their tweens and early teens. The rationale was that the vaccine would protect against cancer due to infection with the (largely sexually transmitted) HPV. And since girls were at risk for the most common HPV-related cancer, cervical cancer, it was completely logical to vaccinate girls. Turns out, this made a lot of parents think about their girls having sex. And vaccination uptake was not particularly fast, particularly among members of some Protestant communities. So there was great umbrage taken in public health circles about the ignorance of this way of thinking, with a large investment in trying to think of clever ways to preempt or combat the narratives of these backwards backwoods clowns (or that's how many of us perceived this form of vaccine hesitancy). In a post from 11 years ago, "HPV Vaccine for Boys?", I detail how expanding the vaccine recommendation to boys actually got more girls vaccinated.

Influenza, the pandemic that never left. These days, we've heard all kinds of parallels to the great influenza pandemic of 1918-19. While influenza killed unfathomable millions in those years, the death toll in the century since has been absolutely crippling. You probably get a flu shot every year, and still, influenza is a major killer. That is partly because the vaccines aren't as effective as we'd like them to be, and in great part because a lot of people still see the flu shot as optional, and really only necessary for high risk people (and remember, "high risk" almost always means "someone else", even among the riskiest populations; see above). For years, we put our efforts into identifying the most vulnerable and getting them vaccinated. That actually worked reasonably well in nursing homes, but it failed to get high enough rates of vaccination in non-institutionalized elderly persons to keep the "great scythe of mortality" from reaping a horrific annual harvest. In recent years, the goal has been to get as many adults vaccinated as possible, through as many means as possible, rather than spending a lot of time trying to identify the highest risk adults and making special arrangements to get them vaccinated. While there's still a long way to go, we have been blessed with lower incidence of circulating influenza virus (because more people are vaccinated), and more importantly, higher vaccination rates in the most vulnerable elderly.

Long time, no smallpox. Or polio (in the Americas). The reason I bring up these vaccination campaigns is that they occurred in an earlier era of public health. An era where we weren't tempted to risk stratify the population and reach for the lowest-hanging fruit. We just went all out and vaccinated (nearly) everyone - enough people to reach a "herd immunity" so robust it resulted in eradication. Wait, why didn't we identify the highest risk people and vaccinate where it would be most "efficient"? If history documents a line of progress, we immediately think we weren't sophisticated enough to, or our understanding of the world was impaired. We just got lucky that mass vaccination turns out to have been so successful, but we know better now. Except that we eliminated smallpox and polio using a mass, undifferentiated public health approach. And now, in our more enlightened age, we use more sophisticated, targeted, efficient approaches that fail for years, often decades, to make a substantial dent in the public health problems of our day.


Thanks for bearing with me on this journey. Hope you got something out of it!