Tuesday, March 28, 2023

Interview Completion: Are Sexual and Gender Minority People More or Less Likely to Engage in Research?

This post is part of a series about engagement of sexual and gender minority populations in survey research. For an overview, start here.

One of the most direct measures of research participation I'm looking at is interview completion: getting to the end of the survey. Certainly there are many reasons for cutting an interview short; some of these are more closely related to a lack of interest in engaging in the research effort than others, but on the whole, the more a respondent is engaged, the more likely they are to get to the end of the survey.

Alas, out of the 25 surveys I have tabulated so far, only 3 report whether the interview was completed: the National Health Interview Survey (NHIS), the Health Information National Trends Survey (HINTS), and the American National Election Survey (ANES). I also created measures of interview completion for 2 more surveys (the Behavioral Risk Factor Surveillance System (BRFSS) and the Household Pulse Survey (HPS)) by looking at patterns of missing values: if all values sequentially after a given point in the interview are missing, then I code that as an interview termination. Developing those measures is time-consuming, and I doubt I'll do it for any others.
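
The missing-value rule described above can be sketched roughly as follows. This is only an illustration, not the code used for BRFSS or HPS: it assumes unanswered items are stored as None, that items are listed in questionnaire order, and that legitimate skips have already been recoded to something other than None. The min_trailing threshold is a hypothetical addition to keep a single blank final item from counting as a termination.

```python
from typing import Optional, Sequence

MISSING = None  # assumption: unanswered items are stored as None


def breakoff_position(answers: Sequence[object], min_trailing: int = 3) -> Optional[int]:
    """Return the index of the first item in an unbroken run of missing
    values that extends to the end of the interview, or None if the
    interview looks complete.

    min_trailing is a hypothetical threshold: a shorter trailing run is
    treated as ordinary item nonresponse rather than a termination.
    """
    position = len(answers)
    # Walk backwards from the last item while the answers are missing.
    while position > 0 and answers[position - 1] is MISSING:
        position -= 1
    trailing = len(answers) - position
    if trailing >= min_trailing:
        return position
    return None


# Made-up response vectors, in questionnaire order.
complete = [1, 2, None, 1, 3, 2]          # one blank item mid-interview
terminated = [1, 2, 1, None, None, None]  # nothing after the third item

print(breakoff_position(complete))
print(breakoff_position(terminated))
```

A respondent is then coded as having completed the interview when the function returns None.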

So, here's what I found in these 5 surveys, from the largest to the smallest number of respondents:

Household Pulse Survey (waves 34-55, 2021-2023, Internet survey)
Overall, LGBT interview completion was a bit lower than among cisgender heterosexuals, lower among transgender respondents, perhaps higher for gay cismen, and lower for cismen of "another" sexual orientation. These are raw (weighted) percents, so to get an estimate of relative likelihood of interview completion after adjusting for respondent age, state of residence, and time trend, I estimated a logistic model to get adjusted odds ratios for interview completion:
After adjustment, interview completion was actually higher for LGBT people averaged together (on this chart, 1.0 means equally likely as the comparison group), for cisgender sexual minority women, and for cisgender gay and bisexual men. For transgender respondents, it was about the same as for cisgender heterosexuals.
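
For readers less familiar with odds ratios, here is a minimal sketch of the unadjusted version of the quantity plotted above. It is not the model behind these results: it ignores survey weights and does no adjustment for age, state, or time trend, and the counts are made up. The confidence interval uses the standard log-based (Woolf) approximation.

```python
import math


def odds_ratio(completed_a, broke_off_a, completed_b, broke_off_b, z=1.96):
    """Crude odds ratio of interview completion for group A relative to
    group B, with a Woolf (log-based) confidence interval."""
    estimate = (completed_a / broke_off_a) / (completed_b / broke_off_b)
    se_log = math.sqrt(
        1 / completed_a + 1 / broke_off_a + 1 / completed_b + 1 / broke_off_b
    )
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Made-up counts, for illustration only (not from any survey in this post).
estimate, lower, upper = odds_ratio(900, 100, 8500, 1500)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio of 1.0 means the two groups are equally likely to complete the interview; values above 1.0 mean group A is more likely to complete it than group B.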


Behavioral Risk Factor Surveillance System (2014-2021, Telephone survey, restricted to states where SOGI items were asked in the demographics section)
In the crude rates, LGBT respondents were just about as likely to complete the interview as cisgender heterosexuals. Transgender people were somewhat less likely to complete it and, as in HPS, gay and bisexual cismen were more likely, while cismen of another sexual orientation were less likely.
Again, I did a logistic model to adjust for respondent age and state of residence, and time trend.
After adjustment, LGBT people were slightly more likely to complete interviews, transgender people less likely, gay and bi cismen more likely, and cismen of another sexual orientation less likely.


National Health Interview Survey (2014-2021, Face-to-face interviews)
Although I was able to combine several years of data here, the sample size of NHIS is considerably smaller than HPS or BRFSS, so while the comparison between sexual minority adults and heterosexuals is robust, the subgroups get harder to interpret. NHIS did not collect gender identity, but it did identify people who said they weren't sure about their sexual orientation.
Overall, sexual minority respondents were about as likely to complete interviews, and it looks like the questioning groups may have been less likely to complete the interview.
Again, a logistic model, adjusted for respondent age, region of residence, and time trend:
Overall, the relative likelihood of interview completion for sexual minority respondents was slightly higher, in the same range as the two larger surveys above. Subgroups are too small to interpret here.


Health Information National Trends Survey (2017-2020, Internet & Mail)
HINTS is a very rich survey, lots of in-depth information about experience with cancer and beliefs about cancer prevention. However, with about 15,000 respondents after pooling 4 annual surveys, the sample is just too small to say anything confidently about sexual minority respondents relative to heterosexuals. Also, the reported interview completion rate is very high, which is great, but it also means there's not a lot of variation to look at from a statistical perspective.
I'm hesitant even to show model results because of this, but for the sake of completeness, here they are:



American National Election Survey (2016, 2020, mostly Internet, some face-to-face, televideo, and telephone interviews)
Really nothing to say about this survey, given that it is shy of 10,000 respondents, and again with such a high interview completion rate that there isn't much statistical variation to play with.


All 5 Surveys Together
The main value of looking at these 5 surveys, with different methodologies, covering different subject matter, and over (somewhat) different time frames is being able to look at them all together. Here are the results of the five logistic models for LGB(T) populations compared to cisgender heterosexuals, all on the same scale:
With only 5 surveys, it doesn't make sense to do a formal meta-analysis, especially given that the surveys are really quite different from one another. Nonetheless, it is reassuring to see that the three largest studies have relative completion rates that are compatible with one another (the 2 smaller studies are also compatible with these, but also compatible with such a wide range of alternate possibilities that they are simply not informative).
It may come as a surprise to some readers that LGBT people are, at least in terms of interview completion, more likely to engage in research, and thus perhaps slightly over-represented in research datasets.






Wednesday, February 22, 2023

Proportion LGBT in 20 Probability Samples

Six surveys asked about sexual orientation and gender identity:

01) Behavioral Risk Factor Surveillance System (2014-2021; 18+; Telephone; n=1,602,144)
    LGBT:   6.05%

02) Household Pulse Surveys (2021-2023; 18+; Internet; n=1,206,436)
    LGBT:   9.74%

03) National Crime Victimization Surveys (2017-2021; 18+; Face-to-face & telephone; n=760,408)
    LGBT:   2.39%

04a) Associated Press VoteCast (2018; 18+; Internet & telephone; n=39,864)
    LGBT:   7.73%

04b) Associated Press VoteCast (2020; 18+; Internet & telephone; n=34,868)
    LGBT:   9.27%

05) California Health Interview Survey (2021; 18+; Internet & telephone; n=24,441)
    LGBT: 10.95%

06) Collaborative Multi-racial Post-election Surveys (2012, 2016-2017; 18+; Internet; n=12,660)
    LGBT:   8.58%

Fourteen more surveys asked about sexual orientation, but not gender identity:

07) National Health Interview Survey (2014-2021; 18+; Face-to-face & telephone; n=240,719)
    LGB:   3.93%

08) National Drug Use and Health Surveys (2015-2020; 18+; Face-to-face; n=236,145)
    LGB:   5.21%

09) New York City Community Health Surveys (2001-2020; 18+; Telephone; n=155,714)
    LGB:   4.64%

10) Health Reform Monitoring Surveys (2013-2020; 18-64; Internet; n=147,203)
    LGB:   7.50%

11) National Adult Tobacco Surveys (2012-2014; 18+; Telephone; n=120,017)
    LGB:   4.22%

12) Canadian Community Health Surveys (2017-2018; 15+; Face-to-face & telephone; n=103,217)
    LGB:   3.35%

13) National Survey of Family Growth (2011-2019; 15-49; Face-to-face; n=41,174)
    LGB:   6.94%

14a) Population Assessment of Tobacco and Health, Wave 1 (2011; 18+; Face-to-face; n=31,515)
    LGB:   4.92%

14b) Population Assessment of Tobacco and Health, Wave 4 (2016; 18+; Face-to-face; n=33,415)
    LGB:   8.68%

15) Well-Being and Basic Needs Surveys (2017-2020; 18-64; Internet; n=27,449)
    LGB:   7.50%

16) National Health and Nutrition Examination Surveys (2001-2016; 18-64; Face-to-face; n=25,529)
    LGB:   5.38%

17) General Social Surveys (2008, 2010, 2012, 2014, 2016, 2018, 2021; 18+; Face-to-face, Internet & telephone; n=12,815)
    LGB:   4.72%

18) American National Election surveys (2016, 2020; 18+; Internet, face-to-face, video call & telephone; n=9,254)
    LGB:   6.57%

19a) Supplementary Empirical Teaching Units in Political Science (2016; 18+; Telephone, Internet & video call; n=3,464)
    LGB:   6.26%

19b) Supplementary Empirical Teaching Units in Political Science (2020; 18+; Face-to-face & Internet; n=7,089)
    LGB:   6.99%

20) National Social Life, Health, and Aging Project (2015-2016; 50+; Face-to-face; n=3,392)
    LGB:   2.41%

Thursday, February 2, 2023

Queer and Trans Representation in Research

WHAT IS REPRESENTATION?

    A common refrain in research on LGBT populations is that we are underrepresented in research. In many ways, that is undoubtedly true. Many data collection systems do not include items on sexual orientation, and even fewer include gender identity. And many of those that do are small enough that there is not enough of a queer/trans population to provide reliable estimates. Arguably funding for research on (not often enough with) sexual and gender minority populations is decades overdue and falls short of the mark. And publications about sexual and gender minority populations comprise a tiny fraction of the published scientific literature.

    And yet. The number of research datasets with reliable information on sexual orientation and gender identity has expanded rapidly, first in large survey datasets, more recently in infectious disease tracking systems, and coming soon in medical records datasets and large administrative databases. Funding has increased dramatically in recent years, and there are now established journals dedicated to sexual and gender minority population research.

    In a more limited sense of 'representation', we really don't know the degree to which sexual and gender minority populations are represented in these datasets - in other words, how much more or less likely are these populations to be included in research? With age, race/ethnicity, and geography, we can use Census records to compare the distribution of people included in a study with respect to their expected distribution in the population, and "sex" in a broad sense, although this breaks down when considering gender identity. When we know these distributions, we can re-weight the analytic dataset to reflect the population at large.
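
    The re-weighting idea can be sketched in its simplest form, post-stratification on a single variable: each respondent's weight is their group's share of the population divided by that group's share of the sample. The age groups and benchmark shares below are made up for illustration, and real survey weights typically involve many more factors than this.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: population_shares[group] / (counts[group] / n)
        for group in counts
    }


# Hypothetical age groups and benchmark shares, for illustration only.
sample = ["18-44"] * 30 + ["45-64"] * 40 + ["65+"] * 30
benchmark = {"18-44": 0.45, "45-64": 0.33, "65+": 0.22}

weights = poststratification_weights(sample, benchmark)
for group, weight in sorted(weights.items()):
    print(group, round(weight, 2))
```

    Groups that are scarce in the sample relative to the benchmark get weights above 1, and over-sampled groups get weights below 1. Without a benchmark for sexual orientation and gender identity, there is nothing to put in the benchmark table for those groups.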

    But, with sexual and gender minority populations, there is no Census standard - in fact, these surveys themselves are the closest thing we have to a standard. But there is considerable variability from one survey to another in terms of the proportion of people identifying as sexual and gender minorities, as well as variation in the questions asked - and response options offered.


DATA SOURCES

    In a series of posts I plan to explore here, I'll be looking at representation in this narrow sense (likelihood of responding to an invitation to engage in survey research) across a wide range of large probability surveys in the US, namely:

    Behavioral Risk Factor Surveillance System (2014-2021)

    Household Pulse Survey (2021-2023)

    National Health Interview Survey (2013-2021)

    National Health and Nutrition Examination Survey (1999-2019)

    National Survey on Drug Use and Health (2015-2020)

    National Survey of Family Growth (2011-2019)

    National Adult Tobacco Survey (2012-2014)

    Population Assessment of Tobacco and Health (2011, 2016)

    California Health Interview Survey (2021)

    New York City Community Health Survey (2003-2020)

    National Crime Victimization Survey (2017-2021)

    Health Reform Monitoring Surveys (2013-2020)

    Well-Being and Basic Needs Survey (2017-2020)

    General Social Survey (2008, 2010, 2012, 2014, 2016, 2018, 2021)

    American National Election Surveys (2016, 2020)

    Collaborative Multiracial Post-election Surveys (2012, 2016, 2017)

    Associated Press VoteCast (2018)

    These 17 large surveys with public use data reflect a broad range of sampling strategies (random telephone dial, internet recruitment from Census lists, panels recruited by established survey firms, quite a few in-person interviews based on physical addresses, and one using televideo interviews), on a variety of topics (public opinion polling, health surveys, crime), using a variety of question wording and response options. They are heavily weighted towards recent years, but there are some going back decades. Is there a dataset you think I've overlooked? Let me know!


MEASURING REPRESENTATION

    If there are no Census data (or other standard) for the distribution of sexual orientation and gender identity, how do I propose to look at relative representation of these groups in these research datasets? Indirectly. I plan to use measures that are fairly intuitive correlates of research participation, as determined in prior research.

    One of the most intuitive is how many attempts it took to get a successful interview. Presumably, people who answer the call to participate immediately are "easy" interviews, and those who take 20-50 attempts to connect with are less eager to participate. So, we can look at the distribution of how many contact attempts were made to connect for an interview as a proxy for eagerness to participate. Alas, this measure is only reported publicly in 2 of the above studies.

    Another fairly intuitive proxy is how likely a respondent is to complete the interview once started. Presumably people who hang on to the end of an interview are more invested in the research endeavor than those who break off after a short period. This measure is available for 3 of the above surveys - many of these surveys only report out complete interviews (or impute values for missing data) so that there are no "short" interviews to compare to. For others, the sexual orientation and/or gender identity items are asked late enough in the interview that there is not information on these items among those who cut the interview short.

    A measure that seems to make sense (but may be less useful than it appears) is the weight assigned to a respondent. If the weighting system the survey is using works well enough, respondents who are harder to reach will have a higher weight, and those easy to reach will have a lower weight. The factors that go into these weights typically include sex, race/ethnicity, age, geography, how many phones the respondent uses and how many people could answer the phone, and interactions between these factors. So, to the degree that sexual and gender minority people are more or less likely to respond because of these factors, the relative weighting could be informative. But, to the extent that the choice to engage with researchers is related to being LGBTQ over and above the factors that go into the weighting, the relative weights will fail to reflect participation.
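
    One simple way to compare weights across groups, sketched here with made-up records, is each group's mean weight divided by the overall mean weight. The group labels and weights are hypothetical; this is an illustration of the comparison, not a description of any particular survey's weighting scheme.

```python
from collections import defaultdict


def relative_mean_weight(records):
    """records: iterable of (group, weight) pairs. Returns each group's
    mean weight divided by the overall mean weight."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, weight in records:
        totals[group] += weight
        counts[group] += 1
    overall = sum(totals.values()) / sum(counts.values())
    return {g: (totals[g] / counts[g]) / overall for g in totals}


# Made-up respondent records, for illustration only.
records = [
    ("LGBT", 0.8),
    ("LGBT", 1.0),
    ("cisgender heterosexual", 1.2),
    ("cisgender heterosexual", 1.0),
]
print(relative_mean_weight(records))
```

    Under the reasoning above, a ratio below 1 would suggest that the weighting factors mark that group as easier to reach than average, and a ratio above 1 as harder to reach.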

    A less intuitive measure is called the "fraction of missing information", or how many items the respondent answered with a "don't know" or "not sure" response, or declined to answer. Presumably, a person who declines to answer a larger proportion of questions is less invested in the research than a person who answers everything. Of course, there are many reasons to leave questions blank or say "don't know" that have nothing to do with eagerness to participate. And I have to be careful to distinguish between questions left blank because the respondent heard them and didn't answer, questions skipped on purpose, and questions skipped because the interview had already ended. Another difficulty with this measure is that an awful lot of people answer every question, even when they truly don't know or aren't sure, so the median number of blank items is 0, a rough distribution to work with from a statistical perspective. On the plus side, these measures are available for all the studies above, except one that imputes missing and don't know values to "known" values before releasing the public use dataset.
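
    The distinction between kinds of blanks can be sketched like this. The response codes are hypothetical (every survey uses its own), and the choice to leave purposely skipped and never-reached items out of the denominator is an assumption made for this illustration.

```python
# Hypothetical response codes; real surveys each use their own.
DONT_KNOW = "DK"
REFUSED = "REF"
SKIPPED = "SKIP"    # skipped on purpose by the questionnaire logic
NOT_REACHED = "NR"  # interview ended before this item

NONRESPONSE = {DONT_KNOW, REFUSED}
NOT_ASKED = {SKIPPED, NOT_REACHED}


def fraction_missing(answers):
    """Share of the items actually asked that were answered "don't know"
    or refused. Returns None when no items were asked."""
    asked = [a for a in answers if a not in NOT_ASKED]
    if not asked:
        return None
    missing = sum(1 for a in asked if a in NONRESPONSE)
    return missing / len(asked)


# Made-up respondent: 5 items asked, 2 of them don't know/refused.
respondent = [1, "DK", 3, "SKIP", "REF", 2, "NR", "NR"]
print(fraction_missing(respondent))
```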

    I've gone ahead and split these missing information measures into two categories: one based on demographic items (race/ethnicity, marital status, educational attainment, employment status, income, household composition, citizenship, language), and another based on all other items on the survey, which I've called 'substantive items' for lack of a better generic term for "everything else", anything from a history of cancer to presidential candidate preference.


STATEMENT OF EXPECTED HYPOTHESES

    What do I expect to see in all this? It's hard to say for sure, which is what makes it especially interesting to me. I do expect to see heterogeneity. I expect to see greater participation from sexual and gender minority populations from Internet-based recruiting than telephone, for instance. I expect to see greater participation from gay men than lesbian women, and greater still than bisexual men and women; from cisgender people than transgender. Overall, I think that LGBT people will probably be somewhat more likely to participate in research, but if I had to guess, I'd say the difference is probably pretty small, compared to differences in participation related to age, race/ethnicity and sex. I suspect that the variation between sexual and gender minority groups will be greater than the difference between LGBT people as a whole and cisgender heterosexual adults.

    I would say I don't have a strong expectation about participation between transfeminine and transmasculine people. I don't have as solid a foundation of experience to draw from. I'm also not sure about what to expect about younger or older LGBT people relative to younger or older cisgender heterosexuals, or about LGBT people belonging to minoritized racial/ethnic groups relative to cisgender heterosexual non-Hispanic Whites.


WHAT YOU SHOULD EXPECT

    Over the next weeks to months, I plan to post a variety of analyses here related to this topic. Expect to see analyses based on one survey at a time. Expect to see an analysis of the same question or proxy outcome across multiple surveys. Expect to see analyses of missing data due to particular items across multiple surveys and populations. Expect to see analyses looking at trends over time, differences across survey methodologies, differences with respect to survey topics (drug use, general health, crime victimization, politics). In other words, this topic is too big (at least in my mind) for synthesis into a single paper for publication. I want to explore it with you and figure out along the way what the paper(s) within the topic are to pursue for publication in a more formal setting.