Wednesday, February 22, 2023

Proportion LGBT in 20 Probability Samples

Six surveys asked about sexual orientation and gender identity:

01) Behavioral Risk Factor Surveillance System (2014-2021; 18+; Telephone; n=1,602,144)
    LGBT:   6.05%

02) Household Pulse Surveys (2021-2023; 18+; Internet; n=1,206,436)
    LGBT:   9.74%

03) National Crime Victimization Surveys (2017-2021; 18+; Face-to-face & telephone; n=760,408)
    LGBT:   2.39%

04a) Associated Press VoteCast (2018; 18+; Internet & telephone; n=39,864)
    LGBT:   7.73%

04b) Associated Press VoteCast (2020; 18+; Internet & telephone; n=34,868)
    LGBT:   9.27%

05) California Health Interview Survey (2021; 18+; Internet & telephone; n=24,441)
    LGBT: 10.95%

06) Collaborative Multi-racial Post-election Surveys (2012, 2016-2017; 18+; Internet; n=12,660)
    LGBT:   8.58%

Fourteen more surveys asked about sexual orientation, but not gender identity:

07) National Health Interview Survey (2014-2021; 18+; Face-to-face & telephone; n=240,719)
    LGB:   3.93%

08) National Drug Use and Health Surveys (2015-2020; 18+; Face-to-face; n=236,145)
    LGB:   5.21%

09) New York City Community Health Surveys (2001-2020; 18+; Telephone; n=155,714)
    LGB:   4.64%

10) Health Reform Monitoring Surveys (2013-2020; 18-64; Internet; n=147,203)
    LGB:   7.50%

11) National Adult Tobacco Surveys (2012-2014; 18+; Telephone; n=120,017)
    LGB:   4.22%

12) Canadian Community Health Surveys (2017-2018; 15+; Face-to-face & telephone; n=103,217)
    LGB:   3.35%

13) National Survey of Family Growth (2011-2019; 15-49; Face-to-face; n=41,174)
    LGB:   6.94%

14a) Population Assessment of Tobacco and Health, Wave 1 (2011; 18+; Face-to-face; n=31,515)
    LGB:   4.92%

14b) Population Assessment of Tobacco and Health, Wave 4 (2016; 18+; Face-to-face; n=33,415)
    LGB:   8.68%

15) Well-Being and Basic Needs Surveys (2017-2020; 18-64; Internet; n=27,449)
    LGB:   7.50%

16) National Health and Nutrition Examination Surveys (2001-2016; 18-64; Face-to-face; n=25,529)
    LGB:   5.38%

17) General Social Surveys (2008, 2010, 2012, 2014, 2016, 2018, 2021; 18+; Face-to-face, Internet & telephone; n=12,815)
    LGB:   4.72%

18) American National Election surveys (2016, 2020; 18+; Internet, face-to-face, video call & telephone; n=9,254)
    LGB:   6.57%

19a) Supplementary Empirical Teaching Units in Political Science (2016; 18+; Telephone, Internet & video call; n=3,464)
    LGB:   6.26%

19b) Supplementary Empirical Teaching Units in Political Science (2020; 18+; Face-to-face & Internet; n=7,089)
    LGB:   6.99%

20) National Social Life, Health, and Aging Project (2015-2016; 50+; Face-to-face; n=3,392)
    LGB:   2.41%

Thursday, February 2, 2023

Queer and Trans Representation in Research

WHAT IS REPRESENTATION?

    A common refrain in research on LGBT populations is that we are underrepresented in research. In many ways, that is undoubtedly true. Many data collection systems do not include items on sexual orientation, and even fewer include gender identity. And many of those that do are too small to include enough queer/trans respondents to provide reliable estimates. Arguably, funding for research on (and not often enough with) sexual and gender minority populations is decades overdue and still falls short of the mark. And publications about sexual and gender minority populations comprise a tiny fraction of the published scientific literature.

    And yet. The number of research datasets with reliable information on sexual orientation and gender identity has expanded rapidly, first in large survey datasets, more recently in infectious disease tracking systems, and coming soon in medical records datasets and large administrative databases. Funding has increased dramatically in recent years, and there are now established journals dedicated to sexual and gender minority population research.

    In a more limited sense of 'representation', we really don't know the degree to which sexual and gender minority populations are represented in these datasets - in other words, how much more or less likely are these populations to be included in research? With age, race/ethnicity, and geography, we can use Census records to compare the distribution of people included in a study with their expected distribution in the population. The same goes for "sex" in a broad sense, although this breaks down when considering gender identity. When we know these distributions, we can re-weight the analytic dataset to reflect the population at large.
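
    To make the re-weighting idea concrete, here is a minimal sketch of post-stratification on a single characteristic. The function name, the age groups, and the population shares are all made up for illustration (they are not real Census figures), and real survey weighting usually adjusts for several characteristics at once.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent so that the weighted group
    shares in the sample match the known population shares.

    sample_groups: list of group labels, one per respondent.
    population_shares: dict mapping each group label to its population share.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    # Weight factor = population share / sample share for each group.
    factors = {g: population_shares[g] / (counts[g] / n) for g in counts}
    return [factors[g] for g in sample_groups]

# Hypothetical sample in which younger adults are under-represented.
sample = ["18-44"] * 30 + ["45+"] * 70
census = {"18-44": 0.5, "45+": 0.5}  # assumed shares, for illustration only

weights = poststratification_weights(sample, census)
weighted_share_young = (
    sum(w for g, w in zip(sample, weights) if g == "18-44") / sum(weights)
)
```

    After weighting, the younger group makes up half of the weighted sample, matching the assumed population share. The catch for sexual orientation and gender identity is that there is nothing to put in the place of `census`.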

    But with sexual and gender minority populations, there is no Census standard - in fact, these surveys themselves are the closest thing we have to a standard. And there is considerable variability from one survey to another in the proportion of people identifying as sexual and gender minorities, as well as variation in the questions asked and the response options offered.


DATA SOURCES

    In a series of posts here, I plan to explore representation in this narrow sense (likelihood of responding to an invitation to engage in survey research) across a wide range of large probability surveys in the US, namely:

    Behavioral Risk Factor Surveillance System (2014-2021)

    Household Pulse Survey (2021-2023)

    National Health Interview Survey (2013-2021)

    National Health and Nutrition Examination Survey (1999-2019)

    National Survey on Drug Use and Health (2015-2020)

    National Survey of Family Growth (2011-2019)

    National Adult Tobacco Survey (2012-2014)

    Population Assessment of Tobacco and Health (2011, 2016)

    California Health Interview Survey (2021)

    New York City Community Health Survey (2003-2020)

    National Crime Victimization Survey (2017-2021)

    Health Reform Monitoring Surveys (2013-2020)

    Well-Being and Basic Needs Survey (2017-2020)

    General Social Survey (2008, 2010, 2012, 2014, 2016, 2018, 2021)

    American National Election Surveys (2016, 2020)

    Collaborative Multiracial Post-election Surveys (2012, 2016, 2017)

    Associated Press VoteCast (2018)

    These 17 large surveys with public use data reflect a broad range of sampling strategies (random telephone dialing, internet recruitment from Census lists, panels recruited by established survey firms, quite a few in-person interviews based on physical addresses, and one using televideo interviews), on a variety of topics (public opinion polling, health surveys, crime), using a variety of question wording and response options. They are heavily weighted towards recent years, but there are some going back decades. Is there a dataset you think I've overlooked? Let me know!


MEASURING REPRESENTATION

    If there are no Census data (or other standard) for the distribution of sexual orientation and gender identity, how do I propose to look at relative representation of these groups in these research datasets? Indirectly. I plan to use measures that are fairly intuitive correlates of research participation, as determined in prior research.

    One of the most intuitive is how many attempts it took to get a successful interview. Presumably, people who answer the call to participate immediately are "easy" interviews, and those who take 20-50 attempts to connect with are less eager to participate. So, we can look at the distribution of how many contact attempts were made to connect for an interview as a proxy for eagerness to participate. Alas, this measure is only reported publicly in 2 of the above studies.
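
    As a rough sketch of that comparison, the snippet below summarizes contact attempts by group using the median. The group labels and the attempt counts are invented for illustration, and the sketch ignores survey weights and design effects, which a real analysis would need to handle.

```python
from collections import defaultdict
from statistics import median

def median_attempts_by_group(records):
    """Return the median number of contact attempts for each group.

    records: iterable of (group_label, contact_attempts) pairs,
    one per completed interview.
    """
    by_group = defaultdict(list)
    for group, attempts in records:
        by_group[group].append(attempts)
    return {group: median(values) for group, values in by_group.items()}

# Hypothetical records, not drawn from any of the surveys listed above.
records = [
    ("LGB", 1), ("LGB", 3), ("LGB", 2),
    ("heterosexual", 2), ("heterosexual", 6), ("heterosexual", 4),
]
attempts_summary = median_attempts_by_group(records)
```

    In this made-up example the LGB group has the lower median, which would read as easier to reach. The median is used here because attempt counts tend to have a long right tail.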

    Another fairly intuitive proxy is how likely a respondent is to complete the interview once started. Presumably people who hang on to the end of an interview are more invested in the research endeavor than those who break off after a short period. This measure is available for 3 of the above surveys - many of these surveys only report complete interviews (or impute values for missing data), so there are no "short" interviews to compare to. For others, the sexual orientation and/or gender identity items are asked late enough in the interview that there is no information on these items among those who cut the interview short.

    A measure that seems to make sense (but may be less useful than it appears) is the weight assigned to a respondent. If the weighting system the survey is using works well enough, respondents who are harder to reach will have a higher weight, and those easy to reach will have a lower weight. The factors that go into these weights typically include sex, race/ethnicity, age, geography, how many phones the respondent uses and how many people could answer the phone, and interactions between these factors. So, to the degree that sexual and gender minority people are more or less likely to respond because of these factors, the relative weighting could be informative. But to the extent that the decision of sexual and gender minority respondents to engage with researchers is related to being LGBTQ over and above the factors that go into the weighting, the relative weights will fail to reflect participation.
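
    One simple way to look at relative weighting is the ratio of mean weights between two groups, sketched below. The function name and the numbers are hypothetical. A ratio above 1 would suggest the first group was harder to reach on the factors the weights account for, and says nothing about factors the weights leave out.

```python
def relative_mean_weight(weights, in_group):
    """Return mean weight of the flagged group divided by mean weight of everyone else.

    weights: list of survey weights, one per respondent.
    in_group: list of booleans, True where the respondent belongs to the group of interest.
    """
    group = [w for w, flag in zip(weights, in_group) if flag]
    others = [w for w, flag in zip(weights, in_group) if not flag]
    if not group or not others:
        raise ValueError("both groups need at least one respondent")
    return (sum(group) / len(group)) / (sum(others) / len(others))

# Hypothetical weights: the two flagged respondents carry larger weights.
example_weights = [3.0, 1.0, 1.0, 2.0, 1.0, 1.0]
example_flags = [True, False, False, False, True, False]
weight_ratio = relative_mean_weight(example_weights, example_flags)
```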

    A less intuitive measure is called the "fraction of missing information", or the share of items for which the respondent gave a "don't know" or "not sure" response, or declined to answer. Presumably, a person who declines to answer a larger proportion of questions is less invested in the research than a person who answers everything. Of course, there are many reasons to leave questions blank or say "don't know" that have nothing to do with eagerness to participate. And I have to be careful to distinguish between questions left blank because the respondent heard the question and didn't answer, questions skipped on purpose, and questions skipped because the interview had already ended. Another difficulty with this measure is that an awful lot of people answer every question, even when they truly don't know or aren't sure, so the median number of blank items is 0, which makes for a rough distribution to work with from a statistical perspective. On the plus side, these measures are available for all the studies above, except one that imputes missing and don't know values to "known" values before releasing the public use dataset.
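
    Here is a minimal sketch of that calculation for a single respondent. The response codes ("DK", "REF", "SKIP", "BREAKOFF") are placeholders I've made up for illustration. Each survey uses its own codes, and mapping them correctly is most of the work.

```python
# Hypothetical codes: real surveys each use their own coding scheme.
NONRESPONSE = {"DK", "REF"}        # "don't know"/"not sure", or declined to answer
NOT_ASKED = {"SKIP", "BREAKOFF"}   # skipped on purpose, or interview already ended

def item_nonresponse_fraction(answers):
    """Return the share of asked items with a don't-know or refused response.

    Items that were never asked (skip patterns, break-offs) are left out of
    the denominator. Returns None if no items were asked.
    """
    asked = [a for a in answers if a not in NOT_ASKED]
    if not asked:
        return None
    return sum(a in NONRESPONSE for a in asked) / len(asked)

# Hypothetical respondent: 4 items asked, 2 of them without a usable answer.
respondent = ["yes", "DK", "SKIP", "no", "REF", "BREAKOFF"]
fraction = item_nonresponse_fraction(respondent)
```

    Keeping the not-asked items out of the denominator is the design choice that matters here. Otherwise a respondent routed past many questions would look more engaged than they were.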

    I've gone ahead and split these missing information measures into two categories: one based on demographic items (race/ethnicity, marital status, educational attainment, employment status, income, household composition, citizenship, language), and another based on all other items on the survey, which I've called 'substantive items' for lack of a better generic term for "everything else", from a history of cancer to presidential candidate preference.


STATEMENT OF EXPECTED HYPOTHESES

    What do I expect to see in all this? It's hard to say for sure, which is what makes it especially interesting to me. I do expect to see heterogeneity. I expect to see greater participation from sexual and gender minority populations with Internet-based recruiting than with telephone, for instance. I expect to see greater participation from gay men than lesbian women, and greater still than bisexual men and women; from cisgender people than transgender. Overall, I think that LGBT people will probably be somewhat more likely to participate in research, but if I had to guess, I'd say the difference is probably pretty small, compared to differences in participation related to age, race/ethnicity and sex. I suspect that the variation between sexual and gender minority groups will be greater than the difference between LGBT people as a whole and cisgender heterosexual adults.

    I would say I don't have a strong expectation about participation between transfeminine and transmasculine people. I don't have as solid a foundation of experience to draw from. I'm also not sure about what to expect about younger or older LGBT people relative to younger or older cisgender heterosexuals, or about LGBT people belonging to minoritized racial/ethnic groups relative to cisgender heterosexual non-Hispanic Whites.


WHAT YOU SHOULD EXPECT

    Over the next weeks to months, I plan to post a variety of analyses here related to this topic. Expect to see analyses based on one survey at a time. Expect to see an analysis of the same question or proxy outcome across multiple surveys. Expect to see analyses of missing data due to particular items across multiple surveys and populations. Expect to see analyses looking at trends over time, differences across survey methodologies, differences with respect to survey topics (drug use, general health, crime victimization, politics). In other words, this topic is too big (at least in my mind) for synthesis into a single paper for publication. I want to explore it with you and figure out along the way what the paper(s) within the topic are to pursue for publication in a more formal setting.