Tuesday, March 19, 2019

Don Juan and Other Bad Data in Mark Regnerus' New Family Structures Study

Mark Regnerus' New Family Structures Study administered a survey to a large, random sample of American young adults in order to compare the children of parents who have had same-sex relationships with children raised in other kinds of family structures.  If you look at Regnerus' databook, you will discover that many of his respondents are extraordinary people.

For example, when the men were asked how many female sexual partners they had had over the previous 12 months, one man reported 785 (Regnerus 2012b, 1370)!  Wow!  There's a real Don Juan.  And he was not the only one.  20 of the men reported that they had had sexual relationships over their life with over 100 women--"100+" was the highest number in the survey questionnaire.  16 women reported that they had had sexual relationships with over 100 men (1368).  Asked how often they had had sex in the past 2 weeks, 15 respondents reported more than 30 times; 5 had had it 45-50 times; and one person had had sex 75 times in 2 weeks (1382).  (I reported this to my wife, but she was remarkably skeptical.)

Isn't it amazing that these folks can keep such an accurate count of their couplings?  Mozart's Don Giovanni had Leporello keeping a catalogue of the Don's conquests--1,003 in Spain alone!  Do these people have someone keeping a record for them?

Moreover, these folks started their sex lives really early.  They were asked: "How old were you (in years) the first time you ever had vaginal intercourse?"  10 of them answered 0 (1365)!  They started in their mother's womb!

The respondents to Regnerus' survey were unusual in other ways besides their sexual prowess.  One man reported that he was 7 feet 10 inches tall (1290).  They were asked: "How old were you (in years) the last time you were arrested?"  3 people said 1.  4 people said 0 (1338).  Maybe these 4 were arrested before they were born for their sex crimes in the womb.

There is another explanation that might have occurred to you.  Is it possible that some of these respondents were jokesters who amused themselves by making up ridiculous answers?  Regnerus assures us that this cannot be the case, because "the data collection was conducted by Knowledge Networks (or KN), a research firm with a very strong record of generating high-quality data for academic projects" (Regnerus 2012a, 756).

High-quality data?  When Regnerus' paper was first published in Social Science Research in June of 2012, not even his critics questioned the quality of his data, although they did question his analysis of the data.  But, then, a few months later, Darren Sherkat reported that his study of Regnerus' databook revealed some "unlikely responses" (Sherkat 2012, 1348).  Three years later, Cheng and Powell (2015, 619-20) identified some "unreliable cases in his data" that might be the work of some "mischievous jokesters."

Regnerus has not commented on the dubious methods of Knowledge Networks in conducting online surveys.  Knowledge Networks is a marketing survey business that was acquired in 2011 by GfK, Germany's largest market research company.  Knowledge Networks recruits people for its "KnowledgePanel" by mailing envelopes to the homes of consumers telling them that they have been randomly selected to fill out online market research surveys.  The letter contains $2.  And they are promised compensation for their work.  If they do not have a computer or internet access, this is provided to them by the company for as long as they fill out surveys.  For every survey completed, the participants receive "points" that can be redeemed for a spin of a "Prize Wheel" or other contests, small gift cards, or a check for cash.  Some panel members have complained that the Prize Wheel is a scam, because it is programmed so that the spinner never lands on the Grand Prize.  One panel member who broke into the source code for the web page discovered this, and saw that after 1,000 spins, he never landed on the Grand Prize.

Most panel members work for cash.  But it takes over 8 hours to earn a $25 check.  Panel members generate about 1,000 points (worth $1) for each survey.  And each survey takes at least 20 minutes to complete.  So they're making about $3 per hour.
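The arithmetic behind that estimate can be checked with a short Python sketch.  It uses only the figures quoted above (about 1,000 points, worth $1, per survey; at least 20 minutes per survey; a $25 check).  The variable and function names are illustrative, and because 20 minutes is a minimum, the result is a best case for the panel member.

```python
# Figures taken from the paragraph above; all names are illustrative.
DOLLARS_PER_SURVEY = 1.00   # about 1,000 points, worth $1
MINUTES_PER_SURVEY = 20     # "at least 20 minutes", so this is a best case
CHECK_AMOUNT = 25.00        # size of the cash check


def hourly_rate(dollars_per_survey, minutes_per_survey):
    """Dollars earned per hour of survey work."""
    return dollars_per_survey * 60 / minutes_per_survey


def hours_to_earn(amount, dollars_per_survey, minutes_per_survey):
    """Hours of survey work needed to accumulate `amount` dollars."""
    surveys_needed = amount / dollars_per_survey
    return surveys_needed * minutes_per_survey / 60


rate = hourly_rate(DOLLARS_PER_SURVEY, MINUTES_PER_SURVEY)
hours = hours_to_earn(CHECK_AMOUNT, DOLLARS_PER_SURVEY, MINUTES_PER_SURVEY)

print(f"Hourly rate: ${rate:.2f}")
print(f"Hours needed for a ${CHECK_AMOUNT:.0f} check: {hours:.2f}")
```

Twenty-five surveys at 20 minutes each come to 500 minutes, which is where "over 8 hours" comes from.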

Needless to say, there is little incentive for panel members to work hard and think carefully as they fill out their online surveys.  Presumably, this often becomes boring, and they rush through their work; or they decide to amuse themselves by making up fraudulent answers to the questions.

There is no procedure at Knowledge Networks for "cleaning up the data" by throwing out answers that look suspicious.  And clearly Regnerus did not attempt to remove the bad data, because he was confident about Knowledge Networks' "strong record of generating high-quality data."
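What a basic screening step might look like can be sketched in a few lines of Python.  This is a hypothetical illustration, not a procedure used by Knowledge Networks or by Regnerus: the field names, the cutoffs, and the sample records are all assumptions, chosen only to mirror the kinds of answers described earlier in this post.

```python
# Hypothetical plausibility screen.  Field names and cutoffs are assumptions
# made for illustration; they do not come from the NFSS codebook.
PLAUSIBLE_RANGES = {
    "age_first_intercourse": (10, 40),
    "age_last_arrest": (7, 40),
    "height_inches": (48, 90),
    "partners_past_12_months": (0, 100),
}


def flag_implausible(record):
    """Return the names of fields whose values fall outside the assumed range."""
    flagged = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            continue  # question skipped or not asked
        if not low <= value <= high:
            flagged.append(field)
    return flagged


# Made-up records for demonstration; these are not real respondents.
sample = [
    {"id": 1, "age_first_intercourse": 17, "height_inches": 70},
    {"id": 2, "age_first_intercourse": 0, "height_inches": 94,
     "partners_past_12_months": 785},
]

for record in sample:
    print(record["id"], flag_implausible(record))
```

A flag of this kind is a prompt to review the case, not proof that the respondent was joking.  The point is only that such a check is cheap to run.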

Unfortunately, much of the social science research reported these days is based on this kind of data collection through online surveys with no attempt to identify fraudulent data.

Regnerus (2015) has written a response to Cheng and Powell's paper in which he admits that there were some "questionable cases" in his data.  He writes:
"To their credit, the authors helpfully pointed out a handful of cases that were questionable--respondents whose unlikely answers to other questions (like height, weight, etc.) suggest they weren't being honest survey-takers.  Such a critique is certainly fair and welcome; it's part of the long-term process of cleaning and clarification in any dataset of substantial size.  And removing those questionable cases actually strengthened my original analytic conclusions--and the authors say so. . . ."
". . . And while I welcome the documentation and removal of a handful of odd cases, it's a very different thing to suggest that the many respondents who report that they lived with their 'lesbian mother' or 'gay father' for a year or less are suspect cases, or 'misclassified.'  They are what they are . . . ."
"Social science was never going to save marriage's male-female infrastructure.  I never presumed it could or would.  What it can do--and that's what I will always love about it--is reveal what is going on. . . ."
Once we throw out "a handful of cases that were questionable," Regnerus suggests, we can rely on all of the other cases in his data to reveal to us what is going on.  So, for example, when the respondents report that they lived with their "lesbian mother" or "gay father" for a year or less, we know that these reports are true.  "They are what they are."

Keep in mind what we are talking about here.  The "cases" in Regnerus' data are self-reported answers to an internet questionnaire from people working for $3 an hour as members of a Knowledge Networks panel.  There was no attempt by anyone to check their answers to see if they were honest, unbiased, and accurate.

One of the most fundamental problems with all survey research--and particularly internet survey research--is "self-reporting bias": people answering survey questions cannot be trusted to be honest, unbiased, and accurate.  We can never know whether respondents' answers correspond to their actual behavior, to the behavior of others, or to their true beliefs.  For that reason, some researchers argue that reliable data in social research can only come from direct observation, experimentation, and multiple sources of data (Beam 2017).

Regnerus' silence about self-reporting bias suggests that he does not see this as a problem.  Is he suggesting that except for "a handful of odd cases," most self-reporting respondents to internet surveys are completely honest, unbiased, and accurate in their answers?  Some researchers such as Seth Stephens-Davidowitz (2017) have warned that survey research is unreliable because "Everybody lies."  Is Regnerus suggesting that no, except for "a handful of odd cases," everyone tells the truth?

Regnerus says that when a young adult reports having lived with a "lesbian mother" or "gay father" for a certain period of time, we can know this to be a factual truth.  Actually, the question in his internet survey questionnaire was this: "From when you were born until age 18 (or until you left home to be on your own), did either of your parents ever have a romantic relationship with someone of the same sex?"  The respondents were offered three possible answers: "(1) Yes, my mother had a romantic relationship with another woman.  (2) Yes, my father had a romantic relationship with another man.  (3) No."  Those who answered (1) were immediately classified by Regnerus as living in a "lesbian mother" (LM) household, while those who answered (2) were classified as living in a "gay father" (GF) household.

Notice that the respondents were not asked whether their parents were "lesbians" or "gay."  Rather, they were asked whether their parents had ever had a same-sex "romantic relationship."  What exactly is a same-sex "romantic relationship"?  And is this enough to identify a parent as "lesbian" or "gay"?  Moreover, can we rely on these young adults to answer this question accurately?  How reliable are their childhood memories?  Is it possible that their memory of something like this could be distorted?  Is it possible that their answers are not honest?  We could check their answers by interviewing their parents or others who knew them and by observational evidence of their behavior.  But, apparently, Regnerus believes that is unnecessary because the answers of the children are reliably telling us "what is going on."

One of the questions on Regnerus' survey was "Did you vote in the last presidential election?"  Political scientists studying voting behavior have discovered that many people do not answer this question honestly and accurately: if you check the voting records, you will see that many people who report having voted did not really vote.  They are either lying, or their memories are inaccurate.  Regnerus did not check the voting records for his respondents, because he apparently assumes that their answers on internet surveys are reliable.

Regnerus also asked questions about church attendance, criminal records, sexual behavior, educational achievement, and physical and mental health.  Other researchers have found that self-reported answers to such questions are often unreliable, and therefore they need to be checked against other data.  Regnerus seems to disagree with this because, again, he thinks that when people answer questions on an internet survey they are telling us "what is going on."  Is that plausible?


Beam, George. 2017. The Problem with Survey Research. New York: Routledge.

Cheng, Simon, and Brian Powell. 2015. "Measurement, Methods, and Divergent Patterns: Reassessing the Effects of Same-Sex Parents." Social Science Research 52: 615-26.

Regnerus, Mark. 2012a. "How Different Are the Adult Children of Parents Who Have Same-Sex Relationships? Findings from the New Family Structures Study." Social Science Research 41: 752-70.

Regnerus, Mark. 2012b. "New Family Structures Study (ICPSR 34392)." Inter-university Consortium for Political and Social Research.  Ann Arbor, Michigan.  Available online.

Regnerus, Mark. 2015. "Making Differences Disappear: The Evolution of Science on Same-Sex Households." The Public Discourse, May 12, 2015.  Available online.

Sherkat, Darren E. 2012. "The Editorial Process and Politicized Scholarship: Monday Morning Editorial Quarterbacking and a Call for Scientific Vigilance." Social Science Research 41: 1346-49.

Stephens-Davidowitz, Seth. 2017. Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are. New York: Dey Street Books.
