RECKONING WITH THE PAST
Sometimes I wonder if I should be fixing myself more to drink.
No, this is not going to be an optimistic post.
If you want bubbles and sunshine, please see my friend Simine Vazire’s post on why she is feeling optimistic about things. If you want nuance and balance, see my co-moderator Alison Ledgerwood’s new blog*. Instead, if you will allow me, I want to wallow.
I have so many feelings about the situation we’re in, and sometimes the weight of it all breaks my heart. I know I’m being intemperate, not thinking clearly, but I feel that it is only when we feel badly, when we acknowledge and, yes, grieve for yesterday, that we can allow for a better tomorrow. I want a better tomorrow; I want social psychology to change. But the only way we can really change is if we reckon with our past, coming clean that we erred, and erred badly.
To be clear: I am in love with social psychology. I am writing here because I am still in love with social psychology. Yet, I am dismayed that so many of us are dismissing or justifying all those small (and not so small) signs that things are just not right, that things are not what they seem. “Carry on, folks, nothing to see here,” is what some of us seem to be saying.
Our problems are not small and they will not be remedied by small fixes. Our problems are systemic and they are at the core of how we conduct our science. My eyes were first opened to this possibility when I read Simmons, Nelson, and Simonsohn’s paper during what seems like a different, more innocent time.
[Editor’s note: the Simmons, Nelson, and Simonsohn paper was published in 2011.]
This paper details how small, seemingly innocuous, and previously encouraged data-analysis decisions could allow for anything to be presented as statistically significant. That is, flexibility in data collection and analysis could make even impossible effects seem possible and significant.
What is worse, Andrew Gelman made clear that a researcher need not actively p-hack their data to reach erroneous conclusions. It turns out such biases in data analyses might not be conscious, that researchers might not even be aware of how their data-contingent decisions are warping the conclusions they reach. This is flat-out scary: Even honest researchers with the highest of integrity might be reaching erroneous conclusions at an alarming rate.
Third is the problem of publication bias. As a field, we tend only to publish significant results. This could be because as authors we choose to focus on these; or, more likely, because reviewers, editors, and journals force us to focus on these and to ignore nulls.
This creates the infamous file drawer that altogether warps the research landscape. Because it is unclear how large the file drawer is for any research literature, it is hard to determine how large or small any effect is, if it exists at all.
I think these three ideas (that data flexibility can lead to a raft of false positives, that this process might occur without researchers themselves being aware, and that the size of the file drawer is unknown) explain why so many of our cherished results can’t replicate. These three ideas suggest we might have been fooling ourselves into thinking we were chasing things that are real and robust, when we were pursuing neither.
As someone who has been doing research for nearly twenty years, I now can’t help but wonder if the topics I chose to study are in fact real and robust. Have I been chasing puffs of smoke for all these years?
I have spent nearly a decade working on the concept of ego depletion, including work that is critical of the model used to explain the phenomenon. I have been rewarded for this work, and I am convinced that the main reason I get any invitations to speak at colloquia and brown-bags these days is because of this work.
The problem is that ego depletion might not even be a thing. By now, many people are aware that a massive replication attempt of the basic ego depletion effect involving over 2,000 participants found nothing, nada, zip. Only three of the 24 participating labs found a significant effect, but even then, one of these found a significant result in the wrong direction!
There is a lot more to this registered replication than the main headline, and there is still so much evidence indicating fatigue is a real phenomenon. I promise to get to these thoughts in a later post, once the paper is finally published. But for now, we are left with a sobering question: If a large sample pre-registered study found absolutely nothing, how has the ego depletion effect been replicated and extended hundreds and hundreds of times? More sobering still: What other phenomena, which we now consider obviously real and true, will be revealed to be just as fragile?
As I said, I’m in a dark place. I feel like the ground is moving from underneath me and I no longer know what is real and what is not.
I edited an entire book on stereotype threat, I have signed my name to an amicus brief to the Supreme Court of the United States citing stereotype threat, yet now I am not as certain as I once was about the robustness of the effect. I feel like a traitor for having just written that; like I’ve disrespected my parents, a no-no according to Commandment number 5.
But, a meta-analysis published just last year suggests that stereotype threat, at least for some populations and under some conditions, might not be so robust after all. P-curving some of the original papers is also not comforting.
Now, stereotype threat is a politically charged topic and there is a lot of evidence supporting it. That said, I think a lot more painstaking work needs to be done on basic replications, and until then, I would be lying if I said that doubts have not crept in. Rumor has it that an RRR of stereotype threat is in the works.
To be fair, this is not social psychology’s problem alone. Many other allied areas in psychology might be similarly fraught and I look forward to these other areas scrutinizing their own work—areas like developmental, clinical, industrial/organizational, consumer behavior, organizational behavior, and so on, need an RPP project or Many Labs of their own. Other areas of science face similar problems too.
[Translator’s note: RPP refers to the Reproducibility Project: Psychology. Many Labs is a project that tests the replicability of a range of effects in psychological science.]
During my dark moments, I feel like social psychology needs a redo, a fresh start. Where to begin, though? What am I mostly certain about and where can my skepticism end? I feel like there are legitimate things we have learned, but how do we separate wheat from chaff? Do we need to go back and meticulously replicate everything in the past? Or do we use those bias tests Joe Hilgard is so sick and tired of to point us in the right direction? What should I stop teaching to my undergraduates? I don’t have answers to any of these questions.
This blogpost is not going to end on a sunny note. Our problems are real and they run deep. Okay, I do have some hope: I legitimately think our problems are solvable. I think the calls for more statistical power, greater transparency surrounding null results, and more confirmatory studies can save us. What is not helping is the lack of acknowledgement about the severity of our problems. What is not helping is a reluctance to dig into our past and ask what needs revisiting.
Time is nigh to reckon with our past. Our future just might depend on it.
*In case you haven’t heard, Alison started a wonderful Facebook discussion group that I have the privilege of co-moderating. If you’re tired of bickering and incivility, but still want a place to discuss ideas, PsychMAP just might be for you.