Scientific Regress

The problem with science is that so much of it simply isn’t. Last summer, the Open Science Collaboration announced that it had tried to replicate one hundred published psychology experiments sampled from three of the most prestigious journals in the field. Scientific claims rest on the idea that experiments repeated under nearly identical conditions ought to yield approximately the same results, but until recently, very few had bothered to check in a systematic way whether this was actually the case.

The OSC was the biggest attempt yet to check a field’s results, and the most shocking. In many cases, they had used original experimental materials, and sometimes even performed the experiments under the guidance of the original researchers. Of the studies that had originally reported positive results, an astonishing 65 percent failed to show statistical significance on replication, and many of the remainder showed greatly reduced effect sizes.

Their findings made the news, and quickly became a club with which to bash the social sciences. But the problem isn’t just with psychology. There’s an unspoken rule in the pharmaceutical industry that half of all academic biomedical research will ultimately prove false, and in 2011 a group of researchers at Bayer decided to test it. Looking at sixty-seven recent drug discovery projects based on preclinical cancer biology research, they found that in more than 75 percent of cases the published data did not match up with their in-house attempts to replicate.

These were not studies published in fly-by-night oncology journals, but blockbuster research featured in Science, Nature, Cell, and the like. The Bayer researchers were drowning in bad studies, and it was to this, in part, that they attributed the mysteriously declining yields of drug pipelines. Perhaps so many of these new drugs fail to have an effect because the basic research on which their development was based isn’t valid.

When a study fails to replicate, there are two possible interpretations. The first is that, unbeknownst to the investigators, there was a real difference in experimental setup between the original investigation and the failed replication. These are colloquially referred to as “wallpaper effects,” the joke being that the experiment was affected by the color of the wallpaper in the room. This is the happiest possible explanation for failure to reproduce: It means that both experiments have revealed facts about the universe, and we now have the opportunity to learn what the difference was between them and to incorporate a new and subtler distinction into our theories.

The other interpretation is that the original finding was false. Unfortunately, an ingenious statistical argument shows that this second interpretation is far more likely. First articulated by John Ioannidis, a professor at Stanford University’s School of Medicine, this argument proceeds by a simple application of Bayesian statistics. Suppose that there are a hundred and one stones in a certain field. One of them has a diamond inside it, and, luckily, you have a diamond-detecting device that advertises 99 percent accuracy. After an hour or so of moving the device around, examining each stone in turn, suddenly alarms flash and sirens wail while the device is pointed at a promising-looking stone. What is the probability that the stone contains a diamond?

Most would say that if the device advertises 99 percent accuracy, then there is a 99 percent chance that the device is correctly discerning a diamond, and a 1 percent chance that it has given a false positive reading. But consider: Of the one hundred and one stones in the field, only one is truly a diamond. Granted, our machine has a very high probability of correctly declaring it to be a diamond. But there are many more diamond-free stones, and while the machine only has a 1 percent chance of falsely declaring each of them to be a diamond, there are a hundred of them. So if we were to wave the detector over every stone in the field, it would, on average, sound twice—once for the real diamond, and once when a false reading was triggered by a stone. If we know only that the alarm has sounded, these two possibilities are roughly equally probable, giving us an approximately 50 percent chance that the stone really contains a diamond.
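
For readers who want the arithmetic spelled out, the following is a minimal sketch in Python of the Bayes' theorem calculation behind the example; the numbers are exactly those of the field of stones above, and the snippet is an illustration added here, not part of the original argument.

```python
# Bayes' theorem applied to the field of stones described above.
p_diamond = 1 / 101            # prior: one of the 101 stones holds the diamond
p_alarm_if_diamond = 0.99      # the detector's advertised accuracy
p_alarm_if_empty = 0.01        # its false positive rate on the other 100 stones

# Overall chance the alarm sounds for a randomly examined stone.
p_alarm = (p_diamond * p_alarm_if_diamond
           + (1 - p_diamond) * p_alarm_if_empty)

# Posterior probability that the stone contains a diamond, given the alarm.
posterior = p_diamond * p_alarm_if_diamond / p_alarm
print(round(posterior, 3))     # ~0.497: roughly a coin flip, not 99 percent
```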

This is a simplified version of the argument that Ioannidis applies to the process of science itself. The stones in the field are the set of all possible testable hypotheses, the diamond is a hypothesized connection or effect that happens to be true, and the diamond-detecting device is the scientific method. A tremendous amount depends on the proportion of possible hypotheses which turn out to be true, and on the accuracy with which an experiment can discern truth from falsehood. Ioannidis shows that for a wide variety of scientific settings and fields, the values of these two parameters are not at all favorable.
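
Ioannidis expresses this relationship as the positive predictive value of a claimed finding. Below is a hedged sketch of that calculation, with R standing for the pre-study odds that a tested hypothesis is true, alpha for the accepted false positive rate, and power for 1 - beta; the symbols follow his paper, but the specific numbers are purely illustrative.

```python
def positive_predictive_value(prior_odds: float, alpha: float, power: float) -> float:
    """Probability that a reported positive finding is actually true.

    prior_odds: R, the pre-study odds that the tested relationship is real
    alpha:      accepted false positive rate (type I error)
    power:      1 - beta, the chance of detecting a real effect
    """
    true_hits = prior_odds * power   # true relationships correctly detected
    false_hits = alpha               # false relationships wrongly "detected"
    return true_hits / (true_hits + false_hits)

# One true hypothesis for every hundred false ones, conventional thresholds:
print(round(positive_predictive_value(prior_odds=1 / 100, alpha=0.05, power=0.8), 2))
# ~0.14: under these assumptions, most positive findings are false
```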

For instance, consider a team of molecular biologists investigating whether a mutation in one of the countless thousands of human genes is linked to an increased risk of Alzheimer’s. The probability of a randomly selected mutation in a randomly selected gene having precisely that effect is quite low, so just as with the stones in the field, a positive finding is more likely than not to be spurious—unless the experiment is unbelievably successful at sorting the wheat from the chaff. Indeed, Ioannidis finds that in many cases, approaching even 50 percent true positives requires unimaginable accuracy. Hence the eye-catching title of his paper: “Why Most Published Research Findings Are False.”
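
To put a number on "unimaginable accuracy": setting the positive predictive value from the sketch above to 50 percent and solving for the false positive rate shows how strict the test would have to be. The prior below is a hypothetical figure for a randomly chosen mutation, not one taken from Ioannidis's paper.

```python
# PPV = R*(1 - beta) / (R*(1 - beta) + alpha) equals 0.5 exactly when
# alpha = R * (1 - beta), so the required false positive rate is:
prior_odds = 1 / 10_000   # assumed odds that a random mutation-disease link is real
power = 0.8               # assumed statistical power
alpha_required = prior_odds * power
print(alpha_required)     # 8e-05, i.e. a 0.008 percent false positive rate,
                          # hundreds of times stricter than the usual 5 percent
```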

What about accuracy? Here, too, the news is not good. First, it is a de facto standard in many fields to use one in twenty as an acceptable cutoff for the rate of false positives. To the naive ear, that may sound promising: Surely it means that just 5 percent of scientific studies report a false positive? But this is precisely the same mistake as thinking that a stone has a 99 percent chance of containing a diamond just because the detector has sounded. What it really means is that for each of the countless false hypotheses that are contemplated by researchers, we accept a 5 percent chance that it will be falsely counted as true—a decision with a considerably more deleterious effect on the proportion of correct studies.
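
A toy bookkeeping exercise, with an assumed split between true and false hypotheses, makes the distinction concrete.

```python
# A 5 percent false positive rate per hypothesis is not the same thing as
# 5 percent of published positives being false.
n_true, n_false = 100, 900        # hypotheses under investigation (assumed split)
power, alpha = 0.8, 0.05

true_positives = n_true * power       # 80 real effects detected
false_positives = n_false * alpha     # 45 spurious "effects" also detected

share_false = false_positives / (true_positives + false_positives)
print(round(share_false, 2))          # 0.36: over a third of positive results are wrong
```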

Paradoxically, the situation is actually made worse by the fact that a promising connection is often studied by several independent teams. To see why, suppose that three groups of researchers are studying a phenomenon, and when all the data are analyzed, one group announces that it has discovered a connection, but the other two find nothing of note. Assuming that all the tests involved have a high statistical power, the lone positive finding is almost certainly the spurious one. However, when it comes time to report these findings, what happens? The teams that found a negative result may not even bother to write up their non-discovery. After all, a report that a fanciful connection probably isn’t true is not the stuff of which scientific prizes, grant money, and tenure decisions are made.
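
A small illustration, with hypothetical numbers, of how this plays out when only positive results are written up:

```python
# Three teams independently test a hypothesis that is in fact false,
# each accepting the conventional 5 percent false positive rate.
alpha, teams = 0.05, 3

# Chance that at least one of them obtains a spurious positive result.
p_some_false_positive = 1 - (1 - alpha) ** teams
print(round(p_some_false_positive, 3))   # ~0.143

# If the two (correct) null results stay in the file drawer, the literature
# records only the one spurious discovery.
```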

And even if they did write it up, it probably wouldn’t be accepted for publication. Journals are in competition with one another for attention and “impact factor,” and are always more eager to report a new, exciting finding than a killjoy failure to find an association. In fact, both of these effects can be quantified. Since the majority of all investigated hypotheses are false, if positive and negative evidence were written up and accepted for publication in equal proportions, then the majority of articles in scientific journals should report no findings. When tallies are actually made, though, the precise opposite turns out to be true: Nearly every published scientific article reports the presence of an association. There must be massive bias at work.

Ioannidis’s argument would be potent even if all scientists were angels motivated by the best of intentions, but when the human element is considered, the picture becomes truly dismal. Scientists have long been aware of something euphemistically called the “experimenter effect”: the curious fact that when a phenomenon is investigated by a researcher who happens to believe in the phenomenon, it is far more likely to be detected.

Much of the effect can likely be explained by researchers unconsciously giving hints or suggestions to their human or animal subjects, perhaps in something as subtle as body language or tone of voice. Even those with the best of intentions have been caught fudging measurements, or making small errors in rounding or in statistical analysis that happen to give a more favorable result. Very often, this is just the result of an honest statistical error that leads to a desirable outcome, and therefore it isn’t checked as deliberately as it might have been had it pointed in the opposite direction.

But, and there is no putting it nicely, deliberate fraud is far more widespread than the scientific establishment is generally willing to admit. One way we know that there’s a great deal of fraud occurring is that if you phrase your question the right way, scientists will confess to it. In a survey of two thousand research psychologists conducted in 2011, over half of those surveyed admitted outright to selectively reporting those experiments which gave the result they were after. Then the investigators asked respondents anonymously to estimate how many of their fellow scientists had engaged in fraudulent behavior, and promised them that the more accurate their guesses, the larger a contribution would be made to the charity of their choice.

Through several rounds of anonymous guessing, refined using the number of scientists who would admit their own fraud and other indirect measurements, the investigators concluded that around 10 percent of research psychologists have engaged in outright falsification of data, and more than half have engaged in less brazen but still fraudulent behavior such as reporting that a result was statistically significant when it was not, or deciding between two different data analysis techniques after looking at the results of each and choosing the more favorable.

Many forms of statistical falsification are devilishly difficult to catch, or close enough to a genuine judgment call to provide plausible deniability. Data analysis is very much an art, and one that affords even its most scrupulous practitioners a wide degree of latitude. Which of these two statistical tests, both applicable to this situation, should be used? Should a subpopulation of the research sample with some common criterion be picked out and reanalyzed as if it were the totality? Which of the hundreds of coincident factors measured should be controlled for, and how? The same freedom that empowers a statistician to pick a true signal out of the noise also enables a dishonest scientist to manufacture nearly any result he or she wishes.

Cajoling statistical significance where in reality there is none, a practice commonly known as “p-hacking,” is particularly easy to accomplish and difficult to detect on a case-by-case basis. And since the vast majority of studies still do not report their raw data along with their findings, there is often nothing to re-analyze and check even if there were volunteers with the time and inclination to do so.
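
A minimal simulation of the practice, using made-up data rather than anything from the studies discussed here: both groups are drawn from the same distribution, so any "effect" is pure noise, yet an analyst free to test twenty outcome measures and report only the most favorable one clears the conventional significance bar most of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_runs, n_outcomes, n_subjects = 1_000, 20, 30
hacked = 0

for _ in range(n_runs):
    # Twenty unrelated outcome measures, identical in both groups by construction.
    group_a = rng.normal(size=(n_outcomes, n_subjects))
    group_b = rng.normal(size=(n_outcomes, n_subjects))
    p_values = [stats.ttest_ind(a, b).pvalue for a, b in zip(group_a, group_b)]
    if min(p_values) < 0.05:             # keep only the most favorable test
        hacked += 1

print(hacked / n_runs)   # roughly 0.64: most pure-noise datasets yield a "finding"
```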

One creative attempt to estimate how widespread such dishonesty really is involves comparisons between fields of varying “hardness.” The author, Daniele Fanelli, theorized that the farther from physics one gets, the more freedom creeps into one’s experimental methodology, and the fewer constraints there are on a scientist’s conscious and unconscious biases. If all scientists were constantly attempting to influence the results of their analyses, but had more opportunities to do so the “softer” the science, then we might expect that the social sciences have more papers that confirm a sought-after hypothesis than do the physical sciences, with medicine and biology somewhere in the middle.

This is exactly what the study discovered: A paper in psychology or psychiatry is about five times as likely to report a positive result as one in astrophysics. This is not necessarily evidence that psychologists are all consciously or unconsciously manipulating their data—it could also be evidence of massive publication bias—but either way, the result is disturbing.

Speaking of physics, how do things go with this hardest of all hard sciences? Better than elsewhere, it would appear, and it’s unsurprising that those who claim all is well in the world of science reach so reliably and so insistently for examples from physics, preferably of the most theoretical sort. Folk histories of physics combine borrowed mathematical luster and Whiggish triumphalism in a way that journalists seem powerless to resist. The outcomes of physics experiments and astronomical observations seem so matter-of-fact, so concretely and immediately connected to underlying reality, that they might let us gingerly sidestep all of these issues concerning motivated or sloppy analysis and interpretation.

“E pur si muove,” Galileo is said to have remarked, and one can almost hear in his sigh the hopes of a hundred science journalists for whom it would be all too convenient if Nature were always willing to tell us whose theory is more correct.

And yet the flight to physics rather gives the game away, since measured any way you like—volume of papers, number of working researchers, total amount of funding—deductive, theory-building physics in the mold of Newton and Lagrange, Maxwell and Einstein, is a tiny fraction of modern science as a whole. In fact, it also makes up a tiny fraction of modern physics. Far more common is the delicate and subtle art of scouring inconceivably vast volumes of noise with advanced software and mathematical tools in search of the faintest signal of some hypothesized but never before observed phenomenon, whether an astrophysical event or the decay of a subatomic particle.

This sort of work is difficult and beautiful in its own way, but it is not at all self-evident in the manner of a falling apple or an elliptical planetary orbit, and it is very sensitive to the same sorts of accidental contamination, deliberate fraud, and unconscious bias as the medical and social-scientific studies we have discussed. Two of the most vaunted physics results of the past few years—the announced discovery of both cosmic inflation and gravitational waves at the BICEP2 experiment in Antarctica, and the supposed discovery of superluminal neutrinos at the Swiss-Italian border—have now been retracted, with far less fanfare than when they were first published.

Many defenders of the scientific establishment will admit to this problem, then offer hymns to the self-correcting nature of the scientific method. Yes, the path is rocky, they say, but peer review, competition between researchers, and the comforting fact that there is an objective reality out there whose test every theory must withstand or fail, all conspire to mean that sloppiness, bad luck, and even fraud are exposed and swept away by the advances of the field.

So the dogma goes. But these claims are rarely treated like hypotheses to be tested. Partisans of the new scientism are fond of recounting the “Sokal hoax”—physicist Alan Sokal submitted a paper heavy on jargon but full of false and meaningless statements to the postmodern cultural studies journal Social Text, which accepted and published it without quibble—but are unlikely to mention a similar experiment conducted on reviewers of the prestigious British Medical Journal.

The experimenters deliberately modified a paper to include eight different major errors in study design, methodology, data analysis, and interpretation of results, and not a single one of the 221 reviewers who participated caught all of the errors. On average, they caught fewer than two—and, unbelievably, these results held up even in the subset of reviewers who had been specifically warned that they were participating in a study and that there might be something a little odd in the paper that they were reviewing. In all, only 30 percent of reviewers recommended that the intentionally flawed paper be rejected.

If peer review is good at anything, it appears to be keeping unpopular ideas from being published. Consider the finding of another (yes, another) of these replicability studies, this time from a group of cancer researchers. In addition to reaching the now unsurprising conclusion that only a dismal 11 percent of the preclinical cancer research they examined could be validated after the fact, the authors identified another horrifying pattern: The “bad” papers that failed to replicate were, on average, cited far more often than the papers that did! As the authors put it, “some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation, but did not actually seek to confirm or falsify its fundamental basis.”

What they do not mention is that once an entire field has been created—with careers, funding, appointments, and prestige all premised upon an experimental result which was utterly false due either to fraud or to plain bad luck—pointing this fact out is not likely to be very popular. Peer review switches from merely useless to actively harmful. It may be ineffective at keeping papers with analytic or methodological flaws from being published, but it can be deadly effective at suppressing criticism of a dominant research paradigm. Even if a critic is able to get his work published, pointing out that the house you’ve built together is situated over a chasm will not endear him to his colleagues or, more importantly, to his mentors and patrons.

Older scientists contribute to the propagation of scientific fields in ways that go beyond educating and mentoring a new generation. In many fields, it’s common for an established and respected researcher to serve as “senior author” on a bright young star’s first few publications, lending his prestige and credibility to the result, and signaling to reviewers that he stands behind it. In the natural sciences and medicine, senior scientists are frequently the controllers of laboratory resources—which these days include not just scientific instruments, but dedicated staffs of grant proposal writers and regulatory compliance experts—without which a young scientist has no hope of accomplishing significant research. Older scientists control access to scientific prestige by serving on the editorial boards of major journals and on university tenure-review committees. Finally, the government bodies that award the vast majority of scientific funding are either staffed or advised by distinguished practitioners in the field.

All of which makes it rather more bothersome that older scientists are the most likely to be invested in the regnant research paradigm, whatever it is, even if it’s based on an old experiment that has never successfully been replicated. The quantum physicist Max Planck famously quipped: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”

Planck may have been too optimistic. A recent paper from the National Bureau of Economic Research studied what happens to scientific subfields when star researchers die suddenly and at the peak of their abilities, and finds that while there is considerable evidence that young researchers are reluctant to challenge scientific superstars, a sudden and unexpected death does not significantly improve the situation, particularly when “key collaborators of the star are in a position to channel resources (such as editorial goodwill or funding) to insiders.”

In the idealized Popperian view of scientific progress, new theories are proposed to explain new evidence that contradicts the predictions of old theories. The heretical philosopher of science Paul Feyerabend, on the other hand, claimed that new theories frequently contradict the best available evidence—at least at first. Often, the old observations were inaccurate or irrelevant, and it was the invention of a new theory that stimulated experimentalists to go hunting for new observational techniques to test it.

But the success of this “unofficial” process depends on a blithe disregard for evidence while the vulnerable young theory weathers an initial storm of skepticism. Yet if Feyerabend is correct, and an unpopular new theory can ignore or reject experimental data long enough to get its footing, how much longer can an old and creaky theory, buttressed by the reputations and influence and political power of hundreds of established practitioners, continue to hang in the air even when the results upon which it is premised are exposed as false?

The hagiographies of science are full of paeans to the self-correcting, self-healing nature of the enterprise. But if raw results are so often false, the filtering mechanisms so ineffective, and the self-correcting mechanisms so compromised and slow, then science’s approach to truth may not even be monotonic. That is, past theories, now “refuted” by evidence and replaced with new approaches, may be closer to the truth than what we think now.

Such regress has happened before: In the nineteenth century, the (correct) vitamin C deficiency theory of scurvy was replaced by the false belief that scurvy was caused by proximity to spoiled foods. Many ancient astronomers believed the heliocentric model of the solar system before it was supplanted by the geocentric theory of Ptolemy. The Whiggish view of scientific history is so dominant today that this possibility is spoken of only in hushed whispers, but ours is a world in which things once known can be lost and buried.

And even if self-correction does occur and theories move strictly along a lifecycle from less to more accurate, what if the unremitting flood of new, mostly false, results pours in faster? Too fast for the sclerotic, compromised truth-discerning mechanisms of science to operate? The result could be a growing body of true theories completely overwhelmed by an ever-larger thicket of baseless theories, such that the proportion of true scientific beliefs shrinks even while the absolute number of them continues to rise. Borges’s Library of Babel contained every true book that could ever be written, but it was useless because it also contained every false book, and both true and false were lost within an ocean of nonsense.

Which brings us to the odd moment in which we live. At the same time as an ever more bloated scientific bureaucracy churns out masses of research results, the majority of which are likely outright false, scientists themselves are lauded as heroes and science is upheld as the only legitimate basis for policy-making. There’s reason to believe that these phenomena are linked. When a formerly ascetic discipline suddenly attains a measure of influence, it is bound to be flooded by opportunists and charlatans, whether it’s the National Academy of Science or the monastery of Cluny.

This comparison is not as outrageous as it seems: Like monasticism, science is an enterprise with a superhuman aim whose achievement is forever beyond the capacities of the flawed humans who aspire toward it. The best scientists know that they must practice a sort of mortification of the ego and cultivate a dispassion that allows them to report their findings, even when those findings might mean the dashing of hopes, the drying up of financial resources, and the loss of professional prestige.

It should be no surprise that even after outgrowing the monasteries, the practice of science has attracted souls driven to seek the truth regardless of personal cost and despite, for most of its history, a distinct lack of financial or status reward. Now, however, science and especially science bureaucracy is a career, and one amenable to social climbing. Careers attract careerists, in Feyerabend’s words: “devoid of ideas, full of fear, intent on producing some paltry result so that they can add to the flood of inane papers that now constitutes ‘scientific progress’ in many areas.”

If science was unprepared for the influx of careerists, it was even less prepared for the blossoming of the Cult of Science. The Cult is related to the phenomenon described as “scientism”; both have a tendency to treat the body of scientific knowledge as a holy book or an a-religious revelation that offers simple and decisive resolutions to deep questions.

But it adds to this a pinch of glib frivolity and a dash of unembarrassed ignorance. Its rhetorical tics include a forced enthusiasm (a search on Twitter for the hashtag “#sciencedancing” speaks volumes) and a penchant for profanity. Here in Silicon Valley, one can scarcely go a day without seeing a t-shirt reading “Science: It works, b—es!” The hero of the recent popular movie The Martian boasts that he will “science the sh— out of” a situation.

One of the largest groups on Facebook is titled “I f—ing love Science!” (a name which, combined with the group’s penchant for posting scarcely any actual scientific material but a lot of pictures of natural phenomena, has prompted more than one actual scientist of my acquaintance to mutter under her breath, “What you truly love is pictures”). Some of the Cult’s leaders like to play dress-up as scientists—Bill Nye and Neil deGrasse Tyson are two particularly prominent examples— but hardly any of them have contributed any research results of note. Rather, Cult leadership trends heavily in the direction of educators, popularizers, and journalists.

At its best, science is a human enterprise with a superhuman aim: the discovery of regularities in the order of nature, and the discerning of the consequences of those regularities. We’ve seen example after example of how the human element of this enterprise harms and damages its progress, through incompetence, fraud, selfishness, prejudice, or the simple combination of an honest oversight or slip with plain bad luck. These failings need not hobble the scientific enterprise broadly conceived, but only if scientists are hyper-aware of and endlessly vigilant about the errors of their colleagues … and of themselves. When cultural trends attempt to render science a sort of religion-less clericalism, scientists are apt to forget that they are made of the same crooked timber as the rest of humanity and will necessarily imperil the work that they do. The greatest friends of the Cult of Science are the worst enemies of science’s actual practice.

William A. Wilson is a software engineer in the San Francisco Bay Area.

Translation: 小聂 (@PuppetMaster)
Proofreading: 龙泉
Editor: 辉格 (@whigzhou)
