Is the FDA Too Conservative or Too Aggressive?
FDA,过于保守还是过于激进?

I have long argued that the FDA has an incentive to delay the introduction of new drugs because approving a bad drug (Type I error) has more severe consequences for the FDA than does failing to approve a good drug (Type II error). In the former case at least some victims are identifiable and the New York Times writes stories about them and how they died because the FDA failed. In the latter case, when the FDA fails to approve a good drug, people die but the bodies are buried in an invisible graveyard.

我一直认为,FDA有充分的动机来延迟新药审批,因为对于FDA来说,批准一种不合格的药(第一型错误)比拒绝一种合格的药(第二型错误)后果要严重得多。在前一种情况下,至少某些受害人的身份是可以被证实的,《纽约时报》会报道他们的故事,和他们是怎么因为FDA的失败而死亡的。而在后一种情况下,FDA没能批准合格的药品,虽然导致死亡,但是受害者藉藉无名,只会被人们忘记。

In an excellent new paper (SSRN also here) Vahid Montazerhodjat and Andrew Lo use a Bayesian analysis to model the optimal tradeoff in clinical trials between sample size, Type I and Type II error. Failing to approve a good drug is more costly, for example, the more severe the disease. Thus, for a very serious disease, we might be willing to accept a greater Type I error in return for a lower Type II error. The number of people with the disease also matters. Holding severity constant, for example, the more people with the disease the more you want to increase sample size to reduce Type I error. All of these variables interact.

在一篇杰出的新论文中,Vahid Montazerhodjat和Andrew Lo使用了贝叶斯分析来为临床试验中的病人数量、第一型错误和第二型错误之间的取舍进行了建模。举个例子来说,疾病越严重,不批准一种优秀药品所造成的损失就越大。因此,对于非常严重的疾病,人们可能会愿意接受更大的犯第一型错误的可能,来换取较小的犯第二型错误的几率。患有疾病的人群数量也很关键。比如说在建模中控制疾病的严重性为常量之后,患病的人群数量越大,就越需要在临床试验中招募更多的病人,以此来降低犯第一型错误的可能性。以上提到的这些变量都是互相影响的。

In an innovation the authors use the U.S. Burden of Disease Study to find the number of deaths and the disability severity caused by each major disease. Using this data they estimate the costs of failing to approve a good drug. Similarly, using data on the costs of adverse medical treatment they estimate the cost of approving a bad drug.

作者创新性的使用了美国疾病负担研究结果中所有主要疾病所导致的死亡和残疾的数据。他们利用这些数据估算未批准一种合格药品所造成的损失。同样,他们也使用了药物不良反应带来的成本,以此来估算批准一种不合格药品所造成的损失。

Putting all this together the authors find that the FDA is often dramatically too conservative:

综上所述,作者发现FDA在新药审批方面通常是极为保守的:

“…we show that the current standards of drug-approval are weighted more on avoiding a Type I error (approving ineffective therapies) rather than a Type II error (rejecting effective therapies). For example, the standard Type I error of 2.5% is too conservative for clinical trials of therapies for pancreatic cancer—a disease with a 5-year survival rate of 1% for stage IV patients (American Cancer Society estimate, last updated 3 February 2013). The BDA-optimal size for these clinical trials is 27.9%, reflecting the fact that, for these desperate patients, the cost of trying an ineffective drug is considerably less than the cost of not trying an effective one.”

“……我们的结果显示,现有的药品审批标准更偏向于避免第一型错误(批准无效的疗法)而不是避免第二型错误(拒绝有效的疗法)。譬如,对于胰腺癌——一种四期病人五年内存活率仅有1%的疾病(美国癌症协会预测,最后更新于2013年2月3日)——标准的2.5%第一型错误率实在是太过保守。这些临床试验经过贝叶斯决策分析优化过的容错标准为27.9%,这表明对于这些绝望的患者们,试用一种无效药物的成本大大低于不尝试一种有效药物的成本。”

(The authors also find that the FDA is occasionally a little too aggressive but these errors are much smaller, for example, the authors find that for prostate cancer therapies the optimal significance level is 1.2% compared to a standard rule of 2.5%.)

(作者还发现FDA偶尔也会过于激进,但是偏离的程度小得多。例如,前列腺癌治疗的最优显著率是1.2%,而不是标准的2.5%。)

The result is important especially because in a number of respects, Montazerhodjat and Lo underestimate the costs of FDA conservatism. Most importantly, the authors are optimizing at the clinical trial stage assuming that the supply of drugs available to be tested is fixed. Larger trials, however, are more expensive and the greater the expense of FDA trials the fewer new drugs will be developed. Thus, a conservative FDA reduces the flow of new drugs to be tested.

该结果十分重要,尤其因为在很多方面,Montazerhodjat和Lo低估了FDA坚持保守标准的成本。最关键的一点在于,作者们假定了待评估药物的供给是恒定的,并在此基础之上来优化临床试验阶段的容错标准。然而大型临床试验往往花费更高,这又导致新药研发的萎缩。因此,保守的FDA会降低新药研发的数量。

In a sense, failing to approve a good drug has two costs, the opportunity cost of lives that could have been saved and the cost of reducing the incentive to invest in R&D. In contrast, approving a bad drug while still an error at least has the advantage of helping to incentivize R&D (similarly, a subsidy to R&D incentivizes R&D in a sense mostly by covering the costs of failed ventures).

从某种意义上说,错误的拒绝一种好的药品有两种成本,一是没能拯救那些本来可以被拯救的病人的机会成本,二是减少了对新药研发做投资的激励所带来的成本。与之相对的是,批准一种不合格的药品,尽管仍旧是个错误,但是至少可以给新药研发带来正面的激励(类似的,对研发进行补贴的一个主要形式就是支付那些失败的研发项目经费,以此来激励更多的新药研发)。

The Montazerhodjat and Lo framework is also static, there is one test and then the story ends. In reality, drug approval has an interesting asymmetric dynamic. When a drug is approved for sale, testing doesn’t stop but moves into another stage, a combination of observational testing and sometimes more RCTs–this, after all, is how adverse events are discovered. Thus, Type I errors are corrected. On the other hand, for a drug that isn’t approved the story does end. With rare exceptions, Type II errors are never corrected.

而且,Montazerhodjat和Lo的分析框架是静态的,一个试验完了,故事就结束了。可实际上,药物审批流程有个独特的非对称机制。当药物被批准上市之后,测试并非就此结束,而是进入下一个阶段,往往由一系列观测性的测试,有时甚至是随机临床试验构成——毕竟,这是发现不良反应的方式。因此,第一型错误往往得到纠正。另一方面,对于一种不被批准的药物,故事到这里就结束了。第二型错误几乎没有被纠正过。

The Montazerhodjat and Lo framework could be interpreted as the reduced form of this dynamic process but it’s better to think about the dynamism explicitly because it suggests that approval can come in a range–for example, approval with a black label warning, approval with evidence grading and so forth. As these procedures tend to reduce the costs of Type I error they tend to increase the costs of FDA conservatism.

Montazerhodjat和Lo的框架可以被视为这个机制的一个简化版本,但最好还是能具体的思考一下这个机制,因为这暗示了对于新药的审批结果其实可以是一个范围——比如说,批准(但是带有一个黑色警示标签),或是带有证据强度分级的批准,等等。因为这些举措可以有效降低第一型错误的成本,它们倾向于使FDA在过于保守时受到惩罚。

Montazerhodjat and Lo also don’t examine the implications of heterogeneity of preferences or of disease morbidity and mortality. Some people, for example, are severely disabled by diseases that on average aren’t very severe–the optimal tradeoff for these patients will be different than for the average patient. One size doesn’t fit all. In the standard framework it’s tough luck for these patients.

Montazerhodjat 和Lo也并没有检验新药特征的不均一性所带来的影响,这些不均一性主要体现于病人对于治疗结果的偏好或是疾病的发病率和死亡率。例如,有些病人被那些平均而言并不太严重的疾病弄成了严重残疾,对这些病人来说,最优的取舍显然不同于一般的病人。同一个标准并不适用于所有的情况。所以在标准的优化框架里面,这些病人就被忽略了。

But if the non-FDA reviewing apparatus (patients/physicians/hospitals/HMOs/USP/Consumer Reports and so forth) works relatively well, and this is debatable but my work on off-label prescribing suggests that it does, this weighs heavily in favor of relatively large samples but low thresholds for approval. What the FDA is really providing is information and we don’t need product bans to convey information. Thus, heterogeneity plus a reasonable effective post-testing choice process, mediates in favor of a Consumer Reports model for the FDA.

而如果非FDA的评价机构(包括病人、医生、医院、卫生保健组织、美国药典、消费者报告,等等)相对来说起作用的话——这个观点虽然有待商榷,但我给病人开非处方药的经验表明它们是起作用的——这些评价机构就更适合于那些需要大量病人但是批准门槛较低的药。FDA真正提供的是信息,而我们没法从一刀切的禁令中获取有效信息。因此,不均一性,加上一个合理有效的试验后选择机制,间接的指向一个更好的消费者报告式的FDA模式。

The bottom line, however, is that even without taking into account these further points, Montazerhodjat and Lo find that the FDA is far too conservative especially for severe diseases. FDA regulations may appear to be creating safe and effective drugs but they are also creating a deadly caution.

就算不考虑以上这些引申观点,最起码,Montazerhodjat 和 Lo的研究表明,FDA在新药的审批上,尤其是针对特别严重的疾病时,显得过于保守了。FDA的监管或许给我们带来了安全和有效的药品,但是同时也带来了致命的谨慎。

翻译:小聂(@PuppetMaster)
校对:babyface_claire (@许你疯不许你傻)
编辑:辉格@whigzhou

相关文章

comments powered by Disqus