Saturday, February 15, 2014

Stopping rules and Bayesian analysis

I happened to receive two questions about stopping rules on the same day.
First, from Tom Cunningham:
I’ve been arguing with my colleagues about whether the stopping rule is relevant (a presenter disclosed that he went out to collect more data because the first experiment didn’t get significant results) — and I believe you have some qualifications to the Bayesian irrelevance argument but I don’t properly understand them.
Then, from Benjamin Kay:
I have a question that may be of interest for your blog. I was reading about the early history of AIDS and learned that the trial of AZT was ended early because it was so effective:
The trial, reported in the New England Journal of Medicine, had produced a dramatic result. Before the planned 24 week duration of the study, after a mean period of participation of about 120 days, nineteen participants receiving placebo had died while there was only a single death among those receiving AZT. This appeared to be a momentous breakthrough and accordingly there was no restraint at all in reporting the result; prominent researchers triumphantly proclaimed the drug to be “a ray of hope” and “a light at the end of the tunnel”. Because of this dramatic effect, the placebo arm of the study was discontinued and all participants offered 1500mg of AZT daily.
It is my understanding that this is reasonably common when they do drug studies on humans. If the treatment is much, much better than the control it is considered unethical to continue the planned study and they end it early.
I certainly understand the sentiment behind that. However, I know that it isn’t kosher to keep adding time or sample to an experiment until you find a result, and isn’t this a bit like that? Shouldn’t we expect regression to the mean and all that?
When two people come to me with a question, I get the impression it’s worth answering. So here goes:
First, we discuss stopping rules in section 6.3 (the example on pages 147-148), section 8.5, and exercise 8.15 of BDA3. The short answer is that the stopping rule enters Bayesian data analysis in two places: inference and model checking:
1. For inference, the key is that the stopping rule is ignorable only if time is included in the model. To put it another way, treatment effects (or whatever it is that you’re measuring) can vary over time, and your model should allow for that possibility if you’re using a data-dependent stopping rule. To put it yet another way, if you use a data-dependent stopping rule and don’t allow for possible time trends in your outcome, then your analysis will not be robust to failures of that assumption.
2. For model checking, the key is that if you’re comparing observed data to hypothetical replications under the model (for example, using a p-value), these hypothetical replications depend on the design of your data collection. If you use a data-dependent stopping rule, this should be included in your data model; otherwise your p-value isn’t what it claims to be.
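Point 2 is easy to see in simulation. Here is a small sketch (my illustration, not from the post, with made-up batch sizes and thresholds): under a true null effect, a fixed-n z-test rejects about 5% of the time, but a “peek after every batch and stop when significant” rule rejects far more often, because the hypothetical replications implied by the nominal p-value don’t match how the data were actually collected.

```python
# Sketch: a data-dependent stopping rule inflates the frequentist
# false-positive rate. The null is true (mean 0, sd 1), yet repeatedly
# peeking and stopping at |z| > 1.96 rejects much more than 5% of the time.
# Batch sizes and n_max are arbitrary choices for illustration.
import math
import random

random.seed(1)

def optional_stopping_rejects(n_start=10, n_max=200, batch=10, z_crit=1.96):
    """One simulated experiment: add `batch` observations at a time,
    declaring 'significance' as soon as the z statistic crosses z_crit."""
    data = [random.gauss(0.0, 1.0) for _ in range(n_start)]
    while True:
        n = len(data)
        z = (sum(data) / n) * math.sqrt(n)  # z-test with known sigma = 1
        if abs(z) > z_crit:
            return True   # "significant" -- even though the null is true
        if n >= n_max:
            return False  # gave up without a rejection
        data += [random.gauss(0.0, 1.0) for _ in range(batch)]

n_sims = 2000
rate = sum(optional_stopping_rejects() for _ in range(n_sims)) / n_sims
print(f"false-positive rate with optional stopping: {rate:.2f}")  # well above 0.05
```

The point is not that the Bayesian posterior is wrong here, but that the nominal 5% error rate belongs to a fixed-n design that was never run.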
Next, my response to Benjamin Kay’s question about AZT:
For the Bayesian analysis, it is actually kosher “to keep adding time or sample to an experiment until you find a result.” As noted above, you do lose some robustness but, hey, there are tradeoffs in life, and robustness isn’t the only important thing out there. Beyond that, I do think there should be ways to monitor treatments that have already been approved, so that if problems show up, somebody becomes aware of them as soon as possible.
P.S. I know that some people are bothered by the idea that you can keep adding time or sample to an experiment until you find a result. But, really, it doesn’t bother me one bit. Let me illustrate with a simple example. Suppose you’re studying some treatment that has a tiny effect, say 0.01 on some scale in which an effect of 1.0 would be large. And suppose there’s a lot of variability, so if you do a preregistered study you’re unlikely to get anything approaching certainty. But if you do a very careful study (so as to minimize variation) or a very large study (to get that magic 1/sqrt(n)), you’ll get a small enough confidence interval to have high certainty about the sign of the effect. So, by going from high sigma and low n to low sigma and high n, you’ve been “adding time or sample to an experiment” and you’ve “found a result.” See what I did there? OK, this particular plan (measure carefully and get a huge sample size) is chosen ahead of time; it doesn’t involve waiting until the confidence interval excludes zero. But so what? The point is that by manipulating my experimental conditions I can change the probability of getting a conclusive result. That doesn’t bother me. In any case, when it comes to decision making, I wouldn’t use “Does the 95% interval exclude zero?” as a decision rule. That’s not Bayesian at all.
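To put some rough numbers on the P.S. example (my arithmetic, with assumed values: effect 0.01, per-observation sd 1, and a conventional 80%-power target), the standard sample-size formula for a one-sample z test shows just how big “a very large study” has to be:

```python
# Sketch of the P.S. arithmetic (assumed numbers, not from the post):
# how large n must be before a tiny effect yields a confident sign estimate.
import math

effect = 0.01   # true effect, on a scale where 1.0 is large (assumed)
sigma = 1.0     # per-observation standard deviation (assumed)
z_alpha = 1.96  # two-sided 95% interval
z_beta = 0.84   # 80% power (conventional choice)

# n such that the estimate's standard error sigma/sqrt(n) is small enough:
n_needed = ((z_alpha + z_beta) * sigma / effect) ** 2
print(f"n ≈ {n_needed:,.0f}")  # roughly 78,400 observations
```

That 1/sqrt(n) shrinkage is the whole trick: nothing about the deterministic plan is suspect, yet it too was engineered to make a “result” likely.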
It seems to me that problems with data-based stopping and Bayesian analysis (other than the two issues I noted above) arise only because people are mixing Bayesian inference with non-Bayesian decision making. Which is fair enough—people apply these sorts of mixed methods all the time—but in that case I prefer to see the problem as arising from the non-Bayesian decision rule, not from the stopping rule or the Bayesian inference.
