… a study with a sample that is too small will be unable to detect clinically important effects. Such a study may thus be scientifically useless, and hence unethical in its use of subjects and other resources. He also stated that “Power of 80-90% is recommended”. Challenges to this idea asserted that it had “been rendered untenable by the rising acceptance of amalgamated evidence from many studies”, while also contradicting the “scientifically useless” claim above by asserting that “imprecise results are better than no results at all” (Edwards et al. 1997). An important caveat was that the results of small trials must be made available to future researchers. Halpern et al. (2003) sought to rebut these challenges, restating the original argument in somewhat more detail. They asserted that having too small a sample size “shifts the risk-benefit calculus that helps justify research in an unfavorable direction” and that “the marginal value of narrowing confidence intervals to widths still compatible with both positive and negative results generally is insufficient to justify exposing individuals to the common risks and burdens of research”. They concluded that in order for trials to be ethical one of two conditions must be met:
either enough patients will be enrolled to obtain at least 80% power to detect a clinically important effect or, if this is not possible, the researchers will be able to document a clear and practical plan to integrate the results of their trial with those of future trials. Bacchetti et al. (2005) performed a detailed analysis of the quantitative claim, made most explicitly by Halpern et al. (2003), that studies with too small a sample size do not have enough value to justify the burdens imposed on participants. They found that statistical power and other measures of a study’s projected value all exhibit diminishing marginal returns as a function of sample size and that increasing sample size therefore can only worsen the ratio of projected value to total participant burden, which increases linearly rather than in diminishing increments. They therefore asserted:
Even assuming the controversial premise that a study’s projected value is determined only by its power, with no value from estimates, confidence intervals, or potential meta-analyses, the balance between a study’s value and the burdens accepted by its participants does not improve as the sample size increases. Thus, the argument for ethical condemnation of small studies fails even on its own terms. Subsequent work made a detailed case for diminishing marginal returns for many other measures of projected study value that have been proposed in the statistical literature for use in sample size planning (Bacchetti et al. 2008) and provided a less technical explanation (Bacchetti 2010) of the threshold myth, arguing that it underlies the argument for ethical condemnation of studies that are “underpowered” and other misconceptions about sample size planning. In an article focused mainly on ethical issues in analysis rather than planning of studies, Gelfond et al. (2011) wrote, “Underpowered studies are not likely to yield results with practical translational value; they put subjects at unnecessary risk and waste resources.” They did not acknowledge any controversy or reference any previous work on ethics and sample size. Bacchetti et al. (2012) wrote a letter citing previous work and summarizing the argument that the value-to-risk ratio can only worsen as sample size increases. Gelfond et al. (2012) replied that “several other authors have significant critiques to their [Bacchetti et al. 2005] formulation of ethicality that go beyond mere ‘misconceptions.’” Below are examinations of those critiques. These are followed by examination of other arguments for why having “too small” a sample size does or does not make a study unethical. Following that is discussion of a proposal from Gelfond et al. (2012) to define the term “underpowered” in terms of optimality and efficiency rather than just sample size.
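The diminishing-marginal-returns claim is easy to check numerically. Below is a minimal sketch (not from the cited papers; the two-sample z-test, the effect size of 0.5, and alpha = 0.05 are illustrative assumptions) showing that power rises in ever-smaller increments as sample size grows, while power per participant only falls:

```python
# Illustrative power curve for a two-sided two-sample z-test.
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(n_per_group, effect_size=0.5):
    """Approximate power at alpha = 0.05; noncentrality is d * sqrt(n/2)."""
    z_crit = 1.959963984540054  # z for 0.975
    nc = effect_size * sqrt(n_per_group / 2.0)
    return norm_cdf(nc - z_crit) + norm_cdf(-nc - z_crit)

sizes = [25, 50, 100, 200, 400]
powers = [power(n) for n in sizes]
# "Value" per participant when power is taken as the study's projected value:
ratios = [p / (2 * n) for p, n in zip(powers, sizes)]

for n, p, r in zip(sizes, powers, ratios):
    print(f"n per group {n:4d}: power {p:.3f}, power per participant {r:.5f}")
```

Any of the concave value measures analyzed by Bacchetti et al. (2008) produces the same qualitative pattern: total value grows, but value divided by sample size declines, so the value-to-burden ratio cannot improve with added participants.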
the value to a participant from his or her altruistic contribution to a definitive study of an important clinical or public health question is relatively independent of the number of trial participants. More generally, as a function of sample size, one might expect the projected value per participant to start low since there is modest benefit from a trial (in isolation) that is insufficient to affect medical or public health practice, then to be relatively constant over a range of sample sizes that have potential clinical impact, and eventually to decline beyond sample sizes where the research question will have been reliably answered … .
This is not a valid criticism because ... | This is reasonable because ... |
---|---|
1. For exposition purposes, Bacchetti et al. (2005) made use of the mathematical equivalence that total study value exceeds total participant burden if and only if value per participant exceeds burden per participant. This follows very simply from dividing both sides of an inequality by the same positive number (the sample size), but it seemed to cause considerable confusion. The above passage seems to conflate each participant’s altruistic satisfaction with the study’s value per participant (total study value divided by sample size). In addition, participants’ altruistic satisfaction cannot be included in the projected study value used to justify the burden accepted by participants. There must be enough projected scientific or practical value to justify the planned burden; if this value is not sufficient, then any altruistic satisfaction will be produced under false pretences. These points were also explained in the rejoinder and again in later correspondence. 2. Bacchetti et al. (2008) subsequently analyzed many other ways of projecting a study’s expected value as a function of sample size, including both Bayesian and frequentist measures based on decision theory, estimation with squared error loss, interval estimation, and information theory. All have the concave shape that justifies the Bacchetti et al. (2005) reasoning, and none show the pattern that Prentice speculated “one might expect”. Careful, detailed analysis is more reliable than the qualitative speculation that Prentice provided. |
|
This is not a valid criticism because ... | This is reasonable because ... |
---|---|
1. The implications of the Bacchetti et al. analysis may indeed seem counterintuitive and unappealing to those who have long asserted the opposite, but the idea that high burden should reduce sample size is not counterintuitive to everyone. In animal research (where burden on subjects is often very high) the downward pressure on sample size from ethical considerations is well recognized. Or imagine planning a human study and discovering a series of increasingly severe possible harms to subjects—how would plans change? The limiting case (say, certain death of all participants) is a sample size of zero, not infinity or no change as the projected burden increases. The optimal sample size can only decrease when projected burden (or any uniform per-participant cost) increases; see Bacchetti et al. (2008), proposition 4. |
|
|
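The proposition cited above can be illustrated with a toy optimization. In this sketch (my construction, not from the paper), √n stands in for any concave, increasing projected-value curve, and the optimizer picks the sample size maximizing value net of a uniform per-participant cost; raising the cost can only push the optimum down:

```python
# Toy version of "optimal n falls as per-participant burden rises".
from math import sqrt

def projected_value(n):
    # Stand-in for any concave, increasing value-vs-sample-size curve.
    return sqrt(n)

def optimal_n(cost_per_participant, n_max=10_000):
    """Sample size maximizing projected value minus total participant cost."""
    return max(range(1, n_max + 1),
               key=lambda n: projected_value(n) - cost_per_participant * n)

print(optimal_n(0.01))  # low burden per participant
print(optimal_n(0.05))  # high burden per participant
```

For the √n curve the optimum is 1/(4c²), so the printed sample size drops from 2500 to 100 as the per-participant cost rises from 0.01 to 0.05; with certain death as the burden, no finite value curve could justify any participants at all.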
This is not a valid criticism because ... | This is reasonable because ... |
---|---|
Bacchetti et al. (2005) did not divide total study value by the number of afflicted patients, but rather by the number of study participants, utilizing the mathematical equivalence noted above. The response used Halpern et al.’s definition of value to illustrate the validity of the original reasoning. | |
because people commonly participate in research for altruistic reasons, and because additional participants increase the probability that a social benefit is obtained, each participant’s expected individual benefit increases with larger sample sizes. If a new treatment is proven effective, each participant’s altruistic motives are rewarded in full; if it is not, and the study was underpowered, then none are rewarded at all. On the other hand, if an adequately powered trial determines that a clinically important benefit is unlikely (recognizing the impossibility of ‘‘proving’’ the null hypothesis), then altruistic motives are still rewarded. Assuming, as Bacchetti et al. (1) do, that the average burden per participant is constant across all possible sample sizes results in an improved risk-benefit ratio for individual research participants as the sample size increases.
This is not a valid criticism because ... | This is reasonable because ... |
---|---|
1. The primary purpose of research is not to satisfy participants but to produce knowledge and contribute to societal benefit. Indeed, this is what motivates participants’ altruism. The rejoinder had already stated:
“In addition, there is no reason for research ethics committees to preempt participants’ own decisions about what gives them satisfaction. Our analysis shows that it is not irrational for participants to derive satisfaction from contributing to a small study, because their personal contribution to the value produced will on average be more than if they had constituted a smaller fraction of the total sample size in a comparable larger study. Disapproving a study because a committee thinks it is too small to reward participants’ altruistic motives would therefore fail to show proper respect for participant autonomy.” |
|
Heading for one side here ... | Heading for other side here ... |
---|---|
This is not a valid criticism because ... | This is reasonable because ... |
---|---|
The Bacchetti et al. argument requires only a minimal, and reasonable, assumption about what “underpowered” is intended to mean: it assumes that an “underpowered” study has a smaller sample size than an otherwise identical “adequately” powered one. It does not depend on any particular cutoff or that a cutoff be the same for all studies. The statement that “the ratio of study value to participant burden can only worsen as sample size increases” (Bacchetti et al. 2012) mirrors the continuous formulation in the original publication (Bacchetti et al. 2005) and does not assume any arbitrary cutoff. | |
This is not a valid criticism because ... | This is reasonable because ... |
---|---|
1. It is unclear what “by some measure” is intended to mean. Certainly the conventional definition of underpowered as having less than 80% power imposes a lower limit, but a consequence of the Bacchetti et al. (2005) argument is that no such limit is valid for branding a sample size as unethical.
|
Heading for one side here ... | Heading for other side here ... |
---|---|
power is determined by the complete study design that includes many factors other than sample size, and one could define underpowered designs as having less power than the optimal feasible design, where the optimal design is determined by some efficiency criterion. Given this definition of underpowered, we could revise our statement in the article (edits in italics) to ‘Underpowered studies are less likely to yield results with practical translational value; they may both put subjects at unnecessary risk and waste resources.’
This is not a reasonable defense of what they originally stated nor a reasonable proposal because ... | This is reasonable because ... |
---|---|
1. Gelfond et al. (2011) did not give any definition of “underpowered”, so a reader would naturally assume that it means having too small a sample size, because this is how the term is generally understood, it is what previous writings have meant (Edwards et al. 1997; Halpern et al. 2003; Bacchetti et al. 2005), and they used the term in a paragraph that was discussing “sample size estimation”. This seems like a post-hoc switch in definition. 2. This definition makes the revised statement boil down to saying that inefficient studies are inefficient. This is true, but it seems to shed little light on the original issues. 3. This proposal is inconsistent with their advocacy of power calculations for sample size planning. Because the impact of sample size on a study’s projected value is a concave function with no threshold, optimization cannot be performed without considering costs and other drawbacks of increasing sample size. Conventional power-based sample size calculations ignore those considerations and so cannot accomplish what Gelfond et al. seem to be advocating. 4. Optimality seems an unreasonable standard for what is ethical. Surely a study that is merely good rather than perfect is still ethically acceptable. There will almost always be some tweak to inclusion criteria, control conditions, etc. that could produce a slightly more efficient study. |
|
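Point 3 above can be made concrete with a small comparison. In the sketch below (my construction; the two-sample z-test power function, the 0.5 effect size, and the cost figures are all illustrative assumptions), a conventional power calculation returns one sample size no matter what burden each participant bears, while an optimization of projected value net of per-participant cost gives answers that move with the burden:

```python
# Conventional power-based n ignores cost; a net-value optimum does not.
from math import erf, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(n_per_group, effect_size=0.5):
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05."""
    z_crit = 1.959963984540054  # z for 0.975
    nc = effect_size * sqrt(n_per_group / 2.0)
    return norm_cdf(nc - z_crit) + norm_cdf(-nc - z_crit)

def conventional_n(target=0.80):
    """Smallest per-group n reaching the 80% power convention.
    Per-participant cost never enters this calculation."""
    n = 1
    while power(n) < target:
        n += 1
    return n

def cost_aware_n(cost_per_participant, n_max=2000):
    """Per-group n maximizing power (as projected value) net of total cost."""
    return max(range(1, n_max + 1),
               key=lambda n: power(n) - cost_per_participant * 2 * n)

print(conventional_n())     # same answer at every burden level
print(cost_aware_n(0.001))  # low-burden study: larger optimum
print(cost_aware_n(0.004))  # high-burden study: smaller optimum
```

Here the conventional answer is 63 per group regardless of burden, while the cost-aware optimum sits above it when burden is light and below it when burden is heavy; that is the sense in which power calculations alone cannot deliver the optimality that the proposed definition invokes.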