Testing EDITTABLE for Ethics Sample Size

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a valid criticism because ...	This is reasonable because ...

1. For exposition purposes, Bacchetti et al. (2005) made use of the mathematical equivalence that total study value exceeds total participant burden if and only if value per participant exceeds burden per participant. This follows very simply from dividing both sides of an inequality by the same positive number (the sample size), but it seemed to cause considerable confusion. The above passage seems to conflate each participant’s altruistic satisfaction with the study’s value per participant (total study value divided by sample size). In addition, participants’ altruistic satisfaction cannot be included in the projected study value used to justify the burden accepted by participants. There must be enough projected scientific or practical value to justify the planned burden; if this value is not sufficient, then any altruistic satisfaction will be produced under false pretenses. These points were also explained in the rejoinder and again in later correspondence.

2. Bacchetti et al. (2008) subsequently analyzed many other ways of projecting a study’s expected value as a function of sample size, including both Bayesian and frequentist measures based on decision theory, estimation with squared error loss, interval estimation, and information theory. All have the concave shape that justifies the Bacchetti et al. (2005) reasoning, and none shows the pattern that Prentice speculated “one might expect”. Careful, detailed analysis is more reliable than the qualitative speculation that Prentice provided.

COPY OF PREVIOUS PAGE

Members of the task force on Ethical Practice of Biostatistics in Clinical and Translational Research have differing opinions on how sample size influences whether a study is ethical. This page provides a place for reasoning to be presented and updated on this issue.

Anyone is welcome to make contributions supporting what they believe on these issues.

Background
Objections to the Bacchetti et al. argument
Other points about ethics and sample size
- New point template
Gelfond et al.'s proposed definition of “underpowered”
References

Background

The assertion that having too small a sample size makes a clinical trial unethical goes back at least 35 years (Newell, 1978). In a very influential paper, Altman (1980) wrote that

… a study with a sample that is too small will be unable to detect clinically important effects. Such a study may thus be scientifically useless, and hence unethical in its use of subjects and other resources.

He also stated that “Power of 80-90% is recommended”.

Challenges to this idea asserted that it had “been rendered untenable by the rising acceptance of amalgamated evidence from many studies”, while also contradicting the “scientifically useless” claim above by asserting that “imprecise results are better than no results at all” (Edwards et al. 1997). An important caveat was that the results of small trials must be made available to future researchers.

Halpern et al. (2002) sought to rebut these challenges, restating the original argument in somewhat more detail. They asserted that having too small a sample size “shifts the risk-benefit calculus that helps justify research in an unfavorable direction” and that “the marginal value of narrowing confidence intervals to widths still compatible with both positive and negative results generally is insufficient to justify exposing individuals to the common risks and burdens of research”. They concluded that in order for trials to be ethical one of two conditions must be met:

either enough patients will be enrolled to obtain at least 80% power to detect a clinically important effect or, if this is not possible, the researchers will be able to document a clear and practical plan to integrate the results of their trial with those of future trials.

Bacchetti et al. (2005) performed a detailed analysis of the quantitative claim, made most explicitly by Halpern et al. (2002), that studies with too small a sample size do not have enough value to justify the burdens imposed on participants. They found that statistical power and other measures of a study’s projected value all exhibit diminishing marginal returns as a function of sample size. Total participant burden, by contrast, increases linearly with sample size rather than in diminishing increments, so increasing sample size can only worsen the ratio of projected value to total participant burden. They therefore asserted:

Even assuming the controversial premise that a study’s projected value is determined only by its power, with no value from estimates, confidence intervals, or potential meta-analyses, the balance between a study’s value and the burdens accepted by its participants does not improve as the sample size increases. Thus, the argument for ethical condemnation of small studies fails even on its own terms.

Subsequent work made a detailed case for diminishing marginal returns for many other measures of projected study value that have been proposed in the statistical literature for use in sample size planning (Bacchetti et al. 2008) and provided a less technical explanation (Bacchetti 2010) of the threshold myth, arguing that it underlies the argument for ethical condemnation of studies that are “underpowered” and other misconceptions about sample size planning.
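As a rough numerical check of the diminishing-returns claim, the following Python sketch computes power per participant for a two-sided, two-sample z-test under the normal approximation. It adopts, purely for illustration, the premise that a study's projected value is determined only by its power. The standardized effect size of 0.3, the 5% significance level, the sample sizes, and the function names are all assumptions chosen for this sketch; none of them is taken from Bacchetti et al. (2005).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_sample_z(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the assumed true difference in means divided by the
    common standard deviation (an illustrative input, not a published value).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z-statistic when the assumed effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def power_per_participant(n_per_group, effect_size, alpha=0.05):
    """Power divided by the total number of participants (two equal groups)."""
    return power_two_sample_z(n_per_group, effect_size, alpha) / (2 * n_per_group)


if __name__ == "__main__":
    for n in [10, 20, 40, 80, 160, 320]:
        pw = power_two_sample_z(n, 0.3)
        ratio = power_per_participant(n, 0.3)
        print(f"n per group = {n:4d}  power = {pw:.3f}  power per participant = {ratio:.5f}")
```

Under these assumptions, power rises toward 1 as the sample size grows, while power per participant falls at every step. If burden per participant is constant, the ratio of this measure of value to total burden therefore worsens as the sample size increases.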

In an article focused mainly on ethical issues in the analysis rather than the planning of studies, Gelfond et al. (2011) wrote, “Underpowered studies are not likely to yield results with practical translational value; they put subjects at unnecessary risk and waste resources.” They did not acknowledge any controversy or reference any previous work on ethics and sample size. Bacchetti et al. (2012) wrote a letter citing previous work and summarizing the argument that the value-to-risk ratio can only worsen as sample size increases.

Gelfond et al. (2012) replied that “several other authors have significant critiques to their [Bacchetti et al. 2005] formulation of ethicality that go beyond mere ‘misconceptions.’” Below are examinations of those critiques. These are followed by examination of other arguments for why having “too small” a sample size does or does not make a study unethical. Following that is discussion of a proposal from Gelfond et al. (2012) to define the term “underpowered” in terms of optimality and efficiency rather than just sample size.

Objections to the Bacchetti et al. argument

Value per participant

In an invited commentary, Prentice (2005) stated his main objection as

the value to a participant from his or her altruistic contribution to a definitive study of an important clinical or public health question is relatively independent of the number of trial participants. More generally, as a function of sample size, one might expect the projected value per participant to start low since there is modest benefit from a trial (in isolation) that is insufficient to affect medical or public health practice, then to be relatively constant over a range of sample sizes that have potential clinical impact, and eventually to decline beyond sample sizes where the research question will have been reliably answered … .

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a valid criticism because ...	This is reasonable because ...

1. For exposition purposes, Bacchetti et al. (2005) made use of the mathematical equivalence that total study value exceeds total participant burden if and only if value per participant exceeds burden per participant. This follows very simply from dividing both sides of an inequality by the same positive number (the sample size), but it seemed to cause considerable confusion. The above passage seems to conflate each participant’s altruistic satisfaction with the study’s value per participant (total study value divided by sample size). In addition, participants’ altruistic satisfaction cannot be included in the projected study value used to justify the burden accepted by participants. There must be enough projected scientific or practical value to justify the planned burden; if this value is not sufficient, then any altruistic satisfaction will be produced under false pretenses. These points were also explained in the rejoinder and again in later correspondence.

2. Bacchetti et al. (2008) subsequently analyzed many other ways of projecting a study’s expected value as a function of sample size, including both Bayesian and frequentist measures based on decision theory, estimation with squared error loss, interval estimation, and information theory. All have the concave shape that justifies the Bacchetti et al. (2005) reasoning, and none shows the pattern that Prentice speculated “one might expect”. Careful, detailed analysis is more reliable than the qualitative speculation that Prentice provided.

Unappealing consequences

Bacchetti et al. (2005) provided a hypothetical example where burden and value were both assumed to be quantified on the same scale, and they deliberately assumed that the burden was very high so that a conventional choice producing 80% power would be unethical, while sample sizes producing 52% power or less (N<130 in the example) would have an acceptable ratio of value to participant burden. Prentice (2005) wrote of this example, “the authors’ arguments imply that only trials having power less than 52 percent are ethically defensible!” He went on to assert that this limit would even apply to a series of trials and that once 130 participants had collectively been studied, no further study would be ethical. He noted, “These types of implications seem quite unappealing and counterintuitive and cause one to question their societal value formulation.”

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a valid criticism because ...	This is reasonable because ...

1. The implications of the Bacchetti et al. analysis may indeed seem counterintuitive and unappealing to those who have long asserted the opposite, but the idea that high burden should reduce sample size does not seem impossible to everyone. In animal research (where burden on subjects is often very high) the downward pressure on sample size from ethical considerations is well recognized. Or imagine planning a human study and discovering a series of increasing possible harms to subjects. How would plans change? As the projected burden increases, the limiting case (say, certain death of all participants) is a sample size of zero, not infinity or no change. The optimal sample size can only decrease when projected burden (or any uniform per-participant cost) increases; see Bacchetti et al. (2008), proposition 4.

2. Finding the consequences unappealing is not in itself a counterargument, and starting with an assumption that an argument “must be” wrong does not promote clear thinking. At least four eminent statistical thinkers seemed to question the obvious mathematical validity of the equivalence noted above, three of them even after the rejoinder emphasized that “it is a simple mathematical fact that comparing study value with total participant burden is equivalent to comparing value per participant (as we defined it) with burden per participant.”

3. The Bacchetti et al. reasoning does not impose a cap at the outset on a series of trials, not even in the hypothetical case presented. The results of earlier trials can shift the projected value of future trials up, justifying further investigation.

4. The general result is that a smaller trial is ethically acceptable whenever an otherwise identical larger trial would be. The primary point is to identify what cannot be condemned as unethical, and identification of studies that are unethical due to excessive sample size has no corresponding simple, general guideline.
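The claim in point 1, that the optimal sample size can only decrease as burden per participant increases, can be illustrated with a small Python sketch. Everything numerical here is an assumption made for illustration: value is taken to be proportional to power minus the significance level (so that a study with no participants has zero value), power comes from a normal-approximation two-sample z-test with a standardized effect size of 0.3, and the value scale of 1000 burden units is arbitrary. The sketch does not reproduce the hypothetical example in Bacchetti et al. (2005) or the proof of proposition 4 in Bacchetti et al. (2008).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05          # assumed two-sided significance level
EFFECT_SIZE = 0.3     # assumed standardized difference between two arms
VALUE_SCALE = 1000.0  # assumed value of a certain success, in burden units


def power(total_n):
    """Approximate power of a two-sided z-test with total_n split into two equal arms."""
    z_crit = STD_NORMAL.inv_cdf(1 - ALPHA / 2)
    shift = EFFECT_SIZE * (total_n / 4) ** 0.5
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def projected_value(total_n):
    """Illustrative value measure: zero for an empty study, rising with power."""
    return VALUE_SCALE * (power(total_n) - ALPHA)


def best_sample_size(burden_per_participant, max_n=2000):
    """Total sample size in 0..max_n that maximizes value minus total burden."""
    return max(
        range(max_n + 1),
        key=lambda n: projected_value(n) - burden_per_participant * n,
    )


if __name__ == "__main__":
    for burden in [0.5, 1.0, 2.0, 4.0, 1000.0]:
        print(f"burden per participant = {burden:7.1f}  best total n = {best_sample_size(burden)}")
```

Under these assumptions the best sample size never rises as the burden per participant rises, and it reaches zero once the burden is high enough that no number of participants yields value exceeding total burden.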

They botched the arithmetic

Halpern et al. (2005) argued that Bacchetti et al. (2005) had calculated study benefit incorrectly, writing “The resultant net social benefit is thus the product of the value of the new treatment per patient and the number of those afflicted (including the study participants), rather than the quotient.”

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a valid criticism because ...	This is reasonable because ...
Bacchetti et al. (2005) did not divide total study value by the number of afflicted patients, but rather by the number of study participants, utilizing the mathematical equivalence noted above. The response used Halpern et al.’s definition of value to illustrate the validity of the original reasoning.
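The equivalence referred to here can be written out in one line. The symbols are notation introduced for this sketch rather than taken from the original papers: V(n) is the projected total value of a study with n participants, and b is the burden per participant, taken as constant.

```latex
% V(n): projected total study value at sample size n (assumed notation)
% b:    burden per participant, taken as constant (assumed notation)
V(n) > n\,b
\quad\Longleftrightarrow\quad
\frac{V(n)}{n} > b
\qquad \text{for any } n > 0 .
```

The divisor is the number of study participants n, not the number of patients afflicted with the condition, and dividing both sides by a positive number leaves the direction of the inequality unchanged.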

Altruistic satisfaction

Related to the quote from Prentice above, Halpern et al. (2005) wrote:

because people commonly participate in research for altruistic reasons, and because additional participants increase the probability that a social benefit is obtained, each participant’s expected individual benefit increases with larger sample sizes. If a new treatment is proven effective, each participant’s altruistic motives are rewarded in full; if it is not, and the study was underpowered, then none are rewarded at all. On the other hand, if an adequately powered trial determines that a clinically important benefit is unlikely (recognizing the impossibility of ‘‘proving’’ the null hypothesis), then altruistic motives are still rewarded. Assuming, as Bacchetti et al. (1) do, that the average burden per participant is constant across all possible sample sizes results in an improved risk-benefit ratio for individual research participants as the sample size increases.

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a valid criticism because ...	This is reasonable because ...

1. The primary purpose of research is not to satisfy participants but to produce knowledge and contribute to societal benefit. Indeed, this is what motivates participants’ altruism. The rejoinder had already stated:

“we believe that participants’ altruistic satisfaction cannot be included as part of the study’s value, as Prentice suggests. There must already be a net benefit to justify the participants’ altruism. Potential participants certainly weigh altruistic motives when deciding whether to volunteer, but this is distinct from the assessment of ethical acceptability at the planning and approval stages. Indeed, participants are entitled to assume that experts have already judged that the potential scientific or clinical benefits outweigh the burdens they are about to shoulder.”

The response to Halpern et al. tried to emphasize this point further:

“We do not believe that participants’ altruistic satisfaction is relevant for assessing sample size at the planning and approval stages. If a study fails on a criterion such as inequality 1 [Study value > total burden], no amount of participant satisfaction can make it acceptable; there is no real net benefit, and any satisfaction would be created under false pretenses.”

2. In addition, Halpern et al. (and Prentice) were just speculating about what participants value. The response went on to say:

“In addition, there is no reason for research ethics committees to preempt participants’ own decisions about what gives them satisfaction. Our analysis shows that it is not irrational for participants to derive satisfaction from contributing to a small study, because their personal contribution to the value produced will on average be more than if they had constituted a smaller fraction of the total sample size in a comparable larger study. Disapproving a study because a committee thinks it is too small to reward participants’ altruistic motives would therefore fail to show proper respect for participant autonomy.”

New point template

Overview text here

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

Heading for one side here ...	Heading for other side here ...

Required definition of "underpowered"

Gelfond et al. (2012) wrote that “their argument hinges on the definition of the term underpowered, which they have interpreted to mean that the power of a study is less than some arbitrary value such as 80%.”

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a valid criticism because ...	This is reasonable because ...

The Bacchetti et al. argument requires only a minimal and reasonable assumption about what “underpowered” is intended to mean: it assumes that an “underpowered” study has a smaller sample size than an otherwise identical “adequately” powered one. It does not depend on any particular cutoff, nor does it require that a cutoff be the same for all studies. The statement that “the ratio of study value to participant burden can only worsen as sample size increases” (Bacchetti et al. 2012) mirrors the continuous formulation in the original publication (Bacchetti et al. 2005) and does not assume any arbitrary cutoff.

There is a lower limit

Gelfond et al. (2012) wrote that “we disagree that there is no lower limit for sample size that would result in a study being ‘underpowered’ by some measure.”

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a valid criticism because ...	This is reasonable because ...

1. It is unclear what “by some measure” is intended to mean. Certainly the conventional definition of <80% imposes a lower limit, but a consequence of the Bacchetti et al. (2005) argument is that no such limit is valid for branding a sample size as unethical.

2. They did not seem to provide any reasoning explaining why they do not agree.


Other points about ethics and sample size

New point template

Overview text here

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

Heading for one side here ...	Heading for other side here ...

Gelfond et al.'s proposed definition of “underpowered”

Gelfond et al. (2012) objected to “straw men” raised by Bacchetti et al. (2012), writing that “the article does not specify their definition of underpowered.” They went on to write:

power is determined by the complete study design that includes many factors other than sample size, and one could define underpowered designs as having less power than the optimal feasible design, where the optimal design is determined by some efficiency criterion. Given this definition of underpowered, we could revise our statement in the article (edits in italics) to ‘Underpowered studies are less likely to yield results with practical translational value; they may both put subjects at unnecessary risk and waste resources.’

%EDITTABLE{format=" |textarea, 80x40 | textarea, 80x40| " quietsave="off" }%

This is not a reasonable defense of what they originally stated nor a reasonable proposal because ...	This is reasonable because ...

1. Gelfond et al. (2011) did not give any definition of “underpowered”, so a reader would naturally assume that it means having too small a sample size, because this is how the term is generally understood, it is what previous writings have meant (Edwards et al. 1997; Halpern et al. 2002; Bacchetti et al. 2005), and they used the term in a paragraph that was discussing “sample size estimation”. This seems like a post-hoc switch in definition.

2. This definition makes the revised statement boil down to saying that inefficient studies are inefficient. This is true, but it seems to shed little light on the original issues.

3. This proposal is inconsistent with their advocacy of power calculations for sample size planning. Because the impact of sample size on a study’s projected value is a concave function with no threshold, optimization cannot be performed without considering costs and other drawbacks of increasing sample size. Conventional power-based sample size calculations ignore those considerations and so cannot accomplish what Gelfond et al. seem to be advocating.

4. Optimality seems an unreasonable standard for what is ethical. Surely a study that is merely good rather than perfect is still ethically acceptable. There will almost always be some tweak to inclusion criteria, control conditions, etc. that could produce a slightly more efficient study.

References

Altman, D. G. (1980). Statistics and ethics in medical research 3: How large a sample? British Medical Journal 281, 1336-1338.

Bacchetti, P. (2010). Current sample size conventions: Flaws, harms, and alternatives. BMC Medicine 8, 17.

Bacchetti, P., McCulloch, C. E., and Segal, M. R. (2008). Simple, defensible sample sizes based on cost efficiency (with discussion and rejoinder). Biometrics 64, 577-594.

Bacchetti, P., McCulloch, C. E., and Segal, M. R. (2012). Being ‘underpowered’ does not make a study unethical. Statistics in Medicine 31, 4138-4139.

Bacchetti, P., Wolf, L. E., Segal, M. R., and McCulloch, C. E. (2005). Ethics and sample size. American Journal of Epidemiology 161, 105-110.

Edwards, S. J. L., Lilford, R. J., Braunholtz, D., and Jackson, J. (1997). Why “underpowered” trials are not necessarily unethical. Lancet 350, 804-807.

Gelfond, J. A. L., Heitman, E., Pollock, B. H., and Klugman, C. M. (2011). Principles for the ethical analysis of clinical and translational research. Statistics in Medicine 30, 2785-2792.

Gelfond, J. A. L., Heitman, E., Pollock, B. H., and Klugman, C. M. (2012). Power, ethics, and obligation (authors' reply). Statistics in Medicine 31, 4140-4141.

Halpern, S. D., Karlawish, J. H. T., and Berlin, J. A. (2002). The continuing unethical conduct of underpowered clinical trials. Journal of the American Medical Association 288, 358-362.

Halpern, S. D., Karlawish, J. H. T., and Berlin, J. A. (2005). Re: “Ethics and sample size”. American Journal of Epidemiology 162, 195-196.

Horrobin, D. F. (2003). Are large clinical trials in rapidly lethal diseases usually unethical? Lancet 361, 695-697.

Newell, D. J. (1978). Type II errors and ethics. British Medical Journal 2, 1789.

Prentice, R. L. (2005). Ethics and sample size—Another view. American Journal of Epidemiology 161, 111-112.