
Ethics and Sample Size

Contributors to CTSpedia have differing opinions on how sample size influences whether a study is ethical. This page provides a place for reasoning to be presented and updated on this issue.

Anyone is welcome to make contributions supporting what they believe on these issues. Click on the "Edit" button below a table to add to or modify text within an existing point, or see the instructions for how to add other material or new entries.

Background

The assertion that having too small a sample size makes a clinical trial unethical goes back at least several decades (Newell, 1978). In a very influential paper, Altman (1980) wrote that
… a study with a sample that is too small will be unable to detect clinically important effects. Such a study may thus be scientifically useless, and hence unethical in its use of subjects and other resources.
He also stated that “Power of 80-90% is recommended”.

Challenges to this idea asserted that it had “been rendered untenable by the rising acceptance of amalgamated evidence from many studies”, while also contradicting the “scientifically useless” claim above by asserting that “imprecise results are better than no results at all” (Edwards et al., 1997). An important caveat was that the results of small trials must be made available to future researchers.

Halpern et al. (2002) sought to rebut these challenges, restating the original argument in somewhat more detail. They asserted that having too small a sample size “shifts the risk-benefit calculus that helps justify research in an unfavorable direction” and that “the marginal value of narrowing confidence intervals to widths still compatible with both positive and negative results generally is insufficient to justify exposing individuals to the common risks and burdens of research”. They concluded that in order for trials to be ethical one of two conditions must be met:
either enough patients will be enrolled to obtain at least 80% power to detect a clinically important effect or, if this is not possible, the researchers will be able to document a clear and practical plan to integrate the results of their trial with those of future trials.
Bacchetti et al. (2005) performed a detailed analysis of the quantitative claim, made most explicitly by Halpern et al. (2002), that studies with too small a sample size do not have enough value to justify the burdens imposed on participants. They found that statistical power and other measures of a study’s projected value all exhibit diminishing marginal returns as a function of sample size and that increasing sample size therefore can only worsen the ratio of projected value to total participant burden, which increases linearly rather than in diminishing increments. They therefore asserted:
Even assuming the controversial premise that a study’s projected value is determined only by its power, with no value from estimates, confidence intervals, or potential meta-analyses, the balance between a study’s value and the burdens accepted by its participants does not improve as the sample size increases. Thus, the argument for ethical condemnation of small studies fails even on its own terms.
Subsequent work made a detailed case for diminishing marginal returns for many other measures of projected study value that have been proposed in the statistical literature for use in sample size planning (Bacchetti et al. 2008) and provided a less technical explanation (Bacchetti 2010) of the threshold myth, arguing that it underlies the argument for ethical condemnation of studies that are “underpowered” and other misconceptions about sample size planning.
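The diminishing-returns shape is easy to verify numerically. The sketch below is a hypothetical illustration (not a reproduction of any calculation in the cited papers), assuming a two-arm trial with a standardized effect size of 0.5, a two-sided alpha of 0.05, and a normal approximation to the two-sample test. Power climbs in ever-smaller increments, so power per participant falls steadily, while total participant burden grows in proportion to N.

```python
# Hypothetical illustration of diminishing marginal returns: power as a
# function of total sample size N for a two-arm trial with standardized
# effect size d = 0.5 and two-sided alpha = 0.05 (normal approximation).
# Total burden grows linearly in N, so the value/burden ratio tracks
# power per participant, which only declines as N grows.
from scipy.stats import norm

d, alpha = 0.5, 0.05
z_crit = norm.ppf(1 - alpha / 2)

def power(n_total):
    ncp = d * (n_total / 4) ** 0.5   # noncentrality with n_total/2 per arm
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

for n in (32, 64, 128, 256, 512):
    print(f"N={n:3d}  power={power(n):.3f}  power per participant={power(n) / n:.5f}")
```

Each doubling of N buys a smaller increment of power while doubling the total burden; the same qualitative picture holds for any concave value curve.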

In an article focused mainly on ethical issues in analysis rather than planning of studies, Gelfond et al. (2011) wrote, “Underpowered studies are not likely to yield results with practical translational value; they put subjects at unnecessary risk and waste resources.” They did not acknowledge any controversy or reference any previous work on ethics and sample size. Bacchetti et al. (2012) wrote a letter citing previous work and summarizing the argument that the value to risk ratio can only worsen as sample size increases.

Gelfond et al. (2012) replied that “several other authors have significant critiques to their [Bacchetti et al. 2005] formulation of ethicality that go beyond mere ‘misconceptions.’” Those critiques are examined below, followed by other arguments for why having “too small” a sample size does or does not make a study unethical, and then by discussion of a proposal from Gelfond et al. (2012) to define the term “underpowered” in terms of optimality and efficiency rather than sample size alone.

Objections to the Bacchetti et al. argument

Value per participant

In an invited commentary, Prentice (2005) stated his main objection as
the value to a participant from his or her altruistic contribution to a definitive study of an important clinical or public health question is relatively independent of the number of trial participants. More generally, as a function of sample size, one might expect the projected value per participant to start low since there is modest benefit from a trial (in isolation) that is insufficient to affect medical or public health practice, then to be relatively constant over a range of sample sizes that have potential clinical impact, and eventually to decline beyond sample sizes where the research question will have been reliably answered … .
This is not a valid criticism because:

1. For exposition purposes, Bacchetti et al. (2005) made use of the mathematical equivalence that total study value exceeds total participant burden if and only if value per participant exceeds burden per participant. This follows very simply from dividing both sides of an inequality by the same positive number (the sample size; the one-line algebra is displayed after this list), but it seemed to cause considerable confusion. The above passage seems to conflate each participant’s altruistic satisfaction with the study’s value per participant (total study value divided by sample size). In addition, participants’ altruistic satisfaction cannot be included in the projected study value used to justify the burden accepted by participants. There must be enough projected scientific or practical value to justify the planned burden; if this value is not sufficient, then any altruistic satisfaction will be produced under false pretenses. These points were also explained in the rejoinder and again in later correspondence.

2. Bacchetti et al. (2008) subsequently analyzed many other ways of projecting a study’s expected value as a function of sample size, including both Bayesian and frequentist measures based on decision theory, estimation with squared error loss, interval estimation, and information theory. All have the concave shape that justifies the Bacchetti et al. (2005) reasoning, and none show the pattern that Prentice speculated “one might expect”. Careful, detailed analysis is more reliable than the qualitative speculation that Prentice provided.
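The equivalence that caused this confusion is a single algebraic step. Writing V for total projected study value, b for burden per participant, and n for the sample size:

```latex
\[ V > n\,b \quad\Longleftrightarrow\quad \frac{V}{n} > b \qquad (n > 0) \]
```

Dividing both sides of an inequality by the positive number n preserves it; the per-participant form is the same comparison on a rescaled axis, and nothing in this step involves participants’ satisfaction.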

Unappealing consequences

Bacchetti et al. (2005) provided a hypothetical example where burden and value were assumed both to be quantified on the same scale, and they deliberately assumed that the burden was very high, so that a conventional choice producing 80% power would be unethical while sample sizes producing 52% power or less (N<130 in the example) would have an acceptable ratio of value to participant burden. Prentice (2005) wrote of this example, “the authors’ arguments imply that only trials having power less than 52 percent are ethically defensible!” He went on to assert that this limit would even apply to a series of trials and that once 130 participants had collectively been studied, no further study would be ethical. He noted, “These types of implications seem quite unappealing and counterintuitive and cause one to question their societal value formulation.”
This is not a valid criticism because:
1. The implications of the Bacchetti et al. analysis may indeed seem counterintuitive and unappealing to those who have long asserted the opposite, but the idea that high burden should reduce sample size does not seem impossible to everyone. In animal research (where the burden on subjects is often very high), the downward pressure on sample size from ethical considerations is well recognized. Or imagine planning a human study and discovering a series of increasingly severe possible harms to subjects: how would plans change? The limiting case (say, certain death of all participants) is a sample size of zero, not infinity or no change as the projected burden increases. The optimal sample size can only decrease when the projected burden (or any uniform per-participant cost) increases; see Bacchetti et al. (2008), proposition 4, and the sketch following this list.

2. Finding the consequences unappealing is not in itself a counterargument, and starting with an assumption that an argument “must be” wrong does not promote clear thinking. At least four eminent statistical thinkers seemed to question the obvious mathematical validity of the equivalence noted above, three of them even after the rejoinder emphasized that “it is a simple mathematical fact that comparing study value with total participant burden is equivalent to comparing value per participant (as we defined it) with burden per participant.”

3. The Bacchetti et al. reasoning does not impose a cap at the outset on a series of trials, not even in the hypothetical case presented. The results of earlier trials can shift the projected value of future trials up, justifying further investigation.

4. The general result is that a smaller trial is ethically acceptable whenever an otherwise identical larger trial would be. The primary point is to identify what cannot be condemned as unethical, and identification of studies that are unethical due to excessive sample size has no corresponding simple, general guideline.
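The direction claimed in point 1 can be sketched numerically. This is a hypothetical toy model (not the cited proposition itself): projected value is taken to be the assumed power curve from the earlier sketch, placed on an arbitrary scale shared with burden, and the net benefit value(N) - b*N is maximized over N as the per-subject burden b grows.

```python
# Hypothetical sketch of the direction of proposition 4 in Bacchetti et
# al. (2008): with a concave projected-value curve (here, the assumed
# power function from the earlier sketch), the sample size maximizing
# net benefit value(N) - b*N can only decrease as the per-subject
# burden b increases. All numbers are illustrative.
from scipy.stats import norm

z_crit = norm.ppf(0.975)

def value(n_total, d=0.5):
    # Approximate power; concave in n_total over the grid used below.
    return norm.cdf(d * (n_total / 4) ** 0.5 - z_crit)

def optimal_n(b, grid=range(2, 2001, 2)):
    return max(grid, key=lambda n: value(n) - b * n)

for b in (0.0005, 0.001, 0.002, 0.004, 0.008):
    print(f"per-subject burden b={b:.4f}  optimal N={optimal_n(b)}")
```

As b rises, the optimal N shrinks, and in the limiting case of overwhelming burden it collapses toward zero, matching the intuition above.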

They botched the arithmetic

Halpern et al. (2005) argued that Bacchetti et al. (2005) had calculated study benefit incorrectly, writing “The resultant net social benefit is thus the product of the value of the new treatment per patient and the number of those afflicted (including the study participants), rather than the quotient.”

This is not a valid criticism because:
Bacchetti et al. (2005) did not divide total study value by the number of afflicted patients, but rather by the number of study participants, utilizing the mathematical equivalence noted above. The response used Halpern et al.’s definition of value to illustrate the validity of the original reasoning.

Altruistic satisfaction and participant expectations

Related to the quote from Prentice above, Halpern et al. (2005) wrote:
because people commonly participate in research for altruistic reasons, and because additional participants increase the probability that a social benefit is obtained, each participant’s expected individual benefit increases with larger sample sizes. If a new treatment is proven effective, each participant’s altruistic motives are rewarded in full; if it is not, and the study was underpowered, then none are rewarded at all. On the other hand, if an adequately powered trial determines that a clinically important benefit is unlikely (recognizing the impossibility of ‘‘proving’’ the null hypothesis), then altruistic motives are still rewarded. Assuming, as Bacchetti et al. (1) do, that the average burden per participant is constant across all possible sample sizes results in an improved risk-benefit ratio for individual research participants as the sample size increases.
This is not a valid criticism because:

1. The primary purpose of research is not to satisfy participants but to produce knowledge and contribute to societal benefit. Indeed, this is what motivates participants’ altruism. The rejoinder had already stated:
“we believe that participants’ altruistic satisfaction cannot be included as part of the study’s value, as Prentice suggests. There must already be a net benefit to justify the participants’ altruism. Potential participants certainly weigh altruistic motives when deciding whether to volunteer, but this is distinct from the assessment of ethical acceptability at the planning and approval stages. Indeed, participants are entitled to assume that experts have already judged that the potential scientific or clinical benefits outweigh the burdens they are about to shoulder.”


The response to Halpern et al. tried to emphasize this point further:
“We do not believe that participants’ altruistic satisfaction is relevant for assessing sample size at the planning and approval stages. If a study fails on a criterion such as inequality 1 [Study value > total burden], no amount of participant satisfaction can make it acceptable; there is no real net benefit, and any satisfaction would be created under false pretenses.”

2. In addition, Halpern et al. (and Prentice) were just speculating about what participants value.
The response went on to say:

“In addition, there is no reason for research ethics committees to preempt participants’ own decisions about what gives them satisfaction. Our analysis shows that it is not irrational for participants to derive satisfaction from contributing to a small study, because their personal contribution to the value produced will on average be more than if they had constituted a smaller fraction of the total sample size in a comparable larger study. Disapproving a study because a committee thinks it is too small to reward participants’ altruistic motives would therefore fail to show proper respect for participant autonomy.”

In fact, Horrobin (2003) argued that patients with rapidly fatal diseases would generally prefer contributing to a study that has a small chance of finding a large benefit over one that has a large chance of finding a small benefit (i.e., the typical large clinical trial). People are very diverse in how they value different risk-reward trade-offs and in how they feel about being a small part of a large effort versus a relatively larger part of a smaller effort. Presuming how all participants must feel does not result in valid ethical guidelines.

Required definition of "underpowered"

Gelfond et al. (2012) wrote that “their argument hinges on the definition of the term underpowered, which they have interpreted to mean that the power of a study is less than some arbitrary value such as 80%.”

This is not a valid criticism because:
The Bacchetti et al. argument requires only a minimal, and reasonable, assumption about what “underpowered” is intended to mean: that an “underpowered” study has a smaller sample size than an otherwise identical “adequately powered” one. It does not depend on any particular cutoff, or on the cutoff being the same for all studies. The statement that “the ratio of study value to participant burden can only worsen as sample size increases” (Bacchetti et al. 2012) mirrors the continuous formulation in the original publication (Bacchetti et al. 2005) and does not assume any arbitrary cutoff.

There is a lower limit

Gelfond et al. (2012) wrote that “we disagree that there is no lower limit for sample size that would result in a study being ‘underpowered’ by some measure.”

This is not a valid objection because:
1. It is unclear what “by some measure” is intended to mean. Certainly the conventional definition of <80% power imposes a lower limit, but a consequence of the Bacchetti et al. (2005) argument is that no such limit is valid for branding a sample size as unethical.

2. They did not appear to provide any reasoning to explain their disagreement.

Desire for definitive results

Gelfond et al. (2012) quoted a phrase from one of the discussions of Bacchetti et al. (2008), noting that for phase III trials “there is usually strong interest in obtaining a definitive result.” They did not indicate why they considered this important or how it challenged the logic or premises of the reasoning in Bacchetti et al. (2005). The original discussion piece (Simon, 2008) also did not seem to provide any detail about why he considered this relevant.

This is not a valid challenge because:
1. Desires do nothing to change the quantitative realities used in the Bacchetti et al. (2005) reasoning.

2. If this is referring to participants’ desires to contribute to a definitive study, the points above about altruistic satisfaction are relevant. In addition, it is always unethical to promise participants a definitive result. Any study can produce an estimated effect in the gray zone between what is large enough to be important and what is small enough to ignore. High power does not preclude this, and increasing power may not make it any less likely (e.g., if the true state of nature is in the gray zone).

Other reasoning concerning ethics and sample size

Associated characteristics

Many critics of studies with small sample size have noted undesirable characteristics that are more common in small studies than in large ones. These may include poor quality (including susceptibility to bias), selective publication of results, and misinterpretation of negative results.

Button et al. (2013b) noted that there are “biases that have been empirically documented to be far more prevalent in very small studies than in larger studies”, and they went on to state
We agree that it would be wonderful if small studies and their research environment were devoid of biases and if all small studies on a particular question of interest could be perfectly integrated. However, this has not happened…
A major concern is the possibility of failing to publish (or otherwise disseminate) negative or inconclusive results. This may occur more commonly with small studies, leaving a selected, biased set of results available to researchers and distorting the scientific literature. Halpern et al. (2002) emphasized this problem in their discussion of meta-analysis, and it was also highlighted in the explanation and elaboration document accompanying the 2010 update of the CONSORT guidelines for reporting clinical trials (Moher et al., 2010):
many medical researchers worry that underpowered trials with indeterminate results will remain unpublished and insist that all trials should individually have “sufficient power.”
In addition, interpretation of results with P>0.05 as “negative” may be especially misleading when sample size is small and power is correspondingly low.
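A small hypothetical calculation (the same assumed scenario as the sketches above: standardized effect 0.5, two-sided alpha 0.05) shows how easily a real effect yields a “negative” result at small N:

```python
# Hypothetical illustration: probability of a nonsignificant result
# (P > 0.05, two-sided) despite a true standardized effect of d = 0.5,
# under the same normal approximation as the earlier sketches. Reading
# P > 0.05 as "no effect" is most misleading exactly where power is low.
from scipy.stats import norm

z_crit = norm.ppf(0.975)
for n in (16, 32, 64, 128):
    ncp = 0.5 * (n / 4) ** 0.5
    p_nonsig = norm.cdf(z_crit - ncp) - norm.cdf(-z_crit - ncp)
    print(f"N={n:3d}  P(nonsignificant | real effect) = {p_nonsig:.2f}")
```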

Associated characteristics do not make small studies unethical, because:
1. A small study can have a sound design that minimizes bias, be prospectively registered, and be conducted by researchers with a flawless track record of both conscientious publication regardless of results and of careful analysis and fair interpretation of their studies’ data. Condemning it as unethical because it “looks like” other studies that have lacked these characteristics would be presuming guilt by association and would be a form of collective punishment. Using such reasoning is particularly ironic because it is itself recognized as unethical in other contexts, and collective punishment can even be considered a war crime.

2. The unfairness of condemnation based on associated characteristics can be starkly shown by applying similar reasoning in a different context. We can parallel the above quotes from Button et al. (2013b) as, “commission of violent crimes has been empirically documented to be far more prevalent in men than in women,” and “We agree that it would be wonderful if men never committed violent crimes. However, this has not happened.” We could parallel the reasoning noted above by Moher et al. (2010) as, “many people worry that men will engage in violent behavior and insist that only women should be allowed on city streets after dark.”

3. Dealing indirectly with important problems like bias, lack of dissemination of results, and misinterpretation has been and will continue to be ineffective as well as unfair. Increasing sample size does not in itself solve any of these problems, and entangling them with sample size sows confusion and distracts from the real issues and more effective direct solutions. Notably, the myth of “adequate” sample size actually encourages misinterpretation of P>0.05 as proving no effect.

Underpowered studies are inefficient

Button et al. (2013a) asserted that studies with low power are unethical because they are inefficient and wasteful. They focused on efficiency defined in terms of animals sacrificed without resulting in “detection” of an effect (meaning finding P<0.05), writing:
We argue that it is important to appreciate the waste associated with an underpowered study -- even a study that achieves only 80% power still presents a 20% possibility that the animals have been sacrificed without the study detecting the underlying true effect.

Low power therefore has an ethical dimension -- unreliable research is inefficient and wasteful. This applies to both human and animal research.
Although this seems closely related to the early quantitative claims about studies with smaller sample sizes not producing enough value to justify participant burden, they did not cite Bacchetti et al. (2005) or address the reasoning that it provided.

This is not a valid argument because:
1. When efficiency is defined by projected value relative to the number of subjects, the detailed quantitative analysis by Bacchetti et al. (2005, 2008) showed that increasing sample size can only reduce efficiency. This holds with projected value assumed to be proportional to power, as Button et al. apparently intend.

2. Although Button et al. (2013a) noted the risks of both “underpowered” and excessive sample size, they did not carefully analyze how the two combine. The fact of decreasing power per subject (Bacchetti et al., 2005: Figure 2 and Appendix) implies that the expected number of animals that might be defined as “wasted” by a study increases as sample size increases. A smaller study therefore is not more wasteful than an otherwise identical larger one, even if we use a definition of waste based strictly on statistical hypothesis testing with alpha=0.05. A mathematical proof of this is given here; a numerical sketch of the underlying per-subject logic follows this list.
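The linked proof is not reproduced here, but the per-subject logic can be sketched numerically under the same hypothetical assumptions as the earlier sketches. If yield is defined strictly as the probability of reaching P < 0.05 given a real effect, then because power per animal falls as N grows, the animals expended per unit of yield, N/power(N), can only rise:

```python
# Hypothetical sketch (not the cited proof): power per animal falls as
# N grows, so the expected number of animals expended per unit of power
# achieved, N / power(N), rises with N. Assumed scenario as before:
# d = 0.5, two-sided alpha = 0.05, normal approximation (the negligible
# opposite-tail term is ignored).
from scipy.stats import norm

z_crit = norm.ppf(0.975)
for n in (32, 64, 128, 256, 512):
    power = norm.cdf(0.5 * (n / 4) ** 0.5 - z_crit)
    print(f"N={n:3d}  power={power:.3f}  animals per unit of power={n / power:.0f}")
```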

Studies should "answer the question"

Clark et al. (2013) acknowledged challenges, citing Bacchetti et al. (2005), but they nevertheless briefly justified consideration of sample size calculations by ethics committees as follows:
The view that underpowered studies are in themselves unethical has been challenged by some researchers, who argue that this is too simplistic. We believe that a study must be judged on whether it is appropriately designed to answer the research question posed, and the validity of the sample size calculation is germane to this assessment.
This is not valid because:
1. The second sentence is simply a brief restatement of the “too simplistic” argument. The idea that a sample size is or is not appropriate to “answer the research question” is a manifestation of the threshold myth. In reality, the amount of information that a study can be expected to provide about a given question increases gradually and with diminishing marginal returns as the planned sample size increases; there is no particular point at which it becomes “appropriate” or “enough” to answer the question (see the note following this list).

2. Even a study with an excessive sample size is not guaranteed to “answer the question”. As noted by Bacchetti (2010), “even huge studies can produce results near the boundary of what is large enough to be important.” Any ethical standard along the lines Clark et al. advocate would therefore not only be arbitrary, but also inadequate to meet their criterion.

3. In addition, requiring sample size calculations for ethical review actually contributes to erosion of scientific integrity. This is because researchers always consider cost and feasibility in choosing a sample size (it would be irrational not to), but the myth of calculating a correct sample size to “answer the question” prohibits them from disclosing the influence of cost and feasibility on their choice. As Bacchetti (2010) wrote, “Forcing investigators to hide the real reason for choosing a sample size sends a bad message about integrity, right at the beginning of the research process.”
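A note on point 1: the gradual, thresholdless accrual of information is concrete even in the simplest setting. For estimating a mean with known standard deviation, a two-sided confidence interval has half-width

```latex
\[ z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
```

so precision improves smoothly with n (halving the interval requires quadrupling n), and no particular n marks the point at which the question becomes “answered”.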

Nuremberg Code

Gelfond et al. (2012) wrote:
The need for power calculations is seen in point #2 of the Nuremberg Code: ‘The experiment should be such as to yield fruitful results for the good of society’.
And in their reply (Gelfond et al., 2013) to Bacchetti et al. (2013), they wrote:
The Nuremberg Code’s point 2 (the experiment should give meaningful results) and point 6 (the humanitarian importance of the problem must justify the participant’s risk) [1] offer strong support of this connection between power and ethics. The underlying principle is that for any given outlay of human risk or resources, there is an obligation to maximize the power and efficiency of the experimental design.
This is not valid reasoning because:
1. There is a huge leap from point #2 of the Nuremberg Code to a “need for power calculations”, which was certainly not originally intended. The assumption that lack of a power calculation would render a study in violation of this point presumes that a frequentist statistical hypothesis testing approach should be used for planning every experiment. This is a narrow, controversial view that is not a valid basis for an ethical standard. In addition, it is unrealistic to presume that a power calculation will ensure fruitful results or will optimize efficiency.

2. As pointed out below, advocacy of conventional power-based sample size planning works against efficiency, because efficiency is by definition a ratio of yield to inputs (e.g., risks, costs). Power calculations do not consider inputs, and their utility for projecting yield is also doubtful. Reality may differ from both the null and the alternative hypothesis used in power calculations, auxiliary information needed for the calculations may be unknown or imperfectly known resulting in wide uncertainty, and the value of the actual results may depend on more than just whether or not P<0.05.

Contribution to Meta-Analyses

Responding to the arguments from Edwards et al. (1997) noted above, Halpern et al. (2002) questioned the potential utility of contributing to meta-analyses, stating that “difficulties in synthesizing the results may prevent the calculation of valid treatment effects”, and that “For meta-analyses to be useful, however, comparable research methods must have been used among the primary trials”. They went on to focus on the greater potential that negative studies will remain unpublished if they were “underpowered”, noting: “Thus, the ideal conditions for combining evidence may be particularly unlikely when the component trials are underpowered”. Finally they asserted:
Only if widely accessible registries of RCTs are expanded to include privately sponsored trials could the potential for publication bias in retrospective meta-analyses be eliminated.
Gelfond et al. (2012) also summarized concerns, stating, “This meta-analysis strategy unduly depends on future studies actually being performed, ignores complications of cross-study heterogeneity that could hinder combinability of these studies, and is seemingly not time or cost efficient for testing a particular hypothesis.”

These issues do not permit ethical condemnation of “underpowered” studies because:
1. The reasoning from Bacchetti et al. (2005, 2008) applies to an individual study, without relying on any future meta-analysis. As they stated (2005), “Even assuming the controversial premise that a study’s projected value is determined only by its power, with no value from estimates, confidence intervals, or potential meta-analyses, the balance between a study’s value and the burdens accepted by its participants does not improve as the sample size increases. Thus, the argument for ethical condemnation of small studies fails even on its own terms.”

2. Since long before the advent of formal meta-analysis, much scientific progress has occurred via accumulation of knowledge from different studies. The idea that a study must be definitive by itself in order to produce “adequate” value would therefore condemn most scientific research. As noted above (item 2 under “Desire for definitive results”), even studies with supposedly “adequate” power are not guaranteed to be definitive. A beta of 0.2 is not zero (and neither is an alpha of 0.05). In addition, any size study can produce an estimated effect in the gray zone between what is clearly large enough to be important and what is clearly too small to be important. Guiding and combining with future studies therefore constitute important parts of the potential value of most studies.

3. Concerns about heterogeneity apply to any size study and seem to impugn all meta-analysis, a stance that is too controversial to be a valid basis for condemning small studies.

4. The risk that a study will remain unpublished and therefore contribute to publication bias is an associated characteristic, which is not a legitimate basis for ethical condemnation. In addition, the last quote from Halpern et al. (2002) given above is clearly incorrect. Analyses that include all prospectively registered trials on a topic, and only those, will not have any selection bias, because they have not been selected based on their results. It does not matter if there are other trials that were not prospectively registered.

5. Gelfond et al. (2012) did not provide any support for their assertion of seeming inefficiency, and the alternative of doing a single trial with the same number of participants as a collection of smaller trials may often not be possible, even if it would be more efficient. Bacchetti et al. (2008) examined efficiency in detail, showing that conventional assessments involving statistical power correspond very poorly to rigorous assessments of efficiency. Additional reasoning regarding efficiency is provided elsewhere on this page: here, here, and here.

Gelfond et al.'s proposed definition of “underpowered”

Gelfond et al. (2012) objected to “straw men” raised by Bacchetti et al. (2012), writing that “the article does not specify their definition of underpowered.” They went on to write:
power is determined by the complete study design that includes many factors other than sample size, and one could define underpowered designs as having less power than the optimal feasible design, where the optimal design is determined by some efficiency criterion. Given this definition of underpowered, we could revise our statement in the article (edits in italics) to ‘Underpowered studies are less likely to yield results with practical translational value; they may both put subjects at unnecessary risk and waste resources.’
This is not a valid criticism because:

1. Gelfond et al. (2011) did not give any definition of “underpowered”, so a reader would naturally assume that it means having too small a sample size: this is how the term is generally understood, it is what previous writings have meant (Edwards et al. 1997; Halpern et al. 2002; Bacchetti et al. 2005), and they used the term in a paragraph that was discussing “sample size estimation”. This seems like a post-hoc switch in definition.

2. This definition makes the revised statement boil down to saying that inefficient studies are inefficient. This is true, but it seems to shed little light on the original issues.

3. This proposal is inconsistent with their advocacy of power calculations for sample size planning. Because the impact of sample size on a study’s projected value is a concave function with no threshold, optimization cannot be performed without considering costs and other drawbacks of increasing sample size. Conventional power-based sample size calculations ignore those considerations and so cannot accomplish what Gelfond et al. seem to be advocating.

4. Optimality seems an unreasonable standard for what is ethical. Surely a study that is merely good rather than perfect is still ethically acceptable. There will almost always be some tweak to inclusion criteria, control conditions, etc. that could produce a slightly more efficient study.

Summary

This section is provided for contributors to make a case for a bottom line conclusion based on all the above material.

Ethical condemnation of studies for having a "too small" sample size is unwarranted because:
1. The reasoning of Bacchetti et al. (2005) has not been cogently challenged. As shown above, the counterarguments proposed all have fundamental flaws. Notably, the key claim from Gelfond et al. that the reasoning depends on a narrow definition of “underpowered” is, ironically, a clear straw man.

2. Other proposed reasons for ethical condemnation also have clear flaws. They have often been proposed without examining the existing literature on ethics and sample size, and, as detailed above, provide shallow reasoning that is easily refuted.

3. Accusing other researchers of being unethical should require very strong evidence and very little possibility of being wrong. Making such accusations without careful consideration of counterarguments seems itself to be ethically questionable.

References

Altman, D. G. (1980). Statistics and ethics in medical research 3: How large a sample? British Medical Journal 281, 1336-1338. Available here.

Bacchetti, P. (2010). Current sample size conventions: Flaws, harms, and alternatives. BMC Medicine 8, 17. Available here. Ctspedia version here.

Bacchetti, P. (2013). Small sample size is not the real problem. Nature Reviews Neuroscience 14, 585. Available here (proprietary).

Bacchetti, P., McCulloch, C. E., and Segal, M. R. (2008). Simple, defensible sample sizes based on cost efficiency (with discussion and rejoinder). Biometrics 64, 577-594. Available here.

Bacchetti, P., McCulloch, C., and Segal, M. R. (2012). Being ‘underpowered’ does not make a study unethical. Statistics in Medicine 31, 4138-4139. Available here (proprietary).

Bacchetti, P., Wolf, L. E., Segal, M. R., and McCulloch, C. E. (2005). Ethics and sample size. American Journal of Epidemiology 161, 105-110. Available here.

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., and Munafo, M. R. (2013a). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14, 365-376. Available here (proprietary).

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., and Munafo, M. R. (2013b). Confidence and precision increase with high statistical power. Nature Reviews Neuroscience 14, 585-586. Available here (proprietary).

Clark, T., Berger, U., and Mansmann, U. (2013). Sample size determinations in original research protocols for randomised clinical trials submitted to UK research ethics committees: review. British Medical Journal 346, f1135. Available here.

Edwards, S. J. L., Lilford, R. J., Braunholtz, D., and Jackson, J. (1997). Why “underpowered” trials are not necessarily unethical. Lancet 350, 804-807. Available here (proprietary).

Gelfond, J. A. L., Heitman, E., Pollock, B. H., and Klugman, C. M. (2011). Principles for the ethical analysis of clinical and translational research. Statistics in Medicine 30, 2785-2792. Available here (proprietary).

Gelfond, J. A. L., Heitman, E., Pollock, B. H., and Klugman, C. M. (2012). Power, ethics, and obligation. Statistics in Medicine 31, 4140-4141. Available here (proprietary).

Halpern, S. D., Karlawish, J. H. T., and Berlin, J. A. (2002). The continuing unethical conduct of underpowered clinical trials. Journal of the American Medical Association 288, 358-362. Available here (proprietary).

Halpern, S. D., Karlawish, J. H. T., and Berlin, J. A. (2005). Re: “Ethics and sample size”. American Journal of Epidemiology 162, 195-196. Available here.

Horrobin, D. F. (2003). Are large clinical trials in rapidly lethal diseases usually unethical? Lancet 361, 695-697. Available here (proprietary).

Moher, D., Hopewell, S., Schulz, K. F., Montori, V., Gotzsche, P. C., Devereaux, P. J., Elbourne, D., Egger, M., and Altman, D. G. (2010). CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. British Medical Journal 340, 28. Available here.

Newell, D. J. (1978). Type II errors and ethics. British Medical Journal 2, 1789. Available here.

Prentice, R. L. (2005). Ethics and sample size—Another view. American Journal of Epidemiology 161, 111-112. Available here.
