Significance Testing as Perverse Probabilistic Reasoning

This is a CTSpedia version of an open access article originally published as:
Westover MB, Westover KD, Bianchi MT. Significance Testing as Perverse Probabilistic Reasoning. BMC Medicine 2011, 9:20. The published PDF version can be found here.

Abstract

Background: Truth claims in the medical literature rely heavily on statistical significance testing. Unfortunately, most physicians misunderstand the underlying probabilistic logic of significance tests and consequently misinterpret their results. This misunderstanding is highlighted by means of a simple quiz which we administered to 246 physicians at two major academic hospitals, on which the proportion of incorrect responses exceeded 90%.
Discussion: This paper provides an accessible treatment of the concepts needed to avoid misinterpreting significance tests. We begin by thoroughly explaining the basic concepts of probability theory, in particular reviewing the origin and meaning of Bayes' rule. Next, we explain the two essential ingredients of significance testing: binary hypothesis testing and p-values. We then show that significance testing as usually understood represents a perversion of probabilistic reasoning. We show how understanding Bayes' rule can protect against common errors in statistical reasoning, and how physicians can exploit their experience with interpreting diagnostic tests to gain intuitions about probabilistic inference generally. Finally, we review the debate in the cognitive sciences regarding physicians' aptitude for probabilistic inference.
Summary: Understanding the fundamental concepts of probability theory has become essential to the rational interpretation of medical information. This essay provides a technically sound review of these concepts that is accessible to a medical audience.

Background

Medicine is a science of uncertainty and an art of probability. -Sir William Osler [1]

While probabilistic considerations have always been fundamental to medical reasoning, formal probabilistic arguments have only become ubiquitous in the medical literature in recent decades [2, 3]. Meanwhile, many have voiced concerns that physicians generally misunderstand probabilistic concepts, with potentially serious negative implications for the quality of medical science and ultimately public health [3-12]. This problem has been demonstrated previously by surveys similar to the following quiz [13], which we administered to a group of 246 physicians at two major U.S. teaching hospitals. The reader is likewise invited to answer before proceeding.

Consider a typical medical research study, e.g., one designed to test the efficacy of a drug, in which a null hypothesis H0 ('no effect') is tested against an alternative hypothesis H1 ('some effect'). Suppose that the study results pass a test of statistical significance (i.e., p-value < 0.05) in favor of H1. What has been shown?

1. H0 is false.
2. H1 is true.
3. H0 is probably false.
4. H1 is probably true.
5. Both 1 & 2.
6. Both 3 & 4.
7. None of the above.

The answer profile for our participants is shown in Table 1. This essay is for readers who, like 93% of our respondents, did not confidently select the correct answer, (7), 'None of the above'. We hasten to assure the reader that this is not a trick question. Rather, it is a matter of elementary probabilistic logic. As will be clear by the end of this essay, answers 1-6 involve 'leaping to conclusions', in violation of the basic law of probabilistic inference, Bayes' rule. We will see that Bayes' rule is an essential principle governing all reasoning in the face of uncertainty. Moreover, understanding Bayes' rule serves as a potent prophylaxis against statistical fallacies such as those underlying the apparent plausibility of the six erroneous answers in this little quiz.
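To give a first feel for why a significant result alone does not license answers 1-6, the following minimal sketch applies Bayes' rule to the binary event 'the test was significant'. It is an illustration under stated assumptions, not the authors' own calculation: the function name, the prior probabilities of H0, and the power of 0.8 are hypothetical values chosen for the example, and H0 and H1 are assumed to be the only two possibilities.

```python
def posterior_h0_given_significant(prior_h0, alpha, power):
    """Return P(H0 | significant result) via Bayes' rule.

    prior_h0: assumed prior probability that H0 is true (hypothetical input)
    alpha:    P(significant | H0), the false-positive rate of the test
    power:    P(significant | H1), the true-positive rate of the test
    """
    numerator = alpha * prior_h0
    # Total probability of a significant result under either hypothesis.
    evidence = numerator + power * (1.0 - prior_h0)
    return numerator / evidence


# Same significance threshold and power, three different (assumed) priors.
for prior_h0 in (0.5, 0.9, 0.99):
    post = posterior_h0_given_significant(prior_h0, alpha=0.05, power=0.8)
    print(f"prior P(H0) = {prior_h0:.2f} -> P(H0 | significant) = {post:.3f}")
```

Under these assumed numbers the posterior probability of H0 ranges from about 0.06 to about 0.86, so whether H0 is 'probably false' after a significant result depends on quantities that the p-value by itself does not supply.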

Table 1: Quiz answer profile.

Answer (1) (2) (3) (4) (5) (6) (7)
Number 8 0 58 37 6 69 12
Percent 4.2 0 30.5 19.5 3.2 36.3 6.3
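
As a quick consistency check on Table 1, the short sketch below recomputes the percentages from the counts. The variable names are illustrative only. It suggests the percentages are taken relative to the sum of the tabulated responses rather than to the 246 physicians to whom the quiz was administered; that reading is an inference from the numbers in the table, not something stated in the text.

```python
# Counts per answer option, copied from Table 1.
counts = {1: 8, 2: 0, 3: 58, 4: 37, 5: 6, 6: 69, 7: 12}

# Denominator implied by the table: the sum of all tabulated responses.
total = sum(counts.values())

# Percent of tabulated responses for each option, rounded to one decimal.
percents = {answer: round(100 * n / total, 1) for answer, n in counts.items()}

# Share of tabulated responses that did not select answer (7).
incorrect_share = 100 * (total - counts[7]) / total

print("total tabulated responses:", total)
print("percent by answer:", percents)
print(f"share not choosing (7): {incorrect_share:.1f}%")
```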
