Title: BERD - Pain Presentation
A few random comments -
- This is a lot of work in SAS compared to the ordinal package in R (handles random effects).
- Sparseness does not have to do with non-proportional odds except in one strange way: the SAS PROC LOGISTIC test for proportional odds (which does not cite my student Bercedis Peterson's paper, which introduced the test and pointed out its shortcomings) will strongly reject H0: proportional odds even when proportional odds holds exactly, when cells are sparse.
- I suspect that your observations about standard errors as things become more sparse are related to the above. For proportional odds assessment I rely on partial residual plots.
Regards,
Frank
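For anyone who wants to try the R route Frank mentions above, a minimal sketch of a random-intercept proportional odds fit with the ordinal package might look like the following (the data frame and variable names dat, pain, treatment, time, and subject are illustrative assumptions, not from any study in this thread):

library(ordinal)
dat$pain <- ordered(dat$pain)   # response must be an ordered factor (e.g., a 0-10 pain rating)
fit <- clmm(pain ~ treatment + time + (1 | subject), data = dat)
summary(fit)                    # treatment effect on the cumulative log odds scale, random intercept per subject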
Michael Berbaum - 17 Aug 2011 - 16:08
Greetings,
I can offer a couple more examples of pain analyses. Both designs are 3-arm RCTs examining pain control during interventional radiology procedures. Thus, at baseline, before the procedure begins, average pain level should be nearly equivalent in the three groups. Patients are repeatedly asked their pain (and anxiety) levels at regular intervals (repeated measures). One key feature is that as patients' procedures are completed, they "drop out," so that observations end when the patient with the longest procedure finishes. With increasing sparsity, the standard errors around the groups' curves grow substantially!
In study #1 we used a normal mixed model (SAS PROC MIXED or BMDP 5V); in study #2 we used a proportional odds model with random intercepts (SAS PROC NLMIXED). The proportional odds assumption failed owing to the sparsity of data at later times and at higher pain levels. We collapsed levels 9 and 10 into level 8 and then we were OK. We struggled to find an understandable graph of results and ended up showing binary "splits" at various thresholds.
I've appended some SAS code for the analyses in #2, prepared mostly by Ms. Xinyu Li, M.S., co-author on the second paper. If I had it to do again, I would think more about controlling for baseline covariates and the MAR assumption we relied on. I hope this is helpful, and I'd welcome any comments on the approach we took.
#1
Lang, Elvira V., Benotsch, Eric G., Fick, Lauri J., Lutgendorf, Susan, Berbaum, Michael L., Berbaum, Kevin S., Logan, Henrietta, and Spiegel, David (2000). Adjunctive non-pharmacological analgesia for invasive medical procedures: a randomised trial. The Lancet, vol. 355, issue 9214, pages 1486-1490, April 2000. doi:10.1016/S0140-6736(00)02162-0
#2
Lang EV, Berbaum KS, Faintuch S, Hatsiopoulou O, Halsey N, Li X, Berbaum ML, Laser E, Baum J (2006). Adjunctive self-hypnotic relaxation for outpatient medical procedures: A prospective randomized trial with women undergoing large core breast biopsy. Pain, 126(1-3): 155-164, Dec 15, 2006. PMID: 16959427.
Best regards, --Mike
--
Michael L. Berbaum, Ph.D., Director
Methodology Research Core
Institute for Health Research and Policy (MC 275)
University of Illinois at Chicago
1747 West Roosevelt, Room 558
Chicago, Illinois 60608
Tel: (312) 413-0476
Fax: (312) 996-2703
Email: mberbaum@uic.edu
IHRP web site: http://www.ihrp.uic.edu
I hear ya.
Thanks
Frank
Hi Knut,
I'm in England, with apologies for the slow reply. Yes, your example isn't controversial. Other combinations are harder to figure, which is why I like to adjust for baseline as a covariate instead.
Frank,
I agree that this discussion is proving more interesting than I had expected, including two of your recent remarks.
In fact, I wonder whether we should edit this discussion and make it available in some form.
First, I fully agree that "making the subject its own control" is heavily overrated among clinicians. As we have seen, it is less than trivial to formalize the concept of "change", be it difference, ratio, sign, ... Still, there may be cases where baseline values should be incorporated.
Which brings us to your second comment.
U-scores for multivariate data (Hoeffding 1948) are based on the assumption that, everything else being the same, more in any of the pain characteristics is worse. No linearity, proportionality, or independence is assumed. For instance, if subject A comes in with a lower baseline and a higher outcome VAS than subject B, then A had less of a response than B. I would not expect too much of a controversy here.
The fundamental difference from many other methods is that ambiguities are allowed. We don't need to make strong assumptions (proportionality, linearity, independence) to ensure that the pairwise ordering among all subjects can be decided.
Knut
Hi Ron,
Thanks for the clarification!
Of course, knowing the scale does not automatically lead to the method, but it restricts the methods one can use: the chi-square test for nominal data, the u-test for ordinal data, and the u- or t-test only for interval/ratio data.
It is also important to remember that rank tests are no panacea (see Scheffe, 1959, chapter 10) and that the (apparent, assumed, ...) distribution of the data is not very helpful in choosing between tests that are both asymptotically distribution-free (like the t- and the u-test).
Still, I'd rather use a test that is approximately right than one that is exactly wrong. If all we were interested in were alpha, we might simply toss 17 coins and, if we get fewer than five of either heads or tails, we have a test at the 5% level (the chance of that with fair coins is about 0.049), with no need to even gather data.
Hence, we also need to consider which alternatives the tests are sensitive to: deviations of the arithmetic mean from zero for the t-test vs. the tendency of pairwise comparisons to deviate from 50:50 (not the median!) for the u-test.
What's missing in many of these cookbook rules is that we cannot choose a test merely by looking at the characteristics of the data (distribution) and the variables (scale); we also have to formalize the question (type of alternative) of interest.
Knut
Same here! Great discussion!
Frank
(k) Enough ramblings for now. Thanks to all for the excuse to avoid doing other stuff on my plate.
Cheers,
Ron Thisted
It is easy to convert the odds ratio and other parameters in the model to
the mean or median pain score. Also, exceedance probabilities come
straight out of the model and are easily interpreted by clinicians.
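A hedged sketch of what that can look like in R with the rms package (the data frame and variable names are assumed for illustration, and the helper interfaces should be checked against the installed rms version):

library(rms)
dd <- datadist(dat); options(datadist = "dd")
f  <- orm(pain ~ treatment + rcs(baseline, 4), data = dat)   # proportional odds fit, spline-adjusted for baseline
summary(f)                       # treatment effect as an odds ratio
lp <- predict(f, type = "lp")    # linear predictor for each subject
Mean(f)(lp)                      # estimated mean pain score
Quantile(f)(0.5, lp)             # estimated median pain score
ExProb(f)(lp, y = 4)             # exceedance probability P(pain >= 4)

The rcs(baseline, 4) term also anticipates the spline adjustment for baseline that comes up later in the thread.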
(h) Changes in pain (or in other symptoms that also have a subjective or self-report element) may make sense in some contexts and not in others. For instance, pain after surgery eventually gets better. The focus may be on how rapidly this occurs. On the other hand, if one is studying chronic pain (that is, pain that one would not expect to improve in the natural course of things), then the focus is definitely on how much improvement in pain can be achieved, and in what fraction of patients.
(i) Consider the situation in which patients are randomized to two treatments, and two hours later, a VAS pain measurement is taken. If the point of the exercise is testing the null hypothesis of no difference between groups, lots of sensible and familiar tests will work just fine, in the sense that they will be valid tests of H0 and will have (approximately) the right size. For this purpose, the difference between a t-test and a proportional-odds-regression-based measure (taking each unique observed VAS score as a "cutpoint") will depend upon the alternatives against which one wants to have greatest power.
(j) If the point is to estimate the size of the treatment effect, then one has to have some sense of what differences on some scale mean. In the context of anesthesia for certain particular operative procedures, for instance, VAS pain measurements of 30 or below are considered adequately low scores, and pre-treatment scores of 50 or 60 are typical. In this context the mean and SD of VAS scores have a clinical interpretation (which may not extend to other contexts). The odds ratio (from a proportional odds-based analysis) would not be easily understood or communicated, and it would be hard to relate to what clinicians already understand about how the VAS works in this particular context--even if it were the basis for a more sensitive test of the differences between the VAS distributions under the two treatments.
I haven't experienced that problem. You can model baseline using dummies
or using quadratic or spline functions.
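In rms-style notation (continuing the illustrative variable names from the sketch above), those options for the baseline term might look like:

library(rms)
f1 <- orm(pain ~ treatment + factor(baseline), data = dat)   # dummies for baseline categories
f2 <- orm(pain ~ treatment + pol(baseline, 2), data = dat)   # quadratic in baseline
f3 <- orm(pain ~ treatment + rcs(baseline, 4), data = dat)   # restricted cubic spline, 4 knots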
(g) I agree with Knut that changes in pain can (and often should) be analyzed using methods other than simply taking the numerical difference in pain scales. Transition models (with a small number of defined ordered categories) are often successful at doing this. Proportional odds models, while incredibly useful for comparing groups at a single point in time (such as the completion of a randomized clinical trial), are less easily used when one wants to make inferences conditional on, say, a baseline variable that itself is measured as an ordered category (for instance, baseline pain assessment).
I don't think that follows. I agree that clinicians think this is more
interpretable but I think they are largely fooling themselves, mainly
because of floor and ceiling effects. An unbiased estimate of current
status is going to be quite useful, and can be calibrated in the sense you
are saying, by including baseline level (or a spline function of it) as a covariate.
(f) Changes in pain scores (as opposed to changes in other kinds of scores) can be particularly important, since within-subject pain scores are likely to be much better calibrated than between-patient scores. So from an interpretability standpoint, clinicians and others often find changes to be more compelling than raw scores. And as we know, if the within-subject correlation exceeds 0.5, there are efficiency gains to the use of difference scores.
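(The standard calculation behind that last point: with equal variances sigma^2 at baseline and follow-up and within-subject correlation rho, Var(Y - X) = 2*sigma^2*(1 - rho), which is smaller than Var(Y) = sigma^2 exactly when rho > 0.5.)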
Often I see major non-proportionality yet the PO model fit better than all
the other models I was entertaining.
(e) I agree with Frank that the proportional odds and related models are not known (or used) widely enough. As with almost all models, the assumptions under which they work best (constant proportionality of odds between groups) always hold only approximately. Conditional on actually using a proportional odds model, examining the extent to which proportionality holds, and critically assessing the extent to which it really matters whether it holds, are also not done widely enough.
The real problem with the central limit theorem is that for a given
dataset we don't know if it applies (this is more true for highly skewed
Y).
(d) The utility of a particular analysis depends more on the study design and the substantive question than on the scale of measurement. The central limit theorem works wonders in many situations. (For instance, if one applies the two-sample t-test to binary data--ordinal scale at most--the test is essentially equivalent to the chi-squared test for comparing proportions.)
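A quick way to check that parenthetical claim for oneself (a small self-contained R illustration, not from the original discussion):

set.seed(1)
g <- rep(0:1, each = 100)                         # two groups of 100
y <- rbinom(200, 1, ifelse(g == 1, 0.55, 0.40))   # binary outcomes
t.test(y ~ g, var.equal = TRUE)$p.value           # pooled two-sample t-test
chisq.test(table(g, y), correct = FALSE)$p.value  # Pearson chi-squared test
# the two p-values agree closely: the tests are essentially equivalent here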
Very nice discussion Ron. It should be noted that the Wilcoxon test almost always tests a stochastic ordering hypothesis that is relevant. We tend to get ourselves in trouble when we use the t- or normal approximation for getting P-values with the Wilcoxon. If you have scale differences (or other departures from a simple translation) you can get very accurate P-values using the general U-statistic standard error, as implemented in the R Hmisc package's rcorr.cens function.
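A sketch of how that might be coded, under the assumption that group is coded 0/1 and y is the outcome (this is my reading of the suggestion, not code from the thread, and the names of the returned components should be checked against the installed Hmisc version):

library(Hmisc)
r <- rcorr.cens(group, y, outx = FALSE)   # Somers' Dxy between group and outcome
z <- r["Dxy"] / r["S.D."]                 # Wald statistic using the U-statistic standard error
2 * pnorm(-abs(z))                        # two-sided P-value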
Dear Laura Lee, Frank, Knut, Greg, et al:
A few random thoughts on pain stimulated by the (less random) notes of others:
(a) Regarding Laura Lee's original request, Thomas Permutt at FDA has done some very thoughtful work on analyzing pain outcomes in the context of clinical trials. I am not sure if his work has been published, but it has been influential in the design and analysis of Phase III studies of drugs intended to affect pain. A key reference is the IMMPACT recommendations (2005, "Core outcome measures for chronic pain clinical trials: IMMPACT recommendations," Pain 113: 9-19).
(b) The emphasis on scale of measurement (ordinal, interval, ratio, etc) has the potential to side-track us from the most important questions of design, analysis, and inference. As often as not, focusing on scale of measurement can be misleading. It is particularly pernicious when it leads to automatic choices of the "correct" statistical analysis based on measurement characteristics and not consideration of the study design, distributional characteristics of the measurements, subject-matter knowledge, and identification of the question that really needs to be answered. The outstanding paper by Velleman and Wilkinson makes a convincing case. [Velleman, P. F. & Wilkinson L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. American Statistician, 47, 65-72.]
(c) The identification of a particular statistical test with a scale of measurement often gets things badly wrong. For instance, it is commonly stated that the t-test assumes an interval scale, while the Wilcoxon (Mann-Whitney) test assumes only an ordinal scale. That is not correct. In fact, the two-sample Wilcoxon procedure relies on the assumption that the two distributions differ only in location and not in shape; in particular, variances and skewness are identical in the two distributions, and one is simply a shifted version of the other. Simply having an ordinal scale of measurement is not sufficient to justify the validity of the Wilcoxon test. Indeed, if the two groups are normally distributed and have the same mean, but one standard deviation is twice the size of the other, the size of a nominal 0.05 Wilcoxon test is actually 0.074 (JW Pratt, JASA 1964, 59: 665-80).
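A small simulation one could run to see this size distortion (the sample sizes here are my choice; the exact inflation depends on the allocation ratio as well as the SD ratio):

set.seed(2)
reject <- replicate(20000, {
  x <- rnorm(60, mean = 0, sd = 1)   # group 1
  y <- rnorm(40, mean = 0, sd = 2)   # group 2: same mean, twice the SD
  wilcox.test(x, y)$p.value < 0.05
})
mean(reject)   # empirical size of the nominal 0.05 test, noticeably above 0.05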
Ronghui (Lily) Xu - 14 Aug 2011
Hi Laura,
If it adds to it, our group has also worked on brain imaging and meta-analysis aspects of pain research:
Leung A, Duann J, McGreevy K, Li E, Xu R, Donohue M, et al. The supraspinal pain pathway of the thermal grill illusion. NeuroImage, 2009; 47(Supplement 1): S61-S61.
Leung AY, Donohue M, Xu R, Lee R, Lefaucheur J, Khedr E, Saitoh Y, Andre-Obadia N, Rollnik J, Wallace M, Chen R. rTMS in suppressing neuropathic pain: a meta-analysis. The Journal of Pain, 2009; 10(12): 1205-16.
Thanks,
Ronghui (Lily) Xu
Professor
Division of Biostatistics and Bioinformatics
Department of Family and Preventive Medicine
and Department of Mathematics
Director, CTRI Design and Biostatistics
University of California, San Diego
9500 Gilman Drive, Mail Code 0112
La Jolla, CA 92093-0112
Hi Knut,
Good discussion. I think the score you've specified will make even more
assumptions than the proportional odds assumption though.
I don't think that change will do a better job of adjusting for baseline differences, because of floor and ceiling effects.
Best,
Frank
A deterministic reply to a random comment: analyzing changes in pain could potentially adjust for differences in baseline pain perception without the need to make assumptions about proportionality. Of course, "analyzing changes" does not necessarily mean "computing differences of scores". For instance, one could score a particular subject's response (outcome vs baseline) as
- the number of subjects with a larger-or-equal baseline and a smaller-or-equal outcome (smaller effect) minus
- the number of subjects with a smaller-or-equal baseline and a larger-or-equal outcome (larger effect).
These 'u-scores' would score changes on one (or several) ordinal outcomes without computing differences.
Knut
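A minimal sketch of those u-scores in plain R (the final rank test is an illustrative suggestion, not part of the MuStat tools):

# u-score per subject: (# of subjects with >= baseline and <= outcome) minus
# (# of subjects with <= baseline and >= outcome); with "more pain is worse",
# a higher score means less relief relative to the rest of the sample
u_score <- function(baseline, outcome) {
  sapply(seq_along(baseline), function(i) {
    sum(baseline >= baseline[i] & outcome <= outcome[i]) -
      sum(baseline <= baseline[i] & outcome >= outcome[i])
  })
}
# e.g., compare treatment arms with a rank test on the scores:
# wilcox.test(u_score(baseline, outcome) ~ arm)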
A random comment: I think it is a mistake to analyze change in pain
status. The difference in two ordinal scales is not ordinal. There are
many reasons to have the final pain severity as the outcome, adjusted for
initial severity as a baseline covariate.
A nice feature of the proportional odds model is that you can have as many
categories as you have unique Y values.
Frank
Kathryn Chaloner - 14 Aug 2011
Hi all
Like John Connett I was involved in the Shlay (1998) study. The more we looked at and analyzed the Gracely continuous scale as the study went on, the less, I think it is fair to say, we believed that it measured something real. We also had a "Global Pain Relief Scale" that was ordered "Complete, A lot, ... none, Pain got worse" and that was more believable in interpretation and in analysis. Fortunately, results were consistent.
The rationale for the Gracely scale was that a previous study in diabetic peripheral neuropathy had used the scale (so there were data for the HIV and acupuncture design), and there was a lot of support from clinicians for using the same scale. In retrospect, not a great idea to perpetuate a bad endpoint.
With hindsight, the simple global pain relief scale made a much better endpoint and analysis and was much more interpretable. We used ordinal response models.
Kathryn
Kathryn Chaloner, PhD
319 384 5029
kathryn-chaloner@uiowa.edu
Hi Laura Lee,
I find the empirical "validation" for using methods based on the linear model for individual VAS scales less convincing, but the real problem lies in the complexity of measuring complex phenomena, such as pain, on a variety of scales.
Shlay (1998): Patients rated their pain in a diary once daily, choosing from the Gracely scale of 13 words that describe the intensity. The words had been assigned magnitudes on the basis of ratio-scaling procedures that demonstrated internal consistency, reliability, and objectivity. --- Comparison of treatment groups for the primary end point of change in pain, as measured by the pain diary, used a linear model with baseline characteristics, clinical unit, and option (factorial or single factor) as covariates.
Griffith (2008): The primary outcome measure was the mean difference in the subjects' self-reported pain scores before and after the administration of the initial medication treatment. A pain score reduction of 3 or more points after the initial treatment was considered clinically effective and used as a cutoff point to dichotomize the primary outcome measure for multivariate statistical analyses.
I agree with Frank that ordinary regression may not be appropriate to generate valid comprehensive scores. Trying to avoid the problem by dichotomizing the outcomes at an arbitrary cutoff point may also not be a good solution.
Under WebServices/MuStat, CTSpedia offers biostatistical tools (spreadsheets, R package, and Web server) that help to resolve some of these problems by creating scores/metrics that are intrinsically valid, because fewer assumptions need to be made and, thus, empirically "validated".
BTW, in a collaboration with the NINR we are currently using the same WebServices/MuStat to screen for genetic risk factors of fibromyalgia, yet another way of addressing the many open questions in pain research using the novel methods and tools developed by BERD.
Here are the references:
Morales (2008): www.bepress.com/sagmb/vol7/iss1/art19/ (complex phenotypes, such as pain)
Wittkowski (2010): www.ncbi.nlm.nih.gov/pubmed/20652502 (comprehensive overview in a book with many CTSA contributions)
Rubio (2011): www.ncbi.nlm.nih.gov/pubmed/21284015 (on the cross-fertilization of BERD developing metrics applied both by and to BERD practitioners)
Knut
John Connett - 13 Aug 2011
Laura,
An example of pain measurement and analysis in an acupuncture study:
Shlay, Chaloner et al. (1998) "Acupuncture and amitriptyline for pain
due to HIV-related peripheral neuropathy," JAMA 280: 1590-1595.
John C.
Greg,
I'm amazed that I still see people analyzing ordinal scales using ordinary
regression. The proportional odds model and its cousins are still not
known to vast areas of research.
Frank
Laura,
Pain research usually involves a visual analog scale (VAS) measurement of pain. There is confusion, however, about whether these scales can be analyzed as a continuous variable (interval scale) or whether they should be considered an ordered categorical variable (ordinal scale). That is, there is inconsistency in how these scales are analyzed. You could clear up this confusion in your talk.
I introduce this topic on page 3 of Ch 2-6 of my course manual, which is available in the educational materials section of CTSpedia. Another website for it is given in the footnote on the first page. In that chapter, I provided citations and justification for why the VAS can be analyzed as an interval scale. I have attached two papers I cited.
On p 41 of Ch 2-1, I give the taxonomy of levels of measurement, which you might use as background material.
Thanks,
Greg
Demetrios Kyriacou - 13 Aug 2011
Dear Laura Lee,
Attached is a study that was published in the Journal of Pain and conducted at the Northwestern University Department of Emergency Medicine by one of our senior residents, a junior faculty person, and me. We conducted a retrospective cohort study to compare "Metoclopramide Versus Hydromorphone for the Emergency Department Treatment of Migraine Headache."
I use this study in my Intermediate Epidemiology course to illustrate the different types of confounding by indication. While we adjusted for potential confounding by severity of the migraine headache in the adjusted relative risk comparison of reduction in migraine pain, we did not adjust for the potential confounding by indication for nausea or vomiting, which is frequently associated with more severe migraine headaches and is often treated with metoclopramide, an anti-emetic medication. Thus, there is still potential for confounding in the study.
Let me know if this is useful to you and if you have any questions.
Demetrios N. Kyriacou (Jim)