Multiple Predictor Linear Model
Lead Author(s): David Glidden, PhD
Slide 1: Multiple predictor linear regression
- Models dependence of the mean of continuous outcome on multiple predictors simultaneously
- By including multiple predictors we can try to
- control confounding of treatment effects by indication, risk factor effects by demographics, other covariates
- examine mediation of treatment, risk factor effects
- assess interaction of treatment effects or exposure with sex, race/ethnicity, genotype, other effect modifiers
- get at causal mechanisms in observational data
- also: account for stratified or multi-center design of RCT, increase precision of estimates
Slide 2: Components of the Linear Model
- Systematic:
- how does the average value of outcome y depend on values of the predictors?
- Random:
- at each observed value of the predictors, values of y
are distributed about the predicted average
- assumed distribution of deviations underlies
hypothesis tests, p-values, and confidence intervals
Slide 3: Systematic part of the model
- In abstract terms, model written as
- E[y|x] = β0 + β1x1 + β2x2 + ⋯ + βpxp
- E[y|x]: expected or average value of y for a given set of predictors x = x1, x2, ⋯, xp
- βj: change in average value of outcome y per unit increase in predictor xj, holding all other predictors constant
- β0 (the intercept): average value of the outcome y when all predictors = 0
- "Linear predictor" common to linear, logistic, Cox, and longitudinal models
Slide 4: Interpretation of regression coefficients
- βj: change in average value of outcome y per unit increase in predictor xj, holding all other predictors constant
- Hold x2, ..., xp constant, and let x1 = k:
- E[y|x] = β0 + β1k + β2x2 + ⋯ + βpxp (1)
- Now increase x1 by one unit to k + 1:
- E[y|x] = β0 + β1(k + 1) + β2x2 + ⋯ + βpxp (2)
- Subtracting (1) from (2) gives β1, for every value of k as well as x2, ..., xp
- Note: assumes x1 does not interact with x2, ..., xp
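The subtraction argument above can be sketched numerically. This is a minimal illustration with made-up coefficients (not from the lecture's data): increasing x1 by one unit while holding x2 fixed always changes the predicted mean by exactly β1.

```python
# Hypothetical coefficients for illustration only
b0, b1, b2 = 100.0, 2.0, -0.5  # intercept, slope for x1, slope for x2

def mean_outcome(x1, x2):
    """Systematic part of the model: E[y|x] = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

# Going from x1 = k to x1 = k + 1 with x2 held constant changes the
# predicted mean by b1, for any k and any value of x2
diff = mean_outcome(6, 30) - mean_outcome(5, 30)
print(diff)  # 2.0, i.e. b1
```

The same holds for any starting value k, which is what "holding all other predictors constant" buys you in a model without interactions.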
Slide 5: Interpretation of regression coefficients
- β0: average value of outcome y when all predictors = 0
- Let x1 = x2 = ⋯ = xp = 0. Then
- E[y|x] = β0 + β1·0 + β2·0 + ⋯ + βp·0 = β0
- Intercept: where the regression line meets the y-axis in single-predictor models
Slide 6: Review: centering predictors
- Same as in single-predictor model
- For many continuous predictors like age, SBP, LDL, no one has value 0
- Solution: center them on their sample means, so new variable has value 0 for observations at the mean
- For binary predictors, 0 is the usual coding for the reference group, so not a problem for interpretation
- With centering, \xDF0 estimates expected value of y for participant at reference level of binary predictors, mean of centered continuous predictors
- Values and interpretation of other coefficients unaffected
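The centering result can be checked with a minimal pure-Python sketch (one predictor, made-up data loosely patterned on age and SBP): the slope is identical before and after centering, while the intercept becomes the expected outcome at the mean of the predictor.

```python
# Made-up data for illustration: predictor like age, outcome like SBP
x = [50, 55, 60, 65, 70]
y = [120, 124, 126, 131, 134]

def ols(x, y):
    """Simple one-predictor OLS; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
            sum((a - xbar) ** 2 for a in x)
    return ybar - slope * xbar, slope

b0, b1 = ols(x, y)
xbar = sum(x) / len(x)
xc = [a - xbar for a in x]        # centered predictor
c0, c1 = ols(xc, y)

print(b1, c1)   # slopes identical
print(c0)       # intercept now = mean outcome at the mean predictor value
```

Only the intercept changes; the slope, its SE, and its p-value are untouched by centering.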
Slide 7: Review: rescaling predictors
- Same as in single-predictor model
- Rescaled variable Xrs = X/k
- Coefficient for Xrs interpretable as increase in mean of outcome for a k-unit increase in X
- If k = SD(X), coefficient for Xrs interpretable as increase in mean of outcome for a 1 SD increase in X
- β̂(Xrs) = kβ̂(X); SE(β̂) and 95% CI for β̂ also rescaled
- P-value for β and intercept coefficient unaffected
- Can accomplish the same thing using lincom
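The rescaling rule β̂(Xrs) = kβ̂(X) can be verified directly. A minimal pure-Python sketch with made-up data: dividing the predictor by k = 10 multiplies its estimated slope by 10.

```python
# Made-up data for illustration
x = [50, 55, 60, 65, 70]
y = [120, 124, 126, 131, 134]

def slope(x, y):
    """OLS slope for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

k = 10.0
b = slope(x, y)                        # change per 1-unit increase in X
bk = slope([a / k for a in x], y)      # change per 1-unit increase in X/k
print(b, bk)  # bk = k * b: a k-unit increase in the original X
```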
Slide 8: Random part of the model
yi = E[y|xi] + εi
- Outcome yi varies from the average at xi by an amount εi
- ε represents unmeasured sources of variation, error
- As in single-predictor model, four assumptions about ε:
1. Normally distributed
2. mean zero at every value of x
3. constant variance
4. statistically independent
- These assumptions underlie hypothesis tests, confidence intervals, p-values, also model checking
Slide 9: Assumptions about the predictors
- No distributional assumptions (e.g. Normality)
- predictors can be continuous, discrete (e.g. counts), or categorical (dichotomous, nominal, ordinal)
- Linear regression works better if
- predictors are relatively variable
- there are no excessively "influential" points
- Predictors assumed measured without error (otherwise "regression dilution bias" and residual confounding)
Slide 10: Update of two details
- Fitted value: ŷi = β̂0 + β̂1xi1 + ⋯ + β̂pxip
- estimated average or expected value of outcome y when x = xi, the predictor values for observation i
- now depends on multiple predictors instead of just one
- Residual: ri = yi − ŷi = ε̂i
- difference between data point and fitted value
- sample analogue of εi, used in checking model fit
- not obvious what "vertical" means with multiple predictors
Slide 11: Ordinary least squares (OLS)
- Method for fitting linear regression models
- OLS finds values of regression coefficients which minimize the residual sum of squares (RSS; i.e. the sum of squared residuals)
- Good statistical properties: unbiased, efficient, easy to compute, but sensitive to outliers
- For normally distributed outcomes, OLS is equivalent to "maximum likelihood" (the method used for logistic, Cox, some repeated measures, and many other models)
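The "minimizes RSS" claim can be made concrete with a pure-Python sketch on made-up data: compute the OLS solution in closed form, then check that nudging the coefficients in any direction can only increase the residual sum of squares.

```python
# Made-up data, roughly y = 2x, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

def rss(b0, b1):
    """Residual sum of squares for candidate coefficients."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Closed-form OLS solution for one predictor
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

best = rss(b0, b1)
# Any perturbation of the OLS coefficients does strictly worse
for db0 in (-0.1, 0.1):
    for db1 in (-0.1, 0.1):
        assert rss(b0 + db0, b1 + db1) > best
print(round(b0, 3), round(b1, 3))
```

RSS is a convex (quadratic) function of the coefficients, so the OLS solution is the unique global minimum whenever the predictor is not constant.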
Slide 12: Multi-predictor linear model for glucose
- Upper left (ANOVA table)
- Total SS = Σi(yi − ȳ)²: variability of outcome yi about the sample average ȳ
- Total MS = Σi(yi − ȳ)²/(n − 1): sample variance of outcome y
- Model SS = Σi(ŷi − ȳ)²: variability of outcome accounted for by predictors included in model
- Model MS = Model SS/p: numerator of model F-statistic
- Residual SS = Σi(yi − ŷi)²: residual variability not accounted for by predictors, what OLS minimizes
- Residual MS = Σi(yi − ŷi)²/(n − p − 1): sample variance of residuals
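The ANOVA decomposition Total SS = Model SS + Residual SS can be checked numerically. A pure-Python sketch on made-up data, fitting a one-predictor OLS model and computing all three sums of squares:

```python
# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

# One-predictor OLS fit
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]

tss = sum((yi - ybar) ** 2 for yi in y)                 # Total SS
mss = sum((fi - ybar) ** 2 for fi in yhat)              # Model SS
res = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))    # Residual SS

print(round(tss, 6), round(mss + res, 6))  # equal: the ANOVA identity
print(round(mss / tss, 3))                 # R-squared
```

The identity holds exactly for any OLS fit that includes an intercept; Model SS / Total SS is the familiar R².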
Slide 13: Interpreting Stata regression output
Slide 14: Summary of model
- Multipredictor linear regression is a tool for estimating how the average value of a continuous outcome depends
on multiple predictors simultaneously
- Inferential machinery evaluates precision of estimates and whether sampling error can account for findings
- Coefficients generally interpretable as the change in the average value of the outcome per unit increase in the predictor, holding all other predictors constant
- Power helped by effect size, sample size, variability of predictor; hurt by correlation with other predictors,
variability left unexplained
Slide 15: Confounding
- Can account for some or all of the unadjusted association between a predictor and an outcome
- Controlling confounding the primary reason for doing multi-predictor regression
- Confounders must be associated with predictor and independently with outcome
- Only an association adjusted for confounders can be viewed as possibly causal
Slide 16: Unadjusted waist/glucose association
Slide 17: Adjusted waist/glucose association
Slide 18: Primary predictor, confounder, and outcome
Adjusting for a confounder
- Primary predictor and confounder are correlated:
- values of primary predictor larger in subgroup 2 than subgroup 1
- conversely, those with larger values of primary predictor more likely in subgroup 2
- Both continuous primary predictor and binary confounder independently predict higher values of outcome
- Unadjusted effect of primary predictor partly reflects effect of being in subgroup 2
- Adjustment for the confounder fixes the problem
Slide 19: Interpretation of results
- Unadjusted estimate for primary predictor (6.2)
- Estimates an observable trend in whole population
- Causal interpretation misleading in most contexts
- Adjusted estimate (3.3) may have a causal interpretation, because the effect of the confounder is not ignored
- Regression lines for subgroups 1 and 2:
- slopes estimate predictor/outcome association within each subgroup ("holding subgroup constant")
- assumed parallel (no interaction - same effect in both subgroups)
Behavior of regression coefficients for this case
- When the primary predictor and confounder are positively correlated, and both predict higher (or lower) values of the outcome, the adjusted coefficient for the primary predictor is attenuated: that is, closer to zero than the unadjusted coefficient (in this case, still non-zero and significant)
- Typical pattern for confounding
Slide 20: Another case: so-called negative confounding
- Confounding can also "mask" an independent association
- Example: needlestick injuries and HIV seroconversion
- overall, AZT prophylaxis does not predict seroconversion, but
- use of AZT associated with severity of injury
- severity of injury predicts seroconversion
- protective effect of AZT unmasked after controlling for severity of injury
Slide 21: Negative confounding: two scenarios
Negative confounding may arise between predictors that are
- Positively correlated, with opposite effects on outcome:
Example: injury severity, AZT, and seroconversion
- Negatively correlated, with similar effects on outcome:
Example: average BMI decreases with age in HERS
cohort, but both predict increased SBP
Slide 22: Summary: negative confounding
- Average BMI decreases with age in HERS cohort, but both predict increased SBP
- Adjustment for age increases BMI slope estimate from .21 to .30 mmHg per kg/m2
- Negative confounding is not all that uncommon
- Implications for predictor selection: univariate screening, "forward" selection procedures may miss some negatively confounded predictors
Slide 23: Confounding is difficult to rule out
- Were all important confounders adjusted for?
- Were they measured accurately?
- Were their effects modeled adequately?
- modeled non-linearities in response to continuous predictors (Session 6)
- no omitted interactions (Session 5)
- no gross extrapolations
- Modeling difficulties used to argue for propensity scores
Slide 24: Summary
- Confounders must be associated with predictor and independently with outcome
- Unadjusted, adjusted coefficients estimate different things
- Unadjusted association may be partly or completely explained or, conversely, unmasked after adjustment
- Regression controls for confounding by jointly modeling effects of predictor and confounders (VGSM Sect. 4.4)
- Bigger samples don't help, except by making it easier to adjust
- Controlling for covariates is easy enough, but residual confounding is difficult to rule out
- Confounders are thought to cause the primary predictor, or are correlates of such a cause
- In contrast, mediators are on the causal pathway from primary predictor to the outcome
- In models, mediation and confounding behave alike and must be distinguished on substantive grounds
- Example: to what extent is effect of BMI on SBP mediated by its effects on glucose levels?
- Use a series of models to show that:
- primary predictor independently predicts mediator
- mediator predicts outcome independently of primary predictor
- adjustment for mediator attenuates estimate for primary predictor
- The models:
- regress mediator on predictor and confounders
- regress outcome on predictor and confounders
- regress outcome on predictor, mediator, and confounder
- Interpretation of coefficient estimates for primary predictor:
- before adjustment for mediator: overall effect
- after adjustment: effect, if any, via pathways other than the mediator
- Assess mediation by difference between coefficients for primary predictor before and after adjustment for mediator
- Hypothesis tests, CIs for difference and proportion of effect explained a bit harder (see book for references)
- Example: is association of BMI with SBP mediated by glucose levels?
- BMI independently predicts higher glucose: 1.7 mg/dL (95% CI 1.4-1.9) for each kg/m2
increase in BMI
- A 10 mg/dL increase in glucose levels is independently associated with higher SBP: 0.5 mmHg (95% CI 0.3-0.7)
- Overall BMI effect: before adjustment for glucose levels, each additional kg/m2 predicts an increase of 0.25 mmHg (95% CI 0.12-0.38) in average SBP
- Direct BMI effect via other pathways: after adjustment for glucose levels, each kg/m2 predicts an increase of only 0.16 mmHg (95% CI 0.03-0.30)
- Degree of attenuation (PTE): glucose levels explain (0.25-0.16)/0.25*100 = 34% of the effect of BMI on SBP
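The proportion of treatment effect explained (PTE) is a simple function of the coefficients for the primary predictor before and after adjusting for the mediator. A minimal sketch; note that plugging in the slide's rounded coefficients (0.25 and 0.16) gives 36%, while the slide's reported 34% comes from the unrounded estimates.

```python
def pte(b_before, b_after):
    """Percent of the overall effect explained by the mediator."""
    return (b_before - b_after) / b_before * 100

# Slide's rounded BMI coefficients, before/after adjusting for glucose;
# with unrounded estimates the published figure is 34%
print(round(pte(0.25, 0.16)))  # 36 with these rounded inputs
```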
- An observational analysis even when the primary predictor is treatment in an RCT; must control for confounding of mediator effects
- Evidence for mediation potentially stronger in longitudinal data
- but when predictor is both a mediator and a confounder, fancier methods required: e.g., "marginal structural models"
- "Negative" mediation is possible: glitazones, weight, bone loss; HT, statin use, CHD events
- TZDs cause bone loss in mouse models.
- In HABC, TZD use not associated with bone loss, after controlling for confounders by indication
- TZDs also cause weight gain, which is protective against bone loss
- TZDs do predict bone loss, after controlling for weight gain: adverse effect emerges after controlling for
beneficial effect via weight gain
- In HERS, statin use differentially increased in placebo group, and controlling for this makes HT look a bit protective
- Regression coefficients change when either a confounder or a mediator is added to the model; which is which depends on how you draw the causal arrows (statistics not informative)
- Negative mediation is possible
- Must control for confounders of mediator
- Estimated independent effect of primary predictor
- before adjustment for mediator: overall effect
- after adjustment: direct effect via other pathways
(assuming both models adjust for confounders)
- Positive continuous variables are commonly log-transformed
- outcomes: to normalize and equalize variance
- predictors: to get rid of non-linearity, interaction
- more about this in Session 6
- Both log-10 (HIV viral load) and natural log transformations used
- How does this affect interpretation of regression coefficients?
- For a natural-log or log-10 transformed predictor xj, β̂j estimates the increase in the mean of the outcome for each 1-unit increase in log-transformed xj - equivalently a 2.7-fold or 10-fold increase in the untransformed value of xj
- β̂j·ln(1 + k/100) estimates the change in the mean of the outcome for each k% increase in untransformed xj
- Note: p-value for test of βj = 0 unaffected by choice of k
- Use β̂j·log10(1 + k/100) if xj is log10-transformed
- Use nlcom to get interpretable estimates with confidence intervals (lincom does not allow log() as argument)
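The k% conversion above is easy to sketch outside of Stata. A minimal example with an assumed coefficient (b_j = 2.0 per 1 ln-unit of xj is made up for illustration):

```python
import math

# Assumed coefficient for a natural-log-transformed predictor:
# increase in mean outcome per 1-unit increase in ln(x_j)
b_j = 2.0

def effect_per_pct(b, k):
    """Change in mean outcome per k% increase in untransformed x_j."""
    return b * math.log(1 + k / 100)

print(round(effect_per_pct(b_j, 10), 3))   # effect of a 10% increase
print(round(effect_per_pct(b_j, 100), 3))  # effect of a doubling: b_j*ln(2)
```

For a log10-transformed predictor, substitute math.log10 for math.log; the coefficient itself would then be per 10-fold increase in xj.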