Multiple Predictor Linear Model
Lead Author(s): David Glidden, PhD
Slide 1: Multiple predictor linear regression
- Models dependence of the mean of continuous outcome on multiple predictors simultaneously
- By including multiple predictors we can try to
- control confounding of treatment effects by indication, risk factor effects by demographics, other covariates
- examine mediation of treatment, risk factor effects
- assess interaction of treatment effects or exposure with sex, race/ethnicity, genotype, other effect modifiers
- get at causal mechanisms in observational data
- also: account for stratified or multi-center design of RCT, increase precision of estimates
Slide 2: Components of the Linear Model
- Systematic:
- how does the average value of outcome y depend on values of the predictors?
- Random:
- at each observed value of the predictors, values of y
are distributed about the predicted average
- assumed distribution of deviations underlies
hypothesis tests, p-values, and confidence intervals
Slide 3: Systematic part of the model
- In abstract terms, model written as
- E[y|x] = β0 + β1x1 + β2x2 + ⋯ + βpxp
- E[y|x]: expected or average value of y for a given set of predictors x = x1, x2, ⋯, xp
- βj: change in average value of outcome y per unit increase in predictor xj, holding all other predictors constant
- β0 (the intercept): average value of the outcome y when all predictors = 0
- "Linear predictor" common to linear, logistic, Cox, and longitudinal models
Slide 4: Interpretation of regression coefficients
- βj: change in average value of outcome y per unit increase in predictor xj, holding all other predictors constant
- Hold x2, ..., xp constant, and let x1 = k:
- E[y|x] = β0 + β1k + β2x2 + ⋯ + βpxp (1)
- Now increase x1 by one unit to k + 1:
- E[y|x] = β0 + β1(k + 1) + β2x2 + ⋯ + βpxp (2)
- Subtracting (1) from (2) gives β1, for every value of k as well as x2, ..., xp
- Note: assumes x1 does not interact with x2, ..., xp
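The subtraction argument above can be sketched numerically. This is a minimal illustration with made-up coefficients (not from the lecture's data): increasing x1 by one unit while holding x2 fixed always changes the predicted mean by exactly β1.

```python
# Hypothetical coefficients for illustration only
b0, b1, b2 = 100.0, 2.0, -0.5  # intercept, slope for x1, slope for x2

def mean_outcome(x1, x2):
    """Systematic part of the model: E[y|x] = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

# Going from x1 = k to x1 = k + 1 with x2 held constant changes the
# predicted mean by b1, for any k and any value of x2
diff = mean_outcome(6, 30) - mean_outcome(5, 30)
print(diff)  # 2.0, i.e. b1
```

The same holds for any starting value k, which is what "holding all other predictors constant" buys you in a model without interactions.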
Slide 5: Interpretation of regression coefficients
- β0: average value of outcome y when all predictors = 0
- Let x1 = x2 = ⋯ = xp = 0. Then
- E[y|x] = β0 + β1·0 + β2·0 + ⋯ + βp·0 = β0
- Intercept: where the regression line meets the y-axis in single-predictor models
Slide 6: Review: centering predictors
- Same as in single-predictor model
- For many continuous predictors like age, SBP, LDL, no one has value 0
- Solution: center them on their sample means, so new variable has value 0 for observations at the mean
- For binary predictors, 0 is the usual coding for the reference group, so not a problem for interpretation
- With centering, \xDF0 estimates expected value of y for participant at reference level of binary predictors, mean of centered continuous predictors
- Values and interpretation of other coefficients unaffected
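The centering result can be checked with a minimal pure-Python sketch (one predictor, made-up data loosely patterned on age and SBP): the slope is identical before and after centering, while the intercept becomes the expected outcome at the mean of the predictor.

```python
# Made-up data for illustration: predictor like age, outcome like SBP
x = [50, 55, 60, 65, 70]
y = [120, 124, 126, 131, 134]

def ols(x, y):
    """Simple one-predictor OLS; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
            sum((a - xbar) ** 2 for a in x)
    return ybar - slope * xbar, slope

b0, b1 = ols(x, y)
xbar = sum(x) / len(x)
xc = [a - xbar for a in x]        # centered predictor
c0, c1 = ols(xc, y)

print(b1, c1)   # slopes identical
print(c0)       # intercept now = mean outcome at the mean predictor value
```

Only the intercept changes; the slope, its SE, and its p-value are untouched by centering.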
Slide 7: Review: rescaling predictors
- Same as in single-predictor model
- Rescaled variable Xrs = X/k
- Coefficient for Xrs interpretable as increase in mean of outcome for a k-unit increase in X
- If k = SD(X), coefficient for Xrs interpretable as increase in mean of outcome for a 1 SD increase in X
- β̂(Xrs) = kβ̂(X); SE(β̂) and 95% CI for β̂ also rescaled
- P-value for β and intercept coefficient unaffected
- Can accomplish the same thing using lincom
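The rescaling rule β̂(Xrs) = kβ̂(X) can be verified directly. A minimal pure-Python sketch with made-up data: dividing the predictor by k = 10 multiplies its estimated slope by 10.

```python
# Made-up data for illustration
x = [50, 55, 60, 65, 70]
y = [120, 124, 126, 131, 134]

def slope(x, y):
    """OLS slope for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

k = 10.0
b = slope(x, y)                        # change per 1-unit increase in X
bk = slope([a / k for a in x], y)      # change per 1-unit increase in X/k
print(b, bk)  # bk = k * b: a k-unit increase in the original X
```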
Slide 8: Random part of the model
yi = E[y|xi] + εi
- Outcome yi varies from the average at xi by an amount εi
- ε represents unmeasured sources of variation, error
- As in single-predictor model, four assumptions about ε:
1. Normally distributed
2. mean zero at every value of x
3. constant variance
4. statistically independent
- These assumptions underlie hypothesis tests, confidence intervals, p-values, also model checking
Slide 9: Assumptions about the predictors
- No distributional assumptions (e.g. Normality)
- predictors can be continuous, discrete (e.g. counts), or categorical (dichotomous, nominal, ordinal)
- Linear regression works better if
- predictors are relatively variable
- there are no excessively "influential" points
- Predictors assumed measured without error (otherwise "regression dilution bias" and residual confounding)
Slide 10: Update of two details
- Fitted value: ŷi = β̂0 + β̂1xi1 + ⋯ + β̂pxip
- estimated average or expected value of outcome y when x = xi, the predictor values for observation i
- now depends on multiple predictors instead of just one
- Residual: ri = yi − ŷi = ε̂i
- difference between data point and fitted value
- sample analogue of εi, used in checking model fit
- not obvious what "vertical" means with multiple predictors
Slide 11: Ordinary least squares (OLS)
- Method for fitting linear regression models
- OLS finds values of regression coefficients which minimize the residual sum of squares (RSS; i.e. the sum of squared residuals)
- Good statistical properties: unbiased, efficient, easy to compute, but sensitive to outliers
- For normally distributed outcomes, OLS is equivalent to "maximum likelihood" (the method used for logistic, Cox, some repeated measures, and many other models)
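The "minimizes RSS" claim can be made concrete with a pure-Python sketch on made-up data: compute the OLS solution in closed form, then check that nudging the coefficients in any direction can only increase the residual sum of squares.

```python
# Made-up data, roughly y = 2x, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

def rss(b0, b1):
    """Residual sum of squares for candidate coefficients."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Closed-form OLS solution for one predictor
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

best = rss(b0, b1)
# Any perturbation of the OLS coefficients does strictly worse
for db0 in (-0.1, 0.1):
    for db1 in (-0.1, 0.1):
        assert rss(b0 + db0, b1 + db1) > best
print(round(b0, 3), round(b1, 3))
```

RSS is a convex (quadratic) function of the coefficients, so the OLS solution is the unique global minimum whenever the predictor is not constant.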
Slide 12: Multi-predictor linear model for glucose
- Upper left (ANOVA table)
- Total SS = Σi(yi − ȳ)²: variability of outcome yi about the sample average ȳ
- Total MS = Σi(yi − ȳ)²/(n − 1): sample variance of outcome y
- Model SS = Σi(ŷi − ȳ)²: variability of outcome accounted for by predictors included in model
- Model MS = Model SS/p: numerator of model F-statistic
- Residual SS = Σi(yi − ŷi)²: residual variability not accounted for by predictors, what OLS minimizes
- Residual MS = Σi(yi − ŷi)²/(n − p − 1): sample variance of residuals
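The ANOVA decomposition Total SS = Model SS + Residual SS can be checked numerically. A pure-Python sketch on made-up data, fitting a one-predictor OLS model and computing all three sums of squares:

```python
# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

# One-predictor OLS fit
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]

tss = sum((yi - ybar) ** 2 for yi in y)                 # Total SS
mss = sum((fi - ybar) ** 2 for fi in yhat)              # Model SS
res = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))    # Residual SS

print(round(tss, 6), round(mss + res, 6))  # equal: the ANOVA identity
print(round(mss / tss, 3))                 # R-squared
```

The identity holds exactly for any OLS fit that includes an intercept; Model SS / Total SS is the familiar R².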
Slide 13: Interpreting Stata regression output
Slide 14: Summary of model
- Multipredictor linear regression is a tool for estimating how the average value of a continuous outcome depends
on multiple predictors simultaneously
- Inferential machinery evaluates precision of estimates and whether sampling error can account for findings
- Coefficients generally interpretable as the change in the average value of the outcome per unit increase in the predictor, holding all other predictors constant
- Power helped by effect size, sample size, variability of predictor; hurt by correlation with other predictors,
variability left unexplained
Slide 15: Confounding
- Can account for some or all of the unadjusted association between a predictor and an outcome
- Controlling confounding the primary reason for doing multi-predictor regression
- Confounders must be associated with predictor and independently with outcome
- Only an association adjusted for confounders can be viewed as possibly causal
Slide 16: Unadjusted waist/glucose association
Slide 17: Adjusted waist/glucose association
Slide 18: Primary predictor, confounder, and outcome
Adjusting for a confounder
- Primary predictor and confounder are correlated:
- values of primary predictor larger in subgroup 2 than subgroup 1
- conversely, those with larger values of primary predictor more likely in subgroup 2
- Both continuous primary predictor and binary confounder independently predict higher values of outcome
- Unadjusted effect of primary predictor partly reflects effect of being in subgroup 2
- Adjustment for the confounder fixes the problem
Slide 19: Interpretation of results
- Unadjusted estimate for primary predictor (6.2)
- Estimates an observable trend in whole population
- Causal interpretation misleading in most contexts
- Adjusted estimate (3.3) may have a causal interpretation, because the effect of the confounder is not ignored
- Regression lines for subgroups 1 and 2:
- slopes estimate predictor/outcome association within each subgroup ("holding subgroup constant")
- assumed parallel (no interaction - same effect in both subgroups)
Behavior of regression coefficients for this case
- When the primary predictor and confounder are positively correlated, and both predict higher (or lower) values of the outcome, the adjusted coefficient for the primary predictor is attenuated: that is, closer to zero than the unadjusted coefficient (in this case, still non-zero and significant)
- Typical pattern for confounding
Slide 20: Another case: so-called negative confounding
- Confounding can also "mask" an independent association
- Example: needlestick injuries and HIV seroconversion
- overall, AZT prophylaxis does not predict seroconversion, but
- use of AZT associated with severity of injury
- severity of injury predicts seroconversion
- protective effect of AZT unmasked after controlling for severity of injury
Slide 21: Negative confounding: two scenarios
Negative confounding may arise between predictors that are
- Positively correlated, with opposite effects on outcome:
Example: injury severity, AZT, and seroconversion
- Negatively correlated, with similar effects on outcome:
Example: average BMI decreases with age in HERS
cohort, but both predict increased SBP
Slide 22: Summary: negative confounding
- Average BMI decreases with age in HERS cohort, but both predict increased SBP
- Adjustment for age increases BMI slope estimate from .21 to .30 mmHg per kg/m2
- Negative confounding is not all that uncommon
- Implications for predictor selection: univariate screening, "forward" selection procedures may miss some negatively confounded predictors
Slide 23: Confounding is difficult to rule out
- Were all important confounders adjusted for?
- Were they measured accurately?
- Were their effects modeled adequately?
- modeled non-linearities in response to continuous predictors (Session 6)
- no omitted interactions (Session 5)
- no gross extrapolations
- Modeling difficulties used to argue for propensity scores
Slide 24: Summary
- Confounders must be associated with predictor and independently with outcome
- Unadjusted, adjusted coefficients estimate different things
- Unadjusted association may be partly or completely explained or, conversely, unmasked after adjustment
- Regression controls for confounding by jointly modeling effects of predictor and confounders (VGSM Sect. 4.4)
- Bigger samples don't help, except by making it easier to adjust
- Controlling for covariates is easy enough, but residual confounding is difficult to rule out
- Confounders are thought to cause the primary predictor, or are correlates of such a cause
- In contrast, mediators are on the causal pathway from primary predictor to the outcome
- In models, mediation and confounding behave alike and must be distinguished on substantive grounds
- Example: to what extent is effect of BMI on SBP mediated by its effects on glucose levels?
- Use a series of models to show that:
- primary predictor independently predicts mediator
- mediator predicts outcome independently of primary predictor
- adjustment for mediator attenuates estimate for primary predictor
- The models:
- regress mediator on predictor and confounders
- regress outcome on predictor and confounders
- regress outcome on predictor, mediator, and confounder
- Interpretation of coefficient estimates for primary predictor:
- before adjustment for mediator: overall effect
- after adjustment: effect, if any, via pathways other than the mediator
- Assess mediation by difference between coefficients for primary predictor before and after adjustment for mediator
- Hypothesis tests, CIs for difference and proportion of effect explained a bit harder (see book for references)
- Example: is association of BMI with SBP mediated by glucose levels?
- BMI independently predicts higher glucose: 1.7 mg/dL (95% CI 1.4-1.9) for each kg/m2
increase in BMI
- A 10 mg/dL increase in glucose levels is independently associated with higher SBP: 0.5 mmHg (95% CI 0.3-0.7)
- Overall BMI effect: before adjustment for glucose levels, each additional kg/m2 predicts an increase of 0.25 mmHg (95% CI 0.12-0.38) in average SBP
- Direct BMI effect via other pathways: after adjustment for glucose levels, each kg/m2 predicts an increase of only 0.16 mmHg (95% CI 0.03-0.30)
- Degree of attenuation (PTE): glucose levels explain (0.25-0.16)/0.25*100 = 34% of the effect of BMI on SBP
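The proportion of treatment effect explained (PTE) is a simple function of the coefficients for the primary predictor before and after adjusting for the mediator. A minimal sketch; note that plugging in the slide's rounded coefficients (0.25 and 0.16) gives 36%, while the slide's reported 34% comes from the unrounded estimates.

```python
def pte(b_before, b_after):
    """Percent of the overall effect explained by the mediator."""
    return (b_before - b_after) / b_before * 100

# Slide's rounded BMI coefficients, before/after adjusting for glucose;
# with unrounded estimates the published figure is 34%
print(round(pte(0.25, 0.16)))  # 36 with these rounded inputs
```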
- An observational analysis even when the primary predictor is treatment in an RCT; must control for confounding of mediator effects
- Evidence for mediation potentially stronger in longitudinal data
- but when predictor is both a mediator and a confounder, fancier methods required: e.g., "marginal structural models"
- "Negative" mediation is possible: glitazones, weight, bone loss; HT, statin use, CHD events
- TZDs cause bone loss in mouse models.
- In HABC, TZD use not associated with bone loss, after controlling for confounders by indication
- TZDs also cause weight gain, which is protective against bone loss
- TZDs do predict bone loss, after controlling for weight gain: adverse effect emerges after controlling for
beneficial effect via weight gain
- In HERS, statin use differentially increased in placebo group, and controlling for this makes HT look a bit protective
- Regression coefficients change when either a confounder or a mediator is added to the model; which is which depends on how you draw the causal arrows (statistics not informative)
- Negative mediation is possible
- Must control for confounders of mediator
- Estimated independent effect of primary predictor
- before adjustment for mediator: overall effect
- after adjustment: direct effect via other pathways
(assuming both models adjust for confounders)
- Positive continuous variables are commonly log-transformed
- outcomes: to normalize and equalize variance
- predictors: to get rid of non-linearity, interaction
- more about this in Session 6
- Both log-10 (HIV viral load) and natural log transformations used
- How does this affect interpretation of regression coefficients?
- For a natural-log or log-10 transformed predictor xj, β̂j estimates the increase in the mean of the outcome for each 1-unit increase in log-transformed xj - equivalently a 2.7-fold or 10-fold increase in the untransformed value of xj
- β̂j·ln(1 + k/100) estimates the change in the mean of the outcome for each k% increase in untransformed xj
- Note: p-value for test of βj = 0 unaffected by choice of k
- Use β̂j·log10(1 + k/100) if xj is log10-transformed
- Use nlcom to get interpretable estimates with confidence intervals (lincom does not allow log() as argument)
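The k% conversion above is easy to sketch outside of Stata. A minimal example with an assumed coefficient (b_j = 2.0 per 1 ln-unit of xj is made up for illustration):

```python
import math

# Assumed coefficient for a natural-log-transformed predictor:
# increase in mean outcome per 1-unit increase in ln(x_j)
b_j = 2.0

def effect_per_pct(b, k):
    """Change in mean outcome per k% increase in untransformed x_j."""
    return b * math.log(1 + k / 100)

print(round(effect_per_pct(b_j, 10), 3))   # effect of a 10% increase
print(round(effect_per_pct(b_j, 100), 3))  # effect of a doubling: b_j*ln(2)
```

For a log10-transformed predictor, substitute math.log10 for math.log; the coefficient itself would then be per 10-fold increase in xj.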