Title: UCSF - Simple Linear Regression
Lead Author(s): David Glidden, PhD
Slide 1: Simple Linear Regression
Example
- HERS: Randomized clinical trial n=2763
- Postmenopausal women with history of MI
- Randomized to placebo v. hormone therapy
- Outcome: Second MI
- Wealth of baseline data
- HDL associated w/ waist circumference?
- Sample of 221 subjects - baseline data
Scatterplot
Slide 2: Why regression?
- Take as given that mean is a good summary
- How does mean HDL depend on waist circumference?
- Other methods are not ideal
- Try forming groups based on value of waist circumference...
Scatterplot: Mean & 95% CI
Slide 3: Grouping Approach
Scatterplot - Waist Circumference
- Advantages:
- Disadvantages:
- Choice of groups
- More groups: finer resolution but more variability
- Maybe important differences w/in groups
- Hard to describe
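The grouping approach can be sketched in a few lines. This is a minimal illustration in Python (not the Stata used elsewhere in these slides), run on synthetic data rather than the HERS sample. The function name `grouped_means`, the simulated coefficients, and the normal-approximation interval (1.96 x SE) are all assumptions made for the example.

```python
import math
import random
import statistics

def grouped_means(x, y, n_groups=4):
    """Split (x, y) pairs into equal-sized groups by x and summarize y in each."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_groups
    summaries = []
    for g in range(n_groups):
        # The last group absorbs any leftover observations.
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[g * size:end]
        ys = [p[1] for p in chunk]
        mean = statistics.mean(ys)
        se = statistics.stdev(ys) / math.sqrt(len(ys))
        # Normal-approximation interval; a t quantile would be slightly wider.
        summaries.append({
            "x_min": chunk[0][0], "x_max": chunk[-1][0], "n": len(ys),
            "mean": mean, "ci": (mean - 1.96 * se, mean + 1.96 * se),
        })
    return summaries

# Synthetic data, NOT the HERS sample: waist in cm, HDL in mg/dL.
random.seed(0)
waist = [random.uniform(60, 130) for _ in range(221)]
hdl = [65 - 0.2 * w + random.gauss(0, 12) for w in waist]

for s in grouped_means(waist, hdl, n_groups=4):
    print(f"waist {s['x_min']:.0f}-{s['x_max']:.0f} cm (n={s['n']}): "
          f"mean HDL {s['mean']:.1f}, "
          f"95% CI ({s['ci'][0]:.1f}, {s['ci'][1]:.1f})")
```

Changing `n_groups` shows the trade-off listed above: more groups follow the data more closely, but each mean rests on fewer observations and its interval widens.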
Slide 4: Linear Regression
- No groups
- Mean changes continuously with predictor
- Uses all the data to predict mean at any point (borrows strength)
- Can work around linear assumption
- Has a straightforward interpretation!
(See also: Linear Regression)
Slide 5: Linear Regression Mean
Is Line Reasonable?
Slide 6: Example
- Study of 52 HIV+ individuals
- Recruited from the SFGH Neurology clinic
- Studied cross-sectionally
- Not on antiretroviral therapy
- HIV-RNA determined in plasma and CSF
- How are the two related?
Does line fit well?
Slide 7: Scatterplot Smoother
Scatterplot with line
- Nonparametric method
- Draws and connects a series of local lines
- Result: flexible smooth curve
- Mean of y as a function of x: a non-linear regression line
- Useful tool for exploring association
Slide 8: Suppose - Let's consider 4 equal-sized groups
Slide 9: Scatterplot Smoother
- Many methods for smoothing.
- LOWESS (locally weighted scatterplot smoothing) is the most popular
- How smooth the line is depends on user inputs
- Oversmooth: linear fit
- Undersmooth: connects the dots
- Programs have defaults (80% smoothing)
- Methods work by "local" regression
- twoway scatter csfrna hivrna (scatterplot)
- lowess csfrna hivrna (lowess curve for the data)
- Menu: Graphics, Twoway Graphs
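To make the idea of "local" regression concrete, here is a simplified sketch in Python. It is not Stata's `lowess` implementation: it uses tricube weights and a local straight-line fit at each point, with no robustness iterations, and the function name `lowess` and the parameter `frac` are choices made for this example. `frac=0.8` is meant to echo the 80% default mentioned above.

```python
import math

def lowess(x, y, frac=0.8):
    """Smooth y at each x with a locally weighted straight-line fit."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # neighbours considered in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        dists = sorted(abs(xi - x0) for xi in x)
        h = max(dists[k - 1], 1e-12)
        # Tricube weights: near points count most, points beyond h count zero.
        w = [(1 - min(abs(xi - x0) / h, 1) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted

# Synthetic curved data, for illustration only.
xs = [i / 2 for i in range(21)]
ys = [math.log(1 + v) + 0.1 * math.sin(5 * v) for v in xs]

smooth = lowess(xs, ys, frac=0.8)
wiggly = lowess(xs, ys, frac=0.05)  # tiny window: reproduces the raw points
for v, s, u in zip(xs[:5], smooth, wiggly):
    print(f"x={v:.1f}  frac=0.8: {s:.3f}  frac=0.05: {u:.3f}")
```

With a very small `frac` each local fit sees only the point itself, so the curve connects the dots. Larger values average over more neighbours and give a smoother curve.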
Linear regression variability
Slide 11: Next Example
- Based on a 19 subject subsample of the HERS data
- Makes it easier to visualize data
- Effect of outliers is more vivid
Slide 12: Scatterplot + Fitted
Scatterplot and Fitted
Slide 13: Least Squares
Simple Linear Regression - Scatterplot
(See also: Least Squares)
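As a minimal sketch of what least squares does, the Python below computes the intercept and slope that minimize the sum of squared residuals, using the standard closed-form formulas. The function names and the four data points are made up for illustration.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

a, b = least_squares(x, y)
best = sum_squared_residuals(x, y, a, b)
other = sum_squared_residuals(x, y, a, b + 0.1)  # a slightly different line
print(f"intercept={a:.2f}, slope={b:.2f}, SSR={best:.2f}")
print(f"SSR with slope nudged by 0.1: {other:.2f}")
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the fitted line's.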
Slide 14: Effect of Outlier
Which fits better?
- Effect of an outlier is large if...
- Predictor value is far from mean
- Large residual in regression
- Relatively few values
- Same outlier has little effect in big dataset
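The conditions above can be checked numerically. The sketch below (Python, with made-up noise-free data whose true slope is -0.2) adds a single outlier and reports how much the fitted slope moves. The helper names and the outlier coordinates are assumptions for the example.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

def slope_shift(x, y, outlier):
    """Change in slope caused by adding one (x, y) outlier to the data."""
    ox, oy = outlier
    return slope(x + [ox], y + [oy]) - slope(x, y)

# Made-up data lying exactly on the line y = 60 - 0.2 x.
small_x = [float(v) for v in range(80, 99)]        # 19 subjects
small_y = [60 - 0.2 * v for v in small_x]
big_x = [80 + 18 * i / 1899 for i in range(1900)]  # 1900 subjects, same range
big_y = [60 - 0.2 * v for v in big_x]

far = (130.0, 90.0)      # predictor far from the mean, large residual
central = (89.0, 90.0)   # predictor at the mean, large residual

print("small data, far outlier:    ", round(slope_shift(small_x, small_y, far), 3))
print("small data, central outlier:", round(slope_shift(small_x, small_y, central), 3))
print("big data, far outlier:      ", round(slope_shift(big_x, big_y, far), 3))
```

In this setup the far outlier changes the slope substantially in the 19-subject data, barely moves it in the large data, and an outlier sitting at the mean of the predictor leaves the slope unchanged even though its residual is large.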
Slide 15: Interpreting Regression
Less influence of outlier
Slide 16: Intercept/Slope
Intercept and Slope
Slide 17: Questions a Regression Can Answer
Slide 18: Answers - Question 1
Question 1: How does mean HDL vary with waist circumference?
Question 1 Answer
Slide 19: Answers - Question 2
Question 2: Is the association significant?
Question 2 Answer
Slide 20: Answers - Question 3
Question 3: How much of HDL variation is explained by variation in waist circumference?
Question 3 Answer
Slide 21: Answers - Question 3
Question 3
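The three questions map onto three quantities from a fitted regression: the slope (with its confidence interval), a test of whether the slope is zero, and R-squared. The Python sketch below computes them on synthetic data, not the HERS sample. The function name `simple_regression` and the simulated coefficients are assumptions, and the p-value and interval use a normal approximation that is reasonable only for large samples; statistical software would use the t distribution with n-2 degrees of freedom.

```python
import math
import random
from statistics import NormalDist

def simple_regression(x, y):
    """Slope, approximate 95% CI, approximate p-value, and R-squared."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    # Normal approximation to the t distribution (large-sample only).
    p_approx = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    return {
        "intercept": intercept,
        "slope": slope,                                   # Question 1
        "se": se,
        "ci": (slope - 1.96 * se, slope + 1.96 * se),
        "p_approx": p_approx,                             # Question 2
        "r2": 1 - sse / sst,                              # Question 3
    }

# Synthetic data with made-up coefficients: waist in cm, HDL in mg/dL.
random.seed(2)
waist = [random.uniform(60, 130) for _ in range(221)]
hdl = [65 - 0.2 * w + random.gauss(0, 12) for w in waist]

res = simple_regression(waist, hdl)
print(f"slope {res['slope']:.3f} mg/dL per cm, "
      f"95% CI ({res['ci'][0]:.3f}, {res['ci'][1]:.3f})")
print(f"approximate p-value {res['p_approx']:.4f}, R-squared {res['r2']:.3f}")
```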
Slide 22: Sample Paragraph
There is an inverse association between waist circumference and HDL (p=0.002), with each 1 cm increase in waist circumference associated with a 0.196 mg/dL decrease in HDL (slope -0.196, 95% CI -0.32 to -0.07). Even though the relationship was significant, waist circumference accounted for only 4% of the observed variance in HDL.
Slide 23: Good Paragraph
- Doesn't focus solely on significance
- Quotes slope and 95% CI
- Perhaps R-squared
- Don't bother with sum of squares
Slide 24: Confounding
- Is the effect of waist on HDL causal? Would a -1 cm change in a person translate to +.45?
- Maybe waist circumference reflects different populations, with different diet and exercise patterns?
- How can we isolate the effect of waist? By adjusting for diet and exercise. Solution: Multiple Linear Regression
(See also: Bias or Confounding)
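A small simulation can illustrate why adjustment matters. In the Python sketch below, exercise (a made-up confounder) lowers waist circumference and raises HDL, so the crude slope of HDL on waist overstates the direct effect of waist. The adjusted slope is obtained by regressing the residuals of HDL given exercise on the residuals of waist given exercise, which gives the same waist coefficient as a multiple regression on waist and exercise. All variable names and coefficients are invented for the example.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def residuals(x, y):
    """Part of y not explained by a straight-line fit on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Synthetic data: true direct effect of waist on HDL is -0.05 per cm.
random.seed(1)
exercise = [random.uniform(0, 10) for _ in range(500)]
waist = [100 - 2 * z + random.gauss(0, 5) for z in exercise]
hdl = [50 + 0.5 * z - 0.05 * w for z, w in zip(exercise, waist)]

_, crude = fit_line(waist, hdl)
_, adjusted = fit_line(residuals(exercise, waist), residuals(exercise, hdl))
print(f"crude slope:    {crude:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

The crude slope mixes the effect of waist with the effect of exercise, while the adjusted slope recovers the value built into the simulation.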
Slide 25: Summary
- Linear regression is a powerful tool for interpreting associations
- Extends familiar methods (t-test, ANOVA)
- Series of assumptions (linearity, normality), all of which can be relaxed
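One way to see the link to the t-test: when the predictor is a 0/1 group indicator, the least-squares intercept is the mean of group 0 and the slope is the difference in group means. The Python sketch below checks this on made-up numbers; the variable names are assumptions for the example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up HDL values for two groups coded 0 and 1.
group = [0, 0, 0, 1, 1, 1, 1]
hdl = [40.0, 50.0, 60.0, 45.0, 55.0, 65.0, 75.0]

intercept, slope = fit_line(group, hdl)
mean0 = sum(h for g, h in zip(group, hdl) if g == 0) / group.count(0)
mean1 = sum(h for g, h in zip(group, hdl) if g == 1) / group.count(1)

print(f"intercept {intercept:.1f} vs group 0 mean {mean0:.1f}")
print(f"slope {slope:.1f} vs difference in means {mean1 - mean0:.1f}")
```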