  Title: UCSF - Simple Linear Regression 
 Lead Author(s): David Glidden, PhD
  Slide 1: Simple Linear Regression 
 Example  
-  HERS: Randomized clinical trial n=2763
  -  Postmenopausal women with history of MI
  -  Randomized to placebo v. hormone therapy
  -  Outcome: Second MI
  -  Wealth of baseline data
  -  Question: is HDL associated with waist circumference?
  -  Sample of 221 subjects - baseline data
 
 
Scatterplot 
  Slide 2: Why regression? 
 
-  Take as given that mean is a good summary
  -  How does mean HDL depend on waist circumference?
  -  Other methods are not ideal
  -  Try forming groups based on value of waist circumference...
 
 
Scatterplot: Mean & 95% CI 
  Slide 3: Grouping Approach 
 Scatterplot - Waist Circumference  
-  Advantages:   
  -  Disadvantages:  
-  Choice of groups
  -  More groups: finer resolution but more variability in each group mean
  -  May hide important differences within groups
  -  Hard to describe
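
The grouping approach can be sketched in a few lines. This is a minimal illustration, not the analysis behind the slide: the (waist, HDL) pairs are invented, `group_summaries` is a hypothetical helper, and the 95% CI uses a normal approximation that is too narrow for groups this small.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical (waist in cm, HDL in mg/dL) pairs, invented for illustration.
data = [(70, 62), (75, 58), (78, 55), (82, 60), (85, 50), (88, 52),
        (92, 49), (95, 47), (99, 51), (104, 44), (110, 46), (118, 41)]

def group_summaries(pairs, n_groups=4):
    """Split pairs into equal-sized groups by predictor; summarize the outcome in each."""
    ordered = sorted(pairs)                      # sort by waist circumference
    size = len(ordered) // n_groups
    summaries = []
    for g in range(n_groups):
        # The last group absorbs any leftover observations.
        end = (g + 1) * size if g < n_groups - 1 else len(ordered)
        chunk = ordered[g * size:end]
        ys = [y for _, y in chunk]
        m = mean(ys)
        half_width = 1.96 * stdev(ys) / sqrt(len(ys))   # normal approximation
        summaries.append({
            "waist_range": (chunk[0][0], chunk[-1][0]),
            "n": len(ys),
            "mean_hdl": m,
            "ci": (m - half_width, m + half_width),
        })
    return summaries

summaries = group_summaries(data)
for s in summaries:
    print(s)
```

Changing `n_groups` shows the trade-off listed above: more groups follow the data more closely, but each mean rests on fewer observations and its interval widens.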
 
 
 
 
  Slide 4: Linear Regression 
 Linear Regression  
-  No groups
  -  Mean changes continuously with predictor
  -  Uses all the data to predict mean at any point (borrows strength)
  -  Can work around linear assumption
  -  Has a straightforward interpretation!
 
 
(See also: Linear Regression) 
  Slide 5: Linear Regression Mean 
 Is Line Reasonable? 
  Slide 6: Example 
  
  
-  Study of 52 HIV+ individuals
  -  Recruited from the SFGH Neurology clinic
  -  Studied cross-sectionally
  -  Not on antiretroviral therapy
  -  HIV-RNA determined in plasma and CSF
  -  How are the two related?
 
 
Does line fit well? 
  Slide 7: Scatterplot Smoother 
 Scatterplot with line  
-  Nonparametric method
  -  Draws and connects a series of local lines
  -  Result: flexible smooth curve
  -  Estimates the mean of y as a function of x: a non-linear regression line
  -  Useful tool for exploring association
 
 
  Slide 8: Suppose - Let's consider 4 equal-sized groups 
  
  Slide 9: Scatterplot Smoother 
 
-  Many methods for smoothing.
  -  LOWESS (locally weighted scatterplot smoothing) is the most popular
  -  Requires an input specifying how smooth to make the curve
  -  Oversmooth: linear fit
  -  Undersmooth: connects the dots
  -  Programs have defaults (e.g., Stata's lowess uses a bandwidth of 0.8, i.e. 80% of the data in each local fit)
  -  Methods work by "local" regression
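
To make "local regression" concrete, here is a simplified sketch of the idea. It is not Stata's `lowess` implementation: the function name `lowess_sketch`, the tricube weights, and the example data are assumptions made for illustration, and the robustness iterations used by full LOWESS are left out.

```python
def lowess_sketch(xs, ys, frac=0.8):
    """Weighted straight-line fit around each observed x; returns the fitted values."""
    n = len(xs)
    k = min(n, max(2, round(frac * n)))       # points in each local neighbourhood
    fitted = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[k - 1] or 1e-12     # distance to the k-th nearest point
        # Tricube weights: near points count most, points at distance >= h count zero.
        ws = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(ws)
        xbar = sum(w * x for w, x in zip(ws, xs)) / sw
        ybar = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted

# Hypothetical data, invented for illustration.
xs = [float(i) for i in range(10)]
ys = [2.0, 2.9, 3.1, 4.5, 4.4, 5.8, 5.1, 6.9, 7.2, 7.0]

smooth = lowess_sketch(xs, ys, frac=0.8)       # smoother curve
wiggly = lowess_sketch(xs, ys, frac=0.2)       # tiny neighbourhoods: follows the dots
```

With `frac=0.2` on ten points, each neighbourhood contains only the point itself with non-zero weight, so the curve simply connects the dots. Larger values of `frac` average over more of the data.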
 
 
 
-  twoway scatter csfrna hivrna   (scatterplot)
  -  lowess csfrna hivrna   (lowess curve for data)
  -  Menu: Graphics, Twoway Graphs
 
 
Linear regression variability 
  Slide 11: Next Example 
 
-  Based on a 19 subject subsample of the HERS data
  -  Makes it easier to visualize data
  -  Effect of outliers is more vivid
 
 
  Slide 12: Scatterplot + Fitted 
 Scatterplot and Fitted 
  Slide 13: Least Squares 
 Simple Linear Regression - Scatterplot 
 (See also: Least Squares) 
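
As a sketch of what least squares computes: the fitted line is the one that minimizes the sum of squared vertical distances (residuals) between the points and the line. The closed-form formulas below are the standard ones for simple linear regression. The data and the names `least_squares` and `rss` are invented for illustration.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar        # the line passes through (xbar, ybar)
    return intercept, slope

def rss(xs, ys, intercept, slope):
    """Residual sum of squares for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical waist (cm) and HDL (mg/dL) values, invented for illustration.
waist = [75, 80, 85, 90, 95, 100, 105, 110]
hdl = [58, 55, 57, 50, 52, 47, 49, 44]

b0, b1 = least_squares(waist, hdl)
best = rss(waist, hdl, b0, b1)
# Any other line, e.g. one with a slightly different slope, has a larger RSS.
worse = rss(waist, hdl, b0, b1 + 0.05)
print(b0, b1, best, worse)
```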
  Slide 14: Effect of Outlier 
 Which fits better?  
-  Influence of an outlier is large if...
  -  Its predictor value is far from the mean
  -  It has a large residual in the regression
  -  The dataset has relatively few observations
  -  Same outlier has little effect in big dataset
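
These conditions can be checked numerically. The sketch below uses invented data lying exactly on a line with slope -0.2 and adds a single outlier. The point values and the helper `fit_slope` are assumptions for illustration only.

```python
def fit_slope(points):
    """Least-squares slope for a list of (x, y) points."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    return sxy / sxx

# Hypothetical data on the line HDL = 60 - 0.2 * waist.
base_x = list(range(80, 100, 2))                 # 10 waist values, mean 89
small = [(x, 60 - 0.2 * x) for x in base_x]      # 10 observations
large = small * 100                              # same pattern, 1000 observations

outlier_far = (130, 70)    # far from mean waist, large residual
outlier_mid = (89, 70)     # large residual, but exactly at mean waist

base = fit_slope(small)
shift_small = abs(fit_slope(small + [outlier_far]) - base)
shift_large = abs(fit_slope(large + [outlier_far]) - base)
shift_mid = abs(fit_slope(small + [outlier_mid]) - base)
print(shift_small, shift_large, shift_mid)
```

In this constructed example the far outlier moves the slope much more in the 10-point dataset than in the 1000-point one, and the outlier placed at the mean of the predictor leaves the slope unchanged (it shifts only the intercept).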
 
 
  Slide 15: Interpreting Regression 
 Less influence of outlier 
  Slide 16: Intercept/Slope 
 Intercept and Slope 
  Slide 17: Questions a Regression Can Answer 
 Questions a regression can answer 
  Slide 18: Answers - Question 1 
Question 1: How does mean HDL vary with waist circumference?
Question 1 Answer
  Slide 19: Answers - Question 2 
Question 2: Is the association significant?
Question 2 Answer 
  Slide 20: Answers - Question 3 
Question 3: How much of HDL variation is explained by variation in waist circumference?
Question 3 Answer 
  Slide 21: Answers - Question 3 
 Question 3 
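
The "variation explained" in Question 3 is usually reported as R-squared: one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch follows, with invented data and a hypothetical helper name `r_squared`. In simple linear regression this quantity equals the squared correlation between the two variables, which the code also computes.

```python
from math import sqrt

def r_squared(xs, ys):
    """Return (R-squared, correlation) for a simple linear regression of ys on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)            # total sum of squares
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - resid_ss / syy, sxy / sqrt(sxx * syy)

# Hypothetical waist (cm) and HDL (mg/dL) values, invented for illustration.
waist = [75, 80, 85, 90, 95, 100, 105, 110]
hdl = [58, 45, 57, 50, 62, 47, 49, 44]

r2, r = r_squared(waist, hdl)
print(r2, r ** 2)
```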
  Slide 22: Sample Paragraph 
There is an inverse association between waist circumference and HDL (p=0.002), with each one-cm increase in waist circumference associated with a change in HDL of -0.196 mg/dL, 95% CI (-0.32, -0.07). Even though the relationship was significant, waist circumference accounted for only 4% of the observed variance in HDL.
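
The numbers in the sample paragraph can be checked against each other. The sketch below backs out an approximate standard error from the quoted 95% CI and recomputes the p-value. It assumes a normal approximation (reasonable for a sample of 221) and works from rounded published figures, so it only confirms that the values are mutually consistent.

```python
from statistics import NormalDist

slope = -0.196                      # mg/dL per cm, from the sample paragraph
ci_low, ci_high = -0.32, -0.07      # quoted 95% CI

z975 = NormalDist().inv_cdf(0.975)              # about 1.96
se = (ci_high - ci_low) / (2 * z975)            # CI width = 2 * z * SE
z = slope / se                                  # test statistic for slope = 0
p = 2 * NormalDist().cdf(-abs(z))               # two-sided p-value

print(f"SE ~ {se:.3f}, z ~ {z:.2f}, p ~ {p:.4f}")
```

The recomputed p-value lands close to the reported p=0.002, and the interval excludes zero, which is the same conclusion stated two ways.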
  Slide 23: Good Paragraph 
 
-  Doesn't focus solely on significance
  -  Quotes slope and 95% CI
  -  Perhaps R-squared
  -  Don't bother with sum of squares
 
 
  Slide 24: Confounding 
 
-  Is the effect of waist on HDL causal? Would a 1 cm decrease in a person's waist translate to a +0.45 change in HDL?
  -  Maybe waist circumference reflects different populations, with different diet and exercise patterns?
  -  How can we isolate the effect of waist? By adjusting for diet and exercise. Solution: multiple linear regression
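
A toy illustration of why adjustment matters. All values are invented: a hypothetical `exercise` score lowers waist and raises HDL, while waist has no direct effect on HDL by construction. The adjusted slope is obtained by removing the part of each variable explained by exercise and regressing the residuals on each other, which gives the same waist coefficient as a multiple regression on waist and exercise.

```python
def fit(xs, ys):
    """Least-squares (intercept, slope) of ys on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def residuals(xs, ys):
    """What is left of ys after removing its linear dependence on xs."""
    a, b = fit(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data: exercise drives both waist and HDL.
exercise = list(range(10))
wiggle = [1, -1] * 5                                     # waist variation unrelated to HDL
waist = [100 - 2 * e + u for e, u in zip(exercise, wiggle)]
hdl = [40 + 1.5 * e for e in exercise]                   # no direct waist effect

_, crude = fit(waist, hdl)                               # unadjusted slope
_, adjusted = fit(residuals(exercise, waist),            # slope adjusted for exercise
                  residuals(exercise, hdl))
print(crude, adjusted)
```

In this constructed example the crude slope is clearly negative even though waist has no effect of its own, and the adjusted slope is zero up to rounding error.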
 
 
(See also: Bias or Confounding) 
  Slide 25: Summary 
 
-  Linear regression is a powerful tool for interpreting associations
  -  Extends familiar methods (t-test, ANOVA)
  -  Relies on a series of assumptions (linearity, normality), all of which can be relaxed