# Title: UCSF - Simple Linear Regression


## Slide 1: Simple Linear Regression

Example
• HERS: Randomized clinical trial n=2763
• Postmenopausal women with hx of MI
• Randomized to placebo v. hormone therapy
• Outcome: Second MI
• Wealth of baseline data
• HDL associated w/ waist circumference?
• Sample of 221 subjects - baseline data
Scatterplot

## Slide 2: Why regression?

• Take as given that mean is a good summary
• How does mean HDL depend on waist circumference?
• Other methods are not ideal
• Try forming groups based on value of waist circumference...
Scatterplot: Mean & 95% CI

## Slide 3: Grouping Approach

Scatterplot - Waist Circumference
• Simplicity
• Interpretable
• Choice of groups
• More groups: resolution but variability
• Maybe important differences w/in groups
• Hard to describe
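
The grouping approach above can be sketched in a few lines of Python (the slides themselves use Stata). This is an illustration only: the data are simulated, not the HERS sample, and the function name `group_means`, the cutpoints, and the normal-approximation multiplier 1.96 are assumptions made for the example.

```python
import math
import random
import statistics

def group_means(x, y, cutpoints):
    """Mean of y and an approximate 95% CI within groups of x defined by cutpoints."""
    edges = [-math.inf] + sorted(cutpoints) + [math.inf]
    summaries = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        ys = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        if len(ys) < 2:
            continue  # a standard error needs at least 2 points
        mean = statistics.mean(ys)
        se = statistics.stdev(ys) / math.sqrt(len(ys))
        # 1.96 is a normal approximation; a t quantile is more accurate for small groups
        summaries.append({"lo": lo, "hi": hi, "n": len(ys), "mean": mean,
                          "ci": (mean - 1.96 * se, mean + 1.96 * se)})
    return summaries

# Simulated stand-in for waist circumference (cm) and HDL (mg/dL); NOT the HERS data
random.seed(0)
waist = [random.uniform(60, 130) for _ in range(221)]
hdl = [70 - 0.2 * w + random.gauss(0, 12) for w in waist]

for row in group_means(waist, hdl, cutpoints=[80, 95, 110]):
    print(row)
```

Adding cutpoints gives finer resolution, but each group then has fewer subjects and a wider interval, which is the trade-off listed above.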

## Slide 4: Linear Regression

Linear Regression
• No groups
• Mean changes continuously with predictor
• Uses all the data to predict mean at any point (borrows strength)
• Can work around linear assumption
• Has a straightforward interpretation!

## Slide 5: Linear Regression Mean

Is Line Reasonable?

## Slide 6: Example

• Study of 52 HIV+ individuals
• Recruited from the SFGH Neurology clinic
• Studied cross-sectionally
• Not on antiretroviral therapy
• HIV-RNA determined in plasma and CSF
• How are the two related?
Does line fit well?

## Slide 7: Scatterplot Smoother

Scatterplot with line
• Nonparametric method
• Draws and connects a series of local lines
• Result: flexible smooth curve
• Mean of y as a function of x: a non-linear regression line
• Useful tool for exploring association

## Slide 9: Scatterplot Smoother

• Many methods for smoothing.
• LOWESS (locally weighted scatterplot smoothing) is the most popular
• Result depends on an input that sets how smooth to make the line
• Oversmooth: linear fit
• Undersmooth: connects the dots
• Programs have defaults (80% smoothing)
• Methods work by "local" regression
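
To make "local" regression concrete, here is a simplified Python sketch of a locally weighted linear smoother. It is not Stata's `lowess` implementation: it does a single pass with tricube weights and omits the robustness iterations of the full LOWESS procedure. The function name `lowess_sketch` and the parameter `frac` (the share of points used in each local fit) are assumptions for the example, with the default set to 0.8 to mirror the 80% default mentioned above.

```python
import math
import random

def lowess_sketch(x, y, frac=0.8):
    """Single-pass locally weighted linear regression (no robustness iterations)."""
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))  # points contributing to each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weights: nearby points count most, points at or beyond h count zero
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))  # local line evaluated at x0
    return fitted

# Simulated curved data, for illustration only
random.seed(2)
x = sorted(random.uniform(0, 10) for _ in range(60))
y = [math.sin(xi) + random.gauss(0, 0.3) for xi in x]

smooth = lowess_sketch(x, y, frac=0.8)  # wide window: smoother curve
wiggly = lowess_sketch(x, y, frac=0.1)  # narrow window: follows the points closely
```

With a very small `frac` each local fit sees essentially one point and the curve connects the dots; as `frac` grows, more points share each local line and the curve moves toward a single straight-line fit.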

## Slide 10: STATA

• twoway scatter csfrna hivrna   (scatterplot)
• lowess csfrna hivrna   (lowess curve for data)
Linear regression variability

## Slide 11: Next Example

• Based on a 19 subject subsample of the HERS data
• Makes it easier to visualize data
• Effect of outliers is more vivid

## Slide 12: Scatterplot + Fitted

Scatterplot and Fitted

## Slide 13: Least Squares

Simple Linear Regression - Scatterplot
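
As a reminder of what least squares does, the fitted line is the one that minimizes the sum of squared vertical distances between the points and the line. A minimal Python sketch of the closed-form solution follows; the function names `least_squares` and `rss` and the small data set are made up for illustration.

```python
def least_squares(x, y):
    """Intercept and slope of the line minimizing the residual sum of squares."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar  # the fitted line passes through (xbar, ybar)
    return intercept, slope

def rss(x, y, intercept, slope):
    """Residual sum of squares for a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
print("fitted line:", b0, b1, "RSS:", rss(x, y, b0, b1))
# Any other line has a larger residual sum of squares
print("shifted slope RSS:", rss(x, y, b0, b1 + 0.1))
```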

## Slide 14: Effect of Outlier

Which fits better?
• Effect of an outlier is large if...
• Predictor value is far from mean
• Large residual in regression
• Relatively few values
• Same outlier has little effect in big dataset
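
The conditions above can be checked with a small simulation. The sketch below is in Python and uses simulated data, not HERS; the helper names `slope` and `slope_change`, the outlier coordinates, and the sample sizes (19 and 2000) are assumptions chosen for illustration. It adds one point to a data set and reports how much the fitted slope moves.

```python
import random

def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

def slope_change(n, outlier, seed=1):
    """Change in fitted slope when one extra (x, y) point is added to n simulated points."""
    rng = random.Random(seed)
    x = [rng.uniform(80, 110) for _ in range(n)]
    y = [70 - 0.2 * xi + rng.gauss(0, 5) for xi in x]
    return slope(x + [outlier[0]], y + [outlier[1]]) - slope(x, y)

far = (140, 90)     # predictor far from its mean, and a large residual
central = (95, 90)  # same y value, but predictor near its mean

for n in (19, 2000):
    print(n, "far:", slope_change(n, far), "central:", slope_change(n, central))
```

The far point should shift the slope much more in the 19-point sample than in the 2000-point sample, and more than the central point does in either.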

## Slide 15: Interpreting Regression

Less influence of outlier

## Slide 16: Intercept/Slope

Intercept and Slope
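
For reference, the standard interpretation of the two coefficients in a simple linear regression of an outcome $y$ on a predictor $x$ can be written as follows. The symbols $\beta_0$ and $\beta_1$ are notation introduced here, not taken from the slides.

$$
E[y \mid x] = \beta_0 + \beta_1 x
$$

$$
\beta_0 = E[y \mid x = 0], \qquad \beta_1 = E[y \mid x + 1] - E[y \mid x]
$$

So the intercept is the mean of $y$ when $x = 0$ (often an extrapolation, for example a waist circumference of 0 cm), and the slope is the difference in mean $y$ per one-unit difference in $x$.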

## Slide 18: Answers - Question 1

Question 1: How does mean HDL vary with waist circumference?

## Slide 19: Answers - Question 2

Question 2: Is the association significant?

## Slide 20: Answers - Question 3

Question 3: How much of HDL variation is explained by variation in waist circumference?
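
The three questions map onto three quantities from the regression output: the slope (Question 1), its test statistic and p-value (Question 2), and R-squared (Question 3). The Python sketch below computes them by hand on made-up data. The function name `simple_regression` is an assumption, and the p-value and confidence interval use a normal approximation rather than the t distribution that Stata reports, so they will differ somewhat in small samples.

```python
import math
from statistics import NormalDist

def simple_regression(x, y):
    """Slope, approximate 95% CI and p-value, and R-squared for y regressed on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    se_slope = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = slope / se_slope
    # Normal approximation (an assumption of this sketch); Stata uses t with n-2 df
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    ci_approx = (slope - 1.96 * se_slope, slope + 1.96 * se_slope)
    return {"intercept": intercept, "slope": slope, "se_slope": se_slope,
            "t": t_stat, "p_approx": p_approx, "ci_approx": ci_approx,
            "r_squared": 1 - rss / tss}

# Made-up data, for illustration only
result = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(name, value)
```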


## Slide 22: Sample Paragraph

There is an inverse association between waist circumference and HDL (p=0.002): each 1 cm increase in waist circumference is associated with a 0.196 mg/dL decrease in HDL (slope -0.196, 95% CI -0.32 to -0.07). Even though the relationship was statistically significant, waist circumference accounted for only 4% of the observed variance in HDL.

## Slide 23: Good Paragraph

• Doesn't focus solely on significance
• Quotes slope and 95% CI
• Perhaps R-squared
• Don't bother with sum of squares

## Slide 24: Confounding

• Is the effect of waist on HDL causal? Would a -1 cm change in a person translate to +.45?
• Maybe waist circumference reflects different populations, with different diet and exercise patterns?
• How can we isolate the effect of waist? By adjusting for diet and exercise. Solution: multiple linear regression