- HERS (Heart and Estrogen/progestin Replacement Study): randomized clinical trial, n = 2763
- Postmenopausal women with a history of MI
- Randomized to placebo vs. hormone therapy
- Outcome: Second MI
- Wealth of baseline data
- Is HDL associated with waist circumference?
- Sample of 221 subjects (baseline data)

- Take as given that the mean is a good summary
- How does mean HDL depend on waist circumference?
- Other methods are not ideal
- Try forming groups based on the value of waist circumference...

- Advantages:
- Simplicity
- Interpretable

- Disadvantages:
- Choice of groups is arbitrary
- More groups: finer resolution, but more variability (fewer subjects per group)
- May hide important differences within groups
- Hard to describe succinctly
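
A minimal Python sketch of the grouping approach above. The numbers are made up for illustration (they are NOT the HERS sample), and `group_means` and the cutpoints are hypothetical choices, not part of any study analysis.

```python
from statistics import mean


def group_means(x, y, cutpoints):
    """Return ((lower, upper), mean of y, count) for each non-empty x-group.

    A group holds observations with lower <= x < upper; the first and last
    groups are open-ended.
    """
    edges = [float("-inf")] + sorted(cutpoints) + [float("inf")]
    result = []
    for lo, hi in zip(edges, edges[1:]):
        ys = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        if ys:  # skip empty groups
            result.append(((lo, hi), mean(ys), len(ys)))
    return result


# Hypothetical waist circumferences (cm) and HDL values, NOT HERS data
waist = [78, 85, 88, 92, 95, 99, 103, 110]
hdl = [62, 58, 55, 54, 50, 49, 45, 41]

for (lo, hi), m, n in group_means(waist, hdl, cutpoints=[90, 100]):
    print(f"waist in [{lo}, {hi}): n={n}, mean HDL={m:.1f}")
```

Passing more cutpoints gives finer resolution, but each group mean then rests on fewer subjects, which is the trade-off listed above.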

- No groups needed
- Mean changes continuously with the predictor
- Uses all the data to estimate the mean at any point ("borrows strength")
- Can work around the linearity assumption
- Has a straightforward interpretation!
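
A minimal sketch of simple linear regression by ordinary least squares, in Python rather than Stata. The function names and the data are hypothetical; the point is that one fitted line gives an estimated mean at any predictor value, with every observation contributing to the estimate.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares: return (intercept, slope) for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def predicted_mean(coef, x0):
    """Estimated mean of y at any x0; every observation contributed to coef."""
    b0, b1 = coef
    return b0 + b1 * x0


# Hypothetical data (not HERS): no grouping needed to estimate the mean at 97 cm
waist = [78, 85, 88, 92, 95, 99, 103, 110]
hdl = [62, 58, 55, 54, 50, 49, 45, 41]
coef = fit_line(waist, hdl)
print(coef, predicted_mean(coef, 97))
```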

- Study of 52 HIV+ individuals
- Recruited from the SFGH Neurology clinic
- Studied cross-sectionally
- Not on antiretroviral therapy
- HIV RNA measured in plasma and cerebrospinal fluid (CSF)
- How are the two related?

- Nonparametric method
- Draws and connects a series of local lines
- Result: flexible smooth curve
- Estimates the mean of y as a (possibly nonlinear) function of x: a nonlinear regression line
- Useful tool for exploring association

- Break the data into a series of groups
- Fit a line to each of those groups
- "Connect" them
- What would that look like?
- It would look something like a smooth curve
- (Figure panels: Data; Four local lines; Lowess smooth)
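
The first two steps above (split the data into groups, fit a line in each) can be sketched in Python as follows. This is a hypothetical illustration of the idea only: it does not do the "connecting", which LOWESS handles by sliding a weighted window instead of using fixed groups.

```python
from statistics import mean


def fit_line(x, y):
    """OLS (intercept, slope); needs at least two distinct x values."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def local_lines(x, y, n_groups):
    """Sort by x, cut into n_groups consecutive chunks, fit a line to each.

    Returns a list of (x_min, x_max, intercept, slope); the last chunk
    absorbs any leftover points.
    """
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_groups
    pieces = []
    for g in range(n_groups):
        start = g * size
        end = len(pairs) if g == n_groups - 1 else start + size
        xs = [p[0] for p in pairs[start:end]]
        ys = [p[1] for p in pairs[start:end]]
        b0, b1 = fit_line(xs, ys)
        pieces.append((xs[0], xs[-1], b0, b1))
    return pieces


# Hypothetical data: four local lines, echoing the figure panels
waist = [78, 85, 88, 92, 95, 99, 103, 110]
hdl = [62, 58, 55, 54, 50, 49, 45, 41]
for piece in local_lines(waist, hdl, n_groups=4):
    print(piece)
```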

- Many methods exist for smoothing
- LOWESS (locally weighted scatterplot smoothing) is one of the most popular
- The user chooses how smooth to make the line
- Oversmooth: approaches a linear fit
- Undersmooth: connects the dots
- Programs have defaults (e.g., each local fit uses 80% of the data)
- These methods work by "local" regression
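
A minimal single-pass LOWESS sketch in Python, assuming tricube weights and a local straight-line fit at each point. Cleveland's full LOWESS adds robustness iterations that downweight large residuals; this sketch omits them. The parameter `frac` plays the role of the span (compare Stata's `bwidth()` option on `lowess`); the function name is hypothetical.

```python
import math


def lowess(x, y, frac=0.8):
    """Locally weighted linear smoother (single pass, no robustness steps).

    For each x0 in x: use the ceil(frac * n) nearest points, weight them by
    the tricube of their scaled distance from x0, fit a weighted
    least-squares line, and take its value at x0.
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = []
        for xi in x:
            u = abs(xi - x0) / h
            w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube weight
        sw = sum(w)
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
        yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(yw + slope * (x0 - xw))
    return fitted


# Hypothetical data; a smaller frac undersmooths, a larger one oversmooths
waist = [78, 85, 88, 92, 95, 99, 103, 110]
hdl = [62, 58, 55, 54, 50, 49, 45, 41]
print(lowess(waist, hdl, frac=0.8))
```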

- Scatterplot: twoway scatter csfrna hivrna
- LOWESS curve for the data: lowess csfrna hivrna
- Menu: Graphics, Twoway Graphs

- Based on a 19-subject subsample of the HERS data
- Makes it easier to visualize the data
- The effect of outliers is more vivid

(See also: Least Squares)

- Influence of an outlier is large if:
- Its predictor value is far from the mean
- It has a large residual in the regression
- There are relatively few observations
- The same outlier has little effect in a big dataset
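
A hypothetical Python illustration of the points above. It uses the standard leverage formula for simple linear regression, h_i = 1/n + (x_i - xbar)^2 / Sxx, and measures influence as the change in slope when one point is dropped. The data are made up: y = x exactly, plus one outlier at x = 20, y = 0, placed once in an 11-point dataset and once in a 101-point dataset.

```python
from statistics import mean


def slope(x, y):
    """OLS slope of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx


def leverage(x):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / Sxx in simple linear regression."""
    n, xbar = len(x), mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]


def slope_change_without(x, y, i):
    """Fitted slope with all points minus fitted slope with point i dropped."""
    return slope(x, y) - slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])


# Hypothetical data: y = x exactly, plus one outlier at x = 20, y = 0
base = list(range(1, 11))
small_x, small_y = base + [20], base + [0]            # 11 points
big_x, big_y = base * 10 + [20], base * 10 + [0]      # 101 points
print(slope_change_without(small_x, small_y, 10))
print(slope_change_without(big_x, big_y, 100))
```

The outlier is both far from the mean of x and far from the line, so dropping it moves the slope a lot in the small dataset and much less in the large one.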

- Reporting results: don't focus solely on significance
- Quote the slope and its 95% CI
- Perhaps R-squared
- Don't bother reporting sums of squares
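
A minimal Python sketch of the quantities worth reporting: the slope, its 95% CI, and R-squared. Python's standard library has no t-distribution quantile, so this hypothetical function takes the critical value `t_crit` (two-sided, n - 2 degrees of freedom) from the caller; for example, about 3.182 for 3 degrees of freedom.

```python
import math
from statistics import mean


def slope_ci_r2(x, y, t_crit):
    """Return (slope, (ci_low, ci_high), r_squared) for simple OLS.

    t_crit: two-sided t critical value with n - 2 df, supplied by the caller.
    """
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    sst = sum((yi - ybar) ** 2 for yi in y)                        # total SS
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, ci, 1 - sse / sst


# Hypothetical data with n = 5, so df = 3
print(slope_ci_r2([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], t_crit=3.182))
```

The sums of squares are computed internally but only the slope, interval, and R-squared are returned, matching the reporting advice above.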

- Is the effect of waist circumference on HDL causal? Would a 1 cm decrease in a person's waist circumference translate into a 0.45 increase in HDL?
- Or does waist circumference reflect different subpopulations with different diet and exercise patterns?
- How can we isolate the effect of waist circumference? By adjusting for diet and exercise. Solution: multiple linear regression
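
A hypothetical Python illustration of adjustment by multiple linear regression, solving the least-squares normal equations directly. The data are constructed so that HDL depends only on an exercise score, while waist circumference merely tracks exercise. Waist then looks important on its own, but its coefficient drops to about zero once exercise is in the model. The names `ols`, `exercise`, and the numbers are illustrative assumptions, not HERS variables.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b_1, ..., b_p].

    columns: list of predictor lists, each the same length as y.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; assumes X has full column rank.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data built so that HDL depends ONLY on exercise;
# waist is related to exercise but has no effect of its own.
exercise = [0, 1, 2, 3, 4, 5]
waist = [100, 99, 95, 94, 93, 89]
hdl = [40 + 3 * e for e in exercise]

unadjusted = ols([waist], hdl)           # waist alone: clearly negative slope
adjusted = ols([waist, exercise], hdl)   # waist slope is ~0 after adjustment
print(unadjusted, adjusted)
```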

- Linear regression is a powerful tool for interpreting associations
- Extends familiar methods (t-test, ANOVA)
- Rests on a series of assumptions (linearity, normality), all of which can be relaxed
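
One way to see how regression extends the t-test: regress the outcome on a 0/1 group indicator. The intercept is the mean of group 0 and the slope is the difference in group means; the t statistic for that slope equals the equal-variance two-sample t statistic. A minimal Python sketch with hypothetical data:

```python
from statistics import mean


def fit_line(x, y):
    """OLS (intercept, slope) for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical two-group data; group is coded 0/1
group = [0, 0, 0, 1, 1, 1]
outcome = [50, 54, 52, 45, 47, 43]
b0, b1 = fit_line(group, outcome)
# b0 = mean of group 0; b1 = mean of group 1 minus mean of group 0
print(b0, b1)
```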