Case Control Design - Sampling Controls at Follow-up

Lead Author(s): Jeff Martin, MD

Summary of Case-Control Sampling
Definition of Prevalent Controls
Diagram Using Prevalent Controls at Follow-up
Problems with Using Prevalent Controls at Follow-up
Limitation of Study Controls-FIRST PROBLEM
Bias in Waiting until Follow-up- SECOND PROBLEM
Inability to Calculate the Risk Ratio -THIRD PROBLEM
- Example of Inability to Calculate the Risk Ratio
- Example Showing Incorrect Odds Ratio
Prevalent Controls - Rare Disease Assumption-FOURTH PROBLEM
Case Control with Low Incidence-PROBLEM
- Example of Case Control with Low Incidence
CAVEAT: Sampling Non-cases May Introduce Bias-PROBLEM

Summary of Case-Control Sampling

Design	Sampling	Measure of Association
Case-cohort	Entire cohort at baseline	risk ratio
Incidence-density	Non-cases at time of diagnosis	rate ratio
Prevalent Case Control	Non-cases at single point in time	odds ratio

A random sample of the cohort at baseline = case-cohort design
- (Sampling Controls - Random Sample at Baseline)

At time each case is diagnosed = incidence density sampling
- (Sampling Controls - Incidence Density Sampling)

From persons without disease at the end of follow-up = prevalent controls design

Definition of Prevalent Controls

Sampling only non-cases in a primary or secondary study base

This case-control design uses prevalent controls at follow-up.

They are called prevalent controls because the controls are drawn from those without disease in a cross-sectional sample of the study base

Odds ratio approximates risk ratio only if disease occurrence is rare

This sampling is the classic instance of needing the rare disease assumption that many textbooks discuss, because the OR will approximate the risk ratio only if the incidence is low.

Diagram Using Prevalent Controls at Follow-up

In the diagram below, one can see prevalent controls drawn from the non-diseased individuals at follow-up.

Problems with Using Prevalent Controls at Follow-up

This is the design that most neophytes are drawn to. It is the least desirable of the three types of control sampling but it used to be the most common. That may no longer be the case as researchers are becoming more sophisticated about case-control design.


Limitation of Study Controls-FIRST PROBLEM

Even if all the cases are captured, as in the schematic, the controls are drawn only from those present at the time the study is conducted.

Unlike in the case-cohort and incidence density sampling designs, no cases can be included in the control group.

Because the cases are excluded, the control group can no longer represent the entire baseline population of the cohort.

Bias in Waiting until Follow-up- SECOND PROBLEM

One problem with this design is an obvious source of potential bias in waiting until the end of follow-up to select controls: factors that influence loss to follow-up will influence the selection of controls.

Furthermore, losses to follow-up and deaths make this group of controls poorly representative of the population that gave rise to the cases.

Nor can it represent the person-time of the cohort, because controls are sampled at only one time point rather than throughout the study base experience.

If the factors that influence loss are associated with both your predictor variable and your outcome, the measure of association will be biased.

Inability to Calculate the Risk Ratio -THIRD PROBLEM

In a case-control design, the OR equals the risk ratio only when the controls estimate the ratio of exposed to unexposed persons in the whole cohort.

The ratio of exposed to unexposed among the cases is known in all case-control designs, BUT sampling only non-cases cannot give an unbiased estimate of the ratio of exposed to unexposed in the whole cohort.

That ratio can only be estimated from a sample of everyone at the beginning of follow-up, not just those who remain non-cases at the end of follow-up.
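
The argument above can be written out algebraically. This is only a sketch under assumed notation: a and c are the exposed and unexposed cases, b and d are the exposed and unexposed non-cases at the end of follow-up (matching the ad/bc form of the OR used on this page), and N_1 = a + b and N_0 = c + d are the exposed and unexposed cohort members at baseline, assuming the only removals are due to disease.

```latex
\begin{align*}
\text{RR} &= \frac{a/N_1}{c/N_0} = \frac{a/c}{N_1/N_0} \\
\text{OR}_{\text{prevalent}} &= \frac{ad}{bc} = \frac{a/c}{b/d}
  = \frac{a/c}{(N_1 - a)/(N_0 - c)}
\end{align*}
```

The two expressions share the numerator a/c and differ only in the denominator, so they agree only when (N_1 - a)/(N_0 - c) is close to N_1/N_0, that is, when the cases are a small fraction of each exposure group.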

Example of Inability to Calculate the Risk Ratio

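In a cohort of 200 split evenly by exposure, with 40 cases among the 100 exposed and 10 cases among the 100 unexposed, using prevalent controls you get:

60 non-cases in the exposed
90 non-cases in the unexposed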

Example Showing Incorrect Odds Ratio

If you look at the odds calculation:

ad/bc = OR

In this example,

(40 * 90) / (60 * 10) = 6.0

One quarter of the cohort has been diagnosed with disease during follow-up, leaving only 150 of the original 200 from which to select controls using the prevalent controls design.

Since the original cohort was divided 50/50 by exposure and the ratio of exposed to unexposed cases is 4 to 1 (40 to 10), the remaining subjects without disease will have a ratio of exposed to unexposed of 60/90, or 2/3.

In other words, the odds of exposure in the eligible controls will be 2/3, and the odds ratio will be 4 divided by 2/3 = 6.0.

These numbers use everyone in the cohort, whereas the case-control study will use only a sample of the 150 remaining without disease. Because controls are sampled independently of exposure status, the expected ratio of 2/3 also applies to any random sample of controls.
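
The claim that a random sample of controls keeps the 2/3 exposure odds can be checked with a small simulation. This is a minimal sketch, not part of the original material: the sample size of 50 controls, the 2000 repeated draws, and the seed are arbitrary assumptions chosen for illustration.

```python
import random


def pooled_exposure_odds(n_exposed=60, n_unexposed=90,
                         sample_size=50, n_draws=2000, seed=0):
    """Average exposure odds across repeated random control samples.

    sample_size, n_draws and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    # 1 = exposed non-case, 0 = unexposed non-case.
    pool = [1] * n_exposed + [0] * n_unexposed
    exposed_total = 0
    for _ in range(n_draws):
        # Sampling without replacement, ignoring exposure status.
        exposed_total += sum(rng.sample(pool, sample_size))
    p_exposed = exposed_total / (n_draws * sample_size)
    return p_exposed / (1 - p_exposed)


print(f"pooled exposure odds in sampled controls: {pooled_exposure_odds():.3f}")
```

Any single sample of controls will vary around 2/3 by chance; it is the average over many samples that settles near that value.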

Thus the OR in this example is much larger than the risk ratio and cannot be considered even an approximation of it.
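
The arithmetic of this example can be reproduced in a few lines. This is a sketch using only the counts given above (a cohort of 200 split 100/100 by exposure, with 40 exposed and 10 unexposed cases); the variable names are illustrative.

```python
# Worked 2x2 example from the text: cohort of 200, split 100/100 by exposure.
n_exposed, n_unexposed = 100, 100        # cohort at baseline
cases_exposed, cases_unexposed = 40, 10  # a and c in the ad/bc formula

# Non-cases left at the end of follow-up (the pool for prevalent controls).
noncases_exposed = n_exposed - cases_exposed        # b = 60
noncases_unexposed = n_unexposed - cases_unexposed  # d = 90

# Risk ratio from the full cohort: (a / N1) / (c / N0), rearranged so the
# division is done once on whole numbers.
risk_ratio = (cases_exposed * n_unexposed) / (cases_unexposed * n_exposed)

# Odds ratio using prevalent controls: ad / bc.
odds_ratio = (cases_exposed * noncases_unexposed) / (
    noncases_exposed * cases_unexposed
)

print(f"risk ratio = {risk_ratio:.1f}")  # 4.0
print(f"odds ratio = {odds_ratio:.1f}")  # 6.0
```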

Prevalent Controls - Rare Disease Assumption-FOURTH PROBLEM

If controls are selected among those without disease at the time of the study (+/- prevalent cases), the OR approximates the risk ratio only under the rare disease assumption.

Case Control with Low Incidence-PROBLEM

If the disease removes only a few persons from the original cohort, the ratio of exposure in those remaining will stay close to the original ratio at baseline.

It follows that estimating the baseline exposure ratio N0/N1 by using prevalent controls becomes increasingly valid as the number removed by the disease gets smaller.

Example of Case Control with Low Incidence

In this example,

(4 * 99) / (96 * 1) = 4.13

With an incidence of disease of 2.5% (5 out of 200 developed disease), the OR is only slightly higher than the risk ratio, for the simple reason that the ratio of exposure in the remaining non-cases is close to 1.0, which is what it was in the whole cohort at baseline.

The somewhat arbitrary rule of thumb of incidence below 10% is sometimes given as what is meant by a rare disease.

If the incidence were 10% (16 exposed cases and 4 unexposed cases),

OR = 4.57 (Do you think that this is a good approximation of 4.0?)
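
To see how the approximation degrades as incidence rises, the three scenarios on this page can be compared side by side. This is a sketch assuming the same cohort of 100 exposed and 100 unexposed throughout, with the risk ratio held at 4.0; the function names are illustrative.

```python
def risk_ratio(a, c, n1, n0):
    """Risk ratio from the full cohort: (a / N1) / (c / N0)."""
    return (a / n1) / (c / n0)


def prevalent_odds_ratio(a, c, n1, n0):
    """OR when controls are the non-cases at the end of follow-up: ad / bc."""
    b = n1 - a  # exposed non-cases
    d = n0 - c  # unexposed non-cases
    return (a * d) / (b * c)


def compare(case_counts, n1=100, n0=100):
    """Return (incidence, RR, OR) for each (exposed, unexposed) case count."""
    rows = []
    for a, c in case_counts:
        incidence = (a + c) / (n1 + n0)
        rows.append((incidence,
                     risk_ratio(a, c, n1, n0),
                     prevalent_odds_ratio(a, c, n1, n0)))
    return rows


# Case counts taken from the examples in the text.
for incidence, rr, odds_ratio in compare([(4, 1), (16, 4), (40, 10)]):
    print(f"incidence {incidence:.1%}: RR = {rr:.2f}, OR = {odds_ratio:.3f}")
```

The risk ratio stays fixed while the OR drifts upward as more of the exposed group is removed from the pool of eligible controls.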

CAVEAT: Sampling Non-cases May Introduce Bias-PROBLEM

Disease may remove few from the study base sampled for controls, but other sources of loss to follow-up can bias the control group.

The rare disease assumption only looks at the effect of removing potential controls who are diagnosed with the outcome, the disease.

Losses to follow-up and deaths among potential controls from the study base giving rise to the cases affect who is available at one point in time.

Looking at the study base that gave rise to those cases over time, some members of the study base population at time zero will not be in the population of non-cases sampled at the end of the time when all the cases have been ascertained. Some will have left the study base or died, and these changes in the group of non-cases who are sampled can bias the estimate of exposure in the controls. Since no information is available on who left the study base with the prevalent controls design, the nature of this bias cannot be known. Thus, even though the rare disease assumption is met, the OR from this type of case-control sampling may give a biased estimate of the risk ratio.
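
A small numerical sketch can illustrate the caveat. It starts from the low-incidence example on this page (4 exposed and 1 unexposed case in a cohort of 100 exposed and 100 unexposed) and then applies retention fractions that are purely hypothetical: 70% of exposed non-cases and 90% of unexposed non-cases still present at follow-up. In a real prevalent controls study these fractions would not be known.

```python
def odds_ratio_with_losses(a, c, n1, n0, keep_exposed, keep_unexposed):
    """OR from prevalent controls after losses among the non-cases.

    keep_exposed and keep_unexposed are hypothetical fractions of exposed
    and unexposed non-cases who remain available at the end of follow-up.
    """
    b = (n1 - a) * keep_exposed    # exposed non-cases still available
    d = (n0 - c) * keep_unexposed  # unexposed non-cases still available
    return (a * d) / (b * c)


# No losses: the rare-disease approximation holds (risk ratio is 4.0).
no_loss = odds_ratio_with_losses(4, 1, 100, 100, 1.0, 1.0)

# Hypothetical differential losses related to exposure.
differential = odds_ratio_with_losses(4, 1, 100, 100, 0.7, 0.9)

# Hypothetical equal losses in both exposure groups.
equal = odds_ratio_with_losses(4, 1, 100, 100, 0.8, 0.8)

print(f"no losses:           OR = {no_loss:.3f}")
print(f"differential losses: OR = {differential:.3f}")
print(f"equal losses:        OR = {equal:.3f}")
```

Under these assumed numbers, losses that are unrelated to exposure leave the OR unchanged, while losses that differ by exposure shift it even though the disease itself is rare.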