Lead Author(s): Jeff Martin, MD

Definition of Matching

Matching is another method that can be used in the design phase of a study to reduce confounding.

Matching to Reduce Confounding - Cohort Study

In the following DAG, you see that in a cohort study, matching precludes any association between the exposure and the confounder.

For example, in a cohort study, if we were concerned about race as a confounder, we might match on race.

Matching to Reduce Confounding - Case-Control Study

In the following DAG, you see that in a case-control study, matching precludes any association between the disease and the confounder.

For example, in a case-control study of diet and cancer, if we were worried about age as a potential confounder, we might match on age.
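A minimal sketch of what matching on age can look like when controls are sampled, assuming frequency matching (the controls are drawn so that their age distribution equals that of the cases). The function name `frequency_match`, the record layout, and all data below are hypothetical and only illustrate the idea.

```python
import random
from collections import Counter

def frequency_match(cases, pool, key, rng):
    """Draw one control per case from `pool`, reproducing the cases' distribution of `key`."""
    needed = Counter(person[key] for person in cases)
    controls = []
    for level, n in needed.items():
        candidates = [p for p in pool if p[key] == level]
        if len(candidates) < n:
            raise ValueError(f"not enough potential controls with {key}={level!r}")
        # Sample without replacement within the stratum of the matching factor.
        controls.extend(rng.sample(candidates, n))
    return controls

# Hypothetical data: four cases and a pool of twenty potential controls.
cases = [{"id": i, "age_group": g}
         for i, g in enumerate(["60-69"] * 3 + ["40-49"])]
pool = [{"id": 100 + i, "age_group": g}
        for i, g in enumerate(["40-49"] * 10 + ["60-69"] * 10)]

controls = frequency_match(cases, pool, "age_group", random.Random(0))
print(Counter(c["age_group"] for c in controls))
```

After sampling, the age distribution of the controls is identical to that of the cases, which is what removes any association between age and case status in the study sample.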

Advantages of Matching

1. Matching is the best way to manage or prevent confounding by certain variables, such as neighborhood.

2. By ensuring a balanced number of cases and controls (in a case-control study) or exposed and unexposed subjects (in a cohort study) within the various strata of the confounding variable, matching brings you increased statistical precision in your inferences (i.e., smaller standard errors and narrower confidence intervals).

Example of Matching by Neighborhood

Here is an example of matching by neighborhood in a case-control study of the effect of a bacille Calmette-Guérin (BCG) vaccine in preventing tuberculosis (TB) (Dantas et al., 2006). In this example, the investigators examined the effect of a second BCG vaccination in preventing the occurrence of TB.

Control sampling: Relying upon random sampling without attention to neighborhood may result in choosing no controls from some of the neighborhoods seen in the case group, especially in a small study (i.e., cases and controls lack overlap).
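To get a feel for how easily random sampling can miss a neighborhood, here is a small sketch. It assumes each control is drawn independently and lands in a given neighborhood with probability equal to that neighborhood's share of the source population. The neighborhood names, shares, and sample size are hypothetical and are not taken from the Dantas study.

```python
def prob_no_controls(share, n_controls):
    """Probability that none of n independently sampled controls comes from a
    neighborhood holding `share` of the source population."""
    return (1 - share) ** n_controls

# Hypothetical population shares for four neighborhoods (they sum to 1).
shares = {"A": 0.50, "B": 0.30, "C": 0.15, "D": 0.05}
n_controls = 20  # a small study

for name, share in shares.items():
    p_missed = prob_no_controls(share, n_controls)
    print(f"Neighborhood {name}: P(no controls sampled) = {p_missed:.3f}")
```

Under these assumptions, a neighborhood holding 5% of the population has roughly a 36% chance of contributing no controls at all when only 20 controls are sampled. Any cases from that neighborhood would then have no comparable controls.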

Example of Matching in San Francisco

Think of all the neighborhoods in San Francisco.

Neighborhood is a nominal variable with many values. If one had to rely upon random sampling of controls, one might end up choosing no controls from some of the neighborhoods seen in the case group. This is especially true in a small study. By matching on neighborhood (diagram below), you ensure that every neighborhood represented among the cases is also represented among the controls.

Example of Matching for Precision

Here is an example of the advantage of matching to improve precision.

If one performs a case-control study and randomly samples controls from the community, one gets the following (below - A): a crude or unadjusted odds ratio of 8.8 for the association between matches (the exposure) and lung cancer.

But what if we instead matched on smoking when we sampled the controls? We would end up with better balance in the smoking strata (see below - B).
There exist examples where the benefit of matching in terms of precision is greater than in the example shown here.
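The gain in precision from balanced strata can be sketched numerically. The counts below are hypothetical (they are not the data behind the figures referred to above), and the calculation assumes a stratified analysis that pools stratum-specific log odds ratios with inverse-variance weights, using Woolf's formula for the variance within each stratum. In both designs the stratum-specific odds ratio is 1 and the total numbers of cases and controls are the same; only the distribution of controls across the smoking strata differs.

```python
import math

def woolf_variance(a, b, c, d):
    """Variance of the log odds ratio for one 2x2 table (Woolf's formula)."""
    return 1 / a + 1 / b + 1 / c + 1 / d

def pooled_se(strata):
    """Standard error of the inverse-variance-weighted pooled log odds ratio."""
    total_weight = sum(1 / woolf_variance(*s) for s in strata)
    return math.sqrt(1 / total_weight)

# Each tuple: (exposed cases, unexposed cases, exposed controls, unexposed controls).
# First tuple = smokers, second tuple = non-smokers. All counts are hypothetical.
unmatched = [(40, 40, 10, 10), (10, 10, 40, 40)]  # controls mostly non-smokers
matched = [(40, 40, 40, 40), (10, 10, 10, 10)]    # controls mirror the cases

se_unmatched = pooled_se(unmatched)
se_matched = pooled_se(matched)
print(f"SE of pooled log OR, unmatched controls: {se_unmatched:.3f}")
print(f"SE of pooled log OR, matched controls:   {se_matched:.3f}")
```

With these counts the standard error falls from about 0.354 to about 0.283 when the controls are matched on smoking, even though the number of subjects is unchanged. The improvement comes entirely from avoiding sparse cells within the smoking strata.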

Disadvantages of Matching

1. Finding appropriate matches may be difficult and expensive. It may be time-consuming to sift through many individual records to find appropriate matches. As long as you believe you will have overlap in key potential confounders and are not dealing with a complex nominal variable, the inefficiencies of having to find matches may outweigh the benefits gained in statistical precision.

2. In a case-control study, a factor used to match subjects cannot itself be evaluated as a risk factor for the disease. Because you have artificially precluded any association between the potential confounder and the outcome, you cannot directly assess in this study whether this factor is indeed related to the outcome. This illustrates how matching in general tends to reduce the robustness of a study for secondary research questions.

3. The decisions you make about matching are irrevocable. For example, if you matched on an intermediary variable, you have likely lost the ability to look for an effect of your exposure through that pathway.

4. If the variable you are concerned about really is not a confounder, you will actually suffer losses, not gains, in statistical precision compared with the situation where you did not match.

Suffice it to say that the heyday of matching was clearly in the past, before the advent of newer regression modeling techniques and the computers to run them.

Disadvantages can sometimes outweigh advantages in matching. Always think carefully before you match, and seek advice!


Dantas, O. M., Ximenes, R. A., de Albuquerque, M. de F., da Silva, N. L., Montarroyos, U. R., de Souza, W. V., et al. (2006). A case-control study of protection against tuberculosis by BCG revaccination in Recife, Brazil. Int J Tuberc Lung Dis, 10(5), 536-541.