Kaplan-Meier Method for Calculating Cumulative Incidence
Calculating Cumulative Incidence with the Kaplan-Meier Method
To calculate cumulative incidence we must take into consideration varying follow-up times.
The Kaplan-Meier Method:
- requires the date last observed or the date the outcome occurred for each individual (the end of the study can serve as the last date observed).
The essence of the Kaplan-Meier (KM) method is knowing the date on which each outcome in the cohort occurred.
The Analysis:
- Analysis is performed by dividing the follow-up time into discrete pieces to calculate probability of survival at each event (survival = probability of no event)
Those dates divide the follow-up time of the cohort into a number of discrete pieces. The proportion surviving (probability) is calculated for each discrete piece and the overall cumulative probability of surviving is calculated by multiplying together the individual probabilities.
Every member of the cohort has to be assigned a date first seen and a date last seen or a date diagnosed.
Cumulative Probability
Probability of two independent events occurring is the product of the two probabilities for each occurring alone
- e.g., if event 1 occurs with probability 1/6 and event 2 with probability 1/2,
- then the probability of both event 1 and 2 occurring = 1/6 x 1/2 = 1/12
Probability of living to time 2 given that one has already lived to time 1
- Is independent of the probability of living to time 1
In order to calculate cumulative incidence, you need to understand, or at least accept on faith, the following. It is a basic rule of probability that the probability of two independent events both occurring is the product of their individual probabilities. So the probability of flipping two heads in a row with a fair coin is 1/2 x 1/2 = 1/4.
The Kaplan-Meier method of calculating the cumulative probability of the disease outcome is to treat each separate discrete piece of time as an independent trial. There was some probability of the outcome during the first time period; there was another probability of the outcome during the second time period. The probability of the outcome during both time periods together is the product of the individual probabilities.
Students sometimes balk at treating the two time periods as independent events. They say, "How can they be considered independent when it is many of the same persons in each time period?" The answer is that the probability in the second time period is conditional on a given person already having lived through the first time period. So the probability of the outcome in the second period is the probability conditional on not having experienced the outcome up until that point in time. A similar mistake is made by gamblers who think that because a coin has come up tails four times in a row the probability of heads on the next toss is better than 1/2. IT IS NOT.
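The chain of conditional probabilities described above can be written compactly. The notation is introduced here for illustration and is not part of the original notes: t_j is the j-th time at which an outcome occurs (t_0 is the start of follow-up), n_j is the number still in follow-up just before t_j, and d_j is the number of outcomes at t_j.

```latex
\hat{S}(t_k)
  = \prod_{j=1}^{k} P(\text{survive past } t_j \mid \text{survived past } t_{j-1})
  = \prod_{j=1}^{k} \left( 1 - \frac{d_j}{n_j} \right)
```

Each factor is a conditional survival probability for one discrete piece of follow-up time, which is why the factors can be multiplied.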
Example - Kaplan-Meier Estimates
Using the data from "Follow-up Starting Times," Szklo and Nieto (Szklo, M., & Nieto, F. (2007). Epidemiology: Beyond the Basics (2nd ed.). Boston: Jones and Bartlett Publishers) produced the following cumulative survival table.
Cumulative survival is calculated by multiplying probabilities for each prior failure time:
- e.g., 0.9 x 0.875 x 0.857 = 0.675 and
- 0.9 x 0.875 x 0.857 x 0.800 x 0.667 x 0.500 = 0.180
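The running product above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own calculation: the function name is made up for this example, and the numbers at risk (10, 8, 7, 5, 3, 2) are inferred from the survival probabilities quoted above (9/10, 7/8, 6/7, 4/5, 2/3, 1/2), assuming one death at each event time.

```python
from fractions import Fraction


def km_cumulative_survival(at_risk, deaths):
    """Return the cumulative survival after each event time.

    at_risk[i] is the number still in follow-up just before event time i;
    deaths[i] is the number of deaths at that time.
    """
    cumulative = Fraction(1)
    curve = []
    for n, d in zip(at_risk, deaths):
        cumulative *= Fraction(n - d, n)  # conditional survival at this event time
        curve.append(cumulative)
    return curve


# Inferred from the survival probabilities in the text (an assumption).
at_risk = [10, 8, 7, 5, 3, 2]
deaths = [1, 1, 1, 1, 1, 1]

curve = km_cumulative_survival(at_risk, deaths)
for n, s in zip(at_risk, curve):
    print(f"at risk {n:2d}: cumulative survival {float(s):.3f}")
```

Exact fractions are used so that the intermediate products are not affected by floating-point rounding; the third and sixth values are 27/40 (0.675) and 9/50 (0.180).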
Deaths occurred at 6 different times during follow-up, so there are 6 discrete pieces of time (D = death).
Data One Month Follow-up:
- The probability of the event is the number of deaths at each point in time (just 1 here, but it is possible to have more than 1 at the same time) divided by the number in the cohort at that time.
- So at 1 month of follow-up there was a death and at that time all 10 original members of the cohort were still in follow-up.
- The probability of death was 1/10 and the probability of survival was 1 minus 1/10 = 9/10.
Data Three Month Follow-up:
- When the second death occurs at 3 months of follow-up, only 8 persons are still in follow-up, because one person was lost to follow-up at 2 months.
- The probability of death was 1/8 and the probability of survival was 7/8.
- The cumulative probability of survival was 9/10 x 7/8 = 0.788.
Why not calculate a probability of survival when the one person was lost at 2 months? Because the probability of survival for the 9 would be 9/9 = 1 and 1 times the previous cumulative survival leaves it unchanged.
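A quick check of that argument, as a sketch: including a row for the loss to follow-up at 2 months (9 at risk, 0 deaths) multiplies the running product by 9/9 = 1 and so changes nothing. The row layout and function name are assumptions made for this illustration.

```python
from fractions import Fraction
from math import prod

# Each row: (months of follow-up, number at risk, deaths at that time).
# The 2-month row is the loss to follow-up: 9 at risk, no deaths.
rows_with_loss = [(1, 10, 1), (2, 9, 0), (3, 8, 1)]
rows_deaths_only = [(1, 10, 1), (3, 8, 1)]


def cumulative_survival(rows):
    """Multiply the conditional survival probabilities of all rows."""
    return prod(Fraction(n - d, n) for _, n, d in rows)


with_loss = cumulative_survival(rows_with_loss)
deaths_only = cumulative_survival(rows_deaths_only)
print(with_loss, deaths_only)  # both are the same fraction
```

The lost person still matters: they leave the denominator, which is why 8 rather than 9 are at risk at 3 months.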
Survival Probabilities
Cumulative incidence cannot be calculated by multiplying the event probabilities together (that product is the probability of the event repeating at every event time)
- (in our example, 0.100 x 0.125 x 0.143 x 0.200 x 0.333 x 0.500 = 0.0000595)
The cumulative probability is calculated with the survival probabilities because it is only survival that happens repeatedly. If you multiplied the event probabilities instead, you would be calculating the probability of repeated events, which is not what you want.
At the end of multiplying together all of the individual survival probabilities to get the cumulative probability of 0.18, the cumulative probability of death can be obtained by subtracting from 1. 1 – 0.18 = 0.82.
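The contrast between the two calculations can be sketched as follows. The numbers at risk are inferred from the event probabilities quoted above (0.100 = 1/10, 0.125 = 1/8, and so on), and the variable names are chosen for this illustration only.

```python
from fractions import Fraction
from math import prod

# Inferred from the event probabilities in the text (an assumption).
at_risk = [10, 8, 7, 5, 3, 2]
event_probs = [Fraction(1, n) for n in at_risk]  # one death per event time
survival_probs = [1 - p for p in event_probs]

# Correct: multiply survival probabilities, then subtract from 1.
cumulative_survival = prod(survival_probs)
cumulative_incidence = 1 - cumulative_survival

# Incorrect: the product of event probabilities is not a cumulative incidence.
product_of_event_probs = prod(event_probs)

print(float(cumulative_survival), float(cumulative_incidence))
print(float(product_of_event_probs))
```

The exact values are 9/50 for cumulative survival, 41/50 for cumulative incidence, and 1/16800 for the product of the event probabilities.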
Since it is a proportion, it has no time unit connected to it, so the time period has to be stated
- e.g., 2-year cumulative incidence
Example - Kaplan-Meier Analysis
The following is a graph showing a Kaplan-Meier analysis of cumulative survival after breast cancer among patients grouped by whether they carry either the BRCA1 or the BRCA2 breast cancer gene mutation (N=58) versus patients without either mutation (N=979)
(Lee, J. S., Wacholder, S., Struewing, J. P., McAdams, M., Pee, D., Brody, L. C., et al. (1999). Survival after breast cancer in Ashkenazi Jewish BRCA1 and BRCA2 mutation carriers. J Natl Cancer Inst, 91(3), 259-263).
Notice that the lines are graphed in a stepwise fashion.
Note also that the two curves lie on top of one another for about two years, but there is a suggestion that the mutation carriers have better survival beyond two years or so.
- This observation should be viewed skeptically, though, as the numbers have become very small among both groups by 40 months and especially among the carriers (N=3).
- In a Kaplan-Meier graphic large steps indicate big jumps in probability due to small numbers at risk. Hence, the tail of the curve does not give precise information.
To read cumulative survival for a group from the graph, pick a time point, such as 24 months, draw a line straight up to intersect the survival curve and then a horizontal line that intersects the y-axis. Where it intersects the y-axis is the estimate of the proportion surviving at 24 months of follow-up (about 44% in these data for either group).
If the cumulative incidence of death had been plotted instead of the cumulative probability of survival (always an option), the graph would have started in the lower left-hand corner at 0 and moved up toward 1 (inverting the curve pictured).
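Reading a value off a stepwise curve, and flipping it into cumulative incidence, can be sketched in code. This example does not use the breast cancer data, which are not tabulated here; it uses only the first two event times from the earlier worked example (deaths at 1 and 3 months). The function names are hypothetical.

```python
from bisect import bisect_right

# Steps of the curve: event time in months, and cumulative survival after it.
event_times = [1, 3]
survival_after = [0.9, 0.7875]


def survival_at(t):
    """Cumulative survival at time t, read from the step function."""
    i = bisect_right(event_times, t)  # number of event times <= t
    return 1.0 if i == 0 else survival_after[i - 1]


def incidence_at(t):
    """Cumulative incidence is the survival curve inverted: 1 - S(t)."""
    return 1.0 - survival_at(t)


for month in (0, 2, 3):
    print(month, survival_at(month), incidence_at(month))
```

The curve stays flat between event times, which is the stepwise shape seen in the graph: survival at 2 months is still the value reached at 1 month.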
Comparing Two KM Curves
As you can see in the two Kaplan-Meier curves (below), the risk ratio would be different for different follow-up times.
When a Kaplan-Meier analysis is presented in the medical literature, it is usually accompanied by a p-value from a test of whether the two curves differ over their entire lengths.
- This is a more complex statistic than just comparing two proportions with a chi-square test as it compares proportions all along the curves whenever an event occurs.
- The most commonly used statistic is the log-rank test.
- An alternative is the generalized Wilcoxon test.
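To make "compares proportions all along the curves whenever an event occurs" concrete, here is a minimal sketch of a two-group log-rank test in plain Python. It is an illustration under stated assumptions, not a validated implementation: the function name is made up, the two small groups are hypothetical data invented for the example, and the p-value uses the 1-degree-of-freedom chi-square approximation.

```python
import math


def logrank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs; event is True when the
    outcome occurred and False when the person was censored.
    Returns (chi_square, p_value). Assumes the variance is not zero.
    """
    everyone = group_a + group_b
    event_times = sorted({t for t, e in everyone if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in B
        d_a = sum(1 for time, e in group_a if e and time == t)
        d_b = sum(1 for time, e in group_b if e and time == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # events expected in A if curves were equal
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical data: (months of follow-up, outcome occurred?)
early = [(1, True), (2, True)]
late = [(3, True), (4, True)]
chi_square, p_value = logrank_test(early, late)
print(round(chi_square, 3), round(p_value, 3))
```

At each event time the observed number of events in one group is compared with the number expected if both groups shared the same risk, and the differences are accumulated over the whole curve. For real analyses an established statistics package is the safer choice.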