Slide 1: Measuring Disease Occurrence
- Occurrence of disease is the fundamental outcome measurement of epidemiology
- Occurrence of disease is typically a binary (yes/no) outcome
- Occurrence of disease involves time
Slide 2: Main Points to be Covered
- Incidence versus Prevalence
- The 3 elements of measures of incidence
- Cumulative vs. person-time incidence
- Concept of censoring
- Calculating cumulative incidence by the Kaplan-Meier method
In addition to the distinction between incidence and prevalence, we will be presenting the difference between two important and widely used ways to measure incidence called cumulative incidence and person-time incidence. Understanding the 3 elements in measuring disease incidence and focusing on how they are used (or not used) can eliminate confusion about the frequently erratic use of terminology in the medical literature.
Slide 3: Prevalence versus Incidence
- Prevalence counts existing disease diagnoses, usually at a single point in time
- Incidence counts new disease diagnoses during a defined time period
The difference between incidence and prevalence is a fundamental distinction in epidemiology. Prevalent cases of a disease will over-represent those with longer duration or survival. This can potentially introduce significant bias into a study of a disease and its risk factors. The amount of the difference between incidence and prevalence is related to the time period during which incidence is measured and the average length of duration of the disease condition. In some instances prevalence may look similar to incidence and in others the two measures will be very different.
The concepts are not difficult to grasp but there are some subtleties in implementing them as diseases with gradual onset can be diagnosed at varying points in their development, cancer being the most common example. Both incidence and prevalence can be affected by changes in methods of diagnosis and the ability to identify disease at earlier stages.
Prevalence is often important from the public health perspective of examining the burden of a disease in a population. Incidence is also important to public health in determining trends over time in controlling a disease, but it is the fundamental measure for studies of causality.
Slide 4: Two Types of Prevalence
- Point prevalence - number of persons with a specific disease at one point in time divided by total number of persons in the population
- Period prevalence - number of persons with disease in a time interval (eg, one year) divided by number of persons in the population
- Prevalence at beginning of an interval plus any incident cases
- Risk factor prevalence may also be important
This distinction is often not made because most prevalence estimates that you will encounter in the medical literature are point prevalence. Period prevalence has its uses, however. It is, for example, helpful for planning the delivery of health services to know how many persons in a given time period may need those services. A way to think about period prevalence is as the point prevalence at the beginning of a defined time interval plus whatever incident cases occur during that interval.
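The relationship just described can be sketched in a few lines of Python. This is only an illustration: the function names and all of the counts are hypothetical and are not taken from any study, and the population is assumed to stay the same size over the interval.

```python
def point_prevalence(existing_cases, population):
    """Existing cases at one point in time divided by the population."""
    return existing_cases / population


def period_prevalence(existing_at_start, incident_cases, population):
    """Cases existing at the start of the interval plus new cases arising
    during it, divided by the population (assumed constant here)."""
    return (existing_at_start + incident_cases) / population


# Hypothetical numbers chosen only for illustration.
population = 10_000
existing_at_start = 200      # persons with the disease on the first day
incident_during_year = 50    # new diagnoses during the following year

start_point = point_prevalence(existing_at_start, population)
one_year_period = period_prevalence(existing_at_start, incident_during_year, population)

print(f"Point prevalence at start: {start_point:.1%}")
print(f"One-year period prevalence: {one_year_period:.1%}")
```

Period prevalence can never be lower than the point prevalence at the start of the interval, because the incident cases are only ever added to the numerator.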
The concept of prevalence is not unique to disease outcome. The prevalence of a risk factor is also important from a public health perspective. For example, a risk factor (exposure) may have a modest association with a disease outcome (say, a relative risk less than 2), but be a very common exposure. In that instance even a small relative risk may have great public health importance if a large proportion of the population is exposed. A good example is the risk associated with second hand smoking.
Slide 5: Example of Point Prevalence
- NHANES = National Health and Nutrition Examination Survey, a probability sample of all United States residents from 1988 to 1994
- During NHANES III, blood samples drawn and tested for antibodies against HIV
- Estimated national prevalence: 461,000 HIV-infected persons (0.18%)
McQuillan et al., JAIDS, 1997
NHANES III was carried out over a long time period, 6 years between 1988 and 1994, but each person’s blood was drawn at a single point in time. Of course, testing all of those blood samples together as if they were all drawn at a single point in time does make an assumption that the point prevalence wasn’t changing during the time it took to carry out the entire NHANES III study. This certainly would have been a poor assumption had NHANES III taken place in the early 1980’s, and it may also be questionable for the 1988 to 1994 time period. The concept illustrated is that point prevalence is not a function of the length of time it takes to conduct the study. It is a function of the time period represented by the measurement. So a single blood sample represents one point in time. Although it may take a while to gather all of the samples, most prevalence studies don’t take 6 years, and the assumption of not much change in the prevalence being measured during the period of carrying out the study is usually reasonable.
This particular estimate of HIV prevalence was undoubtedly too low, as the authors acknowledged, because NHANES does not get a good sample of some of the groups at high risk for HIV infection, especially injecting drug users and probably gay men. They made some calculations and adjusted this estimate upward by about 200,000 persons.
Slide 6: Example of Period Prevalence: National Health Interview Survey (NHIS)
Information About NHIS
http://twiki.library.ucsf.edu/twiki/bin/viewfile/CTSpedia/TICRDisOccurI?rev=1;filename=fig_nhis.JPG
The National Health Interview Survey is carried out every year (so it doesn’t take as long as the NHANES), but again the point is not how long it takes to conduct the NHIS but what time period does the measurement represent. In this case the measurement of disease prevalence is for a specified time period, 30 days, so it is an example of period prevalence, albeit not a particularly long period. The same question might have been asked for the past six months and a different prevalence estimate would have been obtained. One can begin to see how a very short-lived condition will have different prevalence depending on whether point or period prevalence is being examined and on the length of the time period if period prevalence is being examined.
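To make the last point concrete, here is a small sketch with invented data. The five persons, their illness dates, and the helper names are all hypothetical; the only purpose is to show how the same survey day yields different prevalence figures depending on the time period the question covers.

```python
# Hypothetical illness episodes for five persons: (first_day, last_day), inclusive.
# Each episode of this short-lived condition lasts one week.
episodes = {
    "A": [(3, 9)],
    "B": [(40, 46)],
    "C": [(100, 106)],
    "D": [],
    "E": [(170, 176)],
}


def was_ill_during(person_episodes, window_start, window_end):
    """True if any episode overlaps the window [window_start, window_end]."""
    return any(first <= window_end and last >= window_start
               for first, last in person_episodes)


def prevalence(all_episodes, window_start, window_end):
    """Proportion of persons ill at any time in the window.
    A one-day window gives point prevalence; a longer one gives period prevalence."""
    ill = sum(was_ill_during(e, window_start, window_end)
              for e in all_episodes.values())
    return ill / len(all_episodes)


survey_day = 180
point = prevalence(episodes, survey_day, survey_day)        # "ill today?"
past_30_days = prevalence(episodes, survey_day - 29, survey_day)
past_180_days = prevalence(episodes, 1, survey_day)

print(point, past_30_days, past_180_days)
```

With these invented episodes nobody is ill on the survey day itself, one person was ill in the past 30 days, and four were ill at some time in the past 180 days, so the three estimates differ widely even though the underlying disease experience is identical.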
Slide 7: The Three Elements in Measures of Disease Incidence
- E = an event = a binary outcome
- N = number of at-risk persons in the population under study
- T = time period during which the events are observed
That’s all for prevalence. Turning now to incidence, these 3 elements (an event, a number of persons at risk, and a time period during which the disease events are observed) are essential to understanding the different measures we will be discussing. If you always pay close attention to how these 3 elements are being used (or not used), you should have no trouble understanding what kind of incidence measure is being presented. The binary event we are generally examining is the occurrence of a disease or death, but this framework applies to measuring incidence in any clinical study with a binary outcome (such as successfully quitting smoking).
Slide 8: Disease Occurrence Measures: A Confusion of Terms
- Terminology is not standardized and is used carelessly even by those who know better
- Key to understanding measures is to pay attention to how the 3 elements of number of events (E), number of persons at risk (N), and time (T) are used
- Even the basic difference between prevalence and incidence is often ignored
We will present terms for the types of incidence that we will stick to. We think they make the distinctions that the best epidemiological writing on the subject makes and that they should be preferred. Precise use of this terminology allows us to speak to each other accurately, avoid misunderstanding, and get the right answers in our research projects. Unfortunately, we need to warn you that in reading the medical literature you will often find these terms used in other ways and often used without making distinctions that we think are important. This extends all the way to making the basic distinction between prevalence and incidence we have been discussing.
HIV/AIDS infection rates drop in Uganda
KAMPALA, Sept. 10 (Kyodo) - Infection rates of the HIV/AIDS epidemic among Ugandan men, women and children dropped to 6.1% at the end of 2000 from 6.8% a year earlier, an official report shows…the results were obtained after testing the blood of women attending clinics in 15 hospitals around the country.
The report says the average rate of infection for urban areas fell from 10.9% to 8.7%. In rural areas, the average was 4.2%, not much different from the 4.3% average a year earlier. The highest infection rate of 30% was last reported in western Uganda in 1992.
Use of the word “rate” should imply that incidence is being measured. Reports like that cited above are common. What is reported as an infection rate is not in fact incidence but prevalence. It is not immediately clear that the figures of 6.1% and 6.8% are not incidence because it might be possible (although very unlikely) to have one-year incidence rates of HIV infection that high in Africa, but the last sentence of the report gives an “infection rate of 30%,” a figure so high that it can only be prevalence. For it to be incidence 30% of the population of women attending clinics in Uganda would have to have been newly infected in a one-year period. No HIV infection rate this high has been seen in one-year anywhere. All of the figures are prevalence as they are the proportion of women who tested positive in successive years in 15 Ugandan hospital clinics. The proportion testing positive is prevalence as it does not take into account whether the women had all been tested the year before and whether the positives were only among those testing HIV negative the year before.
Since this is a news service report, it isn’t clear whether the WHO and MCR of Britain used the language of infection rates or whether that was introduced by the reporting. But in any case you will see the same use of rate in some of the medical literature.
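The difference between the proportion testing positive and true incidence can be shown with a small calculation. The numbers below are hypothetical and are not the Ugandan data; the sketch also assumes that the same women are tested in both years and that none die or leave the clinic population in between.

```python
# Hypothetical cohort: the same 1,000 women tested in two successive years.
tested = 1000
positive_year1 = 68          # infections already present at the first test
newly_positive_year2 = 10    # women negative in year 1 who are positive in year 2

# Prevalence: all positives divided by all women tested.
prevalence_year1 = positive_year1 / tested
prevalence_year2 = (positive_year1 + newly_positive_year2) / tested

# Incidence: new infections divided by those at risk, i.e. negative at the first test.
at_risk = tested - positive_year1
one_year_cumulative_incidence = newly_positive_year2 / at_risk

print(f"Prevalence, year 1: {prevalence_year1:.1%}")
print(f"Prevalence, year 2: {prevalence_year2:.1%}")
print(f"One-year cumulative incidence: {one_year_cumulative_incidence:.2%}")
```

In this invented cohort the proportion testing positive is several times larger than the one-year incidence, because the numerator of prevalence carries forward every woman infected in earlier years.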
- The word “rate” should be avoided when existing diagnoses at one point in time are what was measured.
- Although you may encounter “prevalence rate,” rate should be reserved for measuring incidence.
- In general a rate is a change in one measure with respect to change in a 2nd
Rate is a mathematical concept that has broad application in many fields of science. Although it is used in medicine for reporting the change in a measure of disease occurrence (the number of new diagnoses among a number of persons) with respect to a change in time, it can be change in one measure with respect to a second measure which is not necessarily time. For example, traffic fatalities per passenger-mile traveled is a rate. Mathematically, an instantaneous rate is the first derivative of a function, and in the analysis of time-to-event data the instantaneous incidence rate is usually called the hazard (hence the term proportional hazards model for a type of regression model used with longitudinal data).
Here we are pointing out that rate should be restricted to use as a term for incidence but not prevalence. We will later go on to make a further distinction between a person-time incidence rate and cumulative incidence. Some epidemiologists also call cumulative incidence a rate, but that is incorrect. Both are ways to measure incidence.
Slide 10: Measures that are sometimes loosely called Incidence
- Count of the number of events (E) eg, there were 84 traffic fatalities during the holidays
- Count of the number of events during some time period (E/T) eg, traffic accidents have averaged 50 per week during the past year
- Neither explicitly includes the number of persons (N) giving rise to the events
Incidence requires that we know how many events occurred (E) during what time period (T) among how many persons (N). One could argue that some of these examples implicitly include a number of persons. For example, traffic accidents in San Francisco County for the past year has an implied number of persons, which is the average county population during the year, so by naming a geographic location a number of persons at risk is implied. The “holidays” may refer to a specific 3-day holiday weekend, thus giving the implied time period. These inferences about N and T can be reasonable depending upon the context, but the point we are making is that all three, E, N, and T, need to be accounted for, preferably explicitly, in order to have a measure of incidence.
Slide 11: CDC: Chickenpox rates drop in four states as inoculations become common
SF Chronicle, Thursday, September 18, 2003
(09-18) 13:51 PDT ATLANTA (AP) --
The number of chickenpox cases in four states dropped more than 75 percent as inoculations became more common in the last decade, according to a federal study released Thursday.
The total number of cases in Illinois, Michigan, Texas and West Virginia dropped from about 102,200 in 1990 to about 24,500 in 2001, the Centers for Disease Control and Prevention said.
At the same time, the percentage of infants receiving chickenpox shots rose from less than 9 percent in 1996 to as much as 83 percent in 2001, the CDC said.
In this example, the number of events (E) is given, the time period (T) is described (one year at two points in time), and a population of persons is specified (four states). The story says that the number of cases dropped more than 75% and that is perfectly accurate, but the headline says that rates dropped. What is missing in order to compare rates between these two one-year time periods is the number of persons living in those four states in the two time periods. Since the two one-year incidence periods are 11 years apart, it is a reasonable bet that the population of the four states changed during 11 years. How much, or in which direction, one can’t be certain, but most likely the population increased. If the population increased a lot, then the difference in incidence rates between the two years is even greater than the change in the number of cases. So the press release is probably not qualitatively incorrect (unless those 4 states lost a lot of population), but it would have been even more informative if the incidence rates rather than the counts had been reported. This is information knowable from census data—a good use of census data.
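The argument can be checked with a short calculation. The case counts are the ones quoted in the news story; the two population figures are hypothetical placeholders (the story does not give them, and the real values would have to come from census data), chosen only to show what happens when the population grows.

```python
# Case counts from the news story.
cases_1990 = 102_200
cases_2001 = 24_500

# HYPOTHETICAL combined populations of the four states, for illustration only.
population_1990 = 40_000_000
population_2001 = 46_000_000


def rate_per_100k(cases, population):
    """One-year incidence expressed per 100,000 persons."""
    return cases / population * 100_000


count_drop = 1 - cases_2001 / cases_1990

rate_1990 = rate_per_100k(cases_1990, population_1990)
rate_2001 = rate_per_100k(cases_2001, population_2001)
rate_drop = 1 - rate_2001 / rate_1990

print(f"Drop in number of cases: {count_drop:.1%}")
print(f"Drop in incidence (hypothetical populations): {rate_drop:.1%}")
```

Under the assumed population growth the drop in incidence is larger than the drop in counts. Swapping the two population figures, so that the population shrinks, reverses that ordering, which is why the denominators matter.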
To take it a step further, since chickenpox is largely a disease of infants, it would be even more informative to know what had happened to the size of the population of infants during that period and what the rates were among infants. It is perhaps not so clear that the infant population size increased. If it did not, then the report may be overstating the change in chickenpox rates among infants.
Problem: How would you measure breast cancer incidence in a cohort study (such as the Nurses Health Study)?
Incidence = occurrence of new cases
But how to account for the role of time?
Cohort studies typically start with a study base of individuals free of the disease being studied and then attempt to identify every new diagnosis of the disease during the follow-up time of the cohort. Therefore, the fundamental outcome measure in a cohort study is disease incidence, the occurrence of new cases over time. But “over time” is a key phrase. Just counting the cases is not enough to measure incidence. The amount of time during which the diagnoses were made has to be included in the way incidence is defined.
Two Measures Described as Incidence in the Text:
- The proportion of individuals who experience the event in a defined time period (E/N during some time T) = cumulative incidence
- The number of events divided by the amount of person-time observed (E/NT) = incidence rate or density (not a proportion)
This is an important distinction because the two types of incidence are related to different types of analyses. Both are perfectly valid measures of incidence as both include E, N, and T. We will be exploring some of the properties of these two measures, the assumptions they make, and how they are used in research studies.
If the measure is a proportion of persons, it is unitless and has to lie between 0 and 1. In other words it is a probability. But because it is unitless, the time element T has to be stated explicitly; for example, the proportion of persons diagnosed during a one-year period. Omitting the time period is a common error in the literature: cumulative incidence is often reported without it.
If the measure is a number of events divided by some number of persons at risk during some period, it is not a proportion (not a probability) because the denominator multiplies persons by time (100 persons followed for 2 years gives the same denominator as 200 persons followed for 1 year). The value of the fraction will change with the denominator and the units of the denominator are arbitrary. That is, if an incidence is presented as events per person-years, those person-years could be converted to person-months, or person-days, or even person-minutes with corresponding changes in the incidence rate (even though they all mean the same thing). And none of those fractions is constrained to be between 0 and 1—they can exceed 1. The concept of an incidence rate is not intuitive to everyone at first glance, but we will spend much more time on it later.
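A brief sketch may help here. The event count and cohort sizes are hypothetical, the function names are ours, and complete follow-up of every person is assumed (no censoring), so that person-time is simply persons multiplied by years.

```python
def cumulative_incidence(events, persons_at_risk):
    """E/N: a proportion; the time period must be reported alongside it."""
    return events / persons_at_risk


def incidence_rate(events, person_time):
    """E/NT: events per unit of person-time; not a proportion."""
    return events / person_time


events = 10  # hypothetical number of new diagnoses

# 100 persons followed for 2 years and 200 persons followed for 1 year
# contribute the same person-time denominator.
person_years_a = 100 * 2
person_years_b = 200 * 1
same_denominator = person_years_a == person_years_b

# The same rate expressed in different, equally valid units.
rate_per_person_year = incidence_rate(events, person_years_a)
rate_per_person_month = incidence_rate(events, person_years_a * 12)
rate_per_100_person_years = rate_per_person_year * 100   # exceeds 1

# Cumulative incidence for the first cohort, which must be stated "at 2 years".
ci_at_2_years = cumulative_incidence(events, 100)

print(same_denominator)
print(rate_per_person_year, rate_per_person_month, rate_per_100_person_years)
print(f"Cumulative incidence at 2 years: {ci_at_2_years:.0%}")
```

The three rate values describe exactly the same disease experience; only the unit of person-time changes. The cumulative incidence, by contrast, is meaningless without the "at 2 years" attached to it.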
Slide 13: Counterintuitive Idea
The denominator for incidence does not have to be a count of individual persons
http://twiki.library.ucsf.edu/twiki/bin/viewfile/CTSpedia/TICRDisOccurI?rev=1;filename=incident.JPG
Having introduced the idea of two kinds of incidence measures, here we return to a comment we made earlier about the lack of a standard vocabulary for measures of incidence even among epidemiological textbooks. Jeff Martin added the E’s, N’s, and T’s to this table in order to show how focusing on those 3 elements can clarify what is being measured despite the differences in terminology among these authors. Note, however, that when events per person-time unit is being measured, most authors call this an incidence rate (with one of these authors calling it an incidence density).
We favor the terminology used here by Rothman, who calls E/NT incidence rate and E/N cumulative incidence. These terms make the most sense. A rate captures the number of cases occurring over time and is not tied to any specific time period. It may help to think of rate in another context, such as the velocity (rate) of an automobile. If you take a trip across California, your average velocity, a rate, is not determined by how long you have been driving. Cumulative incidence essentially adds up (hence the word “cumulative”) all incident cases over a fixed period of time.
Slide 14: Disease Incidence Key Concept
- Numerator is always the number of new events in a time period (E)
- Examine the denominator (persons or person-time) to determine the type of incidence measure
Looking at whether the denominator is a number of persons or the product of persons times time will tell you whether you are looking at cumulative incidence or at an incidence rate. If you are looking at cumulative incidence, the authors should have given the time at which it was calculated (43% at 3 years).
Slide 15: Problem Set - Disease Occurrence - Prevalence vs Incidence
Q1: Investigators performed a one-time survey of 500 residents of California and asked whether respondents currently had a “common cold”. Eighty residents responded “yes”. The survey was conducted from Jan. 1, 1997 to June 30, 1998. State the measure of disease occurrence (e.g., point prevalence, cumulative incidence, etc.) that is given or can be calculated from the information provided about this study. (Answers: TICR Disease Occurrence - Answers)
Q2: Investigators performed a one-time survey of 500 residents of California and asked whether respondents had experienced a “common cold” anytime in the prior year. Two hundred residents responded “yes”. The survey was conducted from Jan. 1, 1999 to Dec. 31, 1999. State the measure of disease occurrence (e.g., point prevalence, cumulative incidence, etc.) that is given or can be calculated from the information provided about this study. (Answers: TICR Disease Occurrence - Answers)
Q3: A cohort study of HIV infection followed 600 initially HIV-negative men of whom 85 men acquired HIV infection during the follow-up. The average follow-up time for the 600 men was 6.5 years. State the measure of disease occurrence (e.g., point prevalence, cumulative incidence, etc.) that is given or can be calculated from the information provided about this study. (Answers: TICR Disease Occurrence - Answers)