Infinite Estimates
Lead Author: Peter Bacchetti, PhD
Some regression methods, notably logistic regression and Cox proportional hazards regression, can produce degenerate estimates that are effectively infinite. (Note that zero is a degenerate estimate for odds ratios or hazard ratios, corresponding to an estimate of minus infinity for the log odds ratio or log hazard ratio.) This usually reflects a categorical predictor having a 0% or 100% rate of positive outcomes for one of its levels. In such cases, many software packages produce large estimated coefficients with even larger standard errors, or with standard errors of zero, and the confidence intervals and P-values in the standard output are usually invalid.
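As a sketch of why this happens, consider hypothetical data in which every subject with a value of 1 for a binary predictor has the outcome: the logistic log-likelihood keeps increasing as the coefficient grows, so no finite maximum exists.

```python
import math

# Hypothetical data: 20 subjects with x=0 (10 with the outcome, 10 without)
# and 20 subjects with x=1, all of whom have the outcome.
x = [0] * 20 + [1] * 20
y = [1] * 10 + [0] * 10 + [1] * 20

def log_likelihood(alpha, beta):
    """Logistic log-likelihood for intercept alpha and coefficient beta."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
        ll += math.log(p) if yi else math.log(1.0 - p)
    return ll

# The log-likelihood increases without bound in beta, so the maximum
# likelihood estimate of beta (the log odds ratio) is effectively +infinity.
for beta in (1.0, 5.0, 10.0, 20.0):
    print(beta, log_likelihood(0.0, beta))
```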
Methods for dealing with infinite estimates
Reduce the data set and the model
Some software (e.g., Stata) will drop the observations that have infinite fitted values (probabilities of 0 or 1) from the data set and drop the predictor variable with an infinite estimated coefficient from the model, and then proceed with modeling the remaining observations using the remaining variables. For most modeling algorithms, this just has the effect of removing the unsightly estimates and standard errors from the output, but it has essentially no effect on the remaining, well-behaved estimates. This may be OK when the dropped variable is only to be controlled for and is not of interest itself.
When the effect that was estimated to be infinite is of interest, then other methods are needed to characterize the strength of evidence for the existence of the effect and its magnitude.
Collapse categories
Sometimes it is possible to merge the group that has 0% or 100% positive outcomes with another group, producing a new categorization that does not produce any infinite estimates. This may appear to be an easy and practical solution, but modifying analysis decisions to avoid undesirable results can introduce bias (see Reproducible Research). This approach is therefore safest when the coarser (merged) categorization is clearly of interest regardless of any difficulties that arise from the finer categorization that produces the infinite estimate. In other situations, keeping the original categorization and using a method below is usually a preferable approach.
Likelihood-based methods
The likelihood is a measure of how well the model fits the observed data, and it can be used to obtain confidence intervals and P-values. The profile likelihood method can provide a confidence bound (a lower bound if the estimate is +infinity, an upper bound if the estimate is zero), and a likelihood ratio test can provide a P-value. In SAS, the confidence bound can be obtained by including an option on the Model statement:
CLodds=PL option on the Model statement in SAS Proc Logistic.
Risklimits=PL option on the Model statement in SAS Proc PHreg.
LRCI option on the Model statement in Proc Genmod.
In Stata, the pllf command can produce a confidence bound.
While profile likelihood confidence bounds should in principle always be available, SAS Proc Genmod has been observed to produce an estimate, lower bound, and upper bound all equal to the same value in some challenging situations.
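As an illustration of the idea (not a replacement for the SAS or Stata options above), a profile likelihood bound can be computed directly: maximize the log-likelihood over the intercept for each fixed coefficient, then invert the likelihood ratio test. This sketch assumes hypothetical data with 10 of 20 events in one group and 20 of 20 in the other, so the estimated odds ratio is +infinity and only a lower bound is available.

```python
import numpy as np
from scipy.optimize import minimize_scalar, brentq

# Hypothetical 2x2 data: 10 of 20 events without the risk factor,
# 20 of 20 with it.
events = np.array([10, 20])
totals = np.array([20, 20])

def neg_profile_loglik(beta):
    """Minimize the negative log-likelihood over the intercept, beta fixed."""
    def nll(alpha):
        p = 1.0 / (1.0 + np.exp(-np.array([alpha, alpha + beta])))
        return -np.sum(events * np.log(p) + (totals - events) * np.log1p(-p))
    return minimize_scalar(nll, bounds=(-10, 10), method="bounded").fun

# Supremum of the log-likelihood, attained as beta -> +infinity:
# group 0 fits at p = 0.5 and group 1 fits perfectly at p = 1.
loglik_sup = 20 * np.log(0.5)

# 95% lower bound: the beta where the deviance 2 * (sup - profile)
# equals the 3.84 chi-square critical value.
crit = 3.841459
lower_beta = brentq(
    lambda b: 2 * (loglik_sup + neg_profile_loglik(b)) - crit, 0.5, 10
)
print("95% profile likelihood lower bound for the odds ratio:",
      np.exp(lower_beta))
```

The resulting lower bound is roughly 8.4, in line with what SAS's CLodds=PL option reports for these counts.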
“Exact” methods
Methods that avoid reliance on approximations (e.g., asymptotic normality based on the central limit theorem) are often called “exact” methods. These are not exact in the sense of producing exactly the desired properties for the resulting confidence intervals and P-values. Instead, they are usually conservative, meaning the confidence intervals tend to be too wide and the P-values tend to be too large. Conservatism is sometimes useful, but it is usually less desirable than accuracy. In the SAS Logistic procedure, such “exact” P-values and confidence intervals can be obtained with use of an Exact statement. Some investigation of exact methods for Cox proportional hazards regression is described here. A problem with “exact” methods is that they are computationally infeasible in many situations.
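For a single binary predictor, the exact conditional analysis reduces to Fisher's exact test. As a sketch with hypothetical counts (10 of 20 events versus 20 of 20), scipy reproduces the exact P-value; note that scipy reports the sample odds ratio, which is infinite here, rather than the median-unbiased estimate that SAS's Exact statement produces.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table (rows: predictor 0/1; columns: outcome 0/1).
# The zero cell makes the sample odds ratio infinite.
table = [[10, 10],   # predictor = 0: 10 without the outcome, 10 with it
         [0, 20]]    # predictor = 1: all 20 with the outcome

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(odds_ratio, p_value)  # infinite sample odds ratio; P about 0.0004
```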
Firth adjustment
A general method proposed by David Firth prevents infinite estimates by penalizing the likelihood at very large values of the estimate. This also reduces the bias of maximum likelihood estimates, but it essentially assumes some prior knowledge about the parameter being estimated. It can be implemented by adding the “Firth” option to the model statement in the SAS Logistic and PHReg procedures.
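As an informal check of what the Firth adjustment does, in the special case of a saturated model for a 2x2 table the Firth estimate is known to coincide with adding 0.5 to each cell; that equivalence is assumed in this sketch and does not hold for general models, so dedicated software (e.g., the Firth option in SAS) should be used in practice.

```python
import math

# Hypothetical 2x2 counts with a zero cell: the unadjusted sample
# odds ratio (20/0) / (10/10) is infinite.
a, b = 10, 10   # predictor = 0: outcomes 1 and 0
c, d = 20, 0    # predictor = 1: outcomes 1 and 0

# For this saturated one-predictor model, the Firth (Jeffreys-prior)
# estimate coincides with adding 0.5 to every cell.
a2, b2, c2, d2 = a + 0.5, b + 0.5, c + 0.5, d + 0.5
odds_ratio = (c2 / d2) / (a2 / b2)
print(odds_ratio)  # 41.0

# A rough Wald interval from the corrected counts (not identical to the
# penalized-likelihood standard error that software such as SAS reports):
se = math.sqrt(1 / a2 + 1 / b2 + 1 / c2 + 1 / d2)
lower = math.exp(math.log(odds_ratio) - 1.96 * se)
upper = math.exp(math.log(odds_ratio) + 1.96 * se)
print(lower, upper)
```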
Bayesian methods
In a Bayesian approach, one would first quantify prior information about the parameter to be estimated and then synthesize it with the information provided by the current data set. In principle, this would prevent infinite estimates whenever exceptions to a 0% or 100% rate are already known to have occurred, or are considered possible on theoretical grounds. Also, the resulting posterior probability distribution would quantify the probability, if any, that the parameter is infinite, along with the uncertainty about its value if it is finite. These methods are currently less common and less accessible to researchers than those noted above.
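As a hedged illustration (one simple choice of prior among many, using hypothetical counts of 10 of 20 events versus 20 of 20), a Monte Carlo sketch with Jeffreys Beta(0.5, 0.5) priors on each group's outcome probability shows how a posterior keeps the odds ratio estimate finite while retaining a long upper tail.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Beta(0.5, 0.5) priors combined with binomial data give Beta posteriors:
# group 0 observed 10/20 events, group 1 observed 20/20 events.
p0 = rng.beta(10.5, 10.5, size=n)   # posterior draws, group without factor
p1 = rng.beta(20.5, 0.5, size=n)    # posterior draws, group with factor

odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))

# Posterior summaries: the odds ratio stays finite but has a long upper tail.
print("posterior median odds ratio:", np.median(odds_ratio))
print("95% credible interval:", np.percentile(odds_ratio, [2.5, 97.5]))
print("posterior P(odds ratio > 1):", np.mean(odds_ratio > 1))
```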
Example
To illustrate the methods, we consider analysis of a simple data set where 20 people without a yes-or-no risk factor had a 50% rate of an outcome occurring, while 20 people who did have the risk factor had a 100% rate of the outcome, as summarized in the table below.
Example data
                     Outcome        Percent with
                    0        1       Outcome=1
  Predictor  0     10       10          50%
             1      0       20         100%
A SAS program to analyze these data produces results that are summarized in the following table:
Results of Analysis of Example Data by Various Methods

  Method                       Estimated    95% Confidence Interval    P-value
                               Odds Ratio     Lower       Upper
  Usual default (Wald)             +∞            0          +∞          0.95
  Profile likelihood               +∞          8.4          +∞         <0.0001
  "Exact"                        24.4          3.4          +∞          0.0004
  Firth - Wald                   41.0          2.0         829          0.015
  Firth - profile likelihood     41.0          4.5        5466          0.0001
The usual default output, shown in the top row, inaccurately suggests that the study provides essentially no information. Across the other methods, there is a more than 4-fold range in the lower confidence bounds.
The Firth results may seem somewhat confusing, because an estimate is usually the possible true value that is most supported by the study’s data, but the data clearly don’t support an odds ratio of 41.0 any more than they support any other large value. Also, the usual (informal) interpretation of a confidence interval is that the study provides strong evidence against values outside the interval, but any evidence against values larger than the Firth upper bounds does not come from this data set. The estimate from the “exact” method is something called a median-unbiased estimate, but as with the Firth method it seems odd to pick this particular value based on the study’s data. One can instead report the infinite odds ratio estimate and still validly report the “exact” interval with it. The profile likelihood results do not have any such counterintuitive features.
Recommendations
The profile likelihood method is a reasonable choice that should usually work. An entire set of related analyses can be done with likelihood-based methods, for consistency, when only some produce infinite estimates. “Exact” methods may be acceptable when they are computationally feasible and the results remain clear despite their likely conservative bias. Firth methods have good overall statistical properties, but their finite estimates and confidence bounds are counterintuitive. The Firth profile likelihood approach may work when the plain likelihood methods encounter technical problems.