Lead Author(s): Erin Esp


Reliability is the consistency of a set of measurements or of a measuring instrument. In short, reliability is the repeatability of a measurement. Reliability is often used to describe a test.

Types of Estimates

There are several types of reliability estimates:
  1. Test-Retest Reliability
  2. Internal Consistency Reliability
  3. Inter-Rater Reliability
  4. Inter-Method Reliability

Test-Retest Reliability

Test-Retest reliability is the variation in measurements taken by a single person or instrument on the same item and under the same conditions. This measure is desirable mainly for measurements that are not expected to change over time.

One particular type of test-retest reliability is intra-rater reliability. Intra-rater reliability measures the degree of agreement among multiple repetitions of a diagnostic test performed by a single rater.
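
As an illustration, test-retest reliability is commonly estimated by correlating the scores from two measurement occasions. The following is a minimal sketch under that assumption, not a prescribed method; the function name `pearson_r` and the score lists are hypothetical and are not part of the original text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five subjects measured on two occasions
# under the same conditions (made-up data for illustration only).
first_session = [10, 12, 15, 18, 20]
second_session = [11, 12, 14, 19, 21]

test_retest_r = pearson_r(first_session, second_session)
print(f"test-retest correlation: {test_retest_r:.3f}")
```

A correlation close to 1 suggests that the measurement is stable across the two occasions. Note that a correlation captures consistency of ordering rather than exact agreement, so a systematic shift between sessions would not lower it.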

Internal Consistency Reliability

Internal consistency reliability assesses the consistency of results across items within a test. That is, it measures whether several items that purport to measure the same general idea produce similar scores.

Cronbach's Alpha is often used to measure this reliability. Cronbach's Alpha is a statistic calculated from the pairwise correlations between items. As a rule of thumb, an alpha value between 0.6 and 0.8 indicates acceptable reliability, while alpha values greater than 0.8 indicate good reliability. We should be careful, though, as a very high alpha may indicate that the items are entirely redundant.
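
To make the calculation concrete, here is a minimal sketch of Cronbach's Alpha using the common variance-based form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. This form is an assumption about which variant is intended; a standardized variant based on the mean inter-item correlation also exists. The function name `cronbach_alpha` and the sample responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's Alpha for a table of scores.

    scores: list of rows, one row per respondent, each row a list of
    item scores (all rows must have the same number of items).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four respondents, three items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

When every item gives each respondent the same score, alpha equals 1, which is the redundancy case the caution above refers to.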

Another statistic used to measure internal consistency reliability is the Coefficient Omega.

Inter-Rater Reliability

Inter-rater reliability, also known as inter-rater agreement or concordance, measures the variation in measurements taken by different persons using the same method or instrument. A number of statistics can be used to measure inter-rater reliability:
  1. Overall Percent Agreement
    - This measure assumes the data is entirely nominal.
    - It does not take into account that agreement may happen solely based on chance.
    - The least robust measure of inter-rater reliability.
  2. Cohen's Kappa
    - This measure also assumes the data is entirely nominal.
    - Measures the inter-rater reliability between two raters.
    - This measure takes into account chance agreement.
  3. Fleiss' Kappa
    - This measure is similar to Cohen's Kappa; however, it incorporates any number of raters.
    - Assumes the data is entirely nominal.
  4. Inter-rater Correlation
    - Measures pairwise correlation among raters using a scale that is ordered.
    - Examples include Pearson's Correlation Coefficient and Spearman's Rank Correlation Coefficient.
  5. Intraclass Correlation Coefficient (ICC)
    - Measures the proportion of the total variability in the observations that is accounted for by between-group variability.
  6. Concordance Correlation Coefficient
    - Nearly identical to the intraclass correlation coefficient.
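
The first two statistics in the list above can be sketched as follows. This is a minimal illustration for two raters and nominal categories, not a definitive implementation; the function names `percent_agreement` and `cohens_kappa` and the ratings are hypothetical. Cohen's Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's category frequencies.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters assign the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters and nominal categories."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories (a Counter returns 0 for a missing category).
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten items by two raters (made-up data).
rater_a = ["yes"] * 6 + ["no"] * 4
rater_b = ["yes"] * 4 + ["no"] * 5 + ["yes"]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

In this made-up example the raters agree on 7 of 10 items, but half of that agreement would be expected by chance, so the kappa (roughly 0.4) is noticeably lower than the raw percent agreement.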

Inter-Method Reliability

Inter-method reliability is the variation in measurements of the same target when taken by different methods or instruments. A commonly used type of inter-method reliability is the parallel-forms reliability.

-- ErinEsp - 24 Jul 2010