Type of Graph:

Last updated by ___ on __

Type of data:

Type of analysis:

Description and purpose:

Examples:

Basic Examples - Attached

Complementary Graphs - Attached

Potential pitfalls - Attached

Reference:

Code ( ):

%CODE{lang="java"}%

%ENDCODE%

Histograms

Last updated by Richard Forshee on September 17, 2010

Type of data: continuous

Type of analysis: univariate

Description and purpose:

Histograms are used to represent the distribution of a single continuous variable. A histogram groups individual observations into bins of a specified (usually equal) width and counts the number of observations in each bin. Rectangles are drawn so that the height (or the width, for a horizontal histogram) represents the frequency (count), percentage, or density of the observations in each bin. On the density scale, each height is the bin count divided by (n × bin width), so the total area of the rectangles equals 1. By convention, the rectangles in a histogram touch one another.
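A minimal sketch of the binning arithmetic described above, written in Python for illustration only (the page's own code is Stata, and this is not how Stata's `histogram` is implemented). The helper name `histogram_table` is hypothetical, and the sketch assumes equal-width, half-open bins [lower, lower + width) starting at a chosen origin. It returns the three scalings side by side: percentage is 100 × count / n, and density is count / (n × width).

```python
def histogram_table(values, start, width):
    """Tabulate equal-width bins as (lower edge, count, percent, density) rows.

    Hypothetical helper: bins are half-open, [start + k*width, start + (k+1)*width).
    A value lying exactly on an edge may land in either neighbouring bin because
    of floating-point rounding.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n = len(values)
    nbins = int((max(values) - start) // width) + 1
    counts = [0] * nbins
    for v in values:
        counts[int((v - start) // width)] += 1
    rows = []
    for k, c in enumerate(counts):
        percent = 100.0 * c / n
        density = c / (n * width)  # height chosen so that height * width = c / n
        rows.append((start + k * width, c, percent, density))
    return rows


if __name__ == "__main__":
    # Ten made-up values for illustration; not the Beta(2,5) data used on this page.
    toy = [0.03, 0.04, 0.12, 0.18, 0.25, 0.31, 0.33, 0.47, 0.52, 0.61]
    for lower, count, pct, dens in histogram_table(toy, start=0.0, width=0.2):
        print(f"[{lower:.1f}, {lower + 0.2:.1f})  n={count}  {pct:.0f}%  density={dens:.2f}")
```

For these ten values the four bins hold 4, 3, 2, and 1 observations (40%, 30%, 20%, 10%), with densities 2.0, 1.5, 1.0, and 0.5. The three scalings change only the vertical axis, which is why the three basic panels have the same shape.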

Histograms are distinct from bar charts. Bar charts are for categorical data, and by convention the rectangles in a bar chart do not touch.

Examples:

All examples use 100 data points that were randomly generated from a Beta(2,5) distribution. The Beta(2,5) is a right-skewed distribution bounded between 0 and 1. The actual data are shown in the stem-and-leaf plot below.
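To make "right-skewed" concrete, the Python sketch below computes the theoretical mean, mode, and standard deviation of a Beta(a, b) distribution from the standard formulas and draws a sample with Python's `random.betavariate`. The function names are hypothetical, and this is not the Stata code used for the examples: Python's generator differs from Stata's `rbeta()`, so the draws will not reproduce the 100 values in the stem-and-leaf plot. For Beta(2,5) the mode (0.2) lies below the mean (2/7, about 0.286), the usual signature of a long right tail.

```python
import math
import random
import statistics


def beta_moments(a, b):
    """Theoretical mean, mode, and standard deviation of Beta(a, b).

    The mode formula (a - 1) / (a + b - 2) assumes a > 1 and b > 1.
    """
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, mode, sd


def beta_sample(a, b, n, seed):
    """Draw n values with Python's random.betavariate (not Stata's rbeta())."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]


if __name__ == "__main__":
    mean, mode, sd = beta_moments(2, 5)
    print(f"Beta(2,5): mean={mean:.3f}, mode={mode:.3f}, sd={sd:.3f}")
    x = beta_sample(2, 5, n=100, seed=2010)  # arbitrary seed
    print(f"sample of 100: mean={statistics.mean(x):.3f}, sd={statistics.stdev(x):.3f}, "
          f"min={min(x):.3f}, max={max(x):.3f}")
```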

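Stem-and-Leaf Plot of Randomly Generated Data Used for Examples (plot in units of 0.01)
Each stem is split over two lines: stems marked * hold leaves 0-4 and stems marked . hold leaves 5-9, so 0* | 334 represents 0.03, 0.03, and 0.04.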
  0* | 334
  0. | 6778889
  1* | 011112333344
  1. | 5555667889
  2* | 0123333444
  2. | 55666678999
  3* | 00122222222334
  3. | 55566889999
  4* | 011114
  4. | 5567789
  5* | 1223
  5. | 5788
  6* | 1
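The plot above was produced by Stata's `stem x, round(0.01)`. The Python sketch below reproduces the same split-stem layout as an illustration of the display rule only; it is not Stata's algorithm, the function name `split_stem_leaf` is hypothetical, and unlike Stata (which chooses the number of lines per stem itself) it always uses two lines per stem.

```python
def split_stem_leaf(values, unit=0.01):
    """Return split-stem stem-and-leaf lines for non-negative values.

    Each value is rounded to the nearest `unit`; the last digit of the rounded
    integer is the leaf and the remaining digits are the stem. Every stem is
    shown on two lines: '*' for leaves 0-4 and '.' for leaves 5-9.
    """
    scaled = sorted(round(v / unit) for v in values)
    rows = {}
    for s in scaled:
        stem, leaf = divmod(s, 10)
        row = 2 * stem + (1 if leaf >= 5 else 0)  # '*' row sorts before '.' row
        rows.setdefault(row, []).append(str(leaf))
    lines = []
    for row in range(min(rows), max(rows) + 1):  # keep empty rows inside the range
        stem, half = divmod(row, 2)
        marker = "*" if half == 0 else "."
        lines.append(f"{stem:>3}{marker} | {''.join(rows.get(row, []))}")
    return lines


if __name__ == "__main__":
    # Made-up values, chosen so the first two lines match the top of the plot above.
    sample = [0.03, 0.03, 0.04, 0.06, 0.07, 0.07, 0.08, 0.08, 0.08, 0.09, 0.10, 0.15]
    print("\n".join(split_stem_leaf(sample)))
```

Keeping empty rows inside the range matters: a gap in the data then shows up as a blank line rather than silently disappearing.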

Basic Histogram Examples

pic1.png

Complementary Graphs:

Kernel density overlays
Theoretical distribution overlays
Rug plots
pic2.png
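The first two overlays can be sketched in Python as follows. The normal overlay uses the sample mean and standard deviation (n - 1 divisor), mirroring the `summ x` / `normalden()` step in the Stata code. The kernel density is an assumption-laden stand-in: it uses a Gaussian kernel with the normal-reference bandwidth 1.06 × SD × n^(-1/5), whereas Stata's `kdensity` defaults to an Epanechnikov kernel, so its curve will differ somewhat. All function names are hypothetical.

```python
import math
import statistics

SQRT_2PI = math.sqrt(2.0 * math.pi)


def normal_pdf(x, mean, sd):
    """Normal density, the same quantity Stata's normalden(x, m, s) returns."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * SQRT_2PI)


def normal_overlay(values):
    """Normal curve with the sample mean and SD (n - 1 divisor, like summarize)."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return lambda x: normal_pdf(x, m, s)


def gaussian_kde(values, bandwidth=None):
    """Kernel density estimate with a Gaussian kernel.

    Assumption: the default bandwidth is the normal-reference rule
    1.06 * sd * n**(-1/5); this is not Stata's kdensity default setup.
    """
    n = len(values)
    h = bandwidth if bandwidth is not None else 1.06 * statistics.stdev(values) * n ** (-0.2)
    return lambda x: sum(normal_pdf(x, v, h) for v in values) / n


if __name__ == "__main__":
    data = [0.05, 0.12, 0.18, 0.22, 0.27, 0.31, 0.35, 0.44, 0.52, 0.61]  # made-up values
    kde, normal = gaussian_kde(data), normal_overlay(data)
    for x in (0.0, 0.2, 0.4, 0.6, 0.8):
        print(f"x={x:.1f}  kde={kde(x):.3f}  normal={normal(x):.3f}")
```

Both curves are on the density scale, so they should be drawn over a density-scaled histogram (the default for Stata's `histogram`), not a frequency or percentage one.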

Potential pitfalls

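The visual impression a histogram gives can be very sensitive to the choice of bin width. The figure below shows the same 100 observations with bin widths of 0.1, 0.05, and 0.01.

pic3.png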
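A small Python sketch can quantify how one data set changes under different bin widths by reporting, for each width, the number of bins, how many are empty, and where the tallest bar sits. The helpers `bin_counts` and `compare_widths` are hypothetical, bins are assumed half-open and anchored at 0 (as with `start(0)` in the Stata code), and the simulated sample comes from Python's generator, so it is not the data shown on this page.

```python
import random


def bin_counts(values, start, width):
    """Counts in half-open equal-width bins [start + k*width, start + (k+1)*width)."""
    nbins = int((max(values) - start) // width) + 1
    counts = [0] * nbins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts


def compare_widths(values, widths, start=0.0):
    """Summarise how the same data look under different bin widths."""
    summary = {}
    for w in widths:
        counts = bin_counts(values, start, w)
        tallest = counts.index(max(counts))  # first bin with the maximum count
        summary[w] = {
            "bins": len(counts),
            "empty_bins": counts.count(0),
            "tallest_bin_lower_edge": round(start + tallest * w, 10),
        }
    return summary


if __name__ == "__main__":
    # Python draws, not the Stata data on this page; the seed is arbitrary.
    rng = random.Random(2010)
    sample = [rng.betavariate(2, 5) for _ in range(100)]
    for w, s in compare_widths(sample, [0.1, 0.05, 0.01]).items():
        print(w, s)
```

With the data on this page (maximum about 0.61), a 0.01 width spreads 100 observations over roughly 61 bins, fewer than two per bin on average, so bar heights are dominated by sampling noise.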

Reference: These concepts have been discussed by many authors, but Cox NJ, "Speaking Stata: Graphing distributions," The Stata Journal (2004) 4, Number 1, pp. 66–88, was particularly helpful as I prepared this description.

Code (Stata 11):

%CODE{lang="java"}%
********************************************************
**
** Histogram Examples for FDA-Industry-Academia Safety Graphics WG
** Richard Forshee, FDA/CBER/OBE
**
** Last updated September 17, 2010
**
** This file benefited from Cox NJ, Speaking Stata:
** Graphing Distributions. Stata Journal 2004.
**
********************************************************

** Generate random data from a beta distribution, alpha=2, beta=5
** This set of parameters generates highly skewed data between 0 and 1

clear
set seed 85360497   // serial number from the first dollar bill in my wallet
set obs 100
gen x = rbeta(2,5)

label var x "Response Variable, arbitrary scale of 0-1"

stem x, round(0.01)

** Basic histograms
twoway histogram x, title("Frequency") freq start(0) saving(basic_freq, replace)
twoway histogram x, title("Percentage") percent start(0) saving(basic_perc, replace)
twoway histogram x, title("Density") start(0) saving(basic_dens, replace)

graph combine basic_freq.gph basic_perc.gph basic_dens.gph, ///
    row(1) xsize(6) ysize(3) title("Basic Histogram Examples") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

** Histograms with overlays

** Kernel density
twoway (histogram x, start(0)) (kdensity x), ///
    title("Kernel Density Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
    legend(order(2) label(2 "Kernel Density")) ///
    saving(over_kd, replace)

** Normal
summ x                 // Generate summary statistics
local m = r(mean)      // Place mean into a local macro
local sd = r(sd)       // Place standard deviation into a local macro

twoway (histogram x, start(0)) (function y = normalden(x, `m', `sd'), range(0 1)), ///
    title("Normal Distribution Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
    legend(order(2) label(2 "Normal Distribution")) ///
    saving(over_normal, replace)

** Rug plot
gen pipe = "|"     // Create a vertical line symbol
gen where = -0.1   // Create a variable for vertical placement of the rug plot

** Histogram with a scatter plot underneath to produce rug plot

histogram x, start(0) ///
    title("Rug Plot Overlay") ///
    saving(over_rug, replace) ///
    plot(scatter where x, ms(none) mlabel(pipe) mlabpos(0)) ///
    legend(off) plotregion(margin(medium))

graph combine over_kd.gph over_normal.gph over_rug.gph, ///
    row(1) xsize(6) ysize(3) ///
    title("Histograms with Kernel Density, Normal Distribution, and Rug Plot Overlays") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

** Pitfalls
** Bin width

histogram x, start(0) width(0.1) ///
    title("0.1 bin width") saving(width_10, replace)
histogram x, start(0) width(0.05) ///
    title("0.05 bin width") saving(width_05, replace)
histogram x, start(0) width(0.01) ///
    title("0.01 bin width") saving(width_01, replace)

graph combine width_10.gph width_05.gph width_01.gph, ///
    title("Bin Width Can Affect the Shape of a Histogram") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100") ///
    row(1) xsize(6) ysize(3)
%ENDCODE%