Histograms
Last updated by Richard Forshee on September 17, 2010
Type of data: continuous
Type of analysis: univariate
Description and purpose:
Histograms represent the distribution of a single continuous variable. A histogram groups individual observations into bins of a specified (usually equal) width and counts the number of observations in each bin. Rectangles are drawn so that the height (or the width, for a horizontal histogram) represents the frequency, percentage, or density of the observations in each bin. By convention, the rectangles in a histogram touch one another.
Histograms are distinct from bar charts. Bar charts are for categorical data, and by convention the rectangles in a bar chart do not touch.
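The three vertical scales named above differ only by a constant factor, which a by-hand calculation makes explicit. The following is a minimal sketch, not part of the original example file. It assumes a bin width of 0.1, bins that start at 0, and bins of the form [lower, lower + width); Stata's own histogram command may treat an observation that falls exactly on a bin boundary differently. The variable names bin_start, frequency, n, percent, and density are chosen here for illustration.

```stata
* Sketch only: reproduce histogram heights by hand
clear
set seed 85360497                      // same seed as the example data on this page
set obs 100
gen x = rbeta(2,5)

local w = 0.1                          // assumed bin width, chosen for illustration
gen bin_start = floor(x/`w')*`w'       // lower edge of the bin holding each observation
contract bin_start, freq(frequency)    // one row per non-empty bin, with its count
egen n = total(frequency)              // total number of observations
gen percent = 100*frequency/n          // frequency rescaled to sum to 100
gen density = frequency/(n*`w')        // rescaled so the bar areas sum to 1
list bin_start frequency percent density, noobs
```

Because the density is the frequency divided by (n times the bin width), the frequency, percentage, and density histograms of the same data have identical shapes and differ only in the labelling of the vertical axis.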
Examples:
All examples use 100 data points that were randomly generated from a Beta(2,5) distribution. The Beta(2,5) is a right-skewed distribution that is bounded between 0 and 1. The data are shown in the stem-and-leaf plot below.
Stem-and-Leaf Plot of Randomly Generated Data Used for Examples
plot in units of 0.01
0* | 334
0. | 6778889
1* | 011112333344
1. | 5555667889
2* | 0123333444
2. | 55666678999
3* | 00122222222334
3. | 55566889999
4* | 011114
4. | 5567789
5* | 1223
5. | 5788
6* | 1
Basic Histogram Examples
Complementary Graphs:
Kernel density overlays
Theoretical distribution overlays
Rug plots
Potential pitfalls:
The visual impression given by a histogram is very sensitive to the choice of bin width
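One way to see this sensitivity without drawing anything is to count how many bins are occupied, and how tall the tallest bar is, at each candidate width. The following is a minimal sketch under stated assumptions, not part of the original example file: bins start at 0 and have the form [lower, lower + width), and the variable names bin and n_in_bin are chosen here for illustration.

```stata
* Sketch only: summarize the binning for three bin widths
clear
set seed 85360497                      // same seed as the example data on this page
set obs 100
gen x = rbeta(2,5)

foreach w of numlist 0.1 0.05 0.01 {
    quietly gen bin = floor(x/`w')             // bin index, with bins starting at 0
    quietly tabulate bin                       // r(r) = number of non-empty bins
    local nbins = r(r)
    quietly bysort bin: gen n_in_bin = _N      // count of observations in each bin
    quietly summarize n_in_bin
    display "width `w': `nbins' non-empty bins, tallest bin holds " r(max) " observations"
    drop bin n_in_bin
}
```

As the width shrinks, the same 100 observations are spread over more bins with fewer observations in each, so the outline of the histogram becomes more jagged and random variation is easier to mistake for structure.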
Reference: These concepts have been discussed by many authors, but Cox NJ, Speaking Stata: Graphing distributions, The Stata Journal (2004) 4, Number 1, pp. 66–88, was particularly helpful as I prepared this description.
Code (Stata 11):
%CODE{lang="java"}%
********************************************************
**
** Histogram Examples for FDA-Industry-Academia Safety Graphics WG
** Richard Forshee, FDA/CBER/OBE
**
** Last updated September 17, 2010
**
** This file benefited from Cox NJ, Speaking Stata:
** Graphing Distributions. Stata Journal 2004.
**
********************************************************
** Generate random data from a beta distribution alpha=2, beta=5
** This set of parameters generates right-skewed data between 0 and 1
clear
set seed 85360497 // serial number from the first dollar bill in my wallet
set obs 100
gen x = rbeta(2,5)
label var x "Response Variable, arbitrary scale of 0-1"
stem x, round(0.01)
** Basic histograms
twoway histogram x, title("Frequency") freq start(0) saving(basic_freq, replace)
twoway histogram x, title("Percentage") percent start(0) saving(basic_perc, replace)
twoway histogram x, title("Density") start(0) saving(basic_dens, replace)
graph combine basic_freq.gph basic_perc.gph basic_dens.gph, ///
row(1) xsize(6) ysize(3) title("Basic Histogram Examples") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100")
** Histograms with overlays
** Kernel Density
twoway (histogram x, start(0)) (kdensity x), ///
title("Kernel Density Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
legend(order(2) label(2 "Kernel Density")) ///
saving(over_kd, replace)
** Normal
summ x // Generate summary statistics
local m=`r(mean)' // Place mean into a local macro
local sd=`r(sd)' // Place standard deviation into a local macro
twoway (histogram x, start(0)) (function y=normalden(x,`m',`sd'), range(0 1)), ///
title("Normal Distribution Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
legend(order(2) label(2 "Normal Distribution")) ///
saving(over_normal, replace)
** Rug Plot
gen pipe = "|" // Create a vertical line symbol
gen where=-0.1 // Create a variable for vertical placement of the rug plot
** Histogram with a scatter plot underneath to produce rug plot
histogram x, start(0) ///
title("Rug Plot Overlay") ///
saving(over_rug, replace) ///
plot(scatter where x, ms(none) mlabel(pipe) mlabpos(0)) ///
legend(off) plotregion(margin(medium))
graph combine over_kd.gph over_normal.gph over_rug.gph, ///
row(1) xsize(6) ysize(3) ///
title("Histograms with Kernel Density, Normal Distribution, and Rug Plot Overlays") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100")
** Pitfalls
** Bin width
histogram x, start(0) width(0.1) ///
title("0.1 bin width") saving(width_10, replace)
histogram x, start(0) width(0.05) ///
title("0.05 bin width") saving(width_05, replace)
histogram x, start(0) width(0.01) ///
title("0.01 bin width") saving(width_01, replace)
graph combine width_10.gph width_05.gph width_01.gph, ///
title("Bin Width Can Affect the Shape of a Histogram") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100") ///
row(1) xsize(6) ysize(3)
%ENDCODE%