# Histograms

( Last updated by Richard Forshee on September 17, 2010)

# Description and purpose:

Histograms are used to represent the distribution of a single continuous variable. A histogram groups individual observations into bins of a specified (usually equal) width and counts the number of observations in each bin. Rectangles are drawn so that the height (or the width, for a horizontal histogram) represents the frequency, percentage, or density of the observations in each bin. By convention, the rectangles in a histogram touch one another.
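The three vertical scales mentioned above differ only by a rescaling. As a sketch, using notation that is not in the original text ($n_j$ for the count in bin $j$, $h_j$ for that bin's width, and $n$ for the total number of observations):

$$
\text{frequency}_j = n_j, \qquad
\text{percentage}_j = 100\,\frac{n_j}{n}, \qquad
\text{density}_j = \frac{n_j}{n\,h_j}, \qquad
\sum_j \text{density}_j\,h_j = 1 .
$$

For example, a bin of width 0.05 that holds 14 of 100 observations has frequency 14, percentage 14, and density 14 / (100 × 0.05) = 2.8. When all bins have the same width, the three scales give the same shape and differ only in the axis labels.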

Histograms are distinct from bar charts. Bar charts are for categorical data and, by convention, the rectangles in a bar chart do not touch.

# Examples:

All examples use 100 data points that were randomly generated from a Beta(2,5) distribution. The Beta(2,5) is a right-skewed distribution that is bounded between 0 and 1. The generated data are shown in the stem-and-leaf plot below.
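To give a sense of how skewed Beta(2,5) is, the following minimal sketch computes its theoretical moments from the standard Beta(a, b) formulas. The scalar names (`shape_a`, `shape_b`, `b_mean`, and so on) are illustrative choices for this sketch and do not appear in the author's code.

```stata
* Theoretical moments of a Beta(a, b) distribution, here a = 2 and b = 5
* (scalar names are hypothetical, chosen only for this illustration)
scalar shape_a = 2
scalar shape_b = 5

scalar b_mean = shape_a / (shape_a + shape_b)
scalar b_var  = shape_a*shape_b / ((shape_a + shape_b)^2 * (shape_a + shape_b + 1))
scalar b_mode = (shape_a - 1) / (shape_a + shape_b - 2)
scalar b_skew = 2*(shape_b - shape_a)*sqrt(shape_a + shape_b + 1) / ///
                ((shape_a + shape_b + 2)*sqrt(shape_a*shape_b))

display "mean     = " %6.4f b_mean
display "sd       = " %6.4f sqrt(b_var)
display "mode     = " %6.4f b_mode
display "skewness = " %6.4f b_skew
```

The mean (2/7, about 0.29) lies to the right of the mode (0.2) and the skewness is positive (about 0.60), which is consistent with the long right tail visible in the sample.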

## Stem-and-Leaf Plot of Randomly Generated Data Used for Examples (plot in units of 0.01)

```
0* | 334
0. | 6778889
1* | 011112333344
1. | 5555667889
2* | 0123333444
2. | 55666678999
3* | 00122222222334
3. | 55566889999
4* | 011114
4. | 5567789
5* | 1223
5. | 5788
6* | 1
```
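Each row of the stem-and-leaf plot covers five consecutive rounded values, so the row lengths already form a histogram with a bin width of about 0.05. The sketch below types in the row counts by hand and draws them as touching bars. The variable names `lower`, `freq`, and `mid` are assumptions made for this illustration. Because `stem` rounds to 0.01 first, the row boundaries apply to the rounded values and may differ slightly from what `histogram` would use on the raw data.

```stata
* Row counts read off the stem-and-leaf plot (hand-entered for illustration)
clear
input float lower int freq
0.00  3
0.05  7
0.10 12
0.15 10
0.20 10
0.25 11
0.30 14
0.35 11
0.40  6
0.45  7
0.50  4
0.55  4
0.60  1
end

* The counts should add up to the 100 observations described in the text
quietly summarize freq
display "total observations = " r(sum)

* Draw each row as a bar centred on its 0.05-wide interval
generate mid = lower + 0.025
twoway bar freq mid, barwidth(0.05) ///
    xtitle("Response variable (rounded to 0.01)") ytitle("Frequency") ///
    title("Stem-and-leaf rows drawn as a histogram")
```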

## Complementary Graphs - Attached as JPEG file

Kernel density overlays
Theoretical distribution overlays
Rug plots
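As a compact illustration of the first two overlay types, the sketch below uses the `kdensity` and `normal` options of `histogram` and then overlays the Beta(2,5) density that actually generated the data. The seed is an arbitrary value chosen for this sketch, not the author's, so the sample will differ from the one shown in the stem-and-leaf plot.

```stata
* Regenerate a sample like the one in the examples
clear
set seed 12345              // arbitrary seed for this sketch, not the author's
set obs 100
generate x = rbeta(2,5)

* Built-in overlay options: kernel density estimate and fitted normal curve
histogram x, start(0) kdensity normal ///
    title("Kernel Density and Normal Overlays")

* Theoretical overlay: the Beta(2,5) density the data were drawn from
twoway (histogram x, start(0)) ///
       (function y = betaden(2,5,x), range(0 1)), ///
    title("Beta(2,5) Density Overlay") ///
    legend(order(2) label(2 "Beta(2,5) density"))
```

Overlaying the generating distribution is only possible with simulated data. With real data, the theoretical curve would be a candidate model whose parameters are estimated from the sample, as with the normal overlay.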

## Potential pitfalls - Attached as JPEG file

The visual impression given by a histogram is very sensitive to the choice of bin width.
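Because the picture depends so strongly on the bins, it can help to know what a default would give before overriding it. The sketch below computes two common starting points for the number of bins with n = 100: the rule that, as I recall it from the `histogram` documentation, Stata applies when neither `bin()` nor `width()` is specified, and Sturges' rule. Treat the first formula as an assumption to check against the manual for your Stata version. The scalar names are illustrative.

```stata
* Two rule-of-thumb bin counts for n = 100 observations
* (scalar names are hypothetical; the default rule is quoted from memory)
scalar n_total   = 100
scalar k_default = min(sqrt(n_total), 10*ln(n_total)/ln(10))
scalar k_sturges = ceil(1 + ln(n_total)/ln(2))

display "histogram default rule: " k_default " bins"
display "Sturges rule:           " k_sturges " bins"
```

Neither rule is a substitute for trying several widths and comparing the resulting shapes.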

Reference: These concepts have been discussed by many authors, but the following article was particularly helpful as I prepared this description: Cox NJ. Speaking Stata: Graphing distributions. The Stata Journal 2004; 4(1): 66–88.

# Code (Stata 11):

## Highlighted Code

```stata
******************************************************
* Histogram Examples for FDA-Industry-Academia Safety Graphics WG
* Richard Forshee, FDA/CBER/OBE
*
* Last updated September 17, 2010
*
* This file benefited from Cox NJ, Speaking Stata:
* Graphing Distributions. Stata Journal 2004.
*
*******************************************************

* Generate random data from a beta distribution alpha=2, beta=5
* This set of parameters generates highly skewed data between 0 and 1

clear
set seed 85360497 // serial number from the first dollar bill in my wallet
set obs 100
gen x = rbeta(2,5)

label var x "Response Variable, arbitrary scale of 0-1"

stem x, round(0.01)

** Basic histograms
twoway histogram x, title("Frequency") freq start(0) saving(basic_freq, replace)
twoway histogram x, title("Percentage") percent start(0) saving(basic_perc, replace)
twoway histogram x, title("Density") start(0) saving(basic_dens, replace)

graph combine basic_freq.gph basic_perc.gph basic_dens.gph, ///
    row(1) xsize(6) ysize(3) title("Basic Histogram Examples") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

** Histograms with overlays

** Kernel Density
twoway (histogram x, start(0)) (kdensity x), ///
    title("Kernel Density Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
    legend(order(2) label(2 "Kernel Density")) ///
    saving(over_kd, replace)

** Normal
summ x              // Generate summary statistics
local m=`r(mean)'   // Place mean into a local macro
local sd=`r(sd)'    // Place standard deviation into a local macro

twoway (histogram x, start(0)) (function y=normalden(x,`m',`sd'), range(0 1)), ///
    title("Normal Distribution Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
    legend(order(2) label(2 "Normal Distribution")) ///
    saving(over_normal, replace)

** Rug Plot
gen pipe = "|"  // Create a vertical line symbol
gen where=-0.1  // Create a variable for vertical placement of the rug plot

** Histogram with a scatter plot underneath to produce rug plot

histogram x, start(0) ///
    title("Rug Plot Overlay") ///
    saving(over_rug, replace) ///
    plot(scatter where x, ms(none) mlabel(pipe) mlabpos(0)) ///
    legend(off) plotregion(margin(medium))

graph combine over_kd.gph over_normal.gph over_rug.gph, ///
    row(1) xsize(6) ysize(3) ///
    title("Histograms with Kernel Density, Normal Distribution, and Rug Plot Overlays") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

* Pitfalls
* Bin width

histogram x, start(0) width(0.1) ///
    title("0.1 bin width") saving(width_10, replace)
histogram x, start(0) width(0.05) ///
    title("0.05 bin width") saving(width_05, replace)
histogram x, start(0) width(0.01) ///
    title("0.01 bin width") saving(width_01, replace)

graph combine width_10.gph width_05.gph width_01.gph, ///
    title("Bin Width Can Affect the Shape of a Histogram") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100") ///
    row(1) xsize(6) ysize(3)
```

## Disclaimer

DISCLAIMER: The views expressed within CTSpedia are those of the author and must not be taken to represent policy or guidance on behalf of any organization or institution with which the author is affiliated.