Five-number summary
The five-number summary is a descriptive statistic that provides information about a set of observations. It consists of the five most important sample percentiles:
- the sample minimum (smallest observation)
- the lower quartile or first quartile
- the median (middle value)
- the upper quartile or third quartile
- the sample maximum (largest observation)
In order for these statistics to exist, the observations must be from a univariate variable that can be measured on an ordinal, interval or ratio scale.
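Because each of the five values is an order statistic, they can be read off a sorted sample even when the data are only ordinal. The following is a minimal sketch in R, not a standard routine: the ratings data and the function name ordinal_five_number are hypothetical, and the rule used for the quartiles (take the ceiling(n * p)-th smallest observation) is one assumed convention among several.

```r
# Hypothetical survey responses on an ordinal scale (illustrative data only).
ratings <- factor(
  c("poor", "good", "fair", "excellent", "good", "fair", "good"),
  levels = c("poor", "fair", "good", "excellent"),
  ordered = TRUE
)

# Assumed convention: the p-th quantile is the ceiling(n * p)-th order
# statistic (the smallest observation for p = 0), so every reported value
# is a category that was actually observed and no averaging is needed.
ordinal_five_number <- function(x) {
  stopifnot(is.ordered(x), length(x) >= 1, !anyNA(x))
  sorted <- sort(x)
  n <- length(sorted)
  probs <- c(min = 0, q1 = 0.25, median = 0.5, q3 = 0.75, max = 1)
  idx <- pmax(1, ceiling(n * probs))
  result <- as.character(sorted[idx])
  names(result) <- names(probs)
  result
}

ordinal_five_number(ratings)
```

For these seven responses the sketch returns poor, fair, good, good, excellent. Averaging two middle categories, as is done for numeric data, would not be meaningful on an ordinal scale, which is why the sketch always picks an observed value.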
Use and representation
The five-number summary provides a concise summary of the distribution of the observations. Reporting five numbers avoids the need to decide on the most appropriate summary statistic. The five-number summary gives information about the location (from the median), spread (from the quartiles) and range (from the sample minimum and maximum) of the observations. Since it reports order statistics (rather than, say, the mean), the five-number summary is appropriate for ordinal measurements, as well as interval and ratio measurements.
It is possible to quickly compare several sets of observations by comparing their five-number summaries, which can be represented graphically using a boxplot.
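As a rough illustration of such a comparison, the sketch below places the five-number summaries of two samples side by side in R. The samples a and b are hypothetical values chosen for illustration; fivenum is R's built-in function, and the row labels are added here only for readability.

```r
# Two hypothetical samples (illustrative values, not real measurements).
samples <- list(
  a = c(2, 4, 4, 5, 7, 9, 10),
  b = c(1, 3, 6, 6, 8, 12, 15, 20)
)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum;
# sapply() collects one column of five values per sample.
summaries <- sapply(samples, fivenum)
rownames(summaries) <- c("min", "q1", "median", "q3", "max")
summaries
```

Column a holds 2, 4, 5, 8, 10 and column b holds 1, 4.5, 7, 13.5, 20, so b is both higher on average and more spread out. Calling boxplot(samples) draws the same comparison; note that R's boxplot whiskers stop at the most extreme point within 1.5 times the box length, so they match the minimum and maximum only when there are no outlying points.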
The five-number summary is sometimes represented as in the following table:
| | Median | |
| 1st quartile | | 3rd quartile |
| Minimum | | Maximum |
Example
This example calculates the five-number summary for the following set of observations: 0, 0, 1, 2, 63, 61, 27, 13. These are the numbers of moons of each planet in the Solar System.
It helps to put the observations in ascending order: 0, 0, 1, 2, 13, 27, 61, 63. There are eight observations, so the median is the mean of the two middle numbers, (2 + 13)/2 = 7.5. Splitting the observations either side of the median gives two groups of four observations. The median of the first group is the lower or first quartile, and is equal to (0 + 1)/2 = 0.5. The median of the second group is the upper or third quartile, and is equal to (27 + 61)/2 = 44.
The smallest and largest observations are 0 and 63.
So the five-number summary would be 0, 0.5, 7.5, 44, 63.
Example in R
It is possible to calculate the five-number summary in the R programming language using the fivenum function. The summary function, when applied to a vector, displays a similar set of statistics together with the mean (which is not itself a part of the summary). Its quartiles are computed with R's default quantile method rather than as the medians of the two halves, so they can differ from the values returned by fivenum, as they do here.
> moons <- c(0, 0, 1, 2, 63, 61, 27, 13)
> fivenum(moons)
[1]  0.0  0.5  7.5 44.0 63.0
> summary(moons)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    0.75    7.50   20.88   35.50   63.00
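The hand calculation from the worked example (sort, take the median, then take the median of each half) can also be sketched directly. The function name five_number is hypothetical, and the treatment of an odd number of observations (the middle value is left out of both halves) is an assumption, since the example only covers an even count.

```r
# Sketch of the "median of each half" method used in the worked example.
# Assumption: for an odd number of observations the middle value is
# excluded from both halves; other conventions include it in both.
five_number <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2, !anyNA(x))
  x <- sort(x)
  n <- length(x)
  half <- n %/% 2
  lower <- head(x, half)  # observations below the median position
  upper <- tail(x, half)  # observations above the median position
  c(min = x[1], q1 = median(lower), median = median(x),
    q3 = median(upper), max = x[n])
}

moons <- c(0, 0, 1, 2, 63, 61, 27, 13)
five_number(moons)
```

For the moons data this reproduces 0, 0.5, 7.5, 44, 63, the same values that fivenum returns. For odd sample sizes the two can disagree, because fivenum includes the median in both halves.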
See also
- Sample maximum and minimum
- Quartile
- Median
- Seven-number summary
- Three-point estimation