Interquartile Range
Interquartile range (IQR) is a measure of spread based on quartiles. It keeps the idea of a range but is not influenced by extreme values, because it measures only the spread of the middle half of the data.
IQR is defined as: IQR = Q3 – Q1
Semi-interquartile range
The semi-interquartile range, or quartile deviation (QD), is also based on the differences between the quartiles, since these differences grow or shrink as the variability of the data grows or shrinks. QD is defined as the average of the two gaps between consecutive quartiles:
\[QD=\frac{\left( {{Q}_{3}}-{{Q}_{2}} \right)+\left( {{Q}_{2}}-{{Q}_{1}} \right)}{2}\]
\[\Rightarrow QD=\frac{\left( {{Q}_{3}}-{{Q}_{1}} \right)}{2}\]
Example 01
Find the interquartile range and quartile deviation for the following data on the number of hours 10 students spent watching television last week: 18, 7, 15, 27, 22, 20, 24, 27, 30, and 12.
Solution:
First arrange the observations in ascending order as: 7, 12, 15, 18, 20, 22, 24, 27, 27, 30.
Here n = 10.
\[\therefore \frac{n+1}{4}=\frac{11}{4}=2+0.75\quad \text{and}\quad 3\left( \frac{n+1}{4} \right)=\frac{33}{4}=8+0.25\]
Thus, Q1 = 2nd value + 0.75(3rd value – 2nd value) = 12 + 0.75(15 – 12) = 14.25
Q3 = 8th value + 0.25(9th value – 8th value) = 27 + 0.25(27 – 27) = 27
Therefore,
IQR = Q3 – Q1 = 27 – 14.25 = 12.75
QD = 12.75 / 2 = 6.375
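The calculation in Example 01 can be sketched in Python as follows. This is only an illustration of the position rule used above, where the k-th quartile sits at position k(n + 1)/4 and is found by linear interpolation; the function name `quartile` and the variable names are assumptions made for this sketch, and other textbooks and software packages use different quartile conventions that can give slightly different values.

```python
def quartile(sorted_values, k):
    """Return the k-th quartile (k = 1, 2, 3) of already sorted data.

    Uses the 1-based position k(n + 1)/4 and interpolates linearly
    between the two neighbouring observations.
    """
    n = len(sorted_values)
    position = k * (n + 1) / 4          # e.g. 2.75 for Q1 when n = 10
    whole = int(position)               # integer part of the position
    fraction = position - whole         # fractional part of the position
    if whole < 1:                       # position before the first value
        return sorted_values[0]
    if whole >= n:                      # position at or after the last value
        return sorted_values[-1]
    lower = sorted_values[whole - 1]    # the "whole"-th value (1-based)
    upper = sorted_values[whole]        # the next value
    return lower + fraction * (upper - lower)


hours = sorted([18, 7, 15, 27, 22, 20, 24, 27, 30, 12])
q1 = quartile(hours, 1)
q3 = quartile(hours, 3)
iqr = q3 - q1
qd = iqr / 2
print(q1, q3, iqr, qd)  # 14.25 27.0 12.75 6.375
```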
Identification of outlier
IQR helps in detecting potential outliers in a data set. Given a set of values, how do we decide whether a value is unusually high or low? A common rule of thumb is based on the distance of the value from the quartiles, measured in multiples of the IQR.
An outlier is defined as an observation that falls more than 1.5 × IQR (called a step) above Q3 or below Q1.
An observation that falls more than 3 × IQR above Q3 or below Q1 is known as an extreme outlier. An observation whose distance above Q3 or below Q1 is between 1.5 × IQR and 3 × IQR is known as a suspect outlier.
The values Q1 – 1.5 × IQR and Q3 + 1.5 × IQR are known as the inner fences, and the values Q1 – 3 × IQR and Q3 + 3 × IQR are known as the outer fences.
In the above example (Example 01), the inner fences are
Q1 – 1.5 × IQR = 14.25 – 1.5 × 12.75 = 14.25 – 19.125 = –4.875
Q3 + 1.5 × IQR = 27 + 1.5 × 12.75 = 27 + 19.125 = 46.125
No observation in the dataset falls outside the interval (–4.875, 46.125), which indicates that there is no outlier in the dataset.
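As a rough check of the inner fences for Example 01, the same steps can be written in Python. This sketch relies on `statistics.quantiles` from the standard library (Python 3.8 or later), whose default `"exclusive"` method uses the same (n + 1)-based positions as the hand calculation; the variable names are illustrative assumptions.

```python
import statistics

hours = [18, 7, 15, 27, 22, 20, 24, 27, 30, 12]

# quantiles() sorts the data itself; n=4 returns the cut points Q1, Q2, Q3.
q1, _, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1

# Inner fences: one step (1.5 x IQR) below Q1 and above Q3.
lower_inner = q1 - 1.5 * iqr
upper_inner = q3 + 1.5 * iqr

# Any observation outside the inner fences is flagged as an outlier.
outliers = [x for x in hours if x < lower_inner or x > upper_inner]
print(lower_inner, upper_inner, outliers)  # -4.875 46.125 []
```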
Example 02
A marketing consultant observed 51 consecutive shoppers at a supermarket. His variable of interest was how much each shopper spent in the store. Here are the data (in $):
3, 9, 9, 11, 13, 14, 15, 16, 17, 17, 18, 18, 19, 20, 20, 20, 21, 22, 23, 24, 25, 25, 26, 26, 28, 28, 28, 28, 28, 32, 35, 36, 39, 39, 41, 43, 44, 45, 45, 47, 49, 50, 53, 55, 59, 61, 70, 83, 86, 93, 125.
Identify if there is any outlier in the dataset.
Solution:
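Here n = 51 and the data are already in ascending order, so (n + 1)/4 = 13, (n + 1)/2 = 26 and 3(n + 1)/4 = 39. Thus Q1 = 13th value = 19, Q2 = median = 26th value = 28, Q3 = 39th value = 45, and IQR = Q3 – Q1 = 45 – 19 = 26.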
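Inner fences:
Q1 – 1.5 × IQR = 19 – 1.5 × 26 = 19 – 39 = –20
Q3 + 1.5 × IQR = 45 + 1.5 × 26 = 45 + 39 = 84
Outer fences:
Q1 – 3 × IQR = 19 – 3 × 26 = 19 – 78 = –59
Q3 + 3 × IQR = 45 + 3 × 26 = 45 + 78 = 123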
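Looking at the data, we can identify 86 and 93 as suspect outliers (both exceed the upper inner fence of 84, that is, they lie more than 1.5 × IQR above Q3, but neither exceeds the upper outer fence of 123) and 125 as an extreme outlier (it exceeds the upper outer fence of 123). No observation falls below the lower inner fence of –20.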
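The classification in Example 02 can be reproduced with a short Python sketch. The helper name `classify_outliers` is hypothetical, and the quartiles come from the standard library's `statistics.quantiles`, assuming its default `"exclusive"` method, which matches the (n + 1)/4 position rule used in this article.

```python
import statistics


def classify_outliers(values):
    """Return inner fences, outer fences, suspect and extreme outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # outer fences
    # Suspect: beyond an inner fence but not beyond the outer fence.
    suspect = [x for x in values
               if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    # Extreme: beyond an outer fence.
    extreme = [x for x in values if x < outer[0] or x > outer[1]]
    return inner, outer, suspect, extreme


spend = [3, 9, 9, 11, 13, 14, 15, 16, 17, 17, 18, 18, 19, 20, 20, 20, 21,
         22, 23, 24, 25, 25, 26, 26, 28, 28, 28, 28, 28, 32, 35, 36, 39,
         39, 41, 43, 44, 45, 45, 47, 49, 50, 53, 55, 59, 61, 70, 83, 86,
         93, 125]

inner, outer, suspect, extreme = classify_outliers(spend)
print(inner)    # (-20.0, 84.0)
print(outer)    # (-59.0, 123.0)
print(suspect)  # [86, 93]
print(extreme)  # [125]
```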
Merits and Demerits of Interquartile Range
Merits:
1. IQR and QD are easy to calculate.
2. These are not affected by the presence of extremely large or small values.
3. These measures are suitable for distributions with open-end classes.
Demerits:
1. IQR and QD do not consider all observations in the dataset.
2. These are highly affected by sampling fluctuations.
3. These are not amenable to further algebraic treatment.