
Descriptive Statistics
(Lecture Three)

Dr. Lari Arjomand
Introduction:
The purpose of this lecture is to help you understand, conceptually, the meaning of
measures of location (i.e., mean, median, and mode) and measures of variability (i.e.,
range, variance, standard deviation, and coefficient of variation).
Measures of Location for Ungrouped or Raw Data:
Measures of location give information about location in
a group of numbers or data. The measures of location presented in this lecture note for
ungrouped (raw) data are the mean, the median, and the mode.
Arithmetic Mean:
The arithmetic mean (or the average or simply mean) is
computed by summing all numbers and dividing by the number of observations. For example,
to compute the arithmetic mean of a sample of numbers, such as 19, 20, 21, 23, 18,
25, and 26, first sum the numbers: (19+20+21+23+18+25+26) = 152, and then calculate the sample
mean by dividing this total (152) by the number of observations (7), which gives a
mean of 21.7 or about 22.
The mean uses all the observations and each observation affects the mean. Even though the
mean is sensitive to extreme values (i.e., extremely large or small data can cause the
mean to be pulled toward the extreme data) it is still the most widely used measure of
location. This is due to the fact that the mean has valuable mathematical properties that
make it convenient for use in inferential statistical analysis. For example, the sum of
the deviations of the numbers in a set of data from the mean is zero, and the sum of the
squared deviations of the numbers from the mean is smaller than the sum of squared
deviations from any other value. These points will be explained in detail in lecture number 14.
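The two properties above can be checked numerically on the sample used in this section. The following is only a minimal sketch: the helper name `sum_squared_deviations` and the comparison values 21 and 22 are chosen here for illustration and are not part of the lecture.

```python
from statistics import mean

sample = [19, 20, 21, 23, 18, 25, 26]
sample_mean = mean(sample)  # 152 / 7

# Property 1: deviations from the mean sum to zero (up to rounding error).
deviations = [x - sample_mean for x in sample]

# Property 2: the sum of squared deviations is smallest around the mean.
def sum_squared_deviations(center):
    """Sum of squared deviations of the sample from an arbitrary center."""
    return sum((x - center) ** 2 for x in sample)

print(round(sample_mean, 1))            # 21.7
print(abs(sum(deviations)) < 1e-9)      # True
print(sum_squared_deviations(sample_mean) < sum_squared_deviations(21))  # True
print(sum_squared_deviations(sample_mean) < sum_squared_deviations(22))  # True
```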
Weighted Mean:
In some cases the data in the sample or population should not be weighted equally;
instead, each value should be weighted according to its importance. For example, suppose Lari wants to find
his average in the stat course, and assume that the exams are weighted as follows:
First Test.............100 Points.....15%
Second Test............100 Points.....20%
Third Test.............100 Points.....25%
Final Test.............100 Points.....30%
Assignments.............50 Points.....10%
Available Points.......450 Points....100%
Assume Lari made 90, 71, 87, 77, and 40 on the first test, second test, third test, final
exam, and the assignments, respectively. Lari's average in the stat course is calculated
as follows:
(90x0.15+71x0.20+87x0.25+77x0.30+40x0.10)/(0.15+0.20+0.25+0.30+0.10)=76.55 or about 77 points.
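The same weighted-mean arithmetic can be written as a short sketch. The variable names are illustrative only; the scores and weights are the ones given above.

```python
scores = [90, 71, 87, 77, 40]              # tests 1-3, final, assignments
weights = [0.15, 0.20, 0.25, 0.30, 0.10]   # weights from the table above

# Weighted mean = sum(score * weight) / sum(weights)
weighted_mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print(round(weighted_mean, 2))  # 76.55
```

Dividing by the sum of the weights keeps the formula valid even when the weights do not add up to exactly 1.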
Median:
The median is the middle value in an ordered array of observations. If there
is an even number of data in the array, the median is the average of the two middle
numbers. If there is an odd number of data in the array, the median is the middle
number. For example, suppose you want to find the median for the following set of data:
74, 66, 69, 68, 73, 70
First, we arrange the data in an ordered array:
66, 68, 69, 70, 73, 74
Since there is an even number of data, the average of the middle two numbers (i.e., 69 and
70) is the median (139/2 = 69.5). Note that in general, the location of the median in the
ordered array is (n+1)/2, where n = total number of items. Here (6+1)/2 = 3.5, i.e., halfway
between the third and fourth values.
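As a quick check of this example, sorting the six values gives 66, 68, 69, 70, 73, 74, so the two middle values are 69 and 70. The sketch below uses Python's standard `statistics.median`; the variable names are illustrative.

```python
from statistics import median

data = [74, 66, 69, 68, 73, 70]
ordered = sorted(data)

n = len(ordered)
position = (n + 1) / 2   # location of the median in the ordered array

print(ordered)        # [66, 68, 69, 70, 73, 74]
print(position)       # 3.5 -> halfway between the 3rd and 4th values
print(median(data))   # 69.5
```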
Generally, the median provides a better measure of location than the mean when there are
some extremely large or small observations (i.e., when the data are skewed to the right or
to the left). For this reason, median income is used as the measure of location for
U.S. household income. Note that if the median is less than the mean, the data
set is skewed to the right (i.e., data having a lower limit but no upper limit tend to
be positively skewed). If the median is greater than the mean, the
data set is skewed to the left (i.e., data having an upper limit but no lower limit tend to
be negatively skewed). Unlike the mean, the median does not have important mathematical properties for
use in future calculations. See the following figure:

Mode:
The mode is the most frequently occurring value in a
set of observations. For example, given 2, 3, 4, 5, 4, the mode is 4, because there are
more fours than any other number. Data may have two modes. In this case we say the data
are bimodal, and data with more than two modes are referred to as multimodal.
Note that the mode does not have important mathematical properties for future use. Also,
the mode is often not a helpful measure of location, because there can be more than one mode or
even no mode.
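A small sketch of finding modes with Python's standard `statistics.multimode` (available in Python 3.8 and later). The first data set is the one from the text; the second, bimodal data set is made up here purely for illustration.

```python
from statistics import multimode

unimodal = [2, 3, 4, 5, 4]   # example from the text
bimodal = [1, 1, 2, 2, 3]    # hypothetical data with two modes

print(multimode(unimodal))   # [4]
print(multimode(bimodal))    # [1, 2]
```

Note that when no value repeats, `multimode` returns every value, whereas this lecture would describe such data as having no mode.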
Measures of Variability for Ungrouped or Raw Data:
Measures of variability represent the dispersion of a set of data. For example, let's go
back to Lari's grades in the stat course:
Lari made 90, 71, 87, 77, and 40 on first test, second test, third test, final exam, and
the assignments, respectively. Remember that Lari's average in the course was 77. What
does this average score mean to Lari? Should he be satisfied with this information?
A measure of location (the mean in this case) does not by itself provide enough
information to describe the data set. What is needed is a measure of variability of the
data. Note that a small value for a measure of dispersion indicates that the data are
clustered closely around the mean; therefore, the mean is a good representative of the data set. On the
other hand, a large measure of dispersion indicates that the mean is not a good
representative of the data set. Also, measures of dispersion can be used when we want to
compare the distributions of two or more sets of data. In this lecture we will talk about
range, variance, standard deviation, and coefficient of variation for ungrouped or raw
data.
Range:
The range is the difference between the largest
observation of a data set and the smallest observation. The major disadvantage of the
range is that it does not use all of the observations. Only the two most extreme
values are included, and these two numbers may be atypical observations. For example,
given that the ages for a sample of 8 students at CSC are: 24, 18, 22, 19, 25, 20, 23, and
21, the range for this data set is: 25 - 18 = 7.
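The range calculation for the ages above, as a minimal sketch (variable names are illustrative):

```python
ages = [24, 18, 22, 19, 25, 20, 23, 21]

# Range = largest observation - smallest observation
data_range = max(ages) - min(ages)

print(data_range)  # 7
```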
Variance:
An important measure of variability is variance.
Variance is the average of the squared deviations from the arithmetic mean. For
example, suppose that the heights (in inches) of a sample of students at CSC are as
follows:
Height in inches
66
73
68
69
74
The following steps are used to calculate the variance:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data is a sample, divide the number (from step 4 above) by the number of
observations minus one, i.e., n-1 (where n is equal to the number of observations in the
data set). Later on, this term (n-1) will be called the degrees of freedom.
Following the above steps, the variance is calculated as follows:
Height (Inches).......Deviation..............Squared Deviation
66....................66-70 = -4.............16
73....................73-70 = +3..............9
68....................68-70 = -2..............4
69....................69-70 = -1..............1
74....................74-70 = +4.............16
Total of column one = 350, and total of column three = 46
Arithmetic mean = (350)/(5) = 70 inches and variance = (46)/(5-1) = 11.5 squared inches.
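The five steps can also be followed in a short sketch. The variable names are illustrative; `statistics.variance` from the Python standard library also divides by n-1, so it is used here only as a cross-check.

```python
from statistics import variance

heights = [66, 73, 68, 69, 74]   # sample heights in inches
n = len(heights)

mean_height = sum(heights) / n                    # step 1: 350 / 5 = 70.0
deviations = [x - mean_height for x in heights]   # step 2
squared = [d ** 2 for d in deviations]            # step 3
total_squared = sum(squared)                      # step 4: 46.0
sample_variance = total_squared / (n - 1)         # step 5: divide by n - 1

print(mean_height)        # 70.0
print(total_squared)      # 46.0
print(sample_variance)    # 11.5
print(variance(heights))  # 11.5 (library cross-check)
```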
As you see in the above example, the variance is not expressed in the same units as the
observations. In other words, the variance is hard to interpret because the
deviations from the mean are squared, so the result is in squared units (here, squared
inches). This problem can be solved by working with the square root of the variance, which is
called the standard deviation.
Standard Deviation:
Both variance and standard deviation provide the same information; one can always be obtained from the other. In other words, the
process of computing a standard deviation always involves computing a variance. As we
said, since the standard deviation is the square root of the variance, it is always expressed
in the same units as the raw data. For example, in the above problem the variance
was 11.5 squared inches. The standard deviation is the square root of 11.5, which is approximately
3.4 inches (expressed in the same units as the raw data).
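As a sketch, the standard deviation of the height sample can be obtained either by taking the square root of the variance or directly with the standard library's `statistics.stdev` (which, like the calculation above, uses n-1). Variable names are illustrative.

```python
import math
from statistics import stdev, variance

heights = [66, 73, 68, 69, 74]   # sample heights in inches

sd_from_variance = math.sqrt(variance(heights))   # sqrt(11.5)
sd_direct = stdev(heights)

print(round(sd_from_variance, 1))  # 3.4
print(round(sd_direct, 1))         # 3.4
```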
Meaning of Standard Deviation:
One way to explain the standard deviation as a measure of variation of a data set is to
answer questions such as how many measurements are within one, two, and three standard
deviations from the mean. To answer questions such as this, we need to talk about the
empirical rule and Chebyshev's rule. These rules provide guidelines to help
answer the question of how many measurements fall within 1, 2, and 3 standard deviations.
Empirical Rule:
This rule generally applies to mound-shaped data, but specifically to the data that are
normally distributed, i.e., bell shaped. The rule is as follows:
Approximately 68% of the measurements (data) will fall within one standard deviation of
the mean, 95% fall within two standard deviations, and 99.7% (or almost 100%) fall within
three standard deviations. See the following figure:

For example, in the height problem, the mean height was 70
inches with a standard deviation of 3.4 inches. Thus, if the heights are approximately
normally distributed, about 68% of the heights fall between 66.6
and 73.4 inches (one standard deviation), i.e., (mean + 1 standard deviation) = (70 + 3.4)
= 73.4, and (mean - 1 standard deviation) = (70 - 3.4) = 66.6. About ninety-five percent (95%) of the heights
fall between 63.2 and 76.8 inches (two standard deviations). About ninety-nine and seven-tenths
percent (99.7%) fall between 59.8 and 80.2 inches (three standard deviations). See the
following figure:

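The three empirical-rule intervals for the height example can be reproduced with a small sketch. The helper name `interval` is made up for illustration, and the percentages apply only under the assumption that the data are roughly bell shaped.

```python
mean_height = 70.0   # inches
sd_height = 3.4      # inches

def interval(mean, sd, k):
    """Return (low, high) for mean +/- k standard deviations, rounded to 0.1."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))

# Approximate shares from the empirical rule for 1, 2, and 3 standard deviations.
for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = interval(mean_height, sd_height, k)
    print(f"about {share} within {k} SD: {low} to {high} inches")
```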
Z Score:
We can pick any point on the X axis in the above figure and find out how many standard
deviations above or below the mean that point falls. In other words, a Z score represents
the number of standard deviations an observation (X) is above or below the mean. The
larger the absolute value of Z, the further away a value will be from the mean. Note that,
for normally distributed data, values beyond three standard deviations are very unlikely.
Note also that if a Z score is negative, the observation (X) is below the mean. The Z score
is found by using the following relationship:
Z = (a given value - mean) / standard deviation
For example, for a data set that is normally
distributed with a mean of 25 and a standard deviation of 5, you want to find out the Z
score for a value of 35. This value (X = 35) is 10 units above the mean, with a Z
value of:
Z = (35 - 25)/(5) = (10)/(5) = +2
This Z score shows that the raw score (35) is two
standard deviations above the mean. Would you be pleased with a grade in this course that
is 2 standard deviations above the mean of the class? The topic of Z
score will be discussed in more detail in lecture note six.
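The Z score relationship can be sketched as a one-line function. The function name `z_score` and the second value (15) are illustrative additions; the first call reproduces the example above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(35, 25, 5))   # 2.0  -> two standard deviations above the mean
print(z_score(15, 25, 5))   # -2.0 -> two standard deviations below the mean
```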
Chebyshev's Rule:
Chebyshev's rule applies to any sample of measurements regardless of the shape of their
distribution. The rule states that:
It is possible that none of the measurements will fall within one standard deviation of
the mean. At least 75% (or 3/4) of the measurements will fall within two standard
deviations of the mean, and at least 89% (or 8/9) of the measurements will fall within three
standard deviations of the mean.
Generally, according to this rule, at least 1 - (1/k squared)
of the measurements will fall within (mean ± k x standard deviation), i.e., within k standard
deviations of the mean, where k is any number greater than one. For example, if k
= 2.8, at least 0.87 of all values fall within (mean ± 2.8 x standard deviation), because
1 - (1/k squared) = 1 - (1/7.84) = 1 - 0.13 = 0.87.
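Chebyshev's lower bound, 1 - (1/k squared), can be evaluated for several values of k with a short sketch. The function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum share of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 2.8):
    print(k, round(chebyshev_lower_bound(k), 2))
# 2 0.75
# 3 0.89
# 2.8 0.87
```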
Coefficient of Variation:
We said that standard deviation measures the variation in a set of data. For distributions
having the same mean, the distribution with the largest standard deviation has the
greatest variation. But when considering distributions with different means, decision
makers can't compare the variation in the distributions only
by comparing standard deviations. In this case, the coefficient of variation is used,
i.e., the coefficients of variation for the different distributions
are compared, and the distribution with the largest coefficient of variation has the
greatest relative variation.
The coefficient of variation expresses the standard deviation as a percentage of the mean,
i.e., it reflects the variation in a distribution relative to the mean:
Coefficient of Variation (C.V.) = (standard deviation / mean) x 100
For example, Mark teaches two sections of statistics. He gives each section a different
test covering the same material. The mean score on the test for the day section is 27,
with a standard deviation of 3.4. The mean score for the night section is 94 with a
standard deviation of 8.0. Which section has the greater variation or dispersion of
scores?
Section........Day Section........Night Section
Mean...........27.................94
S.D............3.4................8.0
Direct comparison of the two standard deviations suggests that the night section has
the greater variation. But comparing the coefficients of variation shows quite
different results:
C.V.(day) = (3.4/27) x 100 = 12.6% and C.V.(night) = (8/94) x 100 = 8.5%
Thus, based on the size of the coefficient of variation, Mark finds that the night section
test results have a smaller variation relative to their mean than do the day section test
results.
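The comparison of the two sections can be reproduced with a minimal sketch. The function name `coefficient_of_variation` is illustrative, and the night-section mean of 94 is the value used in the C.V. calculation above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_day = coefficient_of_variation(3.4, 27)
cv_night = coefficient_of_variation(8.0, 94)

print(round(cv_day, 1))    # 12.6
print(round(cv_night, 1))  # 8.5
print(cv_night < cv_day)   # True: night section varies less relative to its mean
```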

All contents copyright (c) 1996.
All rights reserved.
Updated: 04/03/02