Descriptive statistics: Measures of central tendency and measures of dispersion


Ed 510 Applications of Educational Research



 
 
 
Here are some terms that help you master the course material.  Define them and create examples for each.
 
     
  • Average 
  • Central tendency 
  • Central limit theorem 
  • Deviation scores 
  • Mean 
  • Measure of dispersion 
  • Median 
  • Mode 
  • Score distribution 
  • Skewness 
  • Standard deviation 
  • Statistical norm 
  • Sum of squares
  • Variance

Measures of central tendency
 

Many times educators wish to describe the performance of individuals and groups using estimates of general trends in data that they have collected. At the same time they will want to understand how individuals tend to vary from general patterns. Measures of central tendency allow researchers to estimate the general tendencies that describe a group as a whole. Measures of dispersion allow researchers to estimate patterns of deviation from these general tendencies or patterns of scores.

 
One should remember that statistics is the science of estimation. Therefore statistics uses many kinds of indices that are really estimates of group performance. The place to begin to understand this process of estimation is the central limit theorem.

 
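The key idea to take from the central limit theorem here is that the best estimate of individual performance is the mean of the group. Strictly speaking, the theorem itself concerns sample means: as random samples grow larger, their means tend toward a normal distribution centered on the population mean. The practical consequence for educators is that if one knows nothing about individual scores but does know the average score of a group, then that average, or mean, is the best bet when guessing the score of any individual selected at random.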

 
Example: The mathematics resource teacher is beginning to plan an enrichment curriculum for the 4th grade class at the Smedley School. She would like to design activities that most students are going to enjoy and that will stimulate them. Since she doesn't have time to examine the Iowa Math Battery scores for each student, she can use the average score on the math test as a good indicator of the abilities of any child in the 4th grade. In fact, if the scores are normally distributed, about 68 percent of the students will score within one standard deviation of the mean, so activities designed around the mean will suit most of the class.

 
Why is that true? The bell shaped curve is a familiar idea. It describes a distribution of scores that is referred to as normal. A bell shaped curve is also perfectly symmetrical. That means that score values above the central value are balanced by as many values below the central value. In fact the curve gets its bell shape from another observation made by the central limit theorem. For any score at a measured distance above the central value in a distribution, there will be another score falling at an equal distance below the center of the distribution of scores.

 
The highest part of the bell shaped curve lies at the center of the entire bell shaped distribution. The highest part is highest because most scores fall in that location; and that location is also the center of the distribution. Therefore, the most frequent score also appears to define the central values of a normal distribution of test scores. Guessing the center is therefore also guessing the most frequently occurring score. To find out why 68 percent of all scores also fall in the center of the distribution one has to read on.

 
What are the statistics that define the center of a distribution? There are three: the median, the mode and the mean.

 
The median is the middle value of any distribution. One has only to find the middle of the ordered distribution of scores. For example, if math scores in the 4th grade are 68, 72, 62, 42, 45, 30, 39, one finds the center by first arranging the scores from lowest to highest.
Distribution A 
 

30, 39, 42, 45, 62, 68, 72
 


 
      The middle score is 45 and that is the median in Distribution A.

 
      The mode is the most common or frequent score.
     
      To see the relationship between the median and mode in a distribution, consider the following group of numbers.
Distribution B 
 

30, 39, 45, 52, 62, 62, 62, 63, 72, 78, 79
 


 
 
 
 
The mode is 62 and the median is also 62. In distribution B both mode and median fall in the same place and are centrally located.
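
The median and mode of Distributions A and B can be checked with a few lines of Python. This is only a minimal sketch using the standard `statistics` module; the variable names are illustrative and not part of the course material.

```python
from statistics import median, mode

# Scores copied from Distribution A and Distribution B above.
distribution_a = [30, 39, 42, 45, 62, 68, 72]
distribution_b = [30, 39, 45, 52, 62, 62, 62, 63, 72, 78, 79]

# median() sorts the scores itself, so the input order does not matter.
median_a = median(distribution_a)
median_b = median(distribution_b)

# mode() returns the most frequent score.
mode_b = mode(distribution_b)

print("Median of A:", median_a)  # 45
print("Median of B:", median_b)  # 62
print("Mode of B:", mode_b)      # 62
```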

 
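Of course, the mode and median do not have to fall together in the center of a distribution. When they fall in different places, the distribution is no longer symmetrical.
The third measure of central tendency, the mean, is the arithmetic average: the sum of all scores divided by the number of scores. In a perfectly balanced distribution, the mean, mode and median all have the same value.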

 
Distribution C 
 

30, 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85, 90 
 

What are the mode, median and mean in distribution C?
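
One way to check an answer to this question is to compute all three measures and compare them. The sketch below assumes the standard `statistics` module; the helper names `central_tendency` and `is_balanced` are hypothetical and introduced only for illustration.

```python
from statistics import mean, median, mode

def central_tendency(scores):
    """Return the mean, median and mode of a list of scores."""
    return mean(scores), median(scores), mode(scores)

def is_balanced(scores):
    """True when the mean, median and mode all share one value.

    Exact comparison is adequate for whole-number scores like these;
    real data with decimals may need a small tolerance instead.
    """
    the_mean, the_median, the_mode = central_tendency(scores)
    return the_mean == the_median == the_mode

# Scores copied from Distribution C above.
distribution_c = [30, 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85, 90]

print(central_tendency(distribution_c))
print(is_balanced(distribution_c))
```

Running the sketch prints the three values, so try the question by hand first.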
 

There are also distributions in which the mode and median do not fall in the same location. These distributions are described as skewed, and the bell shape no longer describes the distribution of scores. A skewed distribution can be either positively or negatively skewed. By the rule of thumb used here, positive skew occurs when the mode falls below the median, and negative skew occurs when the mode falls above the median. This may seem like a contradiction, but the sign comes from a subtraction. When the most frequent scores pile up below the median, the value of the mode is less than the value of the median. When the most frequent scores pile up above the median, the value of the mode is greater than the value of the median. The median keeps its central position in either case. Consider the following expression.

 
MEDIAN - MODE = ? 
 
  • If the mode is the smaller value, the answer will be a positive number. Hence positive skew.
  • If the mode is the larger value, the answer will be negative. Hence negative skew.

Which of the following distributions, A or B, has negative skew, and which has positive skew?

A. 30, 35, 40, 40, 40, 55, 60, 61, 62, 66, 70, 75, 80, 85, 90

B. 30, 35, 40, 45, 50, 55, 60, 61, 62, 65, 70, 70, 70, 85, 90
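
The MEDIAN - MODE rule can be sketched as a small function. This is only an illustration of the rule of thumb stated above, not a full statistical measure of skewness; the function name `skew_direction` is hypothetical.

```python
from statistics import median, mode

def skew_direction(scores):
    """Classify skew by the sign of (median - mode)."""
    difference = median(scores) - mode(scores)
    if difference > 0:
        return "positive"
    if difference < 0:
        return "negative"
    return "none"

# Scores copied from the two exercise distributions above.
exercise_a = [30, 35, 40, 40, 40, 55, 60, 61, 62, 66, 70, 75, 80, 85, 90]
exercise_b = [30, 35, 40, 45, 50, 55, 60, 61, 62, 65, 70, 70, 70, 85, 90]

for name, scores in (("A", exercise_a), ("B", exercise_b)):
    print(name, median(scores) - mode(scores), skew_direction(scores))
```

Running the sketch prints the difference and the resulting label for each distribution, so it is best used to check an answer worked out by hand.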

The practical value of this kind of information lies in the description it gives of group performance. If distributions A and B were real test scores, it would be possible to know which was the easier and which the more difficult test. Furthermore, if an educator wanted to use a test as a benchmark, to decide whether most material in a lesson or unit had been mastered by most members of a class, then negative skew would actually be a desirable finding. Likewise a positive skew would be desirable if a test were being used to eliminate examinees in competitive situations, for example admission to law school.

Measures of dispersion
 
When the mean, mode and median are known, then it is possible to use test scores to describe the most typical members of a group. But within the normal range there will also be a range of individual variation. Measures of dispersion provide descriptions of how individuals in a group deviate from the center on average.

 
One of the most common statistics that is used to estimate the extent to which the average individual will deviate from mean or central values is the standard deviation. The name suggests that it will define how scores will deviate from a central value in a standard or predictable way. Thus the standard deviation is used to estimate the expected amount of deviation from the mean in a distribution of scores.

 
The standard deviation of a distribution varies in value with the spread of the scores, not simply with their number or size. The number of individuals in a group and the magnitude of the scores both enter the calculation, but what makes the standard deviation larger is a wider range of scores around the mean; scores that cluster tightly around the mean produce a smaller standard deviation. Adding more subjects, or adding the same constant to every score, does not by itself increase it. It is important to remember that the standard deviation is a statistical estimate. Its value defines a linear distance above and below the mean. A standard deviation of 10 will define a distance that is 10 points above and below the mean. If the mean of a distribution of scores is 50, then the standard deviation will mark a distance that starts with a score of 40 and ends at a score of 60. The mean of 50 will be exactly in the middle of these two values.

 
The standard deviation also establishes a boundary around the value of the mean. This boundary makes estimation of typical performance more precise. The mean remains the best single estimate, and the estimate is refined by adding and subtracting the standard deviation. With a mean of 50 and a standard deviation of 10, about 68 percent of all the scores in a normal distribution will fall between a score of 40 and a score of 60.
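
The 68 percent figure can be checked numerically. The sketch below assumes a perfectly normal distribution and uses the error function from Python's standard `math` module; the function names `one_sd_band` and `normal_share_within` are hypothetical.

```python
import math

def one_sd_band(mean, sd):
    """Return the scores one standard deviation below and above the mean."""
    return mean - sd, mean + sd

def normal_share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

low, high = one_sd_band(50, 10)
share = normal_share_within(1)

print(low, high)        # 40 60
print(round(share, 4))  # 0.6827
```

Real test scores are rarely perfectly normal, so the observed share within one standard deviation will usually differ somewhat from 68 percent.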

 

 

The calculation of the standard deviation

The calculation of the standard deviation demonstrates why it is able to set boundaries around the central value, the mean of the distribution. Each step in the calculation leads to the estimation of a linear value that sets the average distance of scores from the mean.

 
| Step | What it describes | The statistic | The symbol |
| --- | --- | --- | --- |
| List all scores from lowest to highest | The distribution of scores | X stands for score | $X_1, \ldots, X_n$ |
| Calculate the mean | The average score in the distribution | The mean of the distribution | $\bar{X}$ |
| Subtract the mean from each score | The linear distance of each score from the group mean | Individual deviation scores | $X - \bar{X}$, or $x$ |
| Square each deviation score | The area under the distribution or curve that is occupied by individual variation | Squared deviation score | $x^2$ |
| Add all of the squared deviation scores | The sum of the squared deviation scores, or the total area of individual variation in the distribution or group of scores | The sum of squares | $\sum x^2$ |
| Divide the sum of squares by the total number of scores or individuals in the group | The average amount of individual variation in a group of scores | The variance | $\sum x^2 / N$, or $S^2$ |
| Take the square root of the variance | The average linear distance from the mean based on the average linear estimate of total variation | The standard deviation | $(S^2)^{1/2}$, or $S$ |
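
The steps above can be collected into one chain of formulas. The index $i$ is added here only for clarity and is not part of the original notation; the division by $N$ follows the table.

```latex
\[
\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad
x_i = X_i - \bar{X}, \qquad
S^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2, \qquad
S = \sqrt{S^2}
\]
```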

An actual calculation would look like this.
 
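| Subject | Score | Calculation of deviation score | Deviation score | Squared deviation score |
| --- | --- | --- | --- | --- |
| John | 10 | 10 - 50 | -40 | 1600 |
| Mary | 20 | 20 - 50 | -30 | 900 |
| Jossey | 30 | 30 - 50 | -20 | 400 |
| Adam | 40 | 40 - 50 | -10 | 100 |
| Michael | 50 | 50 - 50 | 0 | 0 |
| Peggy | 60 | 60 - 50 | 10 | 100 |
| Samantha | 70 | 70 - 50 | 20 | 400 |
| Vera | 80 | 80 - 50 | 30 | 900 |
| Robert | 90 | 90 - 50 | 40 | 1600 |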
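
The remaining steps (sum of squares, variance, standard deviation) can be completed for these nine scores as follows. This is a minimal sketch that follows the steps described earlier, dividing by N; the variable names are illustrative.

```python
import math

# Scores copied from the table above.
scores = {
    "John": 10, "Mary": 20, "Jossey": 30, "Adam": 40, "Michael": 50,
    "Peggy": 60, "Samantha": 70, "Vera": 80, "Robert": 90,
}

values = list(scores.values())
n = len(values)

mean = sum(values) / n                    # the average score
deviations = [x - mean for x in values]   # distance of each score from the mean
squared = [d ** 2 for d in deviations]    # squared deviation scores
sum_of_squares = sum(squared)             # the sum of squares
variance = sum_of_squares / n             # sum of squares divided by N
sd = math.sqrt(variance)                  # the standard deviation

print(mean)                # 50.0
print(sum_of_squares)      # 6000.0
print(round(variance, 2))  # 666.67
print(round(sd, 2))        # 25.82
```

Many textbooks and software packages divide by N - 1 instead of N when estimating from a sample, which would give a slightly larger value; the sketch keeps the division by N used in the steps above.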

 
 

Summarizing questions
 

Using 25.82 as the value of one standard deviation, what would be the values for 2 standard deviations above the mean, 3 standard deviations above the mean, and 3 standard deviations below the mean?
 

Where would the following scores be located in relation to one standard deviation above or below the mean: 26, 37, 42, 53, 67, 72, 81, 89?
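
A small helper can be used to check answers to both questions. The sketch assumes a mean of 50 (taken from the worked calculation) together with the standard deviation of 25.82 given in the question; the function names are hypothetical.

```python
MEAN = 50    # assumed: the mean from the worked calculation
SD = 25.82   # the standard deviation given in the questions

def boundary(k):
    """Score lying k standard deviations from the mean (negative k is below)."""
    return MEAN + k * SD

def sd_units(score):
    """Signed distance of a score from the mean, in standard deviations."""
    return (score - MEAN) / SD

def within_one_sd(score):
    """True when a score lies no more than one SD above or below the mean."""
    return abs(sd_units(score)) <= 1

for k in (2, 3, -3):
    print(k, round(boundary(k), 2))

for score in [26, 37, 42, 53, 67, 72, 81, 89]:
    print(score, round(sd_units(score), 2), within_one_sd(score))
```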


Page created January 5, 2001. Page modified January 20, 2001.  Copyright Antonia D'Onofrio 2001/2002/2003.