Correlation and prediction

Ed 510 Educational Statistics



 
 
Here are some terms that will help you master course material. 
  • Prediction 
  • Slope 
  • Regression 
  • Line of least squares 
  • Scatter gram 
  • Correlation coefficient 
  • Positive correlation 
  • Negative correlation
  • Sign 
  • Magnitude

 

A single set of scores describes a group of learners, but only in a limited way. Most of the time a realistic portrayal of learners requires more than one source of information, and thus more than one set of scores.
 
 
 

When educational researchers investigate a statistical relationship between two sets of scores, they are studying the correlation between those scores.
 
 
 

Correlation refers to the statistical association between two sets of scores. The relationship may be direct: as the values in one set of scores increase, the values in the second set also increase. This pattern of direct correspondence is represented as a positive correlation.
 
 
 

Correlations can also be inverse: as the values in one set of scores increase, the values in the second set decrease, or vice versa. When there is an inverse pattern of correspondence, the correlation is negative.
 
 
 

Correlations are described in terms of sign (positive or negative). They are also described in terms of magnitude (values range from -1.0 through 0 to +1.0). The farther a value is from zero in either direction, the stronger the correlation. The closer a correlation is to zero, the weaker the correlation. A correlation of zero signifies that there is no linear relationship between the two sets of scores.
 
 
 

For example, the correlation between scores on a math test and scores on a reading test might be .85. This is considered a strong, positive correlation. The scores on the math test are related to performance on the reading test, and the statistical relationship is convincing.
 
 
 

The correlation between scores on a reading test and an IQ test may be .50. This is considered a moderate, positive correlation. There is some statistical correspondence between reading and IQ but this relationship is not particularly strong.
 
 

The correlation between errors on a coordination test and the total score on a test might be -.85. This is considered a negative, strong correlation. The more errors made, the lower the test score.
 
 

Thus errors in coordination correspond statistically to the total score, but the relationship is inverse.
 
 
 

The correlation between spelling scores and reading motivation may be -.20. This would be considered a negative, weak correlation. This correlation would be interpreted to mean that there is only a slight relationship between spelling scores and reading motivation, and that the slight relationship that does exist is inverse.
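The four examples above can be summarized with a small helper that reports the sign and magnitude of a coefficient. This is only a sketch: the function name `describe_correlation` and the cutoffs of 0.3 and 0.7 are illustrative assumptions chosen to agree with the examples, not fixed statistical rules.

```python
def describe_correlation(r, weak_below=0.3, strong_from=0.7):
    """Return (sign, magnitude) for a correlation coefficient r.

    weak_below and strong_from are assumed, illustrative cutoffs.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1.0 and +1.0")
    if r == 0:
        return ("none", "zero")
    sign = "positive" if r > 0 else "negative"
    size = abs(r)  # magnitude ignores the sign
    if size < weak_below:
        magnitude = "weak"
    elif size < strong_from:
        magnitude = "moderate"
    else:
        magnitude = "strong"
    return (sign, magnitude)


print(describe_correlation(0.85))   # ('positive', 'strong')
print(describe_correlation(-0.20))  # ('negative', 'weak')
```

Note that -.85 and +.85 receive the same magnitude label; only the sign differs.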
 
 
 

Describing correlational relationships
 

Educational researchers describe correlations between two sets of scores by first graphing them. The graph that is drawn of the statistical correspondence between two sets of scores is termed a scatter gram.
 

Just look at this!
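For readers who want to draw a scatter gram themselves, here is a minimal text-only sketch. The function name `scatter_gram` and the math and reading scores are hypothetical, made up purely for illustration. Each `*` marks one learner's pair of scores, with the first set of scores running left to right and the second set running bottom to top.

```python
def scatter_gram(xs, ys, width=20, height=10):
    """Return a text scatter gram of paired scores."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    if x_min == x_max or y_min == y_max:
        raise ValueError("each set of scores must contain some variation")
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical scores for seven learners (illustration only).
math_scores = [52, 60, 65, 71, 78, 84, 90]
reading_scores = [55, 58, 70, 69, 80, 83, 92]
print(scatter_gram(math_scores, reading_scores))
```

With these made-up scores the marks drift from the lower left to the upper right, the pattern expected for a positive correlation.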

The correlation coefficient
 

The calculation of the correlation coefficient is based on a few basic ideas. The most important idea is that of covariance. When the correlation coefficient is calculated, the first value to be determined is the covariance, or the extent to which two sets of scores vary in a similar way.
 
 
 

The second fundamental idea is that of total variation. The calculation of the correlation coefficient also requires the pooled variation of the two distributions of scores, which is found by combining the variance of each distribution.
 
 

The equation used to calculate the coefficient employs both ideas.

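$$
r = \frac{\operatorname{Cov}(x, y)}{\left(S_x^2 \, S_y^2\right)^{1/2}},
\qquad
\operatorname{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}
$$

Here $S_x^2$ and $S_y^2$ are the variances of the two sets of scores, each calculated with the same $N - 1$ divisor as the covariance.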


  • The numerator determines the covariance for two distributions of scores. 
  • The denominator determines the pooled variation in the two distributions of scores. 
  • The coefficient, then, is the ratio of the covariance to the pooled variation of the distributions. When the size of the covariance equals the pooled variation, the coefficient will be perfect: 1.0, positive or negative. 
  • When the covariance is smaller in size than the pooled variation, the coefficient will be less than one in size. The smaller the covariance relative to the pooled variation, the smaller the ratio and the smaller the coefficient. 
  • When the covariance is equal to zero, the correlation coefficient will be zero.
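The ratio described in the list above can be sketched in a few lines of Python. This is a minimal illustration, assuming the covariance and both variances are calculated with the same N - 1 divisor (the usual sample convention); the function name `correlation` and the small score lists are hypothetical.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient as covariance divided by pooled variation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sets of at least two scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: the extent to which the two sets of scores vary together.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Denominator: square root of the product of the two variances.
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    pooled = math.sqrt(var_x * var_y)
    return covariance / pooled


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect direct correspondence
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect inverse correspondence
print(correlation([1, 2, 3, 4], [1, 3, 3, 1]))  # covariance of zero
```

The three calls correspond to the cases in the list: covariance equal in size to the pooled variation (positive, then negative), and covariance equal to zero.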

Prediction
 

Correlations allow educational researchers to predict scores. Prediction depends on the degree of correspondence between the scores in two distributions of scores. The more perfect the correspondence, the higher the correlation. The higher the correlation between the two distributions of scores, the more accurate the prediction will be. This is true for both positive and negative correlations.
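One way to make the link between correlation and accuracy concrete, assuming predictions are made with the best-fitting straight line, is to look at how much of the original spread in the predicted scores remains as prediction error. That fraction is the square root of 1 - r squared. The function name `remaining_error_fraction` is hypothetical.

```python
import math


def remaining_error_fraction(r):
    """Spread of the prediction errors as a fraction of the spread of the
    scores being predicted, for a best-fitting straight line."""
    return math.sqrt(1 - r ** 2)


for r in (1.0, 0.85, -0.85, 0.50, 0.0):
    print(r, round(remaining_error_fraction(r), 2))
```

Because r is squared, a correlation of -.85 leaves exactly as little error as a correlation of +.85, and a correlation of zero leaves all of the original spread as error.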
 

The line of least squares

The line of least squares describes the correspondence between two distributions of scores. When there is perfect correspondence (a correlation of 1.0, positive or negative), every pair of scores falls exactly on the line. This internet link will lead you to a figure that contains the line of least squares for a predictive relationship.
 
 

 Take another look!




Note that the line joins scores in the two respective distributions. Each dot marks a pair of scores, one from each distribution, and the dots form the straight line that runs from the bottom left to the upper right corner of the figure.
 

When one examines the line of least squares, one can see that knowing a score in one distribution makes it possible to predict the corresponding score in the second distribution.
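Prediction from the line can be sketched as follows. The function names `least_squares_line` and `predict`, and the quiz and exam scores, are hypothetical and serve only as an illustration. The line is found with the standard least squares formulas for slope and intercept.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line of least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x  # the line passes through both means
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted score in the second distribution for a score x in the first."""
    return intercept + slope * x


# Hypothetical paired scores for four learners (illustration only).
quiz = [2, 4, 6, 8]
exam = [3, 4, 8, 9]
slope, intercept = least_squares_line(quiz, exam)
print(round(slope, 2), round(intercept, 2))      # 1.1 0.5
print(round(predict(10, slope, intercept), 2))   # 11.5
```

Here a learner with a quiz score of 10 would be predicted to earn an exam score of about 11.5.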
 

When the correspondence between two distributions of scores is less than perfect, the line of least squares will lose slope and begin to level off. This internet link will lead you to several figures that describe less than perfect correlations, and therefore lines of least squares that lose slope.
 


 You probably already noticed the slope changes when you change the numbers!



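The slope of the line of least squares is referred to as the regression coefficient. This coefficient has no upper or lower limit on its possible values, because it is expressed in the units of the scores: it equals the correlation coefficient multiplied by the ratio of the standard deviation of the predicted scores to the standard deviation of the predictor scores. Its sign always matches the sign of the correlation, and when the spread of each set of scores is held fixed, a larger regression coefficient reflects a stronger correspondence between the two distributions of scores. When both sets of scores are converted to standard scores, the regression coefficient is equal to the correlation coefficient and can be interpreted in the same way.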
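The relation between the regression coefficient and the correlation coefficient can be checked numerically. In this sketch the function name `slope_and_r` and the scores are hypothetical; the second set of scores is simply multiplied by 10 to mimic a change of units.

```python
import math


def slope_and_r(xs, ys):
    """Return (regression coefficient, correlation coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = sum_xy / sum_xx
    r = sum_xy / math.sqrt(sum_xx * sum_yy)
    return slope, r


# Hypothetical scores (illustration only).
quiz = [2, 4, 6, 8]
exam = [3, 4, 8, 9]
exam_times_ten = [score * 10 for score in exam]  # same scores, new units

print(slope_and_r(quiz, exam))
print(slope_and_r(quiz, exam_times_ten))
```

The slope grows tenfold when the units change, while the correlation coefficient stays the same. This is why the slope by itself has no fixed upper or lower limit.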
 

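Scores that do not fall on the line produce errors of prediction. The vertical distance between an actual score and the line is referred to as a residual. As the number of points lying off the line increases, the line can no longer be drawn simply by connecting the dots; instead it is placed where the sum of the squared residuals is as small as possible, which is what gives the line of least squares its name.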
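Errors of prediction can be computed directly. The sketch below uses hypothetical scores and helper names (`fit_line`, `residuals`, `sum_of_squares`) and compares the line of least squares with one arbitrarily chosen alternative line (slope 1, intercept 1).

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line of least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Actual score minus predicted score for every pair."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Hypothetical paired scores (illustration only).
quiz = [2, 4, 6, 8]
exam = [3, 4, 8, 9]

best_slope, best_intercept = fit_line(quiz, exam)
best_errors = residuals(quiz, exam, best_slope, best_intercept)
other_errors = residuals(quiz, exam, 1.0, 1.0)  # an arbitrary alternative line

print(sum_of_squares(best_errors) < sum_of_squares(other_errors))  # True
```

The squared errors of prediction are smaller for the line of least squares than for the alternative line, and the same holds for any other straight line one might try.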
 

 
 

Page created January 5, 2001. Page modified January 20, 2001.  Copyright Antoinia D'Onofrio 2001/2002/2003.