Sampling and sampling statistics


Ed 510 Applications of Educational Research
 


 
 

Here are some terms that will help you understand this week's lesson.
 

Population
Random sampling
Sample
Sampling methods
Sampling error and the standard error of the mean (SEM)
Scientific sampling
Stratified random sampling
Tailored samples (also called weighted samples)
Weighted samples


 

Introduction

    Samples and populations are related concepts.  A population is the entire set of cases from which a universe of observations can be drawn.  Samples are drawn from populations, so they are smaller than the population itself.  Although this seems obvious, one sometimes detects confusion in the terms researchers use.  For example, a study might refer to "the sample population."  There is no such thing.  The researcher is probably trying to say that a sample was drawn from a particular population that is important to the study.  In this case the population should be called the population of interest, and the sample is a subset of that population.
 

Populations
 

    Populations may be either theoretical or real.  Theoretical populations do not exist in real terms.  Instead, they are large distributions of numbers or scores that have been generated to model a particular measurable phenomenon.  For example, an economist may create a theoretical population that represents the behavior of stock market prices under specific conditions.  An educational researcher may create a theoretical population that represents the hypothetical distribution of responses to items on a test, so that real distributions based on actual scores can be compared to the hypothetical case.  Theoretical populations always represent score distributions under a hypothetical set of conditions, and real distributions of scores are compared against them.
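A small simulation can make the idea of a theoretical population concrete. The sketch below is a hypothetical Python example, not a procedure from this lesson: it assumes a 20-item test with four answer choices per item and builds a theoretical population of 10,000 scores for test takers who guess on every item. The class scores in `observed_scores` are made up for illustration; the point is only to show a real distribution being compared to the hypothetical case.

```python
import random
import statistics

# Hypothetical theoretical population: scores on a 20-item test with
# 4 answer choices per item, assuming every test taker guesses at random.
NUM_ITEMS = 20
P_CORRECT = 0.25          # chance of guessing one item correctly
POPULATION_SIZE = 10_000

rng = random.Random(510)  # fixed seed so the run is repeatable

def guessing_score() -> int:
    """Number of items answered correctly by pure guessing."""
    return sum(1 for _ in range(NUM_ITEMS) if rng.random() < P_CORRECT)

theoretical_population = [guessing_score() for _ in range(POPULATION_SIZE)]
theoretical_mean = statistics.mean(theoretical_population)

# Hypothetical real scores from one class (made-up numbers for illustration).
observed_scores = [12, 15, 9, 14, 11, 13, 16, 10, 12, 14]
observed_mean = statistics.mean(observed_scores)

# Share of the guessing population scoring at or above the class mean.
share_at_or_above = sum(s >= observed_mean for s in theoretical_population) / POPULATION_SIZE

print(f"theoretical mean: {theoretical_mean:.2f}")   # close to 20 * 0.25 = 5
print(f"observed mean:    {observed_mean:.2f}")
print(f"share of guessers scoring >= observed mean: {share_at_or_above:.4f}")
```

If almost no simulated guessers score as high as the class average, the hypothetical "pure guessing" condition is a poor description of the real scores.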
 

It would be interesting to see if members of the class could think of examples where an educational researcher might find this approach useful.
 

Real populations are based on real scores or numerical values that have been collected as actual data.  These populations are described in terms of the measured characteristics of important variables.  For example, the population of IQ scores in Media refers to the scores as a population rather than to the human beings who took the test.  The study of populations is fundamentally about the statistical characteristics of distributions.
 

Samples
 

Samples are subsets of populations.  They are described in terms of the percentage of the population that is sampled.  One reads of a 10 percent sample, or a 25 percent sample.  Samples are also described in terms of the number (N) of observations that a sample contains.  Thus a sample might be described as a 25 percent sample (N=100), from which one can tell immediately that the population size was 400.
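The arithmetic behind that last example is simply N divided by the sampling fraction. A minimal Python sketch (the function name `population_size` is hypothetical):

```python
def population_size(sample_n: int, percent_sampled: float) -> float:
    """Infer the population size from a sample's N and the percentage sampled."""
    if not 0 < percent_sampled <= 100:
        raise ValueError("percent_sampled must be in (0, 100]")
    return sample_n * 100 / percent_sampled

print(population_size(100, 25))   # 400.0
print(population_size(100, 10))   # 1000.0
```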

Samples are used because they are more efficient and less costly to study than entire populations.  They are also more accessible in many cases.  It is much easier to study a sample of learning disabled students than it is to hunt down every single individual in that category for the purposes of research.

Because samples are supposed to represent the population from which they have been drawn, they are proxies for the population itself.  It is essential that the statistical description of a sample closely match the statistical description of its parent population.  When a population of 10,000 IQ scores has a mean of 115 and a standard deviation of 15, a sample that represents that population is expected to have a mean close to 115 and a standard deviation close to 15.  When there is a close match, the sample is described as unbiased.  When there are discrepancies between sample and population in terms of important descriptive statistics, the sample is described as biased.

Sampling bias is important to document.  The reason?  Anytime a sample deviates statistically from its parent population it ceases to be a good proxy.  The greater the deviation, the greater the bias.  Information that is gathered from such a sample cannot be used to generalize to the population.  So sampling bias refers to the extent to which a sample and its statistical characteristics do not correspond to the characteristics of the parent population.
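The difference between an unbiased and a biased sample can be simulated. In this hypothetical Python sketch the population is 10,000 computer-generated IQ scores with a mean near 115 and a standard deviation near 15. One sample of 400 is drawn at random; the other takes the 400 highest scores, standing in for a non-random recruiting method such as drawing only from an honors program (an assumption for illustration).

```python
import random
import statistics

rng = random.Random(2001)

# Hypothetical population of 10,000 IQ scores, mean about 115, SD about 15.
population = [rng.gauss(115, 15) for _ in range(10_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

# Random sample: every score has an equal chance of selection.
random_sample = rng.sample(population, 400)

# Non-random (biased) sample: the 400 highest scores, e.g. recruiting
# only from an honors program. This is a hypothetical example.
biased_sample = sorted(population, reverse=True)[:400]

for label, sample in [("random", random_sample), ("biased", biased_sample)]:
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    print(f"{label:>6}: mean {mean:6.1f} (bias {mean - pop_mean:+5.1f}), "
          f"SD {sd:5.1f} vs population SD {pop_sd:5.1f}")
```

The random sample's mean and standard deviation land close to the population values. The biased sample's mean sits far above the population mean and its spread is much narrower, so it is a poor proxy for the population.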
 
 

Have some fun.

A party boat sailed off the coast of Cape May to fish for flounder.  It was reported that a huge population of flounder could be found in a certain location, approximately 10,000 fish in all.

It was further reported that on average fish in this population weighed about 5 pounds.

As baskets of flounder were weighed by the partygoers, they noticed that one basket contained flounder that weighed either 1 or 7 pounds but averaged 5 pounds.

Another basket contained flounder that weighed exactly 5 pounds per fish and, naturally, averaged 5 pounds.

Yet a third basket contained flounder that weighed either 4 pounds or 6 pounds, and averaged 5 pounds.

A fourth basket contained fish that weighed 6 to 8 pounds per fish.  Thus the average weight of a flounder in this basket could not be 5 pounds.  What does this final set of measurements represent?

Answer:  B gmvlf!  Can you decode the cryptogram?
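The baskets can also be described with two statistics instead of one. In this Python sketch the individual fish weights are hypothetical (the story gives only the possible weights and the averages); it prints each basket's mean and standard deviation.

```python
import statistics

# Hypothetical baskets consistent with the story (the fish counts are made up).
baskets = {
    "basket 1": [1, 7, 7],        # 1 or 7 pounds, averages 5
    "basket 2": [5, 5, 5],        # exactly 5 pounds each
    "basket 3": [4, 6, 4, 6],     # 4 or 6 pounds, averages 5
    "basket 4": [6, 7, 8],        # 6 to 8 pounds
}
POPULATION_MEAN = 5  # reported average weight in pounds

for name, weights in baskets.items():
    mean = statistics.mean(weights)
    spread = statistics.pstdev(weights)
    print(f"{name}: mean {mean:.2f} lb, SD {spread:.2f} lb, "
          f"differs from population mean by {mean - POPULATION_MEAN:+.2f} lb")
```

The first three baskets all match the reported mean of 5 pounds yet differ in spread, which is one reason a sample should match its population on the standard deviation as well as the mean. The fourth basket's mean sits 2 pounds above the population mean.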


 

Sampling error - Sampling error can be demonstrated statistically.  It is a value that indicates the extent to which a sample deviates from population expectations.  Imagine an infinite number of samples drawn from a population of numbers.  Each sample consists of scores, and a mean can be calculated for each sample.  Each sample mean should approximate the population mean, but this will not always happen.  Some samples will have means that are larger than the population mean; others will have means that are smaller.  These discrepancies represent sampling error.  However, when all the sample means are averaged together, the grand mean of all sample means should approximate the mean of the population.  When this does not happen, one looks for individual samples whose means are out of line.  These are the samples that contribute most to sampling error.
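The thought experiment above can be approximated by drawing a large, finite number of samples instead of an infinite number. This Python sketch assumes a computer-generated population of 10,000 scores (mean near 115, SD near 15) and 1,000 random samples of 25 scores each; all of these values are illustrative choices.

```python
import random
import statistics

rng = random.Random(79)

# Hypothetical population of 10,000 scores (mean about 115, SD about 15).
population = [rng.gauss(115, 15) for _ in range(10_000)]
mu = statistics.mean(population)

NUM_SAMPLES = 1_000   # a large (not infinite) number of samples
SAMPLE_SIZE = 25

sample_means = [statistics.mean(rng.sample(population, SAMPLE_SIZE))
                for _ in range(NUM_SAMPLES)]

# Sampling error of each sample: how far its mean falls from the population mean.
errors = [m - mu for m in sample_means]
above = sum(e > 0 for e in errors)
below = sum(e < 0 for e in errors)
grand_mean = statistics.mean(sample_means)

print(f"population mean:            {mu:.2f}")
print(f"grand mean of sample means: {grand_mean:.2f}")
print(f"samples above / below mu:   {above} / {below}")
print(f"largest sampling error:     {max(errors, key=abs):+.2f}")
```

Roughly half the sample means fall above the population mean and half below, while the grand mean of all 1,000 sample means lands very close to the population mean.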
 

Sample means should be understood as comparable to individual scores.  One can graph a set of individual scores as a frequency distribution and locate the center (and the mean) of the distribution.  One can also graph a set of samples using sample means instead of scores.  The grand mean of all sample means should define the center of this sampling distribution.
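Treating each sample mean as a single score makes it possible to build a frequency distribution of sample means, that is, the sampling distribution. In this hypothetical Python sketch (the population and sample sizes are assumptions), sample means are grouped into 2-point bins and printed as a text histogram. The standard deviation of the sample means is the standard error of the mean (SEM), which statistical theory predicts to be close to σ/√n, the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics
from collections import Counter

rng = random.Random(82)

population = [rng.gauss(115, 15) for _ in range(10_000)]   # hypothetical scores
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

SAMPLE_SIZE = 25
sample_means = [statistics.mean(rng.sample(population, SAMPLE_SIZE))
                for _ in range(2_000)]

# Group sample means into 2-point bins and print a text frequency distribution
# (one star per 10 sample means).
bins = Counter(2 * math.floor(m / 2) for m in sample_means)
for low in sorted(bins):
    print(f"{low:3d}-{low + 2:<3d} {'*' * (bins[low] // 10)}")

# The spread of the sampling distribution is the standard error of the mean.
observed_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(SAMPLE_SIZE)
print(f"center (grand mean): {statistics.mean(sample_means):.2f}  vs  mu {mu:.2f}")
print(f"SD of sample means:  {observed_se:.2f}  vs  sigma/sqrt(n) {theoretical_se:.2f}")
```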
 

Scientific or random sampling - This is an approach to sampling that seeks to minimize bias.  Random sampling implies that all observations have been drawn from the population in such a way that each member of the population has an equal chance of being drawn.  This property is called the equal likelihood assumption, and it is the key characteristic of random or scientific sampling.
 

To ensure that a sample is random, sample members are selected in ways that promote randomization.  Numbers picked out of a hat or a table of random numbers are frequently used approaches.  More modern practice relies on computer software that uses random number generators.
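In Python, the standard library's `random.sample` function draws a simple random sample without replacement, giving every member of the list the same chance of selection. The roster of 500 student ID numbers below is hypothetical.

```python
import random

# Hypothetical sampling frame: ID numbers for 500 students.
population_ids = list(range(1, 501))

rng = random.Random()          # seeded from system entropy; pass a number to repeat a draw
sample_ids = rng.sample(population_ids, 50)   # 10 percent sample, without replacement

print(sorted(sample_ids))
```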
 

Systematic sampling - If a population is very large and its members can be arranged or listed in sequence, it is also possible to choose a random starting point and then select every kth member of the list, where k is the population size divided by the desired sample size.  For a 10 percent sample, for example, every 10th member is selected after the random start.  When this procedure is used, it is referred to as systematic sampling.
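Systematic sampling is easy to sketch in code. The function below is a hypothetical helper, not a standard routine: it computes the interval k as the population size divided by the sample size, picks a random start among the first k members, and then takes every kth member of the list.

```python
import random
from typing import Optional

def systematic_sample(frame: list, n: int, rng: Optional[random.Random] = None) -> list:
    """Hypothetical helper: random start within the first interval, then every k-th member."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    rng = rng or random.Random()
    k = len(frame) // n                  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    start = rng.randrange(k)             # random starting position 0 .. k-1
    return frame[start::k][:n]           # every k-th member, trimmed to n

# Usage: a hypothetical roster of 1,000 students, 10 percent sample (k = 10).
roster = [f"student_{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(roster, 100, random.Random(92))
print(len(sample), sample[:3])
```

One caution: if the list has a repeating pattern that lines up with k, a systematic sample can be biased even though its starting point is random.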
 

Stratified random sampling is another approach to random sampling.  The population is subdivided according to categorical or discrete variables that are important to a research question.  Think of the population from which Dr. Peoples drew her research subjects.  They are described by variables such as sex, tenure status, whether or not the person is a school principal, and personality type (AE or RPS).  Individual members are then drawn from the parent population to represent each of the categories.  Every member who is drawn belongs to one category of each variable.  So, for example, an individual drawn as part of a stratified sample must be either male or female, tenured or non-tenured, a principal or not a principal, and either an AE or an RPS.  In a stratified random sample each category of subjects must also be proportionally represented.  Thus if 50 percent of the population is male, then 50 percent of the sample must also be male.  If 40 percent of the population is AE, then 40 percent of the sample must also be AE.  Sampling continues until all categories are filled, and once they are filled, the size of each group reflects that group's percentage of the population at large.
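Proportional stratified sampling can be sketched by grouping the population into strata and drawing a simple random sample from each stratum in proportion to its size. The population below is hypothetical: 1,000 records constructed so that 50 percent are male and 40 percent are AE, echoing the percentages in the paragraph above. The strata are the four combinations of sex and personality type.

```python
import random
from collections import defaultdict

rng = random.Random(95)

# Hypothetical population of 1,000 teachers described by sex and
# personality type (AE or RPS); the counts are made up so that
# 50 percent are male and 40 percent are AE.
population = (
    [{"id": i, "sex": "male", "type": "AE"} for i in range(0, 200)]
    + [{"id": i, "sex": "male", "type": "RPS"} for i in range(200, 500)]
    + [{"id": i, "sex": "female", "type": "AE"} for i in range(500, 700)]
    + [{"id": i, "sex": "female", "type": "RPS"} for i in range(700, 1000)]
)

def proportional_stratified_sample(pop, keys, n, rng):
    """Randomly sample from each stratum in proportion to its share of the population."""
    strata = defaultdict(list)
    for member in pop:
        strata[tuple(member[k] for k in keys)].append(member)
    sample = []
    for members in strata.values():
        quota = round(n * len(members) / len(pop))   # proportional allocation
        sample.extend(rng.sample(members, quota))
    return sample

sample = proportional_stratified_sample(population, ("sex", "type"), 100, rng)
male_share = sum(m["sex"] == "male" for m in sample) / len(sample)
ae_share = sum(m["type"] == "AE" for m in sample) / len(sample)
print(len(sample), male_share, ae_share)   # 100 0.5 0.4
```

With these counts every quota is a whole number. With other counts, rounded quotas may not add up exactly to the target sample size, and a researcher would need a rule for assigning the leftover places.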
 

Weighted or Tailored samples - There are times when researchers solve problems using samples that come from populations that are difficult to describe.  A researcher may not know how large a population is and therefore is unable to calculate with confidence the important descriptive statistics for the population.  Under these circumstances it is impossible to know if a random sample represents the population.
 

Weighted samples therefore consist of groups of observations that are studied separately.  The members of each group are drawn at random according to characteristics that are known to be important and descriptive of the population.  The expectation is that as several samples are aggregated, their combined effect will be to represent the population at large.  Individual samples, however, will be poor proxies; only when taken together do they represent the population at large.
 

Studies of consumer behavior and voter behavior, publishers of textbook series and standardized tests, and US Census studies of the country's population frequently use weighted samples.  In these cases the exact size of the population of interest is unknown, as are its descriptive statistics.  Therefore knowledge of the population is built from the study of samples that, when pooled, begin to resemble the larger population statistically.  That is why sampling frequently starts from regions of the country rather than from the country as a whole.
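The lesson does not give a formula for combining separately studied samples. One common approach, assumed here, is to weight each group's result by that group's estimated share of the population. In this Python sketch the four regions, their population shares, and their sample means are all hypothetical, and `weighted_estimate` is an illustrative name.

```python
# Hypothetical regional subsamples: each region's sample mean and the
# region's estimated share of the population (shares must sum to 1).
regions = {
    "Northeast": {"share": 0.20, "sample_mean": 72.0},
    "Midwest":   {"share": 0.25, "sample_mean": 68.0},
    "South":     {"share": 0.35, "sample_mean": 65.0},
    "West":      {"share": 0.20, "sample_mean": 70.0},
}

def weighted_estimate(groups: dict) -> float:
    """Combine group means, weighting each by its share of the population."""
    total_share = sum(g["share"] for g in groups.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(g["share"] * g["sample_mean"] for g in groups.values())

unweighted = sum(g["sample_mean"] for g in regions.values()) / len(regions)
print(f"weighted estimate:   {weighted_estimate(regions):.2f}")
print(f"unweighted average:  {unweighted:.2f}")
```

Because the South is assumed to hold 35 percent of the population, its lower mean pulls the weighted estimate (68.15) below the simple unweighted average of the four regional means (68.75).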
 

Sample accuracy - Stratified random samples are believed to be the most accurate; their sampling errors are estimated at about 1 percent.  Simple random samples have an error rate of approximately 5 percent.  Weighted samples vary in accuracy from 85 to 90 percent; that is, even when well designed they have error rates of 10 to 15 percent.  What principles might a researcher apply to decide which sampling method to use?  It may not be necessary, or even desirable, to have 99 percent accuracy.
 

How large should a sample be?  Your internet readings discuss this point at length.  It is important to remember that the smaller the population, the greater the percentage of its members that must be sampled to ensure reasonable accuracy.  Why would that be the case?
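One widely used way to see why small populations need larger sampling percentages is Cochran's sample size formula for estimating a proportion, combined with the finite population correction. The sketch below assumes 95 percent confidence (z = 1.96), a margin of error of plus or minus 5 percentage points, and p = 0.5, the most conservative guess; these are illustrative choices, not requirements from the lesson.

```python
import math

def required_sample_size(population_size: int, margin: float = 0.05,
                         z: float = 1.96, p: float = 0.5) -> int:
    """Sample size for estimating a proportion, with the finite population correction."""
    n0 = z**2 * p * (1 - p) / margin**2          # size needed for an infinite population
    n = n0 / (1 + (n0 - 1) / population_size)    # adjust for a finite population
    return math.ceil(n)

for N in (100, 1_000, 10_000, 100_000):
    n = required_sample_size(N)
    print(f"N = {N:>7,}: sample {n:>4} ({100 * n / N:.1f} percent of the population)")
```

Under these assumptions the required sample grows from 80 to 383 as the population grows from 100 to 100,000, but the percentage of the population that must be sampled falls from 80 percent to well under 1 percent.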
 

In addition, once a population exceeds about 10,000 members, the sample size needed to achieve a reasonable level of accuracy tends to level off.  A sample of 1,000 will be about as accurate for a population of 10,000 as for a population of 100,000.  Why would that be the case?
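The margin of error for a fixed sample size shows the leveling-off effect directly. This sketch assumes 95 percent confidence (z = 1.96) and p = 0.5 and applies the finite population correction; the function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(n: int, population_size: float, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for a proportion, with the finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / n)
    if math.isinf(population_size):
        fpc = 1.0                      # no correction for an unlimited population
    else:
        fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error * fpc

for N in (10_000, 100_000, math.inf):
    print(f"N = {N}: +/- {100 * margin_of_error(1_000, N):.1f} percentage points")
```

For a sample of 1,000, the margin of error is about 2.9 points for a population of 10,000 and about 3.1 points for both 100,000 and an unlimited population. Once the population is many times larger than the sample, its size barely matters.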
 

Finally, consider the guidelines that appear in the table below.  They help us understand that there is always a tradeoff between sample size and sampling error.
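
                          | Sample size is smaller     | Sample size is larger
--------------------------+----------------------------+---------------------------
Sampling is random        | Generalizability is high;  | Least error;
                          | errors may be high         | most confidence
--------------------------+----------------------------+---------------------------
Sampling is non-random    | Most error;                | Errors may be low;
                          | least confidence           | generalizability is low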

How does a researcher sort through this information?
 
 

Page created March 17, 2001. Copyright - Antonia D'Onofrio - 2001/2002/2003.