Inference

What is a sampling distribution?

A sampling distribution is the distribution of a statistic. We can form a sampling distribution by repeatedly taking random samples of the same size from the population and calculating the statistic for each sample. The distribution of all the values of the statistic across all of these random samples is the sampling distribution. In practice, we don't actually take all possible samples from the population. We know what the sampling distribution will look like for a hypothesized parameter without really constructing the distribution. What we do is collect just one sample, and then calculate the value of the statistic for that one sample. The sampling distribution then allows us to determine where the statistic for our one sample falls on the distribution of that statistic. If our statistic does not look like it fits in with the sampling distribution, we conclude that the sampling distribution is incorrect, and therefore the parameter we hypothesized in order to create the sampling distribution is also incorrect.
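The idea of repeatedly sampling and collecting the statistic can be made concrete with a small simulation. This is only a sketch: the population (normal, mean 50, standard deviation 10), the sample size of 25, and the function name simulate_sampling_distribution are all assumptions chosen for illustration, and the sample mean stands in for "the statistic."

```python
import random
import statistics

def simulate_sampling_distribution(population_mean, population_sd, n, num_samples, seed=0):
    """Return the sample means from repeated random samples of size n.

    The normal population is an assumption made for this illustration.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: mean 50, standard deviation 10; samples of 25.
sample_means = simulate_sampling_distribution(50, 10, n=25, num_samples=5000)

# The simulated means should center near 50, and their spread should be
# near 10 / sqrt(25) = 2.
print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 2))
```

In a real study we would have only one of these 5000 sample means; the simulation just shows the distribution that one value is judged against.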

What is the standard error?

The standard error is the standard deviation of a sampling distribution. To distinguish standard deviations in the sampling distribution of statistics from standard deviations in the population distribution of original scores, we call the standard deviation of the sampling distribution the standard error. In order to determine if our observed statistic is likely to have come from the sampling distribution, we must know how many standard errors the statistic is from the mean of the sampling distribution. If the observed statistic is far from the mean, then we conclude that the observation is not likely under our hypothesized value for the parameter, so we reject the hypothesized value. For example, suppose we hypothesize a value for a mean of a population. If we then conduct a two-sided test of this mean with a 5% maximum Type I error rate, and the sampling distribution is normal, then we will reject the hypothesized value if the observed mean is at least 1.96 standard errors from the hypothesized mean.
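As a rough illustration of counting standard errors, the sketch below computes the standard error of a mean and the distance of an observed mean from a hypothesized mean. The numbers (observed mean 54.5, hypothesized mean 50, population standard deviation 10, sample size 25) and the function name are hypothetical, and the 1.96 cutoff assumes a normal sampling distribution with a known population standard deviation.

```python
import math

def standard_errors_from_hypothesis(sample_mean, hypothesized_mean, population_sd, n):
    """Return how many standard errors the sample mean lies from the hypothesized mean."""
    standard_error = population_sd / math.sqrt(n)  # sd of the sampling distribution of the mean
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical values chosen only for illustration.
z = standard_errors_from_hypothesis(sample_mean=54.5, hypothesized_mean=50,
                                    population_sd=10, n=25)

# Two-sided test with a 5% maximum Type I error rate.
reject_null = abs(z) >= 1.96
print("standard errors from hypothesized mean:", z)
print("reject hypothesized value:", reject_null)
```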

What does it mean for a statistical procedure to be robust?

It means that the procedure still works well even when the assumptions are not quite correct. For example, the t test is a robust test for means. One assumption of the t test is that the scores are normally distributed in the population. If this is not the case, the t test still works quite well if the sample size is large enough. Because the t test is so robust, we can set our rule-of-thumb for "large enough" quite small. With real data we find that 30 or so participants is almost always large enough. The t test will even work nicely with a smaller number than 30, as long as the original population distribution of scores is not heavily skewed.
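One way to get a feel for robustness is to simulate the t test when the null hypothesis is true and see how often it rejects. A well-behaved test should reject close to 5% of the time. The sketch below is an illustration under assumptions, not a proof: the three populations (normal, uniform, and a skewed exponential), the sample size of 30, and the helper name type_i_error_rate are all chosen here for the example.

```python
import math
import random

N = 30
T_CRITICAL_DF29 = 2.045  # two-sided 5% critical value of t with 29 degrees of freedom

def type_i_error_rate(draw_score, true_mean, reps=4000, seed=1):
    """Estimate how often a two-sided one-sample t test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw_score(rng) for _ in range(N)]
        mean = sum(sample) / N
        variance = sum((x - mean) ** 2 for x in sample) / (N - 1)
        standard_error = math.sqrt(variance / N)
        t = (mean - true_mean) / standard_error
        if abs(t) > T_CRITICAL_DF29:
            rejections += 1
    return rejections / reps

# Each population is paired with its own true mean, so the null is true in every case.
normal_rate = type_i_error_rate(lambda rng: rng.gauss(0, 1), true_mean=0.0)
uniform_rate = type_i_error_rate(lambda rng: rng.random(), true_mean=0.5)
skewed_rate = type_i_error_rate(lambda rng: rng.expovariate(1.0), true_mean=1.0)

print("normal population: ", normal_rate)
print("uniform population:", uniform_rate)
print("skewed population: ", skewed_rate)
```

Rates near 0.05 for the non-normal populations suggest the test is holding up; a rate noticeably above 0.05 for the skewed population would match the caution about heavy skew.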

What is power?

Power is the probability that a researcher will reject a particular null hypothesis if this null hypothesis is not true. Since power depends on the true value of the parameter, we never actually know what the power is because we don't know what the true value of the parameter is. To get around this problem, we can posit different possibilities for the true value of the parameter and then calculate the power for each possibility. The further away the true value of the parameter is from the null hypothesized value, the more power a test of the null hypothesis will have. For example, if the null hypothesis is that the population mean is 50, but the true population mean is actually 55, then the power might be, say, 70%. If, however, the true population mean is actually 58, then the power might be 80%.
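Because power is a probability of rejecting, it can be estimated by simulating many studies and counting rejections. The sketch below does this for several possible true means. Everything beyond the null mean of 50 is an assumption made for illustration: an upper-tailed z test, a population standard deviation of 10, a sample size of 25, and the function name simulated_power. The resulting percentages are not meant to reproduce the "say, 70%" figures above, which are only illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

def simulated_power(true_mean, null_mean=50, sd=10, n=25, alpha=0.05, reps=4000, seed=2):
    """Estimate the rejection probability of an upper-tailed z test by simulation."""
    rng = random.Random(seed)
    standard_error = sd / math.sqrt(n)
    # Smallest sample mean that leads to rejecting the null hypothesis.
    critical_mean = null_mean + NormalDist().inv_cdf(1 - alpha) * standard_error
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        if statistics.fmean(sample) >= critical_mean:
            rejections += 1
    return rejections / reps

# When the true mean equals 50 the null is true, so the rejection rate is the
# Type I error rate rather than power; it is included for comparison.
powers = {mu: simulated_power(mu) for mu in (50, 52, 55, 58)}
for mu, p in powers.items():
    print(f"true mean {mu}: rejection rate {p:.3f}")
```

The rejection rate should climb as the true mean moves further above 50, which is the pattern described in the paragraph above.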

A primary use for calculating power is to find the minimum number of participants that are needed for a study. To do this, the researcher determines the main null hypothesis of interest, as well as the minimum alternative hypothesis that is still of interest. The researcher then calculates the power for different sample sizes until the sample size is found that will yield the desired power. For example, again suppose that the null hypothesis is that the population mean is 50. The researcher may not consider it a very important difference if the true value of the population mean is 51, or even 52. That is, this researcher considers a true mean of 51 or 52 to be just about as uninteresting as if the true mean is 50. Suppose that the researcher decides that if the population mean is 53, that would be different enough from the null hypothesized value of 50 to be worth spending money to find this out.

The researcher now proceeds with calculating power for different sample sizes. Suppose that the target power chosen is 80%, when the maximum allowable Type I error rate is 5%. (This means that the maximum allowable Type II error rate is 20%.) The researcher might calculate that when the null hypothesis is that the population mean is 50, and when the actual population mean is 53, a sample size of 25 results in 60% power. This means the researcher needs a bigger sample size. Perhaps a sample size of 50 results in 70% power and a sample size of 100 results in 84% power. The researcher will gradually zoom in on the appropriate sample size. Perhaps in this example the researcher will discover that 86 participants will provide 80% power.

Note that this does not mean that the actual power of the study is 80%. It will only be exactly 80% if the true population mean is 53, but we don't know that this is the case. If the population mean is less than 53, the power will be less than 80%. That is all right, though, because when the mean is less than 53 it is not much different from 50, so we don't care that we may not reject the null hypothesis. The difference between the true mean and the hypothesized mean is trivial. If the population mean is actually greater than 53, then we will have more than 80% power. The 80% power we calculated gives us a minimum probability of detecting an important result. We certainly don't mind if the power is actually more than 80% when the difference between the true mean and the hypothesized mean is even bigger than what we proposed in our power calculation.

How do you calculate the power for testing a hypothesis about a mean?

The steps for any power calculation are as follows:

(1) Determine the primary null hypothesis of interest.

(2) Choose an alternative hypothesis. For calculating the sample size needed in the study, this should be the alternative hypothesis that is close to the null hypothesis, yet still interesting. Values that are too close to the null hypothesis aren't different enough from this hypothesis to be interesting. Note also that the closer the alternative hypothesis is to the null hypothesis, the more expensive the study will be. Often a researcher must strike a compromise between cost and differences from the null hypothesis that will likely be detected.

(3) Determine the maximum allowable Type I and Type II error rates. Note that power is one minus the Type II error rate.

(4) Determine the critical value for a test of the null hypothesis. You can first determine the critical value on a standardized distribution (such as the standard normal), but this must then be converted to a critical value on the sampling distribution of the statistic.

(5) Calculate the probability of obtaining a statistic in the rejection region if the chosen alternative hypothesis is true. Use the sampling distribution for the alternative hypothesis when making this calculation.
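Steps (4) and (5) can be sketched as a short function. This is a minimal sketch under stated assumptions: an upper-tailed test of a mean, a known population standard deviation, and a normal sampling distribution. The function name power_upper_tailed and the numbers in the usage line (null mean 100, alternative mean 105, standard deviation 15, sample size 36) are hypothetical.

```python
import math
from statistics import NormalDist

def power_upper_tailed(null_mean, alt_mean, sd, n, alpha=0.05):
    """Power of an upper-tailed z test of a mean with known population sd."""
    standard_error = sd / math.sqrt(n)

    # Step (4): critical value on the standard normal distribution, then
    # converted to a critical value on the sampling distribution under the null.
    z_critical = NormalDist().inv_cdf(1 - alpha)
    critical_mean = null_mean + z_critical * standard_error

    # Step (5): probability of a sample mean in the rejection region, using
    # the sampling distribution under the chosen alternative hypothesis.
    alternative_sampling_dist = NormalDist(mu=alt_mean, sigma=standard_error)
    return 1 - alternative_sampling_dist.cdf(critical_mean)

# Hypothetical inputs, for illustration only.
example_power = power_upper_tailed(null_mean=100, alt_mean=105, sd=15, n=36)
print(f"power: {example_power:.3f}")
```

Steps (1) through (3) appear here as the function's arguments: the null mean, the alternative mean, and the Type I error rate alpha. The Type II error rate is one minus the value returned.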

As an example of these steps, suppose that the scores on a certain standardized test are normally distributed. The published mean is 50 and the standard deviation is 10. A school district researcher was given the task of determining if the district population mean would be higher than the national population mean if the students in the district took this particular test. The district does not actually want to pay to have every student take the test, so the researcher will conduct a study using a sample of students chosen at random from the district.

The null hypothesis of interest is that the district population mean is 50. This is because a mean of 50 would mean that students in the district do not score any differently from typical students across the nation. The researcher might determine that district means of 51, 52, or 53 would not be viewed by the public as much different from 50, so the researcher chooses an alternative hypothesis that the district mean is 54. (The researcher may later make this even larger if the sample size calculation results in a sample size that is too large.) The researcher also chooses a maximum Type I error rate of 5% and a maximum Type II error rate of 20% (i.e., 80% power).

The critical value for an upper-tailed test of the null hypothesis is 1.645 on the standard normal distribution. That means the critical value on the sampling distribution will need to be 1.645 standard errors above the mean. (Recall that the standard error is just the standard deviation of the sampling distribution.) The sampling distribution under the null hypothesis has a mean of 50. The standard error depends on the sample size. Suppose that the researcher took a guess that 25 children will be needed. Then the standard error of the sampling distribution is 10 divided by the square root of 25, or 2. This means that the critical value on the sampling distribution under the null hypothesis is 50 + (1.645)(2) = 53.29. That is, 53.29 is 1.645 standard errors above the mean of 50. When the hypothesis test is actually conducted, if the sample mean of the scores of the 25 children is 53.29 or larger, the researcher will reject the null hypothesis and conclude that the district population mean is above the national population mean.

Now we are at the final step. We must calculate the probability of obtaining a sample mean of 53.29 or larger if our chosen alternative hypothesis is true. The chosen alternative hypothesis is that the mean is 54. The probability of obtaining a mean of 53.29 or more from the sampling distribution under the alternative hypothesis is the probability of getting a Z statistic larger than (53.29 - 54) / 2 = -0.355. We can look this up in the table for the standard normal distribution. The probability is about 0.64, so the power is 64%. This is less than the 80% power that the researcher wanted, so a bigger sample size is needed. The researcher must redo the power calculation, this time with a larger sample size. This is repeated until the desired power is obtained.
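The repeated recalculation can be left to a short loop. The sketch below uses the values from this example (null mean 50, alternative mean 54, standard deviation 10, upper-tailed test at 5%) and simply tries larger and larger sample sizes. The function names are made up for this illustration, and the code uses the unrounded critical value rather than 1.645, so results may differ from hand calculations in the third decimal place.

```python
import math
from statistics import NormalDist

def power_for_n(n, null_mean=50, alt_mean=54, sd=10, alpha=0.05):
    """Power of the upper-tailed test in the example for a given sample size."""
    standard_error = sd / math.sqrt(n)
    critical_mean = null_mean + NormalDist().inv_cdf(1 - alpha) * standard_error
    return 1 - NormalDist(mu=alt_mean, sigma=standard_error).cdf(critical_mean)

def smallest_n_for_power(target_power=0.80, start_n=25, max_n=10_000):
    """Try each sample size in turn until the target power is reached."""
    for n in range(start_n, max_n + 1):
        if power_for_n(n) >= target_power:
            return n
    raise ValueError("target power not reached within max_n")

print(f"power with 25 children: {power_for_n(25):.2f}")
required_n = smallest_n_for_power()
print("smallest sample size with at least 80% power:", required_n)
```

Under these assumptions the first call agrees with the 64% found above, and the search should stop at 39 children.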

Sometimes in the process of calculating the power the researcher discovers that the sample size needed is so large that the cost will be prohibitive. The following choices are available. (1) Cancel the study. (2) Decrease the power requirements and hope for the best. (3) Recalculate the sample size with a larger difference between the null and alternative hypotheses. In experimental studies, this is equivalent to refining the treatment so that there will be a larger effect. (4) Reduce the standard deviation of the scores by using a more precise measuring instrument or by including more factors in the study.
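To see how choices (2), (3), and (4) affect the sample size, it can help to use the standard closed-form sample size formula for an upper-tailed z test of a mean instead of trial and error. This is a swapped-in shortcut, not the step-by-step method described above, and it assumes a known population standard deviation and a normal sampling distribution. The function name required_n and the alternative scenarios (a difference of 5, a standard deviation of 8, 70% power) are hypothetical.

```python
import math
from statistics import NormalDist

def required_n(difference, sd, alpha=0.05, power=0.80):
    """Smallest n for an upper-tailed z test: ((z_alpha + z_beta) * sd / difference)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the Type I error rate
    z_beta = NormalDist().inv_cdf(power)       # z value matching the desired power
    return math.ceil(((z_alpha + z_beta) * sd / difference) ** 2)

# Baseline from the worked example: null 50 versus alternative 54, sd 10.
baseline = required_n(difference=4, sd=10)

# Choice (2): accept lower power.
lower_power = required_n(difference=4, sd=10, power=0.70)

# Choice (3): look for a larger difference (alternative of 55 instead of 54).
larger_difference = required_n(difference=5, sd=10)

# Choice (4): reduce the standard deviation of the scores (hypothetical sd of 8).
smaller_sd = required_n(difference=4, sd=8)

print("baseline:", baseline)
print("lower power:", lower_power)
print("larger difference:", larger_difference)
print("smaller standard deviation:", smaller_sd)
```

Because the sample size depends on the ratio of the standard deviation to the difference, squared, even modest changes to either one can shrink the required sample considerably.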

URL http://edpsych.ed.sc.edu/seaman/edrm711/questions/inference.htm

This web was developed by Michael A. Seaman.
This page last updated on 30 January 2000.
The views and opinions expressed in this page are strictly those of the page author. The contents of this page have not been reviewed or approved by the University of South Carolina.