Validity


Do researchers address all four types of validity (conclusion, internal, construct, external) in all types of research?

No. Some types of research do not address every type of validity. For example, in a case study a researcher studies a specific entity (a person, a school, an organization) and provides a rich description of its characteristics. There is no attempt to generalize beyond this entity, so external validity is not addressed. Instead, it is up to readers of the study to determine whether aspects of the research apply to them or to the entities they are studying.

In many associational or relational research studies, it is impossible to account for all extraneous variables. Therefore, internal validity is addressed to a lesser degree in these studies than in an intervention study designed to demonstrate causal relationships. In sum, different types of studies address the various aspects of validity to different degrees. This is, in fact, one of the major characteristics that distinguishes among types of studies.

Please provide examples of threats to each of the four types of validity.

External validity is threatened whenever the sample is not representative of the population. This occurs almost any time a convenience sample is used. For example, consider a study conducted to determine whether teaching the prosodic elements of reading increases reading comprehension. The researcher uses a first-grade class in a school located in a high-socioeconomic neighborhood. The target population is clearly all children who are learning to read, but the effects observed in this particular study may be quite different from the effects that would be observed in another school.
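The effect of a convenience sample can be illustrated with a small simulation. This is a hypothetical sketch, not data from any real study: it assumes (purely for illustration, with invented numbers) that prosody instruction helps children in high-SES schools more than children elsewhere, and then compares the average gain in the population, in a convenience sample drawn only from high-SES schools, and in a simple random sample.

```python
import random
import statistics

# Hypothetical population of 10,000 first graders. ASSUMPTION (invented numbers):
# 20% attend high-SES schools, where the average gain from prosody instruction is
# 8 comprehension points; elsewhere the average gain is 3 points.
def make_population(rng, n=10_000, high_ses_share=0.2):
    population = []
    for _ in range(n):
        high_ses = rng.random() < high_ses_share
        gain = rng.gauss(8.0 if high_ses else 3.0, 1.0)
        population.append((high_ses, gain))
    return population


def mean_gain(group):
    # Average gain across (high_ses, gain) pairs.
    return statistics.mean(gain for _, gain in group)


rng = random.Random(42)
population = make_population(rng)

# Convenience sample: 25 children, all from high-SES schools (like one first-grade class).
high_ses_children = [child for child in population if child[0]]
convenience_sample = rng.sample(high_ses_children, 25)

# Simple random sample: 25 children drawn from the whole population.
random_sample = rng.sample(population, 25)

population_mean = mean_gain(population)
convenience_mean = mean_gain(convenience_sample)
random_mean = mean_gain(random_sample)

print(f"population mean gain:         {population_mean:.1f}")
print(f"convenience sample mean gain: {convenience_mean:.1f}")
print(f"random sample mean gain:      {random_mean:.1f}")
```

Under these assumptions the convenience sample overstates the population effect by roughly a factor of two, while the random sample lands near the population value (with its own chance error).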

Construct validity is threatened whenever the way we operationally define a variable does not match the theoretical intent of the construct. This is a problem any time we fail to create the intended treatment or use a poor measure of the response variable. For example, consider a study of the effectiveness of teaching algebra without a heavy emphasis on symbolic manipulation. The explanatory variable would be the method of teaching, and the response variable would be achievement in algebra. If our treatment still requires substantial symbolic manipulation, then we have not operationally created a treatment that matches our intent. Likewise, if our measure of algebra achievement is deficient in any way (e.g., it has poor reliability), then we fail to obtain a valid measure of algebra achievement, and we are not really measuring the construct of interest.

Internal validity is threatened when extraneous variables could plausibly provide an alternative explanation for changes in the response variable. For example, consider a study to determine whether computer simulations can increase students' understanding of planetary motion. Suppose that the researcher locates one school that has a computer lab and one school that does not, and installs the computer simulation at the school with the lab. At the end of the general science unit on planetary motion, the researcher gives a test to students in both schools to determine whether the computer simulation was effective. This study has many obvious threats to internal validity. First, the students at the two schools may be very different. The schools may be located in different kinds of neighborhoods, where achievement levels, parenting practices, student motivation, and other variables differ. The general science teachers at the two schools may also differ from one another. These are just a few of the alternative explanations for why scores on the test of planetary motion may differ between the two schools.
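A short simulation shows how a school-level difference alone can produce a gap in test scores. In this hypothetical sketch (all numbers are invented), the computer simulation has no effect whatsoever; the only difference is that students at the lab school start with higher prior achievement. The resulting gap would nonetheless be easy to misread as a treatment effect.

```python
import random
import statistics

rng = random.Random(7)

# ASSUMPTION: the test score depends only on a student's prior achievement plus
# random noise. There is deliberately no term for the computer simulation.
def planetary_motion_score(prior_achievement):
    return prior_achievement + rng.gauss(0, 5)


# Hypothetical schools: the lab school serves students with higher prior achievement.
lab_school = [planetary_motion_score(rng.gauss(75, 8)) for _ in range(300)]
no_lab_school = [planetary_motion_score(rng.gauss(68, 8)) for _ in range(300)]

gap = statistics.mean(lab_school) - statistics.mean(no_lab_school)
print(f"lab school minus no-lab school: {gap:.1f} points")
```

The entire gap in this sketch comes from the extraneous variable (prior achievement), which is exactly the kind of alternative explanation described above.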

Conclusion validity can be threatened whenever we attempt to detect a relationship among variables, because relationships can occur in our sample that are not present in the population, and vice versa. To reduce the chance of drawing incorrect conclusions, we must account for the possibility that sample results include an element of chance. For example, consider a study comparing the average cholesterol level of meat eaters to that of vegetarians. We can obtain a sample of meat eaters and a sample of vegetarians, but what if all of the vegetarians we happen to sample are also long-distance runners? This could lead to conclusions about a relationship between cholesterol and diet that is really a relationship between cholesterol and exercise. Our result occurred by chance (i.e., we just happened to sample long-distance runners). Techniques in inferential statistics can be used to account for chance: they estimate how likely it is that a result as large as ours would arise by chance alone, and thus how much confidence we can place in the finding.
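The role of inferential statistics can be illustrated with a permutation test, one of several techniques for judging whether a sample difference could plausibly be due to chance (the choice of technique here is ours; the text above does not name one). In this hypothetical sketch, meat eaters and vegetarians are drawn from identical cholesterol distributions (invented parameters), so the true difference is zero. The sample means will still differ, and the p-value estimates how often a difference at least that large would arise by chance alone.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference between two group means.

    Repeatedly shuffles the group labels and counts how often a difference at
    least as large as the observed one appears when group membership is
    irrelevant, i.e., by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if shuffled_diff >= observed:
            as_extreme += 1
    # The +1 terms count the observed arrangement itself, so the estimate is never 0.
    return (as_extreme + 1) / (n_permutations + 1)


# Hypothetical data: both groups come from the SAME cholesterol distribution
# (mean 200 mg/dL, SD 30), so any difference between the samples is pure chance.
rng = random.Random(1)
meat_eaters = [rng.gauss(200, 30) for _ in range(15)]
vegetarians = [rng.gauss(200, 30) for _ in range(15)]

print(f"difference in sample means: {mean(meat_eaters) - mean(vegetarians):.1f} mg/dL")
print(f"permutation p-value: {permutation_p_value(meat_eaters, vegetarians):.3f}")
```

Because the two populations are identical in this sketch, a p-value below 0.05 would occur only about 5% of the time, and each such result would be exactly the kind of chance finding described above.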

What is specificity of variables, and how is this a threat to external validity?

If a study result is externally valid, we can generalize the result obtained for the sample to a larger population. To obtain external validity, what we do in the study needs to represent what will happen in the population. This will be true only if (1) the sample represents the population and (2) the study's methods and variables are operationalized in a way similar to how they will occur in the population.

Sample bias means that the sample does not represent the population. We can avoid sample bias by using a good sampling technique, such as random sampling. Specificity of variables means that, in any single study, we must operationalize each variable in one particular way. Because these variables will be operationalized differently in the population (e.g., at different times of day, with different materials, in different buildings), we cannot be certain that the way we used the variables in a single study will generalize to other ecological situations (i.e., to other combinations of time, place, and materials). Thus, the study must be replicated in different ways before we can make ecological generalizations.

In sum, we can generalize to the population if we select a representative sample. We can generalize to other ecologies (settings and conditions) if we replicate the study and operationalize the variables in different ways. Doing so gives us more than one specific definition for each variable and thus avoids the problem of specificity of variables.

What is "proximal similarity" and how does this relate to the concept of external validity?

Proximal similarity refers to the degree of similarity between the elements of a study and the external elements to which you wish to apply the results of the study. For example, if a study took place with students at the University of South Carolina as participants, these students would probably be proximally similar to students in the same degree programs at the University of Georgia, though they would be proximally dissimilar to students at the California Institute of Technology. As another example, consider a study conducted in a third-grade class in an inner-city school district. The characteristics of this class might be proximally similar to those of other inner-city third-grade classes, but proximally dissimilar to those of third-grade classes in suburban and rural schools. Proximal similarity can be assessed when the researcher provides details about the study and its participants.

What are ecological generalizations?

Ecological generalizations are made when the results of a study are applied to settings and conditions outside of the study. This differs from population generalizations, which occur when the results of a study are applied to individuals outside of the study. For example, applying the results obtained at one four-year college to another four-year college involves both ecological and population generalizations. The ecological generalization arises from applying the results in a different setting, with a different curriculum, different instructors, and so on. The population generalization arises from applying the results to people who were not in the original study. Ecological generalizations are most valid when proximal similarity can be established between the study and the new setting.

Does the possibility of lying on a questionnaire threaten internal validity? If so, does it threaten content or construct validity?

Lying on a questionnaire most definitely threatens internal validity, because it threatens the validity of the instrument. Anytime the instrument is invalid, so are our study results. This is one way in which there can be an instrumentation threat to internal validity.

Be careful to treat content and construct validity as sources of evidence for instrument validity. Although instrument validity is a necessary condition for internal validity, instrument validity and internal validity are not the same. Internal validity depends on instrument validity, but it also depends on much more.

If someone lies on a questionnaire, that would threaten the construct-related evidence of validity. The questionnaire may still have relevant questions (content-related evidence of validity), but if the person lies, then their behaviors do not correspond to what they record on the instrument.

What is statistical regression?

Statistical regression is a natural phenomenon that occurs when we select individuals for a study because they scored near the top (or bottom) on an observation. Some of the top scorers achieved their scores partly because of positive measurement error; that is, they scored higher than their true scores. Thus, we incorrectly identify some of these individuals as top scorers. The next time these individuals are measured, their scores will tend to be lower than they were the first time, which pushes the group's mean score down. The phenomenon is often referred to as "regression to the mean" because individuals chosen on the basis of very low (or very high) scores tend to score closer to the mean the next time they are observed. This threatens internal validity because regression to the mean makes it appear as though the group is changing, even when it is not.
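Regression to the mean can be reproduced with a short simulation. This sketch assumes the classical test theory model (each observed score is a true score plus independent random error) and uses invented numbers. It selects the top 5% of scorers on a first test and then re-tests them with no intervention at all.

```python
import random
import statistics

rng = random.Random(2004)
N = 10_000

# ASSUMPTION (classical test theory): observed score = true score + random error.
# True scores and error sizes are invented for illustration.
true_scores = [rng.gauss(100, 10) for _ in range(N)]
first_test = [t + rng.gauss(0, 6) for t in true_scores]
second_test = [t + rng.gauss(0, 6) for t in true_scores]  # new error, no intervention

# Select everyone at or above the 95th percentile on the FIRST test.
cutoff = sorted(first_test)[int(0.95 * N)]
selected = [i for i in range(N) if first_test[i] >= cutoff]

mean_first = statistics.mean(first_test[i] for i in selected)
mean_second = statistics.mean(second_test[i] for i in selected)
overall_mean = statistics.mean(first_test)

print(f"overall mean:                {overall_mean:.1f}")
print(f"selected group, first test:  {mean_first:.1f}")
print(f"selected group, second test: {mean_second:.1f}")
```

No one's true score changes between the two tests, yet the selected group's mean drops toward the overall mean. Selecting the bottom 5% instead would show the mirror image: the group mean rises on the second test.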

Is instrumentation only an internal validity threat in pre-post test designs? Can instrumentation also be a threat to construct validity?

"Instrumentation" refers to the method of data collection and the type of "instruments" used in this process. If the process of collecting data changes in some way from the pre-test to the post-test (e.g., two different tests are used, or the scorer does not score as carefully on the second administration of the test), this creates a threat to internal validity, because observed changes from pre-test to post-test may be due to the changes in the instrument rather than to the intervention. Instrumentation can be a problem in other situations as well. Consider a multi-group design with only a post-test. If the groups are tested with two different tests, if a rater scores the groups using different criteria (either intentionally or unintentionally), or if the scorer checks the tests for one group first and then, fatigued, checks the second group's tests differently, this creates a threat to internal validity. Again, the differences between groups that might be attributed to the independent variable may, in fact, be due to differences in the use of the instrument. This is an instrumentation threat. (Note that Trochim states a different opinion: he believes that instrumentation is a threat only in pre-post test designs.)

Threats to construct validity occur when our instrument is unreliable or does not measure what we intend to measure. Some of the examples above would threaten construct validity as well as internal validity: if raters use different criteria for different people or become fatigued while using the instrument, then the scores lack construct validity. In other examples, this is not a problem. For example, if the researcher uses different pre- and post-tests, both tests could be valid (i.e., their construct validity is good), yet the difference in scores might be due to the different tests rather than to the intervention. That is, the tests might measure the same construct on different scales, yielding different scores even though both tests are valid.
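The last point can be illustrated with a small simulation. In this hypothetical sketch (invented numbers), students' ability does not change, and both tests track ability closely, so each has good construct validity; the post-test simply reports scores on a different scale. The apparent "gain" comes entirely from the change of instrument.

```python
import random

rng = random.Random(11)


def pearson_r(x, y):
    """Pearson correlation coefficient, computed directly."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical: 200 students whose ability does NOT change between pre- and post-test.
ability = [rng.gauss(0, 1) for _ in range(200)]

# ASSUMPTION: two valid tests of the same construct that report scores on different
# scales (different center and spread), each with a little measurement error.
pre_test = [50 + 8 * a + rng.gauss(0, 2) for a in ability]
post_test = [60 + 10 * a + rng.gauss(0, 2) for a in ability]

r_pre = pearson_r(ability, pre_test)
r_post = pearson_r(ability, post_test)
mean_gain = sum(post - pre for pre, post in zip(pre_test, post_test)) / len(ability)

print(f"correlation with ability, pre-test:  {r_pre:.2f}")
print(f"correlation with ability, post-test: {r_post:.2f}")
print(f"mean 'gain' with no intervention:    {mean_gain:.1f} points")
```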

URL http://edpsych.ed.sc.edu/seaman/edrm700/questions/validity.htm


This page was last modified July 22, 2004
For comments or questions contact:
mseaman@sc.edu

The views and opinions expressed in this page are strictly those of the page author. The contents of this page have not been reviewed or approved by the University of South Carolina.