Lots of psychological studies involve measuring people twice. For example, in the imagery literature, there's a minor industry that seeks to relate self-reports about imagery to performance on cognitive tasks that seem to involve visual imagery, such as visual memory tests or mental rotation tasks.
(A typical mental rotation task presents two line drawings of 3-D figures and asks whether one is a simple rotation of the other; example image from http://www.skeptic.com.)
Participants in such studies thus receive two tests, the cognitive test in question and also a self-report imagery test of some sort, such as the Vividness of Visual Imagery Questionnaire (VVIQ), which asks people to form various visual images and then rate their vividness. Correlations will often -- though by no means always -- be found. This will be taken to show that people with better (e.g. more vivid) imagery do in fact have more skill at the cognitive task in question.
This drives me nuts.
Reactivity between measures is, I think, a huge deal in such cases. Let me clarify by developing the imagery example a little further.
Suppose you’re a participant in an experiment on mental imagery -- an undergraduate, say, volunteering to participate in some studies to fulfill psychology course requirements. First, you’re given the VVIQ, that is, you’re asked how vivid your visual imagery is. Then, immediately afterward, you’re given a test of your visual memory -- for example, a test of how many objects you can correctly recall after staring for a couple of minutes at a complex visual display. Now if I were in such an experiment and I had rated myself as an especially good visualizer when given the VVIQ, I might, when presented with the memory test, think something like this: “Damn! This experimenter is trying to see whether my imaging ability is really as good as I said it was! It’ll be embarrassing if I bomb. I’d better try especially hard.” Conversely, if I say I’m a poor visualizer, I might not put too much energy into the memory task, so as to confirm my self-report or what I take to be the experimenter’s hypothesis. Reactivity can work the other way, too, if the subjective report task is given second. Say I bomb the memory task (or some other task) and am then given the VVIQ. I might be inclined to think of myself as a poor visualizer in part because I know I bombed the first task.
In general, participants are not passive innocents. Any time you give them two different tests, you should expect their knowledge of the first test to affect their performance on the second. Exactly how subjects will react to the second test in light of the first may be difficult to predict, but the probability of such reactivity should lead us to anticipate that, even if measures like the VVIQ utterly fail as measures of real, experienced imagery vividness, some researchers will find correlations between the VVIQ and performance on cognitive tasks. Therefore, the fact that some researchers do find such correlations is no evidence at all of the reality of the posited relationship, unless there's a pattern in the correlations that could not just as easily be explained by reactivity.
In the particular case at hand, actually, I think the overall pattern of data positively suggests that reactivity is the main driving force behind the correlations. For example, to the extent there is a pattern in the relationship between the VVIQ and memory performance, the tendency is for the correlations to be higher in free recall tasks than in recognition tasks. Free recall tasks (like trying to list items in a remembered display) generally require more effort and energy from the subject than recognition tests (like “did you see this, yes or no?”) and so might be expected to show more reactivity between the measures.
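To make the worry concrete, here's a minimal simulation sketch (all the parameters are invented for illustration; nothing here comes from any actual study). By construction, the self-report is pure noise with no relation to true ability, yet reactivity alone -- participants adjusting their effort to match what they just said about themselves -- produces exactly the pattern above: a healthy correlation with free recall and a much weaker one with recognition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # hypothetical sample size

# By construction, the VVIQ self-report is pure noise: it has NO
# relation whatsoever to true ability at the memory tasks.
ability = rng.normal(0, 1, n)   # latent memory skill
vviq = rng.normal(0, 1, n)      # self-reported vividness (meaningless here)

# Reactivity: having just claimed vivid imagery, participants try harder
# on the tasks that follow. (The 0.6 weight is an arbitrary assumption.)
effort = 0.6 * vviq + rng.normal(0, 1, n)

# Free recall rewards effort heavily; recognition barely does.
recall = ability + 0.8 * effort + rng.normal(0, 1, n)
recognition = ability + 0.2 * effort + rng.normal(0, 1, n)

print(f"VVIQ vs. free recall: r = {np.corrcoef(vviq, recall)[0, 1]:.2f}")
print(f"VVIQ vs. recognition: r = {np.corrcoef(vviq, recognition)[0, 1]:.2f}")
# Typical output: r around 0.28 for recall, around 0.08 for recognition --
# a "vividness effect," and the recall/recognition asymmetry, from nothing.
```

The numbers are made up, of course; the point is only that the qualitative pattern in the literature is cheap to produce without any real link between reported vividness and ability.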
The problem of reactivity between measures will plague any psychological subliterature in which participants are generally aware of being measured twice -- including much happiness research, almost any area of consciousness studies that seeks to relate self-reported experience and cognitive skills, the vast majority of longitudinal psychological studies, almost all studies on the effectiveness of psychotherapy or training programs, etc. Rarely, however, is it even given passing mention as a source of concern by people publishing in those areas.
I'm regretful about possibly adding fuel to your skeptical fire, but if you haven't seen Norbert Schwarz's discussion of context effects on self-report, you might be interested -- it might also be useful for other x-phi-ers looking for cautionary tales about survey methods. See http://sitemaker.umich.edu/norbert.schwarz/self-report, especially Schwarz, N. (1999). Self-reports: How the questions shape the answers. American Psychologist, 54, 93-105.
I'd like to note, though, that even if a method may be subject to reactivity, that shouldn't be enough reason to say that it's "no evidence at all." The method could be a good-enough first step toward arguing that a pattern exists, even if reactivity may increase your chances of finding one. I don't know this literature, but from your discussion it doesn't seem that there's evidence that these studies give *entirely* false information, just that reactivity *might* inflate the correlations (and even if that's true, it's far from proving that they have been created out of thin air). Reactivity might not always increase correlations, either -- for example, trying harder can make you do worse on a difficult task. Hence the need, as you and many psychologists say, for convergence of multiple methods! But I think we should see a reactivity-prone study as one imperfect piece of evidence, not false evidence.
Emma B: Thanks for your comment and for the link to the interesting article!
I was just thinking myself that saying "no evidence at all" might be an overstatement. In retrospect, I find myself on the fence. Consider these two cases:
Case 1: I offer participants $5 if they will tell me they prefer vanilla ice cream to chocolate. Is the fact that some particular undergraduate accepts my offer evidence that she does prefer vanilla?
Case 2: I'm interested in whether ethicists vote more or less often than do other professors. I find that they vote about .98 times/year compared to other professors who vote about 1.05, but the difference is within sampling error (p = .12, say). Is this evidence that ethicists vote less?
It seems that, in both cases, there's a weak reading of "evidence" on which the cases do provide evidence. After all, the undergraduate might have refused my $5, saying that she won't be paid to lie -- and maybe 5% of respondents would so refuse. And after all, there is a trend for ethicists to vote less, which might be confirmed in further study. But in neither case is there what we might think of as scientific-quality evidence -- evidence worth citing as such in a review article, evidence strong enough to even tentatively support a conclusion.
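Just to put a hypothetical number on "within sampling error" in Case 2: only the group means and the p near .12 come from the example above; the standard deviation (0.45 votes/year) and group sizes (200 each) below are invented to make the arithmetic come out that way.

```python
from scipy import stats

# Hypothetical version of Case 2. Only the means and the target p ~ .12
# come from the example; the SD and sample sizes are invented.
t, p = stats.ttest_ind_from_stats(
    mean1=0.98, std1=0.45, nobs1=200,  # ethicists, votes/year
    mean2=1.05, std2=0.45, nobs2=200,  # other professors
)
print(f"t = {t:.2f}, p = {p:.3f}")  # roughly t = -1.56, p = 0.12
```

A p of .12 says a difference at least this large would turn up in roughly one such comparison in eight even if the two groups voted at identical rates -- a trend worth pursuing, but nothing to cite as established.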
I think the situation is the same when you have a study whose effects could just as plausibly be due to reactivity as to the hypothesized effect. Of course, there will be some cases in which reactivity is a possible explanation but you still think it more plausible that the effect is due to the hypothesized cause; that will be a matter of judgment, and ideally there will be something in the data that favors one interpretation over the other.
Eric, I totally dig what you're saying here. I would only add that reactivity between measurements suffers from an inferential gap similar to the one we see in correlations between measurement results. For example, if we state the reactivity thesis as follows -- one measurement yields specific results as a reaction to another measurement and its results -- there is still the problem, if we wish to see it as such, of establishing a relation of causality between the contiguous events.
CAROL BERNSTEIN SAY: I am not a schizophrenic.
I agree with that, Badda -- thanks!
Calmness, Eric, calmness; it's not all about "immoderate purchase" as regards others and the form your interest in them might *legitimately take*, some of it is about children.