Wednesday, August 07, 2013

What a Non-Effect Looks Like

(and what's wrong with the typical meta-analysis)

I've been reading the literature on whether business ethics classes have any effect on student attitudes. This literature has several features that I've come to associate with non-effects in psychology. Other literatures like it include the attempt to find objective behavioral correlates of the subjective reports of imagery experiences, the work on the relationship between religiosity and moral behavior, and the work on the psychological significance of colored dreaming (back when people thought they mostly dreamed in black and white).

However, the typical review article or quantitative meta-analysis in these fields does not conclude that there is no effect. Below, I'll discuss why.

The features are:

(1.) A majority of studies show the predicted positive effects, but a substantial minority of studies (maybe a third) show no statistically significant effect.

(2.) The studies showing positive vs. negative effects don't fit into a clearly interpretable pattern -- e.g., it's not like the studies looking for X result almost all show effects while those looking for Y result do not.

(3.) Researchers reporting positive effects often use multiple measures or populations and the effect is found only for some of those measures or populations (e.g., for women but not men, for high-IQ subjects but not for low-IQ subjects, by measure A but not by measure B) -- but again not in a way that appears to replicate across studies or to have been antecedently predicted.

(4.) Little of the research involves random assignment and confounding factors readily suggest themselves (e.g., maybe participants with a certain personality or set of interests are both more likely to have taken a business ethics class and less likely to cheat in a laboratory study of cheating, an association really better explained by those unmeasured differences in personality or interest rather than by the fact that business ethics instruction is highly effective in reducing cheating).

(5.) Much of the research is done in a way that seems almost to beg the participants to confirm the hypothesis (e.g., participants are asked to report their general imagery vividness and then they are given a visual imagery task that is a transparent attempt to confirm their claims of high or low imagery vividness; or a business ethics professor asks her students to rate the wrongness of various moral scenarios, then teaches a semester's worth of classes, then asks those same students to rate the wrongness of those same moral scenarios).

(6.) There is a positive hypothesis that researchers in the area are likely to find attractive, with no equally attractive negative hypothesis (e.g., that subjective reports of imagery correlate with objective measures of imagery; or that business ethics instruction of the sort the researcher favors leads students to adopt more ethical attitudes).

The really striking thing to me about these literatures is that despite what seem likely to be some pretty strong positive-effect biases (features 4-6), researchers in these areas still struggle to show a consistent pattern of statistical significance.

In my mind this is the picture of a non-effect.

The typical meta-analysis will report a real effect, I think, for two reasons, one mathematical and one sociological. Mathematically, if you combine one-third null-effect studies with two-thirds positive-effect studies, you'll typically find a statistically significant effect (even with the typical "file-drawer" corrections). And sociologically, these reviews are conducted by researchers in the field, often including their own work and the work of their friends and colleagues. And who wants to devalue the work in their own little academic niche? See, for example, this meta-analysis of the business ethics literature and this one of the imagery literature.
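To see the mathematical point concretely, here is a toy calculation. The study counts and z-scores below are invented for illustration, and Stouffer's combined test plus Rosenthal's fail-safe N stand in for whatever methods a particular meta-analysis actually uses. The point is just that pooling a majority of marginally positive studies with a minority of null studies yields an overwhelmingly "significant" combined effect, and the standard file-drawer correction does nothing to undermine it:

```python
import numpy as np
from scipy.stats import norm

# A made-up literature: 20 studies with just-significant positive effects
# (z around 2.0) and 10 studies finding nothing much (z around 0.5).
z_scores = np.array([2.0] * 20 + [0.5] * 10)
k = len(z_scores)

# Stouffer's combined test: Z = sum(z_i) / sqrt(k).
z_combined = z_scores.sum() / np.sqrt(k)
p_combined = norm.sf(z_combined)  # one-tailed p

# Rosenthal's fail-safe N: how many unpublished null-result (z = 0) studies
# would have to be sitting in file drawers to pull the combined result
# back above p = .05?
z_crit = norm.isf(0.05)  # roughly 1.645
n_failsafe = (z_scores.sum() / z_crit) ** 2 - k

print(f"combined Z = {z_combined:.2f}, one-tailed p = {p_combined:.1e}")
print(f"fail-safe N = {n_failsafe:.0f}")
```

With these invented numbers the combined Z comes out above 8 (astronomically significant), and the fail-safe N is in the hundreds: the correction certifies the pooled effect as robust even though a third of the studies found nothing.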

In a way, the mathematical conclusion of such meta-analyses is correct. There is a mathematically discoverable non-chance effect underneath the patterns of findings -- the combined effects of experimenter bias, participants' tendency to confirm hypotheses they suspect the researcher is looking for, and unmeasured confounding variables that often enough align positively for positively-biased researchers to unwittingly take advantage of them. But of course, that's not the sort of positive relationship that researchers in the field are attempting to show.

For fun (my version of fun!) I did a little mock-up Monte Carlo simulation. I ran 10,000 sample experiments predicting a randomly distributed Y from a randomly distributed X, with 60 participants in each control group and 60 in each treatment group, adding two types of distortion: first, two small uncontrolled confounds in random directions (average absolute value of correlation, r = .08); and second, a similarly small random positive correlation to model some positive-effect bias. (Both X and Y were normally distributed with a mean of 0 and a standard deviation of 1. The confounding correlation coefficients were chosen by randomly selecting r from a normal distribution centered at 0 with a standard deviation of 0.1; for the one positive-bias correlation, I took the absolute value.)

Even with only these three weak confounds, not all positive, and fairly low statistical power, 23% of experiments had statistically significant results at a two-tailed p value of < .05 (excluding the 5% in which the control group correlation was significant by chance). If we assume that each researcher conducts four independent tests of the hypothesis, of which only the "best", i.e., most positive, correlation is pursued and emphasized as the "money" result in publication, then 65% of researchers will report a statistically significant positive result, the average "money" correlation will be r = .28 (approaching "medium" size), and no researcher will emphasize in publication a statistically significant negative result.
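The simulation just described can be sketched in Python roughly as follows. This is a reconstruction from the description above, not the original code; the exact percentages will vary with the random seed and with details I've had to guess at, such as how the three confound correlations combine (simple addition here):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

N_EXPERIMENTS = 10_000  # simulated experiments
N = 60                  # participants per group

sig_pos = sig_neg = usable = 0
for _ in range(N_EXPERIMENTS):
    # Two small uncontrolled confounds in random directions, plus one
    # small positive-bias correlation (absolute value of an N(0, .1) draw),
    # combined here by simple addition -- one guess among several possible.
    r_eff = (rng.normal(0, 0.1) + rng.normal(0, 0.1)
             + abs(rng.normal(0, 0.1)))
    r_eff = float(np.clip(r_eff, -0.99, 0.99))

    # Control group: X and Y genuinely independent.
    x_ctl, y_ctl = rng.standard_normal(N), rng.standard_normal(N)
    # Treatment group: Y inherits the combined confound/bias correlation.
    x_trt = rng.standard_normal(N)
    y_trt = r_eff * x_trt + np.sqrt(1 - r_eff**2) * rng.standard_normal(N)

    # Exclude experiments whose control correlation is significant by chance.
    if pearsonr(x_ctl, y_ctl)[1] < 0.05:
        continue
    usable += 1

    r_obs, p = pearsonr(x_trt, y_trt)
    if p < 0.05:  # two-tailed
        if r_obs > 0:
            sig_pos += 1
        else:
            sig_neg += 1

print(f"significantly positive: {sig_pos / usable:.0%}")
print(f"significantly negative: {sig_neg / usable:.0%}")
```

With this way of combining the confounds, the positive and negative significance rates come out in the same neighborhood as the figures reported above, though not exactly. The "best of four tests" step could be modeled by drawing four such treatment correlations per researcher and keeping only the largest.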

Yeah, that's about the look of it.

(Slightly revised 10:35 AM.)

Update August 8th:

The Monte Carlo analysis finds 4% of experiments with a significantly negative effect. My wife asks: wouldn't researchers publish those effects too, and not just go for the strongest positive effect? I think not, in most cases. If the effect is due to a large negative uncontrolled confound, a positively biased researcher, prompted by the weird result, might search for negative confounds that could explain it, then rerun the experiment in a different way that avoids those hypothesized confounds, writing off the first result as a quirky finding due to bad method. If the negative effect is due to chance, the positively biased author or journal referee, seeing (say) p = .03 in the unpredicted direction, might want to confirm the "weird result" by running a follow-up -- and, being due to chance, it will likely fail to be confirmed, and the paper will go unpublished.


G. Randolph Mayes said...

Eric, nice piece. Are studies concluding that there is no effect substantially more likely to be published in this field than in other scientific fields? If not, then shouldn't part of our picture of a non-effect consider the systematic bias toward publishing the 5% of findings likely to be due to chance?

jonathan weinberg said...

Great post, Eric!

Eric Schwitzgebel said...

Thanks for the kind comments, Randy and Jonathan!

Randy: Yes, this is the "file drawer" phenomenon. It is not uncommon for meta-analyses, following Rosenthal, to use a file-drawer correction assuming that null-effect studies are less likely to be published. This correction works if you assume that there isn't a systematic bias in the data collection *other* than the overrepresentation of statistically significant findings, e.g., it does nothing to correct for an uncontrolled positive confound throughout the literature.

G. Randolph Mayes said...

Cool, thanks. I was not familiar with that term.

jsalvati said...

When you say "Mathematically, if you combine one-third null-effect studies with two-thirds positive-effect studies, you'll typically find a statistically significant effect (even with the typical "file-drawer" corrections)."

Are you referring to "This correction works if you assume that there isn't a systematic bias in the data collection *other* than the overrepresentation of statistically significant findings, e.g., it does nothing to correct for an uncontrolled positive confound throughout the literature." or do you also have some other issue in mind?

Eric Schwitzgebel said...

The first point is just an observation about typical meta-analyses that I have seen in the literature. The second point is an explanation of why the first might be so.