I think the most recent meta-analysis of the relationship between religiosity and crime is still Baier and Wright 2001. I'm reviewing it again in preparation for a talk I'm giving Sunday on what happens when there's a non-effect in psychology but researchers are disposed to think there must be an effect.
I was struck by this graph from Baier and Wright:
The authors comment:
The mean reported effect size was r = -.12 (SD = .09), and the median was r = -.11. About two-thirds of the effects fell between -.05 and -.20, and, significantly, none of them was positive. (p. 14, emphasis added)

Hm, I think. No positive tail?! I'm not sure that I would interpret that fact the same way Baier and Wright seem to.
Then I think: Hey, let's try some Monte Carlos!
Baier and Wright report 79 effect sizes from previous studies, graphed above. Although the distribution doesn't look quite normal, I'll start my Monte Carlos by assuming normality, using B&W's reported mean and SD. Then I'll generate 10,000 sets of 79 random values (representing hypothetical effect sizes), each set normally distributed with that mean and SD.
Of the 10,000 simulated distributions of 79 effect sizes with that mean and SD, only 9 distributions (0.09%) are entirely zero to negative. So I think we can conclude that it's not chance that the positive tail is missing. Options are: (a.) The population mean is higher than B&W report or the SD is lower, (b.) The distribution isn't normal, (c.) The positive effect-size studies aren't being reported.
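For anyone who wants to play along, here's a minimal sketch of that first simulation in Python with NumPy. The mean and SD are B&W's reported values; the seed is an arbitrary choice of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_studies = 10_000, 79
mean, sd = -0.12, 0.09  # B&W's reported mean and SD of the 79 effect sizes

# 10,000 hypothetical sets of 79 normally distributed effect sizes
draws = rng.normal(mean, sd, size=(n_sims, n_studies))

# In what fraction of the sets is every effect size zero or negative?
frac_no_positive = (draws <= 0).all(axis=1).mean()
print(f"{frac_no_positive:.2%} of simulated sets have no positive effect size")
```

The fraction comes out as a small fraction of a percent, so a missing positive tail is very unlikely under these assumptions.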
My money is on (c). But let's try (a). How high would the mean have to be (holding SD fixed) for at least 20% of the Monte Carlos to show no positive values? In my Monte Carlos it happens between mean -.18 and -.19. But the graph above is clearly not a graph of a sample from a population with that mean (which would be near the top of the fourth bar from left). This is confirmable by a t-test on the distribution of effect sizes reported in their study (one-sample vs. -.185, p < .001). Similar considerations show that it can't be an SD issue.
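To locate that threshold, one can sweep the assumed population mean and watch where the no-positive-tail fraction crosses 20%. A sketch, reusing the same setup as before:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_studies, sd = 10_000, 79, 0.09

def frac_no_positive(mean):
    """Fraction of simulated 79-study sets with no positive effect size."""
    draws = rng.normal(mean, sd, size=(n_sims, n_studies))
    return (draws <= 0).all(axis=1).mean()

# sweep candidate population means and report the all-nonpositive rate
for mean in (-0.16, -0.17, -0.18, -0.19, -0.20):
    print(f"mean {mean:+.2f}: {frac_no_positive(mean):.1%} of sets all-nonpositive")
```

The fraction crosses the 20% mark between a mean of -.18 and -.19, matching the figure above.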
How about (b)? The eyeball distribution looks a bit skewed, anyway -- maybe that's the problem? The graph can easily be unskewed by taking the square root of the absolute values of the effect sizes. The resulting distribution is very close to normal (both by eyeball and by an Anderson-Darling test). This delivers the desired conclusion -- only 35% of my Monte Carlos end up with even a single positive-tail study -- but it delivers this result at the cost of making sense. Taking the square root magnifies the differences among very small effect sizes and shrinks the differences among large ones: the gap between a study with effect size r = .00 and a study with effect size r = -.02 becomes larger in magnitude than the gap between effect size r = -.30 and effect size r = -.47. (All these r's are actually present in the B&W dataset.) The two r = .00 studies in the B&W dataset become outliers far from the three r = -.02 studies in their dataset, and it's this artificial inflation of an immaterial difference that explains the seeming Monte Carlo confirmation after the square-root "correction".
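The distortion is easy to check with the specific r values mentioned above:

```python
import math

def unskew(r):
    # the square-root "correction": square root of the absolute value
    return math.sqrt(abs(r))

# the tiny gap between r = .00 and r = -.02 is blown up...
small_gap = unskew(-0.02) - unskew(0.00)    # ~0.141
# ...past the gap between the much larger r = -.30 and r = -.47
large_gap = unskew(-0.47) - unskew(-0.30)   # ~0.138
print(small_gap, large_gap)
```

So after the transform, a .02 difference near zero counts for more than a .17 difference among the largest effects.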
So the best explanation would seem to be (c): The positive tail is missing because, at least as of 2001, the research that would be expected, even just by chance, to show a non-significant positive relationship between religiosity and crime simply wasn't being published.
If researchers also show a systematic bias toward publishing their research that shows the largest negative relationship between religiosity and crime, we can even get something like Baier and Wright's distribution with a mean effect size of zero.
Here's the way I did it: I assumed that the mean effect size of religiosity on crime is 0.0 and that the SD of the effect size among the studies is 0.12. I assumed 100 researchers, 25% of whom ran only one independent analysis, 25% of whom ran 2 analyses, 25% of whom ran 4, and 25% of whom ran 8. I assumed that each researcher published only their "best" result (i.e., the greatest negative relationship), but only if the trend was non-positive. I then ran 10,000 Monte Carlos. The average number of studies published was 80, the average published study's effect size was r = -.12, and the average SD of the published effect sizes was .08.
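That procedure can be sketched like this (I use 1,000 runs rather than 10,000 for speed; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sd = 0.12  # assumed SD of effect sizes; the true mean effect is 0.0
# 100 researchers: 25 run 1 analysis, 25 run 2, 25 run 4, 25 run 8
analysis_counts = [1] * 25 + [2] * 25 + [4] * 25 + [8] * 25

def one_run():
    published = []
    for k in analysis_counts:
        best = rng.normal(0.0, sd, size=k).min()  # "best" = most negative
        if best <= 0:  # publish only if the trend is non-positive
            published.append(best)
    return published

runs = [one_run() for _ in range(1_000)]
avg_count = np.mean([len(r) for r in runs])
avg_effect = np.mean([np.mean(r) for r in runs])
print(f"avg studies published: {avg_count:.0f}; avg published r: {avg_effect:.2f}")
```

Even though every underlying effect is drawn from a zero-mean distribution, the published record averages around 80 studies with a clearly negative mean effect.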
And it wasn't too hard to find a graph like this:
I don't believe that this analysis shows that religion and crime are unrelated. I suspect they are related, if in no other way than by means of uncontrolled confounds. But I do think this analysis suggests that a non-effect plus a substantial positivity bias in publication could result in a pattern of reported effects that looks a lot like the pattern that is actually reported.
This is, of course, a file-drawer effect, and perhaps it could be corrected by a decent file-drawer analysis. But: Baier and Wright don't attempt such an analysis. And maybe more importantly: The typical Rosenthal-style file-drawer analysis assumes that the average unpublished result has an effect size of zero, whereas the effect above involves removing wrong-sign studies disproportionately often, and so couldn't be fully corrected by such an analysis.
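For reference, the classic Rosenthal fail-safe N works like this: under Stouffer's method the combined Z of k studies is the sum of their z-scores divided by the square root of k, and you solve for how many additional mean-zero (z = 0) studies it would take to drag the combined result below significance. A sketch, with hypothetical z-scores:

```python
def failsafe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N: the number of unpublished mean-zero studies
    needed to pull the Stouffer combined Z down to z_crit.
    Solves sum(Z) / sqrt(k + N) = z_crit for N."""
    k = len(z_scores)
    return (sum(z_scores) / z_crit) ** 2 - k

# hypothetical: ten published studies, each with z = 2.0
print(failsafe_n([2.0] * 10))  # ~138 tolerable file-drawer studies
```

The zero-mean assumption is doing real work in that formula, which is exactly why sign-selective suppression of the kind simulated above slips past it.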