Friday, August 20, 2010

Philosopher's Annual vs. Philosophical Review

Brian Leiter just revealed this year's selections in the Philosopher's Annual's "attempt to pick the ten best papers of the year". I'm led to wonder: How successful are such attempts? Selection for the Philosopher's Annual is certainly prestigious -- but to what extent does selection reflect durable quality as opposed to other factors?

There is of course no straightforward way to measure philosophical quality. But here's my thought: If an article is still being cited thirty years after publication, that's at least a sign of influence. [Insert here all necessary caveats, qualifications, hesitations, and hem-haws about the relationship between quality and citation rates.]

I compared citation rates of the 30 articles appearing in the first three volumes of Philosopher's Annual (articles published 1977-1979) with the citation rates of the first ten articles published in Philosophical Review during each of those same years. (I only included citations of the articles' original appearances, not citations of later reprints of those articles, since the latter are much harder to track. I see no reason to think this would bias the results.) To the extent the Philosopher's Annual selection committee adds valuable expertise to the process, they ought to be able to beat a dumb procedure like just selecting the first ten articles each year in a leading journal.

Evidently, however, they don't. Or at least they didn't. (Time will tell if this year's committee did any better.)

The median total citation count in the ISI citation database was 14 for Philosopher's Annual and 18 for Philosophical Review -- that's total citations in indexed journals over the course of 30+ years (excluding self-citations), roughly half a citation per year. (The difference in medians is not statistically significant by the Mann-Whitney test, p = .72.) The median number of citations since 2001 is 2.5 for Philosopher's Annual and 3.5 for Philosophical Review (again not significantly different, p = .62).
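For anyone curious what the Mann-Whitney comparison involves, here is a minimal Python sketch. It is an illustration, not the script behind the numbers above (the post doesn't say what software produced them): it assumes the large-sample normal approximation with a correction for tied ranks and no continuity correction, so on samples of 30 its p-values may differ slightly from an exact test. The function names are made up for the example, and no citation data are included.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def mann_whitney(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation with a tie correction.

    Returns (U for sample a, two-sided p-value). No continuity correction is applied.
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    n = n1 + n2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every observation identical: no evidence of a difference
        return u1, 1.0

    z = (u1 - mu) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, min(p, 1.0)
```

Called as `mann_whitney(pa_counts, pr_counts)`, where the two hypothetical lists hold per-article citation totals, it returns U for the first list and a two-sided p-value; `statistics.median` gives the medians being compared. Because the test works on ranks, a single 300-citation outlier counts no more than any other top-ranked article, which suits skewed citation data.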

But the medians might not tell the whole story here. Look at the distribution of citation rates in the following graph.

Although the medians are about the same, it looks like Philosophical Review has more articles near the median, while Philosopher's Annual has more barely-cited and more much-cited articles. It's hard to know this for sure, though, since we're dealing with small numbers subject to chance fluctuation: only three articles, all from Philosopher's Annual, had 100 or more citations. (My measure of the difference in spread is statistically marginal: Levene's test for equal variances on a log(x+1) transform of the data, p = .09.)

The three articles with at least 100 citations? Walton's "Fearing Fictions" (118 cites), Lewis's "Attitudes De Dicto and De Se" (224 cites), and Perry's "The Problem of the Essential Indexical" (301 cites) -- all good choices for influential articles from the late 1970s. The most cited article from the Philosophical Review list was Watson's "Skepticism about Weakness of Will" (73 cites). Worth noting: Lewis's "Attitudes De Dicto and De Se", though counted toward PA rather than PR, was actually published in Philosophical Review -- just not among the first ten articles of its year. Also, skimming forward through the mid-1980s, my impressionistic sense is that fewer than 10% of PA articles are as influential as the three just mentioned. So possibly the apparent difference is chance after all, at least on the upside of the curve.

Maybe those people who selected articles for Philosopher's Annual in the late 1970s were more likely both to pick swiftly forgotten duds and to pick home-run articles, compared to an arbitrary sample from Philosophical Review. The selection process did not appear to favor quality, as measured by influence over three decades, but possibly the selection procedure added variance in quality, both on the upside and the downside. The normative implications of all this, I leave to you.

UPDATE, August 22:

Given the uncertainty about whether Philosopher's Annual shows greater variance, I couldn't resist looking at the next three years (which, it turns out, are articles published in 1980, 1981, and 1983, skipping 1982). The trend toward similar medians and greater variance appears to be confirmed. On the high side, PA had three articles with 100 or more cites (R. Dworkin, "What is Equality: Part 2" [426 cites], Lewis, "New Work for a Theory of Universals" [262], and Shoemaker, "Causality and Properties" [106]), while PR had only one article in that group, the least-cited of the four (Dupre, "Natural Kinds and Biological Taxa" [104]). On the low side, PA had 6 articles cited 3 times or fewer, while PR had only 3. Here is a new graph of the spread, merging all the data from 1977-1983:

The medians of the merged data remain tied: 15.5 for Philosopher's Annual vs. 16.0 for Philosophical Review (Mann-Whitney, p = .75). And Levene's test for equal variance (on the log(x+1) transform) now shows statistical significance (p = .009).
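Levene's test can be sketched in the same spirit. The Python below is an illustration under stated assumptions, not the computation behind the p-values above: it computes the classic mean-centered W statistic, with `center="median"` giving the Brown-Forsythe variant instead, and the post doesn't say which variant was used. It stops at the statistic; under the null hypothesis W follows an F distribution with k - 1 and N - k degrees of freedom, so if SciPy is installed the p-value is `scipy.stats.f.sf(w, k - 1, n - k)` (or call `scipy.stats.levene` directly, bearing in mind that its default is median-centered). The function names here are hypothetical.

```python
import math
from statistics import mean, median


def log1p_all(counts):
    """Apply the log(x + 1) transform; an uncited article maps to 0 instead of -inf."""
    return [math.log1p(c) for c in counts]


def levene_w(*groups, center="mean"):
    """Levene's W statistic for equality of variances across two or more groups.

    center="mean" is the classic Levene test; center="median" is the
    Brown-Forsythe variant. Under the null, W ~ F(k - 1, N - k).
    """
    locate = mean if center == "mean" else median
    # Absolute deviation of each observation from its own group's center.
    z = []
    for g in groups:
        c = locate(g)
        z.append([abs(x - c) for x in g])

    k = len(z)
    n_total = sum(len(zg) for zg in z)
    group_means = [mean(zg) for zg in z]
    grand_mean = mean([v for zg in z for v in zg])

    # One-way ANOVA F statistic computed on the absolute deviations.
    between = sum(len(zg) * (m - grand_mean) ** 2 for zg, m in zip(z, group_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, group_means) for v in zg)
    if within == 0:
        raise ValueError("every group has constant absolute deviations")
    return (n_total - k) / (k - 1) * between / within
```

For the comparison in the post, the call would look like `levene_w(log1p_all(pa_counts), log1p_all(pr_counts))`, with the two count lists as hypothetical placeholders. The log(x+1) step compresses the long right tail (a 300-citation article no longer dominates the spread) while keeping zero-citation articles defined.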


J.Vlasits said...

Does Philosopher's Annual always limit itself to one article per journal? This year it did, and that might be a reason for the large spread. If PA is committed to honoring a wide variety of articles that are *not* from flagship journals (Journal of Philosophy, for example, got no awards), that might still be a service, since it puts interesting articles in front of people who would never have looked at them otherwise. For example, I imagine many philosophers of mind, and perhaps ethicists who work with modality, would be interested in Benthem et al.'s paper on ceteris paribus preferences but may not have come across it in the Journal of Philosophical Logic. In that case, even if PA does not flag the best articles for posterity (let posterity decide that), it might bring to the table articles that would not be seen by those who read only JPhil, Philosophical Review, PPR, Nous, etc., plus specialty journals.

Eric Schwitzgebel said...

J. Vlasits: Philosopher's Annual often publishes more than one article from a journal. But your point still holds: Even if, say, the typical PA article is about as important/influential/good as the typical Phil Review article, to the extent PA articles are chosen from journals outside the elite few, PA is doing a service by putting them before a wider audience.

Jonathan Cohen said...

Hey Eric:

It strikes me that influence (/citation rate) and quality can come apart for all kinds of reasons. The citation info here is interesting by itself, but I am not very confident about using it to operationalize philosophical quality. So how do you operationalize philosophical quality then? I wish I had an algorithm: grading, journal refereeing, tenure letter writing, etc. would be a lot easier!

Allen Coates said...

It might be interesting to compare citation rates of PA articles with, say, the ten most cited philosophy articles of a given year. Perhaps this would be a better indicator of how well the PA picks the ten most influential articles?

Dustin Locke said...

I agree with J. Vlasits's comment and Eric's response. Just to expand the point a bit, it's pretty impressive that the ten articles chosen by Philosopher's Annual did almost as well as the ones chosen for print by Philosophical Review. One way to interpret the Philosopher's Annual in light of this result is as if they were saying something like this:

"Hey, you see these ten articles right here? Although many/most of them were not published in one of the very best journals, they are still going to be *as influential* as the articles that were."

I'd say that's a pretty good service. (Of course, we also have to worry about whether selection for PA plays any causal role in their citation rates, but I'll let others do that worrying.)

Eric Schwitzgebel said...

Jon: Granted!

Allen: That would be a pretty high bar. It might be interesting to look at those numbers, though, to see just how far short PA falls. No one, of course, is a perfect predictor, so to me the more interesting question -- and the one where the conclusion isn't foregone -- is whether the supposed expert can beat the dumb algorithm.

Dustin: I agree. But I don't want to concede *too* much here. Most of the time a good proportion of the articles *are* from the top few leading journals. And there's still the somewhat separate issue of dumb strategy vs. expert, with the experts' median performance at least not outshining the dumb strategy.

Dustin Locke said...

But of course the dumb strategy ain't that dumb. (Calling it dumb makes it sound as if the strategy is just to pick 10 articles *at random* from all the articles published. But of course that is not what you intend.) Can't we agree to call it "parasitically intelligent"? I mean, you're picking 10 articles (essentially at random) from Phil Review, and the articles in Phil Review were certainly not chosen by a dumb strategy, but by a relatively intelligent one. (Step 1: Set up the expectation that submissions will be accepted only if quite good. Step 2: Have an expert or two review the submissions and select the best ones.) Why do you say that PA "ought to be able" to beat that kind of process?

(One more complication. Shouldn't we expect articles published in Phil Review to get more citations for the very reason that they were published in Phil Review? Of course, one might also think that PA papers will get more citations for the very reason that they were picked by PA. Perhaps these factors will offset one another, perhaps not. Perhaps I'm looking too far into all of this. Haha.)

Wait, I have an idea: how about comparing articles published by Phil Review *and* chosen by PA to articles published by Phil Review chosen at random?

Eric Schwitzgebel said...

Dustin: Clearly, there are lots of ways this analysis could be developed or improved. It also occurs to me that age or institutional affiliation could be used as predictors.

By "dumb" I just mean this: A dumb strategy doesn't require experts in the field to sweat it out trying to exercise their expert judgment. In psychology there is a bit of a literature on the extent to which simple formulas can (or cannot) beat out expert clinical judgment in matters like predicting how long a romantic relationship will endure. That's the kind of literature I had in mind, implicitly, in the background of this post.

Justin Fisher said...

Here's a hypothesis: high citation rates tend to reflect papers that are at a relatively early stage of a long debate -- i.e. controversial contributions to a big controversy; whereas Phil Annual attempts to pick papers that give especially persuasive or decisive arguments about important issues -- i.e., controversy stoppers.

If that's right, then you would predict that Phil Annual papers would end up being cited *less* than run-of-the-mill moderately flawed publications in top journals, which is exactly what you observed.

Eric, you claim to be attempting to determine how successful Phil Annual is, but it seems like your method is ill-suited to doing this. Now, if Phil Annual claimed to be selecting not the ten "best" but the ten "most likely to get a lot of citations" papers, your method would be good. But of course, if Phil Annual had the latter as their goal, then they'd pick a lot of sensationally bad arguments by big-name researchers, rather than the generally quite good arguments by not-so-famous researchers, as they do. I think your criticism of Phil Annual is off base here.

Eric Schwitzgebel said...

Justin: Your hypothesis about citation rates is interesting and probably has considerable truth in it. Of course citation rates, as I acknowledge, are an imperfect measure of quality. But I very much doubt that what we're seeing in the selected articles is PA papers that are so good they are conversation stoppers and PR articles that consist of bad arguments by famous philosophers. For one thing: Even if an article is a conversation stopper, it should still be cited: As X has shown.... I can imagine, but only barely imagine, papers that put an excellent, definitive end to an existing controversy and are hardly ever cited because they do such a good job of it. Normally, to put an end to one controversy is to help lay the groundwork for another.

Dustin Locke said...

"A dumb strategy doesn't require experts in the field to sweat it out trying to exercise their expert judgment."

Yes, I figured that's what you meant. But the strategy you call "dumb" is indeed a strategy that "requires experts in the field to sweat it out trying to exercise their expert judgment" (i.e., the Phil Review referees doing their Phil Review referee thing). But maybe I'm missing something here.

Eric Schwitzgebel said...

No, Dustin, that's right. The only question is whether the PA committee adds anything valuable to that process.

Unknown said...

The selection may be useful in other ways: in our case (Marc Lange's "Tale of Two Vectors", Dialectica 63:4, 397-431), it was useful to convince Blackwell-Wiley to make the article available for free.