The order in which moral dilemmas are presented matters to people's judgments and can substantially influence later judgments about abstract moral principles. This is true even among professional ethicists with PhDs in philosophy. In 2012 and 2015, Fiery Cushman and I published empirical evidence supporting these claims. We invite a metaphilosophical conclusion: If even professional philosophers' expert judgments are easily swayed by order of presentation, then such judgments might not be stable enough to serve as secure grounds for philosophical theorizing.
Synthese has recently published two critiques of the literature on order effects in philosophy, which address Fiery's and my work (HT Wesley Buckwalter). Both critiques make valuable points. However, both also admit of some clear replies.
To fix ideas, consider two versions of the famous Trolley Problem:
Push: A runaway boxcar is headed toward five people it will kill if nothing is done. Jane can stop the boxcar by pushing a hiker with a heavy backpack in front of the boxcar, killing him but saving the five.
Switch: A runaway boxcar is headed toward five people it will kill if nothing is done. Vicki can stop the boxcar by flipping a switch to divert it to a sidetrack where it will kill one person instead of the five.
Fiery and I presented Push-type and Switch-type scenarios (fleshed out with a bit more detail) to professional philosophers and two comparison groups of non-philosophers. We found that when professional philosophers saw a Push-type scenario before a Switch-type scenario, 73% rated the two scenarios equivalently on a 7-point scale. Then later in the questionnaire when asked about the Doctrine of the Double Effect -- a moral principle often interpreted as implying that Push-type cases are morally worse than Switch-type cases -- only a minority, 46%, endorsed that principle. In contrast, among philosophers who saw Switch before Push only 54% rated the two scenarios equivalently, and then later a majority, 62%, endorsed the Doctrine of the Double Effect. Endorsement of the principle thus seemed to shift, post-hoc, to rationalize philosophers' order-manipulated judgments about the scenarios.
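To make the size of that shift concrete, here is a minimal sketch of the arithmetic in Python. It uses only the four percentages reported above for professional philosophers; the dictionary layout and the helper name `order_shift` are illustrative assumptions, not part of the original analysis, and the sketch says nothing about statistical significance since sample sizes are not given here.

```python
# Percentages reported in the text for professional philosophers.
# Keys and structure are hypothetical, chosen only for illustration.
reported = {
    "push_first": {"rated_equivalent": 73, "endorsed_dde": 46},
    "switch_first": {"rated_equivalent": 54, "endorsed_dde": 62},
}


def order_shift(measure: str) -> int:
    """Percentage-point difference: Push-first minus Switch-first."""
    return reported["push_first"][measure] - reported["switch_first"][measure]


# Shift in rating Push and Switch as equivalent.
equivalence_shift = order_shift("rated_equivalent")
# Shift in later endorsement of the Doctrine of the Double Effect.
dde_shift = order_shift("endorsed_dde")

print(f"Equivalence ratings shift: {equivalence_shift} percentage points")
print(f"Doctrine endorsement shift: {dde_shift} percentage points")
```

The two differences have opposite signs: the group more likely to rate the scenarios equivalently was less likely to endorse the principle afterward, which is the pattern the rationalization reading predicts.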
We found similar effects for Action-Omission, Moral Luck, and "Asian disease" type cases (though not consistently for every measure across the board). Philosophers with PhDs and self-reported competence or specialization in ethics showed no smaller effects than other philosophers or than comparison groups of non-philosophers -- and in fact trended slightly (non-significantly) toward showing larger order effects.
In general, we found pretty substantial effect sizes, suggesting substantial instability of judgment even in philosophical respondents' areas of expertise. Hence the metaphilosophical worry.
Critique by Zachary Horne and Jonathan Livengood:
Horne and Livengood make three main points about the literature on order effects in philosophy:
(A.) First, they helpfully distinguish between what they call "updating effects" and "genuine ordering effects". Genuine ordering effects, in their terminology, are effects measured only after all the stimuli have been presented. "Updating effects" are measures taken along the way, and might well reflect participants' learning. There is of course nothing irrational in judging Scenario B differently as a result of seeing Scenario A because one learned something by seeing Scenario A. Most philosophical research on order effects, they note, takes the measures along the way -- and thus might be measuring learning rather than true order effects.
(B.) Second, they point out that perceptual judgments also show order effects. Thus, if we are to reject any type of evidence that shows order effects, then we must reject perceptual evidence too, which would lead to radical skepticism.
(C.) Third, they point out that order can sometimes reasonably make a difference to the evaluation of evidence. For example, a smile followed by a frown, on the same person's face, is a different type of evidence than a frown followed by a smile.
On (A): I find the labels tendentious, since if we know that no learning-type updating is going on, what we might want to call "genuine order effects" can plausibly be measured mid-stream. Still, it is probably correct that most studies do not sufficiently rule out the possibility of learning or updating in the course of the experiment, if they have novice participants and take the measurements after each scenario rather than after both scenarios. However, since our participants were experts, we think it unlikely that a significant number learned anything in the process of our brief experiment that would rationally justify shifting their judgment about the equivalency or non-equivalency of Push and Switch. And as Horne and Livengood note, our measure of endorsement of the Doctrine of the Double Effect is a measurement of a "genuine ordering effect" even by their own lights.
On (B): Yes, of course it would be silly to reject all means of learning that are subject to any order effects! The epistemic sting, as they note, depends not on the mere existence of an order effect in one case, but on how large and how prevalent the order effects are. This is an open empirical question. But the limited empirical evidence that exists suggests that order effects are substantial and prevalent in moral dilemma cases. So far, we have found order effects in all of the scenario types we've tried, with about a 10-20% shift in opinion on the moral equivalency of our scenario pairs and in preference for the risky option in the "Asian disease" cases.
On (C): It's interesting to consider cases in which earlier evidence rightly colors our reaction to later evidence, but trolley problems presented to disciplinary experts seem a different kind of case.
Finally, Horne and Livengood suggest that exposure to a pair of dilemmas in our study is unlikely to have a long-lasting impact on professional philosophers' beliefs. I agree. They continue, "But if there is no long-lasting impact, then we think the effect is unlikely to matter to actual philosophical practice outside of the laboratory" (p. 17). I don't think this follows. Fiery's and my view is not that philosophers' opinions are permanently influenced by the order in which the scenarios are presented on any single occasion, but rather that their opinions are unstable -- possibly influenced in one direction on one occasion, in another direction on another occasion. This instability is what drives the metaphilosophical worry.
Critique by Regina Rini:
Rini -- a recent guest blogger here at the Splintered Mind -- looks only at our 2012 study. (Our 2015 study wasn't published until after her paper was in press.) She finds it plausible that if professional philosophers were already familiar with these cases they would not exhibit order effects of the sort Fiery and I find. She suggests that perhaps respondents were not previously familiar with the cases -- or at least not familiar in the right sort of way. She calls this the "familiarity problem" and offers four possible explanations:
(1.) The respondents were not really experts. She wonders if our participants, recruited through the internet, really had the degrees they claimed to have.
(2.) The respondents didn't carefully attend to our scenarios. Maybe they breezed through them so quickly that they failed to notice relevant features.
(3.) The respondents might not have familiar responses to these types of scenarios. Perhaps they have so far refrained from forming judgments on such cases and principles.
(4.) The respondents might not have diachronically stable familiar responses. This is the explanation Fiery and I favor. However, Rini helpfully points out that as long as philosophers are aware that their responses are not diachronically stable, the metaphilosophical threat is reduced: Presumably philosophers who are aware that their responses are not stable would be reluctant to ground their theorizing on those responses.
On (1): I am not aware of a general problem in the survey literature of respondents' frequently misreporting their educational status -- though certainly a bit of misreporting is possible. One specific piece of evidence against this possibility in our own study is that we recruited philosophers mostly by asking department chairs to forward a recruitment email to faculty and graduate students in their departments. Most of our "philosopher" participants took the survey within just a few days of these emails.
On (2): The median response time on the first scenario was 40 seconds, and on the second scenario it was 34 seconds. While these are not huge response times, if you stop to count out 34 seconds now, you'll probably notice that it's a reasonable amount of time for a thoughtful response to a brief scenario.
On (3) and (4): These are potentially quite serious issues, and in fact our follow-up study in 2015 was designed specifically to address them, after we saw an early version of Rini's critique. In our 2015 study we specifically asked participants if they were previously familiar with the scenarios. We also asked whether they regarded themselves as "having had a stable opinion" about the issues before participating in the experiment, and whether they regarded themselves as experts on those very issues. We also added a "reflection" condition to help address concern (2). In the reflection condition we asked participants to reflect carefully before responding and enforced a minimum 15-second delay between when participants reported having finished reading the scenario and when their response options appeared.
We did not find that self-reported familiarity or stability reduced the size of order effects in two different types of scenario pairs (trolley problems and risky-choice "Asian disease"-type problems), nor did we find reduced order effects in the reflection condition compared to a normal control condition without special instructions to reflect.
For example, percentage rating the Push and Switch scenarios equivalently:
Thus, I am inclined to think that Rini's fourth suggestion is the most plausible -- that participants do not have diachronically stable familiar responses, despite high levels of expertise. But since those who reported having stable responses were no less subject to order effects than were those who reported not having stable responses, self-knowledge of stability appears to be largely absent. Despite Rini's interesting suggestion that instability is metaphilosophically non-threatening if people are aware of it, Fiery's and my results suggest that we should not hasten to that comfort.
Both Horne and Livengood and Rini emphasize that we only have very limited evidence about order effects on professional philosophers' judgments. I agree! Fiery's and my two studies are hardly decisive. Convergent evidence from several different labs would be necessary before drawing any confident conclusions, especially if those conclusions are at variance with what one feels one knows from personal experience. Rini also makes positive suggestions for follow-up experimental work that might be done, which I am inclined to support. Both critiques raise important methodological concerns that ought to help shape and direct future work on this topic.