Monday, October 13, 2008

New Version of the Moral Sense Test, Especially Designed for Philosophers

Fiery Cushman at Harvard and I are running a new version of the "Moral Sense Test", which asks respondents to make moral judgments about hypothetical scenarios. We're especially hoping to recruit people with philosophy degrees for this test so that we can compare philosophers' and non-philosophers' responses. So while I would encourage all readers of this blog to take the test (your answers, though completely anonymous, will be treasured!), I would especially appreciate it if people with graduate degrees in philosophy would take the time to complete it.

The test should take about 15-20 minutes, and people who have taken earlier versions of the Moral Sense Test have often reported finding it interesting to think about the kinds of moral dilemmas posed in the test.

Here's the link to the test.

(By the way, I'm off to Australia on Wednesday, and I doubt I'll have time to post to the blog between now and when I recover from my jet lag. But if you notice any problems with the test, do please email me so I can correct it immediately!)

[Update, October 14: Discussion of the test is warmly welcomed either by email or in the comments section of this post. However, if you are planning to take the test, please do so before reading the comments on this post.]

[Update, October 15: By the way, people should feel free to retake the test if they want. Just make sure you answer "yes" to the question of whether you've taken the Moral Sense Test before!]


Anonymous said...

I hope this doesn't give anything important away about the test (and if it does, feel free to delete my post), but this survey left me with the disquieting feeling that utilitarianism, deontology, and virtue ethics are all three lacking in some fundamental respect when it comes to explaining goodness/ethical behavior.

The worst part is that I can't put my finger on what makes this so disquieting. I'll defend contractarianism (esp. Rawls) pretty vigorously, but this makes me think that we're actually living less from within an ethical framework and more from some strange state of cognitive dissonance - deontology when convenient, utilitarianism when necessary, and virtue ethics to convince ourselves that we really are good people.

Eric Schwitzgebel said...

Thanks for that comment, Jake! I'd be interested to know if other readers (among those who weren't initially inclined to think we're living in a strange state of cognitive dissonance about morality) have the same reaction.

kvond said...

Jake: "this makes me think that we're actually living less from within an ethical framework and more from some strange state of cognitive dissonance - deontology when convenient, utilitarianism when necessary, and virtue ethics to convince ourselves that we really are good people."

For some - perhaps cognitively dissonant - reason, I find this wonderful. Awesome.

If I had to script a response, this perhaps would correspond to an immanent causal appreciation of ethics (an expression of the social field, the habitus), a criteria-laden problem solving, driven by a threshold of dissonance, and then at last a reflexive identity function, seeing oneself "as".

As for the nature of the dissonance that drives deontology to utilitarianism, perhaps the Aristotelian-inspired standard offered by Ray Griffin and Cobb, flexing between measures of 1) Discord and 2) Unnecessary triviality.


Genius said...

It is very hard to get away from
1) the intuition that the person making the decision could not possibly reliably know the proposed facts (and that somehow that should influence the answer) and
2) that there is a reliable measure of the immediate benefits of surrendering to a terrorist threat and the long term utilitarian disadvantages of having a policy that that is the right thing to do.

Anonymous said...

Hi Eric, just thought I'd leave some feedback on the test - one question asked whether people should be differently punished when they do the exact same thing in the exact same frame of mind; that came straight after questions about how much to fine drunk drivers who differ with respect to the damage they do.

That confused me - it made it seem like the question might be using some notion of 'doing the same thing' that was meant to be in common between the drunk driver who kills someone, and the drunk driver who hits a tree. Is that right? (It seemed to me the drunk drivers did morally relevantly different things, and hence that fining them differently is consistent with answering 'no' to the later question.)

Brandon said...

I felt that on some of the questions I lacked sufficient information -- in particular about the other people supposedly involved in the scenarios. (A common issue with quandary problems; somehow, in considering whether they should push a fat man in front of an oncoming trolley, nobody thinks to ask the fat man what his opinion is....)

Eric Schwitzgebel said...

Thanks for all the comments, folks!

Kvond: There's something in me, too, that delights in the failure of a simple normative theory to capture all the cases. A matter of philosophical taste, perhaps, underwritten by something deep in my psychology I don't understand....

Genius: I agree that there are problematic epistemic idealizations in these sorts of trolley scenarios. (In fact, a grad student of mine wrote a seminar paper on exactly this topic last year.) I don't see how to avoid these problems completely without abandoning the whole enterprise, though some scenarios are more problematic in this respect than others.

My hope is that by averaging over several scenarios for each research question and having some comparison questions that vary only on one or a few dimensions, we can minimize these problems; but I agree that they can't be eliminated and present persistent problems in interpreting the data.

I happen to think that psychological studies are virtually *all* problematic in one way or another and multiply interpretable, so I tend not to believe any psychological results unless I see multiple studies using different paradigms that all point in the same direction.

Gabe: That's a good point. I'm aware of that general response to "moral luck" cases. In designing the questions we had to make compromises between simplicity and philosophical nuance. I could not think of a way to account for that possible reading of what constitutes the "doing the same thing" that didn't result in a prompt that would be confusing (and possibly also problematic in other ways), so I'm hoping that those who read the moral luck cases as you do would assume that in the sense of the question the two drunk drivers "do the same thing" with different consequences. Of course, we can't be sure how respondents did interpret the question -- always a problem with questionnaires, especially short ones like this.

Brandon: There's always a compromise between length and simplicity in scenarios of this sort. Perhaps we did not always hit the right compromise. (In the vaccine cases, especially, I think we erred in providing too little information about the patients.) We hope that people make background assumptions as bland and neutral as possible (e.g., assuming that the fat man isn't a mass-murderer, etc.) and in particular on your last point that there isn't time to have a conversation with the man about this and so no time to win his possible consent. But I agree we can't be completely sure how people are interpreting the scenarios. That's one reason we have more than one scenario for each research question.

Brandon said...

Hi, Eric,

The vaccine case was the one that was uppermost in my mind, too; it was the one that was serious enough to lead me to mention it at all. I tend to have more or less the same view of these kinds of tests you do: valuable insofar as you can see patterns, as long as those patterns seen are ones that are stable and not based on quirks; so I was expecting that there was some compensation for this.

Do you know if anyone has done a trolley problem where people were asked to put themselves in the place of one of the (potential) victims? (I noticed that you included some analogous questions here, which I think was excellent, and one that should be more widely used, since it's a key feature of moral reasoning that standard sorts of superagents-with-fiat-power-over-life-and-death dilemmas simply don't catch.)

Anonymous said...

My impression of the test was that even though I have a PhD in philosophy (albeit not with an ethics AOS), and even though I use the trolley problems and several other thought-experiments in my intro classes, emphasizing to my students the need to appeal to principles in determining matters of right and wrong, or at least to maintain some level of consistency, my answers on the test ended up being completely contradictory and without any guiding principle at all.

Anonymous said...

Eric & Genius:

First, I agree that knowing the outcome of a series of events with the certainty implied in thought experiments certainly impacts whether an action is ethical or not, but maybe it is still useful when discussing ethics to "know" in an inductive sense.

Sure, Foot and Thomson could have structured the Trolley and Fat Man differently and not explicitly stated that the five or one would die, but it doesn't seem to me that saying before the fact that you "know" they will die changes the morality of the action. Maybe Superman flies in and saves the trolley. Perhaps you push the fat man too early and he rolls out of the way. But we don't reasonably expect these things to happen and such possibilities don't factor into our decision-making process.

If one is forced to limit thought experiments to instances where one inductively knows (as opposed to reasonably expects, thinks it's possible, etc.), it can still be useful, I think. I feel as though thought experiments make knowledge claims like "X will die" as a sort of philosophical housecleaning. It provides clarity to examine the question at hand, rather than get caught up in "Does he really know?" or "What are her other options?" I guess my question is how can thought experiments be expanded from there?

BTW, the disquieting feeling I described yesterday kept me up all night thinking, but I've rambled enough already!

Neil said...

Actually what the test shows I think is pretty simple. We come up with the responses to moral dilemmas using two different kinds of mental systems; roughly an intuitive system and a more reflective system. They often yield incompatible results. There's nothing surprising about this: it is easy enough to design a nonmoral survey which will generate responses that conflict (that is, subjects will give responses to vignettes that vary from those they would give if they followed their endorsed principles). In the nonmoral case, it is (usually) obvious that when the intuitive system conflicts with reflective, it ought to be ignored. Ethics is harder inasmuch as this is not obvious. Why not? Because in the nonmoral case, there are usually well validated independent and convergent ways of discovering the answers, and these methods allow us to show that the reflective system is more reliable. Since we lack these independent and convergent lines of enquiry in ethics (in all except a handful of cases) what follows from the data generated is less clear.

Here's a suggestion. Since I know the work of Cushman et al, I can confidently predict that they will conclude that philosophers are not very much more consistent than the folk. And they will (I predict) have good reason to come to this conclusion. But for philosophers, their performance on the test should be further data they use, in coming to reflective equilibrium. Philosophers should insist on two versions of the test, a before and after. So Eric, what do you say to allowing us to do it again (once the data collection phase is over)?

Anibal Monasterio Astobiza said...

This test is an interesting way to find out whether "moral pushes" (intuitions) can be shaped by learning, training, or reflective exercise.

Anonymous said...

I worked through a few of the questions on the test before giving up in frustration. Assuming that many people may share my frustration yet finish it anyway, I have doubts about the reliability and utility of the test's results.

I see two main types of problems. The hypothetical scenarios are underdeveloped and unrealistic, and the sorts of answers available to respondents may not accurately reflect their actual moral views -- whether reflective or intuitive.

Concerning the scenarios, most respondents' instincts in real life would probably be to avoid the dilemma if at all possible. (In the vaccine case, my response was: Are the other two people unable to discuss the situation and volunteer? And shouldn't I myself volunteer?) The scenario must be set up in such a way as to force a choice, as for me the vaccine scenario does not. Yet even if it does force a choice, respondents may not view their choice as genuinely representative of their values, because they may be dissatisfied with both options.

Concerning the responses, the most obvious example is the question asking whether the respondent tends to give priority to principles, consequences, or character in making moral judgments. My real-life answer is: I weigh them all and assign different priorities in different types of cases. I don't recall what answer I gave, but whatever I clicked, it doesn't represent my actual intuitions. The test provides no way for me to indicate what I really think.

Many of the questions ask the respondent to rate some action on a scale measuring moral goodness/badness. In some of the cases, my response would be that the action is good in some ways and bad in others. If the dilemma is genuinely tragic -- as some of these would be, I think -- then whatever one does, one's actions are bad in some respect.

Also, rating the goodness/badness of the action is different from asking what one ought to do.

After reading the "Learn" page on the test site, I am not sure I understand the aim or methodology of the test.

The test poses scenarios about which we may have deeply conflicting intuitions and asks us to rate the goodness of actions that cohere with some of these intuitions but not others. The aim seems to be to force us to assign priority to some intuitive values over others. I have doubts about how well these "forced" priority assignments reflect how people would seek to resolve analogous conflicts between values in everyday life. Perhaps most people's intuitive response would be to find a way of doing justice to both conflicting values, rather than simply sacrificing one in favor of the other.

Brandon said...

"Sure, Foot and Thomson could have structured the Trolley and Fat Man differently and not explicitly stated that the five or one would die, but it doesn't seem to me that saying before the fact that you "know" they will die changes the morality of the action. Maybe Superman flies in and saves the trolley. Perhaps you push the fat man too early and he rolls out of the way. But we don't reasonably expect these things to happen and such possibilities don't factor into our decision-making process."

This may be true; but in these scenarios we are not engaging in decision-making in examining moral dilemmas, but in evaluation. (Our decision-making in real situations may or may not conform to our evaluations of those situations when we are not actually in them.) And genuine possibilities, even if they are of low probability, arguably can affect our evaluations. For instance, in actual decision-making I might dismiss the possibility that, even given that I don't see a pedestrian, I still might hit one, precisely because it is low probability; but in evaluation I might think that I should still have taken extra steps to safeguard against that possibility, particularly if this just happens to be the rare case when I actually did hit a pedestrian I didn't think was there. Insofar as we are evaluating, we are engaging in what might be called an act of moral taste, hopefully good taste; and in moral taste, as in aesthetic taste, evaluation does not consider only those things that were explicitly there to be deliberated about from the beginning, but all the features that we can recognize when we are evaluating.

I don't think this problem is insuperable; I think Eric's right that you can manage to get around it by not building anything on bits and pieces, but only on stable patterns in large amounts of evidence.

Eric Schwitzgebel said...

What a thoughtful and helpful set of comments so far! Thanks everybody!

Brandon: That's an interesting set of reflections on the difference between decision-making and evaluation. That's also an interesting idea about trying a variation with the evaluator in the position of one of the victims. We could set up a trolley scenario with "you" being the one person on the sidetrack and the agent by the switch being "Larry". Would people respond differently than they do to the standard version?

Anon 12:24 and Neil: I think it's possible that philosophers will show more consistency in their answers and between their espoused views and their answers. I'd say the jury's still out on that. Neil: Sometimes the "gut" system seems to work better, doesn't it? And I appreciate your suggestion about retaking the test. I've sent an email to Fiery about the feasibility of that.

Jake: I'm sorry this caused you worry! I'm hoping it's healthy Socratic confusion that will eventually lead to better insight! And obviously, I'm sympathetic with what you say about Superman, etc., though I concede that it's not entirely straightforward.

Thanks for the kind words, Anibal.

Anon 4:45: I'm actually sympathetic with all the concerns you raise, and I appreciate your voicing them. I've become convinced that the vaccine case in particular is too underdescribed to force the action. Our hope is that even if individual scenarios are problematic in their setup, the overall trends across questions may still shine through. But even the best scenarios are pretty unrealistic and require some suspension of disbelief. I'm optimistic that at least for the better among the scenarios, people can do it.

I agree that the connection between answers to such scenarios and what people would do in real life is tenuous at best! I also agree that the question whether something is morally bad or good is not identical to the question about what one "ought" to do. And I agree that it would be very difficult to try to disentangle from the responses to most scenarios those impulses that are generally deontological, consequentialist, or virtue-driven. But our aim is not mainly to try to disentangle things in that way. (I don't want to reveal what our main aims really are, just yet.) I agree that the cases involve conflicting values -- that's why they're dilemmas! -- but so also do many real world cases; and in real world cases we are often forced to choose between conflicting values.

I'm sorry you felt you didn't learn much from the "Learn" page. That page was designed for the MST in general, not our specific version of it; and furthermore it would corrupt the subject pool if it gave away too much about what actually was being studied!

If you have the chance, visit this blog again in a month or two or three, where we'll be posting a bit about our results!

Brad C said...

Hi Eric,

Some comments:

First, I did not like the morally good - bad scale. There is an important distinction between something being morally permissible and its being morally good and I worry your setup does not do justice to this point. I think, in this connection, of Thomson's Good Samaritan discussion. I wanted in some cases to respond that it was morally permissible and was at a loss about what to put - neither good nor bad seems too weak (and confuses my view with another) and morally good was misleading because it implies approval or a willingness to commend the action.

Second, I thought the final questions put me in a forced choice situation that lacked an option I wanted. When you ask whether killing is worse than, better than, or the same as allowing to die, for example, I want to say that it depends on other factors. It would be a challenge to spell that story out, but just the fact that I wanted to put that shows that the test might be forcing people to report results that do not reflect their actual thoughts.

Brad C said...

I should add that I thought it was very interesting to take and I will be using something like this in classes I teach in the future - what a great way to introduce the method of reflective equilibrium to students and to talk about what the Socratic method aimed to accomplish!

Anonymous said...

Don't worry, Eric. Totally Socratic. It actually ends up dovetailing nicely with something I've been kicking around in my head for a while. We'll see where it leads!

Eric Schwitzgebel said...

Thanks for the kind remarks, Brad and Jake.

Brad: I don't think any scale is wholly unproblematic. Permissible to impermissible seems a little weird (what would it be to be in the middle between those?) and it lacks a natural counterweight on the morally good side: impermissible - permissible - supererogatory, for example, might be seen as shifting categories in the middle. I've seen some of Fiery's other scales and I thought about the scaling issue a fair bit. In the end good - neutral - bad just seemed the least problematic.

Philosophers almost never like forced choices, in my experience. They want to add nuance. Me too. My usual methodology in dealing with philosophers is to encourage them to write in the margins, clarifying their answers, complaining about the methodology, etc. But this test is in the MST mold and I didn't want to deviate too much from standard MST methods. As a result, though, I'm inclined to use a light touch in interpreting relationships between the post-test answers and the scenario answers.

Anonymous said...

In the penalty questions I considered that one should face a greater penalty for being in the unlucky situation where one does kill a person by accident than when one does not, but I don't think that you are any more wrong in that situation.
I had two reasons:
1) justice must be SEEN to be done
2) the state can't be expected to know exactly what risk I took - even if it thinks it does and even if I do, someone actually being killed is some evidence for there being a greater risk taken.

Not sure if the test results will recognize that nuance.

Maybe I should have just assumed that the scenarios were watertight?

There is also, as per Brad, an issue with the definition of good. If I take it as showing approval, there are some things that I might consider good and yet not want to approve of.


milkshake said...

Dear Eric,

Thank you for your kind invitation to participate in your moral dilemma test that you created in order to test ethical standards of people working in various fields.

I couldn't respond to your repeated invitations sooner because the Akismet automatic spam filter in Wordpress fortunately re-directed your messages into the junk folder.

I would like you to know that I am willing to push any goddamned philosopher under a streetcar if it makes the World a better place.

Neil said...

Eric, you say that sometimes the gut system works better. I doubt this can be true. If your claim is correct, then so is the following claim: upon reflection, we see that the gut system works better than the reflective system. But this claim conflicts with the original claim. So the original claim is false. All that can be true is that the gut system is reliable: it can't be true that it is more reliable than the reflective system.

Neil said...

I should add a caveat to the previous comment. It is certainly true that there are actual situations in which one would make a better decision by relying upon system 1 rather than system 2. But that is just to say that in these circumstances system 2 does not generate the result that it would in ideal circumstances (where 'ideal' is an actual state of affairs). As it happens, I'm reading Gigerenzer at the moment. He frequently makes the mistake of saying that because there are such circumstances, system 1 is more reliable than 2. But what he needs to show is the following: system 1 sometimes gives outputs that are at least plausibly better than those that system 2 would yield under any actual conditions. And that can't be done, for the reasons I mentioned in my last comment.

Anonymous said...

Hi Eric,

Fair enough - I am sure there are no problem free solutions to the issues I mentioned.

On the scaling issue: I was thinking this might be better because it would allow for my response without losing anything:

1) Morally commendable
2) Morally Permissible
3) Morally Neutral
4) Morally Impermissible
5) Morally Repugnant

But, again, I see the point about the difficulty in designing these things!


Eric Schwitzgebel said...

Thanks for the continuing comments, folks!

Genius: Yes, I agree that responses to the fine issue are likely not to be pure measures of blameworthiness but may also be compounded with issues about restitution, symbolism, general public policy, etc. I'm hoping that this won't make a big difference to the main research questions we have in mind, though!

Milkshake: Is that consequentialism or spite? (Or one aided by the other?) ;)

Neil: I'm not sure I entirely follow your argument. So let's say that as a matter of empirical fact in some real-world situation -- let's say to be concrete a poker table where you're trying to figure out whether your opponent is bluffing -- you're better off going with your gut which may be responding to subtle tells than you are trying to figure it out intellectually. Is your thought that if your gut is better in situations like this, ideally you should know that fact at an intellectual level and then incorporate it into your final intellectual judgment? If so, I'm not sure I disagree, except that I don't find the idealization very realistic. In a non-idealized world, sometimes we're better off going with our "guts" than with what seems right to us intellectually. (I make no claims here about relative proportion.) It's only that relatively weaker claim I meant to make.

Brad: That's an interesting scale, but I think it too raises issues. For example, I'm not sure about the difference between "permissible" and "neutral". It seems something can be permissible because it is neutral and so "permissible" does not necessarily imply anything better than neutral. Also "commendable" and "repugnant" may not quite be opposites: "Commendable" seems on its face to have something to do with what merits praise, while "repugnant" seems to have to do with what generates a certain kind of moral-aesthetic reaction.

The core problem, I think, is that there are multiple normative moral dimensions that don't entirely align, and evaluations that are on the face of it on a single dimension nevertheless somehow seem to change character in a subtle way when they cross from one side of neutrality to the other.

As a result, I think there is no perfect scale. So I remain convinced that good - neutral - bad is the best way to go, given its simplicity, thinness, and relative lack of assumptions.

kvond said...

Eric: " I remain convinced that good - neutral - bad is the best way to go, given its simplicity, thinness, and relative lack of assumptions."

I agree that this is a fundamental axis of judgement. But one must also, as much as it pains me, acknowledge that there is an additional, or parallel axis: good - neutral - evil. I think no ethical/moral scale can be complete without it. If forced (as these tests like to do) to use the good/evil scale, I am unsure if the answers would match the good/bad scale. It would be interesting to have to rate decisions along both axes.

Anonymous said...

Don't forget lawful - neutral - chaotic. And, of course, no one really knows what to do with so-called 'true neutral'... but who really wants a druid as their player character anyway?

Eric Schwitzgebel said...

Kvond: I'm inclined to think the term "evil" is probably too laden to be a useful measure in this sort of test -- or at least, if we're going to have only one or two measures, I'd probably go for good-neutral-bad and/or something like required-permitted-forbidden.

Anon 12:03: But will I get an XP bonus if I act consonantly with my alignment?