Showing posts with label moral psychology.

Thursday, February 13, 2025

Imagining Yourself in Another's Shoes vs Extending Your Concern

I have a new article out today, "Imagining Yourself in Another's Shoes vs. Extending Your Concern: Empirical and Ethical Differences". It's my case against the "Golden Rule" and against attempts to ground moral psychology in "imagining yourself in another's shoes", in favor of an alternative idea, inspired by the ancient Chinese philosopher Mengzi, that involves extending one's concern for nearby others to more distant others.

My thought is not that Golden Rule / others' shoes thinking is bad, exactly, but that both empirically and ethically, Mengzian extension is better. The key difference is: In Golden Rule / others' shoes thinking, moral expansion involves extending self-concern to other people, while in Mengzian extension, moral expansion involves extending concern for nearby others to more distant others.

We might model Others' Shoes / Golden Rule thinking as follows:

* If I were in the situation of Person X, I would want to be treated in manner M.
* Golden Rule: do unto others as you would have others do unto you.
* Thus, I will treat Person X in manner M.

We might model Mengzian Extension as follows:

* I care about Person Y and want W for them.
* Person X, though more distant, is relevantly similar to Person Y.
* Thus, I want W for Person X.

Alternative and more complex formulations are possible, but this sketch captures the core difference. Mengzian Extension grounds general moral concern on the natural concern we already have for others close to us, whether spatially close, like a nearby suffering animal or child in danger, or relationally close, like a close relative. In contrast, the Golden Rule grounds general moral concern on concern for oneself.

[Mengzi; image source, cropped]

An Ethical Objection:

While there's something ethically admirable about seeing others as like oneself and thus as deserving the types of treatment one would want for oneself, there's also something a bit... self-centered? egoistic?... about habitually grounding moral action through the lens of hypothetical self-interest. It's ethically purer and more admirable, I suggest, to ground our moral thinking from the beginning in concern for others.

A Developmental/Cognitive Objection:

Others' Shoes thinking introduces needless cognitive challenges: To use it correctly, you must determine what you would want if you were in the other's position and if you had such-and-such different beliefs and desires. But how do you assess which desires (and beliefs, and emotions, and personality traits, and so on) to change and which to hold constant for this thought experiment? Moreover, how do you know how you would react in such a hypothetical case? By routing the epistemic task through a hypothetical self-transformation, it potentially becomes harder to know or justify a choice than if the choice is based directly on knowledge of the other's beliefs, desires, or emotions. In extreme cases, there might not even be facts to track: What treat would you want if you were a prize-winning show poodle?

Mengzian Extension presents a different range of cognitive challenges. It requires recognizing what one wants for nearby others, and then reaching a judgment about whether more distant others are relevantly similar. This requires generalizing beyond nearby cases, based on an assessment of which differences are and are not relevant to the generalization. Although this is potentially complex and demanding, it avoids the convoluted hypothetical situational and motivational perspective-taking required by Others' Shoes thinking.

A Practical Objection:

Which approach more effectively expands moral concern to appropriate targets? If you want to convince a vicious king to be kinder to his people, is it more effective to encourage him to imagine being a peasant, or is it more effective to highlight the similarities between people he already cares about and those who are farther away? If you want to encourage donations to famine relief, is it better to ask people how they would feel if they were starving, or to compare distant starving people to nearby others the potential donor already cares about?

Armchair reflections and some limited empirical evidence (e.g., from my recent study with Kirstan Brodie, Jason Nemirow, and Fiery Cushman) suggest that across an important range of cases, Mengzian extension might be more effective -- though the question has not been systematically studied.

More details, of course, in the full paper.

Friday, June 28, 2024

Is the World Morally Well Ordered? Secular Versions of the "Problem of Evil"

Since 2003, I've regularly taught a large lower-division class called "Evil", focusing primarily on the moral psychology of evil (recent syllabus here).  We conclude by discussing the theological "problem of evil" -- the question of whether and how evil and suffering are possible given an omnipotent, omniscient, benevolent God.  Over the years I've been increasingly intrigued by a secular version of this question.

I see the secular "problem of evil" as this: Although no individual or collective has anything close to the knowledge or power of God as envisioned in mainstream theological treatments, the world is not wholly beyond our control; so there's at least some possibility that individuals and collectives can work toward making the world morally well ordered in the sense that the good thrive, the evil suffer, justice is done, and people get what they deserve.  So, how and to what extent is the world morally well ordered?  My aim today is to add structure to this question, rather than answer it.

(1.) We might first ask whether it would in fact be good if the world were morally well-ordered.  One theological response to the problem of evil is to argue no.  A world in which God ensured perfect moral order would be a world in which people lacked the freedom to make unwise choices, and freedom is so central to the value of human existence that it's overall better that we're free and suffer than that we're unfree but happy.

A secular analogue might be: A morally well-ordered world would, or might, require such severe impingements on our freedom as to not be worth the tradeoff.  It might, for example, require an authoritarian state that rewards, punishes, monitors, and controls in a manner that -- even if it could accurately sort the good from the bad -- fundamentally violates essential liberties.  Or it might require oppressively high levels of informal social control by peers and high-status individuals, detecting and calling out everyone's moral strengths and weaknesses.

(2.) Drawing from the literature on "immanent justice" -- with literary roots in, for example, Shakespeare and Dostoyevsky -- we might consider plausible social and psychological mechanisms of moral order.  In Shakespeare's Macbeth, one foul deed breeds another and another -- partly to follow through on and cover up the first and partly because one grows accustomed to evil -- until the evil is so extreme and pervasive that the revulsion and condemnation of others becomes inevitable.  In Dostoyevsky's Crime and Punishment, Raskolnikov torments himself with fear, guilt, and loss of intimacy (since he has a life-altering secret he cannot share with most others in his life), until he unburdens himself with confession.

We can ask to what extent it's true that such social and psychological mechanisms cause the guilty to suffer.  Is it actually empirically correct that those who commit moral wrongs end up unhappy as a result of guilt, fear, social isolation, and the condemnation of others?  I read Woody Allen's Crimes and Misdemeanors as arguing the contrary, portraying Judah as overall happier and better off as a result of murdering his mistress.

(3.) Drawing from the literature on the goodness or badness of "human nature", we can ask to what extent people are naturally pleased by their own and others' good acts and revolted by their own and others' evil.  I find the ancient Chinese philosopher Mengzi especially interesting on this point.  Although Mengzi acknowledges that the world isn't perfectly morally ordered ("an intent noble does not forget he may end up in a ditch; a courageous noble does not forget he may lose his head"; 3B1), he generally portrays the morally good person as happy, pleased by their own choices, and admired by others -- and he argues that our inborn natures inevitably tend this direction if we are not exposed to bad environmental pressures.

(4.) We can explore the extent to which moral order is socially and culturally contingent.  It is plausible that in toxic regimes (e.g., Stalinist Russia) the moral order is to some extent inverted, the wicked thriving and the good suffering.  We can aspire to live in a society where, in general -- not perfectly, of course! -- moral goodness pays off, perhaps through ordinary informal social mechanisms: "What goes around comes around."  We can consider what structures tend to ensure, and what structures tend to pervert, moral order.

Then, knowing this -- within the constraints of freedom and given legitimate diversity of moral opinion (and the lack of any prospect for a reliable moralometer) -- we can explore what we as individuals, or as a group, might do to help create a morally better ordered world.

[Dall-E interpretation of a moralometer sorting angels and devils, punishing the devils and rewarding the angels]

Friday, February 16, 2024

What Types of Argument Convince People to Donate to Charity? Empirical Evidence

Back in 2020, Fiery Cushman and I ran a contest to see if anyone could write a philosophical argument that convinced online research participants to donate a surprise bonus to charity at rates statistically above control. (Chris McVey, Josh May, and I had failed to write any successful arguments in some earlier attempts.) Contributions were not permitted to mention particular real people or events, couldn't be narratives, and couldn't include graphics or vivid descriptions. We wanted to see whether relatively dry philosophical arguments could move people to donate.

We received 90 submissions (mostly from professional philosophers, psychologists, and behavioral economists, but also from other Splintered Mind readers), and we selected 20 that we thought represented a diversity of the most promising arguments. The contest winner was an argument written by Matthew Lindauer and Peter Singer, highlighting that a donation of $25 can save a child in a developing country from going blind due to trachoma, then asking the reader to reflect on how much they would be willing to donate to save their own child from going blind. (Full text here.)

Kirstan Brodie, Jason Nemirow, Fiery, and I decided to follow up by testing all 90 submitted arguments to see what features were present in the most effective arguments. We coded the arguments according to whether, for example, they mentioned children, or appealed to religion, or mentioned the reader's own assumed economic good fortune, etc. -- twenty different features in all. We recruited approximately 9000 participants. Each participant had a 10% chance of winning a surprise bonus of $10. They could either keep the whole $10 or donate some portion of it to one of six effective charities. Participants decided whether to donate, and how much, before knowing if they were among the 10% receiving the $10.

Now, unfortunately, proper statistical analysis is complicated. Because we were working with whatever came in, we couldn't balance argument features, most arguments had multiple coded features, and the coded features tended to correlate between submissions. I'll share a proper analysis of the results later. Today I'll share a simpler analysis. This simple analysis looks at the coded features one by one, comparing the average donation among the set of arguments with the feature to the average donation among the set of arguments without the feature.

There is something to be said, I think, for simple analyses even when they aren't perfect: They tend to be easier to understand and to have fewer "researcher degrees of freedom" (and thus less opportunity for p-hacking). Ideally, simple and sophisticated statistical analyses go hand-in-hand, telling a unified story.
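For concreteness, here is a minimal sketch (in Python) of the kind of feature-by-feature comparison just described. The file name and column names are hypothetical placeholders, not our actual materials; the point is only to illustrate comparing mean donations for arguments with versus without each coded feature via a two-sample t test.

```python
# Minimal sketch of the simple feature-by-feature analysis described above.
# "argument_donations.csv", "donation", and the "feat_" columns are
# hypothetical placeholders, not the actual study files or variable names.
import pandas as pd
from scipy import stats

df = pd.read_csv("argument_donations.csv")  # one row per participant
feature_cols = [c for c in df.columns if c.startswith("feat_")]  # 0/1 coded features

rows = []
for feat in feature_cols:
    with_feat = df.loc[df[feat] == 1, "donation"]
    without_feat = df.loc[df[feat] == 0, "donation"]
    t, p = stats.ttest_ind(with_feat, without_feat)  # two-sample t test, no correction
    rows.append({
        "feature": feat,
        "mean_with": with_feat.mean(),
        "mean_without": without_feat.mean(),
        "diff": with_feat.mean() - without_feat.mean(),
        "N": len(with_feat),  # participants who saw an argument with the feature
        "p": p,
    })

summary = pd.DataFrame(rows).sort_values("diff", ascending=False)
print(summary.to_string(index=False))
```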

So, what argument features appear to be relatively more versus less effective in motivating charitable giving?

Here are our results, from highest to lowest difference in mean donation. "diff" is the dollar difference in mean donation, N is the number of participants who saw an argument with that feature, n is the number of arguments containing that feature, and p is the statistical p-value in a two-sample t test (without correction for multiple comparisons). All analyses are tentative, pending double-checking, skeptical examination, and possibly some remaining data clean-up.

Predictive Argument Features, Highest to Lowest

Does the argument appeal to the notion of equality?
$3.99 vs $3.39 (diff = $.60, N = 395, n = 4, p < .001)

... mention human evolutionary history?
$3.93 vs $3.39 (diff = $.55, N = 4940, n = 5, p < .001)

... specifically mention children?
$3.76 vs $3.26 (diff = $.49, N = 4940, n = 27, p < .001)

... mention a specific, concrete benefit to others that $10 or a similar amount would bring (e.g., 3 mosquito nets or a specific inexpensive medical treatment)?
$3.75 vs $3.44 (diff = $.41, N = 1718, n = 17, p < .001)

... appeal to the diminishing marginal utility of dollars kept by (rich) donors?
$3.69 vs $3.29 (diff = $.40, N = 2843, n = 27, p < .001)

... appeal to the massive marginal utility of dollars transferred to (poor) recipients?
$3.65 vs $3.25 (diff = $.40, N = 3758, n = 36, p < .001)

... mention, or ask the participant to bring to mind, a particular person who is physically or emotionally near to them?
$3.74 vs $3.34 (diff = $.34, N = 318, n = 3, p = .061)

... mention particular needs or hardships such as clean drinking water or blindness?
$3.56 vs $3.23 (diff = $.30, N = 4940, n = 49, p < .001)

... refer to the reader's own assumed economic good fortune?
$3.58 vs $3.31 (diff = $.27, N = 3544, n = 35, p < .001)

... focus on a single issue (e.g. trachoma)?
$3.61 vs $3.40 (diff = $.21, N = 800, n = 8, p = .07)

... remind people that giving something is better than nothing? (i.e. corrective for drop-in-the-bucket thinking)
$3.56 vs $3.40 (diff = $.15, N = 595, n = 6, p = .24)

... appeal to the views of experts (e.g. philosophers, psychologists)?
$3.47 vs $3.39 (diff = $.07, N = 2629, n = 27, p = .29)

... reference specific external sources such as news reports or empirical studies?
$3.47 vs $3.40 (diff = $.07, N = 1828, n = 18, p = .41)

... explicitly mention that donation is common?
$3.46 vs $3.41 (diff = $.05, N = 736, n = 7, p = .66)

... appeal to the notion of randomness/luck (e.g., nobody chose the country they were born in)?
$3.43 vs $3.41 (diff = $.02, N = 1403, n = 14, p = .80)

... mention religion?
$3.35 vs $3.42 (diff = -$.07, N = 905, n = 9, p = .48)

... appeal to veil-of-ignorance reasoning or other perspective-taking thought experiments?
$3.29 vs $3.23 (diff = -$.14, N = 4940, n = 8, p = .20)

... mention that giving could inspire others to give? (i.e. spark behavioral contagion)
$3.29 vs $3.43 (diff = -$.14, N = 896, n = 9, p = .20)

... explicitly mention and address specific counterarguments?
$3.29 vs $3.45 (diff = -$.15, N = 1829, n = 19, p = .048)

... appeal to the self-interest of the participant?
$3.22 vs $3.49 (diff = -$.30, N = 2604, n = 22, p < .001)

From this analysis, several argument features appear to be effective in increasing participant donations:

  • mentioning children and appealing to the equality of all people,
  • mentioning concrete benefits (one or several),
  • mentioning the reader's assumed economic good fortune and the relatively large impact of a relatively small sacrifice (the "margins" features), and
  • mentioning evolutionary history (e.g., theories that human beings evolved to care more about near others than distant others).
  • Mentioning a particular near person might also have been effective, but since only three arguments were coded in this category, statistical power was poor.

    In contrast, appealing to the participant's self-interest (e.g., that donating will make them feel good) appears to have backfired. Mentioning and addressing counterarguments to donation (e.g., responding to concerns that donations are ineffective or wasted) might also have backfired.

    Now, I don't think we should take these results wholly at face value. For example, only five of the ninety arguments appealed to evolutionary history, and all of those arguments included at least two other seemingly effective features: particular hardships, margins, or children. In multiple regression analyses and multi-level analyses that explore how the argument features cluster, it looks like particular hardships, children, and margins might be more robustly predictive -- more on that in a future post. ETA (Feb 19): Where a feature appears in fewer than 10 arguments (n < 10), effects are unlikely to be statistically robust.

    What if we combine argument features? There are various ways to do this, but the simplest is to give each argument one point for each of the ten largest-effect features it contains, then perform a linear regression of donation on that score (see the sketch below). The resulting model has an intercept of $3.09 and a slope of $.13. Thus, the model predicts that participants who read arguments with none of these features will donate $3.09, while participants who read a hypothetical argument containing all ten features will donate $4.39.
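    To illustrate the combined-features idea, here is a rough sketch of scoring each argument by how many of the ten largest-effect features it contains and regressing donation on that score. The data below are synthetic placeholders, not the actual study data.

```python
# Sketch of the combined-features analysis: give each argument one point per
# large-effect feature it contains, then regress donation on that 0-10 score.
# The data below are synthetic placeholders, not the actual study data.
import numpy as np

rng = np.random.default_rng(0)
feature_count = rng.integers(0, 11, size=90)  # synthetic 0-10 feature counts, one per argument
donation = 3.09 + 0.13 * feature_count + rng.normal(0, 0.5, size=90)  # synthetic mean donations

slope, intercept = np.polyfit(feature_count, donation, 1)  # simple linear regression
print(f"intercept = ${intercept:.2f}, slope = ${slope:.2f} per feature")

# With the fitted values reported in the post (intercept $3.09, slope $.13),
# the predicted donation for a hypothetical argument with all ten features:
print(f"predicted with all ten features: ${3.09 + 0.13 * 10:.2f}")  # $4.39
```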

    Further analysis also suggests that the effects of argument features are cumulative: Arguments with at least six of the effective features generated mean donations of $3.89 (vs. $3.37), those with at least seven generated mean donations of $4.46 (vs. $3.38), and the one argument with eight of the ten effective features generated a mean donation of $4.88 (vs. $3.40) (all p's < .001). This eight-feature argument was, in fact, the best-performing argument of the ninety. (However, caution is warranted concerning the estimated effect size for any particular argument: With only about 100 participants per argument and a standard deviation of about $3, the 95% confidence intervals for the effect size of individual arguments are about +/- $.50.)

    ------------------------------------------------------

    Last month, I articulated and defended the attractiveness of moral expansion through Mengzian extension. On my interpretation of the ancient Chinese philosopher Mengzi, expansion of one's moral perspective often (typically?) begins with noticing how you react to nearby cases -- whether physically nearby (a child in front of you, about to fall into a well) or relationally nearby (your close family members) -- and proceeds by noticing that remote cases (distant children, other people's parents) are similar in important respects.

    None of the twenty coded features captured exactly that. ("Particular near person" was close, but neither necessary nor sufficient: not necessary, because the coders used a stringent standard for when an argument invoked a particular near person, and not sufficient since invoking a particular near person is only the first step in Mengzian extension.) So I asked UCR graduate student Jordan Jackson, who studies Chinese philosophy and with whom I've discussed Mengzian extension, to read all 90 arguments and code them for whether they employed Mengzian extension style reasoning. He found six that did.

    In accord with my hypothesis about the effectiveness of Mengzian extension, the six Mengzian extension arguments outperformed the arguments that did not employ Mengzian extension:

    $3.85 vs $3.38 (diff = $.47, N = 612, n = 6, p < .001)

    Among those six arguments are both the original 2020 contest winner written by Lindauer and Singer and the best-performing argument in the present study -- though, as noted earlier, the best-performing argument also had many other seemingly effective features.

    In case you're curious, here's the full text of that argument, adapted by Alex Garinther from, and quoting extensively from, one of the stimuli in Lindauer et al. 2020:

    HEAR ME OUT ON SOMETHING. The explanation below is a bit long, but I promise reading the next few paragraphs will change you.

    As you know, there are many children who live in conditions of severe poverty. As a result, their health, mental development, and even their lives are at risk from lack of safe water, basic health care, and healthy food. These children suffer from malnutrition, unsanitary living conditions, and are susceptible to a variety of diseases. Fortunately, effective aid agencies (like the Against Malaria Foundation) know how to handle these problems; the issue is their resources are limited.

    HERE'S A PHILOSOPHICAL ARGUMENT: Almost all of us think that we should save the life of a child in front of us who is at risk of dying (for example, a child drowning in a shallow pond) if we are able to do so. Most people also agree that all lives are of equal moral worth. The lives of faraway children are no less morally significant than the lives of children close to us, but nearby children exert a more powerful emotional influence. Why?

    SCIENTISTS HAVE A PLAUSIBLE ANSWER: We evolved in small groups in which people helped their neighbors and were suspicious of outsiders, who were often hostile. Today we still have these “Us versus Them” biases, even when outsiders pose no threat to us and could benefit enormously from our help. Our biological history may predispose us to ignore the suffering of faraway people, but we don't have to act that way.

    By taking money that we would otherwise spend on needless luxuries and donating it to an effective aid agency, we can have a big impact. We can provide safe water, basic health care, and healthy food to children living in severe poverty, saving lives and relieving suffering.

    Shouldn't we, then, use at least some of our extra money to help children in severe poverty? By doing so, we can help these children to realize their potential for a full life. Great progress has been made in recent years in addressing the problem of global poverty, but the problem isn't being solved fast enough. Through charitable giving, you can contribute towards more rapid progress in overcoming severe poverty.

    Even a donation of $5 can save a life by providing one mosquito net to a child in a malaria-prone area. FIVE DOLLARS could buy us a large cappuccino, and that same amount of money could be used to save a life.

    Thursday, January 25, 2024

    Imagining Yourself in Another's Shoes vs. Extending Your Concern: Empirical and Ethical Differences

    [new paper in draft]

    The Golden Rule (do unto others as you would have others do unto you) isn't bad, exactly -- it can serve a valuable role -- but I think there's something more empirically and ethically attractive about the relatively underappreciated idea of "extension" found in the ancient Chinese philosopher Mengzi.

    The fundamental idea of extension, as I interpret it, is to notice the concern one naturally has for nearby others -- whether they are relationally near (like close family members) or spatially near (like Mengzi's child about to fall into a well or Peter Singer's child you see drowning in a shallow pond) -- and, attending to relevant similarities between those nearby cases and more distant cases, to extend your concern to the more distant cases.

    I see three primary advantages to extension over the Golden Rule (not that these constitute an exhaustive list of means of moral expansion!).

    (1.) Developmentally and cognitively, extension is less complex. The Golden Rule, properly implemented, involves imagining yourself in another's shoes, then considering what you would want if you were them. This involves a non-trivial amount of "theory of mind" and hypothetical reasoning. You must notice how others' beliefs, desires, and other mental states relevantly differ from yours, then you must imagine yourself hypothetically having those different mental states, and then you must assess what you would want in that hypothetical case. In some cases, there might not even be a fact of the matter about what you would want. (As an extreme example, imagine applying the Golden Rule to an award-winning show poodle. Is there a fact of the matter about what you would want if you were an award-winning show poodle?) Mengzian extension seems cognitively simpler: Notice that you are concerned about nearby person X and want W for them, notice that more distant person Y is relevantly similar, and come to want W for them also. This resembles ordinary generalization between relevant cases: This wine should be treated this way, therefore other similar wines should be treated similarly; such-and-such is a good way to treat this person, so such-and-such is probably also a good way to treat this other similar person.

    (2.) Empirically, extension is a more promising method for expanding one's moral concern. Plausibly, it's more of a motivational leap to go from concern about self to concern about distant others (Golden Rule) than to go from concern for nearby others to similar more distant others (Mengzian Extension). When aid agencies appeal for charitable donations, they don't typically ask people to imagine what they would want if they were living in poverty. Instead, they tend to show pictures of children, drawing upon our natural concern for children and inviting us to extend that concern to the target group. Also -- as I plan to discuss in more detail in a post next month -- in the "argument contest" Fiery Cushman and I ran back in 2020, the arguments most successful in inspiring charitable donation employed Mengzian extension techniques, while appeals to "others' shoes" style reasoning did not tend to predict higher levels of donation than did the average argument.

    (3.) Ethically, it's more attractive to ground concern for distant others in the extension of concern for nearby others than in hypothetical self-interest. Although there's something attractive about caring for others because you can imagine what you would want if you were them, there's also something a bit... self-centered? egoistic? ... about grounding other-concern in hypothetical self-concern. Rousseau writes: "love of men derived from love of self is the principle of human justice" (Emile, Bloom trans., p. 235). Mengzi or Confucius would never say this! In Mengzian extension, it is ethically admirable concern for nearby others that is the root of concern for more distant others. Appealingly, I think, the focus is on broadening one's admirable ethical impulses, rather than hypothetical self-interest.

    [ChatGPT4's rendering of Mengzi's example of a child about to fall into a well, with a concerned onlooker; I prefer Helen De Cruz's version]

    My new paper on this -- forthcoming in Daedalus -- is circulating today. As always, comments, objections, corrections, connections welcome, either as comments on this post, on social media, or by email.

    Abstract:

    According to the Golden Rule, you should do unto others as you would have others do unto you. Similarly, people are often exhorted to "imagine themselves in another's shoes." A related but contrasting approach to moral expansion traces back to the ancient Chinese philosopher Mengzi, who urges us to "extend" our concern for those nearby to more distant people. Other approaches to moral expansion involve: attending to the good consequences for oneself of caring for others, expanding one's sense of self, expanding one's sense of community, attending to others' morally relevant properties, and learning by doing. About all such approaches, we can ask three types of question: To what extent do people in fact (e.g., developmentally) broaden and deepen their care for others by these different methods? To what extent do these different methods differ in ethical merit? And how effectively do these different methods produce appropriate care?

    Tuesday, November 07, 2023

    The Prospects and Challenges of Measuring Morality, or: On the Possibility or Impossibility of a "Moralometer"

    Could we ever build a "moralometer" -- that is, an instrument that would accurately measure people's overall morality?  If so, what would it take?

    Psychologist Jessie Sun and I explore this question in our new paper in draft: "The Prospects and Challenges of Measuring Morality".

    Comments and suggestions on the draft warmly welcomed!

    Draft available here:

    https://osf.io/preprints/psyarxiv/nhvz9

    Abstract:

    The scientific study of morality requires measurement tools. But can we measure individual differences in something so seemingly subjective, elusive, and difficult to define? This paper will consider the prospects and challenges—both practical and ethical—of measuring how moral a person is. We outline the conceptual requirements for measuring general morality and argue that it would be difficult to operationalize morality in a way that satisfies these requirements. Even if we were able to surmount these conceptual challenges, self-report, informant report, behavioral, and biological measures each have methodological limitations that would substantially undermine their validity or feasibility. These challenges will make it more difficult to develop valid measures of general morality than other psychological traits. But, even if a general measure of morality is not feasible, it does not follow that moral psychological phenomena cannot or should not be measured at all. Instead, there is more promise in developing measures of specific operationalizations of morality (e.g., commonsense morality), specific manifestations of morality (e.g., specific virtues or behaviors), and other aspects of moral functioning that do not necessarily reflect moral goodness (e.g., moral self-perceptions). Still, it is important to be transparent and intellectually humble about what we can and cannot conclude based on various moral assessments—especially given the potential for misuse or misinterpretation of value-laden, contestable, and imperfect measures. Finally, we outline recommendations and future directions for psychological and philosophical inquiry into the development and use of morality measures.

    [Below: a "moral-o-meter" given to me for my birthday a few years ago, by my then-13-year-old daughter]

    Friday, September 15, 2023

    Walking the Walk: Frankness and Social Proof

    My last two posts have concerned the extent to which ethicists should "walk the walk" -- that is, live according to, or at least attempt to live according to, the ethical principles they espouse in their writing and teaching. According to "Schelerian separation", what ethicists say or write can and should be evaluated independently of facts about the ethicist's personal life. While there are some good reasons to favor Schelerian separation, I argued last week that ethical slogans ("act on that maxim you can at the same time will to be a universal law", "maximize utility") will tend to lack specific, determinate content without a context of clarifying examples. One's own life can be a rich source of content-determining examples, while armchair reflection on examples tends to be impoverished.

    Today, I'll discuss two more advantages of walking the walk.

    [a Dall-E render of "walking the walk"]

    Frankness and Belief

    Consider scientific research. Scientists don't always believe their own conclusions. They might regard their conclusions as tentative, the best working model, or just a view with enough merit to be worth exploring. But if they have doubt, they ought to be unsurprised if their readers also have doubt. Conversely, if a reader learns that a scientist has substantial doubts about their own conclusions, it's reasonable for the reader to wonder why, to expect that the scientist is probably responding to limitations in their own methods and gaps in their own reasoning that might be invisible to non-experts.

    Imagine reading a scientific article, finding the conclusion wholly convincing, and then learning that the scientist who wrote the article thinks the conclusion is probably not correct. Absent some unusual explanation, you’ll probably want to temper your belief. You’ll want to know why the scientist is hesitating, what weaknesses and potential objections they might be seeing that you have missed. It’s possible that the scientist is simply irrationally unconvinced by their own compelling reasoning; but that’s presumably not the normal case. Arguably, readers of scientific articles are owed, and reasonably expect, scientific frankness. Scientists who are not fully convinced by their results should explain the limitations that cause them to hesitate. (See also Wesley Buckwalter on the "belief norm of academic publishing".)

    Something similar is true in ethics. If Max Scheler paints a picture of a beautiful, ethical, religious way of life which he personally scorns, it's reasonable for the reader to wonder why he scorns it, what flaws he sees that you might not notice in your first read-through. If he hasn't actually tried to live that way, why not? If he has tried, but failed, why did he fail? If a professional ethicist argues that ethically, and all things considered, one should be a vegetarian, but isn't themselves a vegetarian and has no special medical or other excuse, it's reasonable for readers and students to wonder why not and to withhold belief until that question is resolved. People are not usually baldly irrational. It's reasonable to suppose that there's some thinking behind their choice, which they have not yet revealed to readers and students, which tempers or undercuts their reasoning.

    As Nomy Arpaly has emphasized in some of her work, our gut inclinations are sometimes wiser than our intellectual affirmations. The student who says to herself that she should be in graduate school, that academics is the career for her, but who procrastinates, self-sabotages, and hates her work – maybe the part of her that is resisting the career is the wiser part. When Huck Finn tells himself that the right thing to do is to turn in his friend, the runaway slave Jim, but can't bring himself to do it – again, his inclinations might be wiser than his explicit reasoning.

    If an ethicist's intellectual arguments aren't penetrating through to their behavior, maybe there's a good reason. If you can't, or don't, live what you intellectually endorse, it could be because your intellectual reasoning is leaving something important out that the less intellectual parts of you rightly refuse to abandon. Frankness with readers enables them to consider this possibility. Conversely, if we see someone who reasons to a certain ethical conclusion, and their reasoning seems solid, and then they consistently live that way without tearing themselves apart with ambivalence, we have less reason to suspect that their gut might be wisely fighting against flaws in their academic reasoning than we do when we see someone who doesn't walk the walk.

    What is it to believe that eating meat is morally wrong (or any other ethical proposition)? I favor a dispositionalist approach (e.g., here, here, here). It is in part to be disposed to say and intellectually judge that eating meat is morally wrong. But more than that, it is to give weight to the avoidance of meat in your ethical decision-making. It is to be disposed to feel you have done something wrong if you eat meat for insufficient reason, maybe feeling guilt or shame. It is to feel moral approval and disapproval of others' meat-avoiding or meat-eating choices. If an ethicist intellectually affirms the soundness of arguments for vegetarianism but lacks the rest of this dispositional structure, then (on the dispositionalist view I favor) they don't fully or determinately believe that eating meat is ethically wrong. Their intellectually endorsed positions don't accurately reflect their actual beliefs and values. This completes the analogy with the scientist who doesn't believe their own conclusions.

    Social Proof

    Somewhat differently, an ethicist's own life can serve as a kind of social proof. Look: This set of norms is livable – maybe appealingly so, with integrity. Things don't fall apart. There's an implementable vision, which other people could also follow. Figures like Confucius, Buddha, and Jesus were inspiring in part because they showed what their slogans amounted to in practice, in part because they showed that real people could live in something like the way they themselves lived, and in part because they also showed how practically embodying the ethics they espoused could be attractive and fulfilling, at least to certain groups of people.

    Ethical Reasons to Walk the Walk?

    I haven't yet discussed ethical reasons for walking the walk. So far, the focus has been epistemology, philosophy of language, and philosophy of mind. However, arguing in favor of certain ethical norms appears to involve recommending that others adhere to those norms, or at least be partly motivated by those norms. Making such a recommendation while personally eschewing those same norms plausibly constitutes a failure of fairness, equity, or universalization – the same sort of thing that rightly annoys children when their parents or teachers say "do as I say, not as I do". More on this, I hope, another day.

    Friday, September 01, 2023

    Does It Matter If Ethicists Walk the Walk?

    The Question: What's Wrong with Scheler?

    There's a story about Max Scheler, the famous early 20th century Catholic German ethicist. Scheler was known for his inspiring moral and religious reflections. He was also known for his horrible personal behavior, including multiple predatory sexual affairs with students, sufficiently serious that he was banned from teaching in Germany. When a distressed admirer asked about the apparent discrepancy, Scheler was reportedly untroubled, replying, "The sign that points to Boston doesn't have to go there."

    [image modified from here and here]

    That seems like a disappointing answer! Of course it's disappointing when anyone behaves badly. But it seems especially bad when an ethical thinker goes astray. If a great chemist turns out to be a greedy embezzler, that doesn't appear to reflect much on the value of their chemical research. But when a great ethicist turns out to be a greedy embezzler, something deeper seems to have gone wrong. Or so you might think -- and so I do actually think -- though today I'm going to consider the opposite view. I'll consider reasons to favor what I'll call Schelerian separation between an ethicist's teaching or writing and their personal behavior.

    Hypocrisy and the Cheeseburger Ethicist

    A natural first thought is hypocrisy. Scheler was, perhaps, a hypocrite, surreptitiously violating moral standards that he publicly espoused -- posing through his writings as a person of great moral concern and integrity, while revealing through his actions that he was no such thing. To see that this isn't the core issue, consider the following case:

    Cheeseburger Ethicist. Diane is a philosophy professor specializing in ethics. She regularly teaches Peter Singer's arguments for vegetarianism to her lower-division students. In class, she asserts that Singer's arguments are sound and that vegetarianism is morally required. She openly emphasizes, however, that she herself is not personally a vegetarian. Although in her judgment, vegetarianism is morally required, she chooses to eat meat. She affirms in no uncertain terms that vegetarianism is not ethically optional, then announces that after class she'll go to the campus cafeteria for a delicious cheeseburger.

    Diane isn't a hypocrite, at least not straightforwardly so. We might imagine a version of Scheler, too, who was entirely open about his failure to abide by his own teachings, so that no reader would be misled.

    Non-Overridingness Is Only Part of the Issue

    There's a well-known debate about whether ethical norms are "overriding". If an action is ethically required, does that imply that it is required full stop, all things considered? Or can we sometimes reasonably say, "although ethics requires X, all things considered it's better not to do X"? We might imagine Diane concluding her lesson "-- and thus ethics requires that we stop eating meat. So much the worse for ethics! Let's all go enjoy some cheeseburgers!" We might imagine Scheler adding a preface: "if you want to be ethical and full of good religious spirit, this book gives you some excellent advice; but for myself, I'd rather laugh with the sinners."

    Those are interesting cases to consider, but they're not my target cases. We can also imagine Diane and Scheler saying, apparently sincerely, that all things considered, you and I should follow their ethical recommendations. We can imagine them holding, or seeming to hold, at least intellectually, that such-and-such really is the best thing to do overall, and yet simply not doing it themselves.

    The Aim of Academic Ethics and Some Considerations Favoring Schelerian Separation

    Scheler and Diane might defend themselves plausibly as follows: The job of an ethics professor is to evaluate ethical views and ethical arguments, producing research articles and educating students in the ideas of the discipline. In this respect, ethics is no different from other academic disciplines. Chemists, Shakespeare scholars, metaphysicians -- what we expect is that they master an area of intellectual inquiry, teach it, contribute to it. We don't demand that they also live a certain way. Ethicists are supposed to be scholars, not saints.

    Thus, ethicists succeed without qualification if they find sound arguments for interesting ethical conclusions, which they teach to their students and publish as research, engaging capably in this intellectual endeavor. How they live their lives matters to their conclusions as little as it matters how research chemists live their lives. We should judge Scheler's ethical writings by their merit as writings. His life needn't come into it. He can point the way to Boston while hightailing it to Philadelphia.

    On the other hand, Aristotle famously suggested that the aim of studying ethics "is not, as... in other inquiries, the attainment of theoretical knowledge" but "to become good" (4th c. BCE/1962, 1103b, p. 35). Many philosophers have agreed with Aristotle, for example, the ancient Stoics and Confucians (Hadot 1995; Ivanhoe 2000). We study ethics -- at least some of us do -- at least in part because we want to become better people.

    Does this seem quaint and naive in a modern university context? Maybe. People can approach academic ethics with different aims. Some might be drawn primarily by the intellectual challenge. Others might mainly be interested in uncovering principles with which they can critique others.

    Those who favor a primarily intellectualistic approach to ethics might even justifiably mistrust their academic ethical thinking -- sufficiently so that they intentionally quarantine it from everyday life. If common sense and tradition are a more reasonable guide to life than academic ethics, good policy might require not letting your perhaps weird and radical ethical conclusions change how you treat the people around you. Radical utilitarian consequentialist in the classroom, conventional friend and husband at home. Nihilistic anti-natalist in the journals, loving mother of three at home. Thank goodness.

    If there's no expectation that ethicists live according to the norms they espouse, that also frees them to explore radical ideas which might be true but which might require great sacrifice or be hard to live by. If I accept Schelerian separation, I can conclude that property is theft or that it's unethical to enjoy any luxuries without thereby feeling that I have any special obligation to sacrifice my minivan or my children's college education fund. If my children's college fund really were at stake, I would be highly motivated to avoid the conclusion that I am ethically required to sacrifice it. That fact would likely bias my reasoning. If ethics is treated more like an intellectual game, divorced from my practical life, then I can follow the moves where they take me without worrying that I'll need to sacrifice anything at the end. A policy of Schelerian separation might then generate better academic discourse in which researchers are unafraid to follow their thinking to whatever radical conclusions it leads them.

    Undergraduates are often curious whether Peter Singer personally lives as a vegan and personally donates almost all of his presumably large salary to charitable causes, as his ethical views require. But Singer's academic critics focus on his arguments, not his personal life. It would perhaps be a little strange if Singer were a double-bacon-cheeseburger-eating Maserati driver draped in gold and diamond bling; but from a purely argumentative perspective such personal habits seem irrelevant. The Singer Principle stands or falls on its own merits, regardless of how well or poorly Peter Singer himself embodies it.

    So there's a case to be made for Schelerian separation -- the view that academic ethics and personal life are and should be entirely distinct matters, and in particular that if an ethicist does not live according to the norms they espouse in their academic work, that is irrelevant to the assessment of their work. I feel the pull of this idea. There's substantial truth in it, I suspect. However, in a future post I'll discuss why I think this is too simple. (Meanwhile, reader comments -- whether on this post, by email, or on linked social media -- are certainly welcome!)

    -------------------------------------------

    Follow-up post:

    "One Reason to Walk the Walk: To Give Specific Content to Your Assertions" (Sep 8, 2023)

    Friday, June 23, 2023

    Dishonesty among Honesty Researchers

    Until recently, one of the most influential articles on the empirical psychology of honesty was Shu, Mazar, Gino, Ariely, and Bazerman 2012, which purported to show, across three studies, that people who sign honesty pledges at the beginning of a process in which dishonesty is possible will be much more honest than those who sign at the end of the process. The result is intuitive (if you sign before doing a task, that might change your behavior in a way that signing after wouldn't), and it suggests straightforward, practical interventions: Have students, customers, employees, etc., sign honesty pledges before occasions in which they might be tempted to cheat.

    Unfortunately, there appear to have been not just one but two separate instances of fraud in this article, and the results appear not to replicate in a (presumably) honest, preregistered replication attempt.

    The first fraud was revealed in 2021, and concerned customers' honest or dishonest reporting of mileage to an insurance company. The data appear to have been largely fabricated, either by Ariely or by whoever supplied him the data; none of the other collaborators are implicated.

    The second fraud was revealed last week, and appears to be entirely separate, involving multiple papers by Gino, including Study 1 in the already-retracted Shu et al. 2012. In Shu et al., Study 1, participants could receive financial advantage by overreporting travel expenses or how many math puzzles they had solved earlier in the experiment, and purportedly there was less overreporting if participants signed an honesty pledge first. Several participants' results appear to have been strategically shifted from one condition to another to produce the reported effect. In light of an apparent pattern of fraud across several papers, Harvard has put Gino on administrative leave.

    Yes, two apparently unrelated instances of fraud, by different researchers, on the very same famous article about honesty.

    [some of the evidence of fraud, from https://datacolada.org/109]

    For those who follow such news, Gino's case might bring to mind another notorious case of fraud by a Harvard psychologist: In 2010, Marc Hauser was found to have faked and altered data in his work on rule-learning in monkeys (e.g., here) and subsequently resigned his academic post.

    I have three observations:

    First, Gino, Ariely, and Hauser are (or were) three of the most prominent moral psychologists in the world. Although Hauser's discovered fraud concerned monkey rule-learning, he was probably as well known for his work on moral cognition, which culminated in his 2006 book Moral Minds. This is a high rate of discovered fraud among leading moral psychology researchers, especially if we assume that most fraud goes undiscovered. I am struck by the parallel to my series of papers on the moral behavior of ethicists (overview here). Ethicists, and apparently also moral psychologists, appear to behave no better on average than socially similar people who don't study morality.

    One might think that ethics and moral psychology would either (a.) tend to draw people particularly interested in advancing ethical ends (for example, particularly attuned to the importance of honesty) and thus presumably personally more ethical than average or (b.) at least make ethics personally more salient for them and thus presumably motivationally stronger. Either neither (a) nor (b) is true, or studying ethics and moral psychology also has some countervailing negative effect.

    Second, around the time of the discovery of his fraud, Hauser was working on a book titled Evilicious, concerning humans' widespread appetite for behaving immorally, and similarly Gino recently published a book titled Rebel Talent: Why It Pays to Break the Rules at Work and in Life. The titles perhaps speak for themselves: Part of studying moral psychology is studying bad moral psychology. (The "rebels" Gino celebrates might not be breaking moral rules -- celebrating that might impair sales and lucrative speaking gigs -- but the idea does appear to generalize.)

    Third, the first and second observations suggest a mechanism by which the study of ethics and moral psychology can negatively affect the ethics of the researcher. If people mostly aim -- as I think they do -- toward moral mediocrity, that is, not to be good or bad by absolute standards but rather to be about as morally good as their peers, then if your opinion about what is common changes, your own behavior will tend to change to match. The more you study the worst side of humanity, the more you can think to yourself "well, even if X is bad, it's not as bad as all that, and people do X all the time". If you study dishonesty, you might be struck by the thought that dishonesty is everywhere -- and then if you are tempted to be dishonest you might think, "well, everyone else is doing it". I can easily imagine someone in Gino's position thinking, probably most researchers have from time to time shifted around a few rows of data to make their results pop out better. Is it really so bad if I do it too? And then once the deed has been done, it probably becomes easier, for multiple reasons, to assume that such fraud is widespread and just part of the usual academic game (grounds for thinking this might include rationalization, positive self-illusion, and using oneself as a model for thinking about others).

    I do still think and hope that fraud is very much the exception. In interacting with dozens of researchers over the years and working through a variety of raw datasets, I've seen some shaky methodology and maybe a few instances of unintentional p-hacking; but I have never witnessed, suspected, seen signs of, or heard any suggestion of outright fraud or data alteration.

    Friday, February 24, 2023

    Moral Mediocrity, Apologizing for Vegetarianism, and Do-Gooder Derogation

    Though I'm not a vegetarian, one of my research interests is the moral psychology of vegetarianism. Last weekend, when I was in Princeton giving a talk on robot rights, a vegetarian apologized to me for being vegetarian.

    As a meat-eater, I find it's not unusual for vegetarians to apologize to me. Maybe this wouldn't be so notable if their vegetarianism inconvenienced me in any way, but often it does not. In Princeton, we were both in line for a catering spread that had both meat and vegetarian options. I was in no obvious way wronged, harmed, or inconvenienced. So what is going on?

    Here's my theory.

    Generally speaking, I believe that people aim to be morally mediocre. That is, rather than aiming to be morally good (or not morally bad) by absolute standards, most people aim to be about as morally good as their peers -- not especially better, not especially worse. People might not conceptualize themselves as aiming for mediocrity. Often, they concoct post-hoc rationalizations to justify their choices. But their choices implicitly reveal their moral target. Systematically, people avoid being among the worst of their peers while refusing to pay the costs of being among the best. For example, they don't want to be the one jerk who messes up a clean environment; but they also don't want to be the one sucker who puts in the effort to keep things clean if others aren't also doing so. (See my notes on the game of jerk and sucker.)

    Now if people do in fact aim to be about as morally good as their peers, we can expect that under certain conditions they don't want their peers to improve their moral behavior. Under what conditions? Under the conditions that your peers' self-improvement benefits you less than the raising of the moral bar costs you.

    Let's say that your friends all become nicer to each other. This isn't so bad. You benefit from being in a circle of nice people. Needing to become a bit nicer yourself might be a reasonable cost to pay for that benefit. 

    But if your friends start becoming vegetarians, you accrue the moral costs without the benefits. The moral bar is raised for you, implicitly, at least a little bit; but the benefits go to non-human animals, if they go anywhere. You now either have to think a bit worse of yourself relative to your peers or you have to start changing your behavior. How annoying! No wonder vegetarians are moved to apologize. (To be clear, I'm not saying we should be annoyed by this, just that my theory predicts that we will be annoyed.)

    Note that this explanation works especially well for those of us who think it is morally better to avoid eating meat than for those of us who see no moral difference between eating meat and eating vegetarian. If you really see no moral difference (deep down, and not just because of superficial, post-hoc rationalization), then you'll see the morally motivated vegetarian as simply morally confused. If they apologize, it would be like someone apologizing to you for acting according to some other mistaken moral principle, such as apologizing for abstinence before marriage. No one needs to apologize to you for that, unless they are harming or inconveniencing you in some way -- for example, because they are dating you and think you'll be disappointed. (Alternatively, they might apologize for the more abstract wrong of seeing you as morally deficient because you follow different principles; but that type of apology looks and feels a little different, I think.)

    If this moral mediocrity explanation of vegetarian apology works, it ought to generalize to other cases where friends follow higher moral standards that don't benefit you. Some possible examples: In a circle of high school students who habitually cheat on tests, a friend might apologize for being unwilling to cheat. In a group of people who feel somewhat guilty about taking a short cut through manicured grass, one might decide they want to take the long way, apologizing to the group for the extra time, feeling more guilt than would accompany an ethically neutral reason for delay. On this model, the felt need for the apology would vary with a few predictable parameters: greater need the closer one is to being a peer whose behavior might be compared, greater need the more vivid and compelling the comparison (for example if you are side by side), lesser need the more the moral principle can be seen as idiosyncratic and inapplicable to the other (and thus some apologies of this sort suggest that the principle is idiosyncratic).

    Do-gooder derogation is the tendency for people to think badly of people who follow more demanding moral standards. The moral mediocrity hypothesis is one possible explanation for this tendency, predicting among other things that derogation will be greater when the do-gooder is a peer and, perhaps unintuitively, that the derogation will be greater when the moral standard is compelling enough to the derogator that they already feel a little bit bad about not adhering to it.

    -------------------------------------------

    Related:

    The Collusion Toward Moral Mediocrity (Sep 1, 2022)

    Aiming for Moral Mediocrity (Res Philosophica, 2019)

    Image: Dall-E 2 "oil painting of a woman apologizing to an eggplant"

    Thursday, January 12, 2023

    Further Methodological Troubles for the Moralometer

    [This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

    If we want to study morality scientifically, we should want to measure it. Imagine trying to study temperature without a thermometer or weight without scales. Of course indirect measures are possible: We can't put a black hole on a scale, but we can measure how it bends the light that passes nearby and thereby infer its mass.

    Last month, I raised a challenge for the possibility of developing a "moralometer" (a device that accurately measures a person's overall morality). The challenge was this: Any moralometer would need to draw on one or more of four methods: self-report, informant report, behavioral measures, or physiological measures. Each of these methods has serious shortcomings as a basis for measuring a person's overall moral character.

    This month, I raise a different (but partly overlapping) set of challenges, concerning how well we can specify the target we're aiming to measure.

    Problems with Flexible Measures

    Let's call a measure of overall morality flexible if it invites a respondent to apply their own conception of morality, in a flexible way. The respondent might be the target themselves (in self-report measures of morality) or they might be a peer, colleague, acquaintance, or family member of the target (in informant-report measures of morality). The most flexible measures apply "thin" moral concepts in Bernard Williams' sense -- prompts like "Overall, I am a morally good person" [responding on an agree/disagree scale] or "[the target person] behaves ethically".

    While flexible measures avoid excessive rigidity and avoid importing researchers' limited and possibly flawed understandings of morality into the rating procedure, the downsides are obvious if we consider how people with noxious worldviews might rate themselves and others. The notorious Nazi Adolf Eichmann, for example, appeared to have thought highly of his own moral character. Alexander "the Great" was admired for millennia, including as a moral exemplar of personal bravery and a spreader of civilization, despite his main contribution being conquest through aggressive warfare, including the mass slaughter and enslavement of at least one civilian population.

    I see four complications:

    Relativism and Particularism. Metaethical moral relativists hold that different moral standards apply to different people or in different cultures. While I would reject extreme relativist views according to which genocide, for example, doesn't warrant universal condemnation, a moderate version of relativism has merit. Cultures might reasonably differ, for example, on the age of sexual consent, and cultures, subcultures, and social groups might reasonably differ in standards of generosity in sharing resources with neighbors and kin. If so, then flexible moralometers, used by raters who apply locally appropriate standards, will have an advantage over inflexible moralometers, which might inappropriately import researchers' different standards. However, even flexible moralometers will fail in the face of relativism if they are used by raters who apply the wrong moral standards.

    According to moral particularism, morality isn't about applying consistent rules or following any specifiable code of behavior. Rather, what's morally good or bad, right or wrong, frequently depends on particular features of specific situations which cannot be fully codified in advance. While this isn't the same as relativism, it presents a similar methodological challenge: The farther the researcher or rater stands from the particular situation of the target, the more likely they are to apply inappropriate standards, since they are likely to be ignorant of relevant details. It seems reasonable to accept at least moderate particularism: The moral quality of telling a lie, stealing $20, or stopping to help a stranger, might often depend on fine details difficult to know from outside the situation.

    If the most extreme forms of moral relativism or particularism (or moral skepticism) are true, then no moralometer could possibly work, since there won't be stable truths about people's morality, or the truths will be so complicated or situation-dependent as to defy any practical attempt at measurement. Moderate relativism and particularism, if correct, provide reason to favor flexible standards as judged by self-ratings or the ratings of highly knowledgeable peers sensitive to relevant local details; but even in such cases, all of the relevant adjustments might not be made.

    Incommensurability. Goods are incommensurable if there is no fact of the matter about how they should be weighed against each other. Twenty-dollar bills and ten-dollar bills are commensurable: Two of the latter are worth exactly one of the former. But it's not clear how to weigh, for example, health against money or family against career. In ethics, if Steven tells a lie in the morning and performs a kindness in the afternoon, how exactly ought these to be weighed against each other? If Tara is stingy but fair, is her overall moral character better, worse, or the same as that of Nicholle, who is generous but plays favorites? Combining different features of morality into a single overall score invites commensurability problems. Plausibly, there's no single determinately best weighting of the different factors.

    Again, I favor a moderate view. Probably in many cases there is no single best weighting. However, approximate judgments remain possible. Even if health and money can't be precisely weighed against each other, extreme cases permit straightforward decisions. Most of us would gladly accept a scratch on a finger for the sake of a million dollars and would gladly pay $10 to avoid stage IV cancer.  Similarly, Stalin was morally worse than Martin Luther King, even if Stalin had some virtues and King some vices. Severe sexual harassment of an employee is worse than fibbing to your spouse to get out of washing the dishes.

    Moderate incommensurability limits the precision of any possible moralometer. Vices and virtues, and rights and wrongs of different types will be amenable only to rough comparison, not precise determination in a single common coin.

    Moral error. If we let raters reach independent judgments about what is morally good or bad, right or wrong, they might simply get it wrong. As mentioned above, Eichmann appears to have thought well of himself, and the evidence suggests that he also regarded other Nazi leaders as morally excellent. Raters will disagree about the importance of purity norms (such as norms against sexual promiscuity), the badness of abortion, and the moral importance, or not, of being vegetarian. Bracketing relativism, at least some of these raters must be factually mistaken about morality, on one side or another, adding substantial error to their ratings.

    The error issue is enormously magnified if ordinary people's moral judgments are systematically mistaken. For example, if the philosophically discoverable moral truth is that the potential impact of your choices on future generations morally far outweighs the impact you have on the people around you (see my critiques of "longtermism" here and here), then the person who is an insufferable jerk to everyone around them but donates $5000 to an effective charity might be in fact far morally better than a personally kind and helpful person who donates nothing to charity -- but informants' ratings might very well suggest the reverse. Similar remarks would apply to any moral theory that is sharply at odds with commonsense moral intuition.

    Evaluative bias. People are, of course, typically biased in their own favor. Most people (not all!) are reluctant to think of themselves as morally below average, as unkind, unfair, or callous, even if they in fact are. Social desirability bias is the well-known phenomenon that survey respondents tend to answer questions in a manner that presents them in a good light. Ratings of friends, family, and peers will also tend to be positively biased: People tend to view their friends and peers positively, and even when they don't, they might be reluctant to "tell on" them to researchers. If the size of evaluative bias were consistent, it could be corrected for, but presumably it varies considerably from case to case, introducing further noise.

    Problems with Inflexible Measures

    Given all these problems with flexible measures of morality, it might seem best to build our hypothetical moralometer instead around inflexible measures. Assuming physiological measures are unavailable, the most straightforward way to do this would be to employ researcher-chosen behavioral measures. We could try to measure someone's honesty by seeing whether they will cheat on a puzzle to earn more money in a laboratory setting. We could examine publicly available criminal records. We could see whether they are willing to donate a surprise bonus payment to a charity.

    Unfortunately, inflexible measures don't fully escape the troubles that dog flexible measures, and they bring new troubles of their own.

    Relativism and particularism. Inflexible measures probably aggravate the problems with relativism and particularism discussed above. With self-report and informant report, there's at least an opportunity for the self or the informant to take into account local standards and the particulars of the situation. In contrast, inflexible measures will ordinarily be applied equally to all, without adjustment for context. Suppose the measure is something like "gives a surprise bonus of $10 to charity". This might be a morally very different decision for a wealthy participant than for a needy participant. It might be a morally very different decision for a participant who would save that $10 to donate to a different and maybe better charity than for a participant who would simply pocket the $10. But unless those other factors are measured, as they normally would not be, they cannot be taken into account.

    Incommensurability. Inflexible measures also won't avoid incommensurability problems. Suppose our moralometer includes one measure of honesty, one measure of generosity, and one measure of fairness. The default approach might be for a summary measure simply to average these three, but that might not accurately reflect morality: Maybe a small act of dishonesty in an experimental setting is far less morally important than a small act of unfairness in that same experimental setting. For example, getting an extra $1 from a researcher by lying in a task that transparently appears to demand a lie (and might even be best construed as a game in which telling untruths is just part of the task, in fact pleasing the researcher) might be approximately morally neutral while being unfair to a fellow participant in that same study might substantially hurt the other's feelings.
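    To make the weighting worry concrete, here is a minimal sketch in Python (not from the original post; the sub-scores and weights are invented purely for illustration) showing how two targets with identical unweighted averages can come apart under a different, arguably equally defensible, weighting:

    def summary_score(scores, weights):
        # Weighted average of sub-scores, each on a 0-to-1 "more moral is higher" scale.
        total_weight = sum(weights[k] for k in scores)
        return sum(scores[k] * weights[k] for k in scores) / total_weight

    # Hypothetical Target A: slightly dishonest in the lab task, but generous and fair.
    # Hypothetical Target B: honest in the lab task, but unfair to a fellow participant.
    a = {"honesty": 0.4, "generosity": 0.8, "fairness": 0.9}
    b = {"honesty": 0.9, "generosity": 0.8, "fairness": 0.4}

    equal_weights = {"honesty": 1, "generosity": 1, "fairness": 1}
    fairness_heavy = {"honesty": 0.2, "generosity": 1, "fairness": 2}  # treats lab dishonesty as near-trivial

    print(summary_score(a, equal_weights), summary_score(b, equal_weights))    # about 0.70 vs 0.70: a tie
    print(summary_score(a, fairness_heavy), summary_score(b, fairness_heavy))  # about 0.84 vs 0.56: A ranks higher

    Neither weighting is obviously the correct one, which is just the incommensurability worry restated in numerical form.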

    Sampling and ecological validity. As mentioned in my previous post on moralometers, fixed behavioral measures are also likely to have severe methodological problems concerning sampling and ecological validity. Any realistic behavioral measure is likely to capture only a small and perhaps unrepresentative part of anyone's behavior, and if it's conducted in a laboratory or experimental setting, behavior in that setting might not correlate well with behavior with real stakes in the real world. How much can we really infer about a person's overall moral character from the fact that they give their monetary bonus to charity or lie about a die roll in the lab?

    Moral authority. By preferring a fixed measure, the experimenter or the designer of the moralometer takes upon themselves a certain kind of moral authority -- the authority to judge what is right and wrong, moral or immoral, in others' behavior. In some cases, as in the Eichmann case, this authority seems clearly preferable to deferring to the judgment of the target and their friends. But in other cases, it is a source of error -- since of course the experimenter or designer might be wrong about what is in fact morally good or bad.

    Being wrong while taking up, at least implicitly, this mantle of moral authority has at least two features that potentially make it worse than the type of error that arises by wrongly deferring to mistaken raters. First, the error is guaranteed to be systematic. The same wrong standards will be applied to every case, rather than scattered in different (and perhaps partly canceling) directions as might be the case with rater error. And second, it risks a lack of respect: Others might reasonably object to being classified as "moral" or "immoral" by an alien set of standards devised by researchers and with which they disagree.

    In Sum

    The methodological problems with any potential moralometer are extremely daunting. As discussed in December, all moralometers must rely on some combination of self-report, informant report, behavioral measures, or physiological measures, and each of these methods has serious problems. Furthermore, as discussed today, issues concerning relativism, particularism, disagreement, incommensurability, error, and moral authority dog both flexible measures of morality (which rely on raters' judgments about what's good and bad) and inflexible measures (which rely on researchers' or designers' judgments).

    Coming up... should we even want a moralometer if we could have one?  I discussed the desirability or undesirability of a perfect moralometer in December, but I want to think more carefully about the moral consequences of the more realistic case of an imperfect moralometer.

    Thursday, December 22, 2022

    The Moral Measurement Problem: Four Flawed Methods

    [This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

    So you want to build a moralometer -- that is, a device that measures someone's true moral character? Yes, yes. Such a device would be so practically and scientifically useful! (Maybe somewhat dystopian, though? Careful where you point that thing!)

    You could try to build a moralometer by one of four methods: self-report, informant report, behavioral measurement, or physiological measurement. Each presents daunting methodological challenges.

    Self-report moralometers

    To find out how moral a person is, we could simply ask them. For example, Aquino and Reed 2002 ask people how important it is to them to have various moral characteristics, such as being compassionate and fair. More directly, Furr and colleagues 2022 have people rate the extent to which they agree with statements such as "I would say that I am a good person" and "I tend to act morally".

    Could this be the basis of a moralometer? That depends on the extent to which people are able and willing to report on their overall morality.

    People might be unable to accurately report their overall morality.

    Vazire 2010 has argued that self-knowledge of psychological traits tends to be poor when the traits are highly evaluative and not straightforwardly observable (e.g., "intelligent", "creative"), since under those conditions people are (typically) motivated to see themselves favorably and -- due to low observability -- not straightforwardly confronted with the unpleasant news they would prefer to deny.

    One's overall moral character is evaluatively loaded if anything is. Nor is it straightforwardly observable: Unlike height or talkativeness, it leaves room for interpretation, so someone motivated not to see themselves as, say, unfair or a jerk can readily find ways to explain away the evidence (e.g., "she deserved it", "I'm in such a hurry").

    Furthermore, it sometimes requires a certain amount of moral insight to distinguish morally good from morally bad behavior. Part of being a sexist creep is typically not seeing anything wrong with the kinds of things that sexist creeps typically do. Conversely, people who are highly attuned to how they are treating others might tend to beat themselves up over relatively small violations. We might thus expect a moral Dunning-Kruger effect: People with bad moral character might disproportionately overestimate their moral character, so that people's self-opinions tend to be undiagnostic of the actual underlying trait.

    Even to the extent that people are able to report their overall morality, they might be unwilling to do so.

    It's reasonable to expect that self-reports of moral character would be distorted by socially desirable responding, the tendency for questionnaire respondents to answer in a manner that they believe will reflect well on them. To say that you are extremely immoral seems socially undesirable. We would expect that people (e.g., Sam Bankman-Fried) would tend to want to portray themselves as morally above average. On the flip side, to describe oneself as "extremely moral" (say, 100 on a 0-100 scale from perfect immorality to perfect morality) might come across as immodest. So even people who believe themselves to be tip-top near-saints might not frankly express their high self-opinions when directly asked.

    Reputational moralometers

    Instead of asking people to report on their own morality, could we ask other people who know them? That is, could we ask their friends, family, neighbors, and co-workers? Presumably, the report would be less distorted by self-serving or ego-protective bias. There's less at stake when judging someone else's morality than when judging your own. Also, we could aggregate across multiple informants, combining several different people's ratings, possibly canceling out some sources of noise and bias.

    Unfortunately, reputational moralometers -- while perhaps somewhat better than self-report moralometers -- also present substantial methodological challenges.

    The informant advantage of decreased bias could be offset by a corresponding increase in ignorance.

    Informants don't observe all of the behavior of the people whose morality they are judging, and they have less access to the thoughts, feelings, and motivations that are relevant to the moral assessment of behavior. Informant reports are thus likely to be based only on a fraction of the evidence that self-report would be based on. Moreover, people tend to hide their immoral behaviors, and presumably some people are better at doing so than others. Also, people play different roles in our lives, and romantic partners, coworkers, friends, and teachers will typically only see us in limited, and perhaps unrepresentative, contexts. A good moralometer would require the correct balancing of a range of informants with complementary patches of ignorance, which is likely to be infeasible.

    Informants are also likely to be biased.

    Informant reports may be contaminated not by self-serving bias but by "pal-serving bias" (Leising et al 2010). If we rely on people to nominate their own informants, they are likely to nominate people who have a positive perception of them. Furthermore, the informants might be reluctant to "tell on" or negatively evaluate their friends, especially in contexts (like personnel selection) where the rating could have real consequences for the target. The ideal informant would be someone who knows the target well but isn't positively biased toward them. In reality, however, there's likely a tradeoff between knowledge and bias, so that the raters most likely to be impartial are not the people who know the target best.

    Positivity bias could in principle be corrected for if every informant were equally biased, but it's likely that some targets will have informants who are more biased than others.
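    A toy simulation (my own illustration, with invented numbers) makes the point: subtracting a constant offset rescues the ranking when every informant inflates by the same amount, but not when the amount of flattery varies from target to target.

    import random

    random.seed(0)
    true_morality = [random.gauss(0, 1) for _ in range(200)]  # latent trait, arbitrary units

    def informant_rating(true_value, bias):
        return true_value + bias + random.gauss(0, 0.5)  # rating = truth + positivity bias + noise

    # Case 1: every target's informants inflate by the same amount -- a constant correction would fix this.
    uniform_bias = [informant_rating(t, 1.0) for t in true_morality]

    # Case 2: some targets recruit more flattering informants than others.
    varying_bias = [informant_rating(t, random.uniform(0.0, 2.0)) for t in true_morality]

    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    print(corr(true_morality, uniform_bias))  # high: a constant bias merely shifts every score
    print(corr(true_morality, varying_bias))  # lower: variable bias adds noise that no constant correction removes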

    Behavioral moralometers

    Given the problems with self-report and informant report, direct behavioral measures might seem promising. Much of my own work on the morality of professional ethicists and the effectiveness of ethics instruction has depended on direct behavioral measures such as courteous and discourteous behavior at philosophy conferences, theft of library books, meat purchases on campus (after attending a class on the ethics of eating meat), charitable giving, and choosing to join the Nazi party in 1930s Germany. Others have measured behavior in dictator games, lying to the experimenter in laboratory settings, criminal behavior, and instances of comforting, helping, and sharing.

    Individual behaviors are only a tiny and possibly unrepresentative sample.

    Perhaps the biggest problem with behavioral moralometers is that any single, measurable behavior will inevitably be a minuscule fraction of the person's behavior, and might not be at all representative of the person's overall morality. The inference from this person donated $10 in this instance or this person committed petty larceny two years ago to this person's overall moral character is good or bad is a giant leap from a single observation. Given the general variability and inconstancy of most people's behavior, we shouldn't expect a single observation, or even a few related observations, to provide an accurate picture of the person overall.

    Although self-report and informant report are likely to be biased, they aggregate many observations of the target into a summary measure, while the typical behavioral study does not.

    There is likely a tradeoff between feasibility and validity.

    There are some behaviors that are so telling of moral character that a single observation might reveal a lot: If someone commits murder for hire, we can be pretty sure they're no saint. If someone donates a kidney to a stranger, that too might be highly morally diagnostic. But such extreme behaviors will occur at only tiny rates in the general population. Other substantial immoral behaviors, such as underpaying taxes by thousands of dollars or cheating on one's spouse, might occur more commonly, but are likely to be undetectable to researchers (and perhaps unethical to even try to detect).

    The most feasible measures are laboratory measures, such as misreporting the roll of a die to an experimenter in order to win a greater payout. But it's unclear what the relationship is between laboratory behaviors for minor stakes and overall moral behavior in the real world.

    Individual behaviors can be difficult to interpret.

    Another advantage that self-report, and to some extent informant report, have over direct behavioral measures is that there's an opportunity for contextual information to clarify the moral value or disvalue of behaviors: The morality of donating $10 or the immorality of not returning a library book might depend substantially on one's motives or financial situation, which self-report or informant report can potentially account for but which would be invisible in a simple behavioral measure. (Of course, on the flip side, this flexibility of interpretation is part of what permits bias to creep in.)

    [a polygraph from 1937]

    Physiological moralometers

    A physiological moralometer would attempt to measure someone's morality by measuring something biological like their brain activity under certain conditions or their genetics. Given the current state of technology, no such moralometer is likely to arise soon. The best known candidate might be the polygraph or lie detector test, which is notoriously unreliable and of course doesn't purport to be a general measure of honesty much less of overall moral character.

    Any genetic measure would of course omit any environmental influences on morality. Given the likelihood that environmental influences play a major role in people's moral development, no genetic measure could have a high correlation with a person's overall morality.

    Brain measures, being potentially closer to the mental states that underlie morality, don't face a similar ceiling on accuracy, but they currently look less promising than behavioral measures, informant report measures, and probably even self-report measures.

    The Inaccuracy of All Methods

    It thus seems likely that there is no good method for accurately measuring a person's overall moral character. Self-report, informant report, behavioral measures, and physiological measures all face large methodological difficulties. If a moralometer is something that accurately measures an individual person's morality, like a thermometer accurately (accurately enough) measures a person's body temperature, there's little reason to think we could build one.

    It doesn't follow that we can't imprecisely measure someone's moral character. It's reasonable to expect the existence of small correlations between some potential measures and a person's real underlying overall moral character. And maybe such measures could be used to look for trends aggregated across groups.
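    As a rough illustration of that last point (my own sketch, with made-up numbers), a measure that tracks individual character only weakly can still separate group averages once the sample is large:

    import random, statistics

    random.seed(1)

    def noisy_measure(true_value):
        return 0.2 * true_value + random.gauss(0, 1)  # weak signal: mostly noise at the individual level

    # Two hypothetical groups whose true average "morality" differs modestly.
    group_a = [random.gauss(0.0, 1) for _ in range(5000)]
    group_b = [random.gauss(0.3, 1) for _ in range(5000)]

    measured_a = [noisy_measure(t) for t in group_a]
    measured_b = [noisy_measure(t) for t in group_b]

    # Individual scores are nearly useless as a moralometer, but the group means still come apart.
    print(statistics.mean(measured_a))  # roughly 0.00
    print(statistics.mean(measured_b))  # roughly 0.06, a small but detectable difference at this sample size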

    Now, this whole post has been premised on the idea that it makes sense to talk of a person's overall morality as something that could be captured, at least in principle, by a number on a scale such as 0 to 100 or -1 to +1. There are a few reasons to doubt this, including moral relativism and moral incommensurability -- but more on that in a future post.