Thursday, December 22, 2022

The Moral Measurement Problem: Four Flawed Methods

[This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

So you want to build a moralometer -- that is, a device that measures someone's true moral character? Yes, yes. Such a device would be so practically and scientifically useful! (Maybe somewhat dystopian, though? Careful where you point that thing!)

You could try to build a moralometer by one of four methods: self-report, informant report, behavioral measurement, or physiological measurement. Each presents daunting methodological challenges.

Self-report moralometers

To find out how moral a person is, we could simply ask them. For example, Aquino and Reed 2002 ask people how important it is to them to have various moral characteristics, such as being compassionate and fair. More directly, Furr and colleagues 2022 have people rate the extent to which they agree with statements such as "I would say that I am a good person" and "I tend to act morally".

Could this be the basis of a moralometer? That depends on the extent to which people are able and willing to report on their overall morality.

People might be unable to accurately report their overall morality.

Vazire 2010 has argued that self-knowledge of psychological traits tends to be poor when the traits are highly evaluative and not straightforwardly observable (e.g., "intelligent", "creative"), since under those conditions people are (typically) motivated to see themselves favorably and -- due to low observability -- not straightforwardly confronted with the unpleasant news they would prefer to deny.

One's overall moral character is evaluatively loaded if anything is. Nor is it straightforwardly observable. Unlike with height or talkativeness, someone motivated not to see themselves as, say, unfair or a jerk can readily find ways to explain away the evidence (e.g., "she deserved it", "I'm in such a hurry").

Furthermore, it sometimes requires a certain amount of moral insight to distinguish morally good from morally bad behavior. Part of being a sexist creep is typically not seeing anything wrong with the kinds of things that sexist creeps typically do. Conversely, people who are highly attuned to how they are treating others might tend to beat themselves up over relatively small violations. We might thus expect a moral Dunning-Kruger effect: People with bad moral character might disproportionately overestimate their moral character, so that people's self-opinions tend to be undiagnostic of the actual underlying trait.

Even to the extent people are able to report their overall morality, people might be unwilling to report it.

It's reasonable to expect that self-reports of moral character would be distorted by socially desirable responding, the tendency for questionnaire respondents to answer in a manner that they believe will reflect well on them. To say that you are extremely immoral seems socially undesirable. We would expect people (e.g., Sam Bankman-Fried) to want to portray themselves as morally above average. On the flip side, to describe oneself as "extremely moral" (say, 100 on a 0-100 scale from perfect immorality to perfect morality) might come across as immodest. So even people who believe themselves to be tip-top near-saints might not frankly express their high self-opinions when directly asked.

Reputational moralometers

Instead of asking people to report on their own morality, could we ask other people who know them? That is, could we ask their friends, family, neighbors, and co-workers? Presumably, the report would be less distorted by self-serving or ego-protective bias. There's less at stake when judging someone else's morality than when judging your own. Also, we could aggregate across multiple informants, combining several different people's ratings, possibly canceling out some sources of noise and bias.
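Why would aggregation cancel only *some* noise and bias? A minimal sketch, under a hypothetical model that is not from the post: each informant's rating is the target's true morality plus a bias shared by all of that target's informants, plus idiosyncratic error that is independent across informants (biases that differ from informant to informant, one liking the target and another disliking them, behave like this idiosyncratic part). Averaging k informants shrinks the idiosyncratic part by a factor of k but leaves the shared part untouched. The function name and the variance figures below are illustrative assumptions.

```python
# Hypothetical model (illustration only):
#   rating = true_morality + shared_bias + idiosyncratic_error
# shared_bias is common to all informants rating a given target;
# idiosyncratic_error is independent across informants.

def aggregate_error_variance(shared_var: float, idio_var: float, k: int) -> float:
    """Error variance of the mean rating from k informants.

    Averaging divides the independent (idiosyncratic) variance by k,
    but the shared variance is not reduced at all.
    """
    if k < 1:
        raise ValueError("need at least one informant")
    return shared_var + idio_var / k


if __name__ == "__main__":
    # Illustrative values: 20% of error variance shared, 80% idiosyncratic.
    for k in (1, 4, 16, 64):
        print(k, aggregate_error_variance(shared_var=0.2, idio_var=0.8, k=k))
```

However many informants are added, the error variance in this sketch never drops below the shared component.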

Unfortunately, reputational moralometers -- while perhaps somewhat better than self-report moralometers -- also present substantial methodological challenges.

The informant advantage of decreased bias could be offset by a corresponding increase in ignorance.

Informants don't observe all of the behavior of the people whose morality they are judging, and they have less access to the thoughts, feelings, and motivations that are relevant to the moral assessment of behavior. Informant reports are thus likely to be based only on a fraction of the evidence that self-report would be based on. Moreover, people tend to hide their immoral behaviors, and presumably some people are better at doing so than others. Also, people play different roles in our lives, and romantic partners, coworkers, friends, and teachers will typically only see us in limited, and perhaps unrepresentative, contexts. A good moralometer would require the correct balancing of a range of informants with complementary patches of ignorance, which is likely to be infeasible.

Informants are also likely to be biased.

Informant reports may be contaminated not by self-serving bias but by "pal-serving bias" (Leising et al 2010). If we rely on people to nominate their own informants, they are likely to nominate people who have a positive perception of them. Furthermore, the informants might be reluctant to "tell on" or badly evaluate their friends, especially in contexts (like personnel selection) where the rating could have real consequences for the target. The ideal informant would be someone who knows the target well but isn't positively biased toward them. In reality, however, there's likely a tradeoff between knowledge and bias, so that those who are most likely to be impartial are not the people who know the target best.

Positivity bias could in principle be corrected for if every informant were equally biased, but it's likely that some targets will have informants who are more biased than others.
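The difference between uniform and unequal bias can be made concrete with a toy example. The scores and bias sizes below are made up for illustration: a uniform inflation changes no one's rank and can simply be subtracted once its size is estimated, whereas target-specific inflation can reorder targets, and no single correction undoes it.

```python
# Toy illustration with hypothetical numbers: true moral scores (0-100 scale)
# for three targets, rated by informants with a positivity bias.

def ranking(scores: dict[str, float]) -> list[str]:
    """Targets ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


true_scores = {"A": 60.0, "B": 55.0, "C": 50.0}

# Case 1: every target's informants inflate ratings by the same 10 points.
uniform = {t: s + 10.0 for t, s in true_scores.items()}

# Correcting a known uniform bias is just subtracting the constant.
corrected = {t: s - 10.0 for t, s in uniform.items()}

# Case 2: C's informants are far more biased than A's or B's.
bias = {"A": 2.0, "B": 5.0, "C": 15.0}
differential = {t: s + bias[t] for t, s in true_scores.items()}

if __name__ == "__main__":
    print("true:        ", ranking(true_scores))
    print("uniform bias:", ranking(uniform))
    print("unequal bias:", ranking(differential))
```

Under the uniform bias the order A, B, C survives; under the unequal bias the least moral target, C, comes out on top.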

Behavioral moralometers

Given the problems with self-report and informant report, direct behavioral measures might seem promising. Much of my own work on the morality of professional ethicists and the effectiveness of ethics instruction has depended on direct behavioral measures such as courteous and discourteous behavior at philosophy conferences, theft of library books, meat purchases on campus (after attending a class on the ethics of eating meat), charitable giving, and choosing to join the Nazi Party in 1930s Germany. Others have measured behavior in dictator games, lying to the experimenter in laboratory settings, criminal behavior, and instances of comforting, helping, and sharing.

Individual behaviors are only a tiny and possibly unrepresentative sample.

Perhaps the biggest problem with behavioral moralometers is that any single, measurable behavior will inevitably be a minuscule fraction of the person's behavior, and might not be at all representative of the person's overall morality. The inference from "this person donated $10 in this instance" or "this person committed petty larceny two years ago" to "this person's overall moral character is good (or bad)" is a giant leap from a single observation. Given the general variability and inconstancy of most people's behavior, we shouldn't expect a single observation, or even a few related observations, to provide an accurate picture of the person overall.

Although self-report and informant report are likely to be biased, they aggregate many observations of the target into a summary measure, while the typical behavioral study does not.
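Psychometricians quantify the payoff of aggregation with the Spearman-Brown prophecy formula: if each of k observations has reliability r, their average has reliability kr / (1 + (k - 1)r). The sketch below assumes, unrealistically, that the observations are parallel (equally reliable, with independent errors), and the value r = 0.1 is purely hypothetical. Note also that reliability is not validity: aggregating many observations makes a measure more consistent, but it cannot repair a measure that consistently tracks the wrong thing.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the average of k parallel measures.

    Spearman-Brown prophecy formula: k * r / (1 + (k - 1) * r),
    where r_single is the reliability of one measure.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


if __name__ == "__main__":
    # Hypothetical: a single behavioral observation with reliability 0.1.
    for k in (1, 5, 20, 100):
        print(k, round(spearman_brown(0.1, k), 3))
```

On these assumptions, one observation is nearly useless, while twenty averaged observations reach a reliability of roughly 0.69.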

There is likely a tradeoff between feasibility and validity.

There are some behaviors that are so telling of moral character that a single observation might reveal a lot: If someone commits murder for hire, we can be pretty sure they're no saint. If someone donates a kidney to a stranger, that too might be highly morally diagnostic. But such extreme behaviors will occur at only tiny rates in the general population. Other substantial immoral behaviors, such as underpaying taxes by thousands of dollars or cheating on one's spouse, might occur more commonly, but are likely to be undetectable to researchers (and perhaps unethical to even try to detect).

The most feasible measures are laboratory measures, such as misreporting the roll of a die to an experimenter in order to win a greater payout. But it's unclear what the relationship is between laboratory behaviors for minor stakes and overall moral behavior in the real world.

Individual behaviors can be difficult to interpret.

Another advantage that self-report, and to some extent informant report, have over direct behavioral measures is that there's an opportunity for contextual information to clarify the moral value or disvalue of behaviors: The morality of donating $10 or the immorality of not returning a library book might depend substantially on one's motives or financial situation, which self-report or informant report can potentially account for but which would be invisible in a simple behavioral measure. (Of course, on the flip side, this flexibility of interpretation is part of what permits bias to creep in.)

[a polygraph from 1937]

Physiological moralometers

A physiological moralometer would attempt to measure someone's morality by measuring something biological, like their brain activity under certain conditions or their genetics. Given the current state of technology, no such moralometer is likely to arise soon. The best known candidate might be the polygraph or lie detector test, which is notoriously unreliable and of course doesn't purport to be a general measure of honesty, much less of overall moral character.

Any genetic measure would of course omit any environmental influences on morality. Given the likelihood that environmental influences play a major role in people's moral development, no genetic measure could have a high correlation with a person's overall morality.
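The ceiling on any genetic measure can be stated precisely. As a sketch (the notation is an assumption, not from the post), let M be a person's overall morality and define its genetic component as G = E[M | genotype], with the remainder E = M - G. By construction, E is uncorrelated with G and with any other function of genotype, so any genetic measure X satisfies Cov(X, M) = Cov(X, G), and the Cauchy-Schwarz inequality gives:

```latex
\operatorname{corr}(X, M)
  = \frac{\operatorname{Cov}(X, G)}{\sigma_X \, \sigma_M}
  = \operatorname{corr}(X, G)\,\frac{\sigma_G}{\sigma_M}
  \le \sqrt{\frac{\operatorname{Var}(G)}{\operatorname{Var}(M)}}
```

To take a purely hypothetical figure: if genotype accounted for 30% of the variance in overall morality, no genetic measure could correlate with it above about 0.55, and the larger the environmental share, the lower the ceiling.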

Brain measures, being potentially closer to measuring the mental states that underlie morality, don't have a similar ceiling on accuracy, but currently look less promising than behavioral measures, informant report measures, and probably even self-report measures.

The Inaccuracy of All Methods

It thus seems likely that there is no good method for accurately measuring a person's overall moral character. Self-report, informant report, behavioral measures, and physiological measures all face large methodological difficulties. If a moralometer is something that accurately measures an individual person's morality, like a thermometer accurately (accurately enough) measures a person's body temperature, there's little reason to think we could build one.

It doesn't follow that we can't imprecisely measure someone's moral character. It's reasonable to expect the existence of small correlations between some potential measures and a person's real underlying overall moral character. And maybe such measures could be used to look for trends aggregated across groups.
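Why might a weak individual-level measure still reveal group trends? A minimal sketch under a hypothetical model: true morality M varies both between groups (variance tau2) and within them (variance sigma2), and the measure is X = a*M + e, where the error e is independent across people. Averaging n people per group shrinks the error much faster than it shrinks real group differences, so group means of X can track group means of M far better than individual scores do. All parameter values are illustrative assumptions. The result depends on the independence assumption: error shared within a group (say, one group being more prone to socially desirable responding) would not average away.

```python
import math


def group_mean_correlation(a: float, tau2: float, sigma2: float,
                           s2: float, n: int) -> float:
    """Correlation between group means of a noisy measure X and of true morality M.

    Hypothetical model: M = mu_group + w, with Var(mu) = tau2 and Var(w) = sigma2;
    X = a * M + e, with Var(e) = s2; mu, w, e mutually independent; a > 0.
    n is the number of people averaged per group (n = 1 gives the
    individual-level correlation).
    """
    var_m = tau2 + sigma2 / n        # variance of a group's mean M
    var_x = a * a * var_m + s2 / n   # variance of a group's mean X
    cov = a * var_m                  # covariance of the two group means
    return cov / math.sqrt(var_m * var_x)


if __name__ == "__main__":
    # Illustrative parameters chosen so the individual-level correlation is 0.2.
    for n in (1, 10, 100, 1000):
        print(n, round(group_mean_correlation(1.0, 0.1, 0.9, 24.0, n), 3))
```

With these parameters, a measure that correlates only 0.2 with individual morality correlates above 0.5 with group-average morality once groups of 100 are averaged.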

Now, this whole post has been premised on the idea that it makes sense to talk of a person's overall morality as something that could be captured, at least in principle, by a number such as 0 to 100 or -1 to +1. There are a few reasons to doubt this, including moral relativism and moral incommensurability -- but more on that in a future post.

9 comments:

Howard said...

The boy or girl is father or mother to the man or woman
Children are moral agents
Children are more easily observable than adults
Try measuring morality in children (or teens) first

Also, you'd have to take an anthropological approach to a handful of people the way that business executives (I know a few) are given intense testing for the sake of teamwork and efficiency
You'd have to study someone the way great men and women are subjects of biographies
We can say whether Humphrey Bogart or Ronald Reagan or Sally Ride are moral or at least debate it

Howard said...

Or if you prefer, investigative journalism would provide a better metaphor

Howie said...

Plus, last idea of the night, shrinks who work intimately with patients have an idea of their morality though two caveats: people seeking counsel are not representative and therapists might be biased and get an oblique if intimate glance

Arnold said...

Via philosophy of physiology with psychology...

A less flawed moral meter method might be, the removal of one's organ or organs to save to give to another...from varied circumstances...

If this were done in clinical reporting settings, to measure the different attitudes of the donors, recipients and scientist, before and after the procedures...
...would this help understanding relationships of ethics and morality as always physical...

chinaphil said...

Another couple of potential measurement difficulties:
The moralometer might measure behaviour, but moral quality could be an internal state. Or vice versa.
Moral quality might be measured by consequences, but the consequences might take years or centuries to play out, so it may be impossible to get a reading during someone's lifetime.

Anonymous said...

I think an interesting way to construct a moralometer in the future might be through use of immersive virtual reality. One might be able to test a person’s moral inclinations by having them engage in realistic-seeming scenarios within a virtual world. Because the tests would feel more like real life for subjects than most lab tasks, they might therefore be somewhat more generalizable.

Chris said...

One other methodological perspective not mentioned, related to but not the same as reputational and behavioral, is the network effects of how they have impacted events, people and circumstances in their (local) world. Is the community better off for their presence overall, or do they seem to leave a cloud of toxicity behind them wherever they go? How do they "rub off" on whatever they are part of? How would their absence be felt? This is different than measuring individual behaviors or opinions of character; it involves more global assessment of causal effects and practical impacts on others at individual, group and community levels.

Eric Schwitzgebel said...

Thanks for the comments, everyone!

Howard: Interesting idea to start with teens.

Arnold: Are you thinking of living kidney donors or people who tick the "organ donor" option on their driver's licenses? The first are very rare! The second are more common, but the relationship to overall morality is probably pretty weak.

Chinaphil: Yes, I basically agree -- though with caveats that actual current behavior is important (and not just internal states) and long-term consequences are at most a small piece of the picture for most people.

Anon Dec 24: Yes, interesting idea.

Chris: Thanks. I like that alternative perspective. It's maybe not wholly distinct from behavioral measurement or informant report, but it's more global, and not the usual angle on those.

Arnold said...

I was thinking a meter for "The Inaccuracy of All (moral measuring) Methods"...
...including for 'the most rare to the most common responses'...

like already ongoing data tracking; it is probably being tracked right now since it's on our California driver license and noted legally in hospital death bed-occurrences...

Your moral 'meter posts' infer to me...philosophy/psychology/physiology to start tracking relationships of ethics morality emotions feelings conscience...
...to mind/body/self questioning...thanks

looking forward to more about "not the usual angle" to Chris...