Thursday, January 12, 2023

Further Methodological Troubles for the Moralometer

[This post draws on ideas developed in collaboration with psychologist Jessie Sun.]

If we want to study morality scientifically, we should want to measure it. Imagine trying to study temperature without a thermometer or weight without scales. Of course indirect measures are possible: We can't put a black hole on a scale, but we can measure how it bends the light that passes nearby and thereby infer its mass.

Last month, I raised a challenge for the possibility of developing a "moralometer" (a device that accurately measures a person's overall morality). The challenge was this: Any moralometer would need to draw on one or more of four methods: self-report, informant report, behavioral measures, or physiological measures. Each one of these methods has serious shortcomings as a basis for measuring a person's overall moral character.

This month, I raise a different (but partly overlapping) set of challenges, concerning how well we can specify the target we're aiming to measure.

Problems with Flexible Measures

Let's call a measure of overall morality flexible if it invites a respondent to apply their own conception of morality, in a flexible way. The respondent might be the target themselves (in self-report measures of morality) or they might be a peer, colleague, acquaintance, or family member of the target (in informant-report measures of morality). The most flexible measures apply "thin" moral concepts in Bernard Williams' sense -- prompts like "Overall, I am a morally good person" [responding on an agree/disagree scale] or "[the target person] behaves ethically".

While flexible measures avoid excessive rigidity and importing researchers' limited and possibly flawed understandings of morality into the rating procedure, the downsides are obvious if we consider how people with noxious worldviews might rate themselves and others. The notorious Nazi Adolf Eichmann, for example, appeared to have thought highly of his own moral character. Alexander "the Great" was admired for millennia, including as a moral exemplar of personal bravery and spreader of civilization, despite his main contribution being conquest through aggressive warfare, including the mass slaughter and enslavement of at least one civilian population.

I see four complications:

Relativism and Particularism. Metaethical moral relativists hold that different moral standards apply to different people or in different cultures. While I would reject extreme relativist views according to which genocide, for example, doesn't warrant universal condemnation, a moderate version of relativism has merit. Cultures might reasonably differ, for example, on the age of sexual consent, and cultures, subcultures, and social groups might reasonably differ in standards of generosity in sharing resources with neighbors and kin. If so, then flexible moralometers, employed by raters who use locally appropriate standards, will have an advantage over inflexible moralometers which might inappropriately import researchers' different standards. However, even flexible moralometers will fail in the face of relativism if they are employed by raters who employ the wrong moral standards.

According to moral particularism, morality isn't about applying consistent rules or following any specifiable code of behavior. Rather, what's morally good or bad, right or wrong, frequently depends on particular features of specific situations which cannot be fully codified in advance. While this isn't the same as relativism, it presents a similar methodological challenge: The farther the researcher or rater stands from the particular situation of the target, the more likely they are to apply inappropriate standards, since they are likely to be ignorant of relevant details. It seems reasonable to accept at least moderate particularism: The moral quality of telling a lie, stealing $20, or stopping to help a stranger, might often depend on fine details difficult to know from outside the situation.

If the most extreme forms of moral relativism or particularism (or moral skepticism) are true, then no moralometer could possibly work, since there won't be stable truths about people's morality, or the truths will be so complicated or situation dependent as to defy any practical attempt at measurement. Moderate relativism and particularism, if correct, provide reason to favor flexible standards as judged by self-ratings or the ratings of highly knowledgeable peers sensitive to relevant local details; but even in such cases all of the relevant adjustments might not be made.

Incommensurability. Goods are incommensurable if there is no fact of the matter about how they should be weighed against each other. Twenty dollar bills and ten dollar bills are commensurable: Two of the latter are worth exactly one of the former. But it's not clear how to weigh, for example, health against money or family versus career. In ethics, if Steven tells a lie in the morning and performs a kindness in the afternoon, how exactly ought these to be weighed against each other? If Tara is stingy but fair, is her overall moral character better, worse, or the same as that of Nicholle, who is generous but plays favorites? Combining different features of morality into a single overall score invites commensurability problems. Plausibly, there's no single determinately best weighting of different factors.

Again, I favor a moderate view. Probably in many cases there is no single best weighting. However, approximate judgments remain possible. Even if health and money can't be precisely weighed against each other, extreme cases permit straightforward decisions. Most of us would gladly accept a scratch on a finger for the sake of a million dollars and would gladly pay $10 to avoid stage IV cancer. Similarly, Stalin was morally worse than Martin Luther King, even if Stalin had some virtues and King some vices. Severe sexual harassment of an employee is worse than fibbing to your spouse to get out of washing the dishes.

Moderate incommensurability limits the precision of any possible moralometer. Vices and virtues, and rights and wrongs of different types will be amenable only to rough comparison, not precise determination in a single common coin.

Moral error. If we let raters reach independent judgments about what is morally good or bad, right or wrong, they might simply get it wrong. As mentioned above, Eichmann appears to have thought well of himself, and the evidence suggests that he also regarded other Nazi leaders as morally excellent. Raters will disagree about the importance of purity norms (such as norms against sexual promiscuity), the badness of abortion, and the moral importance, or not, of being vegetarian. Bracketing relativism, then at least some of these raters must be factually mistaken about morality, on one side or another, adding substantial error into their ratings.

The error issue is enormously magnified if ordinary people's moral judgments are systematically mistaken. For example, if the philosophically discoverable moral truth is that the potential impact of your choices on future generations morally far outweighs the impact you have on the people around you (see my critiques of "longtermism" here and here), then the person who is an insufferable jerk to everyone around them but donates $5000 to an effective charity might be in fact far morally better than a personally kind and helpful person who donates nothing to charity -- but informants' ratings might very well suggest the reverse. Similar remarks would apply to any moral theory that is sharply at odds with commonsense moral intuition.

Evaluative bias. People are, of course, typically biased in their own favor. Most people (not all!) are reluctant to think of themselves as morally below average, as unkind, unfair, or callous, even if they in fact are. Social desirability bias is the well-known phenomenon that survey respondents will tend to respond to questions in a manner that presents them in a good light. Ratings of friends, family, and peers will also tend to be positively biased: People tend to view their friends and peers positively, and even when not they might be reluctant to "tell on" them to researchers. If the size of evaluative bias were consistent, it could be corrected for, but presumably it can vary considerably from case to case, introducing further noise.

Problems with Inflexible Measures

Given all these problems with flexible measures of morality, it might seem best to build our hypothetical moralometer instead around inflexible measures. Assuming physiological measures are unavailable, the most straightforward way to do this would be to employ researcher-chosen behavioral measures. We could try to measure someone's honesty by seeing whether they will cheat on a puzzle to earn more money in a laboratory setting. We could examine publicly available criminal records. We could see whether they are willing to donate a surprise bonus payment to a charity.

Unfortunately, inflexible measures don't fully escape the troubles that dog flexible measures, and they bring new troubles of their own.

Relativism and particularism. Inflexible measures probably aggravate the problems with relativism and particularism discussed above. With self-report and informant report, there's at least an opportunity for the self or the informant to take into account local standards and particulars of the situation. In contrast, inflexible measures will ordinarily be applied equally to all without adjustment for context. Suppose the measure is something like "gives a surprise bonus of $10 to charity". This might be a morally very different decision for a wealthy participant than for a needy participant. It might be a morally very different decision for a participant who would save that $10 to donate it to a different and maybe better charity than for a participant who would simply pocket the $10. But unless those other factors are being measured, as they normally would not be, they cannot be taken account of.

Incommensurability. Inflexible measures also won't avoid incommensurability problems. Suppose our moralometer includes one measure of honesty, one measure of generosity, and one measure of fairness. The default approach might be for a summary measure simply to average these three, but that might not accurately reflect morality: Maybe a small act of dishonesty in an experimental setting is far less morally important than a small act of unfairness in that same experimental setting. For example, getting an extra $1 from a researcher by lying in a task that transparently appears to demand a lie (and might even be best construed as a game in which telling untruths is just part of the task, in fact pleasing the researcher) might be approximately morally neutral while being unfair to a fellow participant in that same study might substantially hurt the other's feelings.
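The dependence of a summary score on the chosen weighting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0-10 scores for Tara and Nicholle (borrowed from the earlier example), the trait names, and both weighting schemes are hypothetical, not data from any study.

```python
# Hypothetical scores on a 0-10 scale; the numbers are illustrative, not data.
scores = {
    "Tara":     {"honesty": 7, "generosity": 2, "fairness": 9},  # stingy but fair
    "Nicholle": {"honesty": 7, "generosity": 9, "fairness": 3},  # generous, plays favorites
}

def overall(person, weights):
    """Weighted average of one person's trait scores."""
    total_weight = sum(weights.values())
    return sum(scores[person][t] * w for t, w in weights.items()) / total_weight

def ranking(weights):
    """Names ordered from highest to lowest overall score."""
    return sorted(scores, key=lambda p: overall(p, weights), reverse=True)

# Two equally arbitrary (assumed) weightings of the same three measures.
equal = {"honesty": 1, "generosity": 1, "fairness": 1}
fairness_heavy = {"honesty": 1, "generosity": 1, "fairness": 3}

print(ranking(equal))           # Nicholle ranks first under a simple average
print(ranking(fairness_heavy))  # Tara ranks first when fairness counts triple
```

If there is no determinately best weighting, nothing in the measurements themselves settles which of these two rankings is the right one.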

Sampling and ecological validity. As mentioned in my previous post on moralometers, fixed behavioral measures are also likely to have severe methodological problems concerning sampling and ecological validity. Any realistic behavioral measure is likely to capture only a small and perhaps unrepresentative part of anyone's behavior, and if it's conducted in a laboratory or experimental setting, behavior in that setting might not correlate well with behavior with real stakes in the real world. How much can we really infer about a person's overall moral character from the fact that they give their monetary bonus to charity or lie about a die roll in the lab?

Moral authority. By preferring a fixed measure, the experimenter or the designer of the moralometer takes upon themselves a certain kind of moral authority -- the authority to judge what is right and wrong, moral or immoral, in others' behavior. In some cases, as in the Eichmann case, this authority seems clearly preferable to deferring to the judgment of the target and their friends. But in other cases, it is a source of error -- since of course the experimenter or designer might be wrong about what is in fact morally good or bad.

Being wrong while taking up, at least implicitly, this mantle of moral authority has at least two features that potentially make it worse than the type of error that arises by wrongly deferring to mistaken raters. First, the error is guaranteed to be systematic. The same wrong standards will be applied to every case, rather than scattered in different (and perhaps partly canceling) directions as might be the case with rater error. And second, it risks a lack of respect: Others might reasonably object to being classified as "moral" or "immoral" by an alien set of standards devised by researchers and with which they disagree.
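The contrast between scattered and systematic error can be made concrete with a toy calculation. This is a minimal sketch, assuming a hypothetical "true" score, made-up rater errors, and a made-up designer bias; none of these numbers comes from real measurements.

```python
from statistics import mean

true_score = 5.0  # hypothetical "true" morality score, for illustration only

# Flexible measure: each rater errs in a different direction (made-up values).
rater_errors = [-2.0, 1.0, -1.0, 3.0]
# Inflexible measure: the designer's mistaken standard shifts every reading
# by the same amount (also made up).
designer_bias = 1.5

rater_readings = [true_score + e for e in rater_errors]
fixed_readings = [true_score + designer_bias for _ in rater_errors]

def mean_error(readings):
    """Average signed error of a set of readings relative to the true score."""
    return mean(readings) - true_score

print(mean_error(rater_readings))  # individual errors partly cancel
print(mean_error(fixed_readings))  # the shared bias survives averaging
```

In this toy case the individual rater errors are larger than the designer's bias, yet their average lands closer to the truth, because averaging cannot remove an error that every reading shares.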

In Sum

The methodological problems with any potential moralometer are extremely daunting. As discussed in December, all moralometers must rely on some combination of self-report, informant report, behavioral measure, or physiological measure, and each of these methods has serious problems. Furthermore, as discussed today, a batch of issues around relativism, particularism, disagreement, incommensurability, error, and moral authority dog both flexible measures of morality (which rely on raters' judgments about what's good and bad) and inflexible measures (which rely on researchers' or designers' judgments).

Coming up... should we even want a moralometer if we could have one? I discussed the desirability or undesirability of a perfect moralometer in December, but I want to think more carefully about the moral consequences of the more realistic case of an imperfect moralometer.


Howard said...

What is the ontological status of morality? Where does it exist? A comparison can be drawn to literature. Where does literature exist? In the reader, in a book, in an ethereal plane?
Morality is a set of behaviors which share approval or disapproval. So it is a qualia that is a quality of behavior. It would in my guess have to be measured statistically- but how? Not sure.

Arnold said...

Wouldn't a morality meter need to reflect all of morality itself...

Not in separate quanta quantum experiences...
...but as it is; in transition to us...

That measuring morality would be good bad and between to observation...
...perhaps (my) understanding Value could be more then..

Howard said...

Hi Eric

You're measuring moral behavior akin to how psychologists measure traits; you make at least two assumptions: first that morality is measurable in behaviors, while that can be contentious and subjective and second that we can decide how people behave.
Perhaps we'd be more cautious when we look at the law. The law is a morality meter of sorts and it is very fraught and contentious.
We'd have to think this one through

Philosopher Eric said...

I must agree that the methodological troubles regarding the potential development of an effective moralometer are, to say the very least, extreme. This makes sense to me since I consider morality (as popularly conceived) to be a confused and anthropocentric notion. Beyond possibly sorting out the issues presented in this post in ways that make sense both in themselves and when combined together (whether this is done well or not), and then going on to decide the morality of having such a moralometer (oh the irony of that!), I’m going to propose a competing narrative. Questions and comments are of course welcome!

Back on New Year’s Day here I mentioned my belief that intrinsic value resides by means of sentience exclusively. So let’s run with this idea for a moment to see if our various moral notions may effectively be reduced back to human sentience. Observe that existence here would be completely valueless “personally” until the emergence of certain brains that create sentience. Thus not just robotic code based instruction (as in the case for GPT-3), but also something that could feel good to bad.

From this perspective consider the influence of evolution given the need for effective parenting. Certain kinds of parents should have gained a sensitivity to their offspring so that they’d feel good/bad based upon their perception of offspring welfare. Why? So they’d parent their young consciously rather than just algorithmically. Thus sentience as intrinsic value should explain the emergence of “care” in the animal kingdom, and not just for offspring since there should have been other uses for this trait as well.

Next consider a measure of personal social health among certain social creatures. Here perceptions of how a given creature is thought of socially should have made it feel good to bad in this regard. I’ll call this influence “respect”. In a given society this might be felt positively by individuals that were stronger or meaner than others, or sometimes even nicer given what specifically was adaptive for that type of creature.

It seems to me that the human institution of morality may effectively be reduced back to these two traits. This is to say that rightness and wrongness do not exist in themselves, and thus the search for an effective moralometer should never bear much fruit. We should however be able to develop effective ways to measure sentience as intrinsic value. Once scientists begin determining the brain physics by which sentience emerges, truly objective ways to measure this should be found. Furthermore because reality itself can be repugnant, I presume that scientists here will discover things that we’d rather not be true. Thus I suspect that some philosophers will continue trying to understand what’s moral, even when they do so in vain.

Eric Schwitzgebel said...

Thanks for the comments, folks! I agree that the questions here aren't fully separable from metaethical questions concerning the nature of morality itself. My own view isn't *that* far from P Eric's, I think. Although I'm a moral realist in a certain sense -- I think that there really are moral facts -- those facts are grounded in facts about patterns of human reaction, which are in turn grounded in evolutionary and social history (compare facts like "cherries are sweet"). Actually, that somewhat simplifies because in truth I'd ground the facts more broadly than that, in terms of what intelligent, long-lived social creatures (rather than just humans in particular) would tend to value given conditions favorable to patient reflection. A meter that accurately tracks *that* -- well, of course that's going to be complicated-to-impossible.

Philosopher Eric said...

It’s good to hear that you consider us relatively aligned on this, Professor. I think so too. In fact for my part I recently stopped calling myself a moral anti-realist (or actually any kind of realist or anti-realist). My broader position of nominalism seems to render this question moot. I seek inclusion here — as you suggest there should be various sensible ways to define the term “moral”.

I’ll now go a bit further in case I can interest you or others with a far more ambitious project.

I believe that science today suffers because it has no broad founding principles of metaphysics, epistemology, or axiology from which to work. But I’m also sensitive to the concern that science not be permitted to expand where it has no business expanding, commonly referred to as “scientism”. Here the ancient institution of philosophy could continue to be celebrated for its indeterminacy as an art for patrons to potentially appreciate. Then I’d like a new society of professionals to emerge which has the singular purpose of developing various effective and agreed upon metaphysical, epistemological, and axiological principles from which to found science more effectively than it has been founded so far. In order to help leave philosophy relatively undisturbed these professionals might be referred to as “meta scientists”.

Furthermore I suspect that this community would eventually agree that intrinsic value exists as sentience itself. Thus philosophers could continue dickering with moral issues just as they did in the days of Socrates. Conversely this new community would explore the value of existing itself. Here I’d expect the still quite soft science of psychology to then harden up by adopting the same utility-based premise used by the relatively successful behavioral science of economics.

Chris said...

I'm late to this thread and probably missed the boat on this discussion, but there doesn't seem to be any engagement at the new Substack so I thought maybe I should just post here? Not sure what your preference is.
Along with the question you raised at the very end about whether we should even want such a moralometer and its possible risks, and the other metaethical considerations mentioned, I'm wondering what exactly our goal would be (morally or otherwise) in trying to measure morality in this way. What do we plan to do with these measurements? If they are solely quantitative (e.g. amount or degree of moral character), why and when do we care *how much* someone is moral in general as opposed to why, or when, or how they managed to be moral this time but not another time, or what it felt like to interact with them? Where do qualitative measures fit in here? Even in a formal setting like a court of law where we are determining someone's degree of culpability, we need very local and contextualized understanding of how their personal character bears on the situation.
The fact that resolving all these contradictions and issues appears so insurmountable is for me a red flag that perhaps it's not meant to be measured at all, if that means quantifying an amount and combining such different contexts. As an alternative to "measure," I volunteer "judgment" and "evaluation" as terms that would be more appropriate to the moral nature of the thing we want to know and decide more about. The biggest problem I see is that we're already assessing others' morality all the time, and in doing so our purpose isn't simply to know how moral someone is for the sake of knowledge or some practical end. It's a mode of self-expression and self-affirmation, as we apply our own values, judgment, personal agency, even identity. This may lead to all sorts of problems like self-deception and polarization, but the activity of deciding how we feel about someone is very precious to us, and people aren't going to want to outsource that to a methodological instrument even if it could somehow be empirically proven to be more accurate than individual and collective determinations. It seems to me a major purpose of assessing morality is not to get it right, but to go through the experience of figuring out how, why, in which ways, and to what extent we think someone is good based on any number of situated contexts, experiences, testimony etc. And then, to see how this bears on other aspects of our life or development.
So I'm suggesting the need to scrutinize not just the risks and flaws of a moralometer, but what it buys us over and above what we're already doing ourselves. Maybe you could even call this a virtue ethics perspective on the limitations of measuring character? Or a transformative experience perspective? I'm not sure, just brainstorming here.

chinaphil said...

I wonder if the analogy with scientific measurement can take us any further. All of the problems you describe with both the operationalization of the concept of morality, and its measurement, suggest that maybe morality isn't a thing that can be measured directly. I don't think that means it's necessarily an unscientific concept. But if we assume that morality *can be* a scientific concept, then the fact that it doesn't seem possible to measure it puts some limits on what *kind of* a scientific concept it is. I.e. it's not a basic quantity; it's some higher-level theoretical construct.
So the next question might be: if morality itself isn't a directly measurable basic quantity, then what measurements might be a part of the science of morality? And what kind of theoretical construct is it?
Hmm. I was hoping that the scientific angle might inspire some new thoughts, but I feel like I'm actually just reinventing the wheel of metaethics... I'll keep mulling.

Arnold said...

Humankind's struggle for Value...
Are Mr. Einstein's "Now" and Mr. Freud's "Here" becoming "Now-Here" everyday experiences...