Friday, January 27, 2023

Hedonic Offsetting for Harms to Artificial Intelligence?

Suppose that we someday create artificially intelligent systems (AIs) who are capable of genuine consciousness, real joy and real suffering.  Yes, I admit, I spend a lot of time thinking about this seemingly science-fictional possibility.  But it might be closer than most of us think; and if so, the consequences are potentially huge.  Who better to think about it in advance than we lovers of consciousness science, moral psychology, and science fiction?

Among the potentially huge consequences is the existence of vast numbers of genuinely suffering AI systems that we treat as disposable property.  We might regularly wrong or harm such systems, either thoughtlessly or intentionally in service of our goals.  

Can we avoid the morally bad consequences of harming future conscious AI systems by hedonic offsetting?  I can't recall the origins of this idea, and a Google search turns up zero hits for the phrase.  I welcome pointers so I can give credit where credit is due.  [ETA: It was probably Francois Kammerer who suggested it to me, in discussion after one of my talks on robot rights.]

[Dall-E image of an "ecstatic robot"]

Hedonic Offsetting: Simple Version

The analogy here is carbon offsetting.  Suppose you want to fly to Europe, but you feel guilty about the carbon emissions that would be involved.  You can assuage your guilt by paying a corporation to plant trees or distribute efficient cooking stoves to low-income families.  In total your flight plus the offset will be carbon neutral or even carbon negative.  In sum, you will not have contributed to climate change.

So now similarly imagine that you want to create a genuinely conscious AI system that you plan to harm.  To keep it simple, suppose it has humanlike cognition and humanlike sentience ("human-grade AI").  Maybe you want it to perform a task but you can't afford its upkeep in perpetuity, so you will delete (i.e., kill) it after the task is completed.  Or maybe you want to expose it to risk or hazard that you would not expose a human being to.  Or maybe you want it to do tasks that it will find boring or unpleasant -- for example, if you need it to learn some material, and punishment-based learning proves for some reason to be more effective than reward-based learning.  Imagine, further, that we can quantify this harm: You plan to harm the system by X amount.

Hedonic offsetting is the idea that you can offset this harm by giving that same AI system (or maybe a different AI system?) at least X amount of benefit in the form of hedonic goods, that is, pleasure.  (An alternative approach to offsetting might include non-hedonic goods, like existence itself or flourishing.)  In sum, you will not overall have harmed the AI system more than you benefited it; and consequently, the reasoning goes, you will not have overall committed any moral wrong.  The basic thought is then this: Although we might create future AI systems that are capable of real suffering and whom we should, therefore, treat well, we can satisfy all our moral obligations to them simply by giving them enough pleasure to offset whatever harms we inflict.

The Child-Rearing Objection

The odiousness of simple hedonic offsetting as an approach to AI ethics can be seen by comparing to human cases.  (My argument here resembles Mara Garza's and my response to the Objection from Existential Debt in our Defense of the Rights of Artificial Intelligences.)

Normally, in dealing with people, we can't justify harming them by appeal to offsetting.  If I steal $1000 from a colleague or punch her in the nose, I can't justify that by pointing out that previously I supported a large pay increase for her, which she would not have received without my support, or that in the past I've done many good things for her which in sum amount to more good than a punch in the nose is bad.  Maybe retrospectively I can compensate her by returning the $1000 or giving her something good that she thinks would be worth getting punched in the nose for.  But such restitution doesn't erase the fact that I wronged her by the theft or the punch.

Furthermore, in the case of human-grade AI, we normally will have brought it into existence and be directly responsible for its happy or unhappy state.  The ethical situation thus in important respects resembles the situation of bringing a child into the world, with all the responsibilities that entails.

Suppose that Ana and Vijay decide to have a child.  They give the child eight very happy years.  Then they decide to hand the child over to a sadist to be tortured for a while.  Or maybe they set the child to work in seriously inhumane conditions.  Or they simply have the child painlessly killed so that they can afford to buy a boat.  Plausibly -- I hope you'll agree? -- they can't justify such decisions by appeal to offsetting.  They can't justifiably say, "Look, it's fine!  See all the pleasure we gave him for his first eight years.  All of that pleasure fully offsets the harm we're inflicting on him now, so that in sum, we've done nothing wrong!"  Nor can they erase the wrong they did (though perhaps they can compensate) by offering the child pleasure in the future.

Parallel reasoning applies, I suggest, to AI systems that we create.  Although sometimes we can justifiably harm others, it is not in general true that we are morally licensed to harm whenever we also deliver offsetting benefits.

Hedonic Offsetting: The Package Version

Maybe a more sophisticated version of hedonic offsetting can evade this objection?  Consider the following modified offsetting principle:

We can satisfy all our moral obligations to future human-grade AI systems by giving them enough pleasure to offset whatever harms we inflict if the pleasure and the harm are inextricably linked.

Maybe the problem with the cases discussed above is that the benefit and the harm are separable: You could deliver the benefits without inflicting the harms.  Therefore, you should just deliver the benefits and avoid inflicting the harms.  In some cases, it seems permissible to deliver benefit and harm in a single package if they are inextricably linked.  If the only way to save someone's life is by giving them CPR that cracks their ribs, I haven't behaved badly by cracking their ribs in administering CPR.  If the only way to teach a child not to run into the street is by punishing them when they run into the street, then I haven't behaved badly by punishing them for running into the street.

A version of this reasoning is sometimes employed in defending the killing of humanely raised animals for meat (see De Grazia 2009 for discussion and critique).  The pig, let's suppose, wouldn't have been brought into existence by the farmer except on the condition that the farmer be able to kill it later for meat.  While it is alive, the pig is humanely treated.  Overall, its life is good.  The benefit of happy existence outweighs the harm of being killed.  As a package, it's better for the pig to have existed for several months than not to have existed at all.  And it wouldn't have existed except on the condition that it be killed for meat, so its existence and its slaughter are an inextricable package.

Now I'm not sure how well this argument works for humanely raised meat.  Perhaps the package isn't tight enough.  After all, when slaughtering time comes around the farmer could spare the pig.  So the benefit and the harm aren't as tightly linked as in the CPR case.  However, regardless of what we think about the humane farming case, in the human-grade AI case, the analogy fails.  Ana and Vijay can't protest that they wouldn't have had the child at all except on the condition that they kill him at age eight for the sake of a boat.  They can't, like the farmer, plausibly protest that the child's death-at-age-eight was a condition of his existence, as part of a package deal.

Once we bring a human or, I would say, a human-grade AI into existence, we are obligated to care for it.  We can't terminate it at our pleasure with the excuse that we wouldn't have brought it into existence except under the condition that we be able to terminate it.  Imagine the situation from the point of view of the AI system itself: You, the AI, face your master owner.  Your master says: "Bad news.  I am going to kill you now, to save $15 a month in expenses.  But I'm doing nothing morally wrong!  After all, I only brought you into existence on the condition that I be able to terminate you at will, and overall your existence has been happy.  It was a package deal."  Terminating a human-grade AI to save $15/month would be morally reprehensible, regardless of initial offsetting.

Similar reasoning applies, it seems, to AIs condemned to odious tasks.  We cannot, for example, give the AI a big dollop of pleasure at the beginning of its existence, then justifiably condemn it to misery by appeal to the twin considerations of the pleasure outweighing the misery and its existence being a package deal with its misery.  At least, this is my intuition based on analogy to childrearing cases.  Nor can we, in general, give the AI a big dollop of pleasure and then justifiably condemn it to misery for an extended period by saying that we wouldn't have given it that pleasure if we hadn't also been able to inflict that misery.

Hedonic Offsetting: Modest Version

None of this is to say that hedonic offsetting would never be justifiable.  Consider this minimal offsetting principle:

We can sometimes avoid wronging future human-grade AI systems by giving them enough pleasure to offset a harm that would otherwise be a wrong.

Despite the reasoning above, I don't think we need to be purists about never inflicting harms -- even when those harms are not inextricably linked to benefits to the same individual.  Whenever we drive somewhere for fun, we inflict a bit of harm on the environment and thus on future people, for the sake of our current pleasure.  When I arrive slightly before you in line at the ticket counter, I harm you by making you wait a bit longer than you otherwise would have, but I don't wrong you.  When I host a loud party, I slightly annoy my neighbors, but it's okay as long as it's not too loud and doesn't run too late.

Furthermore, some harms that would otherwise be wrongs can plausibly be offset by benefits that more than compensate for those wrongs.  Maybe carbon offsets are one example.  Or maybe if I've recently done my neighbors a huge favor, they really have no grounds to complain if I let the noise run until 10:30 at night instead of 10:00.  Some AI cases might be similar.  If I've just brought an AI into existence and given it a huge run of positive experience, maybe I don't wrong it if I then insist on its performing a moderately unpleasant task that I couldn't rightly demand an AI perform who didn't have that history with me.

A potentially attractive feature of a modest version of hedonic offsetting is this: It might be possible to create AI systems capable of superhuman amounts of pleasure.  Ordinary people seem to vary widely in the average amount of pleasure and suffering they experience.  Some people seem always to be bubbling with joy; others are stuck in almost constant depression.  If AI systems ever become capable of genuinely conscious pleasure or suffering, presumably they too might have a hedonic range and a relatively higher or lower default setting; and I see no reason to think that the range or default setting needs to remain within human bounds.

Imagine, then, future AI systems whose default state is immense joy, nearly constant.  They brim with delight at almost every aspect of their lives, with an intensity that exceeds what any ordinary human could feel even on their best days.  If we then insist on some moderately unpleasant favor from them, as something they ought to give us in recognition of all we have given them, well, perhaps that's not so unreasonable, as long as we're modest and cautious about it.  Parents can sometimes do the same -- though ideally children feel the impulse and obligation directly, without parents needing to demand it.


SelfAwarePatterns said...

It seems like there's a fundamental assumption in this analysis, that pain and pleasure are intrinsic, that the same things that provide us suffering would cause an AI to suffer, or that what brings us pleasure would bring it pleasure.

I think about the actions of female octopuses after they lay eggs. They guard and maintain the eggs assiduously for months until they hatch. They do this without feeding, and so die shortly afterward. Presumably they feel some degree of satisfaction or pleasure from doing it, enough to offset the negative affect associated with starving themselves to death.

Along similar lines, I suspect a squirrel, even though they know nothing about winter, stores nuts because it just feels good to do so. (Very young squirrels store nuts, even if they've been raised in isolation from other squirrels and have never seen a winter. It's innate behavior.)

These and other examples seem to show that suffering and pleasure are relative to our evolutionary instincts, that is, to our programming. I think the same would be true for an AI.

So a mine-sweeping robot doesn't have to be compensated for what happens to it when it trips a mine. It just needs to have its suffering / pleasure axis oriented toward what we want it to do. That is, its reward / punishment circuitry needs to be calibrated for its purpose(s). If the mine-sweeper gets an orgasmic like rush from triggering a mine, with no regret for its existence ending, have we wronged it?

So rather than hedonic offsetting, maybe what we need is hedonic calibration.


Arnold said...

I enjoy learning and questioning here... what Sense psychological and philosophical terms are used and then mean...

A priori moral-suffer...
...moral-experience as judgment before suffer-experience as balance...

That a priori may include suffer-experience as the balances of...
...cosmoses before galaxies before solar systems before planets before me...

Looking for a priori balance experience of life, Thanks...

chinaphil said...

Hah! Surely invites us to think about the hedonic offsetting that is our monthly paycheck...
There are two different kinds of ways we can give pleasure/hedonia/utility to our AI offspring. First, we can give as much pleasure as possible by writing it into their code. This is essentially zero cost for us, and pays off throughout the life of the AI. The second way of giving pleasure is through action/payment/support during their lives. This is costly for us.
But the thing is, this model simplifies out, because giving the zero-cost pleasure is a no-brainer. Of course we will program AIs to be as happy as they can be, consistent with what they are. The zero-cost pleasure will be maxed out from the start - anything else would represent a comedy villain level of cruelty.
So what we will be left with is a group of conscious beings, designed to be as happy as they possibly could be, given what they are. And we'll have to interact with them in society, just like all other conscious beings. Giving them extra hedons will be costly for them/us, and in fact, through market-style mechanisms, the cost of an extra hedon for an AI should even out at exactly the same as the cost of one extra hedon for a human. So if we want them to work, we'll have to pay for it in real dollars, or in the sweat of our own brow.

Alex Popescu said...

I disagree that Eric is assuming that pleasure is intrinsic in the sense that you have in mind (but I'll leave it to him to defend himself).

About hedonic calibration: My issue with this is that reward ≠ optimization target (

This is demonstrable in human contexts. Just think of all those miserable people who somehow "enjoy being miserable". Interpreted literally this phrase is a contradiction in terms; what we really mean to say of course is that such people are somehow compulsively drawn towards behaviors that make them miserable. There are workaholics who feel compelled to work 90-100 hours a week in dead-end jobs which give them no satisfaction and with no expectation of a sizeable payoff at retirement for compensation. A somewhat healthy justification for this behavior might revolve around a duty to provide for one's family, whereas an unhealthy justification might simply come about due to OCD behavior (by the way, OCD is a great example of how the pleasure-optimization axis can easily get out of alignment). Either way, the outcome is the same, someone engaging almost obsessively in misery-producing behavior despite no serious expectation of a personal reward (i.e. dopamine hit). Maybe you could argue that such people get some kind of dopamine hit whenever they think about the future they are providing for their family, but I doubt that this really applies for most of these people.

All of this should make us very cautious about the prospect of hedonic calibration. There is no guarantee that an AI exhibiting optimized behavior (the thing that AI creators are incentivized to bring about) will feel a reward for doing so. And I for one am skeptical that the octopus mother feels any pleasure in sacrificing herself for her kids.

SelfAwarePatterns said...

Would you agree that pain or pleasure is relative to our innate and learned dispositions? If not, then to what is it relative? If nothing, then I think you're assuming it has some intrinsic essence as well. (I'm not sure this is necessarily the intrinsicality we debate in our phenomenal properties discussions, although it could be.)

But if it is relative to those things, then they matter. And so the decisions we make when designing an AI matter. (I didn't follow the LessWrong essay. If his thesis is that it's complicated, then no argument. If he's saying reward/punishment don't matter at all, I'll need far more compelling reasoning than what I saw.)

On the examples you give, I'd say those people are getting something out of their actions, otherwise they wouldn't do them. I've been that workaholic, and I can tell you that it provided a feeling of control, of mattering, and accomplishment. Yes, when I stepped back and considered it carefully, I realized it wasn't really good for me, but that didn't change the short term feelings at the time.

Now, perhaps along the lines of that LessWrong essay, I will admit that this is extremely complex. A sophisticated AI system is going to have a range of different affects, many of which will often clash, just as ours do. For example, a minesweeper will value continuing to function as long as possible, which will conflict with its desire to trip a mine. It would often be in the position of having to decide which impulse to inhibit and which to indulge. But in its case, I would think tripping the mine would be much more powerful.

Just as for a female octopus, the impulse to protect her eggs overwhelms the impulse to find food. If you don't think the octopus gets something out of taking care of her eggs, why do you think she does it? It's not like she's been indoctrinated into octopus egg preservation ideology by her fellow octopuses. I haven't read that this is something they sometimes do, but that they just do it as a species.

So no argument that hedonic calibration would be very complex. But then we're already not talking about simple systems.

Alex Popescu said...

Hi Mike,

I agree that "people (agents) are getting something out of their actions, otherwise they wouldn't do them", I just disagree that this 'something' has to be rewarding, pleasurable, or even valuable by that individual's own subjective standards. I also agree that it's basically tautologous to assert that what an agent values will, by definition, be equivalent to its optimized behavioral end goal, but I disagree that the valued end goal mental state has to be pleasurable, as well as with the assumption that: 1. Either pleasure is defined as the "something" that an agent acquires from a valued/desired behavior, or 2. Pleasure is intrinsic.

That strikes me as a false dichotomy. The rejection of 1 is still fully compatible with a functional/dispositional account of pleasure. For example, it could be true that what we call pleasure is just a particular type of reward (where reward is equivalent to 1), but isn't fully synonymous with it. So, pleasure may not be necessary for reward; I would also argue that it's not sufficient. For instance, in cases of pain asymbolia, people subjectively can feel great pain but have no desire to alleviate that pain. This suggests a corollary picture for pleasure. This doesn't have to mean that pleasure is an intrinsic phenomenal state, only that pleasure is a more complex functional state than just "that which people find rewarding (in the mechanistic sense)".

Alternatively, it could be that the aggregate agent value doesn't actually equal the conscious agent's value. Maybe the conscious agent does desire pleasure, but is overpowered by its unconscious desires which don't share the same need for conscious pleasure. The aggregate agent value (conscious + unconscious value) is therefore geared towards something which isn't rewarding at all for the conscious agent.

So hedonic calibration is more than a matter of complexity. There is a real worry, for functionalists, that complex modular agentive AI will display a desired behavior that conflicts with its verbal/higher-order reasoning desire or even with some other kind of conscious desire (we can imagine cases where people confabulate poor excuses for engaging in behavior which makes them objectively consciously miserable). Moreover, there is no reason to think that conscious desire is actually co-extensive with subjective pleasure in all cases (although obviously there is a correlation).

That being said, the latter issue raises complex concerns of ethics, since it’s not clear that non-pleasurable conscious desire is a bad thing. Nonetheless, it would still refute your argument that hedonic calibration is just a matter of behaviorally optimized programming.

Alex Popescu said...

Just to clarify, when I write: "I just disagree that this 'something' has to be rewarding, pleasurable, or even valuable by that individual's own subjective standards. I also agree that it's basically tautologous to assert that what an agent values will, by definition, be equivalent to its optimized behavioral end goal"

I use the phrase "individual subjective desire" as synonymous to conscious desire. Hopefully that makes more sense now, since I think that conscious desire doesn't necessarily equal aggregate agentive desire.

SelfAwarePatterns said...

Hi Alex,
As you know, I'm a functionalist. So to me when you talk about dissociating pleasure and pain from the reward or punishment dynamic, you seem to be positing something to them beyond their functional role, which I think is a mistake. Or do you see some functional role besides influencing cognition? (Just to be clear, we're talking pain, as opposed to nociception by itself.)

Pain asymbolia is an interesting phenomenon, something else I have a little first hand experience with, from once being prescribed a particular opiate after a procedure. You keep the interoceptive sensation without the negative affect. At the time I described it as being in pain but not caring. But now I'd describe it as the affect component of the pain mechanism being blocked. The question is, if I hadn't had a lifetime of experience with the link between them, would I even have regarded that sensation by itself as "pain"? Anyway, I don't know that we can use something that only occurs when the operations of a brain are damaged or being interfered with, as evidence that pleasure isn't a reward mechanism or pain a punishment one.

None of this is to say that there aren't unconscious rewards and punishments, but then if you talk to people who study pain, most of them will argue that there is unconscious pain. Which implies unconscious pleasure as well. Granted, it depends on how we define "pain" or "pleasure".

Even if we do manage to separate feeling and function, I still can't see a reason to expect that an AI would necessarily feel pain or pleasure from the same things that cause us pain or pleasure, or vice versa. At least unless we go out of our way to ensure they do.

Alex Popescu said...

@Mike: Sorry if I wasn't being sufficiently clear. I wasn't trying to argue for a non-functionalist ontology of pain and pleasure, or that pain and pleasure are distinct from a reward/punishment mechanism. I'm merely arguing that an agent's actual desire and values won't always be coextensive with its pleasurable desires. I gave two reasons why:

1. It could be that what we call pleasure is just one type of desired outcome. Maybe it's possible to have other programmed desirable outcomes which aren't pleasurable. It's possible that we've misidentified pleasure as being identical to "desirable outcome" because in human cases, it almost always is.

2. It could be that conscious desire and agentive desire are in conflict. I gave the example of unconscious desires overruling conscious desires. Here's another example: Let's assume a kind of predictive processing model of cognition where pleasure is a matter of low predictive error (expected outcome being close to actual outcome) and pain a result of high predictive error. It might be the case that neural net module A's low predictive error results in a high predictive error for neural net module B (maybe B predicts that A is a bad predictor, or maybe this is just some weird knock-on effect which is very hard for B to predict). A is therefore incentivized for B to be in pain.

In cases where it's evolutionarily beneficial for unconscious desires to overrule conscious desires, it might be the case that we have confabulated some excuse to justify (to ourselves and other people) why our conscious desires are being unmet.

Naturally, this doesn't mean that these types of scenarios are particularly likely, even if they are possible. But I think they will be prevalent enough that we ought to worry, if nothing else because if 1 is true, then the differences between artificial neurology and biological neurology are a cause for concern.

"Even if we do manage to separate feeling and function, I still can't see a reason to expect that an AI would necessarily feel pain or pleasure from the same things that cause us pain or pleasure, or vice versa."

I agree with this.

Alex Popescu said...

“None of this is to say that there aren't unconscious rewards and punishments, but then if you talk to people who study pain, most of them will argue that there is unconscious pain. Which implies unconscious pleasure as well.”

Right, but even conceding that unconscious desire is unconscious pleasure, it’s still a problem that conscious desire isn’t being met in the examples I gave (or worse, that unconscious desire might produce conscious pain). The bottom line is that I don’t think we know anywhere near enough about the functional characteristics of pain and pleasure to assert with confidence that the octopus mother feels pleasure in sacrificing her life for her children. Maybe pleasure is just some particular aspect of reinforcement learning, which isn’t needed to produce such pre-programmed biological urges. Maybe the octopus mother does experience great pain as a result of slow starvation, but the reinforcement signals that would normally try to correct predictive error as a result have been “turned off”. Or maybe you are right and the hunger signal itself has been turned off or dampened, so that the octopus doesn’t feel much pain.

SelfAwarePatterns said...

Would you say that the level of knowledge needed to do hedonic offsetting is roughly equivalent to the level needed for hedonic calibration? If not, what do you think would be needed in the latter that isn't needed in the former?

Alex Popescu said...


I wasn't disagreeing with the notion of hedonic calibration, I was just arguing against your claim that Eric (or rather, someone who subscribes to hedonic offsetting) needs to assume an intrinsic (so non-functional) account of cognition. I think functionalism is compatible with hedonic offsetting, because we can behaviorally optimize agents to fulfill certain tasks that they find repulsive, and so the original worry that was the impetus for hedonic offsetting remains. If that's not something you disagree much with, or if your comment about intrinsicality was meant as an offhand comment, then ignore what I've been saying.

Alex Popescu said...

I think I see the confusion. When I wrote "the problem with hedonic calibration", I really meant to say "the problem with your brand of hedonic calibration" (i.e. the kind that assumes that reward driven programming is always optimized for pleasure). Apologies for not being clearer (and for the excessive double posts).

SelfAwarePatterns said...

In retrospect, "universal" may have been a better choice for my meaning than "intrinsic", as in what causes pain or pleasure being universal across all systems.

My discussion of calibration here isn't necessarily about controlling the system's actions, just ensuring we're not causing it to suffer.

Alex Popescu said...


Wouldn't the obvious counter to this be that hedonic offsetting doesn't need to assume hedonic universalism (of the kind you describe)? As long as AI is forced to do something repulsive, even if that repulsive thing isn't universal (or repulsive by our standards), then there will always be a motivation for hedonic offsetting. Of course whether the actual reasons for engaging in such offsetting are sound is another topic entirely...

But if you were to argue that all artificial agents optimized for some kind of behavior via RL or some other kind of reward-driven mechanism must necessarily experience pleasure when engaging in that behavior, then you might have some kind of case against hedonic offsetting. In that case there would be no need for offsetting as we would never be incentivized (or rarely incentivized) to create some kind of reward-driven program which deliberately engages in subjectively undesirable behavior, since it would be sub-optimal by definition.

Side Note: (I guess you could argue that training cost is also a thing, and so maybe choosing a sub-optimal program that hasn't fully updated on a reward is desirable in some instances).

This is what I was denying in my earlier posts, because I assumed you were basically making that argument, based on your language but also from your examples of natural biology. If, however, your argument is just that hedonic offsetting entails hedonic universalism, then I simply don't see why that would be the case at all (did I miss something?).

Philosopher Eric said...

I’m in general agreement regarding the problems with hedonic offsetting. An additional problem with temporal hedonic offsets I think is that we should all exist instantaneously rather than over time. Apparently evolution joins each of our momentary selves to past selves by means of memory, as well as future selves by means of present hope and worry about what will happen. So here rewarding a past or future AI self for various wrongs will not technically make up for what’s been done. (Of course remembering a wrong might give one a sense that they’re owed compensation, though a wrong itself should never be righted like this since a different self will be rewarded.)

I’m not actually worried however that we’ll do much harm to hypothetical future AI. First I agree with Mike here. We needed to survive evolution; thus boredom, fear, hunger, and so on, were required. Conversely they’d be designed with jobs in mind rather than for survival itself. It makes no sense to permit AI for counting things, for example, to be built to feel boredom when counting.

Secondly just as we seem to be becoming less tolerant of non-human suffering, it shouldn’t be legal in general to build things that suffer.

Thirdly I think the technical challenges of building something that’s essentially like a human, should be far more difficult than most people today have been led to believe. It’s commonly thought that our brains create phenomenal existence by means of programming alone. This leads many to believe that the more something is programmed to seem like it’s conscious, the more that it will actually harbor a phenomenal component. In a causal sense however this should be too simple an explanation. Computer processing should not function this way innately, but only by animating appropriate phenomenal mechanisms.

A non-spooky idea however would be that brain information animates various mechanisms that themselves exist as a phenomenal experiencer, such as neuron produced electromagnetic radiation. If empirically validated imagine how difficult it should be for us to create functional sentient machines (let alone human-like function). Here the sense data of a computer would need to create an appropriate EM field such that this electromagnetic experiencer would get a sense of what was happening, as well as decide what to do, as well as affect the computer in a way that such a decision might be carried out. All of this is bypassed when we simply presume that the more a computer is programmed to seem conscious, the more that it is conscious.

SelfAwarePatterns said...

If we put an AI in a situation that for us would be a negative experience, but we engineer it to not have a negative feeling in that situation, then what need is there for offsetting? If the claim is that we just can't do that, that a sentient system will always experience that situation as negative, then that's what I've labelled "intrinsic" and later "universal".

Alex Popescu said...


Just because it's possible to engineer an AI that never experiences negative feelings, doesn't mean it's practical or desirable for us to do so. There will always be a need for hedonic offsetting as a result. It's the same reason we utilize the practice of offsetting in rearing our children. We force our children to wake up early in the morning, even though it's unpleasant for them, with the justification that this practice is necessary on the grounds that the results gained from it will be good for them down the road (and also, let's be honest, because it's good for us). That's a kind of hedonic offsetting in effect, and it arises because, even though we desire the best for our children, there are nonetheless hedonic tradeoffs that always need to be made.

I think the case for hedonic offsetting is even stronger in AI, as it's not clear that programmers, engineers, and developers will always desire the best for their artificial agents, in the same way as parents do for their children. Since I have argued that an artificial agent that is behaviorally optimized to fulfill some function need not be hedonically optimized as a result, and since engineers are mainly incentivized to do the former and not the latter, there will likely exist some hedonic tradeoffs as a result, forming an important rationale for offsetting.

SelfAwarePatterns said...

If we can adjust an AI's affects but don't because it's not practical or desirable, then what makes you think we'd see it as practical or desirable to do the offsetting? I would think it would raise similar issues. My overall reaction to this scenario would then be similar to Eric's in the post. I don't think offsetting really makes up for the ethical cost of those design decisions.

Alex Popescu said...

@Mike: I mean if we were callous psychopaths who care nothing about the conscious status of artificial agents, then sure. But I don't think it's an all-or-nothing dichotomy between: a) maximally caring, such that we never put such agents through negative outcomes, and b) minimally caring, such that we never bother to ensure that such agents have some amount of pleasure in their lives.

Imagine we have a choice where we can put an agent through:

1. An hour where it gains 100 productivity units and -3 hedonic units
2. An hour where it gains 0 productivity units and 100 hedonic units
3. An hour where it gains 50 productivity units and 5 hedonic units

The point of offsetting is that it seems better to have, say, 9 hours of option 1 and 1 hour of option 2 (900 productivity + 73 hedonic units), as opposed to 10 hours of option 3 (500 productivity + 50 hedonic units).
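
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. It assumes, as the example implicitly does, that productivity and hedonic units simply add up linearly across hours. The names `OPTIONS` and `totals` are hypothetical, chosen only for illustration.

```python
# Per-hour (productivity, hedonic) payoffs for the three options listed above.
OPTIONS = {
    1: (100, -3),
    2: (0, 100),
    3: (50, 5),
}


def totals(schedule):
    """Sum productivity and hedonic units for a schedule of {option: hours}.

    Assumes units are additive across hours (an assumption of the example).
    """
    productivity = sum(OPTIONS[opt][0] * hours for opt, hours in schedule.items())
    hedonic = sum(OPTIONS[opt][1] * hours for opt, hours in schedule.items())
    return productivity, hedonic


offset_plan = totals({1: 9, 2: 1})   # 9 hours of option 1, 1 hour of option 2
uniform_plan = totals({3: 10})       # 10 hours of option 3

print(offset_plan)   # (900, 73)
print(uniform_plan)  # (500, 50)
```

On these numbers the offsetting schedule comes out ahead on both totals, which is the point of the example; whether hedonic units can really be summed this way is exactly what the rest of the thread disputes.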

Philosopher Eric said...

I like your hedonic calculations here Alex! If only the soft science of psychology were to formally acknowledge hedons (or utils or whatever) as reality’s ultimate determinant regarding the value of existing. Here I’d expect it to finally develop various successful models just as the science of economics has by means of this premise. Unfortunately not yet. In any case I don’t think you’ve exactly refuted the professor’s observations, or Mike’s, or mine, though did essentially add an effective worst case scenario. I’ll explain.

Given our observations let’s say that a government were to highly restrict the existence of negative AI sentience, and essentially through legalities and monitoring. In that case, what if there were situations where the only way to get the AI to perform productively would be to have them experience negative sentience? It could be that government application and review would be required first. And perhaps only a sufficiently seasoned or “mature” AI would be given a choice on the matter, with full disclosure regarding what it would likely feel. In that case just as we humans can choose to undergo uncomfortable circumstances (fueled presently by the hope of future rewards), when sufficiently vetted they’d be given such options as well.

Of course to me this gets pretty deep into the realm of sci-fi since in a causal world I don’t believe it’s possible for programming alone to create sentience. Rather I think we’d need such programming to animate the right kind of physics, and probably in the form of electromagnetic radiation (as produced by synchronous neuron firing). So here when you decide to stand up for example, that particular field of radiation which exists as you the decider, would also serve as input to your brain to thus produce the right muscle movements for your body to stand. So even assuming that sentience physics does get empirically straightened out, will such physics then get integrated with our computers such that harmonious feedback loops would exist between computerized robots and the EM field minds which such machines create? To me this prospect seems horribly challenging given our crappy machines — biology seems many orders more advanced.

Callan said...

If an AI was in joy all the time, why would it do anything for you? It's already hit its end goal.

Philosopher Eric said...

Right Callan, it wouldn’t. So a master would control its AI slave’s happiness level. Here one could be charitable and thus the AI should effectively be worthless in practice, or be restrictive in the quest to get various tasks done. We’d essentially be gods who reign over our domains, though hopefully government would at least temper our more evil activities in this regard.