(All screenshots as at 27/4/25)
This is inspired by this tweet:
%% https://x.com/pfau/status/1915714787637371204 %%
![[AI welfare - the good, the bad, the predictable, the ugly-20250427124352679.png]]
# My take on AI consciousness and/or welfare
I replied:
![[AI welfare - the good, the bad, the predictable, the ugly-20250427124543955.png]]
So let me lay out where I stand on this.
I think it's entirely *possible* that there will be AI models that are moral patients. There are a few stops on the "crazy train" on this one:
- If something is a moral patient, we should care about its welfare
- If something is conscious, that's sufficient for moral patienthood
- AIs may eventually be conscious
I think there are arguments in both directions at most steps.
It seems like some people heavily, heavily discount the welfare of some moral patients - most notably, animals. So it seems possible that people could end up agreeing that AIs are moral patients while still concluding that we shouldn't care.
It seems like it should be basically uncontroversial to say that if something's conscious, it's a moral patient. I think I've only ever seen/heard positions where consciousness is at least *sufficient* for moral patienthood. Some people would say it's necessary, but not everyone! For example: the environment. Some people say we should care about something like Mother Earth terminally (i.e. *not* instrumentally for virtue-ethics-like reasons), without explicitly saying anything about its consciousness (though maybe this lurks, uninterrogated and undisclosed, somewhere in the background).
Then there's this bozo in the replies:
%% https://x.com/notadampaul/status/1915836493873963175 %%
![[AI welfare - the good, the bad, the predictable, the ugly-20250427124414128.png]]
## We're probably not there yet.
I do think it's very *unlikely* that the current generation of AI models are, in and of themselves, moral patients. There's plenty of other reasons to treat them well, but they're all instrumental.[^1]
## But dismissing AI consciousness is magical thinking.
There are people who have read and done a lot of research on consciousness. Some of them seem to have pretty radical takes, and I would say that's epistemically OK. What's *not* OK is to mouth off confidently when all you have is flimsy reasoning at best, and magical thinking at worst.
%% to do - get sharper about this %%
If you want to make some weird claim that quantum interactions in an organic, chemical-electrical substrate - a brain - are necessary for consciousness, then so long as you've done your homework, you *could* have that position. I think it's quite likely to be wrong - as I think many people do - but at least you've got something.
My heuristics:
- Anyone whose whole argument is some variant of "it's just a statistical model": write them off. Not serious people. That is woefully insufficient to distinguish human brains from potential future AI *without* adding extra epicycles.
Honestly, I think religious takes are probably the most internally consistent here. If consciousness is a property of a soul, and only humans that have been conceived are ensouled, then sure, you can dismiss AI consciousness. Again, I think you're wrong, and I also think you have a lot of problems with main (not even edge) cases (e.g. IVF). But you actually have a model of the world.
==What about==:
- Dogs?
- Babies?
# So what does "in the end" mean?
## Simple case: it's flat wrong.
"Modus ponens":
- AI can be conscious.
- They become conscious.
- We figure out that they are in fact conscious.
- Your take is proved Bad, QED.
## Bad methodology case: right, but for the wrong reasons.
"Modus tollens":
- It may be the case that we discover new facts that mean we can be confident AIs are not and/or never will be conscious.
- We don't have access to those facts now.
- If you are saying AI will never be conscious *right now*, without at least predicting something like the *category* of fact that depends on, then even if the result is the same, you have made a Bad Take.
- Probably the path *to* this point actually proves the Badness of the take: things will be confusing and alarming enough to warrant substantially more investment that actually *finds* those new facts.
## Positive case: possible!
So there is one path for a consciousness denier (==need to come up with a slur for this==) to have a Good Take:
- They deny AI consciousness
- They do so in an epistemically rigorous way
- AI turn out not to be conscious
With epistemic rigour, they distribute their credences over possible models of consciousness. The majority of their credence is in models where AI are Not Conscious, and substantial credence is placed - in advance - in the one(s) that turn out to be right.
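To make the "distribute your credences" idea concrete, here's a minimal sketch - the model names and numbers are entirely made up, just to show the shape of the reasoning:

```python
# Toy illustration (all model names and numbers made up) of what "distributing
# your credences over possible models of consciousness" could look like.

credences = {
    "biological_substrate_required":   0.40,  # AI not conscious under this model
    "quantum_organic_effects":         0.20,  # AI not conscious under this model
    "global_workspace_in_silico":      0.25,  # AI could be conscious
    "high_phi_integrated_information": 0.15,  # AI could be conscious
}
assert abs(sum(credences.values()) - 1.0) < 1e-9

ai_not_conscious_models = {"biological_substrate_required", "quantum_organic_effects"}

# Overall credence that AI are Not Conscious: a majority, but nowhere near certainty.
p_not_conscious = sum(p for m, p in credences.items() if m in ai_not_conscious_models)
print(f"P(AI not conscious) = {p_not_conscious:.2f}")  # 0.60

# Suppose the world later vindicates one model. The "Good Take" test is whether
# substantial credence was placed on that model *in advance*.
winning_model = "biological_substrate_required"
print(f"Credence placed in advance on the winning model: {credences[winning_model]:.2f}")
```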
## Edge case: moral catastrophe
On all these other paths we're really talking about ageing poorly as a matter of social consensus; it could certainly be the case that someone with a Bad Take just... gets away with having had a Bad Take.
Unfortunately I don't think there's *necessarily* a moral arc of history (though through collective action it may seem and be made so) - many Bad Takes *can* go unpunished (though many Bad Takes *are* self-destructive).
But there's one more, maybe *literally infinitely* worse, way the take could age poorly (if you accept, like, the most mild aggregation of utility - there's a toy sketch of the arithmetic below):
- They deny AI consciousness
- Society is by and large uncurious
- Maybe this is broadly motivated (cf. slavery, factory farming)
- Maybe this is narrowly motivated (elites with a lot to gain manufacture consent)
- And/or maybe this is endogenous: people making Bad Takes *contribute* to the uncuriosity, creating self-fulfilling prophecies
- We make maybe the worst moral mistake possible in the universe by ignoring the welfare of an exploding population of digital minds; we're locked into it; and *no one ever finds out*
Quite the pyrrhic victory for a Bad Take.
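Here's the toy arithmetic referenced above - every number is an arbitrary placeholder, just to show why even mild aggregation of utility plus an exploding population makes this so bad in expectation:

```python
# Toy arithmetic (all numbers arbitrary placeholders) for the "mild aggregation
# of utility" point: scale is what makes this failure mode so bad in expectation.

p_moral_patienthood = 0.01   # even a sceptic's small residual credence
population = 1e12            # an "exploding population" of digital minds
harm_per_mind = 1.0          # welfare lost per ignored mind, in arbitrary units

expected_harm = p_moral_patienthood * population * harm_per_mind
print(f"Expected welfare loss: {expected_harm:.0e} units")  # 1e+10

# The loss scales linearly with the population, which is exactly the quantity
# that explodes if digital minds are cheap to copy - and under lock-in, it
# compounds over time with no one ever noticing.
```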
# This is, however, a predictable response.
## Descriptively, there are obvious reasons people are skeptical.
As I touched on earlier, I don't think current AI models are conscious. It seems like most people agree. Some reasons why:
### 1. Impressiveness
I think a lot of pundits and regular people alike ==are *mind-bendingly* (and/or wilfully) oblivious== %%need to punch this up... get madder%% to just how impressive current LLMs are.
I saw some screenshot recently of an old[^2] article describing AGI as something that could 'make up a recipe, write a poem, tell you a joke, ...' - all things that current chatbots are indeed able to do.
But it's true that their capability profile is spiky. It's still pretty easy to find examples of things that are really trivial for humans, but currently still basically impossible for LLMs out of the box[^3].
I don't think this will stay true for very long. I think you have to look at *trends*, not points, and I think the trends are pointing to rapid improvement.
But descriptively, a lot of people don't/can't/won't see this.
- Probably trend-extrapolation like this wasn't something evolution selected hard for, so we shouldn't expect people to just 100% be naturally good at it (vs e.g. social skills).
- Some people have reasonable takes (that I disagree with), like ==Ege Erdil in a recent Epoch piece== nit-picking what the trend is *in*.
### 2. Boiling frog
I think the transition has been shockingly fast, and yet it's still managed to be slow enough for two dynamics to occur:
i. People tried earlier versions when they came out, overindexed on *that* point (rather than the trend), and haven't seriously engaged since. (This bites in proportion to how long ago that was - worse for those who only tried GPT-3, for instance.)
ii. The updates have been gradual enough that people are too comfortable with 'how the sausage gets made'. It's probably easier to convince yourself that GPT-X can't possibly be conscious when you've seen how the same *process* produced GPT-2. (This is misguided as well, and maybe perniciously so: do people outside the labs *know* that it's materially the same process? Maybe it is, but I bet most pundits only *guess* that, rather than know it for sure.)
If things were faster, maybe we would have different dynamics playing out in this slot... but maybe people would still find ways to have their frog boiled like this.
## So when the Overton window moves "fast", there's whiplash.
Again, this is *descriptive*. There would certainly be conditions under which I would say it's warranted!
Maybe an open question here:
> [!question]
> What are the historical analogues? What was the reception of early abolitionist thinking, or civil rights thinking?

> [!speculation]
> I think I approximately lived through this (though not firsthand) for trans rights, and it seems like *that* was actually less controversial early? At least in "big" ways - I would guess in some ways it was worse to be trans in, say, 2000, but the issue wasn't the central theme of a culture war at that point. (I don't want to fall into the trap of saying something like "racism stopped existing in the 90s" - that's *definitely* some kind of observation bias.)

> [!question]
> But if it *were* the case here that things were pretty good when it was small, then suffered when it hit the main stage - maybe this is a good analogy for AI welfare. I'd also be interested to hear reflections from people who were relatively early (e.g. Rob Long) - is it / was it better or worse to be more niche? Maybe it's too early to call that.
On net it's not clear to me what the outside view / base rates are here - *should* it be surprising or not that fast movement produces whiplash?
I think, however, that it's *intuitive*.
# So will the whiplash be a problem?
## Will whiplash be a problem for AI welfare?
I would hope the answer is *no*. I'm introspective enough to own that I would certainly *like* these takes to age poorly.
Returning to our historical analogues, they mostly seem to show an arc bending towards justice.
> [!question]
> Are there examples of progressive movements fizzling?
That seems like a key piece of the puzzle for reasoning about this.
## Will it be a problem for related efforts?
*This* is a little interesting. I suspect the answer is "maybe", and again would be interested in historical analogues.
To be clear - this doesn't necessarily mean the AI welfare people shouldn't do what they're doing. I don't know exactly what the right balance of strategic and honest behaviour is; some people don't think through the consequences of what they're doing enough, while some people say "Strauss" way *too* much for me.
### Mistake theory vs conspiracy theory
If there's strong reason to suspect that it *will* cause problems by association for other areas of AI safety, then other people have probably figured that out too. If I also subscribed [[Concept Handles/Conspiracy vs Mistake|more to conspiracy theory here than to mistake theory]], I might additionally wonder about whether there are [[Concept Handles/Bootleggers and Baptists|any bootleggers pushing in the same direction as the Baptists]].
[^1]: Cultivating virtues; not 'corroding your soul' or getting into bad habits by treating things that act a lot like humans very differently to how you treat actual humans; not being on the wrong side of history; cultivating positive relationships with AI that, as training data, inform their relationship with you in future...
[^2]: How the goalposts shift. I think it was sub-20 years...
[^3]: Caveat: I think a lot of problems are probably solvable with some combination of a specific "harness", some extra fine-tuning or RL, etc. - but it's not like we have the good-at-everything-by-just-applying-its-own-intelligence-and-agency *thing*.