As someone heavily involved in machine learning and AI, I often get called in to answer questions about AI in pop culture, but the recent LaMDA interview has sparked more conversation on the topic than usual.

In case you haven’t heard the news, an engineer in Google’s Responsible AI organization recently went public with his claim that LaMDA (Language Model for Dialogue Applications), Google’s conversational language model, had achieved sentience. Now, the idea of advanced models pushing boundaries until people wonder whether they have some form of awareness isn’t that new; I often find myself having this conversation when new models make the news.

However, for whatever reason, this time I found myself delving more into the philosophy of the topic than into the research. Perhaps it’s because there’s just no way to explore this “sentience” question without veering into philosophy. Ultimately, even with all the work we’ve done, we are still fighting over a good definition of what sentience might look like.

Many people, when thinking about sentience or consciousness, first jump to an ad hoc argument: “I know it when I see it.” In fact, Blake Lemoine, the Google engineer who first made the claim, used this argument himself, saying, “I know a person when I talk to it.” Most people have a conviction that there’s a difficult-to-pin-down “something” at the core of us that has a concept of itself-as-distinct-from-the-universe and that also forms the image of the self. Whenever our thoughts dwell on things like sentience and consciousness, it’s that core we all seem to keep coming back to, but the sheer meta aspect of “thinking-about-the-self-which-is-composed-of-a-thing-which-can-think-about-the-self-which-is-composed…” is daunting at best. So a lot of our poking around into these questions boils down to “there’s a thing that we all have, and it’s hard to define, but we know it when we see it.”

Likewise, I could construct some metrics that define a kind of “sentience,” but so far the ones with deep consensus all seem to be necessary but not sufficient. For example, I could say “a sentient being should be able to profess itself as a distinct being and discuss its actions on those terms.” That is a useful and necessary test, but it’s unclear whether it is sufficient.

Or I could say that a sentient being should be able to generate thoughts (or at least communication) that a human judges meaningful to the conversation and that are not simple repetition of memorized phrases. This seems necessary too, but we can clearly demonstrate systems that do this with far simpler, deterministic logic.
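One way to see this concretely is an ELIZA-style responder, in the spirit of Joseph Weizenbaum’s 1960s program. The sketch below is a minimal, hypothetical Python version: the rules and the reflection table are invented for illustration and are not ELIZA’s actual script. It produces replies that are relevant and not verbatim repetitions, using nothing but fixed pattern matching.

```python
import re

# Hypothetical hand-written rules: (pattern, response template).
# Checked in order; the first match wins.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first-person words for second-person ones so the echo reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}


def reflect(fragment: str) -> str:
    """Replace first-person words with their second-person counterparts."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(message: str) -> str:
    """Return the first matching rule's response, or a generic fallback."""
    cleaned = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return "Please, go on."


print(respond("I feel lost in my work."))  # Why do you feel lost in your work?
```

Every reply is fully determined by the rules, which is exactly why “produces meaningful, non-memorized replies” can’t be a sufficient test on its own.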

So where does that leave us? Even if we created genuine artificial life in the way we imagine, we might not agree that it was sentient, because we don’t have a “tight” definition of what we’re looking for. This raises some sinister implications: for example, it’s generally agreed that humans are sentient, but without a comprehensive definition, it’s unclear whether that claim is defensible.

In fact, with a more careful definition, I might learn that I am functionally an automaton, whereas my neighbor does possess whatever qualities I had just enshrined. So, for better or worse, we’re trying to reach a definition that includes all humans, excludes most machines and at least a big handful of animals, and beyond that we aren’t all that sure. How are we going to answer a deep question like the emergence of sentience when our rubric is basically “y’know, like us, in ways that feel emotionally right but don’t bring up too many scary questions”?

But, we’ve got an ace in the hole here. A momentary salvation. A way to keep talking without losing sleep tonight. For now.

However tough the philosophical question is, it’s easy to kick the rhetorical can down the road by splitting the question into a couple of pieces. I suggest we do that. For now, I’m going to wrap the question of sentience up in a small black box and say that “effectively, sentience is the quality of making responses that are, to a human being, indistinguishable from what another human *might say* when given the same inputs.”

This is essentially just the Turing test, but splitting the question this way lets me ask another one immediately afterward: “Can an algorithm be trained to predict (and then implicitly respond in) the way a human *would*?” In other words, can we reproduce the full range of humanlike responses with a well-understood but explicitly deterministic mechanism? In short, can we make a model that simulates what a person would say well enough to pass the Turing test? After we settle that one, we can get down to the less-well-defined-and-more-frightening question: “Is a deterministic algorithm that responds to all inputs identically to a human also a human?”

Of course, that brings up a third issue: personhood. As a friend put it, “I don’t know what makes humans people, but if an AI is a person, I want to treat it like a person, sort of regardless of how it got there.” At what point do we give something rights?

So that really makes three fundamental questions: “Can we make a thing we understand that responds the same way a hypothetical person would under essentially all stimuli?”, “Is there a clear line or structure, one that can be enumerated, between ‘autocomplete algorithm with a long input memory’ and ‘sentient being’?”, and “Depending on the answers to the first two, does a sentient being get to be a ‘person’ (i.e., have rights as we understand them)?”

When we look at the big neural generator networks, they mostly break down into some broad, understandable pieces. More specifically, they usually have a language encoder, some mechanism to hold memory or to focus attention on a subset of the input, several layers of abstract, trainable neurons, and a language decoder. This probably isn’t so surprising, since it’s very similar to how humans communicate. We also have a language center that seems to handle the encoding and decoding and to manage the complex input and output organs we use for the actual communication. So, is that enough: encoding, decoding, memory and attention? To answer this question, I think we have to look more closely at how the training process works.
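The “focus attention on a subset of the input” piece is commonly implemented as scaled dot-product attention, the mechanism at the core of Transformer models. Here is a minimal NumPy sketch, assuming a single attention head and no learned projection matrices; real models add trainable projections, multiple heads, and masking, so treat this as an illustration of the idea rather than any particular model’s implementation.

```python
import numpy as np


def scaled_dot_product_attention(queries, keys, values):
    """Weight each value row by how well its key matches each query."""
    d_k = queries.shape[-1]
    # Similarity of every query to every key, scaled to keep the softmax tame.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax over the keys (subtract the row max for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value rows.
    return weights @ values, weights


rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 "words", each an 8-dimensional embedding
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(output.shape)          # (4, 8)
print(weights.sum(axis=1))   # each row sums to 1, up to floating point
```

Each row of `weights` says how much each word “looks at” every other word when building its output, which is the whole trick: a learned, input-dependent averaging.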

The beating heart of modern ML techniques is the process of training algorithms. It starts with defining a model (as above, the mathematical guts that take in a string of words and output another string) with a bunch of “parameters” (here, the contents of a bunch of matrices used in the model). Then we define a set of targets or objectives for this model; these are often input-output pairs, like [“the quick brown fox jumped”, “over the lazy dog”]. Next we define a “loss function,” which can be complex but basically expresses “how far the current model is from producing those right outputs.” Finally, we vary all those parameters (matrix elements) in the direction that most quickly reduces the loss function, until we can’t go any farther, and hopefully when you put “the quick brown fox” into the model, you get “jumped over…” out of the output. We then declare the model “trained” and it’s ready to go: we can freely put more word strings in and get more outputs out.
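Stripped of the language specifics, that define-model, define-loss, nudge-parameters loop fits in a few lines. This is a toy sketch under loud assumptions: a linear model with a single 3×1 parameter matrix and a mean-squared-error loss stand in for a language model and its real loss, and the input-output pairs are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "input-output pairs": inputs x and target outputs y = x @ true_W.
x = rng.normal(size=(100, 3))
true_W = np.array([[1.0], [-2.0], [0.5]])
y = x @ true_W

# The model's "parameters": one matrix, initialised to zeros.
W = np.zeros((3, 1))


def loss(W):
    """Mean squared error: how wrong the model currently is."""
    return np.mean((x @ W - y) ** 2)


learning_rate = 0.1
for _ in range(500):
    # Gradient of the mean squared error with respect to W.
    grad = 2 * x.T @ (x @ W - y) / len(x)
    # Nudge the parameters in the direction that lowers the loss fastest.
    W -= learning_rate * grad

print(loss(W))  # close to zero once training has converged
```

A language model has billions of parameters and a far more complicated loss, but the shape of the process is the same: compute how wrong you are, step downhill, repeat.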

To paraphrase one of my old physics profs, “I’ve swindled you a bit here, by leaving a few things out.” Model definition is a whole area of study, the loss function can be extremely complex, training can come in multiple stages, there are many ways to minimize loss, and so on. But none of that complexity really changes the truth of the process above. Essentially, to “do ML” we have to define a model, define what “good” looks like in terms of a bunch of examples (or, as we’ll see below, a bunch of outcomes that depend on the model operating in some environment), then shake the internal parameters of our model until it gets the best “goodness,” and call it a day. And that brings us to what I’ll call the “fundamental” split, and possibly, depending on how you look at it, the fundamental split between supervised ML and reinforcement-learning-powered AI: the training environment.

So, what I’ve described above is a fairly cursory description of the training environment for a supervised ML approach, which is what most language models use. Ultimately, models like GPT-3 are trained on a huge corpus of what humans have written. It’s gigantic, but you can go and get it yourself and, in principle, read it all (not that you’d especially want to). This is where most language models get their training: we use cut-up strings of human-written examples and define “good” as “the ability to say the next few things in this sequence.” Alternatively, some models are trained on Q&A pairs, generally from well-researched sets. A lot of caution is exercised here. For example, we always reserve a set of examples that don’t go into the training process (which the algorithm doesn’t get to “see” before it’s trained) to verify that the output is still good on things that are new to it, and to ensure it hasn’t somehow just memorized all the results.
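Here is a minimal sketch of both ideas: cutting text into (context, next word) training pairs, and holding some pairs back so the model never sees them during training. It assumes whitespace-separated words and a fixed three-word context purely for illustration; real language models work on subword tokens with far longer contexts.

```python
import random


def make_next_word_pairs(text: str, context_size: int = 3):
    """Cut text into (context, next word) training pairs."""
    words = text.split()
    return [
        (" ".join(words[i : i + context_size]), words[i + context_size])
        for i in range(len(words) - context_size)
    ]


def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle, then hold back a fraction of pairs the model never trains on."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


corpus = "the quick brown fox jumped over the lazy dog"
pairs = make_next_word_pairs(corpus)
train, test = train_test_split(pairs)
print(pairs[0])             # ('the quick brown', 'fox')
print(len(train), len(test))  # 5 1
```

The held-out pairs are what let us check that the model generalizes rather than simply recalling what it was shown.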

And now we arrive at the heart of the issue, from a mathematical perspective. With enough parameters (i.e., a complex enough model), you might imagine that we could essentially “memorize” all the possible dynamics of speech. In essence, we would have made an autocomplete-on-steroids that isn’t so much thinking through a problem as it is tuned to answer the question “what would a human probably say next?”
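The crudest possible “autocomplete” is a bigram model that only knows which word most often follows the previous one. This toy sketch is not how GPT-3 or LaMDA work internally (they condition on long contexts using billions of parameters); it just shows, at the smallest scale, how fluent-looking continuations can come from pure word statistics with no notion of meaning.

```python
from collections import Counter, defaultdict


def train_bigram(text: str):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def autocomplete(follows, word: str, length: int = 4):
    """Greedily append the most frequent next word; stop at unknown words."""
    output = [word]
    for _ in range(length):
        candidates = follows.get(output[-1])
        if not candidates:
            break
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)


corpus = "dogs make good pets and good dogs make good pets"
model = train_bigram(corpus)
print(autocomplete(model, "dogs"))  # dogs make good pets and
```

Scale the statistics up enormously and condition on much more context, and you get the “what would a human probably say next” machine described above.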

This brings us close to the “is a perfect simulacrum of a conscious being also a conscious being?” question, so rather than delve back into that, I’d rather talk about the sorts of tests we would need to run to figure out whether we’ve arrived at sufficient complexity.

A modern generative algorithm is generally “fed” a prompt: the example “input” text from which it then tries to guess the best corresponding output. With the GPT series of algorithms, sometimes this went very well, and other times the algorithm had obvious issues with context, or would connect superficially related things without clearly understanding that context. For example, if you ended a positive review of the University of Washington with “Go Dawgs!”, the algorithm might then expound on how dogs make good pets: capable of pulling from its “dog-related” set of internal dynamics, but unable to understand that “Go XYZ” is likely referring to a college mascot. At the risk of being too qualitative, it seemed to manage the “the most likely stuff surrounding this sentence has these sets of associated words” part of thought, but often struggled with the “does this actually relate to the thing we’re discussing?” part. Because of this, most “amazing examples of the new algorithm” articles tend to present highly curated sets of output, showcasing the dramatic “wins” and sweeping the “confident-sounding random ramblings” under the rug.

So this adds up to one of the first necessary tests for this kind of claim: broad access for many researchers. Ideally, you’d be able to talk to the algorithm, and so would I. We would both be able to try to stump it using examples that most humans would still understand, asking questions designed to tell the difference between actual context and local word likelihood coupled to grammatical rules, and see how it responds.

And what we’d expect, with that many researchers probing it, is for a “really conscious being” to create thought without explicit prompting. Honestly, with a background in this kind of input/output pairing, when I look at the specific LaMDA conversation I see a lot of purposeful “prompting” in the published discussion. I see the input trying to summarize and repackage the topic in a direction where the answer can just be “yes,” with a bit of a continuation.

In legal terms, this might be called “leading the witness,” and I don’t love being so harsh in my criticism, but the chief issue is that this is all we get. None of us “get to” go talk to the algorithm. We don’t know whether, for example, what’s been published is a handpicked subset of what was said, or whether (as frequently happens when generators are used in practice) the algorithm was run 10 times for each input and the researcher selected the “best” answer for each continued exchange. Essentially, we have no way of verifying any of this, and the field is absolutely full of examples where a researcher has been thoroughly fooled by their own creation. In short, general access is needed to answer these questions so that we don’t get “Clever Hans’ed” over and over (https://en.wikipedia.org/wiki/Clever_Hans).
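To see why “run it 10 times and keep the best” matters, here is a small simulation. It assumes, purely hypothetically, that each generated reply has a quality score drawn uniformly between 0 and 1; nothing here is measured from LaMDA. Keeping only the best of 10 replies makes the same generator look dramatically better.

```python
import random
import statistics


def sample_quality(rng: random.Random) -> float:
    """Hypothetical stand-in: quality score of one generated reply, uniform in [0, 1]."""
    return rng.random()


def average_published_quality(samples_per_prompt: int, prompts: int = 1000, seed: int = 0) -> float:
    """Average quality of what gets shown when only the best of N replies is kept."""
    rng = random.Random(seed)
    best_scores = [
        max(sample_quality(rng) for _ in range(samples_per_prompt))
        for _ in range(prompts)
    ]
    return statistics.mean(best_scores)


print(average_published_quality(1))   # roughly 0.5: a single, unfiltered reply
print(average_published_quality(10))  # roughly 0.91: the same generator, cherry-picked
```

For uniform scores the expected best of N is N/(N+1), so best-of-10 sits near 0.91 even though the typical reply is only around 0.5. A reader who sees only the transcript can’t tell the difference.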

Of course, this could be seen as somewhat bleak. Am I saying that, until we all get to come and say hi to the new lifeform, we can’t say it really exists? I guess I am, and I wish I weren’t so cynical about these outcomes. But bear in mind, this sort of thing comes up in the field all the time. When GPT-3 came out, one of the “fun” things to do with it was to make a chatbot that framed the input as one half of an interview, prefaced with “Albert Einstein” or “Marie Curie” or whoever you wanted to talk to, and then let the algorithm autocomplete what that (often dead) celebrity “replied.” And people… well, got into trouble sometimes. (https://nypost.com/…/grieving-man-uses-ai-site-to-chat…/)
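The interview trick is nothing more than careful prompt framing. Here is a minimal sketch of how such a prompt might be assembled; the wording and the `interview_prompt` helper are hypothetical, and no model is called. The resulting string would be handed to a text generator, which simply continues it.

```python
def interview_prompt(persona: str, question: str, history=()) -> str:
    """Frame the input as half of an interview so a generator 'answers' as the persona."""
    lines = [f"The following is an interview with {persona}.", ""]
    for past_question, past_answer in history:
        lines.append(f"Interviewer: {past_question}")
        lines.append(f"{persona}: {past_answer}")
    lines.append(f"Interviewer: {question}")
    # Ending on the persona's name invites the generator to continue in that voice.
    lines.append(f"{persona}:")
    return "\n".join(lines)


prompt = interview_prompt("Albert Einstein", "What do you think of quantum mechanics?")
print(prompt)
```

Whatever the generator writes after “Albert Einstein:” is just the most plausible continuation of that text, which is exactly why it can feel so uncannily like talking to someone.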

Thinking about it, though, even if I’m a hard sell in cases like this, I still believe we’re probably getting close to more general AI. The reason is that the description above isn’t the only way we can train a modern AI system. Remember that cryptic comment above, about “a bunch of outcomes that depend on the model operating in some environment”? That’s the other way.

Maybe the best way to describe this is by talking about AlphaGo (https://www.deepmind.com/res…/highlighted-research/alphago). When AlphaGo beat Lee Sedol, it blew all our minds, because ever since Deep Blue beat Kasparov in the 90s, we’d always been told the same story: “Sure, chess was a grand challenge, but this will never ever happen with Go, because the number of potential boards is near-infinite, and a computer large enough to hold them all basically cannot exist.”

Well, fast forward about 20 years and it turns out they were right. An algorithm couldn’t have done that, in that way. And it didn’t. Instead, a purpose-built neural approach was used: the board (19 x 19 points with 3 possible things on each point, so not all that big) was the input, a deep neural net did the processing, and the output was a grid of the same size with one location “lit up,” so to speak, as the chosen move. And the training question was simple: “What would a skilled human player do?” Just as discussed above, the first AlphaGo was basically a “human simulator,” trained on a gigantic number of games. But it also proved that the overall dynamics of Go could be contained in a model of reasonable size; we might think of the matrices within the trained model as a kind of “encoding” of the dynamics of the game. Was this evidence that independent “thought” was happening somewhere in the encoded guts of the process? No one was completely sure, but then someone (doubtless someone who disliked wrangling all the data and paying high cloud services bills) had an idea.
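The input and output shapes are easy to sketch. This follows the simplified 19 x 19 x 3 description above (the published AlphaGo networks used more input planes than this), and the network itself is replaced by a hypothetical stand-in grid of scores; the point is only how a board becomes numbers and how one output location gets “lit up” as a move.

```python
import numpy as np

BOARD_SIZE = 19
EMPTY, BLACK, WHITE = 0, 1, 2


def encode_board(board: np.ndarray) -> np.ndarray:
    """One-hot encode a 19x19 board of {EMPTY, BLACK, WHITE} into a 19x19x3 array."""
    return np.eye(3, dtype=np.float32)[board]


def pick_move(move_scores: np.ndarray, board: np.ndarray) -> tuple:
    """Choose the highest-scoring empty point from a 19x19 grid of network outputs."""
    legal_scores = np.where(board == EMPTY, move_scores, -np.inf)
    flat_index = int(np.argmax(legal_scores))
    return divmod(flat_index, BOARD_SIZE)  # (row, column)


board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=int)
board[3, 3] = BLACK
board[15, 15] = WHITE
planes = encode_board(board)
print(planes.shape)  # (19, 19, 3)

# Hypothetical stand-in for a trained network's output: one score per point.
scores = np.random.default_rng(0).normal(size=(BOARD_SIZE, BOARD_SIZE))
scores[3, 3] = 100.0  # a huge score on an occupied point is still ignored
print(pick_move(scores, board))
```

Everything interesting lives in the network that maps `planes` to `scores`; the encoding on either side is deliberately simple.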

“What if,” I imagine they thought, “we dispensed with the human simulator, and all the human data, and just… let two versions play each other?” Doing this eliminated the need for all that data, in exchange for changing the loss function (i.e., what “good performance” meant) to just winning. Not “doing what a skilled person would do,” but simply winning against another player. The training environment went from a static place where one input had basically one correct output to a dynamic battle royale where winning behaviors were “reinforced” over time, gradually producing a skilled player.
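The “just let two versions play each other and reward winning” idea can be shown on a game small enough to learn in seconds. This is not AlphaGo Zero’s method (which combined a deep network with Monte Carlo tree search); it’s a toy tabular Monte Carlo self-play learner for a simple Nim variant, where players alternately take 1 to 3 stones and whoever takes the last stone wins. Both sides share one value table, no human games are used, and the only signal is who won. The episode count, exploration rate, and learning rate are arbitrary assumptions.

```python
import random
from collections import defaultdict

MAX_PILE = 10
MOVES = (1, 2, 3)


def legal_moves(pile):
    return [m for m in MOVES if m <= pile]


def choose(q, pile, epsilon, rng):
    """Epsilon-greedy: usually play the best-known move, sometimes explore."""
    moves = legal_moves(pile)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(pile, m)])


def self_play_train(episodes=20000, epsilon=0.1, alpha=0.1, seed=0):
    """Both 'players' share one table and learn only from who wins."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (pile, move) -> estimated outcome for the mover
    for _ in range(episodes):
        pile = rng.randint(1, MAX_PILE)
        history = []  # (pile, move) for each turn, players alternating
        while pile > 0:
            move = choose(q, pile, epsilon, rng)
            history.append((pile, move))
            pile -= move
        # Whoever took the last stone won. Walk backwards, flipping the sign
        # each turn because consecutive moves belong to opposite players.
        reward = 1.0
        for state_action in reversed(history):
            q[state_action] += alpha * (reward - q[state_action])
            reward = -reward
    return q


def greedy_move(q, pile):
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])


q = self_play_train()
print({pile: greedy_move(q, pile) for pile in range(1, 8)})
```

If training behaves as expected, the greedy moves at 5, 6 and 7 stones should be 1, 2 and 3: each leaves the opponent a multiple of four, the known winning strategy for this variant, discovered here from win/loss alone.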

And it didn’t just work. To say it “worked” undersells the concept a little. It’s more accurate to say that it became better than the best player has ever been, and it did this by rejecting its humanity. In just over a month of training without human input, data, or examples, AlphaGo Zero became arguably unbeatable.

Genuinely, this family of algorithms amazed me as it was developed. They really are superintelligences, in the sense that they have some internal understanding that a) goes beyond what any human has and b) we don’t understand. We literally managed to make something that sailed past what we may be capable of. So far as I know, this didn’t require an internal simulation of the algorithm’s own “self,” but we could imagine a situation in which that was the best or most reasonable answer.

It isn’t just purpose-built game-playing AI that pulls this “emergent behavior” trick, either. As Károly of Two Minute Papers says, “hold onto your papers” for this one: https://www.youtube.com/watch?v=GdTBqBnqhaQ

Robots can develop language this way, along with communication, cooperation, and deception. Other results have shown the emergence of things like family units, cartels, grammar, and tool use.

So, what does this kind of “emergent performance” suggest? In my opinion, the answer to consciousness (if we really want to make it) is a combination of this sort of open-ended reinforcement learning with a model and environment complex enough that developing a sense of self becomes the best, most useful use of internal resources. Here, an internal model of the self and an internal model of the rest of the world, together with a sort of simulation “engine” living within the AI, would evolve to determine the best actions to take dynamically. In short, we would want to simulate an entire natural world, capable of creating actors with inputs and outputs, with things that are naturally to be avoided (like pain or death) and things like pleasure to be sought out. I believe a sufficiently complex simulated system like this, coupled to a sufficiently complex set of neurons, would produce that “sense of self as an internal algorithm driver.” So, at least in my thinking, the “trick” to consciousness is not to play “guess what a human does next,” but to create an environment in which the best survival trait is “have an internal model of myself.” Then we fire it up and let gradient descent and survival pressures dictate how that internal model operates.

In plain language, then, do I think that LaMDA has attained sentience? No. I think it’s an excellently trained model that does a superb job of responding to user input with realistic conversational output. It’s an algorithm that is very well tuned to respond the way a human would, given proper prompting.

But I do think that consciousness is attainable through emergent performance, and that true AI is still in our future.

Whether we’ll be ready for it when it arrives is a different question entirely.