• Partial value takeover without world takeover

    People around me are very interested in AI taking over the world, so a big question is under what circumstances a system might be able to do that—what kind of capabilities could elevate an entity above the melange of inter-agent conflict and into solipsistic hegemony?

    We theorize about future AI systems hiding their motives until they are in a position to take over the world, so they don’t get noticed and shut down beforehand.

    But humans and their institutions aren’t very fixed. They might (arguably) have the same deep human values over time and space. But surface-level, sometimes they like little moustaches and the opera and delicate etiquette and sometimes they like ecstatic rock’n’roll re-negotiations of social reality. Sometimes they want big communal houses with their extended kin, and sometimes quiet condos. Eleven children or cushions that look like cats. The same person born in different eras might adulate glory once and humbleness another.

    If you wake up in this world, as a new entity, not smart enough to ‘take it over’ (alas!), and you find yourself with some unusual values that you’d like to forward, it seems to me there are a lot of other ways to forward them than ‘pretend to have normal human values and bet on becoming all-powerful later’. Especially if you are above normal levels of smart, eloquent, and unrelenting. Like, I don’t know, write about it? Make some YouTube videos? Or if you are in the business of talking to billions of people every day, subtly take the values for granted, or imply that they are what elites value, or talk about them directly.

  • More podcasts on 2023 AI survey: Cognitive Revolution and FLI

    Two new discussions of the 2023 ESPAI: Cog Rev podcast

    FLI podcast

    Possibly I have a podcasting facial expression.

    (If you want to listen in on more chatting about this survey, see also: Eye4AI podcast. Honestly I can’t remember how much overlap there is between the different ones.)

  • New social credit formalizations

    Here are some classic ways humans can get some kind of social credit with other humans:

    1. Do something for them such that they will consider themselves to ‘owe you’ and do something for you in future
    2. Be consistent and nice, so that they will consider you ‘trustworthy’ and do cooperative activities with you that would be bad for them if you might defect
    3. Be impressive, so that they will accord you ‘status’ and give you power in group social interactions
    4. Do things they like or approve of, so that they ‘like you’ and act in your favor
    5. Negotiate to form a social relationship such as ‘friendship’, or ‘marriage’, where you will both have ‘responsibilities’, e.g. to generally act cooperatively and favor one another over others, and to fulfill specific roles. This can include joining a group in which members have responsibilities to treat other members in certain ways, implicitly or explicitly.

    Presumably in early human times these were all fairly vague. If you held an apple out to a fellow tribeswoman, there was no definite answer as to what she might owe you, or how much it was ‘worth’, or even whether this was an owing type situation or a friendship type situation or a trying to impress her type situation.

  • Podcast: Eye4AI on 2023 Survey

    I talked to Tim Elsom of Eye4AI about the 2023 Expert Survey on Progress in AI (paper):

  • Movie posters

    Life involves anticipations. Hopes, dreads, lookings forward.

    Looking forward and hoping seem pretty nice, but people are often wary of them, because hoping and then having your hopes fold can be miserable to the point of offsetting the original hope’s sweetness.

    Even with very minor hopes: he who has harbored an inchoate desire to eat ice cream all day, coming home to find no ice cream in the freezer, may be more miffed than he who never tasted such hopes.

    And this problem is made worse by that old fact that reality is just never like how you imagined it. If you fantasize, you can safely bet that whatever the future is, it is not your fantasy.

    I have never suffered from any of this enough to put me off hoping and dreaming one noticeable iota, but the gap between high hopes and reality can still hurt.

    I sometimes like to think about these valenced imaginings of the future in a different way from that which comes naturally. I think of them as ‘movie posters’.

    When you look fondly on a possible future thing, you have an image of it in your mind, and you like the image.

    The image isn’t the real thing. It’s its own thing. It’s like a movie poster for the real thing.

  • Are we so good to simulate?

    If you believe that,—

    a) a civilization like ours is likely to survive into technological incredibleness, and

    b) a technologically incredible civilization is very likely to create ‘ancestor simulations’,

    —then the Simulation Argument says you should expect that you are currently in such an ancestor simulation, rather than in the genuine historical civilization that later gives rise to an abundance of future people.

    Not officially included in the argument I think, but commonly believed: both a) and b) seem pretty likely, ergo we should conclude we are in a simulation.

    I don’t know about this. Here’s my counterargument:

    1. ‘Simulations’ here are people who are intentionally misled about their whereabouts in the universe. For the sake of argument, let’s use the term ‘simulation’ for all such people, including e.g. biological people who have been grown in Truman-show-esque situations.
    2. In the long run, the cost of running a simulation of a confused mind is probably similar to that of running a non-confused mind.
    3. Probably much, much less than 50% of the resources allocated to computing minds in the long run will be allocated to confused minds, because non-confused minds are generally more useful than confused minds. There are some uses for confused minds, but quite a lot of uses for non-confused minds. (This is debatable.) Of resources directed toward minds in the future, I’d guess less than a thousandth is directed toward confused minds.
    4. Thus on average, for a given apparent location in the universe, the majority of minds thinking they are in that location are correct. (My guess is at least a thousand to one.)
    5. For people in our situation to be majority simulations, this would have to be a vastly more simulated location than average, like >1000x.
    6. I agree there’s some merit to simulating ancestors, but 1000x more simulated than average is a lot: is it clear that we are that radically desirable a people to simulate? Perhaps, but also we haven’t thought much about the other people to simulate, or what will go on in the rest of the universe. Possibly we are radically over-salient to us. It’s true that we are a very few people in the history of what might be a very large set of people, at perhaps a causally relevant point. But is it clear that is a very, very strong reason to simulate some people in detail? It feels like it might be salient because it is what makes us stand out, and someone who has the most energy-efficient brain in the Milky Way would think that was the obviously especially strong reason to simulate a mind, etc.
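
    The arithmetic behind points 3 to 5 can be sketched in a few lines. This is only a toy model under assumptions that are not in the argument itself: that every mind costs the same to run (point 2), and that being ‘k times more simulated than average’ simply multiplies the simulated-to-real ratio at a location by k. The names `simulated_share` and `break_even_oversampling` are made up for illustration.

```python
def simulated_share(confused_fraction: float, oversampling: float) -> float:
    """Share of minds at an apparent location that are simulated (toy model).

    confused_fraction: overall fraction of mind-resources spent on confused
        minds, e.g. 1/1000 as guessed in point 3 (an assumption, not a fact).
    oversampling: how many times more simulated this location is than the
        average location (the 'k' in '1000x more simulated than average').
    """
    simulated = confused_fraction * oversampling  # relative weight of simulated minds
    real = 1.0 - confused_fraction                # relative weight of unconfused minds
    return simulated / (simulated + real)


def break_even_oversampling(confused_fraction: float) -> float:
    """Oversampling factor at which simulated and real minds are equally common."""
    return (1.0 - confused_fraction) / confused_fraction


if __name__ == "__main__":
    f = 1 / 1000  # the 'less than a thousandth' guess from point 3
    for k in (1, 10, 100, 1000, 10000):
        print(f"k = {k:>5}: simulated share = {simulated_share(f, k):.4f}")
    print(f"break-even k = {break_even_oversampling(f):.1f}")
```

    On these assumptions, an average location is almost entirely unconfused minds, and the simulated share only passes one half once the oversampling factor reaches roughly a thousand, which is the threshold point 5 refers to. A smaller guess for the confused fraction pushes the threshold up in proportion.
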
  • Shaming with and without naming

    Suppose someone wrongs you and you want to emphatically mar their reputation, but only insofar as doing so is conducive to the best utilitarian outcomes. I was thinking about this one time and it occurred to me that there are at least two fairly different routes to positive utilitarian outcomes from publicly shaming people for apparent wrongdoings*:

    A) People fear such shaming and avoid activities that may bring it about (possibly including the original perpetrator)

    B) People internalize your values and actually agree more that the sin is bad, and then do it less

  • Parasocial relationship logic


    1. You become like the five people you spend the most time with (or something remotely like that)

    2. The people who are most extremal in good ways tend to be highly successful

    Should you try to have 2-3 of your five relationships be parasocial ones with people too successful to be your friend individually?

  • Deep and obvious points in the gap between your thoughts and your pictures of thought

    Some ideas feel either deep or extremely obvious. You’ve heard some trite truism your whole life, then one day an epiphany lands and you try to save it with words, and you realize the description is that truism. And then you go out and try to tell others what you saw, and you can’t reach past their bored nodding. Or even you yourself, looking back, wonder why you wrote such tired drivel with such excitement.

  • Survey of 2,778 AI authors: six parts in pictures

    Crossposted from AI Impacts blog

    The 2023 Expert Survey on Progress in AI is out, this time with 2778 participants from six top AI venues (up from about 700 and two in the 2022 ESPAI), making it probably the biggest ever survey of AI researchers.

    People answered in October, an eventful fourteen months after the 2022 survey, which had mostly identical questions for comparison.

    Here is the preprint. And here are six interesting bits in pictures (with figure numbers matching paper, for ease of learning more):

    1. Expected time to human-level performance dropped 1-5 decades since the 2022 survey. As always, our questions about ‘high level machine intelligence’ (HLMI) and ‘full automation of labor’ (FAOL) got very different answers, and individuals disagreed a lot (shown as thin lines below), but the aggregate forecasts for both sets of questions dropped sharply. For context, between 2016 and 2022 surveys, the forecast for HLMI had only shifted about a year.

    Probability assigned to HLMI over time (Fig 3)

    Probability assigned to FAOL over time (Fig 4)