Crossposted from The AI Impacts blog.
This is going to be a list of holes I see in the basic argument for existential risk from superhuman AI systems1.
To start, here’s an outline of what I take to be the basic case2:
I. If superhuman AI systems are built, any given system is likely to be ‘goal-directed’
Reasons to expect this:
- Goal-directed behavior is likely to be valuable, e.g. economically.
- Goal-directed entities may tend to arise from machine learning training processes not intending to create them (at least via the methods that are likely to be used).
- ‘Coherence arguments’ may imply that systems with some goal-directedness will become more strongly goal-directed over time.
II. If goal-directed superhuman AI systems are built, their desired outcomes will probably be about as bad as an empty universe by human lights
Reasons to expect this:
- Finding useful goals that aren’t extinction-level bad appears to be hard: we don’t have a way to usefully point at human goals, and divergences from human goals seem likely to produce goals that are in intense conflict with human goals, due to a) most goals producing convergent incentives for controlling everything, and b) value being ‘fragile’, such that an entity with ‘similar’ values will generally create a future of virtually no value.
- Finding goals that are extinction-level bad and temporarily useful appears to be easy: for example, advanced AI with the sole objective ‘maximize company revenue’ might profit said company for a time before gathering the influence and wherewithal to pursue the goal in ways that blatantly harm society.
- Even if humanity found acceptable goals, giving a powerful AI system any specific goals appears to be hard. We don’t know of any procedure to do it, and we have theoretical reasons to expect that AI systems produced through machine learning training will generally end up with goals other than those they were trained according to. Randomly aberrant goals resulting are probably extinction-level bad for reasons described in II.1 above.
III. If most goal-directed superhuman AI systems have bad goals, the future will very likely be bad
That is, a set of ill-motivated goal-directed superhuman AI systems, of a scale likely to occur, would be capable of taking control over the future from humans. This is supported by at least one of the following being true:
- Superhuman AI would destroy humanity rapidly. This may be via ultra-powerful capabilities at e.g. technology design and strategic scheming, or through gaining such powers in an ‘intelligence explosion‘ (self-improvement cycle). Either of those things may happen either through exceptional heights of intelligence being reached or through highly destructive ideas being available to minds only mildly beyond our own.
- Superhuman AI would gradually come to control the future via accruing power and resources. Power and resources would be more available to the AI system(s) than to humans on average, because of the AI having far greater intelligence.
Below is a list of gaps in the above, as I see it, and counterarguments. A ‘gap’ is not necessarily unfillable, and may have been filled in any of the countless writings on this topic that I haven’t read. I might even think that a given one can probably be filled. I just don’t know what goes in it.
This blog post is an attempt to run various arguments by you all on the way to making pages on AI Impacts about arguments for AI risk and corresponding counterarguments. At some point in that process I hope to also read others’ arguments, but this is not that day. So what you have here is a bunch of arguments that occur to me, not an exhaustive literature review.
I’ve been making predictions in a spreadsheet for the last four years, and I recently got to a thousand resolved predictions. Some observations:
I’m surprisingly well calibrated for things that mostly aren’t my own behavior1. Here’s the calibration curve for 630 resolved predictions in that class:
I have a column where I write context on some predictions, which is usually that they are my own work goal, or otherwise a prediction about how I will behave. This graph excludes those, but keeps in some own-behavior prediction which I didn’t flag for whatever reason.) ↩
When I have an overwhelming number of things to do, and insufficient native urge to do them, I often arrange them into a kind of game for myself. The nature and appeal of this game has been relatively stable for about a year, after many years of evolution, so this seems like a reasonable time to share it. I also play it when I just want to structure my day and am in the mood for it. I currently play something like two or three times a week.
The basic idea is to lay out the tasks in time a bit like obstacles in a platformer or steps in Dance Dance Revolution, then race through the obstacle course grabbing them under consistently high-but-doable time pressure.
Here’s how to play:
- Draw a grid with as many rows as there are remaining hours in your hoped for productive day, and ~3 columns. Each box stands for a particular ~20 minute period (I sometimes play with 15m or 30m periods.)
- Lay out the gameboard: break the stuff you want to do into appropriate units, henceforth ‘items’. An item should fit comfortably in the length of a box, and it should be easy enough to verify completion. (This can be achieved through house rules such as ‘do x a tiny bit = do it until I have a sense that an appropriate tiny bit has been done’ as long as you are happy applying them). Space items out a decent amount so that the whole course is clearly feasible. Include everything you want to do in the day, including nice or relaxing things, or break activities. Drinks, snacks, tiny bouts of exercise, looking at news sites for 5 minutes, etc. Design the track thoughtfully, with hard bouts followed by relief before the next hard bout.
- To play, start in the first box, then move through the boxes according to the time of day. The goal in playing is to collect as many items as you can, as you are forced along the track by the passage of time. You can collect an item by doing the task in or before you get to the box it is in. If it isn’t done by the end of the box, it gets left behind. However if you clear any box entirely, you get to move one item anywhere on the gameboard. So you can rescue something from the past, or rearrange the future to make it more feasible, or if everything is perfect, you can add an entirely new item somewhere.
You can now read or subscribe to this blog via world spirit sock stack, a Substack mirror of this site. I expect to see comments at wsss similarly often to wssp (with both being more often than at various other places this crossposts, e.g. LessWrong).
The main topics were the survey of ML folk I recently ran, and my thoughts on moving more slowly on potentially world-threatening AI research (which is to say, AI research in general, according to the median surveyed ML researcher…). I also bet him a thousand dollars to his hundred that AI would not make blogging way more efficient in two years, if I recall. (I forget the exact terms, and there’s no way I’m listening to myself talk for that long to find out. If anyone else learns, I’m curious what I agreed to.)
For completeness of podcast reporting: I forgot to mention that I also talked to Daniel Filan on AXRP, like a year ago. In other old news, I am opposed to the vibe of time-sensitivity often implicit in the public conversation.
- If you write a question that seems clear, there’s an unbelievably high chance that any given reader will misunderstand it. (Possibly this applies to things that aren’t survey questions also, but that’s a problem for another time.)
- A better way to find out if your questions are clear is to repeatedly take a single individual person, and sit down with them, and ask them to take your survey while narrating the process: reading the questions aloud, telling you what they think the question is asking, explaining their thought process in answering it. If you do this repeatedly with different people until some are not confused at all, the questions are probably clear.
- If you ask people very similar questions in different sounding ways, you can get very different answers (possibly related to the above, though that’s not obviously the main thing going on).
Crossposted from AI Impacts
AI Impacts just finished collecting data from a new survey of ML researchers, as similar to the 2016 one as practical, aside from a couple of new questions that seemed too interesting not to add.Continue reading →
Spoiler warning: spoilers for The Passenger by Lisa Lutz, and In the Cart, by Anton Chekhov.
I took up this book looking for a page-turner. It was, but hours before the end I thought its main contribution to my mental life would be the visceral knowledge that page-turners can be undelicious. It felt cold, and getting into its world felt bad. The protagonist slunk around dark and uncomfortable places, killing people, scheming harshly, perceiving low beams as dangers to the heads of tall men, and that sort of thing. With some amount of fretting about what she was becoming. I wanted to turn the pages, but I also kind of wanted it to end, and for me to read something more squarely enjoyable next time.Continue reading →
- What are you looking forward to this week?
- What do you think of as the most important thing going on in the world right now?
- If you had a spare half hour right now, what would you do with it?
- What is something you changed your mind about recently?
- What in life is more important than other people realize?
- If someone gave you $5 right now, what would you do with it?
- Who is someone you think of as a hero?
- Are you paying attention to the US election?
- What was the biggest news story this year?
I don’t recall any notable constraints other than the location requirement, but I barely remember doing this.
What does it mean to be “groupstruck”? How does groupstruck-ness differ from the bystander effect, normalcy bias, and other related cognitive biases? How do we break people out of being groupstruck? What does it mean to be a “bounded” person? How can we build up better decision-making heuristics? What sorts of decisions do people usually not quantify but should (and vice versa)? How can we make rational relationship decisions without coming across as “calculating” or cold? How does anthropic reasoning affect our hypotheses about the nature of the universe and life within it (i.e., the Fermi paradox, the simulation hypothesis, etc.)?
You can listen or read the transcript here.
EVERYTHING — WORLDLY POSITIONS — METEUPHORIC
Previous << Page: 1 of 17 >> Next