Chopsticks.
Pacing—walking repeatedly over the same ground—often feels ineffably good while I’m doing it, but then I forget about it for ages, so I thought I’d write about it here.
I don’t mean just going for an inefficient walk. It is somehow different to just step slowly in a circle around the same room for a long time, or up and down a passageway.
I realized it was hard to peruse past Worldly Positions posts without logging in to Tumblr, which seemed pretty bad. So I followed Substack’s instructions to import the archives into world spirit sock stack. And it worked pretty well, except that SUBSTACK ALSO PUBLISHED MY UNPUBLISHED WORLDLY POSITIONS DRAFTS! What on Earth? That’s so bad. Did I misunderstand what happened somehow in my rush to unpublish them? Maybe. But they definitely had ‘unpublish’ buttons, so that’s pretty incriminating.
I liked Ben Kuhn’s ‘Staring into the abyss as a core life skill’.
I’d summarize it as:
- If you are making a major error (professionally, romantically, religiously, etc.), it can be hard to look at that fact and correct it.
- However, it’s super important. Evidence: successful people do this well.
This seems pretty plausible to me.
(He has a lot of concrete examples, which are probably pretty helpful for internalizing this.)
His suggestions for how to do better helped me a bit, but not that much, so I made up my own additional prompts for finding abysses I should consider staring into, which worked relatively well for me:
- If you were currently making a big mistake, what would it be?
- What are some things that would be hard to acknowledge, if they were true?
- Looking back on this time from five years hence, what do you think you’ll wish you changed earlier?
- If you were forced to quit something, what do you want it to be?
- (Variant on the first prompt:) If you were currently making a big mistake that would be gut-wrenching to learn was a mistake, what would it be?
(Crossposted from AI Impacts Blog)
Averting doom by not building the doom machine
If you fear that someone will build a machine that will seize control of the world and annihilate humanity, then one kind of response is to try to build further machines that will seize control of the world even earlier without destroying it, forestalling the ruinous machine’s conquest. An alternative or complementary kind of response is to try to avert such machines being built at all, at least while the degree of their apocalyptic tendencies is ambiguous.
The latter approach seems to me like the kind of basic and obvious thing worthy of at least consideration, and, also in its favor, it fits nicely in the genre ‘stuff that it isn’t that hard to imagine happening in the real world’. Yet my impression is that for people worried about extinction risk from artificial intelligence, strategies under the heading ‘actively slow down AI progress’ have historically been dismissed and ignored (though ‘don’t actively speed up AI progress’ is popular).
The conversation near me over the years has felt a bit like this:
Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.
Others: wow that sounds extremely ambitious
Some people: yeah but it’s very important and also we are extremely smart so idk it could work
[Work on it for a decade and a half]
Some people: ok that’s pretty hard, we give up
Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI?
Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional
Crossposted from The AI Impacts blog.
This is going to be a list of holes I see in the basic argument for existential risk from superhuman AI systems.
To start, here’s an outline of what I take to be the basic case:
I. If superhuman AI systems are built, any given system is likely to be ‘goal-directed’
Reasons to expect this:
- Goal-directed behavior is likely to be valuable, e.g. economically.
- Goal-directed entities may tend to arise from machine learning training processes not intending to create them (at least via the methods that are likely to be used).
- ‘Coherence arguments’ may imply that systems with some goal-directedness will become more strongly goal-directed over time.
II. If goal-directed superhuman AI systems are built, their desired outcomes will probably be about as bad as an empty universe by human lights
Reasons to expect this:
- Finding useful goals that aren’t extinction-level bad appears to be hard: we don’t have a way to usefully point at human goals, and divergences from human goals seem likely to produce goals that are in intense conflict with human goals, due to a) most goals producing convergent incentives for controlling everything, and b) value being ‘fragile’, such that an entity with ‘similar’ values will generally create a future of virtually no value.
- Finding goals that are extinction-level bad and temporarily useful appears to be easy: for example, advanced AI with the sole objective ‘maximize company revenue’ might profit said company for a time before gathering the influence and wherewithal to pursue the goal in ways that blatantly harm society.
- Even if humanity found acceptable goals, giving a powerful AI system any specific goals appears to be hard. We don’t know of any procedure to do it, and we have theoretical reasons to expect that AI systems produced through machine learning training will generally end up with goals other than those they were trained according to. The resulting randomly aberrant goals are probably extinction-level bad for reasons described in II.1 above.
III. If most goal-directed superhuman AI systems have bad goals, the future will very likely be bad
That is, a set of ill-motivated goal-directed superhuman AI systems, of a scale likely to occur, would be capable of taking control over the future from humans. This is supported by at least one of the following being true:
- Superhuman AI would destroy humanity rapidly. This may be via ultra-powerful capabilities at e.g. technology design and strategic scheming, or through gaining such powers in an ‘intelligence explosion’ (self-improvement cycle). Either of those things may happen either through exceptional heights of intelligence being reached or through highly destructive ideas being available to minds only mildly beyond our own.
- Superhuman AI would gradually come to control the future via accruing power and resources. Power and resources would be more available to the AI system(s) than to humans on average, because of the AI having far greater intelligence.
Below is a list of gaps in the above, as I see it, and counterarguments. A ‘gap’ is not necessarily unfillable, and may have been filled in any of the countless writings on this topic that I haven’t read. I might even think that a given one can probably be filled. I just don’t know what goes in it.
This blog post is an attempt to run various arguments by you all on the way to making pages on AI Impacts about arguments for AI risk and corresponding counterarguments. At some point in that process I hope to also read others’ arguments, but this is not that day. So what you have here is a bunch of arguments that occur to me, not an exhaustive literature review.
I’ve been making predictions in a spreadsheet for the last four years, and I recently got to a thousand resolved predictions. Some observations:
I’m surprisingly well calibrated for things that mostly aren’t my own behavior. Here’s the calibration curve for 630 resolved predictions in that class:
(I have a column where I write context on some predictions, which is usually that they are my own work goal, or otherwise a prediction about how I will behave. This graph excludes those, but keeps in some own-behavior predictions which I didn’t flag for whatever reason.)
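For anyone wanting to make a similar graph from their own spreadsheet, here is a rough sketch of how a calibration curve can be computed. It is not the method used for the graph above: the function name, the equal-width binning, and the example data are all assumptions made up for illustration.

```python
from collections import defaultdict


def calibration_curve(predictions, n_bins=10):
    """Group (stated_probability, came_true) pairs into equal-width bins.

    Returns one (mean_stated_probability, observed_frequency, count)
    tuple per non-empty bin, in order of increasing probability.
    """
    bins = defaultdict(list)
    for prob, came_true in predictions:
        # Clamp so that a probability of exactly 1.0 lands in the top bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((prob, came_true))

    curve = []
    for index in sorted(bins):
        entries = bins[index]
        mean_prob = sum(p for p, _ in entries) / len(entries)
        frequency = sum(1 for _, hit in entries if hit) / len(entries)
        curve.append((mean_prob, frequency, len(entries)))
    return curve


# Hypothetical data: five predictions made at 70% confidence, four came true.
example = [(0.7, True), (0.7, True), (0.7, True), (0.7, False), (0.7, True)]
for mean_prob, frequency, count in calibration_curve(example):
    print(f"stated {mean_prob:.2f}, observed {frequency:.2f}, n={count}")
```

Being well calibrated means the observed frequency in each bin sits close to the stated probability, so the points fall near the diagonal when plotted.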
When I have an overwhelming number of things to do, and insufficient native urge to do them, I often arrange them into a kind of game for myself. The nature and appeal of this game has been relatively stable for about a year, after many years of evolution, so this seems like a reasonable time to share it. I also play it when I just want to structure my day and am in the mood for it. I currently play something like two or three times a week.
The basic idea is to lay out the tasks in time a bit like obstacles in a platformer or steps in Dance Dance Revolution, then race through the obstacle course grabbing them under consistently high-but-doable time pressure.
Here’s how to play:
- Draw a grid with as many rows as there are remaining hours in your hoped-for productive day, and ~3 columns. Each box stands for a particular ~20 minute period. (I sometimes play with 15m or 30m periods.)
- Lay out the gameboard: break the stuff you want to do into appropriate units, henceforth ‘items’. An item should fit comfortably in the length of a box, and it should be easy enough to verify completion. (This can be achieved through house rules such as ‘do x a tiny bit = do it until I have a sense that an appropriate tiny bit has been done’ as long as you are happy applying them). Space items out a decent amount so that the whole course is clearly feasible. Include everything you want to do in the day, including nice or relaxing things, or break activities. Drinks, snacks, tiny bouts of exercise, looking at news sites for 5 minutes, etc. Design the track thoughtfully, with hard bouts followed by relief before the next hard bout.
- To play, start in the first box, then move through the boxes according to the time of day. The goal in playing is to collect as many items as you can, as you are forced along the track by the passage of time. You collect an item by doing its task during the box it is in, or earlier. If it isn’t done by the end of the box, it gets left behind. However, if you clear any box entirely, you get to move one item anywhere on the gameboard. So you can rescue something from the past, or rearrange the future to make it more feasible, or if everything is perfect, you can add an entirely new item somewhere.
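The rules above can be sketched in code, for anyone who thinks better that way. This is only an illustration: the `Gameboard` class, its method names, and the example items are all made up, and it assumes that an empty box does not count as ‘cleared’, which the rules don’t actually specify.

```python
from dataclasses import dataclass, field


@dataclass
class Gameboard:
    boxes: list                      # boxes[i] = item names placed in time box i
    collected: list = field(default_factory=list)
    left_behind: list = field(default_factory=list)
    free_moves: int = 0

    def close_box(self, index, done):
        """End time box `index`; `done` is the set of items finished so far."""
        items = self.boxes[index]
        missed = [item for item in items if item not in done]
        self.collected.extend(item for item in items if item in done)
        self.left_behind.extend(missed)
        self.boxes[index] = []
        # Assumption: an empty box does not count as "cleared".
        if items and not missed:
            self.free_moves += 1

    def move_item(self, item, to_index):
        """Spend a free move to place `item` in box `to_index`."""
        if self.free_moves < 1:
            raise ValueError("no free moves available")
        if item in self.left_behind:
            self.left_behind.remove(item)   # rescue from the past
        for box in self.boxes:
            if item in box:
                box.remove(item)            # rearrange the future
        # Place the item; a brand-new item simply gets added here.
        self.boxes[to_index].append(item)
        self.free_moves -= 1


# Hypothetical three-box course.
board = Gameboard(boxes=[["email"], ["report", "stretch"], []])
board.close_box(0, done={"email"})             # cleared: earns a free move
board.close_box(1, done={"email", "stretch"})  # "report" is left behind
board.move_item("report", 2)                   # rescue it into the last box
```

Passing the full set of finished items to `close_box` is what lets an item count as collected when it was done ahead of its box.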
You can now read or subscribe to this blog via world spirit sock stack, a Substack mirror of this site. I expect to see comments at wsss about as often as at wssp (and both more often than at the various other places this blog is crossposted to, e.g. LessWrong).
The main topics were the survey of ML folk I recently ran, and my thoughts on moving more slowly on potentially world-threatening AI research (which is to say, AI research in general, according to the median surveyed ML researcher…). I also bet him a thousand dollars to his hundred that AI would not make blogging way more efficient in two years, if I recall. (I forget the exact terms, and there’s no way I’m listening to myself talk for that long to find out. If anyone else learns, I’m curious what I agreed to.)
For completeness of podcast reporting: I forgot to mention that I also talked to Daniel Filan on AXRP, like a year ago. In other old news, I am opposed to the vibe of time-sensitivity often implicit in the public conversation.