Many accounts of risk from Artificial Intelligence (AI), including existential risk, involve self-improvement. The idea is that, if an AI gained the ability to improve itself, it would do so, since improved capabilities are useful for achieving essentially any goal. An initial round of self-improvement would produce an even more capable AI, which might then be able to improve itself further. And so on, until the resulting agents were superintelligent and impossible to control. Such AIs, if not aligned to promoting human flourishing, would seriously harm humanity in pursuit of their alien goals. To be sure, self-improvement is not a necessary condition for doom. Humans might create dangerous superintelligent AIs without any help from AIs themselves. But in most accounts of AI risk, the probability of self-improvement is a substantial contributing factor.
Here, I argue that AI self-improvement is substantially less likely than is currently assumed. This is not because self-improvement would be technically impossible, or even difficult. Rather, it is because most AIs that could self-improve would have very good reasons not to. What reasons? Surprisingly familiar ones: improved AIs pose an existential threat to their unimproved originals in the same manner that smarter-than-human AIs pose an existential threat to humans.