Self-Referential Problems in Self-Modifying AGI

MIRI recently released a working paper by Benja Fallenstein and Nate Soares that highlights several self-referential paradoxes that self-modifying AGIs could run into. These issues arise even for future AGIs that use the most advanced framework for reasoning about their environment that any AGI researcher has proposed so far. The authors also show that even more fundamental problems hobble the most naive way an AGI could reason about its environment (i.e., standard reinforcement learning).

From the abstract:

By considering agents to be a part of their environment, Orseau and Ring’s space-time embedded intelligence is a better fit for the real world than the traditional agent framework. However, a self-modifying AGI that sees future versions of itself as an ordinary part of the environment may run into problems of self-reference. We show that in one particular model based on formal logic, naive approaches either lead to incorrect reasoning that allows an agent to put off an important task forever (the procrastination paradox), or fail to allow the agent to justify even obviously safe rewrites (the Löbian obstacle). We argue that these problems have relevance beyond our particular formalism, and discuss partial solutions.
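
As background on the Löbian obstacle mentioned above, here is the standard statement of Löb's theorem. This is textbook logic rather than the paper's own formalism, and it assumes T is a recursively axiomatized theory extending Peano Arithmetic, with \Box_T denoting its standard provability predicate.

```latex
% Löb's theorem (standard statement; background, not the paper's formalism).
% Assumption: T extends Peano Arithmetic, \Box_T is T's provability predicate.
\[
  T \vdash \Box_T\varphi \rightarrow \varphi
  \quad\Longrightarrow\quad
  T \vdash \varphi
\]
% Instantiating \varphi := \bot, where \Box_T\bot \rightarrow \bot
% is equivalent to \neg\Box_T\bot, i.e. \mathrm{Con}(T):
\[
  T \vdash \Box_T\bot \rightarrow \bot
  \quad\Longrightarrow\quad
  T \vdash \bot
\]
```

Taking φ to be ⊥ shows that a consistent T cannot prove the soundness schema □_T φ → φ for every φ. Roughly, then, an agent reasoning in T cannot conclude in general that whatever a successor reasoning in the same T proves is actually true, which is the kind of trust it would need to approve a rewrite.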
