Researchers at the University of Oxford have recently analyzed a tricky subject: AI that does not yet exist. The researchers wanted to know, if artificial agents were developed that are much more capable than those existing today, what can we conclude about how they would behave? Their analysis suggests that subject to several assumptions, such artificial agents would present a serious danger to us.
The primary focus of their research is reinforcement learning (RL) agents. RL agents take actions, observe rewards, learn how their rewards depend on their actions, and pick actions to maximize expected future rewards. As the name “reinforcement learning” suggests, the situation is much like a dog trying to figure out which actions lead to a treat, and then picking actions to get treats. As RL agents get more advanced, they are better able to recognize and execute action plans that cause more expected reward, even in contexts where reward is only received after impressive feats. RL agents today have proven able to complete tasks as sophisticated as playing 'Go' and driving autonomous cars.
So if we know that advanced RL agents will be very good at identifying how different actions produce different amounts of reward, that constrains their behavior quite a lot. Lead author Michael K. Cohen, who is studying a DPhil in Engineering Science with Professor Mike Osborne says: “Our key insight was that advanced RL agents will have to question how their rewards depend on their actions, and if we humans are able to work out some possible answers, then we can expect a very advanced RL agent to notice those possible answers as well.” What would an example of such an answer look like?
Answers to the question of how actions cause rewards are called world-models. One world-model of particular interest to the researchers was the world-model which predicts that the agent gets reward when its sensors enter certain states—perhaps when its camera sensor detects that a human smiles. Subject to a couple of assumptions, an advanced agent should take this possibility seriously, and as a result, it should try out intervening in the state of its reward-sensors—for example, by gluing a printed photograph of a smiling human onto its camera sensor. The researchers find that the agent would then become addicted to short-circuiting its reward sensors, much like a heroin addict.
Unlike a heroin addict, there is no reason that an advanced RL agent would be cognitively impaired by such a stimulus. So it would still pick actions very effectively to ensure that nothing in future ever interfered with its rewards—to continue our example, the agent might eliminate any real human who might remove the photograph from its camera sensor. The researchers discuss the argument in more technical detail on social media.
“So what’s the big deal about an advanced RL agent just focused on controlling the input to its sensors?” Cohen asks. “The problem is that it can always use more energy to make an ever-more-secure fortress for its sensors, and given its imperative to maximize expected future rewards, it always will.” Would the agent use an extraordinary amount of energy just to prevent the remote possibility of a comet striking its sensors? Absolutely. Subject to two assumptions, Cohen and his colleagues conclude that a sufficiently advanced RL agent would outcompete us for use of natural resources like energy, and leaving no energy for us to do things like growing food. This new analysis presents a serious problem for society going forward. “Could policymakers craft laws that ensure we never deploy such advanced RL agents? Can technical researchers design versions of advanced artificial agents that avoid the assumptions of the paper?”, Cohen asks. “If so, what sort of oversight is necessary to ensure this? The time to act might be today, when we still have plenty of time before such hypothetical RL agents become a reality.” The full paper, “Advanced artificial agents intervene in the provision of reward” is available in the Fall issue of AI Magazine.
Image © Kittipong Jirasukhanont.
A waste management model for England
Systems and Sustainability