Utility Function Security In AI Agents

Roman Yampolskiy recently published a new article in the Journal of Experimental & Theoretical Artificial Intelligence that touches on themes from FAI research, including “counterfeit utility”, literalness, and wireheading.

From the abstract: “The notion of ‘wireheading’, or direct reward centre stimulation of the brain, is a well known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants.”

“Overall, we conclude that wireheading in rational self improving optimisers above a certain capacity remains an unsolved problem despite opinion of many that such machines will choose not to wirehead. A relevant issue of literalness in goal setting also remains largely unsolved and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction.”

