Human preferences are PAC-learnable in polynomial time from gathered example data. The only gotchas are that:
1) The result only holds if human values are not pathologically complex (i.e., not branching more quickly in option space than the set of physically existing things). This holds for consistent preference sets and for the preference orderings of rational / reductive agents.
2) The learning algorithm only guarantees performance “close” to the true underlying preferences, i.e., within an error ε that shrinks as the number of training examples grows.
It looks like academia may have accidentally contributed a small amount of work toward FAI problems like DWIM (“Do What I Mean”), and toward larger classes of value-aggregation questions like CEV, by bounding what should and shouldn’t be possible when learning preferences automatically from training examples.
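To make the claim concrete, here is a minimal sketch of PAC-learning a preference ordering from pairwise comparisons. Everything in it is an illustrative assumption rather than anything from the paper: the “true” preferences are a fixed linear utility over a handful of option features (the non-pathological case from point 1), comparisons are drawn i.i.d., and the learner is a plain perceptron whose held-out disagreement with the true ordering shrinks as the sample size grows (the “close, not exact” guarantee from point 2).

```python
# Illustrative sketch only: a linear "true" utility, i.i.d. comparison
# pairs, and a perceptron learner. None of these choices come from the
# paper under discussion; they just make the PAC story concrete.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                  # number of option features (arbitrary)
true_w = rng.normal(size=d)            # hidden "true" utility weights

def sample_pairs(n):
    """Draw n i.i.d. option pairs; label +1 if the first is preferred."""
    a, b = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    x = a - b                          # encode a comparison as a difference vector
    y = np.sign(x @ true_w)
    return x, y

def learn(x, y, epochs=50):
    """Plain perceptron: a polynomial-time learner for this linear class."""
    w = np.zeros(d)
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            if np.sign(xi @ w) != yi:
                w += yi * xi
    return w

x_test, y_test = sample_pairs(5000)
for n in (50, 500, 5000):              # held-out error should shrink with n
    w_hat = learn(*sample_pairs(n))
    err = np.mean(np.sign(x_test @ w_hat) != y_test)
    print(f"n={n:5d}  disagreement with true preferences: {err:.3f}")
```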
2 Responses to “The learnability of voting rules”
January 19
Eliezer Yudkowsky: Only if your training examples are drawn from the same distribution as the test cases. That’s the enormous gotcha with respect to any attempt to use statistical guarantees on AI behavior. All your training examples are drawn from when the AI was less powerful than when it was mature. An AI that becomes much smarter may encounter questions about preference that aren’t like any questions about preference our smaller brains thought to provide as a training example in advance. So we either have to compromise the sovereignty of an extremely powerful AI, which leads to moral hazard, or we need to obtain a reliable abstract assurance about performance in that condition without checking any samples of it. Both alternatives should make any sane person nervous.
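This distributional gotcha is easy to exhibit in the same toy setting. Below is a hypothetical sketch (the satiation-point utility and all ranges are invented for illustration) in which a learner trained on a narrow “small-world” range of options generalizes confidently, and wrongly, outside it.

```python
# Toy illustration of the i.i.d. gotcha: the true preference is invented,
# and deliberately non-monotone outside the training range. All numbers
# here are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def true_pref(a, b):
    """Hidden preference: more is better only up to a satiation point at 3."""
    u = lambda x: -(x - 3.0) ** 2      # utility peaks at 3, falls off after
    return np.sign(u(a) - u(b))

def pairs(n, lo, hi):
    a, b = rng.uniform(lo, hi, n), rng.uniform(lo, hi, n)
    return a, b, true_pref(a, b)

# Training distribution: options in [0, 2], where utility is monotone.
a, b, y = pairs(2000, 0.0, 2.0)
w = np.sign(np.mean(y * (a - b)))      # learned rule: "bigger is better"

def accuracy(a, b, y):
    return np.mean(np.sign(w * (a - b)) == y)

print("on-distribution  [0, 2]:", accuracy(*pairs(5000, 0.0, 2.0)))
print("off-distribution [4, 8]:", accuracy(*pairs(5000, 4.0, 8.0)))
```

On the training range, “bigger is better” is exactly right; on the shifted range the same rule is exactly wrong, and no number of on-distribution samples fixes that.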
January 19
Louie Helm: We should probably reference this kind of research in the next OPFAI writeup, both so we can explicitly say what is good about it (that it shows values/preferences can be learned efficiently at all, an objection I sometimes hear to CEV et al.) and so we can explicitly reference what we think is lacking in this kind of research (the reliance on distributional assumptions we may find unsuitably brittle given Reasoning Under Fragility).