The paper explores a number of fascinating issues, including:
- Ethical issues that might arise in AI software
- Building AI that operates safely
- Whether a program can have moral status
- Differences in the ethical assessment of software
- Whether software can become more ethical than humans
The paper opens with an intriguing hypothetical situation:
Imagine, in the near future, a bank using a machine learning algorithm to recommend mortgage applications for approval. A rejected applicant brings a lawsuit against the bank, alleging that the algorithm is discriminating racially against mortgage applicants. The bank replies that this is impossible, since the algorithm is deliberately blinded to the race of the applicants. Indeed, that was part of the bank’s rationale for implementing the system. Even so, statistics show that the bank’s approval rate for black applicants has been steadily dropping. Submitting ten apparently equally qualified genuine applicants (as determined by a separate panel of human judges) shows that the algorithm accepts white applicants and rejects black applicants. What could possibly be happening?
Finding an answer may not be easy. If the machine learning algorithm is based on a complicated neural network, or a genetic algorithm produced by directed evolution, then it may prove nearly impossible to understand why, or even how, the algorithm is judging applicants based on their race.
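One concrete way this can happen is through proxy variables: a model that never sees race can still learn a feature that correlates with race, such as zip code. The sketch below is entirely hypothetical (invented names, synthetic data, not from the paper), but it shows how a "race-blind" rule can nonetheless produce racially skewed approval rates.

```python
# Hypothetical sketch: a model "blinded" to race can still discriminate
# through a proxy feature. All names and data here are invented.
import random

random.seed(0)

def make_applicant():
    # In this synthetic world, race correlates with zip code (the proxy),
    # while income -- the legitimate signal -- is identical across groups.
    race = random.choice(["white", "black"])
    in_group_a = (race == "white") == (random.random() < 0.9)
    return {
        "race": race,
        "zip": "A" if in_group_a else "B",
        "income": random.gauss(50_000, 5_000),
    }

applicants = [make_applicant() for _ in range(10_000)]

def approve(app):
    # The "model" never looks at race -- only at zip code.
    return app["zip"] == "A"

def approval_rate(race):
    group = [a for a in applicants if a["race"] == race]
    return sum(approve(a) for a in group) / len(group)

print(f"white: {approval_rate('white'):.2f}  black: {approval_rate('black'):.2f}")
```

Even in this deliberately transparent two-feature example, the disparity only shows up in aggregate statistics; inside a large neural network or an evolved program, the learned proxy would be far harder to locate.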
The closest I've come to diagnosing behaviors of this sort is in working with the query optimizers of modern relational database systems; these systems tend to use complex strategies to choose query execution plans for user-provided queries, and when you don't get the query plan that you want, it can be quite challenging to figure out why. Modern DBMS implementations provide extremely powerful tools for studying such problems, but it is still quite hard.
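As one small illustration of those tools, SQLite (via Python's standard library) exposes the optimizer's decision through `EXPLAIN QUERY PLAN`. This is a minimal sketch with an invented table; the point is simply that you can ask the planner which strategy it chose, even if the "why" behind that choice remains opaque.

```python
# Minimal sketch: asking SQLite's query optimizer which plan it chose.
# The table and index names here are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE applicants (id INTEGER PRIMARY KEY, zip TEXT)")
con.execute("CREATE INDEX idx_zip ON applicants (zip)")

# EXPLAIN QUERY PLAN reports the chosen strategy -- here, whether the
# optimizer uses the index or falls back to a full table scan.
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM applicants WHERE zip = ?", ("A",)
):
    print(row)  # the last column is a human-readable plan description
```

Production systems such as PostgreSQL and Oracle offer far richer versions of the same idea (cost estimates, row-count predictions, runtime statistics), and yet, as noted above, diagnosing a surprising plan is still quite hard.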
Later in the paper, the authors consider whether it is possible for humans to write a computer program that is more ethical than any human. It's a strange question, since it drags in the whole issue of anthropomorphism, but the discussion is quite interesting regardless. They compare it to a (perhaps) similar question: is it possible for humans to write a computer program that plays chess better than any human? That has clearly been done. If you accept that the two questions are similar, you can follow their reasoning to the suggestion that the same thing should be possible in the realm of ethics:
Firstly, they note that the approach to this should not be merely to describe to the computer what we humans consider to be the best ethics, since that approach did not work for chess:
if the programmers had manually input what they considered a good move in each possible situation, the resulting system would not have been able to make stronger chess moves than its creators. Since the programmers themselves were not world champions, such a system would not have been able to defeat Garry Kasparov.
Secondly, they note that we will need to improve our own knowledge about ethics in order to be able to describe to the computer what we mean:
Perhaps the question we should be considering, rather, is how an AI programmed by Archimedes, with no more moral expertise than Archimedes, could recognize (at least some of) our own civilization’s ethics as moral progress as opposed to mere moral instability. This would require that we begin to comprehend the structure of ethical questions in the way that we have already comprehended the structure of chess.
I enjoyed reading this paper; it is approachable and intriguing. If the subject interests you, the authors also provide many other references to follow to learn more. Although this is far-off-in-the-distance stuff, it is still entertaining and rewarding to read.