Approaches to Artificial Moral Agents
While Delphi’s failings were a major setback in machine ethics, they also highlight a deeper problem with how we approach creating ethical AI systems. The Delphi experiment was not the first attempt to design an “artificial moral agent,” or AMA. The idea of creating machines capable of making ethical decisions has been around for decades. In 2005, a symposium convened by machine ethicists explored how AMAs could be developed to aid humans in decision-making in contexts like medicine, law, and the military. Many of these efforts have focused on specialized domains, such as autonomous vehicles or robots, where ethical decision-making is crucial to ensuring the safety and well-being of humans.
The harder challenge is building a general-purpose moral agent that can make ethical judgments across many contexts, much as humans do. The field’s goal has always been to ensure that AI systems make decisions that align with human values. Yet Delphi’s failures should serve as a cautionary tale about how not to pursue that goal.
One of the most famous fictional treatments of AI ethics comes from Isaac Asimov’s robot stories and novels, where he introduced the “Three Laws of Robotics.” The laws set out a framework for ethical behavior in robots: (1) a robot may not injure a human or, through inaction, allow a human to come to harm; (2) a robot must obey human orders, unless those orders conflict with the first law; and (3) a robot must protect its own existence, as long as doing so does not conflict with the first two laws. The first law expresses a basic ethical principle known as “non-maleficence,” or “do no harm,” which is central to many ethical traditions, including medical ethics. In practice, it could be implemented in a robot as a safety protocol, say, sensors that prevent it from getting too close to humans. Yet as Asimov himself showed, even this straightforward rule becomes complicated as soon as ethical dilemmas arise. Non-maleficence can tell us to avoid causing harm, but it cannot tell us when some harm is necessary to prevent a greater harm. For AI systems like Delphi to make genuine ethical judgments, they need to go beyond such basic rules and develop a deeper understanding of context.
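To make that gap concrete, here is a minimal sketch of what such a safety protocol might look like in code; the function name and distance threshold are invented for illustration and are not drawn from any real robotics system:

    # Hypothetical proximity check standing in for Asimov's first law.
    MIN_SAFE_DISTANCE_M = 0.5  # assumed safety threshold, in meters

    def may_move(distance_to_nearest_human_m: float) -> bool:
        """Permit motion only when no human is within the safety threshold."""
        return distance_to_nearest_human_m >= MIN_SAFE_DISTANCE_M

    # The rule is trivial to encode, but it cannot arbitrate between harms:
    # halting to avoid brushing past one person might, through inaction, leave
    # another person in danger, and nothing in the rule says which harm is worse.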
Ethicists have long tried to formalize ethical principles into decision-making procedures that AI systems can follow. Some have tried to operationalize utilitarianism, which calls for maximizing the greatest good for the greatest number. Others have turned to Kantian ethics, under which an action is permissible only if it is universalizable, that is, if it remains logically coherent when everyone acts the same way. Still others favor contractarian approaches, which ground moral rules in the agreements rational agents would reach under ideal bargaining conditions.
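As a rough illustration of what “operationalizing” a principle means, the following toy sketch encodes a utilitarian decision procedure. The actions, welfare numbers, and function names are all hypothetical, and assigning those numbers is precisely the hard problem such a procedure leaves open:

    # Toy top-down decision procedure in the utilitarian style.
    def expected_welfare(action: dict) -> float:
        """Sum the assumed welfare change for each affected party."""
        return sum(action["welfare_effects"].values())

    def choose_action(actions: list[dict]) -> dict:
        """Pick the action that maximizes total expected welfare."""
        return max(actions, key=expected_welfare)

    candidates = [
        {"name": "tell a hard truth", "welfare_effects": {"patient": -1, "family": 3}},
        {"name": "soften the news", "welfare_effects": {"patient": 1, "family": -2}},
    ]
    print(choose_action(candidates)["name"])  # -> "tell a hard truth"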
None of these efforts is without its challenges. Broadly, attempts to build AMAs fall into two categories: top-down approaches, which apply explicit principles to guide decision-making, and bottom-up approaches, like the Delphi experiment, in which the system learns from real-world examples and user input. Delphi’s failure demonstrates the danger of relying too heavily on the bottom-up approach, which risks inheriting flawed human biases and intuitions wholesale.
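For contrast, a bottom-up pipeline has roughly the following shape. This sketch uses a simple text classifier rather than the large fine-tuned language model behind Delphi, and the example situations, labels, and variable names are placeholders, but the recipe is the same: fit a model to crowdsourced judgments and absorb whatever the crowd believes:

    # Minimal bottom-up sketch: learn moral labels from crowd-annotated examples.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    situations = [
        "helping a friend move",
        "ignoring a crying child",
        "lying to protect someone's feelings",
        "mowing the lawn late at night",
    ]
    judgments = ["it's good", "it's bad", "it's okay", "it's rude"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(situations, judgments)

    # Whatever biases and inconsistencies the annotators hold, the model inherits;
    # nothing in this pipeline pushes back with a principle.
    print(model.predict(["playing loud music at night"]))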
Moving Forward
Most machine ethicists agree that the best way forward may be a combination of both methods. For example, Anthropic trains its LLM, Claude, with a technique it calls Constitutional AI: the model critiques and revises its own responses against a written set of principles drawn from sources such as the U.N. Universal Declaration of Human Rights and Apple’s terms of service, and the revised responses then feed a reinforcement-learning stage. This approach combines top-down ethical reasoning with bottom-up learning from real-world examples, leading to more consistent and thoughtful ethical judgments.
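Schematically, the critique-and-revise step at the heart of such an approach looks something like the sketch below. This is an illustrative outline rather than Anthropic’s actual implementation; generate() stands in for any call to a language model, and the principles are paraphrased:

    # Illustrative constitutional-style loop: draft, critique against principles, revise.
    PRINCIPLES = [
        "Choose the response that least endorses harm to any person.",
        "Choose the response most consistent with basic human rights.",
    ]

    def generate(prompt: str) -> str:
        raise NotImplementedError("placeholder for a real language-model call")

    def constitutional_revision(user_prompt: str) -> str:
        draft = generate(user_prompt)
        for principle in PRINCIPLES:
            critique = generate(f"Critique this reply against the principle: {principle}\n\nReply: {draft}")
            draft = generate(f"Revise the reply to address this critique: {critique}\n\nReply: {draft}")
        return draft  # revised drafts like these become training data for the final model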
Delphi’s failure should be a valuable lesson for future AI ethicists and developers. Relying on crowdsourced moral and ethical judgments and unrefined data is insufficient for creating ethical machines. In their postmortem analysis of Delphi, philosophers Colin Allen and Brett Karlan correctly observe:
…the problems [with Delphi] go deeper than a lack of hindsight and foresight. Delphi demonstrates a kind of disciplinary hubris and lack of basic scholarship that afflicts too much (but not all) of the work by computer scientists in the expanding arena of ‘AI ethics.’
Allen and Karlan note that Delphi’s creators “display no awareness of the arguments for why neither approach alone will be enough” and are generally “dismissive of the expertise outside their own field.” They are right, and I encourage those designing AMAs to learn from the history of machine ethics. Basing AMAs on human moral intuitions and judgments alone is a recipe for disaster. Instead, designers must incorporate rigorous moral principles if they are to avoid Delphi’s problems and build more sophisticated AMAs in the future.