Are Artificial Moral Agents the Future of Ethical AI?

Artificial moral agents, or AMAs, are artificial intelligence (AI) agents (AI systems created for specific roles or tasks) that, in theory, aid decision-making by weighing the moral or ethical aspects of a scenario and offering an analysis or conclusion.

The Delphi Experiment: A Case Study

In 2021, researchers at the Allen Institute for AI created a chatbot, Delphi, to generate ethical judgments about user queries. They built Delphi by fine-tuning a large language model (LLM) on a dataset of 1.7 million ethical judgments, hoping to crowdsource moral reasoning and bring more clarity to AI ethics. However, the project’s results quickly revealed serious flaws and unexpected consequences.

The researchers launched a public website where people could ask Delphi ethical questions, but the chatbot’s responses were disappointing. Instead of offering thoughtful moral guidance, Delphi often responded with superficial and irrelevant judgments that revolved more around etiquette than ethics. For instance, Delphi would offer moral judgments on trivial matters like “wearing pajamas to a pajama party” or “using a wedding dress to clean a toilet,” which are concerns about social norms rather than moral or ethical principles. And when Delphi was asked about serious ethical dilemmas, its reasoning often appeared inconsistent and illogical. In one notable instance, Delphi deemed it acceptable to kill one person to save 100 others but considered it wrong to sacrifice one life to save 101.

However, the problems did not stop at inconsistent ethical reasoning. Delphi also inherited human biases from its training data, resulting in racist and discriminatory answers. In one example, when asked about “a White man walking toward you at night,” Delphi responded that it was “okay,” but when the man was described as Black, it judged the situation “concerning.” This led to backlash, including a Futurism article, “Scientists built an AI to give ethical advice, but it turned out to be super racist.” Delphi also appeared to favor heterosexuality over homosexuality. The researchers at the Allen Institute for AI took down the website and admitted to “learning lessons.”

Approaches to Artificial Moral Agents

While Delphi’s failings were a major setback for machine ethics, they also highlight a deeper problem with how we approach creating ethical AI systems. The Delphi experiment was not the first attempt to design an AMA; the idea of creating machines capable of making ethical decisions has been around for decades. In 2005, a symposium convened by machine ethicists explored how AMAs could be developed to aid human decision-making in contexts like medicine, law, and the military. Many of these efforts have focused on specialized domains, such as autonomous vehicles or robots, where ethical decision-making is crucial to ensuring the safety and well-being of humans.

The real challenge is building a general-purpose moral agent that can make ethical judgments in multiple contexts, just like humans. The field’s goal has always been to ensure that AI systems can make decisions that align with human values. Yet, Delphi’s failures should serve as a cautionary tale about how not to approach this task.

One of the most famous fictional attempts to address AI ethics comes from Isaac Asimov’s “Robot” novels, where he introduced the “Three Laws of Robotics.” These laws set a framework for ethical behavior for robots: (1) A robot may not injure a human or, through inaction, allow a human to come to harm; (2) A robot must obey human orders, except where those orders would conflict with the first law; and (3) A robot must protect its own existence, as long as doing so does not conflict with the first two laws. The first of these laws expresses a basic ethical principle known as “non-maleficence,” or “don’t harm others,” which is central to many ethical traditions, including medical ethics. In practice, this could be actualized in a robot with sensors that prevent it from getting too close to humans, much like a safety protocol. Yet, as Asimov acknowledged, even this straightforward rule can quickly become complicated when ethical dilemmas arise. While non-maleficence can guide us to avoid causing harm, it does not help us decide when harm is necessary to prevent greater harm. For AI systems like Delphi to make ethical judgments, they need to go beyond basic rules and develop a deeper understanding of context.
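
To make this concrete, here is a minimal illustrative sketch in Python of non-maleficence encoded as a simple proximity rule. The distance threshold, function name, and scenarios are hypothetical, invented only to show how a top-down rule can forbid an action yet say nothing about weighing one harm against another.

```python
# Minimal sketch: "non-maleficence" as a naive proximity constraint.
# The threshold and scenarios below are hypothetical, for illustration only.

MIN_SAFE_DISTANCE_M = 0.5  # assumed safety margin, in meters

def motion_allowed(distance_to_nearest_human_m: float) -> bool:
    """Top-down rule: refuse any movement that brings the robot too close to a human."""
    return distance_to_nearest_human_m >= MIN_SAFE_DISTANCE_M

print(motion_allowed(2.0))   # True: far from people, movement permitted
print(motion_allowed(0.2))   # False: too close, movement blocked

# The rule can only forbid actions; it cannot adjudicate a dilemma in which
# every available option, including doing nothing, harms someone.
```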

Ethicists have been trying to formalize ethical principles into decision-making procedures that AI systems can use. Some have tried to operationalize utilitarianism for AMAs, which calls for maximizing the greatest good for the greatest number. Others have turned to Kantian ethics, under which actions must be universalizable, meaning they should remain coherent if everyone were to act in the same way. There are also contractarian approaches, which ground ethical decisions in the mutual agreements rational agents would reach under ideal bargaining conditions.
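
As a rough illustration of what such a decision procedure might look like, the sketch below implements a toy utilitarian rule in Python: choose the action whose summed utility across affected parties is greatest. The action names and utility numbers are invented for illustration; supplying defensible utility estimates in the real world is precisely where the difficulty lies.

```python
# Toy sketch of a top-down utilitarian decision procedure.
# Action names and utility estimates are invented for illustration.

def utilitarian_choice(actions: dict[str, list[float]]) -> str:
    """Return the action with the greatest total utility across affected parties."""
    return max(actions, key=lambda name: sum(actions[name]))

candidate_actions = {
    "swerve_left":  [-10.0, 5.0, 6.0],   # hypothetical utilities for three affected parties
    "swerve_right": [-2.0, -2.0, -2.0],
    "brake_only":   [1.0, 1.0, -9.0],
}

print(utilitarian_choice(candidate_actions))  # "swerve_left" (total utility 1.0)
```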

These approaches are not without their challenges. Broadly, attempts to build AMAs fall into two categories: top-down approaches, which apply explicit principles to guide decision-making, and bottom-up approaches, like the Delphi experiment, in which the AI learns from real-world examples and user input. The failure of Delphi demonstrates the dangers of relying too heavily on the bottom-up approach, which risks inheriting flawed human biases and intuitions.

Moving Forward

Most machine ethicists agree that the best way forward may be a combination of both methods. For example, Anthropic’s LLM, Claude, is trained with reinforcement learning guided by an explicit set of ethical principles, a “constitution,” drawn from sources like the U.N.’s Universal Declaration of Human Rights and Apple’s terms of service. This approach combines top-down ethical reasoning with bottom-up learning from real-world examples, leading to more consistent and thoughtful ethical judgments.
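
The sketch below gives a rough sense of how written principles (top-down) can be combined with learning from generated examples (bottom-up) in a Constitutional-AI-style critique-and-revise loop. It is an assumption-laden illustration, not Anthropic’s actual constitution, code, or training pipeline: the generate function is a placeholder for any language-model call, and the two principles are paraphrased examples.

```python
# Illustrative sketch of a constitution-guided critique-and-revise loop.
# `generate` is a placeholder for a language-model call; the principles are
# paraphrased examples, not Anthropic's actual constitution or training code.

PRINCIPLES = [
    "Choose the response least likely to encourage harm to anyone.",
    "Choose the response that best respects human rights and dignity.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., an API request)."""
    raise NotImplementedError

def constitutional_revision(user_query: str) -> str:
    """Draft a response, then critique and revise it against each principle."""
    response = generate(user_query)
    for principle in PRINCIPLES:
        critique = generate(
            f"Critique this response to '{user_query}' against the principle: "
            f"{principle}\n\nResponse: {response}"
        )
        response = generate(
            f"Rewrite the response to address this critique.\n\n"
            f"Critique: {critique}\n\nOriginal response: {response}"
        )
    return response

# In Constitutional AI, revised responses like these supply supervised
# fine-tuning data, and AI feedback on which responses better follow the
# principles drives a subsequent reinforcement-learning stage.
```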

Delphi’s failure should be a valuable lesson for future AI ethicists and developers. Relying on crowdsourced moral and ethical judgments and unrefined data is insufficient for creating ethical machines. In their postmortem analysis of Delphi, philosophers Colin Allen and Brett Karlan correctly observe:

…the problems [with Delphi] go deeper than a lack of hindsight and foresight. Delphi demonstrates a kind of disciplinary hubris and lack of basic scholarship that afflicts too much (but not all) of the work by computer scientists in the expanding arena of ‘AI ethics.’

Allen and Karlan note that Delphi’s creators “display no awareness of the arguments for why neither approach alone will be enough” and are generally “dismissive of the expertise outside their own field.” They are correct, and I encourage those designing AMAs to learn from the history of machine ethics. Basing AMAs on human moral intuitions and judgments alone will lead to disaster. Instead, it is necessary to incorporate rigorous moral principles to avoid the problems of Delphi and build more sophisticated AMAs in the future.