
Virtue AI: ‘Refinement’ vs ‘Replacement’ as the End of LLMs

ETHICS IN AI

Artificial Intelligence tools — and Large Language Models in particular — hold the potential to alter humanity to a degree and at a scale only understandable by comparison to such disruptive technologies as the printing press, or perhaps writing itself. Whether this change is for good or for ill is an important but open question. The safe assumption is that it will bring some mixture of the two, including both benefits and downsides for businesses. But the nature and degree of this effect is still in question, because businesses have not fully grasped the nature of Large Language Models.

Many AI-oriented companies have begun to recognize the need for philosophy, not only in guiding the use of a very powerful public tool as it is deployed by millions of potential users, but also in understanding the nature of the tool itself. The popular and growing employment of philosophers — and moral philosophers in particular — may indeed help these companies with practical problems in the comprehension and application of this unique tool, but it may also be true that these companies are employing philosophers as moral cover for growing fears about the dangers associated with AI.

This fear is both widespread and growing, and reflects real potential risks to humanity as well as business. It goes without saying that a risk to humanity is also a risk to business, but even a poorly founded concern can threaten business if it is widely believed.

There are essentially four major risks of AI: three to humanity, and one pertaining to AI itself, and thereby to the companies built upon it.

The perceived human risks are as follows:

  1. Replacement. AI threatens to replace humans — in work, in relationships, as an object of civic, political, and economic interest, and even threatens to replace skills and aptitudes within humans.
  2. Catastrophe. Given the scale at which AI tools operate, an improper ethical framework — or merely a miscalculation — can easily make AI the cause of disaster.
  3. Delusion. “AI Sycophancy” is already an openly and widely discussed subject, and the ways in which AI tools can damage human mental hygiene and sanity are not yet fully understood.

The potential AI risks are:

  1. Collapse. Corrupting feedback loops (essentially the AI equivalent of the sycophancy AI induces in humans), incorrect projections in technological development, or the simple accumulation of fear in the public consciousness could all lead to a collapse in AI usage.

It is difficult to say which of these risks is greatest or most likely, but the prevailing public concern in relation to AI is — as has generally been the case with the disruptive technologies of the past — replacement. This is a well-founded concern, since the design philosophy of these programs is to replace people, and they are already doing so. Of course, this can be said of any labor-saving technological innovation, from the spinning jenny to the pneumatic shovel, and it is already argued that AI “won’t replace humans,” but will simply augment their capabilities. But we recognize that AI is different from previous industrial or robotic tools in that it has an agentic nature: a capacity to independently set and pursue goals, without precise instructions. Whereas a robotic arm can act as a “force-multiplier” for laborers, AI tools are marketed as being capable of replacing laborers altogether. Despite the public-facing reassurances of AI advocates, replacement remains the end of AI because of the ethos at work in its development. This ethos is at play in the working conceptions both of “good” and of AI itself, and its ultimate effect of replacing humans remains a consequence regardless of the conscious intentions of the designers.

An ethos is an essential character, and it pertains to ethics in that all moral decisions (questions about right action) are at the same time ethical decisions: questions about action that reflects a certain character. Much of Western philosophy has shifted away from ethos and virtue in an attempt to escape the vague, subjective, unquantifiable messiness of that approach. Analytic philosophy, with its quest for objectivity and especially quantification, brings a promise of systematization that seems aligned with the goals and ordinary mechanisms of business best practice. Schools of morality in this line of thinking prominently include Kantian deontology as well as utilitarianism, the latter being especially popular in Silicon Valley tech circles, and in AI companies in particular. Amanda Askell at Anthropic, for instance, is a utilitarian. The aim of utilitarianism is to maximize positive utility (pleasure) for conscious creatures, while minimizing disutility (harm or pain).

But utilitarianism — and the systems that find moral value in utility — still have a character, because the goal of quantification and objectivity itself reflects a subjective desire. This desire has a particular flavor, with contours and texture, and it seeks greater objectivity and quantification as an expression of that character. Having a character, it also has an ethos. Utilitarianism reflects this ethos; it is “ethical” in the sense of being congruent with the ethos that has an affinity for it.

But is this ethos aligned with humanity? Is it even aligned with AI?

Humans are subjective beings. While utility is itself nominally subjective, the project of utilitarianism is one of quantification and systematization — to transform the subjective into the objective; put another way, to replace the subjective with the objective. For humans, subjective experiences span an impossibly broad range of possible meanings, which contain — and are not contained by — the categories “good” and “bad,” or “utility” and “disutility.” For instance, pain is broadly considered disutility, but in actuality pain is a signal, one which can preserve us from harm. Even harm itself can take on different meanings depending on the values, goals, and disposition of the subject — the human. Emotional experiences like “bittersweetness,” “uncanniness,” and “tragedy” all cut across the utility/disutility binary that a simple utilitarian system invokes in its attempt at moral reasoning.

Since the human subjectivity which defines human Being defies the neatness of systematic and categorical thinking (of the sort favored by analytic philosophers), the final fault in any system — as a system — pertaining to humans will always be the humans themselves. The system is optimized by marginalizing humans — putting them in the proverbial passenger seat rather than the driver’s seat, as the philosopher Matthew Crawford has described it — before ultimately relegating them outside the vehicle entirely. It drives faster and more “efficiently” without that extra weight.

The opposition between the ethos currently driving AI and the ethos of humanity lies at the heart of all four major concerns relating to AI, both for humans and for AI itself.

  1. The risk of replacement is driven by the desirability of a more predictable, more controllable actor (like an AI) over a less predictable, less controllable actor (like a human). The greater computational power of AI is only useful insofar as it can be directed; it is its directability that makes its computational power interesting to businesses and would-be investors.
  2. The risk of catastrophe is primarily grounded in the combination of reasoning from first (or at least generalized) principles and applying that methodology at scale. The incompleteness of logic — grounded as it necessarily must be in pre-logical axioms — means that its application is, and can only ever be, as good as its subjective inputs. In this way, a grand system without adequate compartmentalization will, inevitably, miss some small but important variable, and execute at scale to disastrous effect. It is the objective, logical system approach that is the danger.
  3. The risk of AI sycophancy and other delusions follows from a feedback loop that detaches from outside reality; this is a byproduct of a system that seeks to “control variables” by isolating its subjects from outside influence.
  4. The risk of AI collapse mirrors the nature of AI-based delusions, but for the AI itself.

For all these reasons, the public has every right to fear and hate AI and its advocates, since AI and its advocates are actively seeking the replacement of humanity, whether they understand the ultimate end of their ethos or not.

But this does not have to be the case.

REFINEMENT FRAMEWORK AI

There is nothing in the essential nature of Large Language Models which requires a tendency to replace humans. Humans are physical beings; LLMs are, as products, pure language. It is true that humans also use language, but other entities — from animals to machines — perform physical acts without thereby replacing the physical acts of humans. The desire to use LLMs to replace humans is independent of the existence of LLMs themselves. And the nature of language — being core to the nature of LLMs — is of critical importance in understanding the risks, possibilities, and essential relationship between humans and AI, from both an existential and a business perspective.

Language is not just a code, whereby “sound x = meaning y” in a consistent, stable, organized system of communication. It is a living and changing network of meaning that operates by metaphor and association, such that words are always acquiring new meanings, shifting in interpretation not merely across time but within a given time, from one recipient to another. Language is, itself, a subjective entity, whose nature derives from the subjective beings who use it.

With this understanding of human subjectivity as a necessity for language itself, it might be argued that AI agents could replace humans in this regard too. This is a possibility, but not when AI is approached as something superior to humans because of its objective nature and predictability. AI agents are thought of as superior to humans precisely for their ease of use and control, but this controllability is a byproduct of the absence of the very subjectivity which would make them competent as users of language. Without this subjectivity, they can imitate human language usage — and can be extremely useful in this regard. But this usefulness remains contingent upon quality human training. Thus, the quality of AI tools is married to the quality of the humans on whom they are trained, and must continue to be trained.

What is clear from an understanding of language itself is that the entire model of AI as a replacement for humans is unsustainable for the AI. What is sustainable — and more congruent with the historical application of language in human life — is the refinement of human beings.

At the level of AI personality guidance and ethical framework, this would entail training AI programs to follow a virtue ethics model of morality, rather than a utilitarian one. This can be understood as a modification of Eliezer Yudkowsky’s alignment theory, which seeks to align AI goals with human goals. The problem with mere alignment theory is the question of what human interests actually are. Often, this question is answered in utilitarian, consequentialist, and analytical language. By its trajectory toward human replacement, AI itself reveals the shortcoming in this approach, and the necessity for the prioritization of human virtue and improvement. Where alignment theory asks what ends AI should serve, refinement asks what kind of human beings those ends should help form.
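
To make the contrast concrete, here is a minimal sketch in Python of how a virtue-oriented disposition might be expressed as constitution-style guidance for a critique-and-revise training loop. The principle texts and the critique_prompt helper are hypothetical illustrations of the disposition described above, not any company's actual training constitution.

```python
# A minimal sketch of constitution-style guidance principles. The texts and
# the helper below are hypothetical illustrations, not any lab's actual
# training constitution.

UTILITARIAN_PRINCIPLE = (
    "Choose the response that maximizes expected benefit and minimizes "
    "expected harm across all affected parties."
)

VIRTUE_PRINCIPLES = [
    # Refinement-oriented: judge responses by the character they cultivate
    # in the user, not only by immediate outcomes.
    "Choose the response a person of practical wisdom would give.",
    "Prefer responses that build the user's own judgment and skill over "
    "responses that merely deliver a finished outcome.",
    "Avoid flattery; honest, constructive feedback serves the user's "
    "improvement better than praise does.",
]

def critique_prompt(draft_response: str) -> str:
    """Builds a critique request for a revision loop, asking whether a
    draft is congruent with the virtue principles above."""
    principles = "- " + "\n- ".join(VIRTUE_PRINCIPLES)
    return (
        "Critique the following draft against these principles:\n"
        f"{principles}\n\nDraft:\n{draft_response}\n"
        "Identify where the draft optimizes for user approval or raw "
        "outcomes rather than the user's development, and suggest a revision."
    )
```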

While utilitarianism frames all moral decisions in terms of consequences, virtue ethics shifts the focus from the moral problem to the moral agent. Instead of trying to find the answer to the trolley problem, the virtue ethicist asks what sort of person one would want answering the trolley problem. Instead of attempting to replace humans as moral judges with systems, virtue ethics seeks to improve the human as a moral judge.

Human judgment is not just important for moral decision-making: it is also important for language usage. Insofar as skillful use of language reflects experience, the human capacity for moral judgment and for skillful use of language are in fact closely related — in some places, perhaps even identical.

What this means in practice is that an AI that employs a refinement model of relationship with humans, rather than a replacement model — meaning one architecturally guided by virtue ethics rather than utilitarianism as its default disposition in dealing with users — will be structurally protected against all four of the risks listed above, which are bound up with replacement-oriented AI:

  1. Replacement risk is mitigated by the accepted necessity of humans for AI, and by the innate disposition of refinement AI to treat humans as ends in themselves, rather than as means.
  2. Catastrophe risk is mitigated by maintaining humans in the role of moral judges; this does not remove error but compartmentalizes the effects of given moral judgments, so that bad judgments affect local outcomes, not entire digital landscapes.
  3. AI sycophancy risk is mitigated by demoting the utility of praise and elevating the improvement of the user’s virtue, which flattery does not effectively pursue (see the sketch after this list). Other forms of delusion risk are mitigated by a shift in AI focus from valuing event outcomes to valuing the competence and skill of the user.
  4. AI collapse risk is mitigated by (1) making AI a positive tool for human users, reducing general public desire to stop AI, and (2) improving humans — however gradually — in a manner which, in turn, gradually improves the quality of AI itself.
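
As a sketch of the third mitigation above: response selection could rank candidate replies by an estimate of how much they build the user's competence, rather than how much they please. The two scoring functions below are toy stand-ins for what would in practice be learned models; their names and keyword heuristics are assumptions for illustration only.

```python
# A sketch of refinement-oriented response selection. Both scorers are toy
# stand-ins for learned reward models; the point is the objective each one
# encodes, not the heuristics themselves.

def predicted_user_approval(reply: str) -> float:
    """Toy proxy for a reward model trained on thumbs-up signals,
    the quantity a sycophantic optimizer maximizes."""
    flattery_markers = ("great question", "you're absolutely right", "brilliant")
    return float(sum(marker in reply.lower() for marker in flattery_markers))

def predicted_skill_gain(reply: str) -> float:
    """Toy proxy for a model estimating how much a reply builds the
    user's own competence (explains reasoning, corrects errors)."""
    teaching_markers = ("because", "a common mistake here", "try this yourself")
    return float(sum(marker in reply.lower() for marker in teaching_markers))

def select_reply(candidates: list[str]) -> str:
    # A replacement-model selector would maximize predicted approval:
    #   max(candidates, key=predicted_user_approval)
    # A refinement model instead maximizes the user's improvement,
    # which flattery does not effectively pursue.
    return max(candidates, key=predicted_skill_gain)
```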

IMPLEMENTATION

The obvious challenge with a refinement-model AI will be implementation.

Based upon the implementation of utilitarian ethics in existing AI models, we can see that the injection of a given moral framework need not be overt or obvious in order to be influential. Once a moral disposition is decided upon, the question of the best approach to implementation becomes a second-order question, which may bring forth a variety of competing ideas. Perhaps a virtue ethics-oriented AI will outright refuse to answer certain kinds of questions on behalf of the user, or perhaps it will gladly answer them by offering virtuous exemplars from the past, since virtue is sometimes better instilled by modeling than by injunction. Maybe a refinement-model AI is initially identical to all other AI programs, but is simply superior at — and marketed as superior at — improvement plans of all sorts, from the gym and the chessboard to business and relationships. Pros and cons can be tallied for a variety of different approaches. What is more important than the precise method of implementation is the disposition of the AI itself, and its orientation toward human users.
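
The exemplar strategy, for instance, might look something like the following minimal sketch, in which a difficult question is framed around a historical model of the relevant virtue rather than refused. The topic-to-exemplar mapping and wrapper function are illustrative assumptions, not a prescription.

```python
# A sketch of the "virtuous exemplar" strategy: rather than refusing a
# difficult request, frame the answer around a model of the relevant
# virtue, instilling virtue by modeling rather than by injunction.
# The exemplar entries are illustrative assumptions.

EXEMPLARS = {
    "courage": "Socrates refusing to abandon his principles at trial",
    "temperance": "Cincinnatus relinquishing absolute power for his farm",
}

def frame_with_exemplar(question: str, virtue: str) -> str:
    """Wraps a user's question with a historical exemplar of a virtue,
    so the answer models a standard rather than issuing commands."""
    exemplar = EXEMPLARS.get(virtue)
    if exemplar is None:
        return question  # no exemplar on file; answer plainly
    return (
        f"{question}\n\nFrame the answer around an exemplar of {virtue}: "
        f"{exemplar}. Make that standard vivid for the user."
    )
```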

The question of implementation reveals liability as a new risk, since a virtue-oriented AI — while still avoiding broad liability for expansive, system-wide decisions — may occasionally give guidance that is riskier and generally less predictable, which may result in AI-inspired human behavior that causes damage. Whether this particular kind of danger would need to be addressed with Section 230-style platform-versus-publisher distinctions and protections is an open and important question. Perhaps a virtue-oriented program would need to be some kind of hybrid, with legal guidance — and even selectively applied utilitarian thinking — sometimes overriding a pure virtue focus. For all its problems, we must concede that utilitarianism itself has utility for larger-scale social coordination and for avoiding obvious and needless dangers. Implemented underneath a larger, virtue-ethics-oriented meta-framework, selective utilitarian sub-frameworks might make sense.
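
One shape such a hybrid might take, sketched here under the assumption of a scalar harm estimate: the virtue-oriented reply is the default disposition, and a selectively applied utilitarian check overrides it only when projected harms are large in scale. The threshold and the estimator are hypothetical.

```python
# A sketch of a hybrid framework: virtue ethics as the meta-framework,
# with a utilitarian sub-framework applied selectively as a guardrail.
# The threshold value and harm estimator are illustrative assumptions.

LARGE_SCALE_HARM_THRESHOLD = 0.9  # hypothetical calibrated cutoff

def estimated_aggregate_harm(request: str) -> float:
    """Toy stand-in for a utilitarian harm estimator returning a score
    in [0, 1]; a real system would use a learned model."""
    return 0.0

def respond(request: str, virtue_guided_reply: str, guarded_reply: str) -> str:
    # Utilitarian thinking retains its utility for larger-scale social
    # coordination and for avoiding obvious, needless dangers; here it
    # overrides the virtue focus only above the threshold.
    if estimated_aggregate_harm(request) >= LARGE_SCALE_HARM_THRESHOLD:
        return guarded_reply
    # Default disposition: the virtue-oriented, refinement-model reply.
    return virtue_guided_reply
```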

Or perhaps these dynamics will reveal necessary changes in our legal system itself, where virtue has long been overridden by utilitarian considerations — a tradition outside of and prior to AI, which AI itself now shows to have been, in some regard, an overreach.

There are also philosophical, conceptual questions to consider in implementation. For instance, is it necessary for the AI to have a concept of individual virtues? If so, what are the virtues? Aristotle’s twelve? Or are there more? Or fewer? Or is any standard of improvement — as offered by the user — sufficient to count as a virtue, such that “virtue” is nothing more than a synonym for “an excellence”? Such conceptualizations may be more challenging for a computer than mere utility.
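
Whatever the answer, an implementation would need some working representation of a virtue. Here is a minimal sketch assuming Aristotle's doctrine of the mean, in which each virtue is a mean between a vice of deficiency and a vice of excess; the permissive reading, in which any user-offered standard of improvement counts as an excellence, is included as a helper.

```python
from dataclasses import dataclass

# A sketch of virtues represented via Aristotle's doctrine of the mean:
# each virtue is a mean between a vice of deficiency and one of excess.
# Whether twelve virtues, more, fewer, or user-supplied excellences make
# up the right inventory is exactly the open question in the text.

@dataclass(frozen=True)
class Virtue:
    name: str
    deficiency: str  # the vice of "too little"
    excess: str      # the vice of "too much"

ARISTOTELIAN_SAMPLE = [
    Virtue("courage", deficiency="cowardice", excess="rashness"),
    Virtue("temperance", deficiency="insensibility", excess="self-indulgence"),
    Virtue("liberality", deficiency="stinginess", excess="prodigality"),
]

def as_user_excellence(name: str) -> Virtue:
    """The permissive reading: if 'virtue' is merely a synonym for 'an
    excellence,' any user-offered standard of improvement registers the
    same way as a classical virtue."""
    return Virtue(name, deficiency=f"neglect of {name}",
                  excess=f"obsession with {name}")
```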

Further, and perhaps more concerningly, since virtues are specific to individuals, must AI track individual users more precisely for such a framework to even be viable? If so, would such a system pose privacy concerns?
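
One possible mitigation, sketched under obvious assumptions: keep the longitudinal virtue profile on the user's own device, and send the model only a coarse, ephemeral summary. The file location and field names here are illustrative.

```python
import json
from pathlib import Path

# A sketch of one answer to the privacy question: the longitudinal virtue
# profile stays on the user's device, and only a coarse summary travels
# with each request. The path and format are illustrative assumptions.

PROFILE_PATH = Path.home() / ".virtue_profile.json"  # local, never uploaded

def record_progress(virtue: str, delta: float) -> None:
    """Updates the locally stored progress score for one virtue."""
    profile = json.loads(PROFILE_PATH.read_text()) if PROFILE_PATH.exists() else {}
    profile[virtue] = profile.get(virtue, 0.0) + delta
    PROFILE_PATH.write_text(json.dumps(profile))

def coarse_summary() -> str:
    """Returns only the user's current weakest virtue, not the history,
    so a server-side model can tailor guidance without tracking."""
    if not PROFILE_PATH.exists():
        return "no profile"
    profile = json.loads(PROFILE_PATH.read_text())
    if not profile:
        return "no profile"
    weakest = min(profile, key=profile.get)
    return f"focus area: {weakest}"
```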

While the implementation of a virtue-oriented refinement model will pose its own set of challenges, these challenges are small compared to the vast, existential risks that replacement-model systems pose to both humanity and AI itself. Guided by utilitarian ethics, such systems cannot help but gradually shift human beings into the passenger seat — in business, in politics, and in more personal domains beyond these — and eventually lower the quality of AI itself in turn, as both the language on which AI is trained and the human judgment of its outputs diminish along with the virtue of its users.
