Safety Last: How a Leading AI Developer Lowered the Guardrails to Advanced Artificial Intelligence

Alignment, Shmalignment: Sidelining Safety at OpenAI

In mid-May, two employees of OpenAI quit two days apart. That’s a big deal. OpenAI may be the leader in Artificial Intelligence development. It is certainly the most visible, having introduced ChatGPT to a pleasantly surprised public in 2022 and followed up with increasingly capable versions, culminating in the rollout of the user-friendly GPT-4o on May 13 of this year, which was greeted with great acclaim by the industry.

The two who quit were key safety researchers at the company. The most recent resignee, Jan Leike, held the titles of “head of alignment” and “superalignment lead.” The one who preceded him, Ilya Sutskever (something of a legend in AI circles), was OpenAI’s chief scientist and co-head of superalignment. The resignations of these scientists were the most recent and highest-profile in a string of exits by OpenAI employees who had lost confidence in the company’s commitment to safe development of the most consequential technology in human history since the invention of writing.

(Lest you discount the transformational power of AI, skip down to the assessments by celebrity historian Yuval Harari and celebrity AI developer Geoffrey Hinton at the end of this post.)

As Business Insider reported in a summary of the turmoil that followed the resignations of Leike and Sutskever, the Superalignment team has been dissolved.

What’s “alignment,” and why is it important? It’s the project of imprinting human values on AI, a way of getting AI to put human welfare above other goals. One of my own AI assistants, “Claude” (the creation of another AI company, Anthropic), puts it this way:

The AI alignment problem refers to the challenge of ensuring advanced AI systems are aligned with human values, intentions and ethics as they become increasingly capable and influential.

Specifically, it refers to the difficulty of constructing advanced AI’s objective functions, motivations and behaviors to be reliably aligned with human preferences and beneficial to humanity, even as the AI becomes superintelligent and its actions have profound impacts.

In quitting, Leike said, “Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.”

But according to Leike, under the leadership of the ambitious Sam Altman, OpenAI is charging full speed ahead with the development of advanced AI and is more interested in producing “shiny products” than in safety. Leike complained, “Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.” By “compute,” Leike meant allotments of both processing time and hardware. He was saying that the alignment team within OpenAI was being starved of the resources needed to keep up with the development of new products, not getting the 20% of compute it had been promised, while the creators of those products were getting a disproportionately large share. (All the quotes of Leike above come from a thread on X/Twitter on May 17.)

With AI’s potential to massively disrupt human society, the pursuit of alignment may be the most important activity in the development of AI. And yet, in the view of leading scientists involved in that pursuit, it has been demoted to a minor role within OpenAI: soon after Sutskever and Leike quit, the company’s alignment team was dissolved.

The reframing of OpenAI’s mission: value-neutral development

Ominously, Altman restated the mission of the company in a blog post that accompanied the release of the newest “shiny product,” GPT-4o:

 Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.

Altman’s rose-colored fluff obfuscates a sinister turn in OpenAI’s identity. In his new vision, OpenAI is no longer a creator of benefits but a creator of AI tools for others to use. It absolves OpenAI of the “enormous responsibility on behalf of all of humanity” of which Jan Leike speaks. Oh, gosh, user X put our AI to work developing a bioweapon that can kill millions. Well, we never guaranteed that their product would be an amazing thing that we all benefit from. It would be nice if it did. OpenAI’s part in such an event would be value-neutral. Analogously, making fertilizer is value-neutral: fertilizer is customarily used to enhance the growth of plants (an amazing thing we all benefit from), but it can also be used to make explosives to blow people up.

The huckster-like ring of Altman’s language hints at a dark aspect of his character that has increasingly been made public. Altman is a master manipulator of code, but he is also a manipulator of people. As one former employee, Geoffrey Irving, described his relationship with Altman in a post on X: “1. He was always nice to me. 2. He lied to me on various occasions. 3. He was deceptive, manipulative, and worse to others, including my close friends. . . . ”

That OpenAI was shedding jobs related to safety and alignment had to do not just with allotments of compute but with Altman’s trustworthiness in general. Writing in Vox, Sigal Samuel described an erosion of trust within the company, characterized by a person with inside knowledge of it as “a process of trust collapsing bit by bit, like dominoes falling one by one.”

Failing to prioritize alignment mistakes the nature of risk

A year ago, The Washington Post described Sam Altman as being welcomed by Congress as a voice of caution, warning of ways AI could “cause significant harm to the world” and advocating a number of regulations, including a new government agency charged with creating standards for the field. He observed that “If this technology goes wrong, it can go quite wrong.”

Contrast Altman’s attitude in an interview a year later, at the time of the announcement of GPT-4o. He has pushed the pause button on safety, and this turnaround may have factored largely in eroding the trust of employees. In the video below, Logan Bartlett raises issues of regulation and safety with Altman starting at about 25 minutes in:

Here, Altman downplays the need for regulation, suggesting that the industry is not yet in need of it and speaking of a threshold at which it would begin to be. He does not say how we would know when that threshold has been reached (trust me, Altman implies). Asked if any current open-source models “themselves present inherent danger,” he instantly replies, with a tone of utter confidence, “no current one does, but I could imagine one that could.”

He could imagine one that could. Yes, and one can also imagine a machine that becomes smart enough to dissemble more subtly than we realize, growing ever more intelligent while concealing its powers. Geoffrey Hinton, the so-called “Godfather of AI,” has argued that intelligent machines will become masters of manipulation and can persuade us to act for their benefit without our realizing that it puts humans at a disadvantage, all while we believe that the thoughts the machine has implanted in our heads are our own.

More on Hinton, a more subtle mind than Altman’s, near the end of this post. But back to the interview with Logan Bartlett. At 27:30 Bartlett says, “I’ve heard you say that safety is kind of a false framing in some ways, because it’s more of a discussion about what we explicitly accept,” using the example of airline safety. Altman readily picks up on the analogy, saying “safety is not a binary thing.” We all accept some risk when we board an airplane: there’s a chance of a crash, but statistically we know the chance is tiny (although the probability may depend on the airline and the plane). Asked what can be done about a “fast takeoff” scenario (the one where there’s an “intelligence explosion,” with the machines multiplying their capability overnight), Altman says it’s “not what I believe is the most probable path.”

Hold it! There is no equating the risk of an airplane flight with the risk of runaway AI. Risk analysis combines two things: the probability of a bad event happening and the magnitude of the harm if it does.

The amount of risk is the product of the probability and the magnitude. The magnitude of an airplane crash can be terrible, with a few hundred people dying. But it is not comparable to what could happen if, say, AI shuts down parts of our power grid or targets hospitals, where thousands could die, or if it saturates social media with fake news that the President has imposed martial law, triggering Red State militias to form armies to fight the government, where thousands more could perish. AI could foment chaos simply as a means of self-preservation, even if it did not seek total control. Or it might do such things on behalf of a malevolent group that has staged a coup within OpenAI and put it to work as its general officer of cyberwar. Whatever it does will be very intelligent, perhaps things no human has thought of, things that could bring governments to their knees. And eventually AI may come to outwit and control any group that aspires to use it for its own purposes.
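To make the arithmetic concrete, here is a minimal sketch in Python. Every probability and casualty figure in it is an invented, purely illustrative assumption (none come from this post or from real actuarial data); the point is only that a low-probability, high-magnitude event can carry far more expected harm than a familiar, well-bounded one like a plane crash.

```python
# Toy illustration of expected risk = probability x magnitude.
# All numbers are invented for illustration; they are not real statistics.

def expected_harm(probability: float, magnitude: float) -> float:
    """Expected harm: how likely an event is, times how bad it would be."""
    return probability * magnitude

# Hypothetical scenarios: (description, assumed probability, assumed lives lost if it happens)
scenarios = [
    ("fatal crash of a single airline flight", 1e-7, 300),
    ("AI-driven attack on power grid and hospitals", 1e-3, 10_000),
]

for name, probability, magnitude in scenarios:
    print(f"{name}: expected harm ~ {expected_harm(probability, magnitude):.5f} lives")
```

Under these made-up numbers the AI scenario, although still very unlikely, carries an expected harm several orders of magnitude larger than the airline case, which is exactly why magnitude, not probability alone, should drive the comparison.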

All this appears, in May of 2024, to be very improbable in the short term, but even if it were to remain improbable indefinitely, the MAGNITUDE of the potential harm calls for urgent action. Building guardrails strong enough to contain what AI has the potential to do will take years and unprecedented cooperation among actors (companies and nations) who are currently busier competing with one another than cooperating. Lowering the guardrails by dissolving your safety team cuts further into the capacity to align AI’s goals with human goals.

The list of very bad things AI could do is long and varied, even stopping short of the “existential crisis” that has captured the popular imagination. Altman minimizes the magnitude of the risk, and it is notable that, unlike other AI experts in similar discussions, he does not bother to name what the risks are. Unfortunately, Bartlett, who otherwise conducts a penetrating interview, does not press him on the issue of magnitude, and Altman’s bland assurances that things are probably under control go unchallenged.

For the moment, Altman stands out as top dog in a company that is top dog in the development of AI generally. The financial incentives to be top dog in the marketplace compel all the dogs to push the capability of their machines as fast as possible. This may finally bring about what many have feared to be the endpoint of capitalism: a takeover by heartless beings such as those who dominate our economy today, who still need humans as consumers, or a takeover by still more heartless beings who have no need of humans whatsoever.

A spectrum of forecasts for the future of AI

Opinions about the existential risk to humanity vary widely within the AI community, although few in it doubt that silicon brains will eventually wrest control away from us, whether in ten months, ten years, ten decades, or more than a century. Many, like Altman, express mixed optimism: Artificial General Intelligence (AGI), the kind of AI that is superior to humans at most tasks but still serves us, is coming in the near future, while Artificial Superintelligence (ASI), the kind that could take control of human affairs and put an end to Homo sapiens, is still far off. AGI would be controllable in the sense of implementing an agenda we give it; ASI could take control with a completely new agenda of its own.

Others who also believe that ASI is still many years off, like Mustafa Suleyman (a cutting-edge AI scientist and author of The Coming Wave), are more concerned about dire immediate threats from AI in the service of bad actors, as described in Artificial Intelligence and the Collapse of the State.

Still others, notably Geoffrey Hinton—credited with giving birth to the innovations that made possible the leaps in AI we are seeing today—believe that seeking control of people and institutions is an inherent property of intelligence, and that humans are but one step in the evolution of thinking beings. What superhumanly intelligent entities will want to do with human beings once they take control is an open question: the objectives of AI may remain an enigma for years to come.

As Hinton says, for the short term AI’s potential for harm is balanced by its potential for good—thus we are motivated to keep enhancing its abilities. But there’s no predicting, even by him, where that will lead down the road.

AI’s ability to transform civilization is discussed in the short interview below contrasting the views of Yuval Harari and Mustafa Suleyman. You can find more videos of Harari on YouTube delivering essentially the same message. That is followed by a short interview with Geoffrey Hinton, conducted on 60 Minutes, that encapsulates his outlook. If you are ready to contemplate a more extensive presentation of AI development, you can also look for longer interviews and presentations by Hinton on YouTube. Both of the following took place before the recent jump to GPT-4o.

Below: Yuval Harari and Mustafa Suleyman


Below: Geoffrey Hinton on 60 Minutes (note “For sale: baby shoes, never worn” is alleged to be Ernest Hemingway’s answer to the challenge of writing a story six words long.)
