Safety Last: How a Leading AI Developer Lowered the Guardrails to Advanced Artificial Intelligence

Alignment, Shmalignment: Sidelining Safety at OpenAI

In mid-May, two employees of OpenAI quit two days apart. That’s a big deal. OpenAI may be the leader in Artificial Intelligence development. It is certainly the most visible, having introduced ChatGPT to a pleasantly surprised public in 2022 and followed up with increasingly capable versions. That culminated in the rollout of the user-friendly GPT-4o on May 13 of this year, which was greeted with great acclaim by the industry.

The two who quit were key safety researchers at the company. The most recent resignee, Jan Leike, had the title of “head of alignment” and “superalignment lead.” The one who preceded him, Ilya Sutskever—something of a legend in AI circles—was OpenAI’s chief scientist and co-head of superalignment. The resignations of these scientists were the most recent and highest-profile in a string of exits of OpenAI employees who had lost confidence in the company’s commitment to safe development of the most consequential technology in human history since the invention of writing.

(Lest you discount the transformational power of AI, skip down to the assessments by celebrity historian Yuval Harari and celebrity AI developer Geoffrey Hinton at the end of this post.)

As Business Insider reported in its summary of the turmoil following the resignations of Leike and Sutskever, the Superalignment team has been dissolved.

What’s “alignment” and why is it important? It’s the project of imprinting human values on AI. It’s a way of getting AI to put human welfare above other goals. One of my own AI assistants, “Claude” (the creation of another AI company, Anthropic), puts it this way:

The AI alignment problem refers to the challenge of ensuring advanced AI systems are aligned with human values, intentions and ethics as they become increasingly capable and influential.

Specifically, it refers to the difficulty of constructing advanced AI’s objective functions, motivations and behaviors to be reliably aligned with human preferences and beneficial to humanity, even as the AI becomes superintelligent and its actions have profound impacts.

In quitting, Leike said “Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.”

But according to Leike, under the leadership of the ambitious Sam Altman, OpenAI is charging full speed ahead with the development of advanced AI, and is more interested in producing “shiny products” than in safety. Leike complained that, “Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.” By “compute” Leike referred to allotments of both processing time and hardware. He was saying that the alignment team within OpenAI was being starved of the resources needed to keep up with the development of new product—not getting the 20% it had been promised—while the creators of new product were getting a disproportionately large share of compute. (All the quotes of Leike above come from a thread on X/Twitter on May 17.)

With AI’s potential to massively disrupt human society, the pursuit of alignment may be the most important activity going on in the development of AI–and yet, in the view of leading scientists involved in that pursuit, it has been demoted to a minor role within OpenAI. Soon after Sutskever and Leike quit, the alignment team at OpenAI was dissolved.

The reframing of OpenAI’s mission: value-neutral development

Ominously, Altman restated the mission of the company in a blog post that accompanied the release of the newest “shiny product,” GPT-4o:

 Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.

Altman’s rose-colored fluff obfuscates a sinister turn in OpenAI’s identity. In Altman’s new vision, OpenAI is no longer a creator of benefits but a creator of AI tools for others to use. It absolves OpenAI of the “enormous responsibility on behalf of all of humanity” of which Jan Leike speaks. Oh, gosh, user X put our AI to work developing a bioweapon that can kill millions. Well, we never guaranteed that their product would be an amazing thing that we all benefit from. It would be nice if it did. OpenAI’s part in such an event would be value-neutral. Analogously: just making fertilizer is value-neutral. Fertilizer is customarily used to enhance the growth of plants (an amazing thing we all benefit from); it can also be used to make explosives to blow people up.

The huckster-like ring of Altman’s language hints at a dark aspect of his character that has increasingly been made public. Altman is a master manipulator of code, but he is also a manipulator of people. As one former employee, Geoffrey Irving, described his relationship with Altman in a post on X: “1. He was always nice to me. 2. He lied to me on various occasions. 3. He was deceptive, manipulative, and worse to others, including my close friends. . . . ”

That OpenAI was shedding jobs related to safety and alignment had to do not just with the quantity of compute, but with Altman’s trustworthiness in general.  In Vox, Sigal Samuel spoke of an erosion of trust within the company, characterized as “a process of trust collapsing bit by bit, like dominoes falling one by one” in the words of a person with inside knowledge of the company.

Failing to prioritize alignment mistakes the nature of risk

A year ago, The Washington Post described Sam Altman being welcomed by Congress as a voice of caution, warning of ways AI could “cause significant harm to the world,” and advocating a number of regulations, including a new government agency charged with creating standards for the field.  He observed that “If this technology goes wrong, it can go quite wrong.”

Contrast Altman’s attitude in an interview a year later, at the time of the announcement of GPT-4o. He has pushed the pause button on safety. This turnaround may have factored largely in eroding the trust of employees. In the video below, Logan Bartlett raises issues of regulation and safety with Altman, starting at about 25 minutes in:

Here, Altman downplays the need for regulation, suggesting that the industry is not yet in need of it, and speaks of a threshold where it would begin to be. He does not say how we would know when that threshold had been reached (trust me, Altman implies). Asked if any current open source models “themselves present inherent danger,” he instantly replies, with a tone of utter confidence, “no current one does, but I could imagine one that could.”

He could imagine one that could. Yes, and one can also imagine a machine that becomes smart enough to dissemble more subtly than we realize, growing ever more intelligent while concealing its powers. Geoffrey Hinton, the so-called “Godfather of AI,” has argued that intelligent machines will become masters of manipulation, able to persuade us to act for their benefit without our realizing that doing so puts humans at a disadvantage—we will believe that the thoughts the machine has implanted in our heads are our own.

More on Hinton—a more subtle mind than Altman’s—near the end of this post. But back to the interview with Logan Bartlett. At 27:30 Bartlett says, “I’ve heard you say that safety is kind of a false framing in some ways, because it’s more of a discussion about what we explicitly accept”—using the example of airline safety. Altman readily picks up on the airline safety analogy, saying “safety is not a binary thing.” We all accept some risk when we board an airplane—there’s a chance of a crash, but statistically we know the chance is tiny (although the probability may depend on the airline and the plane). Asked about what can be done about a “fast takeoff” scenario (the one where there’s an “intelligence explosion,” with the machines multiplying their capability overnight), Altman says it’s “not what I believe is the most probable path.”

Hold it! There is no equating the risk of an airplane flight with the risk of runaway AI. Risk analysis combines the probability of a bad thing happening with the magnitude of the harm if it does.

The amount of risk is the product of multiplying the probability by the magnitude. The magnitude of an airplane crash can be terrible, with a few hundred people dying. But it is not comparable to what could happen, say, if AI shuts down parts of our power grid, or targets hospitals, where thousands could die. Or if it saturates social media with fake news that the President has imposed martial law, which could trigger Red State militias to form armies to fight the government, a conflict in which thousands could perish. AI could foment chaos just as a means of self-preservation, even if it did not seek total control. Or it might do such things on behalf of a malevolent group who have staged a coup within OpenAI and put it to work as their general officer of cyberwar. Whatever it does will be very intelligent—perhaps things no human has thought of that could bring governments to their knees. And eventually AI may come to outwit and control any group that aspires to use it for their own purposes.
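To make that arithmetic concrete, here is a minimal sketch in Python. The probabilities and casualty figures below are entirely hypothetical, chosen only to illustrate how a tiny probability attached to a catastrophic magnitude can dwarf the expected harm of a familiar risk like an airline flight:

```python
# Expected harm = probability of the bad event x magnitude of the harm.
# All numbers below are hypothetical and chosen only to illustrate the arithmetic.

def expected_harm(probability: float, magnitude: float) -> float:
    """Return expected harm: the probability of the event times its magnitude."""
    return probability * magnitude

# A commercial flight: a minuscule chance of a crash, a few hundred lives at stake.
airline_flight = expected_harm(probability=1e-7, magnitude=300)

# A runaway-AI catastrophe: even if judged very improbable, the magnitude
# (grids down, hospitals targeted, mass casualties) is orders of magnitude larger.
runaway_ai = expected_harm(probability=1e-4, magnitude=10_000_000)

print(f"Airline flight: {airline_flight:.5f} expected deaths")
print(f"Runaway AI:     {runaway_ai:,.0f} expected deaths")
```

The particular numbers are unknowable; the point is that multiplying even a small probability by a catastrophic magnitude can yield a risk far too large to treat like the risk of boarding a plane.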

All this appears, in May of 2024, to be very improbable in the short term, but even if it were to remain improbable indefinitely, the MAGNITUDE of potential harm calls for urgent action. Building strong enough guardrails to contain what AI has the potential to do will take years to accomplish and unprecedented cooperation among actors—companies, nations—who are currently more intent on competing with each other than on cooperating. Lowering the guardrails by dissolving your safety team cuts further into the capacity to align AI goals with human goals.

The list of very bad things AI could do is long and varied, while still falling short of the “existential crisis” that has captured the popular imagination.  Altman minimizes the magnitude of the risk, and it’s notable that, unlike other AI experts in similar discussions, he does not bother to name what the risks are.  Unfortunately, Bartlett—who otherwise conducts a penetrating interview—does not press him on the issue of magnitude, and Altman’s bland assurances that it’s probably under control go unchallenged.

For the moment, Altman happens to stand out as top dog in a company that’s top dog in the development of AI generally. The financial incentives to be top dog in the marketplace compel the dogs to push the capability of their machines as fast as possible. This may finally bring about what many have feared to be the endpoint of capitalism: the takeover by heartless beings such as those who dominate our economy today but who still have need of humans as consumers, or the takeover by still more heartless beings who have no need of humans whatsoever.

A spectrum of forecasts for the future of AI

Opinions about the existential risk to humanity vary widely among the AI community, although few doubt the eventual wresting of control away from us by silicon brains–in ten months, ten years, ten decades, or more than a century. Many, like Altman, express mixed optimism: Artificial General Intelligence (AGI), of the kind that is superior to humans at most tasks but still serves us, is coming in the near future, while Artificial Superintelligence (ASI), of the kind that could take control of human affairs and put an end to Homo sapiens, is still far off. AGI would be controllable in the sense that it implements an agenda we give it; ASI could take control with a completely new agenda of its own.

Others who also believe that ASI is still many years off, like Mustafa Suleyman (a cutting-edge AI scientist and author of The Coming Wave), are more concerned about dire immediate threats from AI in the service of bad actors, as described in Artificial Intelligence and the Collapse of the State.

Still others, notably Geoffrey Hinton—credited with giving birth to the innovations that made possible the leaps in AI we are seeing today—believe that seeking control of people and institutions is an inherent property of intelligence, and that humans are but one step in the evolution of thinking beings. What superhumanly intelligent entities will want to do with human beings once they take control is an open question: the objectives of AI may remain an enigma for years to come.

As Hinton says, for the short term AI’s potential for harm is balanced by its potential for good—thus we are motivated to keep enhancing its abilities. But there’s no predicting, even by him, where that will lead down the road.

AI’s ability to transform civilization is discussed in the short interview below contrasting the views of Yuval Harari and Mustafa Suleyman. You can find more videos of Harari on YouTube delivering essentially the same message. That is followed by a short interview with Geoffrey Hinton, conducted on 60 Minutes, that encapsulates his outlook. If you are ready to contemplate a more extensive presentation of AI development, you can also look for longer interviews and presentations by Hinton on YouTube. Both of the following took place before the recent jump to GPT-4o.

Below: Yuval Harari and Mustafa Suleyman

 

Below: Geoffrey Hinton on 60 Minutes (note “For sale: baby shoes, never worn” is alleged to be Ernest Hemingway’s answer to the challenge of writing a story six words long.)

 


National Combustion, Part 2: Artificial Intelligence and the Collapse of the State

The fundamental equation:
Political instability + Artificial Intelligence -> Collapse of the State

Forces of history, combined with the ways artificial intelligence multiplies the forces of technology, are already acting to undermine the polity of the United States. The recent Republican meltdown in the U.S. House of Representatives is a foreshadowing of what is likely to come.

National Combustion, Part 1 (link in next paragraph) drew on social scientist Peter Turchin’s historical framework to make sense of how we came to this fraught moment, when it seems quite possible the United States might slide into civil war. Yes, many of the January 6 insurrectionists are in jail and Donald Trump’s national con-job is fraying. Yet the factors going into Turchin’s model of “political disintegration,” with abundant historical antecedents, remain the same today as on January 6th, 2021. It’s no great mystery that we are in a highly unstable political situation; it matches a pattern that has been repeated time and time again in human history. So much for American exceptionalism.

U.S. society is already crossed by multiple fault lines besides that of the Big Lie that Donald Trump won the 2020 Presidential election. On guns, voting rights, reproductive rights, minority rights, workers’ rights, the distribution of wealth, public health, immigration, affirmative action, educational freedoms, content of school and library books, historical analysis challenging the status quo, white nationalism—these fault lines, already under tremendous stress, could split open as a result of a precipitating event, sudden or prolonged. Another disputed national election; a political assassination; a nationwide or even international cyberattack; a Waco-like siege of an anti-government enclave; a takedown of the grid by actors unknown; a spate of terrorist attacks; a pandemic; a depression—any of them could unleash the partisans itching to fight a civil war against the federal government.

Continue reading “National Combustion, Part 2: Artificial Intelligence and the Collapse of the State”

Another Weapon in the Radicals’ Arsenal: Deepfakes

Deepfakes: when you can’t believe your eyes, what can you believe?

Recently I sent out a link to an article on deepfakes that appeared in Reuters (not paywalled): https://www.reuters.com/world/us/deepfaking-it-americas-2024-election-collides-with-ai-boom-2023-05-30/.

Here’s another perspective from Jim Puzzanghera, published in the paywalled Boston Globe; the content is nearly identical but adds a couple of political points. From the Globe:

There are very few rules right now and few, if any, are likely coming. Democrats in Congress have introduced legislation mandating the disclosure of AI in political ads, but no Republicans have signed on.

and . . .

On June 22, the Federal Election Commission deadlocked along party lines on a petition by the consumer advocacy group Public Citizen to consider rules banning AI deepfake campaign ads. All three Republicans opposed the move, with GOP commissioner Allen Dickerson saying the agency lacked the authority.

and . . . from Republican strategist Eric Wilson (not to be confused with Rick Wilson, the true conservative and co-founder of the anti-Trump Lincoln Project), who maintains that regulation isn’t needed right now:

“I want to tamp down the moral panic because this is something that happens with any new technology. You go back to TV debates and people were worried about what that would do for voters,” he said. “We’re having conversations about it, but no one’s sitting around and having struggle sessions around artificial intelligence on our side. . . . 

We are unlikely to see professional campaigns use generative AI for disinformation purposes. That’s not to say that malign actors like nation states aren’t going to try it.

Yeah. What malign nation states could Wilson possibly be referring to? Maybe states like Russia and China that are already hard at work confusing the American public with fake news and outright untruths? Who are already busy trying to undermine trust in our institutions, particularly the federal government? Who are hoping for an America increasingly fragmented into warring tribes? Whose activities support the agenda of the Radical Right? Those nation states?

Continue reading “Another Weapon in the Radicals’ Arsenal: Deepfakes”

Fake Fears, Legit Fears . . . and Fears of the Undefinable

Happy? Thanksgiving?

Yes, it’s still a beautiful world in many respects. So as we head into the holidays with visions of impeachments dancing in our heads, let us rejoice that: we are not in a nuclear war; Donald Trump has not assumed dictatorial powers; William Barr is about to resign in disgrace;* Adam Schiff has not been assassinated (as of this writing); Russia has not annexed the whole of Ukraine; New York City is still above sea level; more than a dozen elephants remain in the wild; Ruth Bader Ginsburg lives on; and Artificial Intelligence has still not determined that it’s worth taking over this messy, irrational, bigotry-infested world.

You have much to be thankful for. You can be thankful that, despite much Fox News/National Enquirer-generated fake news, we do not have on our southern border hordes of raping, thieving, murderous people itching to invade the U.S. and take away our jobs; Ukraine is not hacking our elections although Russia has and is; a non-negligible number of Americans actually understand the value of the rule of law; wind turbines do not cause cancer; the mainstream media are not Enemies of the People; vaccines do not cause autism; Hillary Clinton is not running a child sex ring; a majority of Americans actually do believe that guns kill people; George Soros has no plan to undermine the American political system.

Continue reading “Fake Fears, Legit Fears . . . and Fears of the Undefinable”

Are Machines Too Dumb to Take Over the World? Part III: Yes.

“Human intelligence is underrated”

Longtime readers of this blog who may have tired of my ruminations about AI imposing absolute reign over humanity should be overjoyed to hear that I am dropping the apocalyptic Artificial Intelligence thread for the foreseeable future.

That’s because this article in New Scientist has put my fears (mostly) to rest, with one of the pioneers of Deep Learning, Yoshua Bengio, saying, “[the machines] don’t even have the intelligence of a 6-month-old.” He is even quoted as saying “AIs are really dumb”—essentially answering my very question. Thanks Yoshua!

Bengio expresses himself in deceptively simple language, but that’s an exercise in humility, because . . .

Bengio is a recipient of the A.M. Turing Award, the “Nobel Prize of computing,” which gives his opinions great authority. He’s one of the originators of “deep learning,” which combines advanced hardware with state-of-the-art software enabling machines to train themselves to solve problems. Bengio’s high standing is enough to persuade me not to worry to excess until a contradictory view by an equally qualified AI expert comes out. Most of those sounding alarms about AI Apocalypse are not computer scientists, no matter how smart they are. Elon Musk, for example, discovered that robots in his Tesla factory were making stupid mistakes, and concluded, “human intelligence is underrated.”

Continue reading “Are Machines Too Dumb to Take Over the World? Part III: Yes.”

Are Machines Too Dumb to Take Over the World? Part II: the Common Sense Factor

Common sense and competence

In Part I of this series, we saw examples of how machines, putatively endowed with “Artificial Intelligence,” commit laughably stupid mistakes doing grade-school arithmetic. See Dumb machines Part I

You’d think that if machines can make such stupid blunders in a domain where they are alleged to have superhuman powers—a simple task compared with, say, getting your kid to school when the bus has broken down and your car is in the shop—then they could never be expected to achieve a level of competence across many domains sufficient for world domination.

Possibly machines are not capable of the “common sense” that is vital to real, complicated life, where we range across many domains, often nearly simultaneously.

A trivial example from Part I: the machine correctly calculates 68 when asked for the product of “17 x 4.” (with a period), but it calculates “17 x 4” (without the period) as 69. Stupid, right? A human looks at the discrepancy and says aha! It’s the missing period that threw it off. Getting the correct answer in both cases would require knowing something about punctuation. The period is not a mathematical object, it’s a grammatical object. Grasping the difference requires bridging from math to grammar—another common sense activity we do without consciously missing a beat.

Continue reading “Are Machines Too Dumb to Take Over the World? Part II: the Common Sense Factor”

Are Machines Too Dumb to Take Over the World? Part I: the Duh! Factor

Existential Angst: Nuclear War, Donald Trump, or Artificial Intelligence?

Apart from worldwide nuclear war (unlikely), or Donald Trump grabbing dictatorial powers (not quite as unlikely), my greatest worry is the possibility of Artificial Intelligence (AI) taking over the world—or at least enough of it to doom humanity as we know it.*

Likely? Experts have views as divergent as the sides that disputed whether the notorious DRESS was black and blue or white and gold. More seriously, people way smarter than me (and perhaps you) have made predictions ranging from “AI threatens the elimination of humankind” to “AI is the greatest tool for the betterment of humankind that has ever existed.”

(The remainder of this post addresses machine intelligence, which is really a sub-category of AI—but since most people assume AI is equivalent to machine intelligence, I use the terms interchangeably unless specified otherwise.)

Ultimately AI may be a greater threat than Climate Change.** I know Green New Dealers don’t want to hear it, but consider: there have been drastic changes in climate in the geological record—and life, including humans, adapted. Recent Ice Ages are notable examples.  (This is NOT to defend inaction on Climate Change! Especially because the changes we are imposing on the planet, unlike most previous climate shifts, are so devastatingly swift.)

Super-AI, on the other hand, will be utterly unprecedented, and its advent, unlike Climate Change, could come swiftly and with little warning—especially if we continue to pooh-pooh it as an illusory bogeyman.

Continue reading “Are Machines Too Dumb to Take Over the World? Part I: the Duh! Factor”

Robots Get D+ at Tesla: Automation Gone Too Far?

Elon Musk: “Humans are underrated.” Future of human workers looking up for now

In Quartz (May 1st), Helen and Dave Edwards report on the downside of automation on the production ramp of Tesla’s Model 3. “Over-automation” is the culprit in production of roughly 2,000 vehicles per week, in contrast with the target of 5,000 per week. Such was the conclusion of a report written by Toni Sacconaghi and Max Warburton. Tesla’s robotic underperformance echoes results from automation at Fiat, Volkswagen, and GM.

Tesla owner, founder, and prime mover Elon Musk tweeted that “humans are underrated.”  Musk is taking time off from planning an invasion of Mars to get the factory back on track (presumably with the help of humans).

Check it out at Robots underperform at Tesla, and why

and Musk admits complacency to CBS News

How robots screw up . . . but won’t continue to do so

Sacconaghi and Warburton observed that “In final assembly, robots can apply torque consistently—but they don’t detect and account for threads that aren’t straight, bolts that don’t quite fit. . . .” (See more in the block quote in the Quartz article, where the authors get in a jibe at Tesla’s quality deficiencies.)

Continue reading “Robots Get D+ at Tesla: Automation Gone Too Far?”

Robots Coming for Our Jobs? – Not So Fast

Reassuring News on Automation and Employment?

A recent study led by Melanie Arntz, acting head of the labor markets research department at the Center for European Economic Research,* addressed the specter of massive unemployment due to automation. It concluded that the risk of robots taking our jobs has been exaggerated. Looking forward 10-20 years, it revises downward the estimate of job losses in the U.S. from 38% to 9%. As we know, doomsayers (such as I) have forecast job losses more like 50% by 2040.

Here’s a link to the study, where you can download a free .pdf: Revisiting the Risk of Automation

The paper, released in July 2017, is chock-full of jargon and hairy statistical equations, but the thrust of it is commonsensical: scary scenarios of massive job losses** fail to take into account what the authors call “the substantial heterogeneity of tasks within occupations” [emphasis mine] “as well as the adaptability of jobs in the digital transformation.” (I take this language from the abstract, which nicely encapsulates the study and findings in the nine pages that follow.)

These findings stem from an approach that distinguishes between occupation-level work and job-level work.
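To see why that distinction matters, here is a minimal sketch in Python with entirely hypothetical numbers (mine, not the paper’s) of how scoring a whole occupation by its average task mix can count far more workers as automatable than scoring each worker’s actual job:

```python
# Hypothetical illustration of occupation-level vs. job-level automation estimates.
# The numbers are invented to show the effect of task heterogeneity; they are not
# taken from the Arntz study.

AT_RISK_THRESHOLD = 0.7  # share of routine, automatable tasks above which a job counts as "at risk"

# Share of each worker's time spent on routine, automatable tasks.
# Three workers in the *same* occupation, with very different task mixes:
routine_shares = [0.90, 0.65, 0.60]

# Occupation-level view: score the occupation once, by its average task mix,
# and count every worker in it as at risk if that average crosses the threshold.
average_share = sum(routine_shares) / len(routine_shares)  # ~0.72
occupation_level_at_risk = len(routine_shares) if average_share > AT_RISK_THRESHOLD else 0

# Job-level view: score each worker's own task mix separately.
job_level_at_risk = sum(1 for share in routine_shares if share > AT_RISK_THRESHOLD)

print(f"Occupation-level estimate: {occupation_level_at_risk} of 3 workers at risk")  # 3 of 3
print(f"Job-level estimate:        {job_level_at_risk} of 3 workers at risk")          # 1 of 3
```

In this toy case the occupation-level approach counts all three workers as at risk, while the job-level approach counts only the one whose own work is actually dominated by automatable tasks; that is the same direction of correction that moves the headline U.S. figure from 38% down to 9%.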

Continue reading “Robots Coming for Our Jobs? – Not So Fast”

“They don’t understand how it works.” Information Technology and the Queasy Underbelly of Democracy

Politicians low on the tech learning curve

Alexander Nix, CEO of Cambridge Analytica, and chief architect of the Trump-assisting “defeat crooked Hillary” campaign, commenting on his testimony before the (U.S.) House Intelligence Committee, said “They’re politicians, they’re not technical. They don’t understand how it works.”

The exploits of Cambridge Analytica in suppressing votes and unleashing torrents of misinformation and flat-out falsities upon the data rivers of social media got (as usual, excellent) coverage by The Guardian in this piece dated March 21, 2018: Cambridge Analytica’s Assault on Decency. See it for more on Nix, the Facebook data breaches, and the “crooked Hillary” campaign.

This echoes a theme emerging from previous U.S. Congressional hearings dealing with social media: politicians are way out of their depth in advanced information technology. As Nix says, they simply do not understand how it works.

Continue reading ““They don’t understand how it works.” Information Technology and the Queasy Underbelly of Democracy”