Spooked by Mythos, Trump suddenly realized AI safety testing might be good
May 7, 2026 | Source: Ars Technica
Some firms that have signed agreements have signaled confidence in CAISI’s testing plans. On LinkedIn, Tom Lue, Google DeepMind’s vice president of frontier AI global affairs, said he was “pleased” with the plans. In a blog post, Microsoft said that “testing for national security and large-scale public safety risks necessarily must be a collaborative endeavor with governments,” while crediting the expertise “uniquely held by institutions like CAISI” to conduct such testing. xAI, which is currently fighting OpenAI in a trial over which firm’s leaders care more about AI safety, did not immediately respond to Ars’ request for comment.
However, critics aren’t sold on the government’s plan to vet models and are increasingly dubious of firms whose AI model designs are largely kept secret.
Critics suggested that CAISI may lack the funding or expertise to evaluate frontier AI models. And as Trump seemingly suspects, seeking voluntary commitments from AI firms may not create the kind of day-to-day transparency about frontier AI risks that the public needs, critics have warned. Further, any politicization of the evaluation process, such as opposing the release of models whose outputs disfavor a certain administration’s political views, could decrease trust in AI. Unchecked, that could ultimately dissuade firms from signing agreements, since increasing trust is supposedly a key motivator driving the latest attempt at government collaboration.
In its rush to announce its partners, CAISI did not specify the testing standards that will be used for evaluations.
That could be a problem, according to a LinkedIn post from Devin Lynch, a former director for cyber policy and strategy implementation at the White House Office of the National Cyber Director:
“Pre-deployment evaluations with frontier labs are exactly the kind of public-private collaboration needed to build trust, safety, and security into AI. The harder question is what ‘evaluation’ actually means at the frontier. Capability assessments are only as good as the threat models behind them. Our research on the AI tech stack finds that the Governance layer—standards, audits, liability frameworks—remains the least mature but most essential. CAISI will need to define, and publish, what it’s testing for, not just who it’s testing with.”
In a statement provided to Ars, Sarah Kreps, director of the Tech Policy Institute at Cornell University, said that AI firms should be developing closer ties with the government as AI advances. However, “the definition of ‘safe’ is contested” and “once you build a government vetting process for technology, you get the good with the bad,” she said.
Without defining standards, “the process can be politicized,” Kreps said. That risks creating a system where “whoever holds power gets to shape how the vetting works.”
So far, neither the Biden nor the Trump administration has figured out how to avoid that, Kreps said.
Rumman Chowdhury, an AI governance consultant and founder of Humane Intelligence, similarly criticized CAISI’s preparedness. Chowdhury told Fortune that “current White House efforts to offer ‘sensible oversight’ over frontier AI models may sound good, but the devil is in the details.”
“It depends on their interpretation of these words,” Chowdhury said. “Evaluations are a policy tool, they are not actually data-driven. My concern is that this is another political tool that the administration wants to own and wield.”
As for funding, Congress in January approved up to $10 million to expand CAISI, Fortune reported. However, a recent analysis by the conservative think tank America First Policy Institute found that “CAISI remains underfunded compared with peer institutes internationally and lacks ‘appropriate funding.’”
To critics, the CAISI testing plan may not go far enough to protect the public from the most unpredictable AI risks. Falco maintains that only independent audits can spare the public from the worst outcomes.
“The danger is that government oversight becomes political, performative, or captured by the companies it is supposed to evaluate,” Falco said. “The opportunity is to build a practical audit framework that lets the US remain the global leader in AI while creating credible accountability around the most consequential risks.”
To Lynch, the bigger test may be whether Trump’s testing plan succeeds in its mission to avert risks and build more trust in AI systems, while keeping a light touch to avoid overregulating firms.
CAISI “is building something important here,” Lynch said. “The test will be whether these collaborations ignite innovation, protect national security, and produce AI that is both trusted and trustworthy.”