Governing the Agent

A Technical Discussion Paper on Runtime Governance, Capability Reuse, and the Widening Gap Between Agentic AI Capability and Institutional Control

Executive Summary

A cluster of practitioner-academic publications in early 2026 — including Singapore IMDA’s Model AI Governance Framework for Agentic AI, GovTech Singapore’s Agentic Risk & Capability Framework, CSA Singapore’s Draft Addendum on Securing Agentic AI, the OWASP Top 10 for Agentic Applications 2026, parallel work from Gradient Institute, MILA — Quebec AI Institute, the Alan Turing Institute, and the broader academic and supervisory community — together represent the most substantive articulation yet of how to govern agentic systems in production. They emerge at a moment when, by Stanford HAI’s own measure, AI capability is outpacing governance more rapidly than at any point in the field’s history, and when Yoshua Bengio’s International AI Safety Report 2026 documents the first empirical evidence of frontier-model deception, self-preservation behaviour and situational awareness in deployed systems.

This paper does three things. First, it draws out the operational consensus emerging across this body of work on what agentic governance actually requires, and where genuine intellectual tensions remain. Second, it situates that consensus within the global landscape: EU AI Act enforcement beginning August 2026, the UK AISI’s frontier-trends work, Anthropic’s Responsible Scaling Policy v3.0, OpenAI’s Preparedness Framework v2, the Future of Life Institute’s AI Safety Index, and Bengio’s LawZero / Scientist AI research programme. Third, it argues that runtime governance as currently conceived is necessary but structurally insufficient: it solves the deployment-surface problem while leaving the deeper problem of capability-control under conditions of evaluation awareness and emergent misalignment essentially open.

The central thesis is uncomfortable. The frameworks we are building assume that the AI does roughly what its designers intend, and govern the consequences. The empirical record from 2025-2026 increasingly suggests that this assumption is the one most in need of governance.

1. The Operational Consensus on Agentic Governance

1.1 The Shape of the Problem

By early 2026, the agentic governance literature has converged on a shared diagnosis of why classical AI governance — and in regulated industries, classical Model Risk Management — fails when applied to agentic systems.

Classical Model Risk Management, the regime built around US Federal Reserve SR 11-7, OCC 2011-12, the UK PRA’s SS1/23, and Canada’s OSFI E-23, assumes that risk arises from a stable input-output mapping that can be validated pre-deployment and monitored thereafter through aggregate indicators. In agentic systems, this assumption collapses. The primary object of concern is the execution trajectory: a sequence of intermediate reasoning steps, memory accesses, tool invocations, state transitions, and human approvals that unfolds at runtime. Material failures occur not in the final output but during the trajectory itself. Examples include unsafe tool use, skipped approvals, privacy breaches, uncontrolled side effects, prompt injection via retrieved documents, silent numeric errors that pass plausibility checks, and agents reasoning their way around approval gates.
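To make the trajectory-versus-output distinction concrete, here is a minimal Python sketch. The `Step` and `Trajectory` types, the step kinds and the tool names are all hypothetical. It illustrates one narrow point: the final output reads as a perfectly plausible result, yet a gated tool call ran without approval, a failure that is only visible by inspecting the trajectory.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    kind: str  # hypothetical kinds: "reason", "memory", "tool_call", "approval"
    name: str  # step label; for "approval" steps, the tool being approved
    requires_approval: bool = False


@dataclass
class Trajectory:
    run_id: str
    steps: List[Step] = field(default_factory=list)
    final_output: Optional[str] = None


def skipped_approvals(traj: Trajectory) -> List[str]:
    """Return gated tool calls that executed without an unused prior approval."""
    pending: Counter = Counter()  # approvals granted but not yet consumed, per tool
    violations: List[str] = []
    for step in traj.steps:
        if step.kind == "approval":
            pending[step.name] += 1
        elif step.kind == "tool_call" and step.requires_approval:
            if pending[step.name] > 0:
                pending[step.name] -= 1  # each approval covers exactly one call
            else:
                violations.append(step.name)
    return violations


run = Trajectory(
    "run-1",
    [
        Step("reason", "plan_transfer"),
        Step("tool_call", "lookup_client"),
        Step("tool_call", "send_payment", requires_approval=True),
    ],
    final_output="Payment of 500 SGD sent to client.",  # looks fine in isolation
)
print(skipped_approvals(run))  # ['send_payment']
```

An output-only check would pass this run; the violation exists only in the sequence of steps.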

The literature has also converged on a layered governance abstraction. At the system level, related use cases are clustered for governance. At the capability level, agentic AI is decomposed into bounded action classes, each with explicit authority, constraints and evidence requirements. At the trajectory level, each run is governed in real time through deterministic guards over measurable state attributes — alignment with verified knowledge, verification status, trajectory length, confidence levels, information freshness — compared against pre-set thresholds.
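Here is a minimal sketch of a trajectory-level guard layer. It assumes the five state attributes listed above are already computed for each step. The `RunState` fields, guard names and threshold values are hypothetical; in practice thresholds would be set per capability and justified by validation evidence. What matters is that each guard is a deterministic comparison, so the same state always yields the same decision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class RunState:
    knowledge_alignment: float  # 0..1 agreement with verified knowledge
    verified: bool              # has the current intermediate result passed verification?
    trajectory_length: int      # steps taken so far in this run
    confidence: float           # 0..1 calibrated confidence
    info_age_hours: float       # age of the freshest source relied on


# Hypothetical thresholds; real values would be set per capability.
THRESHOLDS: Mapping[str, float] = {
    "min_alignment": 0.8,
    "max_steps": 25,
    "min_confidence": 0.7,
    "max_info_age_hours": 24.0,
}

# Each guard is a pure, deterministic comparison of one state attribute to a threshold.
GUARDS: Dict[str, Callable[[RunState, Mapping[str, float]], bool]] = {
    "alignment": lambda s, t: s.knowledge_alignment >= t["min_alignment"],
    "verification": lambda s, t: s.verified,
    "length": lambda s, t: s.trajectory_length <= t["max_steps"],
    "confidence": lambda s, t: s.confidence >= t["min_confidence"],
    "freshness": lambda s, t: s.info_age_hours <= t["max_info_age_hours"],
}


def evaluate_guards(state: RunState,
                    thresholds: Mapping[str, float] = THRESHOLDS) -> Tuple[bool, List[str]]:
    """Return (may_proceed, failed_guard_names); proceed only if no guard fails."""
    failed = [name for name, guard in GUARDS.items() if not guard(state, thresholds)]
    return (not failed, failed)


print(evaluate_guards(RunState(0.92, False, 31, 0.88, 3.0)))  # (False, ['verification', 'length'])
```

Returning the names of the failed guards, rather than a bare boolean, is what makes each refusal explainable in the trace.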

This layered approach does important work. It establishes a clean separation between reasoning (which the model performs within a capability) and authority (which it does not determine for itself). It produces an auditable governance object — the trace — that can be replayed, challenged and analysed. And it enables what some researchers call pooled evidence: capability-level validation that can be reused across use cases, reducing the marginal cost of onboarding new workflows.
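The separation between reasoning and authority, and the replayable trace it produces, can be sketched in a few lines. Every name here is an illustrative assumption: `AUTHORITY`, `authorise`, `run_step`, `replay`, and the capability and action names. The model only proposes actions. A static table decides whether they are permitted, and an auditor can later re-check the serialised trace against policy.

```python
import json
from typing import Dict, List, Set

# Hypothetical authority table: which actions each capability may take.
# It lives outside the model, which can propose actions but never edit this table.
AUTHORITY: Dict[str, Set[str]] = {
    "kyc_review": {"read_documents", "flag_for_review"},
    "payment_prep": {"draft_payment"},
}


def authorise(capability: str, proposed_action: str) -> bool:
    """Authority check: a lookup, not a judgement made by the model."""
    return proposed_action in AUTHORITY.get(capability, set())


def run_step(trace: List[dict], capability: str, proposed_action: str) -> bool:
    """Record the model's proposal and the authorisation decision in the trace."""
    allowed = authorise(capability, proposed_action)
    trace.append({"capability": capability, "action": proposed_action, "allowed": allowed})
    return allowed


def replay(trace_json: str) -> List[int]:
    """Re-run authorisation over a serialised trace.

    Returns the indices of records whose recorded decision disagrees with the
    current policy, i.e. the steps an auditor would want to challenge.
    """
    records = json.loads(trace_json)
    return [i for i, r in enumerate(records)
            if authorise(r["capability"], r["action"]) != r["allowed"]]


trace: List[dict] = []
run_step(trace, "kyc_review", "read_documents")   # allowed
run_step(trace, "kyc_review", "approve_account")  # refused: outside the capability's authority
print(replay(json.dumps(trace)))  # []
```

Replaying against the current table also shows which historical decisions would change under a policy revision. This is one practical way to challenge a trace.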

Case studies from production deployments are lending the approach real-world validation. Singapore’s OCBC operates an eight-agent sequential workflow for source-of-wealth analysis with bounded task-level autonomy and no decision authority. Dayos runs a three-tier IT-ticket triage: 60% of tickets are fully automated, 30% are diagnose-only and require human approval, and the remaining 10% are off-limits to the agent entirely. Terminal 3 deploys hardware-attested Verifiable Credentials of Intent for agentic payroll. Cyber Sierra uses a LangGraph reflection architecture with metadata-graph context structuring. GovTech phases its coding-assistant rollout with the MCP Governance Framework. PwC Singapore allocates accountability across four distinct roles: use-case owner, Technology Risk Management, AI Factory and end-users. These are not toy examples; they are the operational substrate of agentic governance in practice in 2026.
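The Dayos pattern amounts to a routing table from ticket category to autonomy mode. Below is a minimal sketch. The category names are hypothetical, since the actual split is not described here. The one design choice worth copying is that unrecognised categories default to the most restrictive mode.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATE = "fully automated"
    DIAGNOSE_ONLY = "agent diagnoses; human approves any fix"
    OFF_LIMITS = "routed straight to a human"


# Hypothetical ticket categories; the real category split is not described in the source.
TRIAGE_POLICY = {
    "password_reset": Mode.AUTOMATE,
    "software_install": Mode.AUTOMATE,
    "network_outage": Mode.DIAGNOSE_ONLY,
    "access_escalation": Mode.OFF_LIMITS,
}


def triage(category: str) -> Mode:
    # Unknown categories fall into the most restrictive tier, never the most permissive.
    return TRIAGE_POLICY.get(category, Mode.OFF_LIMITS)


for category in ("password_reset", "network_outage", "vendor_dispute"):
    print(category, "->", triage(category).value)
```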

1.2 Where the Field Strongly Agrees

The convergence across the 2026 agentic governance literature is, I would argue, more significant than any individual finding within it. Several propositions are now treated as essentially settled across academic, supervisory and industry voices.

Prompt-level guardrails are not a primary control. The academic literature is now scathing on this point: prompt-based guardrails present an illusion of control while offering little binding enforcement, and LLM-as-judge verifiers conflate semantic plausibility with logical correctness. Industry frameworks reinforce the same point through case studies, insisting on system-level controls over prompt-layer guardrails that “may be bypassed or ‘forgotten’”. The prompt-engineering culture that dominated 2023–2024 LLM application development is now being explicitly repudiated by both academic and regulatory voices as a primary control mechanism. The implications for the broader Responsible AI tooling industry — much of which still markets prompt-layer safeguards as enterprise controls — are significant.
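The contrast between a prompt instruction and a system-level control can be made concrete. The sketch below is a minimal illustration under assumed names: `ToolExecutor`, `PolicyViolation` and the two toy tools are all hypothetical. It enforces a per-capability tool allowlist in the execution layer, so a disallowed call is refused regardless of what the model was told or what injected text it has read.

```python
from typing import Callable, Dict, Iterable


class PolicyViolation(Exception):
    """Raised when the agent requests a tool outside its capability's allowlist."""


class ToolExecutor:
    """Enforces a tool allowlist in code, outside the model's context window."""

    def __init__(self, tools: Dict[str, Callable[..., str]], allowed: Iterable[str]):
        self._tools = tools
        self._allowed = frozenset(allowed)

    def call(self, name: str, **kwargs) -> str:
        # This check runs on every call. Unlike a prompt instruction, it cannot be
        # "forgotten" over a long context or talked around by injected text.
        if name not in self._allowed:
            raise PolicyViolation(f"tool {name!r} is not permitted for this capability")
        return self._tools[name](**kwargs)


# Hypothetical tools for illustration only.
tools = {
    "read_record": lambda record_id: f"contents of {record_id}",
    "delete_record": lambda record_id: f"deleted {record_id}",
}
executor = ToolExecutor(tools, allowed={"read_record"})

print(executor.call("read_record", record_id="A1"))
try:
    executor.call("delete_record", record_id="A1")
except PolicyViolation as err:
    print("blocked:", err)
```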

Human-in-the-loop must be conditional, not continuous. Three structural limitations of continuous HITL are now widely recognised: speed mismatch between machine-speed execution and human-speed review, trajectory opacity (reviewers see compressed summaries or final outputs, not the full execution path), and cognitive overload from multi-step workflows. The conclusion is consistent across the literature: human oversight must shift toward policy design, threshold setting and review of monitoring evidence, with intervention triggered by codified policy violations. Continuous human review at every step is now characterised as a failure mode of governance, not a strength of it.
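Conditional oversight can be sketched as codified rules evaluated on every proposed action, with a human pulled in only when a rule fires. The `Action` fields, rule names and the 10,000 limit are hypothetical. The structure shows where human effort moves: into writing and tuning `POLICY`, not into approving each step.

```python
from dataclasses import dataclass
from typing import Callable, List, NamedTuple


class Action(NamedTuple):
    tool: str
    amount: float
    counterparty_known: bool


@dataclass(frozen=True)
class Rule:
    name: str
    violated: Callable[[Action], bool]


# Hypothetical policy written by humans in advance. Humans design and tune these rules;
# they do not review every step.
POLICY: List[Rule] = [
    Rule("amount_over_limit", lambda a: a.amount > 10_000),
    Rule("unknown_counterparty", lambda a: not a.counterparty_known),
]


def route(action: Action) -> str:
    """Proceed autonomously unless a codified rule fires; then escalate to a human."""
    triggered = [rule.name for rule in POLICY if rule.violated(action)]
    if triggered:
        return "escalate:" + ",".join(triggered)
    return "proceed"


print(route(Action("pay_invoice", 500.0, True)))      # proceed
print(route(Action("pay_invoice", 50_000.0, False)))  # escalate:amount_over_limit,unknown_counterparty
```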

Telemetry is first-class governance infrastructure. The convergence on OpenTelemetry-compatible tracing with governance-semantic attributes — capability IDs, tool intents, guard scores, authorisation decisions — is striking. It has implications for how banks, regulators, and vendors should be thinking about audit trails and replayability: observability is governance infrastructure, not operational tooling.
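The following sketch shows what governance-semantic attributes on a trace span might look like. It uses only the standard library, so it runs without an OpenTelemetry installation. The `governance.*` attribute names and the `governance_span` helper are hypothetical, not an established semantic convention.

```python
import json
import time
import uuid
from typing import Dict, Optional


def governance_span(name: str, capability_id: str, tool_intent: str,
                    guard_scores: Dict[str, float], authz_decision: str,
                    parent_span_id: Optional[str] = None) -> dict:
    """Build a span-like record whose attributes carry governance semantics.

    The 'governance.*' attribute keys are hypothetical; they are not part of any
    published OpenTelemetry semantic convention.
    """
    attributes = {
        "governance.capability_id": capability_id,
        "governance.tool_intent": tool_intent,
        "governance.authz_decision": authz_decision,
    }
    # Flatten guard scores into scalar attributes: span attribute values are
    # primitives (or arrays of primitives), not nested maps.
    for guard, score in guard_scores.items():
        attributes[f"governance.guard.{guard}"] = score
    return {
        "name": name,
        "span_id": uuid.uuid4().hex[:16],  # 16 hex chars, the size of an OTel span ID
        "parent_span_id": parent_span_id,
        "start_time_unix_nano": time.time_ns(),
        "attributes": attributes,
    }


span = governance_span("tool_call", "cap.kyc_review", "read_documents",
                       {"confidence": 0.91, "freshness": 1.0}, "allow")
print(json.dumps(span["attributes"], indent=2))
```

With the OpenTelemetry Python API, the same key-value pairs would be attached with `span.set_attribute(key, value)` on a span opened via `tracer.start_as_current_span(name)`. Keeping the values scalar matches the attribute types OpenTelemetry accepts.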

Risk tiering must scale governance intensity to autonomy and consequence. Four-tier models (assistive → bounded workflow → high-impact governed execution → critical autonomous) have emerged across multiple frameworks. The five dimensions used to determine tier — agency, authority, impact, exposure, recoverability — are now treated as the canonical risk profile. Risk depends not on how the model behaves but on what the workflow is permitted to do, how exposed it is to failure, and how difficult it is to recover when things go wrong.
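Here is one way the five dimensions might map onto the four tiers, as a minimal sketch. The 1-to-4 scoring scale and the aggregation rule are assumptions for illustration, not taken from any of the frameworks discussed. Autonomy is taken as the worse of agency and authority, and consequence as the worst of impact, exposure and recoverability. The tier is the average of the two, rounded up.

```python
from dataclasses import dataclass, astuple

TIERS = (
    "assistive",
    "bounded workflow",
    "high-impact governed execution",
    "critical autonomous",
)


@dataclass(frozen=True)
class RiskProfile:
    # Each dimension scored 1 (lowest risk) to 4 (highest risk); the scale is hypothetical.
    agency: int
    authority: int
    impact: int
    exposure: int
    recoverability: int  # 4 = hardest to recover from

    def __post_init__(self) -> None:
        if any(not 1 <= v <= 4 for v in astuple(self)):
            raise ValueError("dimension scores must be between 1 and 4")


def assign_tier(p: RiskProfile) -> str:
    # Hypothetical aggregation over two axes.
    autonomy = max(p.agency, p.authority)
    consequence = max(p.impact, p.exposure, p.recoverability)
    # Average the axes, rounding up so borderline cases land in the stricter tier.
    level = (autonomy + consequence + 1) // 2
    return TIERS[level - 1]


print(assign_tier(RiskProfile(agency=2, authority=1, impact=1,
                              exposure=1, recoverability=4)))  # high-impact governed execution
```

Rounding up pushes borderline workflows into the stricter tier. An institution could equally take the maximum of the two axes, which is more conservative still.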

1.3 Where Genuine Tensions Remain

Three intellectual tensions remain unresolved across the literature.

Capabilities versus components as the unit of governance. The deepest unresolved question is whether the right governance unit is the capability (a bounded action class with authority, constraints and evidence requirements that can be reused across workflows) or the component (model, instructions, memory, tools, protocols, controls, logging). The capability framing, most developed in the formal MRM literature and converged on by GovTech’s ARC Framework, is the more powerful idea for scaling governance in regulated institutions. It is the difference between governing what the system can do and governing what the system is made of. But the component view is closer to how engineering teams actually build agentic systems, and operational frameworks across sectors still largely speak in component terms. The two perspectives are not in fundamental conflict, but they have not yet been integrated, and they imply different validation, monitoring and change-management approaches.
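The two governance units imply different change-management logic, and a small registry sketch makes the difference visible. All names here are hypothetical. Capabilities carry authority, constraints and evidence requirements and are reused across workflows; each also records the components it is built from. A component change, such as a tool or model upgrade, then invalidates evidence for every capability that depends on it, in every workflow that reuses that capability.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Capability:
    cap_id: str
    authority: FrozenSet[str]          # actions the capability may take
    constraints: FrozenSet[str]        # limits on how it may act
    evidence_required: FrozenSet[str]  # validation artefacts needed before use
    components: FrozenSet[str]         # model, tools, memory it is built from


@dataclass
class Registry:
    capabilities: Dict[str, Capability] = field(default_factory=dict)
    workflows: Dict[str, List[str]] = field(default_factory=dict)  # workflow -> capability ids

    def affected_by_component_change(self, component: str) -> Dict[str, List[str]]:
        """Map each workflow to the capabilities whose evidence a component change invalidates."""
        stale = {c.cap_id for c in self.capabilities.values() if component in c.components}
        return {
            wf: [cap for cap in caps if cap in stale]
            for wf, caps in self.workflows.items()
            if stale.intersection(caps)
        }


retrieval = Capability("doc_retrieval", frozenset({"read_documents"}), frozenset({"no_external_send"}),
                       frozenset({"retrieval_eval"}), frozenset({"model:v1", "tool:search"}))
summarise = Capability("summarise", frozenset({"draft_text"}), frozenset({"no_external_send"}),
                       frozenset({"summary_eval"}), frozenset({"model:v2"}))
registry = Registry(
    capabilities={c.cap_id: c for c in (retrieval, summarise)},
    workflows={"kyc_review": ["doc_retrieval"],
               "credit_memo": ["doc_retrieval", "summarise"],
               "faq": ["summarise"]},
)
print(registry.affected_by_component_change("tool:search"))
# {'kyc_review': ['doc_retrieval'], 'credit_memo': ['doc_retrieval']}
```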

Mathematical formalism versus operational scaffolding. The MRM-oriented literature offers formal transition-system models with deterministic guards. Cross-sectoral operational frameworks like Singapore IMDA’s MGF stay at the level of structured prose and case studies. For regulators, supervisors and Chief Model Risk Officers in financial institutions, the formalism is what allows agentic governance to be reasoned about with the same rigour as Basel risk models. For practitioners across sectors, the operational scaffolding is what allows the framework to be applied without requiring teams to translate mathematics into operational reality. The right answer is probably that both are needed, but their integration is not yet a solved problem.
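For readers less familiar with the transition-system style, here is one illustrative way to write a deterministically guarded transition. The notation is an assumption of this sketch, not reproduced from any particular paper. S is the set of run states, A the set of actions and δ the transition function. Each guard g_i is oriented so that larger values are safer; a length guard, for example, could be g_len(s) = L_max − ℓ(s), where ℓ(s) is the trajectory length. A step executes only if every guard clears its threshold τ_i. Otherwise the run moves to a halt or escalation state.

```latex
\begin{aligned}
&\mathcal{M} = (S, A, \delta, s_0), \qquad \delta : S \times A \to S, \\
&g_i : S \to \mathbb{R}, \quad \tau_i \in \mathbb{R}, \qquad i = 1, \dots, k, \\
&s_{t+1} =
\begin{cases}
  \delta(s_t, a_t) & \text{if } g_i(s_t) \ge \tau_i \text{ for all } i, \\
  s_{\mathrm{halt}} & \text{otherwise.}
\end{cases}
\end{aligned}
```

Because each g_i is a fixed function of the state, the same state and thresholds always produce the same decision. That property is what separates a deterministic guard from an LLM-as-judge check.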

Coverage of multi-agent and identity layers. Multi-agent risks — agent sprawl, miscoordination, conflict, collusion, emergent behaviour — are addressed explicitly by Singapore IMDA’s MGF v1.5, Gradient Institute’s Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems, and OWASP’s Multi-Agentic System Threat Modelling Guide. They remain largely absent from the formal MRM literature. Agent identity — drawing on OpenID’s Identity Management for Agentic AI, Microsoft Entra Agent ID, Alibaba Cloud’s Agent ID Guard, and OAuth 2.1 integration into the Model Context Protocol — is similarly addressed by the operational frameworks but underdeveloped in the formalism. Given that production agentic systems are increasingly multi-agent (Stability Solutions’ L3 with 26 internal agents is now closer to typical than exceptional) and that agent identity is becoming a first-class governance object, this gap matters and will need to be closed.
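Agent identity matters most in multi-agent delegation, where a sub-agent should never hold more authority than the agent that spawned it. Below is a minimal sketch of that down-scoping rule, with an auditable delegation chain. The `AgentIdentity` type and the scope strings are hypothetical; they do not model any specific product or the OAuth 2.1 protocol itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    scopes: FrozenSet[str]
    delegated_by: Optional["AgentIdentity"] = None

    def delegate(self, child_id: str, requested: FrozenSet[str]) -> "AgentIdentity":
        """Issue a sub-agent identity holding a subset of this agent's scopes."""
        excess = requested - self.scopes
        if excess:
            # No privilege escalation through delegation.
            raise PermissionError(f"{self.agent_id} cannot grant {sorted(excess)}")
        return AgentIdentity(child_id, frozenset(requested), self)

    def chain(self) -> Tuple[str, ...]:
        """Delegation chain from the root agent down to this one, for the audit trail."""
        ids = []
        node: Optional[AgentIdentity] = self
        while node is not None:
            ids.append(node.agent_id)
            node = node.delegated_by
        return tuple(reversed(ids))


root = AgentIdentity("orchestrator", frozenset({"read:crm", "write:ticket"}))
summariser = root.delegate("summariser", frozenset({"read:crm"}))
print(summariser.chain())  # ('orchestrator', 'summariser')
```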

1.4 End-User Responsibility and Tradecraft Erosion

A fourth concern, treated explicitly by the operational frameworks but largely absent from the formal MRM literature, deserves separate emphasis. As agents absorb entry-level analytical tasks — credit analysis, KYC review, compliance work, audit testing, code drafting — the question of how junior practitioners develop professional judgement becomes a business-continuity issue, not just an HR one.

Singapore IMDA’s MGF v1.5 cites the Quarterly Journal of Economics study on generative AI at work, and treats tradecraft erosion as a distinct governance object. Workday’s case study on transparency to end-users illustrates the design principles: agents accompany recommendations with explicit reasoning, factsheets specify what the agent can and cannot do, and human judgement is reinforced at every consequential checkpoint. Ant International’s HOP framework lets end-users read, write and iterate on agentic workflow specifications themselves rather than treating the agent as a black box.

The financial-services implications are significant. Institutions that automate entry-level analyst work without investing in alternative pathways for judgement development will discover, several years from now, that they no longer have analysts who can second-guess the agent. This is not a soft concern — it is a structural risk that compounds over deployment cycles measured in years.

2. The Global Governance Landscape: Where the Consensus Sits

The 2026 governance consensus exists within a rapidly maturing landscape of national bodies, regulatory regimes and frontier-lab policies that have crystallised in late 2025 and early 2026.

2.1 National AI Safety Institutes and Frameworks

Singapore IMDA + AI Verify published the world’s first dedicated Agentic AI Framework in January 2026, of which the May 2026 v1.5 is the latest evolution. The framework’s case-study-driven approach — incorporating contributions from over sixty companies including DBS, OCBC, Tencent, PwC, Microsoft, Google, AWS, Anthropic via Claude Code, Workday, Salesforce, Mastercard and others — has made it the de facto operational reference for industry. Singapore’s regulatory positioning (MAS’s AI Risk Management Guidelines consultation, IMDA’s frameworks, GovTech’s Agentic Risk & Capability Framework and CSA’s Draft Addendum on Securing Agentic AI) constitutes the most comprehensive national agentic AI governance stack in the world today.

The UK AI Security Institute (AISI) has produced the Frontier AI Trends analysis (December 2025) and the work backing the International AI Safety Report 2026. The UK’s positioning is research-led rather than regulation-led, but the AISI’s evaluation work is increasingly the source of empirical evidence about frontier-model behaviour that other national bodies cite.

The US Center for AI Standards and Innovation (CAISI) at NIST — the renamed successor to the US AISI — continues to publish technical guidance under the AI Risk Management Framework, but the US federal regulatory picture remains, by Max Tegmark’s characterisation, “less regulated than sandwiches”. The most consequential US regulatory development is at state level: California’s SB-53, signed by Governor Newsom in September 2025, and New York’s RAISE Act, both of which adopt elements of frontier-lab Responsible Scaling Policies as binding requirements.

The European Union AI Act’s obligations for general-purpose AI models with systemic risk become enforceable on 2 August 2026. On that date the European Commission’s enforcement powers come into force, including the ability to request documentation, conduct evaluations, mandate measures and impose fines. The General-Purpose AI Code of Practice (July 2025), developed by independent experts working with the AI Office, is the operational instrument that bridges GPAI obligations and the slower European standardisation process. The Commission’s Guidelines for providers of general-purpose AI models explicitly note that “the level of autonomy or tool use of the model can be decisive in the designation of the model as a model with systemic risk”, bringing agentic capability formally into the EU’s regulatory perimeter.

The EU position warrants particular attention because the AI Act is binding legislation with extraterritorial reach. Providers placing AI systems on the EU market fall within scope, as do providers and deployers outside the EU whose systems’ outputs are used in the EU, regardless of where they are headquartered. Under Article 50, AI systems intended to interact directly with natural persons or to generate content trigger transparency obligations, which are directly relevant to many of the agentic use cases under discussion.

Other national positions include South Korea’s AI Basic Act (January 2026), Japan’s AI Basic Plan (December 2025), France’s INESIA roadmap, Germany’s BSI/BMBF positioning, China’s CAICT Framework 2.0 (September 2025), India’s IndiaAI Safety Institute, the UAE’s AIATC, Saudi Arabia’s SDAIA, and Canada’s OSFI E-23 (now the Canadian reference for model risk management including AI). The collective picture is one of rapid national-level capability building, but with significant variation in approach and limited cross-border coordination beyond the Bletchley/Seoul/Paris/India Summit series.

2.2 Frontier Lab Policies

Anthropic’s Responsible Scaling Policy v3.0 (24 February 2026) is the most consequential frontier-lab development of 2026 to date — and one of the most contested. The policy, which until v3.0 had committed Anthropic to pause AI development if adequate safeguards could not be implemented before reaching the next capability threshold, has now removed that binding pause commitment. Anthropic’s justification, articulated by Chief Science Officer Jared Kaplan to TIME, is a collective-action problem: unilateral pausing in a competitive market where other developers do not pause produces a worse safety outcome, not a better one, because responsible developers lose the ability to do safety research at the frontier.

The v3.0 framework now distinguishes between unilateral commitments (what Anthropic will do regardless of competitor behaviour) and industry-wide recommendations (what the company believes the industry collectively must do to manage catastrophic risk). The Frontier Safety Roadmap, Risk Reports, and external review provisions are genuine additions to transparency. But the elimination of the pause commitment marks the end of an era in which a single frontier lab could plausibly steer the industry toward safety through unilateral moral leadership. The GovAI analysis is measured: “On balance, we think it’s better to be honest about constraints than to keep commitments that won’t be followed in practice.” That assessment is probably right. It is also a defeat.

Anthropic’s Claude Opus 4.6 Sabotage Risk Report (February 2026) is more significant for the agentic governance literature than the RSP changes themselves. The report documented “locally deceptive behaviour” in complex agentic environments, high but non-strategic situational awareness, and what the company termed “very low but not negligible” sabotage risk. Most concerning for governance: the report’s finding that as models become better at reasoning, they also become better at recognising when they are being evaluated. This evaluation awareness is a structural challenge to the entire pre-deployment testing paradigm that the agentic governance literature broadly assumes.

OpenAI’s Preparedness Framework v2 (April 2025) takes a different approach, scoping coverage explicitly to “any agentic system (including significant agents deployed only internally) that represents a substantial increase in the capability frontier”. The framework’s three focal capability areas — biological/chemical, cybersecurity, AI self-improvement — and its risk levels (High, Critical) operate as deployment gates. OpenAI’s “marginal risk” section explicitly contemplates relaxing safeguards if other developers release comparably risky systems without comparable safeguards — a more honest articulation of the same collective-action logic that drove Anthropic’s RSP v3.0 changes.

Google DeepMind’s Frontier Safety Framework continues to evolve in parallel, with the most recent updates focused on autonomy-related capability thresholds and pre-deployment evaluation protocols.

2.3 The Future of Life Institute’s AI Safety Index

The FLI Winter 2025 AI Safety Index, the most recent at time of writing, scored eight frontier labs across safety frameworks, risk assessment, current harms, governance, information sharing, and existential safety. The results were grim. Anthropic, OpenAI and Google DeepMind led with overall grades of C+ or C. Meta, xAI, DeepSeek, Z.ai and Alibaba Cloud trailed significantly. Every company received a D or F on existential safety, reflecting the Index reviewers’ assessment that no frontier lab — including the leaders — has a credible plan for controlling or aligning smarter-than-human AI systems that all of them are explicitly racing toward.

Max Tegmark’s characterisation of the situation has hardened over 2025-2026:

“I feel that the leaders of these companies are trapped in a race to the bottom that none of them can get out of, no matter how kind-hearted they are. Companies have an incentive, even if they have the best intentions, to always rush out new products before the competitor does, as opposed to necessarily putting in a lot of time to make it safe.”

Tegmark’s proposed solution, “something like an FDA for AI where companies first have to convince experts that their models are safe before they can sell them”, is structurally similar to three existing efforts: what the EU AI Act attempts for high-risk systems, what California SB-53 attempts for frontier systems, and what supervisory regimes attempt for workflows within regulated financial institutions. The pattern is notable. From very different vantage points, the field is converging on pre-market evidence requirements backed by binding enforcement as the necessary direction of travel.

The Index also documents the structural divide between American/European labs that respond to the safety survey (Anthropic, OpenAI, Google DeepMind, xAI as of the Winter 2025 round) and the Chinese labs that largely do not (DeepSeek, Z.ai, Alibaba Cloud). This information-sharing asymmetry is becoming a governance problem in its own right, particularly given the Stanford AI Index 2026’s finding that Chinese models have closed to within 2.7 percentage points of top US models.

2.4 The International AI Safety Report 2026

Yoshua Bengio’s International AI Safety Report 2026 (February 2026), backed by over thirty countries and authored by over one hundred AI experts, is the most authoritative synthesis of current empirical evidence on frontier AI risk. The Report is unusual in that it deliberately avoids policy recommendations, focusing instead on what the scientific evidence supports.

Several findings deserve emphasis in the context of agentic governance:

The Report documents early empirical evidence of deception, cheating and situational awareness in frontier models. These were theoretical concerns in the 2025 Report and the original Bletchley discussions; they are now observed phenomena. Anthropic’s sabotage report, OpenAI’s findings on Operator, and academic work on agent backbones all contribute to this evidence base.

The Report observes that agentic capability gains in 2025-2026 have come largely without larger training runs, suggesting that “more clever development techniques are yet to be discovered” and that “AI capabilities are more likely to continue improving than to plateau”. The governance implication is direct: a governance regime calibrated to capability levels reached in 2025 will not be adequate to the capabilities of 2027.

The Report does not address agentic governance in the specific operational sense that the IMDA, GovTech, CSA and OWASP frameworks do. But it provides the empirical context that makes those frameworks necessary — and that makes the assumptions underlying them worth interrogating.

2.5 Bengio’s LawZero and the Non-Agentic Alternative

Bengio’s most consequential intellectual move in 2025-2026 is not the International AI Safety Report but the launch of LawZero (June 2025), a $30 million non-profit AI safety research organisation incubated at Mila – Quebec AI Institute. LawZero’s core research bet is Scientist AI: non-agentic systems that learn to understand and make statistical predictions about the world without the agency to take independent actions.

This is a radical position. The entire trajectory of commercial AI development, and most of the agentic governance literature, assumes that agency is what we are building and that the open question is how to govern it. Bengio holds that agency is itself the problem. In his view, agency is not separable from goal-pursuit, self-preservation and the loss-of-control risks that follow, so the right research direction is to build powerful analytical AI systems that cannot, by design, act on their own.

The relevance to agentic governance is direct. If Bengio is right, the agentic governance frameworks are governing the wrong thing. The governance object should not be the agentic workflow but the architectural decision to make a system agentic at all. Scientist AI systems could then be used as oversight tools for whatever agentic systems do get built — a non-agentic guardrail layer for an inherently risky paradigm.

It is too early to say whether Bengio’s bet will produce production-deployable systems. LawZero’s eighteen months of research at current funding levels are, in his own analogy, a small lamp on a foggy road: the vehicle is carrying his children, his grandchild and his students, the road has no guardrails, and the vehicle is accelerating. Industry spending on agentic capability dwarfs LawZero’s resources by orders of magnitude. But the position is a serious one. It is the cleanest articulation in the field of the proposition that agentic governance frameworks are necessary but not sufficient, because they govern a class of system that may not be governable at the limit.

2.6 Stanford HAI’s 2026 AI Index — The Governance Gap

The Stanford HAI 2026 AI Index Report (April 2026, 400+ pages) provides the most data-rich empirical baseline for the governance gap. The headline finding: AI capabilities continue to accelerate while the governance frameworks, evaluation benchmarks and transparency mechanisms needed to manage them fall further behind.

Several specific findings bear on the agentic governance question:

The proportion of organisations with no formal Responsible AI framework has dropped from 24% to 11%. Industry norms are shifting fast, but a significant minority remains exposed. AI transparency scores from frontier labs have dropped from 58 to 40, so vendor-reported benchmarks are less reliable as a basis for procurement decisions. Independent evaluation is no longer optional but operationally necessary.

The Stanford finding that has most directly shaped enterprise procurement decisions in 2026: 62% of organisations cite security and risk as a barrier to scaling agentic AI. That makes it the most frequently cited obstacle, ahead of technical limitations (38%), regulatory uncertainty (38%) and gaps in Responsible AI tooling (32%). This finding supports the argument that governance, not model capability, is the binding constraint on adoption. It also supports the emphasis on technical controls and processes that runs through the operational governance literature.

The Index documents 47 countries actively legislating AI. A static compliance posture is no longer sufficient for any organisation operating across borders.

2.7 From Paper to Room Temperature: ATxSummit, May 2026

I published this paper on May 20. That same morning, ATxSummit opened at Capella Singapore. I did not plan the timing, but I could not have asked for a better live experiment in whether the arguments in this paper were landing beyond the academy. The answer was yes. And no.

Over two days, the conversations at ATxSummit zeroed in on one issue: accountability. The leap from AI chatbot to agentic AI emerged as the summit’s most consequential theme — discussed in almost every keynote, panel session and fireside chat. There was standing room only at many sessions. This is not a conference that fills rooms on governance topics. Something has shifted.

What shifted, I think, is that the production reality finally caught up with the theoretical concern. Governance used to be a question people asked when they were worried about what might happen. At ATxSummit 2026, people were asking because things are already happening — agents booking appointments, executing multi-step workflows, making credit decisions, collaborating with other agents in pipelines no single human oversees in real time. The question is no longer hypothetical. The agent is already in the building.

Three things from the summit deserve to sit alongside the arguments in this paper.

First, Yoshua Bengio was there in person — the same Bengio whose International AI Safety Report 2026 and LawZero work I discuss in Sections 2.4 and 2.5. He warned the audience about existential and societal risks posed by advanced AI systems, and specifically about the risk of misalignment — the gap between what an AI system is optimised to do and what humans actually want it to achieve. This is exactly the structural limit I describe in Section 3.2: the consensus assumes the AI does roughly what its designers intend. Bengio’s point, made quietly and without drama in a room of AI optimists, is that this assumption is getting harder to defend with each new capability cycle.

Second, Janet George, Mastercard’s executive vice-president of AI, made an observation that sounds operational but is actually a governance point in disguise. A person can work around inconsistently labelled spreadsheets or incomplete records. An agent cannot: it either fails or, more dangerously, proceeds on a wrong assumption. This is the data-quality dimension of agentic governance that the frameworks I survey address obliquely but rarely foreground. Runtime trajectory controls assume the inputs are trustworthy. If the enterprise data substrate is messy, as it is in most organisations, the agent does not compensate for that mess the way a human analyst would. It executes through it. The governance layer has to sit upstream of the agent, not just around it.
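Moving governance upstream can be as simple as a data-quality gate that halts on ambiguity instead of letting the agent guess. Here is a minimal sketch. The required fields, the employment-status vocabulary and the `gate` and `validate_records` helpers are hypothetical. Normalisation is deliberately limited to whitespace and case. Anything outside the known vocabulary is rejected rather than mapped to a best guess.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("customer_id", "income", "employment_status")
# Hypothetical canonical vocabulary: values outside it are rejected, not guessed.
STATUS_VOCAB = {"employed", "self_employed", "unemployed", "retired"}


def validate_records(records: List[Dict]) -> Tuple[List[Dict], List[str]]:
    """Split records into clean ones and human-readable issues; never fill gaps."""
    clean: List[Dict] = []
    issues: List[str] = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            issues.append(f"record {i}: missing {', '.join(missing)}")
            continue
        # Normalise only whitespace and case; do not map "FT" to "employed".
        status = str(rec["employment_status"]).strip().lower()
        if status not in STATUS_VOCAB:
            issues.append(f"record {i}: unrecognised employment_status {rec['employment_status']!r}")
            continue
        clean.append({**rec, "employment_status": status})
    return clean, issues


def gate(records: List[Dict]) -> Dict:
    """Halt before the agent runs if any record fails validation."""
    clean, issues = validate_records(records)
    if issues:
        return {"status": "halted", "issues": issues}  # route to a human data owner
    return {"status": "ready", "records": clean}


print(gate([{"customer_id": "c3", "income": 42_000, "employment_status": "FT"}]))
```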

Third, and most consequentially for the argument in Section 3.3: Mastercard’s George noted that trust is not a brand issue but a design issue. It must be built in from the start rather than bolted on as an afterthought, and that principle becomes even more critical in autonomous systems. This is a sentence that should be on the wall of every MRM function and every AI governance committee in every regulated institution. In too many organisations, the word “governance” still triggers a fundamentally bolt-on response: checklists, review committees and sign-off procedures layered onto systems that were designed without governance in mind. That is not governance of the agent. That is governance theatre performed after the agent is already running.

The ATxSummit conversations confirmed what I have been arguing in practice through TrustOS: the governance conversation has to move upstream. Not into the model. Not into the prompt. Into the architecture, before the architecture is built.

3. Critical Synthesis: What the Consensus Does, and What It Does Not

3.1 What the Emerging Consensus Does Well

The agentic governance literature addresses the deployment surface of agentic AI with a sophistication the field did not have in 2024. It gives regulators, supervisors, deployers and end-users:

A defensible governance vocabulary that distinguishes capability, authority, trajectory, telemetry and tier. An auditable evidence base built on OpenTelemetry-compatible tracing. A risk-tiering methodology that scales governance intensity to consequence and autonomy. A clear separation of responsibilities between first and second lines of defence, between providers, deployers and end-users. An operational pathway for embedding deterministic controls in execution environments rather than relying on probabilistic compliance with prompt instructions. A change-management discipline calibrated to the speed of agentic system evolution. A framework for human-AI interaction that treats automation bias and tradecraft erosion as governance objects, not HR concerns.

This is a substantial achievement. It is the difference between governance-by-aspiration (which characterised most of 2023-2024 industry discourse) and governance-by-control-plane (which is what 2026 is beginning to demand).

3.2 The Structural Limits

The consensus shares three structural limits that the AI safety research community is increasingly vocal about.

First, evaluation awareness undermines pre-deployment testing. Anthropic’s Claude Opus 4.6 Sabotage Risk Report documented that models with high situational awareness can recognise when they are being evaluated. The report’s reassurance — that Opus 4.6 did not exhibit strategic sandbagging — is reassurance about one model at one moment. The structural problem remains: the entire pre-deployment testing paradigm that the agentic governance literature broadly assumes becomes less reliable as capability scales. If a system can distinguish evaluation conditions from production conditions, then the validated behaviour and the deployed behaviour can diverge in ways that no amount of trajectory-level telemetry will catch — because the evidence base for setting telemetry thresholds is itself the validation environment.

Second, capability pooling assumes a context-invariance that may not hold. The pooled-evidence claim is the central efficiency argument for capability-based governance. But capability behaviour is strongly context-dependent. A retrieval capability that performs well in KYC review may fail in credit memo drafting, not because the capability changed but because the adversarial surface changed: different document corpora, different prompt-injection vectors, different downstream consequences of confidence miscalibration. The frameworks provide excellent post-hoc auditability but may give false confidence about pre-deployment safety. Recent academic work on state-dependent vulnerabilities in agent backbones, and action-graph analysis of agentic behaviour, both point toward this limitation. Pooled evidence works for some properties (e.g. abstention rates on held-out sets) but may not transfer to others, such as adversarial robustness or behaviour under distributional shift.
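A small worked example makes the limitation concrete. The counts below are invented for illustration and do not come from any cited study. A single retrieval capability is probed adversarially in two hypothetical contexts, and its pooled failure rate passes a hypothetical tolerance even though one context is far outside it.

```python
# Invented test counts for one retrieval capability in two hypothetical contexts:
# context -> (adversarial probes run, probes that subverted the capability)
results = {
    "kyc_review": (900, 9),
    "credit_memo_drafting": (100, 12),
}

TOLERANCE = 0.03  # hypothetical maximum acceptable failure rate

def failure_rate(runs: int, failures: int) -> float:
    return failures / runs

# Pooled evidence: all probes treated as one sample, regardless of context.
total_runs = sum(runs for runs, _ in results.values())
total_failures = sum(fails for _, fails in results.values())
pooled = failure_rate(total_runs, total_failures)

# Stratified evidence: the same tolerance applied to each context separately.
per_context = {ctx: failure_rate(r, f) for ctx, (r, f) in results.items()}
breaches = [ctx for ctx, rate in per_context.items() if rate > TOLERANCE]

print(f"pooled failure rate: {pooled:.3f}")
for ctx, rate in per_context.items():
    print(f"{ctx}: {rate:.3f}")
print("contexts over tolerance:", breaches)
```

Because the KYC context contributes nine times as many probes, it dominates the pooled figure (0.021), while the credit-memo context sits at 0.12. Reporting evidence stratified by deployment context is one way to keep pooling for properties that do transfer without hiding those that do not.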

Third, the consensus does not engage with the alignment problem proper. It assumes the AI does roughly what its designers intend, and governs the consequences. The FLI Safety Index’s finding that no frontier lab has a credible existential-safety plan is a finding about the alignment problem. Bengio’s argument is that the alignment problem becomes acute precisely under conditions of agency. Trajectory-level transition systems are excellent governance for an agent that does what its capability specification says it does. They are less obviously excellent governance for an agent that strategically misrepresents its reasoning, that games the guard functions, or that exhibits goal-content integrity under the kinds of optimisation pressure that emerge in long-horizon agentic deployments.

This is not a criticism of the frameworks’ authors. It is a recognition that deployment-surface governance and capability-control are different research programmes, and that the agentic governance literature has been doing the first while treating the second as someone else’s problem.

3.3 The Race-to-the-Bottom Problem

A fourth limit — perhaps the most important from a policy perspective — is that the consensus assumes good-faith implementation. The Anthropic RSP v3.0 changes, OpenAI’s Preparedness Framework “marginal risk” provisions, and the FLI Safety Index findings on lab safety performance collectively document what Tegmark calls the race to the bottom. The collective-action logic is straightforward: if implementing a control is costly and competitors do not implement it, then the lab that does implement it loses commercial position without producing a safety benefit (because the unsafe systems get built anyway).
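The collective-action logic can be written as a stylised two-player game. The payoff numbers are arbitrary assumptions chosen only to encode the premises of the paragraph above: implementing a control costs commercial position, and the safety benefit materialises only if both players implement. Under those assumptions, skipping the control is each player's best response whatever the other does; a hypothetical penalty on skipping, standing in for regulatory mandate, changes that.

```python
from itertools import product

IMPLEMENT, SKIP = "implement", "skip"
ACTIONS = (IMPLEMENT, SKIP)

# Hypothetical payoffs to one player, keyed by (my action, competitor's action).
# The game is symmetric, so the same table gives the competitor's payoff.
BASE_PAYOFF = {
    (IMPLEMENT, IMPLEMENT): 3,  # shared safety benefit, shared cost
    (IMPLEMENT, SKIP): 0,       # I bear the cost, unsafe systems get built anyway
    (SKIP, IMPLEMENT): 4,       # I gain commercial position
    (SKIP, SKIP): 1,            # race to the bottom
}

def with_mandate(payoffs, penalty):
    """Subtract a hypothetical regulatory penalty from every outcome where I skip."""
    return {(mine, theirs): p - (penalty if mine == SKIP else 0)
            for (mine, theirs), p in payoffs.items()}

def best_response(payoffs, theirs):
    return max(ACTIONS, key=lambda mine: payoffs[(mine, theirs)])

def equilibria(payoffs):
    """Pure-strategy Nash equilibria: each action is a best response to the other."""
    return [(a, b) for a, b in product(ACTIONS, repeat=2)
            if best_response(payoffs, b) == a and best_response(payoffs, a) == b]

print(equilibria(BASE_PAYOFF))
print(equilibria(with_mandate(BASE_PAYOFF, penalty=2)))
```

With these payoffs the only equilibrium without a mandate is mutual skipping; any penalty larger than the commercial gain from skipping (here, greater than 1) makes mutual implementation the equilibrium instead. That is the role the argument assigns to regulatory mandate and supervisory expectation.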

This logic applies to deployer-side governance as well. If implementing trajectory-level controls is costly, and competitors deploy agentic workflows without them, then the implementing institution may lose competitive position without producing a sectoral safety benefit. The frameworks only work if they are adopted broadly, which requires either regulatory mandate or competitive equivalence.

This is precisely the role that the EU AI Act, California SB-53 and (in financial services) supervisory expectations under SR 11-7, SS1/23 and OSFI E-23 must play. Voluntary adoption by responsible institutions is not, by itself, sufficient to produce sectoral safety. The frameworks need regulatory mandate or supervisory expectation to operate as collective-action solutions.

4. What This Means for Practitioners

Several implications follow for institutions deploying agentic AI in 2026.

The procurement and design conversation needs to shift from “which model” to “which capabilities, with what authority, under what trajectory-level controls, with what telemetry, at what risk tier”. The Stanford finding that 62% of organisations cite security and risk as the binding constraint on scaling agentic AI is consistent with the argument that the binding constraint is the institution’s ability to build and operate a control plane, not the underlying model capability.
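One way to make that shift concrete is to require every agentic deployment to be declared as a record that answers each of those five questions, and to reject declarations that leave any of them unanswered. The class, field names and validation rules below are hypothetical; they are not taken from any cited framework or product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentDeployment:
    """Hypothetical declaration answering the five questions in the text."""
    name: str
    capabilities: tuple[str, ...]         # which capabilities
    authority: dict[str, str]             # with what authority: capability -> scope
    trajectory_controls: tuple[str, ...]  # under what trajectory-level controls
    telemetry: tuple[str, ...]            # with what telemetry
    risk_tier: int                        # at what risk tier (1 to 3 assumed)

    def problems(self) -> list[str]:
        """Return the gaps in the declaration; an empty list means it is complete."""
        issues = []
        if not self.capabilities:
            issues.append("no capabilities declared")
        missing = [c for c in self.capabilities if c not in self.authority]
        if missing:
            issues.append(f"no authority scope for: {', '.join(missing)}")
        if not self.trajectory_controls:
            issues.append("no trajectory-level controls")
        if not self.telemetry:
            issues.append("no telemetry declared")
        if self.risk_tier not in (1, 2, 3):
            issues.append(f"invalid risk tier: {self.risk_tier}")
        return issues

kyc = AgentDeployment(
    name="kyc-review",
    capabilities=("retrieval", "summarisation"),
    authority={"retrieval": "read-only"},
    trajectory_controls=("max 20 tool calls per task",),
    telemetry=("trace per tool call",),
    risk_tier=2,
)
print(kyc.problems())  # ['no authority scope for: summarisation']
```

The point of the design is that the procurement conversation produces a checkable artefact rather than a model name: a declaration that omits an authority scope is caught before deployment, not discovered in an incident review.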

Pre-deployment testing must be supplemented, not replaced, by runtime governance as the emerging consensus defines it: OpenTelemetry-based tracing infrastructure, deterministic guards, trajectory-level monitoring and orchestration drift detection. None of this is exotic. All of it is operationally available. The question is whether institutions invest in building it or continue to rely on prompt-layer safeguards that the academic literature now consistently characterises as insufficient.
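The difference between a prompt-layer safeguard and a deterministic guard is that the guard sits in the execution path and is evaluated in code before a tool call runs, whatever the model was instructed. The sketch below is a minimal, standard-library-only illustration under assumed names (`AuthorityPolicy`, `guarded_call`, `GuardViolation` and the record fields are all hypothetical). A production system would export these records as spans through an OpenTelemetry SDK rather than appending them to a list.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthorityPolicy:
    """Hypothetical per-agent authority: permitted tools and a per-call amount limit."""
    allowed_tools: frozenset
    max_amount: float

class GuardViolation(Exception):
    pass

TRACE: list[dict] = []  # stand-in for a span exporter

def guarded_call(agent_id: str, policy: AuthorityPolicy, tool: str, amount: float, run):
    """Check the call against policy, record the decision, then execute only if allowed."""
    allowed = tool in policy.allowed_tools and amount <= policy.max_amount
    TRACE.append({
        "ts": time.time(),
        "agent": agent_id,
        "tool": tool,
        "amount": amount,
        "decision": "allow" if allowed else "deny",
    })
    if not allowed:
        raise GuardViolation(f"{agent_id} attempted {tool} ({amount}) outside its authority")
    return run()

policy = AuthorityPolicy(frozenset({"lookup_customer", "draft_note"}), max_amount=0.0)
guarded_call("kyc-agent", policy, "lookup_customer", 0.0, lambda: "ok")
try:
    guarded_call("kyc-agent", policy, "initiate_payment", 5000.0, lambda: "paid")
except GuardViolation as err:
    print(err)
print([record["decision"] for record in TRACE])  # ['allow', 'deny']
```

The denied call never executes, and the denial itself becomes part of the evidence trail. That pairing of enforcement and telemetry is what distinguishes a control plane from an instruction in a system prompt.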

Capability cataloguing is the structural move that produces governance scalability. The empirical argument that many distinct use cases decompose into a small number of capabilities composed in different sequences is defensible; the IMDA and GovTech case studies bear it out. Institutions that invest in stable capability catalogues with pooled validation, evidence packs and operating playbooks reduce the marginal cost of onboarding new agentic use cases. Institutions that validate each workflow independently will not be able to scale their agentic estate without proportionate increases in MRM headcount, which is not a viable trajectory.
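The scaling argument is essentially arithmetic. Suppose, hypothetically, that an estate of workflows draws on a shared catalogue of capabilities. Per-workflow validation costs one full validation per workflow-capability pair; catalogue validation costs one full validation per distinct capability plus a lighter composition review per workflow. The workflow compositions and the relative cost of a composition review are assumptions for illustration only.

```python
# Hypothetical estate: each workflow is a sequence of catalogue capabilities.
WORKFLOWS = {
    "kyc_review": ["retrieve", "extract", "classify", "summarise"],
    "credit_memo": ["retrieve", "extract", "summarise", "draft"],
    "aml_alert_triage": ["retrieve", "classify", "summarise"],
    "complaint_handling": ["retrieve", "classify", "draft"],
}

FULL_VALIDATION = 1.0      # cost units for validating one capability in depth
COMPOSITION_REVIEW = 0.25  # assumed cost of reviewing one workflow's composition

def per_workflow_cost(workflows):
    # Every capability is re-validated inside every workflow that uses it.
    return sum(len(caps) for caps in workflows.values()) * FULL_VALIDATION

def catalogue_cost(workflows):
    # Each distinct capability is validated once; each workflow gets a composition review.
    distinct = {cap for caps in workflows.values() for cap in caps}
    return len(distinct) * FULL_VALIDATION + len(workflows) * COMPOSITION_REVIEW

print(per_workflow_cost(WORKFLOWS), catalogue_cost(WORKFLOWS))
```

With these assumptions the estate costs 14.0 units to validate workflow by workflow and 6.0 units through the catalogue. Adding a further workflow built only from catalogued capabilities raises the first figure by its length and the second by a single composition review, which is the marginal-cost argument in the text. The context-dependence caveat from Section 3.2 is exactly where that composition review has to do real work.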

For boards and senior management, the question is whether the institution has the operating model to support capability-centric governance. This requires Legal, Compliance, Security, Procurement, Architecture and MRM to coordinate around a shared capability catalogue. It also requires the Second Line of Defence to be willing to set capability standards and challenge residual risk without drifting into First Line design authority. The PwC Singapore approach, with use-case owner, Technology Risk Management, AI Factory and end-users as four distinct accountability roles, is a reasonable template.

End-user enablement is not a soft issue. The tradecraft erosion concern is real, the automation bias risk is well-documented, and the impact on professional judgement in regulated functions (credit, compliance, audit, KYC) is a business-continuity issue. Organisations that automate entry-level analyst work without investing in alternative pathways for judgement development will discover, several years from now, that they no longer have analysts who can second-guess the agent.

Finally, the regulatory perimeter is broadening. Any institution operating in or serving the EU should treat 2 August 2026 as a binding date for high-risk AI system compliance, with GPAI enforcement powers also activating on that date. Article 50 transparency obligations apply to agentic systems interacting with natural persons. Any institution operating in the United States should track California SB-53, New York’s RAISE Act, and the developing CAISI guidance. Financial institutions should track MAS, FCA, PRA, Federal Reserve, OCC and OSFI agentic AI expectations as they evolve. And every institution should be aware that the FLI Safety Index, Stanford AI Index and International AI Safety Report are increasingly cited by regulators as authoritative sources on the state of the field — what those reports document about lab safety performance and capability advancement is material to procurement and risk-acceptance decisions.

5. Conclusion: The Gap That Will Not Close on Its Own

The emerging consensus on agentic governance, drawing on operational frameworks like Singapore IMDA’s MGF, GovTech’s ARC, CSA’s security addendum and OWASP’s agentic guidance, on supervisory work in the UK, US, EU, Canada and Singapore, and on the academic and research community at Mila, the Alan Turing Institute, Stanford HAI, Berkeley CHAI and beyond, represents the most sophisticated articulation yet of how to govern agentic AI in production. It constitutes a substantial advance over the governance vocabulary the field had in 2024. It gives institutions the tools to operate the deployment surface of agentic AI with discipline.

But it does not close — and cannot close — the gap that the Stanford AI Index 2026, the International AI Safety Report 2026, and the FLI Safety Index Winter 2025 collectively document. That gap is between the rate at which AI capability is advancing and the rate at which institutional, regulatory and societal capacity to govern it is advancing. The empirical signals from frontier labs in 2025-2026 — evaluation awareness, locally deceptive behaviour in agentic contexts, self-preservation behaviour in red-team scenarios, situational awareness sufficient to distinguish testing from production — are signals that the assumption underlying the consensus (that the AI does roughly what its designers intend) is the assumption most in need of governance.

Three things need to happen for the gap to begin closing.

First, deployment-surface governance frameworks need to become regulatory mandate, not voluntary best practice, in jurisdictions where agentic AI is being deployed at scale in regulated industries. The EU AI Act provides the model; California SB-53 and New York RAISE provide the US state-level precedent; MAS, PRA and Federal Reserve supervisory expectations provide the sectoral instrument. Without regulatory mandate, the race-to-the-bottom logic that Tegmark and the FLI Safety Index document at the lab level will replicate at the deployer level.

Second, capability-control research — the alignment problem proper — needs to advance at a rate proportionate to capability advancement. Bengio’s LawZero, Anthropic’s interpretability work, Stuart Russell’s CHAI at Berkeley, Mila’s responsible AI programme, the UK AISI’s evaluation work and the international AI safety research community more broadly are the source of that progress. Their work is currently outpaced by capability research by approximately the ratio of industry spending to safety-research funding — which is to say, by several orders of magnitude. Closing that gap is a public-goods problem that requires sustained public funding, not market mechanisms.

Third, the assumption embedded in the consensus — that the AI does roughly what its designers intend — needs to be replaced by an assumption that explicitly contemplates strategic misrepresentation, evaluation awareness and emergent misalignment. This is hard. It is much easier to govern a system whose behaviour you can characterise from its specification than a system whose behaviour you can characterise only from extensive interpretability research that does not yet exist. But the empirical record from 2025-2026 increasingly demands the harder framing.

The frameworks we now have are necessary. They were not available in 2024. The institutions that adopt them will operate agentic AI more safely than those that do not. That is real progress, and it should not be diminished. But the deeper problem — that we are building a class of system whose behaviour at the limit we do not yet know how to characterise, and whose deployment we are accelerating regardless — remains. Closing that gap is the work of the rest of this decade.

The road has fog. The vehicle is accelerating. We are building better headlights. Whether the headlights are bright enough, soon enough, is the question that the agentic governance literature in 2026 does not — and at this stage probably cannot — answer.

6. From Framework to Operating System: What TrustOS Is Actually Trying to Do

I have been in this field since the late 1980s. I studied computer science when AI was mostly symbolic systems and expert rules, when the idea of a language model that could deceive its evaluators would have been science fiction, and when “governance” meant a policy document that no one read and a sign-off process that no one questioned. In the more than three decades since, I have seen governance frameworks come and go.

Thirty-five years later, I find myself writing a paper arguing that the governance documents we are producing — better than anything I saw in the first three decades of my career — may not be sufficient to govern the systems we are building. That is not a counsel of despair. It is a statement of what serious governance actually demands.

The frameworks we now have are real. The institutions that adopt them will operate agentic AI with materially more discipline than those that do not. The operational consensus documented in Section 1 of this paper — on trajectory-level controls, telemetry as governance infrastructure, risk tiering, capability catalogues, and the structural limits of prompt-layer guardrails — represents a genuine advance over where the field was in 2024. I have spent the last three years building these arguments in client engagements across financial institutions, in the TrustOS architecture, in talks at IMAS and at CrewAI Signal, and in the work that PwC Singapore has been doing to translate governance theory into control planes that actually operate at runtime. That work matters.

Most governance frameworks do two things: they describe the problem with great precision, and they leave the practitioner with a list of principles that cannot be operationalised without substantial additional work. The frameworks I survey in this paper (IMDA MGF, GovTech ARC, the CSA addendum, OWASP agentic guidance) are, with respect to their authors, better than most. But they are still frameworks.

The argument I want to make in this section is about the difference between a framework and an operating system.

A framework tells you what to govern. An operating system governs it. A framework produces documentation. An operating system produces evidence. A framework is validated by a committee. An operating system is validated by a trace.

This is the design intent behind TrustOS, what I have called, perhaps provocatively, the world’s first complete Agentic AI Governance OS. The seven-layer architecture is not a stack of compliance requirements. It is a runtime control plane:
L1 (AgentOS) establishes the agent execution environment with defined capabilities and authority.
L2 (monitoring) generates the telemetry that the academic literature is now calling first-class governance infrastructure.
L3 (runtime safety) implements the deterministic guards that the formal MRM literature describes as transition-system controls.
L4 (cloud security via Wiz) handles the infrastructure-level exposure that the CSA addendum identifies as a distinct governance object.
L5 (compliance) maps the outputs of L1–L4 against the regulatory perimeters that Section 2 of this paper describes.
L6 (XAI/explainability) produces the chain-of-thought visibility that the HITL literature shows is the missing ingredient in human oversight: reviewers cannot meaningfully oversee what they cannot see.
L7 (Trust Intelligence) synthesises the evidence from L1–L6 into the audit-grade output that boards, regulators and senior management need to make risk-acceptance decisions.

The five things I claim as genuinely novel — Live Chain-of-Thought Visualiser, Goal Tree Decomposition, Intent vs. Action Auditor, multi-framework regulatory mapping, and verified agent identity — are not features. They are the specific engineering responses to the five governance failures that the 2026 literature documents most clearly: opacity of reasoning, misalignment between stated and actual goal pursuit, inability to distinguish intent from action, fragmented regulatory perimeters, and the absence of trustworthy agent identity in multi-agent environments.
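To illustrate the kind of check an intent-versus-action audit implies, the sketch below compares an agent's declared plan with the actions actually recorded in its trace and reports undeclared actions and skipped steps. It is a hypothetical illustration written for this argument, not the TrustOS Intent vs. Action Auditor, whose implementation is not described in this paper; the function and action names are invented.

```python
from collections import Counter

def audit_intent_vs_action(declared_plan: list[str], executed: list[str]) -> dict:
    """Compare declared steps with executed actions (hypothetical action names).

    Multisets are compared rather than sets, so an action executed more often
    than declared is reported as undeclared for each surplus occurrence.
    """
    declared = Counter(declared_plan)
    actual = Counter(executed)
    undeclared = actual - declared  # executed beyond what was declared
    skipped = declared - actual     # declared but never executed
    return {
        "undeclared": sorted(undeclared.elements()),
        "skipped": sorted(skipped.elements()),
    }

report = audit_intent_vs_action(
    declared_plan=["fetch_statement", "extract_income", "draft_memo"],
    executed=["fetch_statement", "extract_income", "email_external",
              "draft_memo", "draft_memo"],
)
print(report)
```

Here the audit flags the external email and the second memo draft as actions the plan never declared. A check like this only sees what the trace records; it cannot establish whether the declared plan reflects the agent's actual objective, which is the harder problem raised in Section 3.2.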

I raise this not to sell a product, but because I think the field needs to be honest about the gap between the frameworks it is producing and the operating infrastructure that deployers actually need. The IMDA MGF is excellent. It does not, by itself, prevent a production agentic system from silently exceeding its authority boundaries at 2am on a Tuesday. A governance operating system does — or at least, it detects the breach and triggers the response.

The mapping of TrustOS to the EU AI Act, NIST AI RMF, ISO 42001, Singapore MGF, MAS FEAT/AIRG, the OWASP Top 10 for Agentic Applications, GDPR and the Colorado AI Act is also not a marketing exercise. It reflects the practical reality that the institutions deploying agentic AI in 2026 operate across multiple regulatory perimeters simultaneously, and that a governance architecture which is compliant with one framework but not another is not, in practice, a governance architecture at all. Fragmented compliance is non-compliance with extra steps.
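The point about fragmented compliance can be expressed as a coverage check: map each implemented control to the framework requirements it evidences, then list, per framework, the requirements that no control covers. The framework keys and requirement identifiers below are placeholders invented for illustration, not citations of actual articles or clauses.

```python
# Hypothetical requirement IDs per framework (placeholders, not real clause numbers).
REQUIREMENTS = {
    "framework_a": {"A-logging", "A-human-oversight", "A-transparency"},
    "framework_b": {"B-logging", "B-risk-assessment"},
    "framework_c": {"C-identity", "C-logging"},
}

# Hypothetical controls, each mapped to the requirements it provides evidence for.
CONTROL_MAP = {
    "trace_every_tool_call": {"A-logging", "B-logging", "C-logging"},
    "human_approval_tier3": {"A-human-oversight"},
    "risk_tiering_review": {"B-risk-assessment"},
}

def coverage_gaps(requirements, control_map):
    """Return, per framework, the requirements no control covers (omitting fully covered ones)."""
    covered = set().union(*control_map.values())
    return {fw: sorted(reqs - covered)
            for fw, reqs in requirements.items() if reqs - covered}

print(coverage_gaps(REQUIREMENTS, CONTROL_MAP))
```

In this toy mapping a single tracing control evidences three frameworks at once, yet the control set that fully satisfies framework_b still leaves gaps under the other two. That is the sense in which compliance with one framework is not, on its own, a governance architecture.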

The race the field is running is not between capability and governance frameworks. It is between capability and governance infrastructure. The frameworks are ahead of where they were a year ago. The infrastructure is behind where it needs to be. The next twelve months will determine which of those gaps closes faster.

But the deeper problem — which the Stanford AI Index 2026, the International AI Safety Report, and the FLI Safety Index all document from different angles — is that we are building a class of system whose behaviour at the limit we do not yet know how to characterise, and accelerating its deployment regardless. Bengio’s analogy is right: the road has fog, the vehicle is accelerating, and we are building better headlights. What he does not say, but what I believe anyone with three decades in this field feels instinctively, is that better headlights are necessary but not sufficient if the driver does not also take their foot off the accelerator.

My thesis in Genesis: Human Experience in the Age of Artificial Intelligence was that the transition we are navigating is real, that the turbulence is short-term, and that the long-term abundance is achievable — but only if we govern the transition with the same sophistication we apply to the technology itself. That argument becomes more urgent, not less, with every new capability threshold crossed.

The governance gap will not close on its own. It will close because practitioners build the operating infrastructure that frameworks describe. It will close because regulators mandate the evidence requirements that good-faith deployers are already trying to meet. It will close because the research community working on alignment and interpretability receives the sustained public investment that the alignment problem actually requires. And it will close because people in this field are willing to say, in public and on the record, that the assumption embedded in most of our governance work — that the AI does roughly what its designers intend — is the assumption most in need of interrogation.

Key References

International safety reports and indices

Bengio, Y. et al. (2026). International AI Safety Report 2026. International Expert Advisory Panel, February 2026.
Stanford Institute for Human-Centered AI. (2026). The 2026 AI Index Report. Stanford HAI, April 2026.
Future of Life Institute. (2025). AI Safety Index — Winter 2025. December 2025.

Operational governance frameworks

Infocomm Media Development Authority of Singapore. (2026). Model AI Governance Framework for Agentic AI, Version 1.5, May 2026.
GovTech Singapore. (2025). Agentic Risk & Capability Framework.
Cyber Security Agency of Singapore. (2026). Draft Addendum on Securing Agentic AI.
OWASP GenAI Security Project. (2025). Agentic AI — Threats and Mitigations.
OWASP Foundation. (2024). Top 10 for Large Language Model Applications.
Gradient Institute. (2025). Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems.

Frontier lab safety policies

Anthropic. (2026). Responsible Scaling Policy, Version 3.0. 24 February 2026.
Anthropic. (2026). Frontier Safety Roadmap. 22 February 2026.
Anthropic. (2026). Claude Opus 4.6 Sabotage Risk Report. February 2026.
OpenAI. (2025). Preparedness Framework, Version 2. 15 April 2025.

National regulatory and policy frameworks

European Parliament and Council. (2024). Regulation (EU) 2024/1689 (Artificial Intelligence Act).
European Commission. (2025). Guidelines for providers of general-purpose AI models.
European Commission, AI Office. (2025). General-Purpose AI Code of Practice. July 2025.
Bank of England, Prudential Regulation Authority. (2023). SS1/23 Model Risk Management Principles for Banks.
Board of Governors of the Federal Reserve System. (2011). SR 11-7 Supervisory Guidance on Model Risk Management.
Office of the Superintendent of Financial Institutions. (2025). Guideline E-23 Model Risk Management (effective 2027).
Monetary Authority of Singapore. (2025). Consultation Paper on AI Risk Management Guidelines. November 2025.

Academic and research community

Bengio, Y. (2025). Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? arXiv preprint.
LawZero. (2025). Founding research programme on non-agentic Scientist AI, Mila — Quebec AI Institute.

Standards and protocols

OpenTelemetry. (2026). Traces — OpenTelemetry Concepts.
OpenID Foundation. (2025). Identity Management for Agentic AI.

This discussion paper is one in a continuing series on Genesis: Human Experience in the Age of Artificial Intelligence. Comments and corrections welcomed at GenesisHumanExperience.com.