By Dr Luke Soon – AI Futurist, Ethicist | Author of Genesis: Human Experience in the Age of Artificial Intelligence
In the race to deploy agentic AI – autonomous systems capable of planning, reasoning, adapting, and taking independent action – many organisations are unwittingly falling into what has become known as the “one-model trap.” Relying on a single monolithic large language model (LLM) for production-grade agents may deliver impressive results in controlled demonstrations, but it quickly unravels under real-world demands. As someone who has spent nearly three decades architecting AI transformations for global enterprises, and who has written and spoken extensively about responsible scaling on LinkedIn and at genesishumanexperience.com, I have seen this pattern repeatedly. The challenges are not merely technical; they are deeply architectural, operational, and ethical. True scalability demands a shift towards responsible, human-centric orchestration – the core philosophy that underpins my work in Genesis: Human Experience in the Age of Artificial Intelligence.
The One-Model Trap: Why Demos Succeed and Production Fails
Agentic systems in production confront inherently messy environments: wildly varying request complexity, tools with unpredictable latency, spiralling costs from retries and extended contexts, and constantly evolving policies. A single frontier model attempting to handle everything becomes a brittle single point of failure – expensive, slow at scale, and extraordinarily difficult to govern.
Drawing from extensive enterprise deployments at PwC Singapore and through collaborations with IMDA and Microsoft, a typical agent workload distribution reveals the inefficiency clearly. Approximately 70% of tasks are routine – simple classification, retrieval, or basic data transformations that do not require the computational heft or expense of a massive model. Around 20% involve moderate reasoning combined with tool use, while only about 10% represent genuinely complex edge cases demanding sophisticated planning and ambiguity resolution. Routing every interaction through one high-capability model wastes resources on trivial work, inflates latency during traffic spikes, and exacerbates governance issues when rules and safeguards are embedded deep within monolithic prompts.
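The cost asymmetry implied by that 70/20/10 split is easy to make concrete. The sketch below compares routing every request to a frontier model against tiered routing; the per-call prices are illustrative assumptions for the exercise, not vendor quotes.

```python
# Back-of-envelope comparison of sending every request to a frontier
# model versus tiering by task complexity. All prices are illustrative
# assumptions, not vendor quotes.

WORKLOAD = {"routine": 0.70, "moderate": 0.20, "complex": 0.10}

# Assumed average cost per request for each model tier (USD).
COST_PER_CALL = {"small": 0.0005, "mid": 0.005, "frontier": 0.05}

TIER_FOR = {"routine": "small", "moderate": "mid", "complex": "frontier"}

def monthly_cost(requests: int, tiered: bool) -> float:
    """Estimate monthly spend for a given request volume."""
    if not tiered:
        return requests * COST_PER_CALL["frontier"]
    return sum(
        requests * share * COST_PER_CALL[TIER_FOR[kind]]
        for kind, share in WORKLOAD.items()
    )

if __name__ == "__main__":
    n = 1_000_000
    print(f"single frontier model: ${monthly_cost(n, tiered=False):,.0f}")
    print(f"tiered routing:        ${monthly_cost(n, tiered=True):,.0f}")
```

Under these assumed prices, tiering cuts spend by roughly an order of magnitude, because the 70% of routine traffic stops paying frontier rates.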
The gap between demonstration and production is stark. Pilots often optimise for a single compelling output. Production demands system-level reliability: Did the entire agentic workflow complete safely, within acceptable time and cost boundaries, and in full compliance? Tail latency at p95 and p99, runaway cost from looping behaviours, and undetected policy violations become critical failure points. Tight coupling between model, prompts, tools, and orchestration logic turns minor updates into major migrations, rapidly accumulating technical debt. This is precisely why legacy approaches to building and governing AI fall short, as I have highlighted in recent LinkedIn discussions on scaling agentic systems responsibly.
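Why tail percentiles, not averages, are the right lens can be shown in a few lines. The sketch below uses synthetic latency data with a long tail from retries and tool timeouts; a small fraction of slow runs barely moves the mean but dominates p95 and p99.

```python
# Tail latency, not the average, is what production SLOs are written
# against: a handful of slow agent runs can breach p99 while the mean
# still looks healthy. The latency samples below are synthetic.
import random

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value >= p% of samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

random.seed(7)
# Mostly fast runs, plus a long tail from retries and tool timeouts.
latencies = [abs(random.gauss(1.2, 0.3)) for _ in range(950)]
latencies += [random.uniform(5, 30) for _ in range(50)]

print(f"mean: {sum(latencies) / len(latencies):.2f}s")
print(f"p95:  {percentile(latencies, 95):.2f}s")
print(f"p99:  {percentile(latencies, 99):.2f}s")
```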
Benchmarks Alone Cannot Bridge the Reality Gap
My recent article on genesishumanexperience.com, “Exposed: Why AI Benchmarks Are Failing Us All – And How Singapore’s Real-World Governance Is Fighting Back,” underscores a related challenge. Traditional benchmarks evaluate narrow, sanitised capabilities while ignoring the “reality gap” – the unpredictable human contexts, dynamic workflows, adversarial inputs, and operational constraints that define live environments. As I argued there, we urgently need rigorous, lifecycle-based evaluation approaches that translate glossy scores into trustworthy real-world performance.
Singapore’s governance ecosystem offers a powerful model for closing this gap. The Model AI Governance Framework for Agentic AI (MGF), launched by the Infocomm Media Development Authority (IMDA) in January 2026 and described as the world’s first dedicated framework for autonomous agents, provides structured guidance across four key dimensions: assessing and bounding risks upfront, ensuring meaningful human accountability, implementing technical controls and processes, and enabling end-user responsibility. It builds directly on earlier iterations of the Model AI Governance Framework (originally released in 2019 and updated in 2020 for traditional AI, with a 2024 version for generative AI).
Complementing the MGF is AI Verify, IMDA’s open-source AI governance testing framework and software toolkit. AI Verify enables organisations to validate systems against 11 internationally aligned principles – including transparency, explainability, robustness, fairness, safety, security, and accountability – through a combination of technical tests and process checks. It supports both traditional and generative AI use cases and is particularly valuable for agentic systems, where it helps assess execution accuracy, policy adherence, tool usage, and robustness to edge cases and environmental changes. Singapore’s approach also draws inspiration from global standards, including the NIST AI Risk Management Framework (AI RMF), which emphasises the core functions of Govern, Map, Measure, and Manage to integrate trustworthiness throughout the AI lifecycle. NIST’s socio-technical lens and focus on mapping risks across design, development, deployment, and operation provide a complementary foundation for handling the heightened uncertainties of agentic behaviours.
Security considerations are equally critical. The OWASP Top 10 for Agentic Applications (2026) highlights agent-specific risks such as agent goal hijacking, tool misuse and exploitation, identity and privilege abuse, agentic supply chain vulnerabilities, memory and context poisoning, and cascading failures. These risks amplify dramatically when agents operate autonomously with access to tools, memory, and external systems. Integrating OWASP guidance with Singapore’s MGF and AI Verify creates a robust layered defence.
Additional Singapore-specific elements, such as the Monetary Authority of Singapore’s (MAS) AI Model Risk Management guidelines for the financial sector and broader initiatives like the Global AI Assurance Pilot, further strengthen practical implementation. Frameworks such as CIRCLE (Contextualise → Identify → Represent → Compare → Learn → Extend) offer longitudinal, human-centric validation that complements these tools, helping organisations move beyond static testing to ongoing, context-aware assurance.
The Path Forward: Multi-Model Orchestration with a Strong Governance Stack
The solution lies not in pursuing a marginally “better” single model, but in designing layered, orchestrated architectures where models function as swappable, specialised components behind a robust control plane. As I have detailed in my writings on the “Seven Pillars of Agentic AI” and “The 8 Types of LLMs Powering the Age of AI Agents,” capability tiering and intelligent routing are essential: fast, cost-effective specialist models for routine tasks; mid-tier models for synthesis and retrieval; and frontier models reserved strictly for high-stakes escalations. Deterministic guardrail layers and policy-as-code mechanisms add reliability.
A strong control plane must remain separate from generation logic. Routing decisions, policy enforcement, session budgets, retry mechanisms, and orchestration should be explicit, versioned, and auditable. Failure-aware patterns – including timeouts, circuit breakers, graceful degradation to lower-tier models or human-in-the-loop handoffs, and per-step token budgets – prevent cascading issues and cost overruns. Comprehensive observability, extending concepts like OpenTelemetry to full agentic traces across models, tools, retrieval, and policies, becomes indispensable.
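Two of the failure-aware patterns named above, per-session token budgets and circuit breakers with graceful degradation, are small enough to sketch directly. The thresholds and class shapes below are illustrative assumptions, not a reference implementation.

```python
# Sketch of two failure-aware controls: a per-session token budget that
# halts looping agents, and a circuit breaker that degrades to a
# fallback tier after repeated failures. Thresholds are illustrative.

class BudgetExceeded(Exception):
    pass

class SessionBudget:
    """Hard cap on tokens spent across all steps of one agent session."""

    def __init__(self, max_tokens: int):
        self.max_tokens, self.used = max_tokens, 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"{self.used}/{self.max_tokens} tokens")

class CircuitBreaker:
    """After `threshold` consecutive failures, stop calling the primary
    model and degrade to the fallback (a cheaper tier or a human)."""

    def __init__(self, threshold: int = 3):
        self.threshold, self.failures = threshold, 0

    @property
    def open(self) -> bool:   # open circuit = primary is bypassed
        return self.failures >= self.threshold

    def call(self, primary, fallback, *args):
        if self.open:
            return fallback(*args)
        try:
            result = primary(*args)
            self.failures = 0  # a success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            return fallback(*args)
```

Because both controls live in the control plane rather than inside prompts, their limits can be versioned, audited, and tuned without retraining or re-prompting any model.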
This approach aligns closely with the AI Governance Stack I advocate: a five-layer model encompassing policy and accountability, MLOps and observability, data governance, security and guardrails, and agent-specific orchestration. At PwC we have been piloting “Agent OS” concepts that incorporate multi-agent collaboration, workflow-centric design, and real-time monitoring, all grounded in Singapore’s MGF and AI Verify for measurable trustworthiness. By bounding agent “action spaces,” enforcing least-privilege tool access, requiring plan reflection and human checkpoints, and conducting continuous monitoring, organisations can mitigate excessive agency while preserving utility.
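Least-privilege tool access expressed as policy-as-code can be as simple as an explicit allowlist per agent role, checked and logged on every call. The role and tool names below are hypothetical; the point is that the policy is data, so it can be versioned, reviewed, and audited like any other configuration.

```python
# Least-privilege tool access as policy-as-code: each agent role gets an
# explicit allowlist, and every tool call is checked and recorded before
# execution. Role and tool names are hypothetical examples.

POLICY = {
    "support-agent": {"search_kb", "draft_reply"},
    "finance-agent": {"read_ledger"},  # no write access by default
}

AUDIT_LOG: list[tuple[str, str, str]] = []

class PolicyViolation(Exception):
    pass

def invoke_tool(role: str, tool: str, payload: str) -> str:
    """Gate a tool call through the policy; log every decision."""
    allowed = POLICY.get(role, set())
    verdict = "allow" if tool in allowed else "deny"
    AUDIT_LOG.append((role, tool, verdict))  # auditable by construction
    if verdict == "deny":
        raise PolicyViolation(f"{role} may not call {tool}")
    return f"{tool} executed for {role}"
```

Denials are recorded as first-class events rather than silent failures, which is what makes the accountability chain inspectable after the fact.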
From Technical Architecture to the Agentic Organisation: Centring Human Experience
Scaling agentic AI ultimately reshapes organisations and the nature of work itself. In pieces such as “The Imminent AI Disruption of Human Work,” I describe a transition towards a post-labour paradigm in which cognitive routines are increasingly commoditised, freeing humans to focus on orchestration, creativity, empathy, and meaning-making. Agentic systems become collaborative “direct reports,” but this shift introduces short-term challenges: the productivity paradox, recomposition of roles, and the ever-present risk of unchecked autonomy.
A multi-model, governance-first architecture directly addresses these concerns. It supports fluid “work charts” rather than rigid hierarchies, preserves meaningful human agency through reflection loops and sandboxing, and maintains accountability chains even as autonomy increases. Above all, it keeps human experience (HX) at the centre. In the experience economy and the age of intelligence, trust remains the ultimate currency. Without rigorous governance – informed by NIST’s risk management principles, OWASP’s security mitigations, AI Verify’s testing capabilities, and the MGF’s structured accountability – even the most advanced agents will struggle to earn and retain that trust.
A Call to Responsible Action
The one-model trap is tempting because it simplifies procurement, benchmarking, and initial pilots. Yet production realities – amplified by agentic risks documented in OWASP, NIST, and Singapore’s pioneering frameworks – demand a more mature approach. Enterprises that adopt intelligent routing, embed governance as foundational infrastructure, and design with human flourishing in mind will unlock safe, scalable, and sustainable agentic transformations.
Singapore’s leadership – through the MGF for Agentic AI, AI Verify, and alignment with global standards such as NIST – demonstrates that innovation and responsibility can advance together. By learning from these models, organisations worldwide can navigate the fork between abundance and unintended consequences, steering agentic AI towards outcomes that genuinely amplify human potential.
I welcome your perspectives. Have you encountered one-model limitations in your agentic initiatives? How are you integrating frameworks like the MGF, AI Verify, NIST RMF, or OWASP guidance into your governance practices? Share your experiences in the comments or connect with me on LinkedIn. Let us collectively build agentic systems that are not only powerful, but profoundly responsible and human-centred.

