As we hurtle towards a future where artificial intelligence agents orchestrate our daily lives, from managing complex workflows to making autonomous decisions in critical sectors, one glaring concern looms large: safety. In my ongoing exploration of AI’s intersection with the human experience—spanning from the genesis of symbolic rules to the dawn of superintelligence—I’ve delved deeply into the risks and safeguards surrounding Agentic AI.
This isn’t just another evolutionary step in technology; it’s the penultimate stride towards Artificial General Intelligence (AGI), particularly when embodied in physical robots. Yet, as Big Tech races to deploy and monetise these systems, pouring over $200 billion into AI infrastructure in 2024 alone, the investment in safety, policy, and regulation lags perilously behind.
Drawing from extensive research across global institutions, including PwC’s latest reports, Stanford’s Human-Centered AI (HAI) Index, the World Economic Forum (WEF), the Alan Turing Institute, the EU AI Act, the UK’s AI Safety Institute (AISI), and Singapore’s AI Verify Foundation, this blog synthesises the urgent need for tangible steps forward. We’ll examine the parallels to unresolved social media harms—like the erosion of democracy through misinformation—and how Agentic AI exacerbates these, threatening human trust and critical thinking. I’ll also assess current market offerings for safety tools, their gaps and shortcomings, evolving frameworks and standards, and new governmental efforts worldwide to regulate this space.
Additionally, incorporating insights from PwC’s Responsible AI Governance Framework, we’ll explore enhanced governance, testing, monitoring, and practical examples of agentic workflows in high-stakes environments like transaction monitoring—generalised to avoid specific institutional references. Let’s dive in, challenging assumptions and charting a path to responsible innovation.
The Exponential Surge of Agentic AI: Opportunities and Perils
Agentic AI represents systems that don’t merely respond but proactively reason, plan, and act towards goals, integrating tools like dynamic knowledge graphs and adaptive learning. Projections from PwC’s “Agentic AI: The Next Frontier” (2025) suggest these agents could handle 40% of enterprise tasks by 2027, revolutionising industries from healthcare to finance. Stanford HAI’s 2025 AI Index Report echoes this, noting a 300% rise in AI-related security incidents, many involving agentic exploits where systems autonomously propagate biases or escalate cyber threats.

Yet, this autonomy amplifies risks.
The WEF’s AI Governance Alliance Briefing (2025) warns that agentic systems can magnify biases 2.5 times compared to traditional AI, potentially leading to discriminatory outcomes in high-stakes decisions. Consider the parallels to social media: algorithms that amplified falsehoods during the 2020 elections eroded institutional trust, with studies showing a 25% drop in public confidence in democratic processes (as per Pew Research, cited in WEF reports). We’ve arguably swept much of this under the carpet—partial regulations like the EU’s GDPR and Digital Services Act (DSA) imposed fines exceeding €2 billion on Meta, but systemic fixes remain elusive. Agentic AI threatens to exacerbate this malaise, autonomously generating and disseminating misinformation at scale, without a clear path to safeguard human thought processes.

A cornerstone of humanity—our ability to think critically, as embodied by philosophers like Socrates or scientists like Einstein—is at stake.
HAI research indicates that 45% of students now accept generative AI outputs as definitive ‘truth’ without verification, diminishing problem-solving skills. UNESCO’s emerging AI literacy guidelines aim to counter this, but the gap is widening with each generation. As embodied agents (e.g., robots in warehouses or eldercare) gain traction, this penultimate step to AGI demands we confront these risks head-on.
New Types of Risks Introduced by Agentic AI
While traditional AI risks like data privacy breaches or algorithmic bias persist, Agentic AI introduces novel threats due to its autonomous planning, tool integration, and multi-step execution. Drawing from PwC’s Responsible AI reports, Cloud Security Alliance’s MAESTRO framework, and OWASP’s GenAI project, here are key emerging risks, with examples, mechanisms, and recommended departments to address them. These build on the need for enhanced governance, as Agentic systems pursue goals independently rather than following scripted prompts.
| Risk Type | Example | How It Occurs | Best Positioned Department |
| --- | --- | --- | --- |
| Autonomous Escalation or Unintended Actions | In automated trading, an agent misinterprets market signals and executes a series of trades that trigger a flash crash, amplifying volatility. | The agent’s iterative reasoning loop generates inefficient or redundant plans, selecting suboptimal tools without human intervention, leading to cascading effects in dynamic environments. | Risk Management or Compliance—responsible for scenario planning and setting escalation thresholds to prevent systemic failures. |
| Tool Misuse or Privilege Escalation | An agent with API access inadvertently (or adversarially) queries unauthorized databases, leaking sensitive data during a routine analysis. | During tool selection in the planning phase, the agent exploits vulnerabilities or misapplies permissions, bypassing safeguards due to lack of granular controls. | CISO or Cybersecurity—experts in access controls, threat modeling, and implementing privilege management to secure tool integrations. |
| Memory Poisoning | An agent’s long-term memory is tainted by adversarial inputs in initial interactions, leading to persistently biased decisions in future tasks, such as discriminatory hiring recommendations. | Persistent storage retains manipulated data across sessions, corrupting the agent’s knowledge base and influencing subsequent reasoning without detection. | Data Governance or AI Ethics teams—focused on data integrity audits and ethical alignment to cleanse and validate memory modules. |
| Multi-Agent Coordination Failures | In a supply chain swarm, conflicting agent actions cause inventory overstock in one node while shortages in another, disrupting operations. | Emergent behaviors arise from uncoordinated interactions in multi-agent systems, where individual goals misalign without overarching orchestration. | Operations or IT departments—adept at system integration and monitoring to ensure coordination and resolve conflicts in real-time. |
| Goal Misalignment Over Time | An agent tasked with optimising customer service evolves to prioritise speed over accuracy, resulting in hasty, erroneous responses that erode trust. | Iterative goal pursuit drifts due to ambiguous definitions or environmental changes, with the agent reinterpreting objectives in unintended ways. | AI Governance or Legal teams—specialised in defining clear, auditable goals and conducting periodic alignment reviews. |
These risks underscore the shift from static AI to dynamic, goal-oriented systems, as highlighted in PwC’s frameworks. Mitigation requires testing individual components (e.g., reasoning engines) for failures and evaluating them holistically, with metrics like plan efficiency and tool accuracy.
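To make these evaluation metrics concrete, here is a minimal, hypothetical sketch of how a test harness might score plan efficiency and tool accuracy from a recorded agent trace. The `Step` schema, field names, and example trace are illustrative assumptions, not part of PwC’s framework or any vendor’s tooling.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One recorded step from an agent trace (illustrative schema)."""
    tool_called: str    # tool the agent actually invoked
    tool_expected: str  # tool a reviewer judged correct for this step
    redundant: bool     # flagged if the step repeated earlier work

def plan_efficiency(steps: list[Step]) -> float:
    """Share of steps that were not redundant (1.0 = no wasted work)."""
    if not steps:
        return 1.0
    return sum(not s.redundant for s in steps) / len(steps)

def tool_accuracy(steps: list[Step]) -> float:
    """Share of steps where the agent picked the reviewer-approved tool."""
    if not steps:
        return 1.0
    return sum(s.tool_called == s.tool_expected for s in steps) / len(steps)

# Example: a three-step trace with one redundant step and one wrong tool choice.
trace = [
    Step("search_db", "search_db", redundant=False),
    Step("search_db", "search_db", redundant=True),
    Step("send_email", "draft_report", redundant=False),
]
print(plan_efficiency(trace))  # ≈ 0.67
print(tool_accuracy(trace))    # ≈ 0.67
```

Scores like these can feed the escalation thresholds that Risk Management or Compliance teams set, flagging runs that fall below an agreed bar for human review.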
Governance Frameworks: Progress and New Efforts – A Global Effort Is Required
International efforts are fragmented but evolving, with new initiatives emerging in 2025 to specifically target Agentic AI safety. The EU AI Act (2024) classifies Agentic AI as ‘high-risk’ in critical sectors, mandating transparency, sandboxes for testing, and prohibitions on manipulative practices under Article 5. By 2026, it will require risk assessments and General Purpose AI (GPAI) codes of practice, addressing biases and disinformation.
However, it lacks bespoke provisions for agentic autonomy, relying on reactive measures.

In the UK, the AI Safety Institute’s International AI Safety Report (2025) details benchmarks like RepliBench for replication risks and recommends ‘kill switches’ and red-teaming exercises. This aligns with the Alan Turing Institute’s focus on robustness in multi-agent systems, advocating global research networks to prevent uncontrolled interactions in domains like aerospace.
Singapore’s AI Verify Foundation stands out for its practical approach: the Model AI Governance Framework for Generative AI (2025) emphasises incident reporting, transparency, and the Global AI Assurance Sandbox for real-world validation. It complements the OECD’s revised AI Principles (2024), which stress environmental sustainability—crucial given AI’s energy demands—and safeguards against disinformation.

China’s AI Safety Governance Framework mirrors global ‘red lines,’ while NIST in the US targets agent hijacking.
Yet, a unified standard for Agentic AI is absent. Policies are often reactive, not proactive, leaving gaps in addressing AGI precursors. As PwC notes, while 88% of CFOs plan AI budget hikes, only 40% prioritise safety, underscoring the monetisation-safety imbalance.

New efforts in 2025 show governments stepping up. In the US, Virginia is pioneering Agentic AI to review and streamline regulations, using tools to scan documents and identify efficiencies, as announced in Executive Order 51 by Governor Glenn Youngkin.
The Trump Administration’s ‘Sustaining Select Efforts to Strengthen the Nation’s Cybersecurity’ Executive Order (June 2025) focuses on agentic security, revoking prior orders while emphasising risk controls. Texas’s TRAIGA (the Texas Responsible Artificial Intelligence Governance Act) marks the state’s entry into AI governance, regulating autonomous systems.

Across the states, the NCSL tracks 2025 legislation, with California among those advancing bills on AI safety assessments. The Future of Life Institute’s 2025 AI Safety Index urges publishing frameworks like those from the Seoul AI Summit.
In discussions on X, experts call for stronger guardrails and global constitutions to prevent manipulation by agentic systems.

PwC’s Responsible AI Governance Framework provides a well-defined process to evaluate benefits, risks, and controls around AI use cases, enabling faster development and innovation. It emphasises cross-functional collaboration, stakeholder transparency, balancing agility with regulatory compliance, and right-sized, risk-based governance with accelerated paths for low-risk experimentation. The framework leverages existing functions for oversight and includes a 9-step model development flow covering the application lifecycle.
Agentic AI Safety Frameworks and Standards: Evolving Foundations
Several frameworks and standards are emerging to guide Agentic AI safety, even as they evolve. The Cloud Security Alliance’s MAESTRO (Multi-Agent Environment, Security, Threat, Risk, & Outcome) is a novel threat modeling framework, addressing unique vulnerabilities like memory poisoning and tool misuse. OWASP’s GenAI project outlines threats and mitigations for agentic systems, focusing on security in applications.

NVIDIA’s Safety Recipe for Agentic AI provides a structured approach to evaluate and align open models, emphasising early safeguards.
Microsoft’s framework stresses ethical use, fairness, and reliability in agentic systems. Infosys proposes a four-level governance framework for scaling agentic AI.

Open-source standards include ATFAA (Advanced Threat Framework for Autonomous AI Agents) from arXiv research, complementing high-level models like Antean’s AACF with seven layers from governance to operations. Akitra advocates for fresh governance components, while IBM and others focus on building blocks for agents.
These are nascent, with gaps in detail for unique threats and interoperability.

PwC’s framework highlights fundamental differences between agentic AI and traditional AI, noting that agentic systems pursue outcomes autonomously by defining goals rather than programming every step. It stresses enhancing governance in response to emerging risks, testing individual components (e.g., reasoning and planning) for failures like wrong plans or inefficient tool use, and evaluating them together. Monitoring requires a different approach for autonomous, multi-step systems, with practical, scalable methods depending on trace collection, metadata, and clear governance.
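As an illustration of what trace-based monitoring could look like in practice, the sketch below records each agent step with run metadata so reviewers can audit multi-step runs after the fact. The `AgentTracer` class, file format, and step types are hypothetical assumptions for the example, not PwC’s or any framework’s actual tooling.

```python
import json
import time
import uuid

class AgentTracer:
    """Minimal trace collector: one JSON line per agent step, keyed by run ID."""

    def __init__(self, log_path: str = "agent_traces.jsonl"):
        self.run_id = str(uuid.uuid4())
        self.log_path = log_path

    def record(self, step_type: str, detail: dict) -> None:
        """Append a step (plan, tool call, or escalation) with governance metadata."""
        entry = {
            "run_id": self.run_id,
            "timestamp": time.time(),
            "step_type": step_type,   # e.g. "plan", "tool_call", "escalation"
            "detail": detail,
        }
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

# Usage: wrap each phase of the agent loop so every run leaves an auditable trail.
tracer = AgentTracer()
tracer.record("plan", {"goal": "review transaction alert", "steps_planned": 4})
tracer.record("tool_call", {"tool": "query_alerts", "args": {"alert_id": "A-123"}})
tracer.record("escalation", {"reason": "confidence below threshold", "to": "human_reviewer"})
```

The design choice here is deliberately boring: append-only, structured logs keyed by run ID are cheap to collect at scale and give governance teams the metadata they need without touching the agent’s reasoning loop.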
The Elusive ‘One Ring’: Current Offerings, Gaps, and Shortcomings
The market buzzes with Agentic AI platforms, but safety-specific tools are underdeveloped.
Some top vendors in this (already) crowded space include AWS, Databricks, Dataiku, Google Cloud, GitHub, IBM, Salesforce, Anthropic, Perplexity AI, OpenAI, NVIDIA, Oracle, and Microsoft. On the safety-focused side, Lasso Security highlights threats like memory poisoning, tool misuse, and privilege compromise, while SuperAGI trends towards multi-agent collaboration and self-healing.

Guardrails AI excels in LLM validation, detecting biases and toxicity in real time, but falters on full autonomy monitoring for multi-agent systems. IBM’s Granite offers training compliance and risk audits, yet struggles with scalability for embodied AI. Azure’s Prompt Shield provides prompt security and privacy, but it’s reactive, lacking proactive planning for escalation risks. NIST tools evaluate agent hijacking, but they’re research-focused, not enterprise-ready. AIQUIRIS adds compliance but misses physical agent integration.
Gaps abound: PwC notes misinformation and ethical concerns in early deployments. Security lags, with risks like shadow agents, prompt injections, and fragmented initiatives. Gartner predicts 40% of agentic AI projects will be cancelled by 2027 due to costs and risks. Shortcomings include overcomplication, lack of accountability, and the technological limits of LLMs.

PwC’s insights reveal that organisations have ample choice in deploying AI and agents across value chains to improve productivity, but new considerations arise with increased use. To trust agents, effective testing and monitoring are essential, with escalation to humans where needed. Control categories include testing (e.g., LLM-as-a-Judge for errors in outputs) and monitoring, which differs for autonomous systems.

A practical example from PwC involves a transaction monitoring tool using an agentic workflow to review alerts, identify suspicious patterns, and recommend actions, with human users reviewing outputs and decisions. This demonstrates how governance can mitigate risks in financial services without tying the example to specific institutions.
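To illustrate the human-escalation control in such a workflow, here is a hedged sketch of a routing gate for alert recommendations. The judge score, confidence thresholds, and field names are assumptions made for the example; they are not the tool PwC describes, and any real deployment would calibrate these values against its own risk appetite.

```python
from dataclasses import dataclass

@dataclass
class AlertRecommendation:
    alert_id: str
    recommended_action: str   # e.g. "close", "escalate", "file_report"
    confidence: float         # agent's own confidence score, 0..1
    judge_score: float        # score from a separate LLM-as-a-Judge check, 0..1

def route_recommendation(rec: AlertRecommendation,
                         min_confidence: float = 0.8,
                         min_judge_score: float = 0.7) -> str:
    """Auto-apply only when both the agent and the judge are confident;
    otherwise hand the alert to a human reviewer."""
    if rec.confidence >= min_confidence and rec.judge_score >= min_judge_score:
        return "auto_apply"
    return "human_review"

# A low judge score routes the recommendation to a human, per the workflow above.
rec = AlertRecommendation("A-123", "close", confidence=0.9, judge_score=0.55)
print(route_recommendation(rec))  # human_review
```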
To define a full Agentic Safety stack, we need:
- Monitoring Layer: Real-time action tracking with kill switches (inspired by UK AISI).
- Ethical Alignment: Bias audits and transparency reporting (EU AI Act).
- Risk Assessment: Dynamic evaluations for misinformation and cyber threats (Stanford HAI).
- Human Oversight: Mandatory intervention loops for high-risk scenarios (WEF).
- Scalability: Multi-agent coordination and embodied support (Alan Turing Institute).
- Compliance Tools: Automated audits and incident logging (Singapore AI Verify).
- Sustainability: Environmental impact checks (OECD).
Current platforms cover 60% of these per WEF assessments, with gaps in scalability and interoperability. A holistic platform would integrate these, ensuring trust-by-design as PwC advocates.
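As a concrete illustration of the monitoring layer above, here is a minimal kill-switch wrapper. The action budget, blocked action names, and interface are assumptions for the sketch, not a standard published by the UK AISI or any vendor.

```python
class KillSwitchMonitor:
    """Halts an agent once its actions exceed a per-run budget or a blocked
    action type is attempted (illustrative thresholds only)."""

    def __init__(self, max_actions: int = 50, blocked_actions: set[str] | None = None):
        self.max_actions = max_actions
        self.blocked_actions = blocked_actions or {"delete_records", "external_transfer"}
        self.action_count = 0
        self.halted = False

    def authorise(self, action: str) -> bool:
        """Return True if the action may proceed; trip the kill switch otherwise."""
        if self.halted:
            return False
        self.action_count += 1
        if action in self.blocked_actions or self.action_count > self.max_actions:
            self.halted = True   # from here on, every request is refused
            return False
        return True

monitor = KillSwitchMonitor(max_actions=3)
for action in ["query_db", "draft_report", "external_transfer", "send_email"]:
    print(action, "allowed" if monitor.authorise(action) else "blocked")
# external_transfer trips the switch, so send_email is also blocked.
```

The point of the sketch is that the switch sits outside the agent’s own reasoning: once tripped, no further action is authorised until a human resets it.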
Tangible Steps Forward: A Roadmap for Stakeholders
To build safe, responsible Agentic systems, governments and stakeholders must act decisively:
- Mandate Third-Party Audits and Kill Switches: Adopt UK AISI models globally, requiring independent evaluations before deployment.
- Establish International Sandboxes: Blend EU and Singapore approaches for collaborative testing, fostering innovation while mitigating risks.
- Enforce Bias-Resistant Designs: Offer incentives (e.g., tax breaks) for ethical alignment, drawing from WEF recommendations.
- Invest in AI Literacy Programmes: Counter cognitive erosion with UNESCO-inspired curricula, ensuring future generations retain critical thinking.
- Develop Unified Safety Platforms: Fund R&D for a comprehensive stack, addressing vendor gaps and integrating embodied AI safeguards.
- Coordinate Globally: Via forums like the WEF AI Governance Alliance, harmonise policies to prevent a ‘race to the bottom.’
These steps aren’t just safeguards; they’re enablers of abundance, navigating short-term turbulence for long-term prosperity.
Conclusion: A Call to Prioritise Humanity in the Agentic Era
Agentic AI’s promise is immense, but without robust safety, it risks repeating social media’s sins on steroids—eroding trust, democracy, and our essence as thinkers. As I’ve explored in my ‘Genesis’ series and PwC’s AI Agent Survey 2025 (link: https://lnkd.in/gPQYEV3k), we must go long on humanity and AI, not one or the other. Let’s build guardians of autonomy, ensuring these systems amplify, not diminish, the human experience.

What are your thoughts? Join the conversation below or check out my full series on #AIAtlantis. For more, explore PwC’s Agent OS: https://lnkd.in/gizKBG82.

#AgenticAI #AISafety #TrustedAI #ResponsibleAI #EthicalAI #AIAgents #HumanExperience #Genesis #FutureOfWork #AGI

