By Dr Luke Soon
This post is not a primer for executives. It is a technical blueprint for programmers building agentic systems from scratch — an opinionated roadmap synthesising emerging best practices, frameworks, and pitfalls.
1. Defining the Agent: Role, Scope, and Guardrails
Before writing a line of code, clarity is paramount:
- Role and use case: Define the agent's core problem. Is it a travel planner, a financial research assistant, or a compliance monitor?
- Boundaries: Establish hard guardrails around scope. Over-generalisation is a leading cause of agentic instability (see Smeyatsky, 2025).
- User and system interactions: Decide whether the agent is human-in-the-loop (HITL) or fully autonomous.
Think of this step as API design first, autonomy second — a principle echoed by LangChain and Anthropic’s Constitutional AI.
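One way to take "API design first" literally is to pin the role, the tool whitelist, and the HITL flag in a declarative spec before a single model call is made. A minimal, dependency-free sketch; all field names here are illustrative, not taken from any framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    """Declarative contract fixing the agent's role before any model call."""
    role: str                       # e.g. "travel planner"
    allowed_tools: tuple[str, ...]  # hard guardrail: anything outside this set is rejected
    human_in_the_loop: bool         # HITL vs fully autonomous

    def permits(self, tool: str) -> bool:
        # Scope check runs before any tool is invoked.
        return tool in self.allowed_tools

spec = AgentSpec(
    role="travel planner",
    allowed_tools=("search_flights", "search_hotels"),
    human_in_the_loop=True,
)
```

Freezing the spec up front makes scope violations a programming error rather than a runtime surprise.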
2. Inputs, Outputs, and Schemas
Agents collapse without structured I/O contracts. JSON schemas (via Pydantic, TypedDict, or LangChain structured outputs) enforce consistency.
```python
from pydantic import BaseModel

class TravelPlan(BaseModel):
    flights: list[str]
    hotels: list[str]
    itinerary: dict
```

Outputs must be machine-verifiable. Free-form text is brittle; structured outputs enable downstream orchestration.
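To make the contract concrete, the schema above can validate raw model output before it reaches downstream code. A minimal sketch assuming Pydantic v2 (`model_validate_json`); the JSON payload is invented for illustration:

```python
from pydantic import BaseModel, ValidationError

class TravelPlan(BaseModel):
    flights: list[str]
    hotels: list[str]
    itinerary: dict

# Stand-in for raw LLM output; any missing or mistyped field fails validation.
raw = '{"flights": ["SQ 317"], "hotels": ["Park Hotel"], "itinerary": {"day_1": "arrive"}}'

try:
    plan = TravelPlan.model_validate_json(raw)  # Pydantic v2 API
except ValidationError as err:
    # Malformed model output is caught here, not deep inside the pipeline.
    raise SystemExit(err)
```

The point is that a schema violation surfaces at the boundary, where it can trigger a retry or a fallback, instead of corrupting downstream orchestration.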
3. Prompt Engineering as a Software Discipline
Forget “prompt hacking.” For agents, prompts are specifications:
- Role-specific instructions (planner, executor, verifier).
- Tone and domain context (legal, financial, healthcare).
- Robustness via self-critique (Llama Guard, Claude Constitutional Principles).
Prompts should be version-controlled and tested like code. Tools such as PromptLayer and LangSmith bring CI/CD to prompt management.
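Treating prompts as versioned specifications can be as simple as pinning a version string and a template in code, then asserting on the rendered output in CI. A stdlib-only sketch; `PROMPT_VERSION`, `PLANNER_PROMPT`, and `render` are invented names, not part of any prompt-management tool:

```python
import string

PROMPT_VERSION = "1.2.0"  # bump and commit like any other interface change

PLANNER_PROMPT = string.Template(
    "You are a $role. Work only within the following scope: $scope.\n"
    "Respond with valid JSON matching the TravelPlan schema."
)

def render(role: str, scope: str) -> str:
    # Substitution fails loudly if a placeholder is missing, like a broken build.
    return PLANNER_PROMPT.substitute(role=role, scope=scope)
```

A unit test that asserts the rendered prompt still contains its role and output-format clauses is the prompt-engineering analogue of a regression test.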
4. Reasoning, Tools, and APIs
The leap from chatbot → agent lies in tool-augmented reasoning.
- ReAct (Yao et al., 2022): interleaves reasoning traces with tool calls.
- APIs and calculators: extend the LLM beyond its parametric memory.
- Chain-of-Thought (CoT) vs Tree-of-Thoughts (Yao et al., 2023): controlled deliberation mechanisms.
Here, LangChain, OpenAI function calling, and AutoGen are canonical choices.
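The ReAct pattern can be sketched without any framework: alternate scripted "thoughts" with tool calls and feed each observation back into the trace. Here a hard-coded script stands in for real model turns, and the calculator is a toy tool:

```python
def calculator(expression: str) -> str:
    # Toy tool; never eval untrusted input in production.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

# What a real LLM would emit turn by turn (scripted here for illustration).
SCRIPT = [
    {"thought": "I need the total cost of two flights.",
     "action": ("calculator", "420 + 385")},
    {"thought": "I have the answer.", "final": "Total fare: 805"},
]

def react_loop(script):
    trace = []
    for step in script:
        trace.append(f"Thought: {step['thought']}")
        if "action" in step:
            tool, arg = step["action"]
            observation = TOOLS[tool](arg)  # tool result re-enters the context
            trace.append(f"Action: {tool}({arg}) -> Observation: {observation}")
        else:
            trace.append(f"Final: {step['final']}")
    return trace
```

The essential mechanic, reason, act, observe, repeat, is exactly what LangChain and OpenAI function calling wrap in production form.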
5. Multi-Agent Collaboration
Single agents are brittle. Robust systems use division of labour:
- Planner: Breaks tasks into sub-goals.
- Executor: Performs atomic actions.
- Checker: Validates and backstops.
Frameworks like CrewAI, LangGraph, and Swarm implement orchestration patterns.
This resonates with Minsky’s “Society of Mind”: agents of agents, not a monolithic oracle.
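The planner–executor–checker division of labour reduces to a small pipeline. A sketch with stubbed agents, where each stub stands in for a role-specific LLM call:

```python
def planner(goal: str) -> list[str]:
    # Stub: a real planner agent would decompose the goal with an LLM.
    return [f"book flight for {goal}", f"book hotel for {goal}"]

def executor(subtask: str) -> dict:
    # Stub: a real executor would call tools/APIs to perform the atomic action.
    return {"task": subtask, "status": "done"}

def checker(results: list[dict]) -> bool:
    # Backstop: the whole run fails unless every sub-result validates.
    return all(r["status"] == "done" for r in results)

def run(goal: str) -> bool:
    results = [executor(t) for t in planner(goal)]
    return checker(results)
```

The checker is the part single-agent designs tend to omit, and it is where most of the robustness comes from.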
6. Memory: Beyond Statelessness
Memory is the sine qua non of agency. Consider three layers:
- Episodic: Short-term (last n messages).
- Semantic: Vector DBs (Pinecone, Chroma).
- Symbolic/long-term: SQL/graph stores for structured recall.
A hybrid stack prevents catastrophic forgetting while enabling retrieval-augmented generation (RAG).
7. Multimodality (Optional, but Transformative)
Integrating speech, vision, and action turns static LLMs into embodied agents:
- Speech: Whisper, ElevenLabs.
- Vision: GPT-4 Vision, BLIP-2.
- Action: Robotics control via VIMA or APIs.
This enables agentic robotics, a frontier explored by DeepMind RT-X.
8. Orchestration and Evaluation
Agent pipelines require orchestration:
- Queues & triggers: n8n, Temporal.
- Message passing: Agent-to-Agent protocols.
- Evaluation loops: LLM-as-judge (Zhou et al., 2023).
Evaluation spans functional accuracy, safety alignment, and efficiency. Without it, agents drift into failure modes (see HBR, 2025).
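An evaluation loop needs no special infrastructure to start. Below, a deterministic token-overlap scorer stands in for the judge model purely so the loop is runnable; in a real LLM-as-judge setup that function would prompt a strong model to grade each answer:

```python
def stub_judge(answer: str, reference: str) -> float:
    """Toy scorer: fraction of reference tokens found in the answer.
    Placeholder for a real LLM judge call."""
    ref_tokens = reference.lower().split()
    hits = sum(1 for tok in ref_tokens if tok in answer.lower())
    return hits / len(ref_tokens)

def evaluate(cases: list[tuple[str, str]], threshold: float = 0.5) -> dict:
    # Each case is (agent answer, reference answer); aggregate into a report.
    scores = [stub_judge(answer, ref) for answer, ref in cases]
    return {
        "mean": sum(scores) / len(scores),
        "failures": sum(s < threshold for s in scores),
    }
```

Running a report like this on every change is what keeps drift visible before it becomes a failure mode.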
9. Deployment: API, UI, and Monitoring
The final step: expose, monitor, iterate.
- API/UI: FastAPI, Gradio, Streamlit.
- Monitoring: Telemetry and safety metrics (WhyLabs, Weights & Biases).
- Governance: Embed compliance hooks (see AI Verify, Singapore).
Remember: an agent in production is not static. It must be continuously audited, retrained, and stress-tested.
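Continuous auditing starts with instrumentation at the entry point. A stdlib-only sketch of a telemetry wrapper; the `METRICS` dict and `answer` endpoint are invented, and a real deployment would ship these numbers to a monitoring backend rather than keep them in memory:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.telemetry")

METRICS = {"calls": 0, "errors": 0, "latency_s": []}

def monitored(fn):
    """Wrap an agent entry point with call counts, error counts, and latency."""
    def wrapper(*args, **kwargs):
        METRICS["calls"] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            METRICS["errors"] += 1  # safety/quality alerting hangs off this
            raise
        finally:
            METRICS["latency_s"].append(time.perf_counter() - start)
    return wrapper

@monitored
def answer(query: str) -> str:
    # Stub for the real agent pipeline behind the API/UI layer.
    return f"Handled: {query}"
```

The same decorator pattern is where compliance hooks slot in: log, gate, or escalate before the wrapped call returns.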
Visual Frameworks
The following diagrams (adapted from The Product Compass Newsletter) illustrate two complementary views:
- 9-Step Build Process (from Scratch): role definition → multimodal ability → deployment.
- 7-Step System Process: system prompt → LLM → tools → memory → orchestration → UI → evaluation.
Together, they form a dual-lens design pattern for modern agentic systems.
Final Thoughts
We are moving from stateless chatbots to autonomous agentic ecosystems. For programmers, the challenge is not just chaining APIs, but engineering robustness, interpretability, and control.
To paraphrase Geoffrey Hinton: neural nets scaled beyond imagination, but without agents, they remain passive. The agent layer is where intelligence begins to act — and where safety, governance, and human alignment will be tested most severely.
References & Further Reading
- Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models.
- Yao et al. (2023). Tree of Thoughts: Deliberate Problem Solving with LLMs.
- Zhou et al. (2023). LLM-as-Judge: LLM-based Evaluation.
- Smeyatsky (2025). Agentic AI Security Landscape White Paper.
- HBR (2025). Organizations Aren't Ready for the Risks of Agentic AI.
- Lasso Security (2025). What is Agentic AI?
- OpenAI (2023). Function Calling.
- LangChain (2024). LangGraph Documentation.