Foundations — What Makes a System "Agentic"?
An agentic AI system is one where the model doesn't just respond — it plans, decides, and acts across multiple steps to achieve a goal, often using tools, memory, and other agents along the way.
Think of it like the difference between a calculator (you do all the thinking, it executes one step) and an accountant (you give them a goal, they figure out the steps themselves). The move from calculator to accountant is the move from "LLM call" to "agentic system".
A system becomes more agentic as it gains more of these four properties. Understanding which ones your use case needs is the first step toward choosing the right architecture.
- Planning: The ability to break a goal into sub-tasks and sequence them — even if the sequence wasn't known upfront.
- Tool Use: The ability to take actions in the world — searching the web, writing files, calling APIs, executing code.
- Memory: The ability to retain and recall information across steps or sessions — from simple context windows to long-term vector stores.
- Multi-step Autonomy: The ability to run a chain of reasoning/action loops without a human approving every step.
Developers often obsess over which LLM to use, but for agentic systems, the architecture — how you connect, orchestrate, and constrain your agents — has a far greater impact on reliability, cost, and maintainability.
A well-architected system with a mid-tier model will outperform a poorly-architected system using the most powerful model available. Architecture defines how failures propagate, how errors get corrected, how costs scale, and how humans stay in control.
Decision Guide — The Questions to Ask First
- How open-ended is the goal? E.g. "summarise this document" vs "research and write a market report"
- Does the task need several distinct specialisms? E.g. coding + security review + documentation are very different skill profiles
- Can sub-tasks run independently, or does each build on the last? E.g. researching 5 companies simultaneously vs building on each prior finding
- Are any actions irreversible? Sending an email is irreversible; generating a draft is not.
- Can success be verified automatically? E.g. generated code that must actually run, or a report that needs fact-checking
- Does quality improve with iteration? E.g. prompt refinement, code debugging, creative writing improvement cycles
The Patterns — A Practical Field Guide
Pattern 1: Single Agent + Tools
One LLM instance that reasons over a task and invokes tools (search, code execution, file I/O, APIs) as needed. The model decides when and how to use each tool, forming a think → act → observe loop until the task is complete.
When to use:
- Task is self-contained and well-scoped
- A single context window is sufficient
- The toolset covers all needed actions
- You want simplicity and low latency
When to avoid:
- Task exceeds the context window
- Specialised skills are required at each step
- Errors in one step cascade catastrophically
- Parallelism would significantly reduce time
Example: A customer support agent that looks up order details, checks policy docs, and drafts a personalised reply — all in one pass.
Pitfall: Tool-call loops can spiral. Always set a maximum iteration count to prevent runaway execution and unexpected costs.
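The think → act → observe loop with a hard iteration cap can be sketched as below. The `fake_llm` stub and the `lookup_order` tool are purely illustrative stand-ins for a real model call and a real API:

```python
import json

# Hypothetical tool registry; a real agent would call actual APIs here.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def fake_llm(messages):
    """Stand-in for a real model call: decide to use a tool or to answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_order", "args": {"order_id": "A123"}}
    return {"answer": "Your order A123 has shipped."}

def run_agent(task, max_iterations=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):            # hard cap prevents runaway loops
        decision = fake_llm(messages)          # think
        if "answer" in decision:               # model chose to finish
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])              # act
        messages.append({"role": "tool", "content": json.dumps(result)})  # observe
    raise RuntimeError("Agent exceeded max_iterations without finishing")

reply = run_agent("Where is my order A123?")
```

The essential point is the `for` loop with a fixed bound: the model decides the path, but the harness decides when to stop.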
Pattern 2: Pipeline
A fixed sequence of LLM calls where the output of each step becomes the input of the next. No dynamic routing — the flow is defined at design time. Each step can be a different prompt, model, or even a non-LLM transformation.
When to use:
- The workflow is predictable and deterministic
- Each step genuinely improves quality (e.g. draft → critique → refine)
- You need easy observability and debugging
- Different steps benefit from different system prompts
When to avoid:
- The number of steps is unknown upfront
- Intermediate steps need to branch based on content
- Latency is critical (each step adds time)
Example: A content pipeline: Extract topics → Research each topic → Draft section → Edit for style → Format for publication.
Pitfall: Error amplification — a mistake in step 2 gets refined and polished by steps 3–5, making it harder to detect. Add validation gates between stages.
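A minimal sketch of a staged pipeline with a validation gate between stages. The `draft` and `edit` functions are hypothetical stand-ins for per-stage LLM calls:

```python
def draft(topic):
    """Stand-in for a drafting LLM call."""
    return f"Draft about {topic}."

def edit(text):
    """Stand-in for an editing LLM call."""
    return text.replace("Draft", "Polished draft")

def validate_nonempty(text):
    # Validation gate: fail fast rather than letting a bad
    # intermediate output get polished by later stages.
    if not text.strip():
        raise ValueError("Empty output from previous stage")
    return text

def run_pipeline(topic, stages):
    output = topic
    for stage in stages:                        # fixed order, known at design time
        output = validate_nonempty(stage(output))
    return output

result = run_pipeline("agent architectures", [draft, edit])
```

Real gates would check schema, length, or factual constraints rather than mere non-emptiness, but the placement is the point: one check after every stage.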
Pattern 3: Orchestrator + Subagents
A central orchestrator agent breaks the goal into sub-tasks and dispatches them to specialised subagents. Each subagent has a narrow, deep focus. The orchestrator collects results, handles errors, and decides on next steps dynamically — it's the brain; the subagents are the hands.
When to use:
- Task complexity exceeds a single agent's context
- Specialist subagents (coder, researcher, writer) add real value
- The path to completion is dynamic, not known upfront
- You need fault isolation between components
When to avoid:
- The task is simple enough for a single agent
- Orchestration latency/cost is unacceptable
- You lack observability tools to debug multi-agent flows
Example: A software development agent: Orchestrator plans the feature → Coder subagent writes code → Tester subagent runs tests → Reviewer subagent checks style → Orchestrator reports back.
Pitfall: The orchestrator is a single point of failure. If it loses track of state or misinterprets a subagent's output, the whole task degrades. Invest heavily in its system prompt and context management.
Pattern 4: Parallelisation
Multiple agents (or LLM calls) run concurrently on independent sub-tasks and their results are aggregated. Ideal when a large task can be decomposed into independent chunks that don't need each other's output to proceed.
When to use:
- Sub-tasks are independent of each other
- Latency reduction is a priority
- You want multiple independent opinions or perspectives
- Large inputs need to be chunked and processed separately
When to avoid:
- Sub-tasks have dependencies on each other
- Cost is constrained (parallel = multiple concurrent LLM calls)
- Aggregating results is non-trivial or lossy
Example: Analysing 50 customer interviews simultaneously — each agent handles one transcript, then results are synthesised into themes by a final aggregation call.
Pitfall: The aggregation step is often underestimated. Combining 10 independent summaries into one coherent output requires a skilled final prompt — it's not just concatenation.
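A fan-out / fan-in sketch using a thread pool, since concurrent LLM calls are I/O-bound. `analyse_transcript` and `aggregate` are hypothetical stand-ins for the per-chunk call and the final synthesis prompt:

```python
from concurrent.futures import ThreadPoolExecutor

def analyse_transcript(transcript):
    """Stand-in for an LLM call that summarises one interview."""
    return f"theme found in: {transcript}"

def aggregate(summaries):
    """Stand-in for the final synthesis call; in reality its own prompt,
    not a simple join."""
    return " | ".join(sorted(summaries))

transcripts = [f"interview-{i}" for i in range(5)]

# Fan out: each transcript is processed concurrently and independently.
with ThreadPoolExecutor(max_workers=5) as pool:
    summaries = list(pool.map(analyse_transcript, transcripts))

# Fan in: one aggregation step over all partial results.
combined = aggregate(summaries)
```

With real API calls, the same structure applies; you would add per-call timeouts and retries inside `analyse_transcript`.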
Pattern 5: Generator + Evaluator
A Generator agent produces an output; an Evaluator agent critiques it against specific criteria and feeds structured feedback back to the Generator. This loop continues until the Evaluator's criteria are met or a max iteration count is reached.
When to use:
- Quality can be objectively evaluated (tests pass, rubric met)
- First-pass output is rarely good enough
- Iterative refinement has diminishing returns you can detect
- The generator and evaluator benefit from different personas/prompts
When to avoid:
- There is no clear, automatable success criterion
- Loop cost/latency is unacceptable
- The evaluator itself is unreliable or biased
Example: Code generation: Generator writes a function → Evaluator runs unit tests and returns failing cases → Generator fixes and retries → loop ends when all tests pass.
Pitfall: Infinite loops and "sycophantic drift" — the evaluator starts approving outputs to end the loop. Always enforce a hard iteration limit and log each cycle.
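The loop itself is small; the sketch below uses a deliberately buggy first pass so the feedback path is exercised. Both `generator` and `evaluator` are stubs for real LLM/test-runner calls:

```python
def generator(task, feedback=None):
    """Stand-in for the generator model; improves when given feedback."""
    if feedback:
        return "def double(x): return x * 2"
    return "def double(x): return x + 2"    # deliberately buggy first pass

def evaluator(code):
    """Stand-in for the evaluator: runs a check and returns structured
    feedback, or None when the criteria are met."""
    namespace = {}
    exec(code, namespace)
    if namespace["double"](3) != 6:
        return "double(3) returned the wrong value"
    return None

def refine(task, max_iterations=5):
    feedback = None
    for i in range(1, max_iterations + 1):  # hard limit against infinite loops
        code = generator(task, feedback)
        feedback = evaluator(code)
        if feedback is None:                # objective criterion met
            return code, i
    raise RuntimeError("Evaluator never approved within max_iterations")

code, rounds = refine("write double(x)")
```

Because the success test here is executable, sycophantic drift is impossible; that is exactly why this pattern favours objective criteria over LLM-judged ones.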
Pattern 6: Swarm
Multiple peer agents collaborate, share context, and hand off tasks to each other without a single central orchestrator. Each agent decides when to act, when to defer, and when to call on a peer. Coordination emerges from the agents' interactions.
When to use:
- Tasks are highly dynamic and unpredictable
- No single agent can have all necessary context
- You need resilience — no central point of failure
- Agents have genuinely complementary, peer-level expertise
When to avoid:
- You need deterministic, auditable behaviour
- Cost and latency are tightly constrained
- Your observability tooling is immature
- The task is well-defined (massive overkill)
Example: An autonomous research team: Researcher, Fact-Checker, Devil's Advocate, and Synthesiser agents debate findings, challenge each other's conclusions, and converge on an answer.
Pitfall: Emergent chaos — agents can contradict each other, loop endlessly, or amplify each other's errors. This pattern requires the most mature engineering foundations to deploy safely.
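One common way to implement peer coordination is a shared blackboard: agents take turns, each deciding whether to contribute, and the loop stops at quiescence or a round cap. A minimal sketch, with purely illustrative agent functions in place of real LLM agents:

```python
def researcher(board):
    if "claim" not in board:
        board["claim"] = "X increases Y"
        return True                              # acted this round
    return False                                 # nothing to add, defer

def fact_checker(board):
    if "claim" in board and "checked" not in board:
        board["checked"] = True
        return True
    return False

def synthesiser(board):
    if board.get("checked") and "answer" not in board:
        board["answer"] = f"Verified: {board['claim']}"
        return True
    return False

def run_swarm(agents, max_rounds=10):
    board = {}
    for _ in range(max_rounds):                  # cap emergent loops
        acted = any(agent(board) for agent in agents)
        if not acted:                            # quiescence = convergence
            break
    return board

board = run_swarm([researcher, fact_checker, synthesiser])
```

Notice there is no orchestrator: the order of contributions emerges from each agent's own precondition checks, which is also why the round cap is non-negotiable.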
Side-by-Side Comparison
| Pattern | Autonomy | Complexity | Cost | Best For |
|---|---|---|---|---|
| Single Agent + Tools | Medium | Low | Low | Self-contained tasks |
| Pipeline | Low | Low | Low–Med | Predictable, staged workflows |
| Orchestrator + Subagents | High | Medium | Medium | Complex, dynamic multi-step tasks |
| Parallelisation | Medium | Medium | Medium–High | Large, decomposable tasks |
| Generator + Evaluator | Medium | Medium | Medium | Quality-critical outputs |
| Swarm | Very High | High | High | Open-ended, adversarial tasks |

Pattern Recommendations by Use Case

| Use Case | Recommended Pattern | Why |
|---|---|---|
| Customer support bot | Single Agent + Tools | Contained, tool-driven, low risk |
| Blog post generation | Pipeline | Predictable stages: research → draft → edit |
| Software development assistant | Orchestrator + Subagents | Specialist skills, dynamic planning |
| Batch document analysis | Parallelisation | Independent docs, speed matters |
| Code generation + testing | Generator + Evaluator | Objective success criterion (tests pass) |
| Strategy research with critique | Swarm | Needs adversarial debate, emergent insight |
| Data pipeline with LLM steps | Pipeline | Structured, auditable, deterministic |
| Multi-market competitive analysis | Parallelisation + Orchestrator | Parallel research, central synthesis |
Pitfalls, Principles & Production Checklist
- Over-engineering from day one: Starting with a Swarm when a single agent with two tools would have worked. Earn complexity through iteration.
- No hard iteration limits: Loops without exit conditions will run until your API budget is exhausted. Always set a max step count.
- Treating all actions as reversible: Sending emails, writing to databases, and calling payment APIs are not undoable. Build explicit confirmation gates for irreversible actions.
- Insufficient observability: You can't debug what you can't see. Every agent call should log its input, output, tool calls, and decision rationale.
- Trusting subagent output blindly: An orchestrator that passes subagent output directly to the next step without validation is a reliability bomb. Validate at each handoff.
No matter how capable your agents, always require human approval before:
- Sending any communication to an external party (email, message, post)
- Writing to or deleting from a production database
- Triggering financial transactions of any kind
- Making changes that affect other people's work or data
- Taking any action the system hasn't been explicitly tested on
The safest default: agents propose, humans approve. Automate the approval away only after you have strong confidence from real-world testing.
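The "agents propose, humans approve" default can be enforced mechanically with a gate in front of the action executor. A minimal sketch; the action names and the `approve` callback are illustrative assumptions:

```python
# Actions that must never run without explicit approval.
IRREVERSIBLE = {"send_email", "delete_record", "charge_card"}

def execute(action, args, approve):
    """Gate: irreversible actions require the human approval callback
    to return True; everything else runs freely."""
    if action in IRREVERSIBLE and not approve(action, args):
        return {"status": "blocked", "action": action}
    return {"status": "executed", "action": action}

# A draft is reversible, so it runs even though the approver says no.
draft_result = execute("generate_draft", {}, approve=lambda a, args: False)

# Sending the email is irreversible, and here the approver declines.
send_result = execute("send_email", {"to": "x@example.com"},
                      approve=lambda a, args: False)
```

In production the `approve` callback would surface the proposed action in a review queue or chat message and block until a human responds.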
- [ ] Max iteration limit set on every loop and every agent
- [ ] Structured logging on every LLM call with input/output/tool use
- [ ] Graceful error handling — what does the system do when a subagent fails?
- [ ] Cost budget — per-task token limits are set and monitored
- [ ] Human review gates before all irreversible actions
- [ ] Context window management — long tasks must summarise or offload memory
- [ ] Fallback behaviour — what happens when the LLM is unavailable?
- [ ] Prompt injection defence — agents that ingest external content must sanitise it
- [ ] End-to-end tested on real tasks before production deployment
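Two of the checklist items — structured logging and per-task cost budgets — combine naturally into one wrapper around every model call. A sketch under stated assumptions: the word-count token estimate is deliberately crude, and `llm_fn` is a stand-in for a real client call:

```python
import json
import time

def logged_call(llm_fn, prompt, budget, log):
    """Wrap a model call with structured logging and a token budget check."""
    if budget["used"] >= budget["max_tokens"]:
        raise RuntimeError("Per-task token budget exhausted")
    output = llm_fn(prompt)
    # Crude token estimate; real code would use the provider's usage counts.
    tokens = len(prompt.split()) + len(output.split())
    budget["used"] += tokens
    log.append(json.dumps({
        "ts": time.time(),
        "prompt": prompt,
        "output": output,
        "tokens": tokens,
    }))
    return output

log, budget = [], {"used": 0, "max_tokens": 50}
reply = logged_call(lambda p: "stub reply", "hello agent", budget, log)
```

Routing every call through one such wrapper is what makes the rest of the checklist auditable: the log tells you what happened, and the budget guarantees a worst-case cost.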