An Ideas to Life experiment

🤖 Agentic AI Architecture Patterns Cheat Sheet

A decision guide for choosing the right architecture for your AI system

🎯 Find Your Pattern

Which scenario best describes what you're building? Each scenario maps to a recommended pattern in the Choosing by Use Case table below.

  • 💬 Customer Support Bot: Answer questions, look up orders, check policies, draft responses
  • 📝 Content Generation Pipeline: Research topics, draft articles, edit for style, format for publication
  • 💻 Coding Assistant: Write code, run tests, review quality, document functions
  • 📊 Batch Document Analysis: Process hundreds of documents, extract insights, synthesise findings
  • 🔬 Research & Synthesis: Deep research across sources, fact-check claims, produce comprehensive reports
  • ⚙️ Workflow Automation: Multi-step business process with decisions, approvals, and handoffs

Foundations — What Makes a System "Agentic"?

The Core Idea in One Sentence

An agentic AI system is one where the model doesn't just respond — it plans, decides, and acts across multiple steps to achieve a goal, often using tools, memory, and other agents along the way.

Think of it like the difference between a calculator (you do all the thinking, it executes one step) and an accountant (you give them a goal, they figure out the steps themselves). The move from calculator to accountant is the move from "LLM call" to "agentic system".

The Four Agentic Properties

A system becomes more agentic as it gains more of these four properties. Understanding which ones your use case needs is the first step toward choosing the right architecture.

  • Planning: The ability to break a goal into sub-tasks and sequence them — even if the sequence wasn't known upfront.
  • Tool Use: The ability to take actions in the world — searching the web, writing files, calling APIs, executing code.
  • Memory: The ability to retain and recall information across steps or sessions — from simple context windows to long-term vector stores.
  • Multi-step Autonomy: The ability to run a chain of reasoning/action loops without a human approving every step.
💡 Mental Model Before You Begin
Ask yourself: "On a scale from 'vending machine' to 'employee', how much autonomous judgement does my system need?" A vending machine executes exactly what you press. An employee interprets your goal, makes decisions, handles surprises, and reports back. Most agentic systems live somewhere in between — and your architecture should match exactly where on that scale you need to sit.
Why Architecture Choice Matters More Than Model Choice

Developers often obsess over which LLM to use, but for agentic systems, the architecture — how you connect, orchestrate, and constrain your agents — has a far greater impact on reliability, cost, and maintainability.

A well-architected system with a mid-tier model will outperform a poorly-architected system using the most powerful model available. Architecture defines how failures propagate, how errors get corrected, how costs scale, and how humans stay in control.

Decision Guide — The Questions to Ask First

Work through these questions in order — your answers narrow the field of suitable patterns.
1. Is the task well-defined with a predictable, linear sequence of steps?

E.g. "summarise this document" vs "research and write a market report"

  • Yes → Consider Pipeline or Single Agent
  • No → Consider Orchestrator or Swarm patterns

2. Does the task require specialised skills that benefit from separation of concerns?

E.g. coding + security review + documentation are very different skill profiles

  • Yes → Consider Multi-Agent (Orchestrator + Subagents)
  • No → A single capable agent may suffice

3. Can tasks be run in parallel, or must they be strictly sequential?

E.g. researching 5 companies simultaneously vs building on each prior finding

  • Parallelisable → Parallelisation or Swarm patterns
  • Sequential → Pipeline or Orchestrator

4. How high is the cost of an error? Are actions reversible?

E.g. sending an email is irreversible; generating a draft is not

  • High risk / irreversible → Mandatory Human-in-the-Loop checkpoints
  • Low risk / reversible → More autonomy is acceptable

5. Does quality need to be validated before the output is used?

E.g. generated code that must actually run, or a report that needs fact-checking

  • Yes → Add an Evaluator agent or reflection loop
  • Sometimes → Build in optional review gates

6. Do you need to optimise through iteration — not just execute once?

E.g. prompt refinement, code debugging, creative writing improvement cycles

  • Yes → Optimiser or Generator-Evaluator loop
  • No → Avoid loops — they add cost and latency
💡 The Golden Rule of Agentic Design
Start with the simplest architecture that could possibly work, then add complexity only when you hit a clear limitation. Most teams jump straight to multi-agent orchestration when a well-prompted single agent with a few tools would have solved the problem cheaper, faster, and with fewer failure modes. Complexity should be earned, not assumed.

The Patterns — A Practical Field Guide

1. Single Agent with Tools
Complexity: Low
"One capable generalist with the right equipment"

One LLM instance that reasons over a task and invokes tools (search, code execution, file I/O, APIs) as needed. The model decides when and how to use each tool, forming a think → act → observe loop until the task is complete.

✅ Use When
  • Task is self-contained and well-scoped
  • A single context window is sufficient
  • Toolset covers all needed actions
  • You want simplicity and low latency
❌ Avoid When
  • Task exceeds the context window
  • Specialised skills are required at each step
  • Errors in one step cascade catastrophically
  • Parallelism would significantly reduce time
🌍 Real Example

A customer support agent that looks up order details, checks policy docs, and drafts a personalised reply — all in one pass.

⚠️ Key Risk

Tool call loops can spiral. Always set a maximum iteration count to prevent runaway execution and unexpected costs.
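The think → act → observe loop, with the hard iteration cap the risk note calls for, can be sketched in a few lines. This is a minimal sketch, not a real provider API: `call_llm`, the message format, and the `lookup_order` tool are all illustrative stubs.

```python
def call_llm(messages):
    # Stub standing in for a real model call: asks for a tool first,
    # then answers once it has seen a tool result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_order", "args": {"order_id": "A123"}}
    return {"answer": "Your order A123 has shipped."}

TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def run_agent(task, max_iterations=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):      # hard cap prevents runaway tool loops
        decision = call_llm(messages)    # think
        if "answer" in decision:
            return decision["answer"]    # model signals it is done
        result = TOOLS[decision["tool"]](**decision["args"])        # act
        messages.append({"role": "tool", "content": str(result)})   # observe
    raise RuntimeError("Agent exceeded max_iterations without finishing")

print(run_agent("Where is order A123?"))
```

Raising on cap exhaustion, rather than silently returning a partial answer, makes runaway loops visible in monitoring instead of shipping a degraded result.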

2. Pipeline (Sequential Chain)
Complexity: Low
"An assembly line where each station transforms the work"

A fixed sequence of LLM calls where the output of each step becomes the input of the next. No dynamic routing — the flow is defined at design time. Each step can be a different prompt, model, or even a non-LLM transformation.

✅ Use When
  • The workflow is predictable and deterministic
  • Each step genuinely improves quality (e.g. draft → critique → refine)
  • You need easy observability and debugging
  • Different steps benefit from different system prompts
❌ Avoid When
  • The number of steps is unknown upfront
  • Intermediate steps need to branch based on content
  • Latency is critical (each step adds time)
🌍 Real Example

A content pipeline: Extract topics → Research each topic → Draft section → Edit for style → Format for publication.

⚠️ Key Risk

Error amplification — a mistake in step 2 gets refined and polished by steps 3–5, making it harder to detect. Add validation gates between stages.
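A pipeline with the validation gates the risk note recommends can be sketched as a list of named stages. The stage functions here are stubs standing in for LLM calls with different prompts; the names are illustrative.

```python
def extract_topics(text):
    # Stub for "extract topics" LLM step.
    return ["pricing", "support"]

def draft(topics):
    # Stub for the drafting step.
    return "Draft covering: " + ", ".join(topics)

def edit(draft_text):
    # Stub for the style-editing step.
    return draft_text + " (edited for style)"

def validate(stage, output):
    # Validation gate between stages: catch errors before later
    # steps refine and polish them.
    if not output:
        raise ValueError(f"Empty output at stage {stage!r}")
    return output

PIPELINE = [("extract", extract_topics), ("draft", draft), ("edit", edit)]

def run_pipeline(source):
    data = source
    for name, step in PIPELINE:          # fixed sequence, defined at design time
        data = validate(name, step(data))
    return data

print(run_pipeline("raw customer interview text"))
```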

3. Orchestrator + Subagents
Complexity: Medium
"A manager who delegates to specialists"

A central orchestrator agent breaks the goal into sub-tasks and dispatches them to specialised subagents. Each subagent has a narrow, deep focus. The orchestrator collects results, handles errors, and decides on next steps dynamically — it's the brain; the subagents are the hands.

✅ Use When
  • Task complexity exceeds a single agent's context
  • Specialist subagents (coder, researcher, writer) add real value
  • The path to completion is dynamic, not known upfront
  • You need fault isolation between components
❌ Avoid When
  • The task is simple enough for a single agent
  • Orchestration latency/cost is unacceptable
  • You lack observability tools to debug multi-agent flows
🌍 Real Example

A software development agent: Orchestrator plans the feature → Coder subagent writes code → Tester subagent runs tests → Reviewer subagent checks style → Orchestrator reports back.

⚠️ Key Risk

The orchestrator is a single point of failure. If it loses track of state or misinterprets a subagent's output, the whole task degrades. Invest heavily in its system prompt and context management.

4. Parallelisation
Complexity: Medium
"Many workers tackling different parts of the same job simultaneously"

Multiple agents (or LLM calls) run concurrently on independent sub-tasks and their results are aggregated. Ideal when a large task can be decomposed into independent chunks that don't need each other's output to proceed.

✅ Use When
  • Sub-tasks are independent of each other
  • Latency reduction is a priority
  • You want multiple independent opinions or perspectives
  • Large inputs need to be chunked and processed separately
❌ Avoid When
  • Sub-tasks have dependencies on each other
  • Cost is constrained (parallel = multiple concurrent LLM calls)
  • Aggregating results is non-trivial or lossy
🌍 Real Example

Analysing 50 customer interviews simultaneously — each agent handles one transcript, then results are synthesised into themes by a final aggregation call.

⚠️ Key Risk

The aggregation step is often underestimated. Combining 10 independent summaries into one coherent output requires a skilled final prompt — it's not just concatenation.
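Fan-out over independent documents plus a final aggregation call can be sketched with a thread pool (LLM calls are I/O-bound, so threads suffice). `analyse` and `aggregate` are stubs standing in for per-document and synthesis prompts.

```python
from concurrent.futures import ThreadPoolExecutor

def analyse(transcript):
    # Stub for one LLM call per document.
    return f"themes({transcript})"

def aggregate(summaries):
    # The aggregation step: in practice a skilled synthesis prompt,
    # not simple concatenation as here.
    return " | ".join(sorted(summaries))

def run_parallel(transcripts):
    with ThreadPoolExecutor(max_workers=8) as pool:
        summaries = list(pool.map(analyse, transcripts))  # fan out concurrently
    return aggregate(summaries)                           # fan back in

print(run_parallel(["doc1", "doc2", "doc3"]))
```

`max_workers` doubles as a concurrency budget: it bounds how many LLM calls run at once, which is also how you keep parallel cost spikes under control.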

5. Generator + Evaluator (Reflection Loop)
Complexity: Medium
"A writer and an editor in a continuous improvement cycle"

A Generator agent produces an output; an Evaluator agent critiques it against specific criteria and feeds structured feedback back to the Generator. This loop continues until the Evaluator's criteria are met or a max iteration count is reached.

✅ Use When
  • Quality can be objectively evaluated (tests pass, rubric met)
  • First-pass output is rarely good enough
  • Iterative refinement has diminishing returns you can detect
  • The generator and evaluator benefit from different personas/prompts
❌ Avoid When
  • There is no clear, automatable success criterion
  • Loop cost/latency is unacceptable
  • The evaluator itself is unreliable or biased
🌍 Real Example

Code generation: Generator writes a function → Evaluator runs unit tests and returns failing cases → Generator fixes and retries → loop ends when all tests pass.

⚠️ Key Risk

Infinite loops and "sycophantic drift" — the evaluator starts approving outputs to end the loop. Always enforce a hard iteration limit and log each cycle.
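The loop, with the hard iteration limit and per-cycle logging the risk note demands, looks like this. Both agents are deterministic stubs here; a real evaluator might run unit tests or score against a rubric.

```python
def generate(feedback=None):
    # Stub generator: produces a better draft once it sees feedback.
    return "good draft" if feedback else "rough draft"

def evaluate(output):
    # Stub evaluator: returns (passed, structured_feedback).
    if "good" in output:
        return True, None
    return False, "tighten the prose"

def reflection_loop(max_iterations=3):
    feedback = None
    output = ""
    for i in range(max_iterations):      # hard limit guards against infinite loops
        output = generate(feedback)
        passed, feedback = evaluate(output)
        print(f"cycle {i}: {output!r} passed={passed}")   # log every cycle
        if passed:
            return output
    return output                        # best effort after hitting the cap

print(reflection_loop())
```

Feeding the evaluator's *structured* feedback back into the generator, rather than a bare pass/fail, is what makes each cycle an improvement rather than a re-roll.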

6. Swarm (Peer-to-Peer Multi-Agent)
Complexity: High
"A team that self-organises without a central manager"

Multiple peer agents collaborate, share context, and hand off tasks to each other without a single central orchestrator. Each agent decides when to act, when to defer, and when to call on a peer. Coordination emerges from the agents' interactions.

✅ Use When
  • Tasks are highly dynamic and unpredictable
  • No single agent can have all necessary context
  • You need resilience — no central point of failure
  • Agents have genuinely complementary, peer-level expertise
❌ Avoid When
  • You need deterministic, auditable behaviour
  • Cost and latency are tightly constrained
  • Your observability tooling is immature
  • The task is well-defined (massive overkill)
🌍 Real Example

An autonomous research team: Researcher, Fact-Checker, Devil's Advocate, and Synthesiser agents debate findings, challenge each other's conclusions, and converge on an answer.

⚠️ Key Risk

Emergent chaos — agents can contradict each other, loop endlessly, or amplify each other's errors. This pattern requires the most mature engineering foundations to deploy safely.
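A heavily simplified sketch of peer hand-offs over a shared blackboard, with a hand-off cap to bound the emergent-chaos risk. The agent names and hand-off rules are illustrative and hard-coded; in a real swarm each agent uses the model to decide whether to act, defer, or hand off.

```python
def researcher(board):
    board["finding"] = "market is growing"
    return "fact_checker"            # hand off to a peer

def fact_checker(board):
    board["verified"] = True
    return "synthesiser"

def synthesiser(board):
    board["report"] = f"{board['finding']} (verified={board['verified']})"
    return None                      # no further hand-off: swarm has converged

AGENTS = {"researcher": researcher,
          "fact_checker": fact_checker,
          "synthesiser": synthesiser}

def run_swarm(start="researcher", max_handoffs=10):
    board, current = {}, start
    for _ in range(max_handoffs):    # cap guards against endless hand-off loops
        current = AGENTS[current](board)
        if current is None:
            return board
    raise RuntimeError("Swarm failed to converge within max_handoffs")

print(run_swarm()["report"])
```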

Side-by-Side Comparison

Patterns at a Glance
| Pattern | Autonomy | Complexity | Cost | Best For |
|---|---|---|---|---|
| Single Agent + Tools | Medium | Low | Low | Self-contained tasks |
| Pipeline | Low | Low | Low–Med | Predictable, staged workflows |
| Orchestrator + Subagents | High | Medium | Medium | Complex, dynamic multi-step tasks |
| Parallelisation | Medium | Medium | Medium–High | Large, decomposable tasks |
| Generator + Evaluator | Medium | Medium | Medium | Quality-critical outputs |
| Swarm | Very High | High | High | Open-ended, adversarial tasks |
Choosing by Use Case
| Use Case | Recommended Pattern | Why |
|---|---|---|
| Customer support bot | Single Agent + Tools | Contained, tool-driven, low risk |
| Blog post generation | Pipeline | Predictable stages: research → draft → edit |
| Software development assistant | Orchestrator + Subagents | Specialist skills, dynamic planning |
| Batch document analysis | Parallelisation | Independent docs, speed matters |
| Code generation + testing | Generator + Evaluator | Objective success criterion (tests pass) |
| Strategy research with critique | Swarm | Needs adversarial debate, emergent insight |
| Data pipeline with LLM steps | Pipeline | Structured, auditable, deterministic |
| Multi-market competitive analysis | Parallelisation + Orchestrator | Parallel research, central synthesis |
💡 Hybrid Architectures Are the Norm
Real production systems rarely use a single pattern in isolation. A common combination: an Orchestrator manages the top-level workflow, spawns Parallel subagents for independent research, then passes results through a Pipeline for staged refinement, with a Generator + Evaluator loop on the quality-critical final output. Identify the dominant pattern first, then bolt on secondary patterns where needed.

Pitfalls, Principles & Production Checklist

❌ The 5 Most Common Agentic Architecture Mistakes
  • Over-engineering from day one: Starting with a Swarm when a single agent with two tools would have worked. Earn complexity through iteration.
  • No hard iteration limits: Loops without exit conditions will run until your API budget is exhausted. Always set a max step count.
  • Treating all actions as reversible: Sending emails, writing to databases, and calling payment APIs are not undoable. Build explicit confirmation gates for irreversible actions.
  • Insufficient observability: You can't debug what you can't see. Every agent call should log its input, output, tool calls, and decision rationale.
  • Trusting subagent output blindly: An orchestrator that passes subagent output directly to the next step without validation is a reliability bomb. Validate at each handoff.
Human-in-the-Loop: When to Always Require It

No matter how capable your agents, always require human approval before:

  • Sending any communication to an external party (email, message, post)
  • Writing to or deleting from a production database
  • Triggering financial transactions of any kind
  • Making changes that affect other people's work or data
  • Taking any action the system hasn't been explicitly tested on

The safest default: agents propose, humans approve. Automate the approval away only after you have strong confidence from real-world testing.
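The "agents propose, humans approve" default can be enforced as a gate in front of every action. This is a sketch: the action names are illustrative, and `approve` is injected as a callable so the human prompt can be swapped for a UI, a queue, or (in tests) a stub.

```python
# Actions that must never run without explicit human sign-off.
IRREVERSIBLE = {"send_email", "delete_record", "charge_card"}

def execute(action, payload, approve):
    # approve: callable(action, payload) -> bool, asking a human.
    if action in IRREVERSIBLE and not approve(action, payload):
        return {"status": "blocked", "action": action}   # agent proposed, human declined
    return {"status": "done", "action": action}          # reversible, or approved

# The human declines, so nothing is sent:
result = execute("send_email", {"to": "client@example.com"},
                 approve=lambda a, p: False)
print(result["status"])   # blocked
```

Because the gate sits outside the agent, a confused model cannot talk its way past it; the deny-by-default set is enforced in plain code.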

Production Readiness Checklist
  • [ ] Max iteration limit set on every loop and every agent
  • [ ] Structured logging on every LLM call with input/output/tool use
  • [ ] Graceful error handling — what does the system do when a subagent fails?
  • [ ] Cost budget — per-task token limits are set and monitored
  • [ ] Human review gates before all irreversible actions
  • [ ] Context window management — long tasks must summarise or offload memory
  • [ ] Fallback behaviour — what happens when the LLM is unavailable?
  • [ ] Prompt injection defence — agents that ingest external content must sanitise it
  • [ ] End-to-end tested on real tasks before production deployment
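The structured-logging item on the checklist is cheap to retrofit with a decorator. A minimal sketch, assuming stdout is your log sink; the wrapped `summarise` function is an illustrative stand-in for any agent or LLM call.

```python
import functools
import json
import time

def logged(fn):
    # Emit one structured JSON log line per call: name, input, output, duration.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        out = fn(*args, **kwargs)
        print(json.dumps({
            "call": fn.__name__,
            "input": repr(args),
            "output": repr(out),
            "seconds": round(time.time() - start, 3),
        }))
        return out
    return wrapper

@logged
def summarise(text):
    # Stand-in for an LLM summarisation call.
    return text[:20] + "..."

summarise("A long document that needs summarising")
```

In production you would route these lines to your logging backend and add tool calls and decision rationale to the payload, but the principle is the same: every call leaves a machine-readable trace.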
💡 The Minimal Footprint Principle
Design your agents to request only the permissions they actually need, avoid storing sensitive data beyond the immediate task, prefer reversible actions over irreversible ones, and err on the side of doing less and confirming with the user when uncertain. An agent that does less, reliably, is far more valuable than one that attempts more and fails unpredictably.