
Agentic Architectures: ReAct, Reflexion, Planning, and Multi-Agent Coordination

A practical survey of the reasoning patterns behind autonomous AI agents, from ReAct and Reflexion to planning, self-consistency, and the trade-offs of multi-agent coordination.


An “agent” is a language model situated within a loop that allows it to plan, act upon the world via tools, observe results, and adapt. The model itself is frozen; the intelligence of the system comes from how that loop is structured. This article surveys the reasoning patterns that make agents work and the design choices that determine whether they succeed or spiral out of control.

Why Patterns Matter

A raw language model produces a single response to a prompt. That’s sufficient for a question, but not for a goal like “research this topic, fact-check, and write a report.” Goals require multiple steps, intermediate checks, resilience to error, and the ability to use external tools. Agentic architectures are established recipes for turning a one-shot predictor into a multi-step problem solver.

Core Reasoning Patterns

ReAct (Reason + Act)

ReAct interleaves thought, action, and observation in a loop. The agent reasons about what to do, takes an action (often a tool call), observes the outcome, then re-reasons with that new information. This grounding in real observations allows the agent to self-correct: if a search yields nothing useful, the next thought can pivot strategy. ReAct is the workhorse pattern for structured reasoning and retrieval tasks, at a moderate token cost stemming from the running internal monologue.
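
The thought, action, observation cycle can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `scripted_model` is a hypothetical stand-in for a real LLM call, and `REACT_TOOLS` is an assumed one-entry tool registry with canned results.

```python
from typing import Callable

# Hypothetical tool registry: name -> function taking one string argument.
REACT_TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: "Paris" if "capital of France" in q else "no results",
}

def scripted_model(transcript: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from the transcript so far."""
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return {"thought": "I should look this up.",
                "action": "search", "input": "capital of France"}
    answer = observations[-1].removeprefix("Observation: ")
    return {"thought": "The search result answers the question.",
            "action": "finish", "input": answer}

def react(question: str, model=scripted_model, max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)                      # reason
        transcript.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":
            return step["input"]
        observation = REACT_TOOLS[step["action"]](step["input"])  # act
        transcript.append(f"Action: {step['action']}[{step['input']}]")
        transcript.append(f"Observation: {observation}")          # observe
    return "stopped: step limit reached"
```

Each observation is appended to the transcript before the next model call, which is what lets the next thought react to what actually happened. The growing transcript is also where the token cost comes from.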

Reflexion

Reflexion adds learning from failure. When a task fails, the agent writes a natural language self-critique — “I assumed the file existed; I should have checked first” — and stores it in memory. On the next attempt, that critique is fed back in as context. The agent isn’t retrained; it simply carries a verbal lesson. Reflexion shines on iterative work like coding, where getting it right the first time is unrealistic. The cost is that it requires reliable failure detection and persistent memory.
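
A rough sketch of that retry loop follows. The three helpers (`attempt`, `evaluate`, `critique`) are hypothetical stand-ins for the model's attempt, the failure detector, and the self-critique step; a real system would back them with an LLM and with tests or a grader.

```python
def attempt(task: str, lessons: list[str]) -> str:
    """Stand-in for an LLM attempt; it behaves better once a lesson is in context."""
    if any("check first" in lesson for lesson in lessons):
        return "checked file exists, then read it"
    return "read file directly"

def evaluate(result: str) -> bool:
    """Stand-in failure detector (in practice: tests, linters, or a grader)."""
    return result.startswith("checked")

def critique(result: str) -> str:
    """Stand-in for the model's verbal self-critique of a failed attempt."""
    return "I assumed the file existed; I should check first."

def reflexion(task: str, max_attempts: int = 3):
    lessons: list[str] = []            # verbal memory carried across attempts
    for n in range(1, max_attempts + 1):
        result = attempt(task, lessons)
        if evaluate(result):
            return result, n, lessons
        lessons.append(critique(result))  # store the lesson, no retraining
    return None, max_attempts, lessons
```

Note that the loop only works because `evaluate` can tell success from failure; without a reliable detector there is nothing to reflect on.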

Tree-of-Thoughts and Graph-of-Thoughts

These patterns explore multiple reasoning branches in parallel, evaluate intermediate states, and prune the weaker ones. They can significantly outperform linear reasoning chains on naturally decomposable problems, such as mathematical puzzles and games. The trade-off is substantial: exploring many branches can cost roughly ten to a hundred times more tokens than a single chain, so these patterns are only worth it when the problem structure justifies the search.
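
As a toy illustration of branch, evaluate, and prune, the sketch below runs a small beam search. The problem (pick three numbers that sum to a target) and the helpers `expand` and `score` are assumptions made for the example; in an agent, both would typically be model calls.

```python
def expand(state: tuple[int, ...], choices=(1, 3, 5)) -> list[tuple[int, ...]]:
    """Propose candidate next thoughts (an LLM would generate these)."""
    return [state + (c,) for c in choices]

def score(state: tuple[int, ...], target: int) -> float:
    """Rate an intermediate state (an LLM or heuristic would do this)."""
    return -abs(target - sum(state))

def tree_of_thoughts(target: int, depth: int = 3, beam_width: int = 2):
    frontier: list[tuple[int, ...]] = [()]
    evaluated = 0
    for _ in range(depth):
        candidates = [child for s in frontier for child in expand(s)]
        evaluated += len(candidates)
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        frontier = candidates[:beam_width]   # prune the weaker branches
    return frontier[0], evaluated
```

With a target of 11, the search scores 15 intermediate states to produce one answer, where a single linear chain of the same depth would produce 3. That ratio grows quickly with depth and beam width, which is the cost the text warns about.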

Plan-and-Execute

Here the agent generates a full plan upfront, then executes the steps (often with a cheaper model), replanning only if something fails. The advantage is fewer expensive reasoning calls and a plan that can be human-reviewed before any actions run. The fragility is that if the initial plan is fundamentally flawed, the agent has little opportunity to pivot mid-stream. A related variant, ReWOO (Reasoning WithOut Observation), plans with placeholders for tool results, runs all the tools, and only then integrates the outputs. This cuts token counts drastically, at the cost of giving up mid-stream self-correction.
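
The placeholder idea can be sketched as follows. The `#E1`-style placeholder syntax, the tool names, and the toy price data are all assumptions for illustration; the point is only that executing the plan needs no model call between steps.

```python
import re

# Hypothetical tools with made-up toy data.
PLAN_TOOLS = {
    "lookup_price": lambda item: {"apple": "3", "pear": "4"}[item],
    "add": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}

# A plan written upfront; #E1 and #E2 stand for results not yet known.
PLAN = [
    ("#E1", "lookup_price", "apple"),
    ("#E2", "lookup_price", "pear"),
    ("#E3", "add", "#E1+#E2"),
]

def execute_plan(plan) -> dict[str, str]:
    evidence: dict[str, str] = {}
    for name, tool, arg in plan:
        # Substitute earlier results into the argument; no reasoning happens here.
        resolved = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], arg)
        evidence[name] = PLAN_TOOLS[tool](resolved)
    return evidence  # handed back to the model once, for final integration
```

If `lookup_price` had returned something unexpected, nothing in this executor would notice, which is the loss of mid-stream self-correction described above.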

Self-Consistency and Chain-of-Verification

Instead of trusting a single line of reasoning, these methods add redundancy. Self-consistency samples several independent reasoning paths and takes a majority vote over their answers; chain-of-verification drafts an answer and then runs a verification pass that checks it. Both reduce hallucinations and surface gaps that should trigger replanning. They are especially valuable for high-stakes queries, where being confidently wrong is costly.
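
The majority-vote half of this is simple enough to show directly. The sketch below assumes the sampled answers have already been extracted as short strings; the `min_agreement` threshold is an illustrative parameter, not a standard value.

```python
from collections import Counter

def self_consistent_answer(samples: list[str], min_agreement: float = 0.5):
    """Majority vote over independently sampled final answers."""
    normalized = [s.strip().lower() for s in samples]
    answer, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)
    # Low agreement is a signal to verify or replan, not to answer confidently.
    return (answer if agreement > min_agreement else None), agreement
```

For example, five samples of `"42", "42 ", "41", "42", "17"` agree on `"42"` with 60% support, while three samples that all differ return `None`, which a caller could treat as a trigger for a verification pass.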

Pragmatic Hybridization

For completing tasks autonomously, a strong general-purpose recipe combines three patterns: ReAct + Reflexion + Self-Consistency.

  • ReAct provides real-time self-correction via tool observation.
  • Reflexion captures failures so future runs improve.
  • Self-Consistency verifies answers via parallel reasoning paths.

Together they handle ambiguity, learn from mistakes, and check their own work — which is why this combination underpins many production research and coding agents.

Proper Tool Use

Tools are how agents touch reality, and sloppy tool design is a leading cause of agent failure. Established best practices:

  • Narrow scope. Each tool does one thing. Avoid Swiss Army knife tools whose behavior the model cannot predict.
  • Stable wrapper. Return a consistent shape — success, error, metadata — so the agent can reason about results uniformly.
  • Separate reads from writes. Risky write actions should never be improvised in the way a read query might be.
  • Strict schema. Every parameter described with a JSON schema, leaving no ambiguity for the model to hallucinate into.
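
One way these four practices might fit together is sketched below. `Tool` and `ToolResult` are hypothetical names, and the validation only checks required and unknown parameter names; it is not full JSON Schema validation, which a real system would delegate to a schema library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class ToolResult:
    """Stable wrapper: every call returns the same shape."""
    ok: bool
    data: Any = None
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)

@dataclass
class Tool:
    name: str
    schema: dict              # JSON-Schema-style parameter description
    fn: Callable[..., Any]
    writes: bool = False      # write tools need explicit approval

    def call(self, args: dict, approved: bool = False) -> ToolResult:
        meta = {"tool": self.name, "writes": self.writes}
        missing = [p for p in self.schema.get("required", []) if p not in args]
        unknown = [p for p in args if p not in self.schema.get("properties", {})]
        if missing or unknown:
            return ToolResult(False, error=f"missing={missing} unknown={unknown}",
                              metadata=meta)
        if self.writes and not approved:
            return ToolResult(False, error="write action requires approval",
                              metadata=meta)
        try:
            return ToolResult(True, data=self.fn(**args), metadata=meta)
        except Exception as exc:   # failures come back in the same shape
            return ToolResult(False, error=str(exc), metadata=meta)

# A narrow, read-only example tool.
text_length = Tool(
    name="text_length",
    schema={"type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"]},
    fn=lambda text: len(text),
)
```

Calling `text_length.call({"text": "abc"})` yields a successful result with `data == 3`, while a call with missing or unexpected parameters returns an error result rather than raising, so the agent can reason about both outcomes the same way.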

Two important conventions have matured. The Model Context Protocol (MCP) has become a widely adopted standard for exposing tools (browser automation, file access, databases) to any compatible model, with a large ecosystem of ready-made servers. The CodeAct pattern treats writing and running code as a single action: the agent emits a code block, a sandbox executes it, and stdout/stderr come back as the observation. CodeAct bundles many tool calls into fewer model turns and is naturally auditable, because the code the model wrote is there to review.
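
The CodeAct observation step might look roughly like this. `run_code_action` is a hypothetical helper, and a child process is used here only to keep the example short: it is not a security sandbox, and a real deployment would need container-level or similar isolation.

```python
import subprocess
import sys

def run_code_action(code: str, timeout_s: float = 5.0) -> dict:
    """Run model-written Python in a child process and return the observation.

    NOTE: a separate process is NOT a sandbox; this only illustrates the
    emit-code, execute, observe-output cycle.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {"ok": proc.returncode == 0,
            "stdout": proc.stdout, "stderr": proc.stderr}
```

A traceback on stderr is as useful an observation as a clean result, since it gives the next model turn something concrete to fix.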

The Control Loop and its Safety Barriers

Every agent boils down to a simple loop: observe state, generate a thought and action, execute, update memory, then check whether to stop or continue. The danger is that an unbounded loop can run indefinitely and burn money, which is precisely the failure mode that sank early autonomous agents.

Robust systems impose barriers:

  • Step and tool-call limits per loop and per session, so the agent cannot iterate indefinitely.
  • Token budgets with hard caps, falling back to a cheaper model tier as the budget tightens.
  • Clear termination criteria — goal met, budget exhausted, or no progress (same action repeated) — instead of “just keep going until perfect.”
  • A three-layer barrier on input (request validation), output (fact-checking and constraint checking), and tool execution (a safety filter before any external calls).
  • State persisted to files or git rather than held only in the LLM’s context window, so that as the context fills, the system can checkpoint, spawn a fresh agent, and continue cleanly.
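
The step limit, token cap, and termination criteria above can be sketched as a single bounded loop. `next_action` and `execute` are hypothetical callables standing in for the model and the tool layer, and the default limits are arbitrary; input and output validation and checkpointing are left out to keep the example short.

```python
def run_agent_loop(next_action, execute, max_steps=10, max_tokens=1000):
    """Run a bounded control loop.

    next_action(history) -> (action, tokens_used)
    execute(action)      -> observation
    """
    history = []   # list of (action, observation) pairs
    tokens = 0

    def stop(reason):
        return {"stop": reason, "steps": len(history), "tokens": tokens}

    for _ in range(max_steps):
        action, used = next_action(history)
        tokens += used
        if tokens > max_tokens:
            return stop("budget_exhausted")   # hard token cap
        if action == "done":
            return stop("goal_met")
        if history and history[-1][0] == action:
            return stop("no_progress")        # same action repeated
        history.append((action, execute(action)))
    return stop("step_limit")
```

Every exit path names its reason, so a caller can distinguish a finished task from one that was cut off and decide whether to escalate, checkpoint, or give up.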

Single-Agent vs. Multi-Agent Coordination

A lively debate in the industry: when do you need multiple agents coordinating rather than one?

The argument for multi-agent systems is decomposition. A primary coordinating agent breaks a task into independent threads, parallel workers each handle one thread in their own context window, and the primary agent synthesizes the results. On naturally parallelizable work (research, data gathering, verification) this can yield large jumps in quality, because spending tokens across independent context windows can solve problems that do not fit within a single agent’s budget.

The argument against is fragility. Multi-agent setups can share context poorly, make conflicting decisions, and, in debate-style designs, descend into sycophancy, where agents agree with the majority even when the majority is wrong. For inherently sequential work like iterative code fixes, a single agent with good memory is simpler and more reliable.

The resolution is that the choice depends on the problem, not ideology:

  • Sequential or iterative work → a single agent plus Reflexion.
  • Parallelizable work → coordinator-with-workers pattern.
  • Latency- or cost-sensitive work → a single agent, to avoid coordination overhead.

The multi-agent pattern that wins in practice is centralized coordination with isolated subagents: one agent holds the full context and spawns short-lived, memory-isolated subagents for specific subtasks, then makes the final decision itself. Isolation prevents the sycophancy that plagues peer-to-peer debate designs, and a simple gating rule (only spawn subagents when a task is truly decomposable) keeps costs in check.
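
A minimal sketch of that coordinator-with-workers shape, under stated assumptions: `worker` is a hypothetical stand-in for a subagent call, the `decomposable` flag is assumed to come from the coordinator's own judgment, and threads stand in for whatever concurrency a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(subtask: str) -> str:
    """Stand-in for a short-lived subagent; it sees only its own subtask."""
    return f"findings for {subtask!r}"

def coordinate(task: str, subtasks: list[str], decomposable: bool) -> dict:
    # Gate: only spawn subagents when the task is truly decomposable.
    if not decomposable or len(subtasks) < 2:
        return {"mode": "single", "results": [worker(task)]}
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # Workers share no state; results come back in subtask order.
        results = list(pool.map(worker, subtasks))
    # Only the coordinator sees all results and makes the final decision.
    return {"mode": "multi", "results": results}
```

Because each worker receives nothing but its subtask string, no worker can see or defer to another worker's answer, which is the isolation property the text credits with avoiding sycophancy.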

Putting It All Together

The art of agentic architecture is matching the pattern to the problem: ReAct for grounded reasoning, Reflexion for learning from failure, planning for long-term coherence, self-consistency for high-stakes answers, and disciplined tool use coupled with pervasive guardrails. None of this requires retraining a model. It is engineering around frozen models, which is a large part of why this is among the most accessible and fastest-moving frontiers in applied AI today.