<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Orchestration on Corebaseit — POS · EMV · Payments · AI</title><link>https://corebaseit.com/tags/orchestration/</link><description>Recent content in Orchestration on Corebaseit — POS · EMV · Payments · AI</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><managingEditor>contact@corebaseit.com (Vincent Bevia)</managingEditor><webMaster>contact@corebaseit.com (Vincent Bevia)</webMaster><lastBuildDate>Fri, 03 Apr 2026 10:00:00 +0100</lastBuildDate><atom:link href="https://corebaseit.com/tags/orchestration/index.xml" rel="self" type="application/rss+xml"/><item><title>Multi-Agent Systems Scale Vertically. They Need to Scale Horizontally.</title><link>https://corebaseit.com/corebaseit_posts_in_review/series/multi-agent-systems-scale-vertically_part3/</link><pubDate>Fri, 03 Apr 2026 10:00:00 +0100</pubDate><author>contact@corebaseit.com (Vincent Bevia)</author><guid>https://corebaseit.com/corebaseit_posts_in_review/series/multi-agent-systems-scale-vertically_part3/</guid><description>&lt;p>&lt;em>This post continues the ideas explored in &lt;a class="link" href="https://corebaseit.com/posts_in_review/super-agents-multi-agent-communication/" >Part I: Super Agents and Multi-Agent Communication&lt;/a> and &lt;a class="link" href="https://corebaseit.com/posts_in_review/swarm-intelligence-opposite-architectural-bet/" >Part II: Swarm Intelligence&lt;/a>. Those posts covered how agents coordinate within a workflow. This one asks what happens after the workflow ends.&lt;/em>&lt;/p>
&lt;hr>
&lt;p>&lt;strong>After spending time with the orchestrator pattern and the swarm pattern, I kept running into the same gap — one that the field has not been honest enough about.&lt;/strong>&lt;/p>
&lt;p>Agents can communicate within a workflow. They can share state, hand off tasks, and coordinate through structured message protocols. I covered all of that in the previous posts, and all of that is solved. What is not solved is this: once a run completes, whatever the agents figured out about handling a complex workflow stays isolated. The next run starts cold.&lt;/p>
&lt;p>That is the vertical scaling trap. And the more I read — across Reflexion, ERL, Letta&amp;rsquo;s stateful agent work, and Google Research&amp;rsquo;s recent findings on scaling agent systems — the more I realized this is the most important unsolved problem in multi-agent architecture today.&lt;/p>
&lt;hr>
&lt;h2 id="what-vertical-scaling-actually-means">What Vertical Scaling Actually Means
&lt;/h2>&lt;p>The industry has concentrated its investment on making individual agents more capable in isolation — longer context windows, stronger reasoning models, richer tool sets, more compute per inference call. This is vertical scaling: more depth, more power, more intelligence concentrated in a single node.&lt;/p>
&lt;p>Vertical scaling has delivered real gains. Modern LLM-based agents can handle significantly longer reasoning chains, maintain larger working memories, and invoke more complex tool sequences than agents from two years ago. The benchmark numbers confirm this.&lt;/p>
&lt;p>But vertical scaling has a ceiling, and that ceiling is architectural, not computational. No matter how capable a single agent becomes, a system of agents that starts each run from a blank slate cannot accumulate collective intelligence over time. Every execution is, in a meaningful sense, the first time that system has encountered the problem.&lt;/p>
&lt;p>That is the definition of a system that does not learn.&lt;/p>
&lt;hr>
&lt;h2 id="the-statefulness-illusion">The Statefulness Illusion
&lt;/h2>&lt;p>This was the part that clarified the problem most for me. LLM agents are stateless by design. The model itself has no memory between API calls — every inference starts fresh, bounded by what exists inside the current context window. What looks like agent memory in most production frameworks is actually infrastructure built around the model: conversation history injected into the prompt, vector stores queried at retrieval time, workflow state persisted in an external database.&lt;/p>
&lt;p>The agent does not remember. The infrastructure remembers. And the agent only knows what the infrastructure decides to surface at inference time.&lt;/p>
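&lt;p>A toy sketch makes the distinction concrete. In the snippet below, &lt;code>call_model()&lt;/code> is a hypothetical stand-in for a real LLM API call; the model sees only the prompt it receives, so any appearance of memory has to be assembled by the surrounding code:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python"># Sketch: the model is stateless; the infrastructure assembles its "memory".
# call_model() is a hypothetical stand-in for a real LLM API call.

def call_model(prompt: str) -> str:
    return f"response to: {prompt!r}"

conversation: list[str] = []   # infrastructure-side memory, not model-side

def chat(user_turn: str) -> str:
    # The only "memory" the model gets is what we inject into the prompt.
    context = "\n".join(conversation)
    reply = call_model(f"{context}\n{user_turn}" if context else user_turn)
    conversation.append(user_turn)
    conversation.append(reply)
    return reply

chat("My name is Ada.")
print(chat("What is my name?"))  # works only because *we* replayed history
&lt;/code>&lt;/pre>
&lt;p>Strip out the &lt;code>conversation&lt;/code> list and the model forgets the user between calls. The list is the infrastructure; the model never held the memory at all.&lt;/p>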
&lt;p>This distinction matters because it exposes the scope of what is currently being solved. Stateful agent frameworks — LangGraph, MemGPT/Letta, Amazon Bedrock AgentCore Memory, and others — address continuity &lt;em>within&lt;/em> a workflow and &lt;em>within&lt;/em> a user session. They do not address what happens between runs, across agent instances, or across different executions of the same workflow by different users.&lt;/p>
&lt;p>Each agent run, regardless of the framework, is still largely isolated from the accumulated experience of every run that came before it.&lt;/p>
&lt;hr>
&lt;h2 id="the-horizontal-scaling-problem">The Horizontal Scaling Problem
&lt;/h2>&lt;p>Horizontal scaling in multi-agent systems means something different from what the term usually implies in infrastructure. It is not about running more agent instances in parallel — that is a load distribution problem, and it is solved. The horizontal scaling problem I&amp;rsquo;m describing is about propagating learned competence across agents and across runs.&lt;/p>
&lt;p>When I mapped the gap concretely, it looked like this:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Capability&lt;/th>
&lt;th>Current State&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Agents share state within a run&lt;/td>
&lt;td>Solved&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Agents communicate within a workflow&lt;/td>
&lt;td>Solved&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Agent learns within a run (self-reflection)&lt;/td>
&lt;td>Partial — Reflexion&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Successful strategy propagates to next run&lt;/td>
&lt;td>Not solved&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Knowledge discovered by one agent available to others&lt;/td>
&lt;td>Not solved&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Collective intelligence accumulates over time without retraining&lt;/td>
&lt;td>Not solved&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The bottom three rows represent the horizontal scaling gap. This is not a matter of framework maturity: what is missing is an architectural primitive that no production multi-agent system yet provides.&lt;/p>
&lt;hr>
&lt;h2 id="what-the-field-has-built-as-workarounds">What the Field Has Built as Workarounds
&lt;/h2>&lt;p>Research and engineering teams have made partial progress, and it&amp;rsquo;s worth naming what exists honestly.&lt;/p>
&lt;p>&lt;strong>Shared episodic memory stores.&lt;/strong> Agents can write successful reasoning traces or strategy summaries to a vector database that future agent instances retrieve via RAG. This is useful, but the memory is static once written. It does not update based on outcomes, and retrieval quality determines whether the right experience surfaces at the right moment.&lt;/p>
&lt;p>&lt;strong>Reflexion and its descendants.&lt;/strong> Reflexion (Shinn et al., NeurIPS 2023) introduced a framework where agents verbally reflect on task feedback and store those reflections in an episodic memory buffer to improve decision-making in subsequent trials — without modifying model weights. This is a genuine step forward, and it&amp;rsquo;s the work that first made me think seriously about this problem. But Reflexion is fundamentally a within-run or within-session mechanism. The reflective memory does not propagate across agent instances or persist as a shared resource across independent runs.&lt;/p>
&lt;p>&lt;strong>ExpeL and Experiential Reflective Learning.&lt;/strong> More recent work, including ExpeL (Zhao et al., 2024) and ERL (2025), extracts reusable heuristics by comparing successful and failed trajectories, then injects the most relevant heuristics into future agent contexts via retrieval. This is directionally correct. ERL reports a +7.8% improvement over a ReAct baseline on complex agentic benchmarks precisely because failure-derived heuristics provide negative constraints that prune ineffective strategies. But even here, the experience pool is curated offline, retrieval is still prompt injection, and the feedback loop is not real-time.&lt;/p>
&lt;p>&lt;strong>Prompt distillation and fine-tuning.&lt;/strong> Successful agent runs can generate training data that feeds a fine-tuning pipeline. This is horizontally scalable in principle — the knowledge of one run eventually improves the base model that all agents use. But the feedback loop is slow, expensive, requires human curation, and operates offline. It is not collective learning; it is deferred knowledge consolidation.&lt;/p>
&lt;p>&lt;strong>Workflow libraries and pattern registries.&lt;/strong> Teams manually curate successful workflow templates. This is human-mediated knowledge transfer, not agent-mediated. It does not scale.&lt;/p>
&lt;p>None of these close the gap. They are engineered workarounds for the absence of a proper horizontal learning primitive.&lt;/p>
&lt;hr>
&lt;h2 id="what-is-actually-missing">What Is Actually Missing
&lt;/h2>&lt;p>The architectural primitive that does not yet exist is a persistent, agent-writable, outcome-weighted knowledge layer — one where agents contribute strategy signals after a run completes, and those signals influence future agent behavior without requiring a full retraining cycle or human curation.&lt;/p>
&lt;p>Here the biological analogy from the swarm intelligence research I covered in Part II came back to me: pheromone trails in ant colonies are not just a communication mechanism — they are a distributed, incrementally updated knowledge store. Shorter, higher-quality paths accumulate stronger signals through positive feedback. Failed paths evaporate. The swarm&amp;rsquo;s collective intelligence is encoded in the medium itself, not in any individual. No central controller decides which trails are &amp;ldquo;good.&amp;rdquo; The outcome does.&lt;/p>
&lt;p>What that looks like for LLM-based multi-agent systems is still an open design problem, but the requirements I&amp;rsquo;ve been able to identify are:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Outcome-weighted writes.&lt;/strong> Agent runs that complete successfully contribute to the shared knowledge layer with positive weight; failed runs contribute negative constraints. Both are useful — ERL&amp;rsquo;s results show that failure-derived heuristics often outperform success-derived ones on search tasks.&lt;/li>
&lt;li>&lt;strong>Decentralized propagation.&lt;/strong> The update mechanism cannot require a human in the loop or an offline batch process. Strategy signals need to propagate in something close to real time across agent instances.&lt;/li>
&lt;li>&lt;strong>Relevance-gated retrieval.&lt;/strong> Future agents need to surface relevant prior experience without injecting everything into context. This is partially addressed by LLM-based retrieval scoring, but remains unsolved at scale.&lt;/li>
&lt;li>&lt;strong>No weight updates required.&lt;/strong> The mechanism needs to operate within the context engineering layer, not through gradient descent. Retraining is too slow and too expensive for real-time collective learning.&lt;/li>
&lt;/ul>
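&lt;p>To make those requirements concrete, here is a toy sketch of what an outcome-weighted, evaporating strategy store could look like. The class and its scoring rules are purely illustrative, not an existing framework&amp;rsquo;s API:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python"># Toy sketch of an outcome-weighted, evaporating strategy store.
# Illustrative only -- not a real framework API.
from collections import defaultdict

class StrategyStore:
    def __init__(self, evaporation: float = 0.9):
        self.weights: dict[str, float] = defaultdict(float)
        self.evaporation = evaporation  # pheromone-style decay per cycle

    def record(self, strategy: str, success: bool) -> None:
        # Outcome-weighted write: successes reinforce, failures penalize.
        self.weights[strategy] += 1.0 if success else -1.0

    def evaporate(self) -> None:
        # Unreinforced signals fade over time, like pheromone trails.
        for k in self.weights:
            self.weights[k] *= self.evaporation

    def top(self, n: int = 3) -> list[str]:
        # Relevance gate: only the strongest positive signals surface.
        ranked = sorted(self.weights, key=self.weights.get, reverse=True)
        return [s for s in ranked[:n] if self.weights[s] > 0]

store = StrategyStore()
store.record("retry with backoff", True)
store.record("retry with backoff", True)
store.record("brute-force search", False)
store.evaporate()
print(store.top())  # ['retry with backoff']
&lt;/code>&lt;/pre>
&lt;p>The real design problem is everything this toy omits: how strategies are represented, how relevance is scored at retrieval time, and how concurrent writes from many agent instances are reconciled.&lt;/p>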
&lt;hr>
&lt;h2 id="why-the-industry-has-not-solved-it">Why the Industry Has Not Solved It
&lt;/h2>&lt;p>The more I thought about it, the more I realized the incentive structure explains the gap more than the technical difficulty does.&lt;/p>
&lt;p>Vertical scaling — a bigger model, a stronger benchmark score, a longer context window — has a clear commercial lever. It is attributable to a specific product release and easy to market. Horizontal knowledge propagation is architecturally harder, requires runtime infrastructure that does not exist yet, and the value it generates is distributed across runs and users rather than attributable to a single capability upgrade.&lt;/p>
&lt;p>Google Research&amp;rsquo;s recent work on scaling agent systems found that adding more agents does not consistently improve performance — multi-agent coordination yields substantial gains on parallelizable tasks but can actually degrade performance on sequential workflows. More agents is not the answer. Smarter knowledge transfer is. But that is a harder problem to benchmark and a harder story to sell.&lt;/p>
&lt;hr>
&lt;h2 id="the-architectural-opportunity">The Architectural Opportunity
&lt;/h2>&lt;p>The systems that will win over the next two to three years will not be the ones with the largest individual agents. They will be the ones that figure out how to make collective experience accumulate efficiently across runs, across users, and across agent instances — without requiring a human editor or an offline training cycle to make it useful.&lt;/p>
&lt;p>This is, in a meaningful sense, the missing layer of agentic AI infrastructure. The orchestration layer exists — I covered it in Part I. The communication protocols exist. The shared state store exists. The swarm coordination patterns exist — I covered those in Part II. What does not exist is a production-grade mechanism for collective learning that operates at runtime.&lt;/p>
&lt;p>The research directions are beginning to converge on this problem — Reflexion, ERL, Collaborative Memory — but none has produced a general-purpose primitive that production systems can adopt. That gap is both the honest state of the art and the most interesting open problem in multi-agent architecture today.&lt;/p>
&lt;hr>
&lt;h2 id="references">References
&lt;/h2>&lt;ul>
&lt;li>Letta. &amp;ldquo;Stateful Agents: The Missing Link in LLM Intelligence.&amp;rdquo; &lt;a class="link" href="https://www.letta.com/blog/stateful-agents" target="_blank" rel="noopener"
>letta.com&lt;/a>&lt;/li>
&lt;li>Shinn, N. et al. &amp;ldquo;Reflexion: Language Agents with Verbal Reinforcement Learning.&amp;rdquo; NeurIPS 2023. &lt;a class="link" href="https://arxiv.org/abs/2303.11366" target="_blank" rel="noopener"
>arxiv.org/abs/2303.11366&lt;/a>&lt;/li>
&lt;li>Rezazadeh, M. et al. &amp;ldquo;Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control.&amp;rdquo; 2025. &lt;a class="link" href="https://arxiv.org/abs/2505.18279" target="_blank" rel="noopener"
>arxiv.org/abs/2505.18279&lt;/a>&lt;/li>
&lt;li>&amp;ldquo;Experiential Reflective Learning for Self-Improving LLM Agents.&amp;rdquo; 2025. &lt;a class="link" href="https://arxiv.org/abs/2603.24639" target="_blank" rel="noopener"
>arxiv.org/abs/2603.24639&lt;/a>&lt;/li>
&lt;li>Google Research. &amp;ldquo;Towards a Science of Scaling Agent Systems: When and Why Agent Systems Work.&amp;rdquo; 2026.&lt;/li>
&lt;li>&lt;a class="link" href="https://corebaseit.com/posts_in_review/super-agents-multi-agent-communication/" >Part I: Super Agents and Multi-Agent Communication&lt;/a> — orchestration, structured communication, and the single source of truth&lt;/li>
&lt;li>&lt;a class="link" href="https://corebaseit.com/posts_in_review/swarm-intelligence-opposite-architectural-bet/" >Part II: Swarm Intelligence — The Opposite Architectural Bet&lt;/a> — decentralized coordination and emergent intelligence&lt;/li>
&lt;li>&lt;a class="link" href="https://corebaseit.com/posts/reasoning-models-deep-reasoning-llms/" >Reasoning Models and Deep Reasoning in LLMs&lt;/a> — the reasoning strategies that power individual agents&lt;/li>
&lt;li>&lt;em>The Obsolescence Paradox: Why the Best Engineers Will Thrive in the AI Era&lt;/em> — engineering judgment in the age of autonomous AI systems&lt;/li>
&lt;/ul></description></item><item><title>Super Agents and Multi-Agent Communication: Architecture That Actually Scales</title><link>https://corebaseit.com/corebaseit_posts_in_review/series/super-agents-multi-agent-communication_part1/</link><pubDate>Fri, 27 Mar 2026 22:00:00 +0100</pubDate><author>contact@corebaseit.com (Vincent Bevia)</author><guid>https://corebaseit.com/corebaseit_posts_in_review/series/super-agents-multi-agent-communication_part1/</guid><description>&lt;p>&lt;em>This is Part I of a two-part series on multi-agent AI architecture. This post covers centralized orchestration. &lt;a class="link" href="https://corebaseit.com/posts_in_review/swarm-intelligence-opposite-architectural-bet/" >Part II&lt;/a> explores the opposite approach: swarm intelligence.&lt;/em>&lt;/p>
&lt;hr>
&lt;p>&lt;strong>I&amp;rsquo;ve been reading a lot about &amp;ldquo;super agents&amp;rdquo; lately — and once I got past the marketing noise, I found a genuinely useful architectural pattern underneath.&lt;/strong>&lt;/p>
&lt;p>The term gets thrown around loosely, but the more I dug into it — across AWS documentation, IBM&amp;rsquo;s multi-agent research, LangGraph&amp;rsquo;s implementation guides, and a handful of practical engineering write-ups — the more I realized it maps cleanly onto problems that single-model, turn-by-turn systems simply cannot solve reliably: multi-step workflows with branching logic, delegated expertise, and external system integration. The concept is not new — multi-agent coordination has decades of research behind it — but LLMs have made it practically viable in ways that weren&amp;rsquo;t possible three years ago.&lt;/p>
&lt;p>This post is my attempt to organize what I&amp;rsquo;ve learned: what the term actually means, how agents communicate in practice, and a minimal Python implementation I put together to make the pattern concrete before reaching for a framework.&lt;/p>
&lt;hr>
&lt;h2 id="what-is-a-super-agent">What Is a Super Agent?
&lt;/h2>&lt;p>The clearest definition I found across the literature: a super agent is an autonomous AI system capable of interpreting a high-level goal, decomposing it into sub-tasks, orchestrating tools and specialist agents, and executing a multi-step workflow with minimal human intervention. That&amp;rsquo;s the architectural distinction that separates it from a standard chatbot — a chatbot responds turn-by-turn; a super agent plans, delegates, acts, and adapts.&lt;/p>
&lt;p>What struck me when I started pulling the concept apart is how concrete the capabilities actually are:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Decompose goals&lt;/strong> — translate a high-level objective (&amp;ldquo;Audit our Q2 pipeline and notify the reps&amp;rdquo;) into a sequenced set of executable tasks.&lt;/li>
&lt;li>&lt;strong>Orchestrate tools and sub-agents&lt;/strong> — coordinate search, code execution, external APIs, CRM writes, and domain-specific agents as a unified workflow.&lt;/li>
&lt;li>&lt;strong>Maintain long-horizon context&lt;/strong> — preserve memory of the user, the project state, and intermediate results across multiple reasoning steps.&lt;/li>
&lt;li>&lt;strong>Act in external systems&lt;/strong> — send emails, update records, generate documents, and book reservations — not just describe how to do those things.&lt;/li>
&lt;li>&lt;strong>Support human-in-the-loop&lt;/strong> — pause for confirmation, accept corrections, and revise plans accordingly.&lt;/li>
&lt;/ul>
&lt;p>The framing that resonated most with me is that a super agent functions as a digital &lt;strong>teammate&lt;/strong> that can plan, decide, and act — not a passive assistant that generates single responses.&lt;/p>
&lt;hr>
&lt;h2 id="do-agents-actually-talk-to-each-other">Do Agents Actually Talk to Each Other?
&lt;/h2>&lt;p>This was the question that pulled me deeper into the topic. The answer is yes — and the way they do it is where the architecture gets interesting. In multi-agent systems, agents communicate via structured messages to coordinate work, share intermediate results, and negotiate task ownership.&lt;/p>
&lt;h3 id="communication-mechanisms">Communication Mechanisms
&lt;/h3>&lt;p>From what I found, three mechanisms dominate in practice:&lt;/p>
&lt;p>&lt;strong>Message passing.&lt;/strong> Agents exchange typed messages (request, result, status, feedback) over a bus, queue, or shared memory store. The message structure includes sender, receiver, intent, payload, and timestamp, so both sides can route and act on messages reliably. This is the most flexible mechanism and the one that most closely resembles traditional distributed systems communication — which, coming from a systems engineering background, immediately made sense to me.&lt;/p>
&lt;p>&lt;strong>Shared state.&lt;/strong> Rather than direct peer-to-peer calls, agents read from and write to a single authoritative state object. This is the foundation of LangGraph-style graphs and is the pattern most relevant to in-process agent systems. The state object becomes both the communication channel and the coordination mechanism — agents don&amp;rsquo;t need to know about each other, only about the state contract.&lt;/p>
&lt;p>&lt;strong>Natural language over a structured envelope.&lt;/strong> LLM-based agents can exchange plain-text prompts and responses, but production systems wrap those in a JSON schema or DSL to reduce ambiguity and enable deterministic parsing. The natural language carries the semantic content; the envelope carries the routing and type information that machines need to act on it reliably.&lt;/p>
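&lt;p>A minimal version of that envelope is just a typed wrapper around free-text content. The field names below are my own illustration rather than any specific framework&amp;rsquo;s schema:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python"># Sketch: natural-language content inside a machine-routable envelope.
# Field names are illustrative, not a specific framework's schema.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class Envelope:
    sender: str
    receiver: str
    intent: str      # e.g. "request", "result", "status", "feedback"
    payload: str     # the natural-language content
    timestamp: float

def send(sender: str, receiver: str, intent: str, text: str) -> str:
    # Serialize deterministically so the receiver can parse without an LLM.
    env = Envelope(sender, receiver, intent, text, time.time())
    return json.dumps(asdict(env))

raw = send("planner", "coder", "request", "Implement the CSV export endpoint.")
msg = json.loads(raw)
assert msg["intent"] == "request"   # routing info is structured...
print(msg["payload"])               # ...the semantics stay natural language
&lt;/code>&lt;/pre>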
&lt;h3 id="coordination-patterns">Coordination Patterns
&lt;/h3>&lt;p>The coordination patterns I kept seeing across the literature include request–response, broadcast, task announcement and bidding, and peer-to-peer collaboration where agents refine each other&amp;rsquo;s outputs. The coordination role is explicit: either a planner agent delegates to workers, or agents operate in a fully collaborative graph where outputs flow through defined contracts.&lt;/p>
&lt;p>What I found particularly useful to think about is how the choice of coordination pattern has direct architectural consequences. A centralized planner is simpler to reason about and debug, but creates a single point of failure. A fully distributed collaboration graph is more resilient but harder to monitor and control. Most production systems seem to land somewhere in between — a planner that delegates to autonomous agents, with guardrails and fallback logic at the orchestration layer.&lt;/p>
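&lt;p>Of these patterns, task announcement and bidding is probably the least familiar to most engineers, so here is a toy contract-net round: announce a task, collect self-assessed bids, and award the task to the highest bidder. The scoring lambdas are illustrative stand-ins for real capability estimates:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python"># Toy contract-net round: announce a task, collect bids, award the winner.
# Bid scores are illustrative stand-ins for real capability estimates.

def announce(task: str, agents: dict) -> str:
    # Each agent returns a self-assessed fitness score for the task.
    bids = {name: bid(task) for name, bid in agents.items()}
    return max(bids, key=bids.get)   # award to the highest bidder

agents = {
    "sql_agent": lambda task: 0.9 if "database" in task else 0.1,
    "web_agent": lambda task: 0.9 if "scrape" in task else 0.1,
}

print(announce("backfill the database index", agents))  # sql_agent
&lt;/code>&lt;/pre>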
&lt;hr>
&lt;h2 id="a-minimal-in-process-pattern">A Minimal In-Process Pattern
&lt;/h2>&lt;p>To make this concrete for myself, I put together a minimal example. The cleanest starting point I could find for understanding agent-to-agent communication requires only three components: a shared state object, two agent functions, and a lightweight orchestrator that sequences them.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> dataclasses &lt;span style="color:#f92672">import&lt;/span> dataclass, field
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> typing &lt;span style="color:#f92672">import&lt;/span> List, Dict
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">@dataclass&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">class&lt;/span> &lt;span style="color:#a6e22e">State&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> user_goal: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> messages: List[Dict[str, str]] &lt;span style="color:#f92672">=&lt;/span> field(default_factory&lt;span style="color:#f92672">=&lt;/span>list)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> draft: str &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> review: str &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">writer_agent&lt;/span>(state: State) &lt;span style="color:#f92672">-&amp;gt;&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> state&lt;span style="color:#f92672">.&lt;/span>draft &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Draft for goal: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>state&lt;span style="color:#f92672">.&lt;/span>user_goal&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> state&lt;span style="color:#f92672">.&lt;/span>messages&lt;span style="color:#f92672">.&lt;/span>append({
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;from&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;writer&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;to&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;reviewer&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;draft&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;content&amp;#34;&lt;/span>: state&lt;span style="color:#f92672">.&lt;/span>draft,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">reviewer_agent&lt;/span>(state: State) &lt;span style="color:#f92672">-&amp;gt;&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> incoming &lt;span style="color:#f92672">=&lt;/span> state&lt;span style="color:#f92672">.&lt;/span>messages[&lt;span style="color:#f92672">-&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>][&lt;span style="color:#e6db74">&amp;#34;content&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> state&lt;span style="color:#f92672">.&lt;/span>review &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Reviewed version of: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>incoming&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> state&lt;span style="color:#f92672">.&lt;/span>messages&lt;span style="color:#f92672">.&lt;/span>append({
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;from&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;reviewer&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;to&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;writer&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;review&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;content&amp;#34;&lt;/span>: state&lt;span style="color:#f92672">.&lt;/span>review,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">run_workflow&lt;/span>(goal: str) &lt;span style="color:#f92672">-&amp;gt;&lt;/span> State:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> state &lt;span style="color:#f92672">=&lt;/span> State(user_goal&lt;span style="color:#f92672">=&lt;/span>goal)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> writer_agent(state)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> reviewer_agent(state)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> state
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>state &lt;span style="color:#f92672">=&lt;/span> run_workflow(&lt;span style="color:#e6db74">&amp;#34;Create a short API integration summary&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(state&lt;span style="color:#f92672">.&lt;/span>messages)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(state&lt;span style="color:#f92672">.&lt;/span>review)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>writer_agent()&lt;/code> produces a draft and appends a typed message targeted at the reviewer. &lt;code>reviewer_agent()&lt;/code> reads that message and writes its response back into the same structure. Both agents live in the same process, yet the message list enforces a clean protocol boundary — which is exactly what makes the design debuggable and extensible.&lt;/p>
&lt;h3 id="why-this-pattern-scales">Why This Pattern Scales
&lt;/h3>&lt;p>What I like about this design is that the agents are loosely coupled: they do not invoke each other&amp;rsquo;s business logic directly; they communicate through state and message contracts. That separation makes it straightforward to insert a supervisor, add retries, inject validation, or introduce checkpointing without rewriting each agent&amp;rsquo;s core responsibility.&lt;/p>
&lt;p>When I later looked at LangGraph, I found this same idea formalized as graph nodes that receive state and return a &lt;code>Command&lt;/code> specifying which node runs next and what state updates to apply. The plain Python example above maps directly to &lt;code>START → writer → reviewer → END&lt;/code>, with shared state as the communication channel. Building the minimal version first helped me understand what the framework is actually abstracting.&lt;/p>
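&lt;p>To see what that abstraction buys, here is a plain-Python mimic of the idea, where each node returns the name of the next node plus a state update. This is my own sketch of the pattern, not LangGraph&amp;rsquo;s actual API:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python"># Plain-Python mimic of a graph whose nodes return (next_node, state_update).
# My own sketch of the pattern -- not LangGraph's actual API.

def writer(state: dict) -> tuple:
    return "reviewer", {"draft": f"Draft for goal: {state['user_goal']}"}

def reviewer(state: dict) -> tuple:
    return "END", {"review": f"Reviewed version of: {state['draft']}"}

def run_graph(state: dict, nodes: dict, start: str = "writer") -> dict:
    current = start
    while current != "END":
        current, update = nodes[current](state)
        state.update(update)          # shared state is the channel
    return state

state = run_graph({"user_goal": "Create a short API integration summary"},
                  {"writer": writer, "reviewer": reviewer})
print(state["review"])
&lt;/code>&lt;/pre>
&lt;p>The loop is the part the framework owns: routing, checkpointing, and retries all hang off that one control point, which is why formalizing it as a graph pays off as workflows grow.&lt;/p>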
&lt;hr>
&lt;h2 id="the-super-agent-as-orchestrator">The Super Agent as Orchestrator
&lt;/h2>&lt;p>One pattern that came up consistently across everything I read: in production multi-agent systems, the super agent is the &lt;strong>orchestrator&lt;/strong> — not another worker. This distinction matters more than it sounds.&lt;/p>
&lt;p>The orchestrator does not perform domain work. It decomposes the user goal and assigns sub-tasks to specialist agents. It tracks workflow state, evaluates intermediate results, and decides on next steps, retries, or fallbacks. It enforces policies, cost boundaries, and safety checks at a single control point. Every specialist agent has a scoped responsibility; the orchestrator has workflow-level visibility.&lt;/p>
&lt;p>I sketched out two diagrams to think through how this works in practice. The first illustrates a software delivery context: a single Super Agent at the top of the hierarchy delegates to five specialized agents — Requirements, Coder, Refactor, Test, and Documentation — each with a clearly scoped responsibility and no direct coupling to the others.&lt;/p>
&lt;p style="text-align: center;">
&lt;img src="https://corebaseit.com/diagrams/SuperAgentCodeSoftware.png" alt="Super Agent orchestrating a software delivery pipeline — each specialist agent owns a single stage while the orchestrator owns the sequence, gates, and handoffs" style="max-width: 700px; width: 100%;" />
&lt;/p>
&lt;p>The second diagram scales the same pattern to a broader engineering context. Here the orchestrator coordinates six agents covering the full stack — Requirements, Architecture, Frontend, Backend, Test, and Security — and what I noticed is that the hierarchy holds regardless of how many specialists you introduce.&lt;/p>
&lt;p style="text-align: center;">
&lt;img src="https://corebaseit.com/diagrams/superAI.png" alt="Super Agent orchestrating a full-stack engineering workflow — adding specialist agents does not change the orchestration contract, only the assignment table grows" style="max-width: 700px; width: 100%;" />
&lt;/p>
&lt;p>What stays constant across both diagrams — and what I think is the key insight — is that the orchestrator is the only node with full workflow visibility. Specialist agents receive scoped inputs and produce scoped outputs. They do not need to know what the other agents are doing. That coordination burden belongs entirely to the super agent.&lt;/p>
&lt;p>The practical three-layer production pattern I kept seeing emerge:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Layer&lt;/th>
&lt;th>Role&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Orchestrator / super agent&lt;/strong>&lt;/td>
&lt;td>Owns the workflow graph, task assignment, and gate logic&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Shared context store&lt;/strong>&lt;/td>
&lt;td>Versioned state or artifacts (DB, files, or structured in-memory state) — the single source of truth&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Specialist agents&lt;/strong>&lt;/td>
&lt;td>Read from the store, produce outputs into it, never assume hidden state&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
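&lt;p>The middle layer is the one most worth sketching. Here is a toy versioned context store, illustrative only: every write is attributed and appended rather than applied in place, which is what gives the orchestrator an inspectable history.&lt;/p>

```python
"""Toy versioned context store, the middle layer of the table above.
Illustrative only; a production system would back this with a database
or artifact store."""

import copy

class ContextStore:
    def __init__(self):
        self.versions = [{}]              # version 0 is the empty initial state

    def read(self):
        # Agents read a snapshot, never a live mutable reference.
        return copy.deepcopy(self.versions[-1])

    def write(self, agent, updates):
        # Each write is attributed and appended, never applied in place,
        # which is what makes the store debuggable after the fact.
        state = self.read()
        state.update(updates)
        state["_last_writer"] = agent
        self.versions.append(state)
        return len(self.versions) - 1     # new version number

store = ContextStore()
store.write("coder", {"code": "def f(): pass"})
store.write("tester", {"tests_passed": True})
print(store.read()["_last_writer"])       # prints: tester
```

&lt;p>Because old versions are never mutated, you can replay exactly what any agent saw at any step, which is the debuggability property the next section argues for.&lt;/p>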
&lt;p>This layering felt immediately familiar to me. It mirrors how well-designed distributed systems have always worked: a coordinator with global visibility, workers with local scope, and a shared data layer that keeps everyone honest.&lt;/p>
&lt;hr>
&lt;h2 id="single-source-of-truth-non-negotiable">Single Source of Truth: Non-Negotiable
&lt;/h2>&lt;p>One thing that stood out across nearly every resource I read: multi-agent systems fail when each agent builds its own version of reality. The mature architectures all anchor the entire system to a &lt;strong>single source of truth&lt;/strong> — whether that is a shared in-process state object, a central database, or a versioned artifact store.&lt;/p>
&lt;p>The benefits are concrete, and they&amp;rsquo;re the same benefits I&amp;rsquo;ve seen in any well-designed distributed system:&lt;/p>
&lt;p>&lt;strong>Consistency.&lt;/strong> No diverging world-views across agents running in parallel. When the coder agent writes a function and the test agent writes assertions against it, both are working from the same artifact — not from separate memories of what the specification said.&lt;/p>
&lt;p>&lt;strong>Debuggability.&lt;/strong> One place to inspect current state across the entire workflow. When something goes wrong — and in multi-agent systems, something always goes wrong — you need a single pane of glass to understand what each agent saw, what it produced, and where the chain broke.&lt;/p>
&lt;p>&lt;strong>Clean handoffs.&lt;/strong> Agents know exactly which fields or artifacts they are responsible for updating. They do not invent state. They do not carry assumptions from a previous run. They read, process, and write — through the central store.&lt;/p>
&lt;p>Agents may maintain local working memory or intermediate caches for their own reasoning steps, but they must reconcile through the central truth store before producing outputs that other agents depend on. This is the difference between a system that works reliably and one that works until the agents&amp;rsquo; internal models diverge — which, without a single source of truth, they eventually will.&lt;/p>
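&lt;p>That read-process-write discipline fits in a few lines, with all names hypothetical: the agent reasons over a local snapshot, but only the fields it owns are reconciled back into the shared store.&lt;/p>

```python
"""Sketch of the read/process/write discipline: local working memory stays
local, and only owned fields reach the shared store. Names are illustrative."""

def run_agent(store, owned_fields, work_fn):
    snapshot = dict(store)             # read: a scoped view of the shared truth
    scratch = work_fn(snapshot)        # process: local memory, caches, drafts
    committed = {k: v for k, v in scratch.items() if k in owned_fields}
    store.update(committed)            # write: only fields this agent owns
    return committed

shared = {"spec": "add login"}
run_agent(shared, {"code"}, lambda s: {"code": "done", "_notes": "scratch"})
print(shared)  # prints: {'spec': 'add login', 'code': 'done'}
```

&lt;p>The filter on &lt;code>owned_fields&lt;/code> is the reconciliation step: whatever the agent believed privately, only its declared outputs become shared reality.&lt;/p>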
&lt;hr>
&lt;h2 id="the-bigger-picture">The Bigger Picture
&lt;/h2>&lt;p>After going through all of this, my takeaway is that the super agent concept is not hype — if you ground it in architecture. The key properties are clear: a goal-decomposing orchestrator, loosely coupled specialist agents, structured inter-agent communication, and a single authoritative state store. The Python pattern in this post is deliberately minimal — I wanted to see the essential reasoning surface before layering on a framework.&lt;/p>
&lt;p>If you are building toward a LangGraph or similar implementation, the concepts translate directly: nodes map to agents, edges map to message contracts, and the graph state is your single source of truth. The abstraction is different. The architecture is the same.&lt;/p>
&lt;p>The broader realization I came away with is that the hard problem in agentic AI is not making individual agents smarter. It is making multiple agents coordinate reliably — which is, fundamentally, a systems engineering problem. The same principles that make distributed systems work — clear contracts, shared state, scoped responsibility, centralized coordination — are exactly the principles that make multi-agent systems work.&lt;/p>
&lt;p>The models handle the reasoning. The architecture handles the reliability.&lt;/p>
&lt;p>But centralized orchestration is not the only way to coordinate agents. In &lt;a class="link" href="https://corebaseit.com/posts_in_review/swarm-intelligence-opposite-architectural-bet/" >Part II&lt;/a>, I explore the opposite architectural bet — &lt;strong>swarm intelligence&lt;/strong> — where there is no orchestrator, no global plan, and global competence emerges from local interactions. Understanding when each pattern wins is what makes the difference between a good multi-agent design and an overengineered one.&lt;/p>
&lt;hr>
&lt;h2 id="references">References
&lt;/h2>&lt;ul>
&lt;li>Attention.com. &amp;ldquo;Introducing Super Agent: Your AI Teammate for Revenue Execution.&amp;rdquo; 2025.&lt;/li>
&lt;li>IBM Think. &amp;ldquo;What is a Multi-Agent System.&amp;rdquo; &lt;a class="link" href="https://www.ibm.com/think/topics/multiagent-system" target="_blank" rel="noopener"
>ibm.com&lt;/a>&lt;/li>
&lt;li>AWS Prescriptive Guidance. &amp;ldquo;Agentic AI: Multi-Agent Collaboration Patterns.&amp;rdquo; &lt;a class="link" href="https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-multi-agent-collaboration-patterns/introduction.html" target="_blank" rel="noopener"
>docs.aws.amazon.com&lt;/a>&lt;/li>
&lt;li>GeeksforGeeks. &amp;ldquo;Multi-Agent System in AI.&amp;rdquo; &lt;a class="link" href="https://www.geeksforgeeks.org/multi-agent-system-in-ai/" target="_blank" rel="noopener"
>geeksforgeeks.org&lt;/a>&lt;/li>
&lt;li>SmythOS. &amp;ldquo;Agent Communication in Multi-Agent Systems.&amp;rdquo; &lt;a class="link" href="https://smythos.com/ai-agents/multi-agent-systems/agent-communication/" target="_blank" rel="noopener"
>smythos.com&lt;/a>&lt;/li>
&lt;li>ApXML. &amp;ldquo;Communication Protocols for LLM Agents.&amp;rdquo; 2025.&lt;/li>
&lt;li>DigitalOcean. &amp;ldquo;Agent Communication Protocols Explained.&amp;rdquo; &lt;a class="link" href="https://www.digitalocean.com/resources/articles/agent-communication" target="_blank" rel="noopener"
>digitalocean.com&lt;/a>&lt;/li>
&lt;li>LangChain. &amp;ldquo;LangGraph Multi-Agent Systems Overview.&amp;rdquo; &lt;a class="link" href="https://langchain-ai.github.io/langgraph/concepts/multi_agent/" target="_blank" rel="noopener"
>langchain-ai.github.io&lt;/a>&lt;/li>
&lt;li>LangChain. &amp;ldquo;Multi-Agent Collaboration Tutorial.&amp;rdquo; &lt;a class="link" href="https://langchain-ai.github.io/langgraph/tutorials/multi_agent/multi-agent-collaboration/" target="_blank" rel="noopener"
>langchain-ai.github.io&lt;/a>&lt;/li>
&lt;li>VentureBeat. &amp;ldquo;How Architectural Design Drives Reliable Multi-Agent Orchestration.&amp;rdquo; 2025.&lt;/li>
&lt;li>IBM Community. &amp;ldquo;Agentic Multi-Cloud Infrastructure Orchestration.&amp;rdquo; 2025.&lt;/li>
&lt;li>Latenode Community. &amp;ldquo;How Separate Agents Share a Single Memory.&amp;rdquo; 2025.&lt;/li>
&lt;li>&lt;a class="link" href="https://corebaseit.com/posts_in_review/swarm-intelligence-opposite-architectural-bet/" >Part II: Swarm Intelligence — The Opposite Architectural Bet&lt;/a> — decentralized coordination, emergent intelligence, and when to choose swarm over orchestrator&lt;/li>
&lt;li>&lt;a class="link" href="https://corebaseit.com/posts/ai-sycophancy/" >AI Sycophancy&lt;/a> — why confident-looking AI output still requires verification, even from autonomous agents&lt;/li>
&lt;li>&lt;a class="link" href="https://corebaseit.com/posts/reasoning-models-deep-reasoning-llms/" >Reasoning Models and Deep Reasoning in LLMs&lt;/a> — the reasoning strategies that power individual agents&lt;/li>
&lt;li>&lt;em>The Obsolescence Paradox: Why the Best Engineers Will Thrive in the AI Era&lt;/em> — engineering judgment in the age of autonomous AI systems&lt;/li>
&lt;/ul></description></item></channel></rss>