
10 Essentials for Coordinating Multiple AI Agents at Scale

Last updated: 2026-05-02 21:18:05 · Programming

As AI systems grow more complex, engineering teams face a daunting challenge: making multiple autonomous agents work together harmoniously. In a recent podcast, Intuit's Chase Roossin (group engineering manager) and Steven Kulesza (staff software engineer) dug into what many consider the hardest problem in engineering today. Drawing from their insights, here are 10 critical things you need to know about orchestrating AI agents at scale—whether you're building a multi-agent system for customer support, data processing, or autonomous decision-making.

1. Embrace Intentional Orchestration, Not Chaos

Agents don't align by accident. Roossin and Kulesza emphasize that a clear orchestration layer—defining workflows, handoffs, and escalation paths—is non-negotiable. Without it, agents step on each other's toes, overwrite results, or create infinite loops. Think of it as a conductor for an orchestra: each agent plays its part, but the conductor ensures timing and harmony. Use tools like state machines, event-driven architectures, or dedicated orchestrators to route tasks. For example, a customer‑service system might have a triage agent that passes complex issues to a specialist agent, then back to a resolution agent. This structure prevents duplication and cuts resolution time by up to 40%.
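The conductor pattern above can be sketched as a small state machine. This is a minimal, illustrative orchestrator, not Intuit's implementation: the agent names, the `complexity` field, and the handoff rules are hypothetical stand-ins for the triage → specialist → resolution flow described in the text.

```python
def triage_agent(ticket):
    # Route complex issues to a specialist; simple ones go straight to resolution.
    return "specialist" if ticket["complexity"] > 5 else "resolution"

def specialist_agent(ticket):
    ticket["diagnosis"] = "root cause identified"
    return "resolution"

def resolution_agent(ticket):
    ticket["status"] = "resolved"
    return None  # terminal state: no further handoff

AGENTS = {
    "triage": triage_agent,
    "specialist": specialist_agent,
    "resolution": resolution_agent,
}

def orchestrate(ticket, start="triage", max_hops=10):
    """Run the ticket through the agent state machine, one handoff at a time."""
    state = start
    while state is not None:
        if max_hops <= 0:
            # Guard against the infinite agent loops the text warns about.
            raise RuntimeError("possible infinite agent loop")
        state = AGENTS[state](ticket)
        max_hops -= 1
    return ticket
```

Because the orchestrator owns all routing, adding a new agent means registering one function and one transition rule, rather than teaching every agent about every other agent.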

(Image source: stackoverflow.blog)

2. Establish a Shared Context Protocol

For agents to collaborate, they must speak the same language—literally and figuratively. The duo highlights the need for a shared context or schema that all agents understand. This could be a JSON‑formatted memory store, a global state database, or even a vector database for semantic lookups. If Agent A stores a customer's preference in one format and Agent B expects another, you'll get data corruption or missed context. Define contracts (e.g., OpenAPI schemas) and enforce them via validation. Also, regularly audit the shared context to remove stale or conflicting data. A unified context reduces miscommunications by over 60%, according to Intuit's internal experiments.
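One way to enforce such a contract is to validate every write at the boundary of the shared store. The sketch below uses a hand-rolled schema check for self-containedness (a real system might use OpenAPI or JSON Schema validation instead); the field names are hypothetical.

```python
# Agreed-upon contract: every context entry must carry these typed fields.
REQUIRED_FIELDS = {"customer_id": str, "preference": str, "updated_at": float}

def validate_entry(entry: dict) -> None:
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in entry:
            raise ValueError(f"missing field: {field}")
        if not isinstance(entry[field], ftype):
            raise TypeError(f"{field} must be {ftype.__name__}")

class SharedContext:
    """Shared store that rejects malformed writes before they can corrupt state."""
    def __init__(self):
        self._store = {}

    def put(self, key: str, entry: dict) -> None:
        validate_entry(entry)  # Agent A and Agent B are forced into one format
        self._store[key] = entry

    def get(self, key: str) -> dict:
        return self._store[key]
```

Validating on write, not on read, means a misbehaving agent fails loudly at its own boundary instead of silently poisoning the context for everyone downstream.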

3. Implement Idempotency and Retry Logic at Every Step

Network blips, timeouts, and duplicate requests are inevitable at scale. Agents that aren't idempotent can charge a customer twice, double‑book a resource, or fire a duplicate action. Roossin and Kulesza advise making every agent action idempotent by including a unique idempotency key (e.g., request ID + timestamp). Combine this with exponential backoff retries and circuit breakers. For instance, if a payment agent fails mid‑transaction, the orchestrator can safely retry without side effects. This approach boosts system reliability from 99.9% to 99.99% uptime, as seen in Intuit's production environments.
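A minimal sketch of both pieces together: a result cache keyed by idempotency key (so duplicates return the original result instead of re-executing), plus exponential backoff on transient failures. The payment scenario and key format are illustrative.

```python
import time

_completed = {}  # idempotency key -> cached result of a successful run

def run_idempotent(key, action, max_retries=3, base_delay=0.01):
    """Execute `action` at most once per key, retrying transient failures."""
    if key in _completed:
        return _completed[key]  # duplicate request: no side effects, same answer

    for attempt in range(max_retries):
        try:
            result = action()
            _completed[key] = result
            return result
        except Exception:
            if attempt == max_retries - 1:
                raise  # exhausted retries; let the orchestrator decide
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

In production the `_completed` cache would live in durable storage (and be paired with a circuit breaker), but the contract is the same: retrying with the same key can never double-charge.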

4. Bake in Observability from Day One

You can't fix what you can't see. Multi‑agent systems produce distributed logs, traces, and metrics that interleave unpredictably. The engineers stress instrumenting every agent with standardized telemetry (e.g., OpenTelemetry). Capture not just what each agent did, but its reasoning, confidence score, and the data it consumed. Create a central dashboard that shows agent interactions, failure points, and latency. For example, an agent that repeatedly requests human review might be stuck in a low‑confidence loop—observability uncovers that pattern. Without it, debugging feels like finding a needle in a haystack of black boxes.
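One lightweight way to capture "what, why, and how confident" per agent step is a telemetry decorator. This sketch collects events in a list for illustration; a real deployment would emit spans through OpenTelemetry instead. The agent and its payload are hypothetical.

```python
import time

TELEMETRY = []  # stand-in for an OpenTelemetry exporter

def instrumented(agent_name):
    """Wrap an agent so every call records inputs, confidence, and latency."""
    def wrap(fn):
        def inner(payload):
            start = time.perf_counter()
            result = fn(payload)
            TELEMETRY.append({
                "agent": agent_name,
                "consumed": sorted(payload.keys()),       # data the agent read
                "confidence": result.get("confidence"),   # self-reported score
                "latency_ms": (time.perf_counter() - start) * 1000,
            })
            return result
        return inner
    return wrap

@instrumented("triage")
def triage(payload):
    # Hypothetical agent: routes the request and reports its confidence.
    return {"route": "specialist", "confidence": 0.63}
```

With every agent wrapped this way, the low-confidence loop from the example above shows up as a run of telemetry events with the same agent name and sub-threshold scores, rather than as an invisible stall.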

5. Design for Gradual Escalation, Not Full Autonomy

No agent system should be fully autonomous out of the gate. The podcast advocates for a hybrid model where agents handle routine tasks, then escalate ambiguous or high‑risk ones to humans or more capable agents. Define clear fallback rules: Agent A works until confidence drops below 0.8, then hands off to Agent B, and if B fails, routes to a human operator. This prevents critical errors and builds trust. In practice, Intuit's agent pipeline resolves 85% of queries autonomously while keeping humans in the loop for the rest. That balance boosts efficiency without sacrificing quality.
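The escalation chain described above reduces to a few lines: try agents in order of capability and hand off whenever confidence falls below the threshold, with the human queue as the final fallback. Thresholds and agent names are illustrative.

```python
def escalate(query, agents, threshold=0.8):
    """Try each (name, agent) pair in order; escalate on low confidence.

    Each agent returns (answer, confidence). If no agent clears the
    threshold, route to a human operator.
    """
    for name, agent in agents:
        answer, confidence = agent(query)
        if confidence >= threshold:
            return name, answer
    return "human", None  # last resort: keep a person in the loop
```

The key design choice is that the threshold lives in the orchestration layer, not inside any agent, so risk tolerance can be tuned per workflow (say, stricter for payments than for FAQ lookups) without touching agent code.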

6. Version and Roll Back Agents Independently

Agents evolve—new models, updated prompts, altered logic. If you deploy a new version of Agent A without considering dependencies, Agent B might break because it relied on old output formats. Roossin and Kulesza recommend versioning each agent separately, with semantic versioning, and maintaining backwards compatibility for at least one major version. Use feature flags to test new agent versions on a subset of traffic. And always keep the previous version running (shadow mode) for safe rollback. This strategy cut deployment‑related incidents by 70% at Intuit.
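The traffic-splitting half of that strategy can be done with a deterministic hash bucket, so a given request always hits the same agent version across retries. This is a generic sketch, not Intuit's flag system; the version labels and default fraction are hypothetical.

```python
import hashlib

def pick_version(request_id: str, v2_fraction: float = 0.1) -> str:
    """Deterministically route a fraction of traffic to the new agent version.

    Hashing the request ID (rather than calling random()) keeps routing
    stable: the same request lands on the same version every time.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "v2" if bucket < v2_fraction * 100 else "v1"
```

Ramping up is then just raising `v2_fraction`, and rollback is setting it to zero, while the previous version keeps running in shadow mode as the text recommends.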


7. Use a Centralized Workflow Engine for Multi‑Agent Coordination

Distributed coordination leads to spaghetti code. Instead, route all inter‑agent communication through a central workflow engine like Temporal, AWS Step Functions, or a custom state machine. This engine manages retries, timeouts, and state transitions, while agents remain stateless task executors. Kulesza notes that this pattern simplifies logging, auditing, and recovery from partial failures. For example, if Agent C crashes mid‑flight, the engine can restart it from the last checkpoint. Centralization also makes it easier to add new agents without rewriting orchestration logic.
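The checkpoint-and-resume behavior is the heart of engines like Temporal. A toy version, assuming steps are idempotent stateless executors as the text prescribes: the engine persists an index of the last completed step, so a restarted run skips work that already finished.

```python
def run_workflow(steps, checkpoint, state):
    """Run `steps` in order, persisting progress into `checkpoint`.

    `checkpoint` is a dict standing in for durable storage. After a
    crash, calling run_workflow again resumes from the last completed
    step instead of replaying the whole workflow.
    """
    start = checkpoint.get("done", 0)
    for i in range(start, len(steps)):
        steps[i](state)
        checkpoint["done"] = i + 1  # persist after each step, not at the end
    return state
```

Real engines add retries, timeouts, and event histories on top, but the agents themselves stay simple: each step is a stateless function the engine can safely re-invoke.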

8. Test with Chaos Engineering and Simulated Failure

Multi‑agent systems fail in surprising ways. A network partition could make Agent D think Agent E is dead, triggering redundant work. The best way to uncover these edge cases is through chaos engineering: deliberately inject failures (latency spikes, dropped messages, agent crashes) into a staging environment. Measure how the system degrades and whether automatic recovery kicks in. Intuit runs weekly “Game Days” where teams simulate production incidents. This practice has uncovered bugs like agent deadlocks and cascade failures that would otherwise hit customers.
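A staging environment can get the same effect with a fault-injection wrapper around inter-agent calls. This sketch only injects thrown failures; real chaos tooling would also add latency spikes and dropped messages. The failure rate and seeded RNG are illustrative.

```python
import random

def chaos(fn, failure_rate=0.3, rng=None):
    """Wrap a call so a fraction of invocations raise an injected error."""
    rng = rng or random.Random(42)  # seeded for reproducible Game Days
    def inner(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return fn(*args, **kwargs)
    return inner
```

Pointing the wrapped call at your retry and circuit-breaker logic then answers the question chaos engineering exists to ask: does the system degrade gracefully, or does one injected blip cascade?

```python
flaky_call = chaos(lambda: "ok", failure_rate=0.5)
```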

9. Align Organizational Structure with Agent Architecture

The people behind the agents matter. Roossin points out that if your teams are siloed by function (e.g., NLP team, backend team, UX team), the agents they build will reflect that fragmentation, making integration painful. Instead, create cross‑functional squads aligned with agent boundaries—e.g., a “Customer Onboarding Agents” squad that owns end‑to‑end agent behavior. This is Conway's Law applied deliberately: because systems mirror the communication structure of the teams that build them, aligning team boundaries with agent boundaries reduces handoff friction and communication overhead. When Intuit restructured around agent teams, their delivery velocity increased by 30%.

10. Continuously Rebalance Workloads with Dynamic Scheduling

Not all agents are equally loaded. A spike in customer requests might overwhelm the triage agent while others sit idle. A dynamic scheduler that monitors agent load and latency can redistribute tasks on the fly. Use algorithms like least‑connections, shortest‑queue, or even AI‑driven schedulers that predict future load. The engineers caution against static round‑robin: it treats all agents as equal, ignoring performance differences. In production, dynamic scheduling reduced tail latency by 55% and improved resource utilization across the agent pool.
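The shortest-queue policy mentioned above fits in a few lines with a heap: each task goes to whichever agent currently has the least pending work. This is a simplified sketch that counts queued tasks as the load signal; a production scheduler would use live latency and queue-depth metrics instead.

```python
import heapq

def assign(tasks, agents):
    """Assign each task to the agent with the shortest current queue."""
    heap = [(0, name) for name in agents]  # (pending count, agent name)
    heapq.heapify(heap)
    assignment = {name: [] for name in agents}
    for task in tasks:
        load, agent = heapq.heappop(heap)  # least-loaded agent wins
        assignment[agent].append(task)
        heapq.heappush(heap, (load + 1, agent))
    return assignment
```

Unlike static round-robin, this adapts automatically when one agent drains slowly: its pending count stays high, so new work flows to its faster peers.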

Coordinating multiple AI agents at scale is no small feat—it touches on distributed systems, AI reliability, and team dynamics. As Roossin and Kulesza remind us, the solution lies not in perfect individual agents but in robust, thoughtful systems design. By applying these 10 practices, you can move from agent chaos to a well‑orchestrated ensemble that scales gracefully. Start with orchestration and observability, then layer in the rest as your system grows. The journey is iterative, but the payoff—a reliable, efficient multi‑agent system—is well worth the effort.