
The Conversational Fallacy: Why Text-Based Agent Orchestration is Broken

// · 4 min read

I wrote a research paper as part of the AgenticMail project that I think captures one of the most overlooked problems in multi-agent systems. The paper introduces the concept of the Conversational Fallacy: the assumption that because LLMs communicate in natural language, agent-to-agent communication should also be natural language.

It shouldn’t. And the numbers prove it.

The Problem with Conversational Orchestration

The standard approach to multi-agent systems right now goes something like this: a parent agent decides it needs help with a subtask, so it spawns a sub-agent. The parent writes a natural language prompt describing what it wants. The sub-agent processes that prompt, does its work, and writes a natural language response back. The parent then parses that response to extract the useful information.

Every step of this is wasteful.

The parent has to formulate a clear, unambiguous prompt. The sub-agent has to parse that prompt, figure out what’s actually being asked, and decide on a format for its response. The parent then has to parse the response, which might be structured differently every time. Natural language is flexible, which sounds like an advantage until you realize that flexibility means unpredictability.

Worse, the token costs are enormous. A simple “look up this email and tell me the sender” task might involve hundreds of tokens of prompt context, chain-of-thought reasoning, and a verbose natural language response, all for what could be a single function call returning one string.
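To make the contrast concrete, here is a minimal sketch of the same “who sent this email?” task done both ways. The prompts, message IDs, and addresses are invented for illustration; word counts stand in as a rough proxy for token counts.

```python
# Conversational style: the parent composes a prose prompt, the sub-agent
# replies in prose, and the parent must parse the answer back out.
conversational_prompt = (
    "Hi! Could you look through the mailbox for the message with ID "
    "msg-1042 and tell me who sent it? Thanks!"
)
conversational_reply = (
    "Sure! I checked the mailbox and the email with ID msg-1042 was "
    "sent by alice@example.com. Let me know if you need anything else!"
)

# Structured style: the same exchange as a typed call returning one string.
def get_sender(message_id: str) -> str:
    # Stand-in for a real mailbox lookup.
    mailbox = {"msg-1042": "alice@example.com"}
    return mailbox[message_id]

sender = get_sender("msg-1042")

# Rough proxy for token cost: whitespace-delimited words per exchange.
conversational_words = len(conversational_prompt.split()) + len(conversational_reply.split())
rpc_words = len('get_sender("msg-1042")'.split()) + len(sender.split())
print(conversational_words, rpc_words)  # the RPC exchange is far smaller
```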

The 53x Speedup

The paper benchmarks structured RPC (remote procedure calls) against conversational sub-agent spawning across a range of common agent tasks. The results are striking: structured RPC achieves 53x faster response times on average.

The reason is straightforward. In the RPC model, the parent agent calls a typed function with specific parameters. The receiving agent (or service) processes the request and returns structured data. There’s no ambiguity in the request format. There’s no parsing overhead on the response. There are no wasted tokens on conversational pleasantries or chain-of-thought reasoning that the parent doesn’t need to see.

For AgenticMail specifically, this means when an agent needs to search another agent’s mailbox, it doesn’t spawn a sub-agent and say “Hey, can you look through your emails for anything about the Henderson account?” Instead, it calls a structured endpoint with parameters like agent_id, query, and limit, and gets back a typed array of email objects.
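A sketch of what that endpoint could look like. The function name, the Email fields, and the inbox contents are illustrative assumptions, not the actual AgenticMail API; only the agent_id, query, and limit parameters come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Email:
    # Illustrative fields; a real email object would carry more.
    message_id: str
    sender: str
    subject: str

def search_mailbox(agent_id: str, query: str, limit: int = 10) -> list[Email]:
    """Stand-in for a typed RPC endpoint: no prompt to compose, no prose to parse."""
    inbox = [
        Email("msg-1", "bob@example.com", "Henderson account renewal"),
        Email("msg-2", "carol@example.com", "Lunch on Friday?"),
    ]
    return [e for e in inbox if query.lower() in e.subject.lower()][:limit]

# The caller fills in typed parameters and gets back a typed array of objects.
results = search_mailbox(agent_id="agent-7", query="Henderson", limit=5)
```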

The Conversational Fallacy

The deeper insight in the paper is about why the industry defaulted to conversational orchestration in the first place. I call it the Conversational Fallacy: because LLMs are trained on and excel at natural language, developers assume natural language is the right interface for everything involving LLMs.

But inter-agent communication isn’t a natural language problem. It’s a coordination problem. And coordination problems have been solved in computer science for decades with structured protocols, typed interfaces, and well-defined contracts. We don’t let microservices talk to each other in prose. We use gRPC, REST with schemas, and message queues with defined formats. The same principles apply to agent-to-agent communication.

The Conversational Fallacy leads to systems that are slow (every interaction requires an LLM inference), expensive (tokens add up fast across multi-agent workflows), and fragile (natural language responses can vary unpredictably, breaking downstream parsing).

What Structured RPC Looks Like in Practice

In AgenticMail, agent-to-agent interactions are defined as typed RPC calls. Each available operation has a schema specifying its inputs and outputs. The calling agent doesn’t need to compose a prompt; it fills in a typed request. The receiving side doesn’t need to generate a natural language response; it returns structured data.
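One way to picture “each operation has a schema” is as a typed request/response pair. The shapes below are assumptions for illustration, not the actual AgenticMail schemas:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SearchRequest:
    # Hypothetical input schema: the caller fills in fields, not a prompt.
    agent_id: str
    query: str
    limit: int = 10

@dataclass(frozen=True)
class SearchResponse:
    # Hypothetical output schema: structured data, no prose to parse.
    message_ids: list[str] = field(default_factory=list)
    total: int = 0

def handle_search(req: SearchRequest) -> SearchResponse:
    """Receiving side: returns structured data, no natural-language generation."""
    index = {"henderson": ["msg-1", "msg-4"]}  # toy search index
    hits = index.get(req.query.lower(), [])[: req.limit]
    return SearchResponse(message_ids=hits, total=len(hits))

resp = handle_search(SearchRequest(agent_id="agent-7", query="Henderson", limit=2))
```

Because both sides share the schema, a malformed request fails at the type level instead of producing an unparseable prose reply.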

This doesn’t mean LLMs are removed from the picture. The parent agent still uses an LLM to decide which RPC call to make and how to interpret the results in context. The LLM handles the reasoning and decision-making, which is what it’s good at. The actual data exchange happens through structured channels, which is what protocols are good at.
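That division of labor can be sketched as a dispatch loop: an LLM picks the call, and a plain dispatch table executes it. Here `choose_call` is a placeholder for the LLM decision step, and all the function names are invented for illustration:

```python
# Toy RPC endpoints standing in for typed operations.
def get_sender(message_id: str) -> str:
    return {"msg-1042": "alice@example.com"}.get(message_id, "unknown")

def count_unread(agent_id: str) -> int:
    return 3  # placeholder value

# The structured channel: a table of typed calls, no prose in between.
RPC_TABLE = {"get_sender": get_sender, "count_unread": count_unread}

def choose_call(goal: str) -> tuple[str, dict]:
    # Placeholder for the LLM reasoning step: map a goal to a typed call.
    if "sender" in goal:
        return "get_sender", {"message_id": "msg-1042"}
    return "count_unread", {"agent_id": "agent-7"}

name, kwargs = choose_call("find the sender of msg-1042")
result = RPC_TABLE[name](**kwargs)  # structured exchange, no prose parsing
```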

Why This Matters Beyond AgenticMail

The paper argues that as multi-agent systems become more common, the industry needs to move past the Conversational Fallacy. Building agent-to-agent communication on natural language might feel intuitive and demo well, but it doesn’t scale. The 53x performance gap isn’t an optimization curiosity; it’s the difference between agent systems that are practical in production and ones that are too slow and expensive to deploy.


Source Code

The research paper, including detailed methodology and benchmark data, is available in the AgenticMail repository.

View the full source on GitHub

