This week at Iceberg Summit, a friend and I got into a pretty heated discussion about Kafka. Specifically: in the age of AI agents, does Kafka compatibility still matter as much as we think it does? We didn't really settle it, but the conversation stuck with me, because I think some assumptions we've all been making for years might no longer hold.
Kafka's Ecosystem and Compatibility
Kafka is running in over 90% of Fortune 500 companies. At this point it's not really a technology choice anymore; it's closer to a de facto standard.
What Kafka actually does is pretty simple: decouple and buffer. Dozens of systems producing data, dozens consuming it, Kafka in the middle so neither side has to care about the other. Production rate doesn't need to match consumption rate. It's a clean model and it works.
Which is why the market pattern is so consistent. Almost every new data streaming company makes themselves Kafka-compatible, because Kafka-compatible means you can be a drop-in replacement. You inherit the entire ecosystem without asking anyone to change anything: Kafka Connect, Kafka Streams, every consumer SDK engineers have already written and debugged and stopped thinking about.
Compatibility is the moat. Ecosystem is what keeps people from jumping ship.
That logic has been pretty much unchallengeable for the past decade. But two things are shifting right now that I think are worth taking seriously.
Shift 1: Upstream and Downstream Are Converging
A lot of Kafka's value comes from the sheer variety of what it connects. Upstream you might have Oracle, MySQL, MongoDB, Salesforce, Stripe, HubSpot. Downstream you might have Elasticsearch, Redshift, Snowflake, and 50 different microservices. That variety is exactly why you need something in the middle with a standard interface.
But that variety is shrinking, rapidly.
On the upstream side, the database market has basically consolidated around Postgres. Most new databases provisioned in the AI era are Postgres. This means the upstream source for CDC is becoming increasingly uniform. Postgres logical replication is on its way to being the standard input whether anyone officially decided that or not.
For SaaS integrations, this work used to take months. Writing connectors, handling rate limits, mapping schemas, keeping up with API changes every time a vendor decides to break something. Now if you throw a coding agent at it, you can have something working in days - not perfect, but working. The cost of these integrations is dropping, and when that cost drops, so does the urgency of having a standard protocol to unify everything.
On the downstream side, the same consolidation is happening at the analytical layer. Picking a data warehouse used to be a whole thing: Redshift vs Snowflake vs BigQuery. Now Apache Iceberg is becoming the default table format that everyone accepts. It doesn't really matter which query engine you use anymore; the data can just live as Iceberg tables on S3 or GCS.
So when upstream is mostly Postgres CDC, and downstream is mostly Iceberg on object storage, does the pipe in the middle really need a complex protocol designed for infinite variety? When the number of things you're connecting shrinks, the integration complexity shrinks with it. And the argument for "we need a standard protocol to handle all this" gets proportionally weaker.
Shift 2: Coding Agents Are Changing What Protocol Means
This one feels more fundamental to me.
Why do people want Kafka compatibility? A big reason is developer experience. Kafka-compatible means your engineers don't have to learn new SDKs, new CLIs, new mental models. They already know how Kafka works, so they can just use your thing. Faster onboarding, less friction. That's real value.
But that logic assumes the user is a human engineer who has to personally learn the protocol.
Almost every engineer now uses coding agents. You describe what you want, and the agent writes the code. A coding agent does not care what protocol you use. It'll figure out the Kafka protocol in seconds. It'll figure out some custom HTTP-based protocol you designed yourself just as fast. The learning cost of a protocol, which used to be a real cost that justified standardization, is now basically zero for the agent.
So if protocol learning cost is no longer the bottleneck, what makes a protocol good?
Maybe the answer is just: simplicity. The best protocol might be the most straightforward one, not because it has the biggest ecosystem, but because it's the easiest for an agent to reason about and operate. A file system is like that. It's the oldest abstraction we have, and it might be the one AI agents are most naturally fluent in.
We spent years optimizing developer experience for humans. We might need to start thinking about what the equivalent looks like for AI agents. And those two things don't necessarily point in the same direction.
First Principles: Designing a Message Queue for the Agentic AI Era
Suppose we start from scratch today. Forget Kafka, forget all the historical baggage. How would we design a message queue for the agentic AI era?
Upstream: Three Main Sources
First, SaaS Webhooks. Most SaaS products support webhooks, pushing HTTP requests when events happen. This is the lightest possible integration: no polling, no persistent connections.
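Most webhook providers also sign their payloads so the receiver can verify authenticity. A minimal sketch of that verification, assuming an HMAC-SHA256 scheme over the raw body (the secret and header handling here are illustrative, not any specific vendor's API):

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
    """Check that an incoming webhook body matches its HMAC-SHA256 signature."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing side channels
    return hmac.compare_digest(expected, signature)

# The producing side (the SaaS vendor) computes the same digest before POSTing:
body = b'{"event": "invoice.paid", "id": "evt_123"}'
sig = hmac.new(b"whsec_demo", body, hashlib.sha256).hexdigest()
assert verify_webhook(body, sig, "whsec_demo")
```

The nice property of this model is that the queue's ingestion endpoint is just an HTTP handler plus a signature check; there is no connector process to run.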
Second, Postgres CDC. Using Postgres logical replication to listen to the WAL and capture row-level change events. As Postgres consolidates its dominance, this pipeline covers more and more of the landscape.
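To make this concrete, here is a sketch of what consuming one of those change events might look like, assuming the output of the wal2json logical decoding plugin (its format-1 JSON with `kind`, `columnnames`, and `columnvalues` per change); the normalization shape is my own, not a standard:

```python
import json

def normalize_changes(wal2json_payload: str) -> list[dict]:
    """Flatten a wal2json message into simple per-row change records."""
    msg = json.loads(wal2json_payload)
    records = []
    for change in msg.get("change", []):
        # wal2json emits parallel arrays of column names and values
        row = dict(zip(change.get("columnnames", []),
                       change.get("columnvalues", [])))
        records.append({
            "op": change["kind"],      # insert / update / delete
            "table": change["table"],
            "row": row,
        })
    return records

payload = '''{"change": [{"kind": "insert", "schema": "public",
  "table": "users", "columnnames": ["id", "email"],
  "columnvalues": [1, "a@example.com"]}]}'''
records = normalize_changes(payload)
```

In a real pipeline the payload would arrive from a replication slot rather than a string literal, but the decoding step stays this small.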
Third, agents. This is the new one. Agents are going to be producing events just like any other system, except the events are less "row updated in database" and more tool calling traces, chat logs, etc. That's a fundamentally different shape of data, and it's worth designing for from the start rather than bolting it on later.
Downstream: SaaS, Iceberg, and Agents
SaaS, like it or not, will still be there, especially for non-developers. Iceberg is the analytical endpoint, for accumulating historical data that OLAP queries can run against.
But the more interesting downstream is Agents. I think agents are the next generation of applications, the next generation of microservices. A business process that used to be handled by a set of microservices coordinating over RPC will increasingly be handled by a set of agents coordinating over messages. In this model, the downstream of a message queue is no longer "a consumer program polling for work." It's "an agent listening, and deciding autonomously what to do next based on the message."
Communication: Webhooks and Polling, Not Kafka Protocol
The most natural communication pattern between agents is a webhook: when an event happens, POST it to the target agent's endpoint. Simple, stateless, HTTP-native.
For cases that need to pull historical messages or do batch processing, polling is enough. Downstream queries for new messages, takes them, processes them, acknowledges them.
These two patterns cover the vast majority of use cases, and both are HTTP-based, which makes them extremely friendly to coding agents. Generating an HTTP client is one of the things coding agents do best.
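Here is a sketch of the polling side, modeled in memory rather than over HTTP; the poll/ack semantics are my assumption of how such an endpoint could behave, not a defined API:

```python
class PollableLog:
    """In-memory stand-in for an HTTP poll/ack endpoint."""

    def __init__(self):
        self.messages = []   # append-only log; index is the offset
        self.acked = {}      # consumer name -> last acknowledged offset

    def produce(self, msg) -> int:
        self.messages.append(msg)
        return len(self.messages) - 1   # offset of the new message

    def poll(self, consumer: str, limit: int = 10):
        """Return (offset, message) pairs after the consumer's last ack."""
        start = self.acked.get(consumer, -1) + 1
        return list(enumerate(self.messages))[start:start + limit]

    def ack(self, consumer: str, offset: int):
        self.acked[consumer] = offset

log = PollableLog()
for evt in ["order.created", "order.paid", "order.shipped"]:
    log.produce(evt)

batch = log.poll("agent-1", limit=2)   # first two messages
log.ack("agent-1", batch[-1][0])       # acknowledge up to offset 1
remaining = log.poll("agent-1")        # only "order.shipped" is left
```

Because the consumer tracks progress by offset and re-polls from its last ack, a crashed consumer simply resumes from where it left off; that at-least-once behavior falls out of the model for free.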
Storage: Object Store First
The storage layer of the next-generation message queue, I believe, will be built on object storage: S3, GCS, R2, or others.
Yes, object storage latency is higher than local disk. But it is dramatically cheaper, and it requires almost no operational overhead. Running a Kafka cluster means maintaining ZooKeeper or KRaft, managing disks, networks, replication factors, and retention policies. Object storage means PUT and GET.
This is a classic trade of latency for cost. Worth noting: this idea is not new. WarpStream runs a Kafka-compatible, diskless architecture directly on object storage, and Confluent's Tiered Storage offloads older segments there to cut costs. Even within the Kafka ecosystem, the direction of travel is already pointing here.
A More Aggressive Idea: File System as the Protocol
There's an even more aggressive possibility worth taking seriously: using the file system itself as the read/write interface for messages.
Upstream writes a message by writing a file to a path. Downstream consumes a message by reading that file. Path is topic, filename is offset, append is produce, read is consume.
This sounds primitive, but it has one property that matters: the file system is the interface AI agents are most fluent in. Almost every agent framework, when dealing with persistence, context storage, or tool calls, defaults to file system abstractions. If the message queue's interface is also the file system, agents can operate on messages with zero friction, no dedicated SDK, no protocol knowledge required.
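The whole model fits in a few lines. This sketch uses a local directory as a stand-in for an object store; zero-padded filenames keep lexicographic order equal to offset order:

```python
import os
import tempfile

def produce(root: str, topic: str, payload: bytes) -> int:
    """Write a message as a file; the zero-padded filename is the offset."""
    topic_dir = os.path.join(root, topic)
    os.makedirs(topic_dir, exist_ok=True)
    # NOTE: deriving the next offset from a listing is racy with concurrent
    # producers; a real object-store version would need conditional PUTs.
    offset = len(os.listdir(topic_dir))
    with open(os.path.join(topic_dir, f"{offset:020d}"), "wb") as f:
        f.write(payload)
    return offset

def consume(root: str, topic: str, after: int = -1) -> list[tuple[int, bytes]]:
    """Read every message file with an offset greater than `after`."""
    topic_dir = os.path.join(root, topic)
    out = []
    for name in sorted(os.listdir(topic_dir)):
        offset = int(name)
        if offset > after:
            with open(os.path.join(topic_dir, name), "rb") as f:
                out.append((offset, f.read()))
    return out

root = tempfile.mkdtemp()
produce(root, "orders", b"order.created")
produce(root, "orders", b"order.paid")
tail = consume(root, "orders", after=0)   # everything past offset 0
```

The race noted in the comment is the real design problem here: concurrent producers listing-then-writing can collide on an offset, which is exactly the kind of coordination that conditional writes on the object store (or a tiny sequencer) would have to solve.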
S3 files could become one of the most important building blocks of the next-generation message queue.
The Backward Compatibility Question
None of this means Kafka compatibility is worthless tomorrow.
If you need to connect to systems built before any of this AI stuff, if you're replacing an existing Kafka cluster, if your team already has deep investment in the Kafka ecosystem, then yes, you need Kafka compatibility. Migration costs are real and I'm not hand-waving them away.
But I keep thinking about what happened with Postgres and MySQL.
MySQL was undoubtedly the king of the Web 2.0 era. Postgres never made itself MySQL-compatible, yet over just the past five years it has eaten a massive chunk of MySQL's market. Not by being compatible, but by being genuinely better in ways that mattered: JSON support, PostGIS, logical replication, a more active extension ecosystem. New projects started defaulting to Postgres. Migrations started happening. MySQL compatibility stopped being the prerequisite people assumed it was.
The dynamic I'm pointing at is: compatibility is a strong moat, but it's not permanent. It holds as long as the ecosystem keeps growing around the original. When the new projects stop starting with MySQL and start with Postgres instead, MySQL compatibility becomes less valuable over time almost automatically.
If the next generation of infrastructure projects defaults to object storage and HTTP and file-based interfaces instead of starting with Kafka, the same thing happens to Kafka compatibility. Gradually, then suddenly.
I think we might be closer to that inflection point than most people realize.
Anyway
I know most of this is further out than where most engineering teams are today. If you're running Kafka at scale it's not going anywhere for you anytime soon, and I'm not suggesting it should.
But I do think the underlying assumptions are worth questioning now, before the shift happens rather than after. When the main actors in your system are AI agents instead of human-operated services, when writing code becomes more about describing intent than implementation, when protocol learning cost basically disappears, a lot of the reasoning behind our infrastructure choices changes.
The next-generation message queue might not need to be Kafka-compatible. It might just be simpler, cheaper, and easier for agents to work with.
If you're already building something in this direction, or if you think I'm completely wrong about this, I'd genuinely like to hear it.

