Three Promises in the Agent Era, Part 2: Agents on production MySQL are fine. Until they're not.

You’re running agents against MySQL. So am I. It’s where we are. Sooner or later you’re going to be recovering data because of something one of them did. These are the three most common shapes that day takes.

An agent is loading test fixtures. It hits a foreign key error. It tries again with a different value. Same error. On the fourth try it issues SET foreign_key_checks=0, retries the inserts, and they succeed. The session ends. By lunchtime the next day, a JOIN against the parent table returns half the rows it should. Tickets start coming in.

The database did exactly what it was told. Every query was atomic, isolated, durable. ACID held the whole time. The flag exists. The flag does what the manual says it does.

The one who ran the queries, the caller, flipped it without knowing what they gave up.

And the caller isn’t here anymore.

This is the first of three promises I wrote about. Execute Queries promises correctness under load. The hard contract, ACID, holds in the agent era. The soft contract, the layer of conventions calibrated for human callers, is what walks past you when you’re not looking.

SET foreign_key_checks=0 is a session flag. It exists for legitimate reasons. Bulk imports. Schema migrations. Anyone who’s reloaded a multi-hour mysqldump has flipped it. It’s a tradeoff. You give up protection against orphans for speed. The trade is fine if you understand it.

Humans deliberate. They have a model of why the constraint was there and a plan to put it back. They run the load, re-enable the check, run a SELECT to confirm no orphans before handing the database back.
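Here is roughly what that deliberate sequence looks like. This is a sketch, not a procedure. The orders and customers tables and their columns are hypothetical; substitute your own child and parent. The part worth remembering is that setting foreign_key_checks back to 1 does not scan rows that were inserted while it was off, so the orphan check has to be explicit.

```sql
-- Hypothetical schema: orders.customer_id references customers.id.
SET SESSION foreign_key_checks = 0;

-- ... bulk load into customers and orders goes here ...

SET SESSION foreign_key_checks = 1;

-- Re-enabling the check does not validate rows written while it was off,
-- so look for orphans before handing the database back.
SELECT o.id, o.customer_id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE o.customer_id IS NOT NULL   -- NULL foreign keys are allowed, not orphans
  AND c.id IS NULL;
```

An empty result means no orphans in that relationship. Any rows returned are children whose parent does not exist.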

Agents do not deliberate. They have a goal and a toolbox, and when a tool returns an error they reach for the thing that makes the error go away. The flag makes the error go away. The insert succeeds. The agent moves on.

The constraint comes back the next session.

The orphan rows do not.

Even humans forget to turn the constraint back on. The difference is the human remembers tomorrow. The agent doesn’t have a tomorrow.

The transactions that nobody held

Forty agent workers are processing a batch. Each opens a transaction, reads a row, sends the contents to an LLM for classification, waits eight seconds, writes the result, commits. By minute three the InnoDB history list has crossed two million. By minute five an unrelated reporting query that ran in 200ms yesterday is timing out. Nobody touched the reporting query.

A human who opens BEGIN and walks to lunch is rare and obvious. The DBA sees the long transaction in INFORMATION_SCHEMA.INNODB_TRX and kills it.

Long transactions held by agents are not pathologies. They are the workload.

Every transaction is open during the LLM round-trip. Locks held. Undo entries piled up. History list growing because purge can’t advance past the oldest active read view. None of this violates ACID. Everyone else suffers anyway.
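One way to watch that backlog, assuming the trx_rseg_history_len counter is enabled in INNODB_METRICS on your build (I believe it is by default, but check). The same number shows up as "History list length" in SHOW ENGINE INNODB STATUS.

```sql
-- Current length of the InnoDB history list (undo records purge has not reached yet).
SELECT `NAME`, `COUNT`
FROM information_schema.INNODB_METRICS
WHERE `NAME` = 'trx_rseg_history_len';
```

A number that climbs steadily while the batch runs and falls only after the workers stop is the signature of the pattern above.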

The slow query log is uninformative. Each statement inside the transaction is fast. The transaction as a whole is slow, but the slow log measures statements, not transactions.
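Since the slow log won't show it, the place to look is the transaction table itself. A minimal sketch; the five-second threshold is an arbitrary assumption, pick one that matches your workload.

```sql
-- Transactions that have been open longer than 5 seconds, oldest first.
SELECT trx_id,
       trx_mysql_thread_id,                                 -- matches the Id in SHOW PROCESSLIST
       trx_state,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS open_seconds,
       trx_rows_locked,
       trx_rows_modified
FROM information_schema.INNODB_TRX
WHERE trx_started < NOW() - INTERVAL 5 SECOND
ORDER BY trx_started;
```

With forty workers each waiting on inference, expect this to return dozens of rows that all look individually harmless.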

The assumption that broke is that wall-clock time per transaction would be comparable to the wall-clock time of the queries inside it. Application code written by humans sends the next statement almost as soon as the previous one returns.

LLMs do not.

Nobody told the database the unit of work was now bracketed by inference latency.
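One shape the unit of work can take when the inference call sits outside the transaction. This is a sketch under assumptions, not a prescription: the documents table and its body, label, and version columns are hypothetical, and the version column is an optimistic-concurrency guard I am adding for illustration.

```sql
-- Step 1: short read under autocommit. No transaction stays open afterwards.
SELECT id, body, version
FROM documents
WHERE id = 42;

-- Step 2: the application calls the LLM here. No locks held, no read view pinned.

-- Step 3: short write, guarded by the version read in step 1 (7 in this example).
UPDATE documents
SET label = 'invoice',
    version = version + 1
WHERE id = 42
  AND version = 7;
-- Zero affected rows means the row changed during inference: re-read and decide again.
```

The trade is that the worker now has to handle the case where the row moved underneath it. That is the cost of not holding a transaction for eight seconds.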

The user who got charged twice

An agent is reconciling payouts. It runs UPDATE accounts SET balance = balance - 100 WHERE id = 42. The connection times out. It retries. The retry succeeds. Both UPDATEs committed. The balance is down two hundred dollars instead of one hundred, and the database has no record of anything wrong.

Idempotency is the contract that says: I might call this twice, so make it safe. MySQL has no concept of idempotency for arbitrary writes. That UPDATE is not idempotent. Period.

Humans who think about idempotency design around it. They use a unique transaction key. They check before they update. They make it INSERT ... ON DUPLICATE KEY UPDATE. The ones who don’t cause occasional bugs. Agents who don’t cause them at scale.
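A minimal sketch of the unique-key approach. The payout_ledger table, its columns, and the key value are all hypothetical; the accounts table is the one from the example above. The idea is that the caller supplies the same key on every retry of the same logical payout, and the primary key turns the second attempt into an error instead of a second debit.

```sql
-- Hypothetical ledger: one row per logical payout, keyed by a caller-supplied value.
CREATE TABLE payout_ledger (
  idempotency_key VARCHAR(64)   NOT NULL,
  account_id      BIGINT        NOT NULL,
  amount          DECIMAL(12,2) NOT NULL,
  created_at      TIMESTAMP     NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (idempotency_key)
) ENGINE=InnoDB;

START TRANSACTION;

-- On a retry this INSERT fails with a duplicate-key error (1062).
-- The caller must then ROLLBACK and treat the payout as already applied,
-- without running the UPDATE below.
INSERT INTO payout_ledger (idempotency_key, account_id, amount)
VALUES ('payout-batch7-acct42', 42, 100.00);

UPDATE accounts SET balance = balance - 100 WHERE id = 42;

COMMIT;
```

The ledger row and the balance change commit together or not at all, so a timeout between them can't leave one without the other. This only works if the caller reuses the key on retry, which is exactly the part an agent has to be told.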

Agents retry. That’s the default on every kind of failure: timeout, network blip, ambiguous response. The retry is reasonable in many contexts. It is catastrophic for non-idempotent writes.

You can find this in the binlog if you go looking. Two UPDATE events, milliseconds apart, identical. No errors. No alerts. The database did what it was asked to do, twice, because it was asked twice.

The agent didn’t mean to. It doesn’t mean anything. That’s the whole problem.

What this means for whoever ends up holding the bag

The original framework asked: is the query slow? Is the optimizer choosing badly? Is there lock contention?

Those questions are still good. They miss the case where the contract is being honored but whoever asked for the query didn’t know what they were asking for.

The new diagnostic question is shorter and harder.

Did the caller understand what they asked for?

If the answer is no, no amount of tuning will help. The fix is upstream of the database. None of it lives in MySQL. All of it was a condition MySQL was originally allowed to assume.

The promise still holds. The contract is intact.

What we lost is the right to assume that the one writing to production has anything to lose.

Part 3 is on Relationships: https://blog.dbtrail.com/an-ai-agent-is-talking-to-your-mysql-and-the-database-doesnt-know-who-is-that-agent-era-part-3/

If you’ve seen these patterns in production, especially shapes I didn’t cover, write. The examples are the part of this argument that gets richer the more cases people contribute.
