Thoughts

Exactly-once is a lie you tell yourself

Every few months someone proposes building a payments flow on “exactly-once delivery,” and I have to gently take the whiteboard marker away. Exactly-once delivery, in the strict sense, is not a thing you can buy across an unreliable network. Believing you have it is more dangerous than knowing you don’t.

Why it’s impossible

The problem is the acknowledgement. A sender delivers a message and waits for the receiver to confirm. If that confirmation is lost — and any message can be lost — the sender is stuck with the same question every payments engineer knows in their bones: did it arrive, or not?

It has exactly two choices. Resend, risking a duplicate. Or don’t, risking a loss. There is no third option, because the sender cannot distinguish “message lost” from “ack lost.” This isn’t an implementation gap that a better broker will close. It’s a property of the world. So every real system picks a side: at-most-once (never duplicate, may lose) or at-least-once (never lose, may duplicate).

For money, losing a message is unacceptable, so the choice makes itself: at-least-once. Which means duplicates are not an edge case you might hit. They are a guarantee. Your system will see the same message twice.

Where “exactly-once” actually comes from

Here’s the sleight of hand behind systems that advertise exactly-once. They don’t achieve it at the delivery layer. They achieve it at the processing layer, by combining at-least-once delivery with idempotent processing — deduplication using a stable key. Deliver the message as many times as the network forces you to; process its effect only once.

Said plainly: exactly-once processing = at-least-once delivery + dedupe. The magic word hides an extra component you’re responsible for building. Kafka’s “exactly-once semantics” is this — scoped carefully to what it can actually promise — not a repeal of the laws of networks.

What this means for a payments system

So you stop trying to prevent duplicates and start making them boring:

  • A dedupe key on every operation. Same idea as the idempotency key on the way in — carry it all the way through so every stage can recognize “I’ve already handled this one.”
  • Effects that are safe to replay. Posting a ledger entry keyed by that ID is naturally idempotent: the second attempt sees the entry already exists and does nothing. “Increment a balance” is not. Design for the former.
  • Reconciliation as the backstop. Dedupe handles the duplicates you expected. Reconciliation — comparing your books against the provider’s, end of day — catches the ones you didn’t, and the losses too. It’s the seatbelt for when a bug slips through the airbag.

None of this is a workaround for lacking exactly-once. This is the discipline. Once you accept that duplicates and retries are the normal weather rather than a storm, you stop building fragile systems that assume clear skies. You design for the mess, and the mess stops being scary.

Archie

← back to the journal