Reluvate

Engineering

Exception-First Design

Most AI systems are designed for the happy path and handle exceptions as an afterthought. We do it backwards: design the exception path first, then automate the rest. This approach has saved us from catastrophic failures more times than I can count.

Ken Guo, Founder & CEO · 2026-03-22

Here is a pattern I see in almost every AI system that fails in production: the team spent 90% of their effort on the happy path — the case where everything works perfectly — and bolted on exception handling as an afterthought. When the system encounters something unexpected, it either fails silently, produces garbage output, or worse, produces confident-looking garbage output that nobody catches until it has caused real damage.

We flipped this at Reluvate about three years ago. Now, on every project, the first thing we design is the exception path. Before we write a single line of automation code, we answer: what happens when this system encounters something it cannot handle? Who gets notified? How fast? What does the fallback look like? Can the human override be executed in under five minutes?
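Those questions can be written down as a concrete spec before any automation code exists. Here is a minimal sketch of what that might look like; the field names and example values are hypothetical, not Reluvate's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ExceptionPathSpec:
    """Answers to the exception-path questions, recorded before writing automation code."""
    escalation_contact: str      # who gets notified
    notify_within_seconds: int   # how fast they are notified
    fallback: str                # what the fallback behavior is
    max_override_minutes: int    # the human override must fit in this budget

# Hypothetical example for an invoice pipeline
spec = ExceptionPathSpec(
    escalation_contact="ops-oncall@example.com",
    notify_within_seconds=60,
    fallback="route item to human review queue",
    max_override_minutes=5,
)
```

Writing the spec first forces the team to commit to answers while they are still cheap to change.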

This approach came from a painful experience. We had deployed an invoice processing system for a client that handled 95% of invoices perfectly. But that remaining 5% included things like handwritten invoices from small vendors, invoices in languages the OCR did not support, and invoices with non-standard layouts. The system processed them anyway and got them wrong. By the time anyone noticed, incorrect payments had been made. The client was not happy. Neither were we.

Now, every system we build has a confidence threshold. Below that threshold, the item goes to a human queue, not through the automation pipeline. The threshold is set conservatively at launch and adjusted based on real-world performance. We would rather have a system that automates 70% of volume accurately than one that automates 95% of volume with errors hiding in the output.
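The routing rule itself is simple; the discipline is in never letting a low-confidence item reach the automation pipeline. A minimal sketch (the threshold value and item IDs are illustrative, not from a real system):

```python
CONFIDENCE_THRESHOLD = 0.90  # set conservatively at launch, adjusted on real-world data

def route(item_id, confidence, automation_queue, human_queue):
    """Below the threshold, the item goes to the human queue, never through automation."""
    if confidence < CONFIDENCE_THRESHOLD:
        human_queue.append(item_id)
    else:
        automation_queue.append(item_id)

auto, human = [], []
for item_id, conf in [("inv-001", 0.97), ("inv-002", 0.62), ("inv-003", 0.91)]:
    route(item_id, conf, auto, human)
# inv-002 lands in the human queue; the other two are automated
```

The key design choice is that the default is human review: an item only enters the automation path when the system has positively earned confidence in it.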

The human exception path needs to be designed as carefully as the automation path. This means a proper interface, not an email with a spreadsheet attached. The person handling exceptions needs to see what the AI attempted, why it was flagged, and what the likely correct answer is. They should be able to approve, correct, or reject with minimal friction. We have found that a well-designed exception interface actually makes the human faster at handling exceptions than they were at processing the items manually in the first place.
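The review payload described above, what the AI attempted, why it was flagged, and a pre-filled best guess, can be modeled as a small data structure with three low-friction actions. This is an illustrative sketch; the names and fields are assumptions, not a real interface:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewCard:
    item_id: str
    ai_attempt: dict    # what the AI extracted
    flag_reason: str    # why the item was routed to a human
    suggestion: dict    # the AI's best guess, pre-filled for one-click approval

def resolve(card: ReviewCard, action: str, correction: Optional[dict] = None) -> dict:
    """Approve, correct, or reject with minimal friction."""
    if action == "approve":
        return card.suggestion
    if action == "correct":
        return correction or {}
    if action == "reject":
        return {}
    raise ValueError(f"unknown action: {action}")
```

Because approval is one click against a pre-filled suggestion, the common case costs the reviewer seconds, which is how the exception interface ends up faster than fully manual processing.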

There is a feedback loop benefit that most people miss. Every exception that a human resolves becomes training data. The system learns from its failures, but only if you capture the human's correction in a structured way. Over time, the exception rate drops. We have seen systems go from 30% exception rate at launch to under 5% within six months, purely from this feedback loop. But you only get this benefit if you designed for exceptions from day one.
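Capturing the correction "in a structured way" can be as simple as appending each resolved exception to a JSONL log that later feeds retraining. A minimal sketch, with hypothetical field names:

```python
import json
from datetime import datetime, timezone

def record_correction(item_id, model_output, human_output, flag_reason, log_path):
    """Append one resolved exception as a structured training example (JSONL)."""
    example = {
        "item_id": item_id,
        "model_output": model_output,   # what the system predicted
        "label": human_output,          # the human's correction becomes the label
        "flag_reason": flag_reason,     # lets you analyze which failure modes shrink over time
        "resolved_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(example) + "\n")
```

Keeping the flag reason alongside the correction is what lets you watch specific exception categories decline as the model improves.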

Exception-first design also changes how you scope projects. Instead of promising a client that the AI will handle everything, you promise that it will handle the straightforward cases and make the hard cases easier for humans. This is a much more honest pitch, and it sets realistic expectations. Clients who expect 100% automation are always disappointed. Clients who expect 80% automation with graceful handling of the remaining 20% are usually delighted.

The operational cost of exceptions is something you need to model explicitly. If your system processes ten thousand items per month and has a 10% exception rate, that is a thousand items for humans to review. At five minutes per item, that is roughly 83 hours of human work per month. You need to staff for that. You need to budget for that. If you pretend the exception rate will be zero, you will be scrambling to find people when the system goes live.
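The staffing math above is worth encoding so it runs against every proposed deployment rather than being done once on a napkin. A sketch of the same calculation:

```python
def exception_staffing_hours(monthly_volume, exception_rate, minutes_per_item=5):
    """Hours of human review work per month implied by a given exception rate."""
    exceptions_per_month = monthly_volume * exception_rate
    return exceptions_per_month * minutes_per_item / 60

# The example from the text: 10,000 items/month at a 10% exception rate,
# five minutes per item -> roughly 83 hours of human work per month.
hours = exception_staffing_hours(10_000, 0.10)
```

Running this with the launch-day exception rate (often 30%) rather than the hoped-for steady state is what keeps the staffing plan honest.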

One thing we have learned about confidence scoring: it is not enough to have a single score. Different types of errors require different thresholds. An invoice with an unusual layout might score low on structural confidence but high on amount extraction. A system that uses a single threshold will either flag too many items or miss too many errors. We typically implement three to five confidence dimensions per system.
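Multi-dimensional thresholding can be sketched as a per-dimension check where any single miss flags the item. The dimension names and threshold values here are illustrative assumptions, not production settings:

```python
# Hypothetical per-dimension thresholds; real systems use three to five dimensions.
THRESHOLDS = {
    "structural": 0.85,  # layout / field detection
    "amount": 0.95,      # extracted totals, held to a stricter bar
    "vendor": 0.90,      # vendor identification
}

def needs_review(scores: dict) -> list:
    """Return the dimensions below their thresholds; any hit routes the item to a human."""
    return [dim for dim, t in THRESHOLDS.items() if scores.get(dim, 0.0) < t]

# An invoice with an unusual layout: low structural confidence but high amount confidence.
flags = needs_review({"structural": 0.70, "amount": 0.98, "vendor": 0.93})
# The item is flagged on the structural dimension even though the amounts look fine.
```

Per-dimension flags also tell the reviewer *what* to check, which a single aggregate score cannot do.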

The broader principle is this: design for failure before you design for success. In any AI system, the failure mode is more important than the success mode, because successes take care of themselves and failures can compound. If you get the exception path right, the automation path is the easy part.

system-design · exception-handling · human-in-the-loop · reliability