Flags only, never approvals — The Trestle Group

The category error

A vendor pitches a finance team an "AI approval system." The system reads invoices, checks them against approval rules, and approves the ones that pass.

The pitch is clean. The system "approves" routine items. Humans only review the exceptions. Headcount goes down, throughput goes up.

This is a category error. The AI is not approving anything. The AI is making a recommendation. The accountability sits with the human who reviews the recommendation. Calling the recommendation an "approval" muddles the accountability chain and produces compliance exposure that the buyer often does not see until an audit.

Where the chain breaks

Approval, in finance and compliance, is a legal and accountability act. When a person approves an invoice, they are attesting that the invoice meets the conditions for payment. If the invoice is fraudulent, the approver bears responsibility. The audit trail names the approver.

When an AI "approves" the invoice, the question of who attested becomes ambiguous:

Is it the engineer who built the system? They did not see the invoice.
Is it the executive who deployed the system? They did not see the invoice either.
Is it the AI? The AI is not a person and cannot bear accountability.
Is it the human who reviewed the exceptions? They reviewed the exceptions, not the items the system "approved."

The audit trail cannot name a responsible party. When something goes wrong, the team finds out that nobody was attesting to anything. The internal control was a fiction.

The pattern that works: flags only

The corrective pattern is structural. The AI never approves. The AI surfaces flags. Humans approve.

In practice:

The system reads the input (invoice, transaction, receipt, document).
The system applies the rules and decides whether the input passes or fails the rule set.
For items that pass: the system places them in a queue marked "ready for approval."
For items that fail: the system places them in a queue marked "needs review," with the rule that fired, the offending value, and the source quote.
A human reviews both queues and makes every approval decision, item by item. The "ready for approval" queue moves faster, because the human is verifying the system's work, not making the original judgment. The "needs review" queue takes longer, because the human is exercising actual judgment.

The accountability is clear. The human approves. The system surfaces. The audit trail names the approver, not the algorithm.
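
As a rough illustration of the steps above, here is a minimal Python sketch of the triage stage. Everything in it is an assumption made for the example: the names Flag, Rule, QueueItem and triage, the rule ID AMT-001, and the 10,000 limit are hypothetical, not part of any real system. The point it shows is structural: the function can only return a queue placement, and there is no code path that produces an approval.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Flag:
    rule_id: str          # which rule fired
    offending_value: str  # the value that broke the rule
    source_quote: str     # text from the document that triggered the rule


@dataclass
class Rule:
    rule_id: str
    # Returns a Flag when the rule is violated, None when the item passes.
    check: Callable[[dict], Optional[Flag]]


@dataclass
class QueueItem:
    item: dict
    queue: str  # "ready for approval" or "needs review"
    flags: list[Flag] = field(default_factory=list)


def triage(item: dict, rules: list[Rule]) -> QueueItem:
    """Place an item in a queue. There is deliberately no 'approved' outcome."""
    flags = [f for rule in rules if (f := rule.check(item)) is not None]
    queue = "needs review" if flags else "ready for approval"
    return QueueItem(item=item, queue=queue, flags=flags)


# Hypothetical rule: invoices above an assumed limit go to review.
def over_limit(item: dict) -> Optional[Flag]:
    if item["amount"] > 10_000:
        return Flag("AMT-001", str(item["amount"]), item["amount_text"])
    return None


RULES = [Rule("AMT-001", over_limit)]

small = triage({"id": "inv-1", "amount": 420, "amount_text": "Total due: 420.00"}, RULES)
large = triage({"id": "inv-2", "amount": 25_000, "amount_text": "Total due: 25,000.00"}, RULES)
```

In this sketch, small lands in "ready for approval" with no flags, and large lands in "needs review" carrying the rule ID, the offending value, and the source quote. Neither is approved until a person acts on it.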

The economics still work

The objection from buyers is usually: "if a human still has to approve every item, what does the system save us?"

The answer is: a lot, but not what they thought.

The savings are not from removing the approval. The savings are from removing the work of constructing the approval. The human approving items in the "ready for approval" queue is reading a structured summary, not assembling the case from raw documents. They are verifying, not investigating. The marginal time per item goes from 10–20 minutes to 30–60 seconds.
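
To make the arithmetic concrete, the short Python calculation below works through the per-item figures quoted above. The per-item times come from the text; the monthly volume of 1,000 items is a hypothetical number chosen only for illustration, and the calculation covers only items in the "ready for approval" queue.

```python
# Per-item times from the figures in the text, converted to seconds.
manual_low, manual_high = 10 * 60, 20 * 60  # 10-20 minutes to assemble the case
verify_low, verify_high = 30, 60            # 30-60 seconds to verify a summary

# Most conservative and most favourable speed-up per item.
worst_case = manual_low / verify_high
best_case = manual_high / verify_low

# Hypothetical volume, for illustration only (not a figure from the text).
items_per_month = 1_000
hours_before = items_per_month * manual_low / 3600
hours_after = items_per_month * verify_high / 3600

print(f"speed-up per item: {worst_case:.0f}x to {best_case:.0f}x")
print(f"hours per month (conservative case): {hours_before:.0f} -> {hours_after:.0f}")
```

Even on the conservative pairing (10 minutes down to 60 seconds), the per-item time drops by a factor of ten, and every item still carries a human approval.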

The "needs review" queue takes longer per item, but the items in that queue are the ones that genuinely need human judgment. The system has sorted the work into "fast" and "considered," and the human's time is spent where it matters.

The throughput gain is real. The accountability chain stays intact.

What gets lost when this pattern is skipped

We have seen the alternative pattern (AI approves, human reviews exceptions) fail in three specific ways.

1. The exception queue becomes the only review. The "approved" items pass through without human attention. When one of them is wrong, the team finds out at audit. The audit finds that there is no record of a human attesting to the item. The internal control fails its test.

2. Trust collapses asymmetrically. The team trusts the system as long as nothing goes wrong. The first time something goes wrong (a fraudulent invoice approved, a non-compliant transaction missed), trust collapses entirely. The team reverts to reviewing every item manually, including the items that previously cleared the system. The throughput gain disappears.

3. The audit trail cannot answer "why." When an auditor asks "why did this invoice get approved?", the answer needs to be "X person reviewed it on Y date based on Z information." If the answer is "the system approved it," the auditor's next question is "who reviewed the system's logic?" If nobody can answer that, the control is documented as failed.

The implementation rules

When we build this pattern, the rules are:

The system places items into queues. It never writes to production records that touch money or compliance.
Queues are sorted by severity and rule. Items needing the most attention surface first.
Every queue item carries the rule that triggered it. Plain language, rule ID, source quote, suggested remediation. The human does not need to reconstruct the system's reasoning.
The human's decision is recorded with their identity, timestamp, and any notes. The audit trail is built into the system, not bolted on later.
The system reports on itself. How many items did it surface? How many were approved as-is? How many were corrected? How many were rejected? The team can see the system's accuracy and adjust the rules over time.
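
A minimal Python sketch of the last two rules follows, assuming an in-memory list stands in for a real audit store. The names Decision, record_decision, self_report and the three outcome labels are hypothetical, chosen for this example. It shows two things: a decision cannot be recorded without a named human approver, and the self-report is a simple count over those recorded decisions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed outcome labels, mirroring the questions the self-report answers.
OUTCOMES = ("approved_as_is", "corrected", "rejected")


@dataclass(frozen=True)  # frozen: an audit record is not edited after the fact
class Decision:
    item_id: str
    rule_ids: tuple[str, ...]  # rules that fired; empty for clean items
    approver: str              # identity of the human who decided
    outcome: str
    decided_at: datetime
    notes: str = ""


def record_decision(log, item_id, rule_ids, approver, outcome, notes=""):
    """Append a human decision to the audit log; refuse anonymous decisions."""
    if not approver:
        raise ValueError("a decision must name a human approver")
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    decision = Decision(
        item_id=item_id,
        rule_ids=tuple(rule_ids),
        approver=approver,
        outcome=outcome,
        decided_at=datetime.now(timezone.utc),
        notes=notes,
    )
    log.append(decision)
    return decision


def self_report(log):
    """Count decided items by outcome so the team can judge the rules."""
    counts = Counter(d.outcome for d in log)
    return {"decided": len(log), **{o: counts[o] for o in OUTCOMES}}


audit_log = []
record_decision(audit_log, "inv-1", [], "j.doe", "approved_as_is")
record_decision(audit_log, "inv-2", ["AMT-001"], "j.doe", "rejected", notes="duplicate")
report = self_report(audit_log)
```

With a log shaped like this, the auditor's question "why did this invoice get approved?" resolves to a person, a timestamp, and the rules and notes the person had in front of them.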

This is not a different system from "AI approval." It is the same engineering work with the accountability chain corrected.

The harder version of this conversation

Some clients want the AI to approve. They have been pitched the headcount-reduction version and the throughput numbers look better. When we propose flags-only governance, the conversation gets sharper.

The honest answer is: yes, flags-only is more conservative. Yes, it leaves a human in the loop for every item. Yes, the throughput numbers are slightly less aggressive. And no, we will not build the other version.

The reason: the other version creates compliance exposure the client does not always see at the point of decision. By the time the exposure is visible (an audit finding, a regulatory inquiry, a board question), the cost of the exposure exceeds the cost of the conservative architecture by an order of magnitude.

It is not a sales-friendly position. It is the position the work requires.

The summary

Three points:

The AI surfaces. Humans approve. The accountability chain stays intact.
The economics work because the human is verifying, not investigating. The marginal time per item drops by an order of magnitude without removing the human attestation.
The pattern is not optional in our practice. The alternative creates compliance exposure we will not build into a client's system.