The default pattern, and what it costs

A team has a workflow that feels automatable. Receipts come in, they need to be matched to ledger entries, the matches need to be checked against funder rules, and the resulting flags need to be triaged.

The default way to automate this is a single end-to-end pipeline. The LLM reads the receipt, decides what it is, matches it against the ledger, checks the funder rule, and outputs a flag if something looks off. Five steps, one pass.

This works in a demo. It fails in production.

When the system wrongly flags a transaction, the team has no way to tell where the error occurred. Did the LLM misread the receipt? Did it match to the wrong ledger entry? Did it apply the rule incorrectly? The single-pass architecture entangles all of these decisions, so debugging means either re-running the entire pipeline with full logging or guessing.

In a finance or compliance context, "guessing" is not an acceptable position. The team loses trust in the system, reverts to manual review, and the engagement is effectively over.

The fix: separate the layers

The three-phase pipeline is the structural fix. Each phase has a clear input contract, a clear output contract, and a declared operating mode: deterministic logic or model-driven reading. The phases run in sequence, so a failure is traceable to a specific phase.
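
One way to picture the sequencing is as a short runner that tags any failure with the phase that raised it. This is a minimal sketch, not a prescribed implementation: the names Phase, PhaseError, and run_pipeline are hypothetical, and each phase is assumed to be a plain function from its input to its output.

```python
from dataclasses import dataclass
from typing import Any, Callable


class PhaseError(Exception):
    """Raised when a phase fails; carries the phase name for tracing."""

    def __init__(self, phase: str, cause: Exception):
        super().__init__(f"{phase} failed: {cause}")
        self.phase = phase
        self.cause = cause


@dataclass(frozen=True)
class Phase:
    name: str                  # e.g. "consolidation", "extraction", "qc"
    mode: str                  # "deterministic" or "model" (illustrative labels)
    run: Callable[[Any], Any]  # input contract -> output contract


def run_pipeline(phases: list[Phase], source: Any) -> Any:
    """Run phases in order; each phase's output is the next phase's input."""
    data = source
    for phase in phases:
        try:
            data = phase.run(data)
        except Exception as exc:
            # Attach the phase name so the failure is traceable.
            raise PhaseError(phase.name, exc) from exc
    return data
```

A caller that catches PhaseError can read its phase attribute to see which layer to inspect first.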

Phase 1: Consolidation

The first phase pulls source data from systems of record, converts formats, and matches by deterministic keys. No language model involved. No judgment calls.

The deliverable of consolidation is a structured store where every record has a stable ID, a canonical format, and a traceable provenance. The downstream phases consume this store.

What this phase is doing: establishing what data exists, in what form, with what relationships. If a receipt does not have a matching ledger entry, the consolidation phase reports the gap. It does not guess. It does not infer a match. The output is a clean dataset with gaps explicitly marked.
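
A minimal sketch of that behavior, under stated assumptions: receipts and ledger entries arrive as dictionaries, the shared key is a hypothetical receipt_ref field, and ledger keys are unique. The field names and the ConsolidatedRecord type are illustrative, not taken from any particular system of record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConsolidatedRecord:
    record_id: str                  # stable ID built from source system and key
    receipt_ref: str                # the deterministic matching key
    source: str                     # provenance: which system the receipt came from
    ledger_entry_id: Optional[str]  # None marks an explicit gap


def consolidate(receipts: list[dict], ledger: list[dict]) -> list[ConsolidatedRecord]:
    """Match receipts to ledger entries by exact key. No fuzzy matching."""
    # Assumes receipt_ref is unique within the ledger.
    ledger_by_ref = {entry["receipt_ref"]: entry["entry_id"] for entry in ledger}
    records = []
    for receipt in receipts:
        ref = receipt["receipt_ref"]
        records.append(
            ConsolidatedRecord(
                record_id=f'{receipt["source"]}:{ref}',
                receipt_ref=ref,
                source=receipt["source"],
                # A missing key yields None: the gap is recorded, not guessed at.
                ledger_entry_id=ledger_by_ref.get(ref),
            )
        )
    return records


def gaps(records: list[ConsolidatedRecord]) -> list[ConsolidatedRecord]:
    """Return the records that have no matching ledger entry."""
    return [r for r in records if r.ledger_entry_id is None]
```

Because nothing here involves judgment, the function can be tested with ordinary fixtures: known receipts in, known matches and known gaps out.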

The reason consolidation is its own phase: most automation failures we have seen at the data layer come from someone trying to consolidate and judge in the same step. The system reads a receipt, decides what it is, and writes it to a downstream store all at once. When the decision is wrong, the downstream store carries the wrong data, and every subsequent step compounds the error. Separating consolidation from judgment means consolidation can be tested in isolation.

Phase 2: Structured extraction

The second phase reads the consolidated documents and extracts structured fields. This is where the LLM earns its place. It reads each document and returns amounts, dates, vendor names, document types, and named entities. It classifies each document by content.

What this phase is NOT doing: it is not validating against rules. It is not approving. It is not writing to production records. It has one job: turn semi-structured text into structured data.
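
The "one job" can be made concrete as an output contract that the model's response must satisfy before anything downstream sees it. The sketch below is an assumption-laden illustration: the model is passed in as a hypothetical callable that returns a dictionary, and the field names and the ALLOWED_TYPES set are invented for the example. No real LLM API is shown.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Callable

# Illustrative set of document types; a real catalog would differ.
ALLOWED_TYPES = {"receipt", "invoice", "credit_note", "other"}


@dataclass(frozen=True)
class ExtractedFields:
    record_id: str
    document_type: str
    vendor_name: str
    amount: Decimal
    document_date: date


def extract(record_id: str, text: str, model: Callable[[str], dict]) -> ExtractedFields:
    """Turn document text into structured fields. No rules, no approvals, no writes."""
    raw = model(text)  # hypothetical model call returning a dict
    doc_type = raw["document_type"]
    if doc_type not in ALLOWED_TYPES:
        # Reject anything outside the contract instead of passing it on.
        raise ValueError(f"unknown document_type: {doc_type!r}")
    return ExtractedFields(
        record_id=record_id,
        document_type=doc_type,
        vendor_name=str(raw["vendor_name"]).strip(),
        amount=Decimal(str(raw["amount"])),  # Decimal avoids float rounding on money
        document_date=date.fromisoformat(raw["document_date"]),
    )
```

Passing the model in as an argument means the contract can be tested with a stub that returns canned dictionaries, with no model in the loop.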

The reason this is its own phase: classification and rule-application are different operations with different reliability requirements. Classification can be tested against a labeled validation set. Rule-application is deterministic logic over the classified output. Separating them means the classification model can be improved without touching the rule logic, and the rules can be updated without retraining the model.
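
Testing classification against a labeled validation set can be as small as the sketch below. The classify argument stands in for whatever the extraction phase uses to assign a document type; it and the evaluate_classifier name are hypothetical.

```python
from collections import Counter
from typing import Callable


def evaluate_classifier(
    classify: Callable[[str], str],
    labeled_examples: list[tuple[str, str]],
) -> tuple[float, Counter]:
    """Return accuracy and a count of (expected, predicted) confusions."""
    if not labeled_examples:
        raise ValueError("validation set is empty")
    correct = 0
    confusions: Counter = Counter()
    for text, expected in labeled_examples:
        predicted = classify(text)
        if predicted == expected:
            correct += 1
        else:
            # Track which labels get confused with which.
            confusions[(expected, predicted)] += 1
    return correct / len(labeled_examples), confusions
```

The confusion counts matter as much as the accuracy figure: they show which document types the classifier mixes up, which is what a rule author needs to know.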

This phase is also where the "narrow LLM" principle applies. The LLM is doing reading, not deciding. It is allowed to fail in well-bounded ways (a wrong classification gets caught by downstream validation), and the failures are debuggable because the input/output contract is explicit.

Phase 3: QC and audit-trail generation

The third phase validates the structured data against the rule set. This is deterministic logic. Each rule fires on a specific condition, generates a flag if the condition is met, and records the rule ID, the offending value, the source quote, and a suggested remediation.
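
A rule of this shape can be sketched in a few lines. Everything named here is illustrative: the Flag and Rule types, the over_limit helper, and the record fields (record_id, amount, source_quote) are assumptions made for the example, not a real funder rule catalog.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional


@dataclass(frozen=True)
class Flag:
    rule_id: str
    record_id: str
    offending_value: str
    source_quote: str
    remediation: str


@dataclass(frozen=True)
class Rule:
    rule_id: str
    # Returns the offending value if the condition is met, else None.
    check: Callable[[dict], Optional[str]]
    remediation: str


def over_limit(limit: Decimal) -> Callable[[dict], Optional[str]]:
    """Build a check that fires when a record's amount exceeds the limit."""
    def check(record: dict) -> Optional[str]:
        return str(record["amount"]) if record["amount"] > limit else None
    return check


def run_rules(rules: list[Rule], records: list[dict]) -> list[Flag]:
    """Evaluate every rule on every record. Records are read, never modified."""
    flags = []
    for record in records:
        for rule in rules:
            offending = rule.check(record)
            if offending is not None:
                flags.append(
                    Flag(
                        rule_id=rule.rule_id,
                        record_id=record["record_id"],
                        offending_value=offending,
                        source_quote=record["source_quote"],
                        remediation=rule.remediation,
                    )
                )
    return flags
```

The same inputs always produce the same flags, so each rule can be unit-tested on its own with a handful of records.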

What this phase is NOT doing: it is not modifying the source data. It is not approving anything. It only surfaces issues for a human reviewer.

The reason this is its own phase: rules change. Funder requirements update. Regulatory regimes evolve. When the rules live in a separate, versioned phase, they can be updated without touching the consolidation or extraction layers. A new rule that fires on transactions categorized by Phase 2 can be added without re-running Phase 1.

The audit-trail generation is the load-bearing output. Every flag from this phase carries the information an auditor or reviewer needs to evaluate it: which rule, what value, what evidence, what fix. The reviewer does not need to ask the system how it got there. The flag already says.

What this architecture buys you

The pattern looks more elaborate than the single-pass version. It is. The elaboration is the point.

Debuggability. When a flag is wrong, the team can trace it: was the consolidation wrong (Phase 1), was the classification wrong (Phase 2), or was the rule wrong (Phase 3)? Each phase has separable test cases. The error isolates.

Updateability. Rules change without touching the consolidation or extraction code. Classification models can be improved without rule changes. The architecture supports independent evolution of each layer.

Auditability. The audit trail is structural. Every flag has its rule, its source, its evidence. The system can answer "why did you flag this?" without an engineer's involvement.

Defensibility. When an auditor asks the team a question, the team has an answer. The system's outputs hold up under review because the system was built to be reviewed.

What this architecture costs

The pattern is more work upfront than a single-pass build. The consolidation phase requires real engineering effort even when no judgment is happening. The structured extraction phase requires input/output contracts that the single-pass version skips. The QC phase requires the rule catalog to be explicit and versioned.

The upfront cost pays off in operating life. A system that is debuggable in month nine of production is worth substantially more than a system that hit the demo bar in week eight of build.

When the pattern does not apply

Some automation candidates do not need this architecture. If the workflow is purely deterministic (RPA-only), Phase 2 is not needed and the pipeline collapses to two phases. If the workflow has no audit exposure (informational dashboards, internal reminders, low-stakes scheduling), the QC phase can be lighter.

The pattern is calibrated for workflows where mistakes touch money, compliance, or reputation. In most of the work we take on, they do. So most of our builds follow the three-phase pattern, with the discipline of separable layers and explicit contracts.

The discipline is the deliverable.