data.day

OCR Accuracy Numbers Are Lies Until We Define “Correct”

Vendors claim “99% accuracy.” They are lying. Here is what “correct” actually means in finance: Supplier, Date, Total, Currency, and VAT.

“Mostly Correct” is “Totally Wrong”

You are paying for automation, but you are still checking every line. Why? Because you don’t trust the tool. And you shouldn’t.

A software vendor told me recently that their tool was “99% accurate.” I uploaded an invoice for $1,000 USD. The tool read: €1,000 EUR.

I asked the vendor: “Is this accurate?” They said: “Well, it got the number 1,000 right! It just missed the symbol.”

To a finance professional, that is not a minor detail. That is a currency exposure error. That is a bank reconciliation nightmare. That is a failure.

The Hidden Cost: The “Review” Bottleneck

If I cannot trust the data blindly, I have to open the PDF. If I have to open the PDF, the automation has failed.

The cost of bad accuracy is the salary of the person verifying the robot’s work. You are paying €50/hour for a human to proofread a machine. This defeats the purpose of buying the machine.
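To put a number on that bottleneck, here is a back-of-the-envelope sketch. Only the €50/hour rate comes from the paragraph above; the invoice volume and the minutes spent per review are placeholder assumptions you should replace with your own figures.

```python
def monthly_review_cost(invoices_per_month, minutes_per_review, hourly_rate_eur=50):
    """Cost of a human re-checking every extracted invoice.

    invoices_per_month and minutes_per_review are assumptions for
    illustration; hourly_rate_eur defaults to the EUR 50/hour quoted above.
    """
    hours = invoices_per_month * minutes_per_review / 60
    return hours * hourly_rate_eur


# Hypothetical volume: 1,000 invoices a month, 3 minutes to open and check each PDF.
print(monthly_review_cost(1000, 3))  # 2500.0
```

If that figure is close to what the tool costs you, the automation is not saving anything.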

The ROI: The “Five Point” Definition of Correct

We do not grade on a curve. We grade pass/fail. For an invoice extraction to count as “Correct,” it must capture ALL FIVE of these fields perfectly:

  1. Supplier Identity: Not just the text string “Uber,” but a link to the existing Vendor Card in the ledger (ID: 4002).
  2. Invoice Date: The right date, read in the right format (DD/MM/YYYY). 03/04 read as MM/DD is a different invoice date.
  3. Total Amount: Exact match.
  4. Currency: The correct ISO code (EUR, USD, GBP).
  5. VAT/Tax Amount: The exact tax value, not a calculated guess.

If the tool misses any one of these, the invoice scores zero.
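The five-point rule can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the `InvoiceFields` class, the field names, and the sample date and VAT values are all made up for the example. Only the vendor ID 4002 and the USD/EUR mix-up come from the text above.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class InvoiceFields:
    supplier_id: int      # ledger Vendor Card ID, not the raw text string
    invoice_date: date    # a parsed date, so DD/MM vs MM/DD mix-ups show up
    total: Decimal        # Decimal, not float, so "exact match" means exact
    currency: str         # ISO 4217 code such as "EUR", "USD", "GBP"
    vat_amount: Decimal


FIELDS = ("supplier_id", "invoice_date", "total", "currency", "vat_amount")


def failed_fields(extracted, truth):
    """Return the names of the fields where the tool disagrees with the truth."""
    return [f for f in FIELDS if getattr(extracted, f) != getattr(truth, f)]


def is_correct(extracted, truth):
    """Binary pass/fail: one wrong field means the whole invoice fails."""
    return not failed_fields(extracted, truth)


# Hypothetical invoice: the tool gets the number right and the currency wrong.
truth = InvoiceFields(4002, date(2024, 3, 15), Decimal("1000.00"), "USD", Decimal("0.00"))
extracted = InvoiceFields(4002, date(2024, 3, 15), Decimal("1000.00"), "EUR", Decimal("0.00"))

print(failed_fields(extracted, truth))  # ['currency']
print(is_correct(extracted, truth))     # False
```

Four fields out of five match, and the invoice still fails. That is the point.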

The Test: When you evaluate a tool, ignore the sales deck. Run the Five Point Test on 20 of your worst invoices.

  • If the tool scores 20/20 on field extraction: Buy it.
  • If it misses the Currency or VAT on more than 2 invoices: Walk away.
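The buy/walk-away rule above can be written down as a small function. This is a sketch under assumptions: the input shape (one set of failed field names per invoice) and the field names are hypothetical, and the "undecided" result is my own label, since the rule above does not say what to do between the two thresholds.

```python
MONEY_FIELDS = {"currency", "vat_amount"}  # hypothetical field names


def verdict(failed_per_invoice):
    """Apply the rule to a batch: one set of failed field names per invoice."""
    if not failed_per_invoice:
        raise ValueError("no invoices to score")
    passed = sum(1 for failed in failed_per_invoice if not failed)
    money_misses = sum(1 for failed in failed_per_invoice if failed & MONEY_FIELDS)
    if passed == len(failed_per_invoice):
        return "buy"
    if money_misses > 2:
        return "walk away"
    return "undecided"  # assumption: the rule gives no answer for this middle band


# 20 invoices, 3 of them with the wrong currency.
batch = [set()] * 17 + [{"currency"}] * 3
print(verdict(batch))  # walk away
```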

Summary

Accuracy is not a feeling. It is a metric. Do not let vendors gaslight you with “high level” accuracy stats.

As the Dutch say, “meten is weten”: to measure is to know. Measure the fields that matter. If the robot can’t read the currency symbol, fire the robot.

FAQs

What is the most dangerous OCR error?

Currency. Mistaking GBP for EUR on a large invoice misstates the liability by the full exchange-rate difference and breaks your bank reconciliation.

Why do tools struggle with Supplier IDs?

Because “Amazon” is not a legal entity. “Amazon EU S.à r.l.” is. The tool needs to match the text on the invoice to your specific Vendor Card in the ledger.
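A minimal sketch of what that matching looks like, assuming a simple lookup by normalized legal name. The vendor IDs, the second ledger entry, and the function names are illustrative, and a real ledger would likely match on VAT number or IBAN as well.

```python
import unicodedata


def normalize(name):
    """Fold Unicode variants, case, and stray whitespace before comparing."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


# Illustrative ledger: legal entity name -> Vendor Card ID (IDs are made up).
VENDOR_CARDS = {
    normalize("Amazon EU S.à r.l."): 4101,
    normalize("Amazon Web Services EMEA SARL"): 4102,
}


def match_vendor(ocr_name):
    """Return the Vendor Card ID, or None when the text is not a known legal entity."""
    return VENDOR_CARDS.get(normalize(ocr_name))


print(match_vendor("AMAZON EU  S.à r.l."))  # 4101
print(match_vendor("Amazon"))               # None
```

Returning None for a bare “Amazon” is deliberate in this sketch: a guess between two Vendor Cards is worse than sending the invoice to review.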

How do I test a tool's accuracy?

Give it 10 invoices. Count the fields, not the documents. If it misses the Due Date on 5 of the 10 invoices, that is a 50% fail rate for that field.
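Counting per field is a one-liner once you have recorded which fields failed on each invoice. The sketch below assumes one set of failed field names per invoice; the function name and input shape are my own, not part of any tool.

```python
from collections import Counter


def field_fail_rates(failed_per_invoice):
    """Share of invoices on which each field was wrong."""
    total = len(failed_per_invoice)
    misses = Counter(field for failed in failed_per_invoice for field in failed)
    return {field: count / total for field, count in misses.items()}


# 10 invoices, Due Date wrong on 5 of them.
results = [{"due_date"}] * 5 + [set()] * 5
print(field_fail_rates(results))  # {'due_date': 0.5}
```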