data.day

OCR That “Learns” But Still Misses VAT: The Lie We Keep Buying

OCR tools claim to use AI to “learn” your invoices. Usually, they just learn to make the same mistakes faster. Here is how to test them properly.

If I Have to Fix It, It Is Not Automation

I am tired of software companies selling me “Artificial Intelligence” that has the reading comprehension of a toddler.

You buy an OCR tool to stop typing data. You scan an invoice. The tool reads it. It populates the fields. You feel productive.

Then you look closer. The Total is correct: €121.00. The VAT is wrong: €0.00.

The tool saw “Total” but missed the tiny “incl. 21% VAT” at the bottom. It coded the whole amount as a net expense.

If you post this, you just lost €21 in reclaimable tax. You paid a software subscription fee for the privilege of losing money. This is not automation; this is an expensive way to generate errors.
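To make the arithmetic concrete, here is a minimal sketch of the split the tool should have done. The function name `split_gross` is made up for illustration, and it assumes a single VAT rate that is already included in the total.

```python
from decimal import Decimal, ROUND_HALF_UP


def split_gross(gross: Decimal, rate_percent: Decimal) -> tuple[Decimal, Decimal]:
    """Split a VAT-inclusive total into (net, vat), rounded to cents.

    Assumes one VAT rate applies to the whole amount (hypothetical helper).
    """
    cent = Decimal("0.01")
    net = (gross / (1 + rate_percent / 100)).quantize(cent, rounding=ROUND_HALF_UP)
    vat = gross - net  # whatever is not net is reclaimable VAT
    return net, vat


net, vat = split_gross(Decimal("121.00"), Decimal("21"))
print(net, vat)  # 100.00 21.00
```

A tool that books €121.00 as net and €0.00 as VAT has skipped exactly this step.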

The Hidden Cost: The “Review Tax”

The sales rep told you the tool has “99% accuracy.” That sounds high.

Let’s do the math on 1,000 invoices.

  • 99% accuracy = 10 invoices with errors.
  • If those 10 errors are VAT mistakes on large software bills, you could lose hundreds of euros a month.
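The same math as a short sketch. The helper `expected_errors` and the €21.00 lost per error are assumptions for illustration (the figure is borrowed from the invoice example earlier); your real loss depends on the size of your bills.

```python
from decimal import Decimal


def expected_errors(invoice_count: int, accuracy_percent: int) -> int:
    """Invoices expected to contain an error at a given whole-percent accuracy."""
    return invoice_count * (100 - accuracy_percent) // 100


errors = expected_errors(1000, 99)

# Hypothetical: each error forfeits EUR 21.00 of reclaimable VAT.
assumed_vat_per_error = Decimal("21.00")
print(errors, errors * assumed_vat_per_error)  # 10 210.00
```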

But the real cost is not the money lost; it is the time spent verifying.

If I cannot trust the tool 100%, I have to check every single line. If I have to check every line, why did I buy the tool? I am now paying a human to supervise a robot. That is backwards. The robot should be working for me.

“Learning” features are often a scam. You correct the vendor name once. The tool says “I learned!” Next month, the vendor moves the date to the left side of the page. The tool breaks again.

[Figure: flow chart. Top path: “Good OCR” -> “Scan” -> “Post” -> “Sleep”. Bottom path: “Bad OCR” -> “Scan” -> “Error” -> “Human Fix” -> “New Error” -> “Headache”. The bottom path is labeled “Money Pit”.]

The ROI: The “Torture Test” Before You Sign

Do not believe the demo. The demo uses perfect, high-contrast PDF invoices. Of course it works.

Before you sign a contract with an automation tool, give them the Folder From Hell.

Gather 20 of your worst receipts. For example:

  1. A crumpled coffee receipt from a pocket.
  2. An Uber invoice where the VAT is hidden in a footer link.
  3. A blurry photo of a parking ticket.
  4. A foreign currency invoice with two different tax rates.

Upload them. Watch the tool choke.

  • Pass: It captures the Date, Total, Vendor, and VAT Split correctly.
  • Fail: It misses the VAT or gets the currency wrong.

If the tool fails more than 2 of the 20, walk away.
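If you want to score the test instead of eyeballing it, a rough sketch could look like this. Everything here is hypothetical: the `Fields` record, the vendor name, and the idea that you type up the correct values by hand first. It also simplifies the VAT split to a single amount; an invoice with two tax rates would need one amount per rate.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Fields:
    """The values that must match for a receipt to pass (illustrative)."""
    date: str
    total: Decimal
    vendor: str
    vat: Decimal
    currency: str


def passes(expected: Fields, extracted: Fields) -> bool:
    # Pass only if every field matches; a wrong VAT or currency is a fail.
    return expected == extracted


def verdict(pairs: list[tuple[Fields, Fields]], max_failures: int = 2) -> str:
    failures = sum(1 for expected, extracted in pairs if not passes(expected, extracted))
    return "walk away" if failures > max_failures else "keep evaluating"


truth = Fields("2024-03-01", Decimal("121.00"), "Acme Cloud", Decimal("21.00"), "EUR")
missed_vat = Fields("2024-03-01", Decimal("121.00"), "Acme Cloud", Decimal("0.00"), "EUR")

# 17 receipts read correctly, 3 with the VAT missed.
results = [(truth, truth)] * 17 + [(truth, missed_vat)] * 3
print(verdict(results))  # walk away
```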

We measure ROI by “Touchless Processing.”

  • If I touch it 0 times: High ROI.
  • If I touch it 1 time: Low ROI.
  • If I touch it 2 times (once to fix, once to verify): Negative ROI.
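The scale above is easy to track once you log how many times a human touched each invoice. A minimal sketch, with made-up function names, and assuming that anything beyond two touches is also negative:

```python
def roi_band(touches: int) -> str:
    """Map the number of human touches on one invoice to an ROI band."""
    if touches == 0:
        return "high"
    if touches == 1:
        return "low"
    return "negative"  # assumption: 2 or more touches all count as negative


def touchless_rate(touch_counts: list[int]) -> float:
    """Share of invoices that nobody had to touch."""
    return sum(1 for t in touch_counts if t == 0) / len(touch_counts)


touch_log = [0, 0, 1, 2]  # hypothetical log for four invoices
print([roi_band(t) for t in touch_log])  # ['high', 'high', 'low', 'negative']
print(touchless_rate(touch_log))  # 0.5
```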

Conclusion

Accuracy is binary. It is right, or it is wrong. There is no “mostly right” in accounting. “Mostly right” is how you get audited.

Demand tools that actually read. Or save your money and hire a fast typist. At least the typist knows what VAT is.

FAQs

But the software highlights the data in green. Doesn’t that mean it is correct?

Green means “I found a number.” It does not mean “I found the right number.” Never trust the color coding.

Is manual entry better than bad OCR?

Yes. Bad OCR creates a false sense of security. You skim it and miss the error. Manual entry forces you to look.

Why is VAT such a problem for AI?

Because receipts are designed by graphic designers, not accountants. The “Total” is big. The “Tax” is tiny. Robots struggle with hierarchy.