The Fix: PDF Invoice Capture Pipeline (Save, OCR, Extract, Log)
Invoices arrive as unstructured PDFs, and keying them into a system by hand is tedious labor. Here is the exact 4-step pipeline I use to turn a PDF into a database row without typing.
The “Files to Rows” Problem
Invoices are useful data trapped in a useless format (PDF). To your finance system, a PDF is effectively a picture, especially when it is a scan. It doesn’t know that “$500” is a number. It just sees pixels.
So we bridge the gap with humans. We pay smart people to act as “Middleware,” translating pixels into database rows.
The Friction: The Black Box of Spend
When invoices live in email inboxes or local folders, you have no visibility.
- You cannot sum the total spend for the month.
- You cannot see which vendors are raising prices.
- You cannot search by date.
You have to open every single file to answer a simple question.
The Flow: Save, OCR, Extract, Log
I built a standardized pipeline using Power Automate. This is my “Monster Hack” for Finance teams. It runs silently in the background.
Step 1: Save (The Capture)
- Trigger: Email arrives in [email protected].
- Action: Save Attachment to SharePoint Library /Invoices/Raw.
- Why: We never lose the source document. It is our audit trail.
Step 2: OCR (The Reader)
- Action: Run “Extract information from invoices” (AI Builder).
- Input: The file content from Step 1.
- Why: This unlocks the data.
Step 3: Extract (The Selection)
We define what we care about. I usually grab four things:
- Vendor Name
- Invoice Date
- Invoice Total
- Invoice ID
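To make the selection step concrete, here is a minimal Python sketch of the same idea outside Power Automate. It assumes the OCR step hands back a flat mapping of field name to extracted text. The key names and the sample values are my own placeholders, not the actual AI Builder output schema.

```python
from typing import Dict, Optional

# The four fields we keep. These key names are assumptions for illustration.
WANTED_FIELDS = ("Vendor Name", "Invoice Date", "Invoice Total", "Invoice ID")


def select_fields(ocr_result: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Keep only the four fields we care about; missing ones become None."""
    return {name: ocr_result.get(name) for name in WANTED_FIELDS}


# Hypothetical OCR output with one extra field we do not need.
sample = {
    "Vendor Name": "Contoso Ltd",
    "Invoice Date": "2024-03-15",
    "Invoice Total": "500.00",
    "Invoice ID": "INV-0042",
    "Shipping Address": "1 Example Street",
}
selected = select_fields(sample)
```

Returning None for a missing field, rather than raising, keeps one odd invoice from stopping the whole run. You can flag those rows for review later.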
Step 4: Log (The Database)
- Action: Add Row into Excel Table (or SharePoint List).
- Mapping:
  - Column A = Vendor Name
  - Column B = Invoice Total
  - Column C = Link to File (This is crucial. One click to see the proof).
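The mapping above can be sketched in Python as a plain "append one row" operation. This is only an illustration of the column order: an in-memory CSV stands in for the Excel table or SharePoint list, and the file name in the link is a made-up example.

```python
import csv
import io

# Column A, B, C in the order described in the mapping.
COLUMNS = ["Vendor Name", "Invoice Total", "Link to File"]


def append_row(table, vendor, total, link):
    """Write one invoice as one row, in Column A, B, C order."""
    csv.writer(table).writerow([vendor, total, link])


table = io.StringIO()  # stand-in for the real table
csv.writer(table).writerow(COLUMNS)  # header row
append_row(table, "Contoso Ltd", "500.00", "/Invoices/Raw/contoso-inv-0042.pdf")

# Read the table back to confirm what was logged.
rows = list(csv.reader(io.StringIO(table.getvalue())))
```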
The Result
Now, the Finance Manager opens a spreadsheet.
- It is always up to date.
- It has a “Total” at the bottom.
- There are no manual-entry typos.
If they need to see the original PDF, they click the link in Column C. If they just need the numbers, they have them.
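Once the invoices are rows, the questions from earlier become one-liners. A rough Python sketch of "total spend per month" follows. It assumes the log also carries the Invoice Date from Step 3 as a YYYY-MM-DD string, and the rows themselves are invented sample data.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical logged rows: (vendor, invoice date, invoice total).
logged_rows = [
    ("Contoso Ltd", "2024-03-15", "500.00"),
    ("Contoso Ltd", "2024-04-15", "550.00"),
    ("Fabrikam", "2024-03-20", "120.50"),
]


def spend_by_month(rows):
    """Sum invoice totals per 'YYYY-MM' month key."""
    totals = defaultdict(Decimal)  # a missing month starts at Decimal('0')
    for _vendor, invoice_date, total in rows:
        totals[invoice_date[:7]] += Decimal(total)
    return dict(totals)


monthly = spend_by_month(logged_rows)
```

Decimal is used instead of float so that money sums do not pick up rounding noise.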
Data should flow like water. Don’t build dams with manual entry.
FAQs
What do I do with the original PDF?
We save it to a secure folder with a standardized name. It is the evidence linked to the data row.
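What "standardized" means is up to you. As one possible convention (my assumption here, not a rule of the pipeline), you could build the name from the extracted date, vendor, and invoice ID:

```python
import re


def standardized_name(vendor: str, invoice_date: str, invoice_id: str) -> str:
    """Build 'YYYY-MM-DD_vendor_invoice-id.pdf' from extracted fields."""

    def slug(text: str) -> str:
        # Lowercase, then collapse anything that is not a letter or digit to '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{invoice_date}_{slug(vendor)}_{slug(invoice_id)}.pdf"


name = standardized_name("Contoso Ltd.", "2024-03-15", "INV-0042")
# name == "2024-03-15_contoso-ltd_inv-0042.pdf"
```

Putting the date first means the folder sorts chronologically by file name.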
My invoices all look different.
Start with the 'General Document' model. It is surprisingly good at finding 'Total Amount' even on layouts it hasn't seen.
Can it handle multiple currencies?
Yes, it extracts the symbol (€, $, £). You can add a logic step to convert it if needed.
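The conversion logic step could look roughly like this in Python. Everything specific here is an assumption for illustration: the symbol leads the string, "$" is treated as US dollars, and the exchange rates are made-up placeholders rather than real quotes.

```python
from decimal import Decimal

SYMBOL_TO_CODE = {"€": "EUR", "$": "USD", "£": "GBP"}
# Placeholder rates to USD, invented for this example only.
RATES_TO_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.10"),
    "GBP": Decimal("1.25"),
}


def to_usd(raw_total: str) -> Decimal:
    """Convert a string like '€1,200.50' into a USD amount."""
    text = raw_total.strip()
    symbol, digits = text[0], text[1:]
    code = SYMBOL_TO_CODE[symbol]  # raises KeyError on an unknown symbol
    amount = Decimal(digits.replace(",", ""))  # drop thousands separators
    return (amount * RATES_TO_USD[code]).quantize(Decimal("0.01"))


converted = to_usd("€100.00")
```

In a real flow you would look up the rate for the invoice date instead of hard-coding it, and decide what to do when the same symbol is shared by several currencies.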