Performance

Real measurements on real documents — what you can actually expect from ScanToExcel.

98.0% weighted accuracy across 3,000 real documents
3,000 real-world fixtures across 6 document types
6 document types tested end-to-end
< 10 s median conversion (under 3 seconds for simple documents)

How accurate is ScanToExcel?

We measure ScanToExcel against a fixture set of 3,000 real documents, spanning invoices, receipts, bank and credit-card statements, paystubs, and image tables, comparing every extracted field to a hand-checked answer key. The numbers below are the actual results from our latest evaluation run, not marketing claims.

Headline numbers: We score per field with F1, precision and recall (industry-standard accuracy metrics for structured OCR). On this page we surface F1 as a single "accuracy" percentage to keep things readable. p50 / p95 are median and 95th-percentile end-to-end conversion times.

Accuracy by document type

Each document type is scored across all of its fields. "Accuracy" is the F1 score expressed as a percentage — F1 combines how often we say something correctly (precision) and how often we catch what's actually there (recall). Speed is end-to-end, including upload and AI processing.

Document type            Fixtures   Accuracy
Invoices                    600      99.5%
Image tables                200      99.2%
Credit-card statements      500      98.8%
Bank statements             650      98.7%
Receipts                    600      97.3%
Paystubs                    450      94.6%

These numbers describe how the engine performed on our internal test set as of the latest evaluation run (2026-05-05), not a guaranteed result for any specific document you upload. Real-world accuracy varies with image quality, layout, language, and document format.
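The 98.0% headline is simply the fixture-weighted average of the per-type scores in the table above; a quick arithmetic check, with the figures copied from the table:

```python
# Per-type (fixture count, accuracy %) copied from the table above.
per_type = {
    "Invoices": (600, 99.5),
    "Image tables": (200, 99.2),
    "Credit-card statements": (500, 98.8),
    "Bank statements": (650, 98.7),
    "Receipts": (600, 97.3),
    "Paystubs": (450, 94.6),
}

total_fixtures = sum(n for n, _ in per_type.values())  # 3,000
weighted = sum(n * acc for n, acc in per_type.values()) / total_fixtures
print(f"{weighted:.1f}%")  # 98.0%
```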

Documents extracted perfectly

Field-level F1 tells you about average correctness, but not how often a document came out fully correct end to end. So we also track that. Below is the share of fixtures where every evaluated field matched the hand-checked answer key — i.e. the document came back ready to use, with nothing to fix.

Across all 3,000 fixtures, 94.8% came back with every field correct. Most uploads in our test set come back ready to use — but a minority still need a quick review.
Document type            Fixtures   Fully correct
Invoices                    600      96.2%
Image tables                200      95.9%
Credit-card statements      500      95.7%
Bank statements             650      95.2%
Receipts                    600      93.9%
Paystubs                    450      92.1%

These are still test-set numbers, not a guarantee for any specific upload — the same caveats from the table above apply.
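"Fully correct" is an exact-match criterion: a fixture counts only if every evaluated field equals the answer key. A minimal sketch of that check (the field names and values below are made up for illustration, not the real schema):

```python
def fully_correct(extracted: dict, answer_key: dict) -> bool:
    """True only if every field in the answer key was extracted exactly."""
    return all(extracted.get(field) == value
               for field, value in answer_key.items())

# Illustrative fixture, not a real schema:
key = {"vendor": "Acme Ltd", "total": "149.00", "currency": "EUR"}
print(fully_correct({"vendor": "Acme Ltd", "total": "149.00", "currency": "EUR"}, key))  # True
print(fully_correct({"vendor": "Acme Ltd", "total": "149.00", "currency": "USD"}, key))  # False
```

One wrong or missing field is enough to move a document out of the "fully correct" bucket, which is why these rates sit below the field-level accuracy numbers.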

Held-out validation

Beyond the main fixture set, we keep a small held-out set the model has never seen during development. Numbers in parentheses are the number of fixtures.

Document type            Main set         Held-out set     Δ (pp)
Invoices                 99.5% (n=600)    97.8% (n=150)    −1.7
Image tables             99.2% (n=200)    96.7% (n=50)     −2.5
Credit-card statements   98.8% (n=500)    99.9% (n=130)    +1.1
Bank statements          98.7% (n=650)    99.8% (n=160)    +1.1
Receipts                 97.3% (n=600)    98.6% (n=150)    +1.3
Paystubs                 94.6% (n=450)    95.3% (n=110)    +0.7

Receipts, bank statements, credit-card statements and paystubs actually score higher on the unseen set, a good sign that the model generalises rather than memorises. Invoices and image tables score a little lower on the held-out set, which suggests those held-out fixtures contain tougher edge cases (and that we still have headroom on those types). Note that the held-out samples are small, 50–160 fixtures per type, so a handful of hard documents is enough to move a score by a point or two.

What we score

Every fixture is scored against a hand-checked answer key, field by field. These are the fields evaluated for each document type.

Invoices (16 fields)

issue date, due date, currency, vendor info, customer info, line items, subtotal, tax, tax-inclusive flag, tax rows, line tax fields, total and withholdings.

Image tables (5 fields)

header rows, header cell, row identification, row type and row cell.

Credit-card statements (20 fields)

card network, holder, statement period, payment due date, minimum payment due, credit limit, available credit, summary purchases, summary payments, summary fees, summary interest, opening and closing balances, transactions, posted date, transaction type, transaction amount and currency.

Bank statements (16 fields)

account holder, account currency, statement period, opening and closing balances, interest paid, fees charged, transactions, posted date, transaction type, transaction amount, running balance, credit total and debit total.

Receipts (11 fields)

date, currency, vendor, line items, subtotal, tax, tax row breakdown, tip, total and payment details.

Paystubs (40 fields)

employee, employer, country, currency, pay period, base pay rate and unit, annual salary, current net pay, summary YTD totals (gross, taxes, net), and per-row earnings, taxes, deductions and employer contributions (each with hours, rate, current amount and YTD).

How we measure

Real documents — never yours

Our 3,000-fixture set is a mix of publicly available documents (open OCR datasets and public samples), data we have licensed and purchased, and synthetic documents we generate ourselves to stress-test edge cases. What is never in there: your files. Documents you upload to ScanToExcel are processed in-memory and deleted the moment your download is ready — they are never saved, never used to train models, and never added to this benchmark.

Hand-checked answer keys

Every fixture has a hand-checked ground-truth answer key. Each output field from the model is compared field by field.

F1, precision, recall — at the field level

We score every field as a true positive, false positive or false negative. F1 (the harmonic mean of precision and recall) is reported as the headline accuracy; it is the industry standard for evaluating structured OCR.
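In those terms, a field-level F1 falls straight out of the three counts; a minimal sketch:

```python
def field_f1(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision (tp / (tp + fp))
    and recall (tp / (tp + fn)), counted over field comparisons."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 95 fields extracted correctly, 3 spurious, 2 missed:
print(round(field_f1(95, 3, 2), 3))  # 0.974
```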

Held-out validation set

A small held-out set is kept aside that the model has never seen during development, to spot overfitting. Numbers are published above.

End-to-end speed

p50 and p95 are wall-clock times from the moment the file is uploaded to the moment the output file is ready, including AI processing.

Continuously re-evaluated

Every model and prompt change is re-run against the full fixture set before shipping. Numbers on this page reflect the production model on 2026-05-05.

Frequently Asked Questions

Do you use my uploads to train AI or to grow this benchmark?

No. Files you upload to ScanToExcel are processed in-memory and permanently discarded the moment your download is ready. We never store them, train on them, or add them to the evaluation set behind these numbers. The 3,000 fixtures here come from public datasets and documents we own — not from user uploads. See our Privacy Policy for the full statement.

How accurate is ScanToExcel?

ScanToExcel achieves 98.0% weighted accuracy across 3,000 real-world documents spanning six document types. Invoices score 99.5%, image tables 99.2%, credit-card statements 98.8%, bank statements 98.7%, receipts 97.3% and paystubs 94.6%.

Which document type is most accurate?

Invoices, at 99.5% F1 across 600 real fixtures. Image tables (99.2%) and credit-card statements (98.8%) are close behind.

Which document type is least accurate?

Paystubs, at 94.6% F1 — primarily because paystubs have the largest schema (40 fields per fixture) and the most variation between employer layouts. Headers, totals and line items still score above 94%; the weak spots are pay-rate parsing and SALARY-vs-HOURLY classification.

How fast is a single conversion?

Most documents convert in 2–7 seconds. Receipts and image tables are fastest (median 2.3–2.6 s); bank statements take longest (median 10.2 s, 95th percentile 17.2 s) because they are the longest documents.

What is F1 score?

F1 is the harmonic mean of precision and recall. Precision is "of what we returned, how much was right" and recall is "of what was actually there, how much did we catch". F1 captures both in a single number — and it is the industry standard for evaluating structured OCR.

Do you test on real documents?

Yes — and to be clear, none of them are user uploads. The set is mostly real receipts, invoices, statements, and paystubs from public datasets and documents we own or have licensed, plus a smaller portion of synthetic documents we generate ourselves to stress-test edge cases like unusual layouts, low-quality scans, and tricky totals.

Are the numbers on this page real?

Yes. They come straight from the latest evaluation run against our 3,000-fixture set. The model and prompts are re-evaluated against the full set before every release.

Try it on your own documents

Numbers are useful, but the only benchmark that matters is your documents. Upload a file — no signup needed for free conversions — and see how it does.

Start a free conversion