The Aryn Team

New and Improved OCR

By: Karan Sampath and Abhijit Pujare


We’ve made a major update to the Aryn Partitioning Service (APS)! We’ve significantly improved the Optical Character Recognition (OCR) capabilities of APS, with our latest release showcasing a 30x lower error rate than our previous offering on standard benchmarks. Whether you are processing complex documents for LLM-based RAG applications or invoices for business process workflows, it’s imperative to ensure that your OCR technology is extracting information correctly.


We compared our latest model against its predecessor on the Invoices and Receipts dataset, and the results are substantially better. Our latest model shows a 20x improvement in character error rate** for text in tables and a nearly 40x improvement in character error rate for regular text.


Improvement Example

Take the following table embedded within a 10-K filing:



Here is the output from APS:




As you can see, several characters that are traditionally difficult for OCR solutions to detect (such as ‘$’, ‘%’ and ‘) are correctly identified. Moreover, all the alphanumeric characters are detected correctly, avoiding common mistakes such as reading a ‘1’ as an ‘L’ or a ‘5’ as an ‘S’.


Speed

In addition to the accuracy improvements shown above, we’ve also improved the speed of APS. The latest release is on average 3.5x faster on document processing workloads, thanks to our new OCR capabilities and server architecture improvements.


Get Started Today!

Try out our latest OCR capabilities through the Aryn Partitioning Service. All you need is an API key to get started (sign up here for free). You can access the cloud service through the Aryn Playground, the Aryn SDK, or a Sycamore script.


Email us: info@aryn.ai

Join our Slack channel!



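**Character error rate (CER): The number of character-level errors (substitutions, deletions, and insertions) in the extracted text, normalized by the number of characters in the reference text. It can exceed 1.0 when the output contains more errors than the reference has characters.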
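
To make the metric concrete, here is a minimal sketch of how CER is commonly computed: the Levenshtein (edit) distance between the reference text and the OCR output, divided by the reference length. This is the standard textbook definition, not necessarily the exact evaluation procedure behind the numbers above (details such as whitespace or case normalization are not stated here), and the function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions,
    and insertions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r
                curr[j - 1] + 1,     # insert h
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Two classic OCR confusions ('1' read as 'L', '5' read as 'S')
# in a six-character string: 2 errors / 6 characters.
print(round(cer("$1,250", "$L,2S0"), 3))  # 0.333

# Nine spurious inserted characters against a two-character reference,
# which is how CER ends up above 1.0.
print(cer("5%", "5% of total"))  # 4.5
```

Because the denominator is the reference length rather than the output length, an OCR engine that hallucinates extra text is penalized without any upper bound.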




