
AI-powered document ETL with a few lines of code.

Accurately extract and transform tables, images, and text for LLM-based apps, RAG frameworks, and vector databases. 

Welcome

Up to 6x more accurate, 5x faster, 5x cheaper 


1. Use the Aryn Partitioning Service to easily chunk and extract data from your documents into structured JSON.

2. Take the JSON output and run additional ETL steps with the open source Aryn Sycamore processing engine. Load your vector databases with high-quality data.
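The second step can be pictured with a small, library-free sketch. It assumes the partitioned output is a JSON object holding a list of typed elements. The field names used here (`elements`, `type`, `text`, `page`) and the sample payload are hypothetical and chosen only for illustration, so check the service's documentation for the actual schema.

```python
import json

# Hypothetical sample of partitioner output; the real schema may differ.
SAMPLE_JSON = """
{
  "elements": [
    {"type": "Title", "text": "Quarterly Report", "page": 1},
    {"type": "Text", "text": "Revenue grew in Q2.", "page": 1},
    {"type": "Table", "text": "Region,Revenue\\nEMEA,10\\nAPAC,12", "page": 2},
    {"type": "Text", "text": "   ", "page": 2}
  ]
}
"""


def load_elements(raw: str) -> list[dict]:
    """Parse the JSON payload and return its list of elements."""
    return json.loads(raw)["elements"]


def clean(elements: list[dict]) -> list[dict]:
    """Drop elements whose text is empty or whitespace-only."""
    return [e for e in elements if e.get("text", "").strip()]


def by_type(elements: list[dict], wanted: str) -> list[dict]:
    """Keep only elements of one type, for example 'Table'."""
    return [e for e in elements if e["type"] == wanted]


def table_rows(table_element: dict) -> list[list[str]]:
    """Split a comma-separated table rendering into rows of cells."""
    return [line.split(",") for line in table_element["text"].splitlines()]


elements = clean(load_elements(SAMPLE_JSON))
tables = by_type(elements, "Table")
rows = table_rows(tables[0])
print(len(elements), rows)
```

Keeping each ETL step as a small pure function makes it easy to reorder or drop steps before the data reaches a vector database.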

Try your own doc in the Aryn Playground

Chunk, embed, and load your data 

[Diagram: the Aryn Partitioning Service (doc to JSON) produces JSON, which the Sycamore Engine (ETL for docs) processes and loads into Pinecone, OpenSearch, Elasticsearch, Weaviate, or DuckDB. Example shown: table extraction.]

Why Aryn?

Higher quality chunking

The Aryn Partitioning Service is up to 6x more accurate and 5x faster than alternatives. 

Structure and extract data from PDFs, HTML, presentations and more using purpose-built AI models. Tackle complex documents with tables, images, text, graphs, and infographics.

Use declarative dataflows

Sycamore, Aryn's data processing engine, uses a declarative abstraction called a DocSet. It's like an Apache Spark DataFrame, but for collections of unstructured documents. Sycamore uses LLM-based transforms to enrich, clean, and transform your data.
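To make the idea of a declarative dataflow concrete, here is a toy sketch in plain Python. It is not Sycamore's API: `ToyDocSet`, `Document`, and `add_length` are hypothetical names invented for this illustration, and an ordinary function stands in for an LLM-based transform. The point is only that transforms are recorded as a plan and run when the plan is executed.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    properties: dict = field(default_factory=dict)


class ToyDocSet:
    """A toy, lazily evaluated collection of documents (illustration only)."""

    def __init__(self, docs, plan=()):
        self._docs = list(docs)
        self._plan = tuple(plan)  # recorded transforms, not yet applied

    def map(self, fn):
        # Return a new ToyDocSet with one more step appended to the plan.
        return ToyDocSet(self._docs, self._plan + (("map", fn),))

    def filter(self, pred):
        return ToyDocSet(self._docs, self._plan + (("filter", pred),))

    def execute(self):
        # Run the recorded steps in order over the documents.
        docs = self._docs
        for kind, fn in self._plan:
            if kind == "map":
                docs = [fn(d) for d in docs]
            else:
                docs = [d for d in docs if fn(d)]
        return docs


def add_length(doc: Document) -> Document:
    """Stand-in for an enrichment transform: attach the text length."""
    return Document(doc.text, {**doc.properties, "length": len(doc.text)})


docs = [Document("Invoice 42: total due $100"), Document(""), Document("Memo")]
pipeline = ToyDocSet(docs).filter(lambda d: d.text != "").map(add_length)
result = pipeline.execute()
print([(d.text, d.properties) for d in result])
```

Because each call returns a new object holding the plan, the same source documents can feed several pipelines without being modified.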

Reliably load vector databases

Easily load vector databases and hybrid search engines, such as Pinecone, OpenSearch, Weaviate, Elasticsearch, and DuckDB. Choose your vector embedding model, and use Sycamore's embed and load functions to reliably and scalably add data to your indexes.
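The embed-and-load pattern can be sketched without any external service. The example below does not use Sycamore's embed or load functions or any real vector database client. `toy_embed` (a hashed bag-of-words vector) and `InMemoryIndex` are hypothetical stand-ins for an embedding model and an index, assumed here purely for illustration.

```python
import hashlib
import math

DIM = 16  # toy embedding size, chosen arbitrarily


def toy_embed(text: str) -> list[float]:
    """Hash each token into a fixed-size count vector, then L2-normalize."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Stand-in for a vector database: stores (vector, chunk) pairs."""

    def __init__(self):
        self._rows = []

    def load(self, chunks, embed):
        for chunk in chunks:
            self._rows.append((embed(chunk), chunk))

    def search(self, query, embed, k=1):
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = sorted(
            self._rows,
            key=lambda row: -sum(a * b for a, b in zip(q, row[0])),
        )
        return [chunk for _, chunk in scored[:k]]

    def __len__(self):
        return len(self._rows)


chunks = [
    "Q2 revenue grew in EMEA",
    "Reset the router before installing",
    "Refunds are processed within days",
]
index = InMemoryIndex()
index.load(chunks, toy_embed)
print(index.search("Q2 revenue grew in EMEA", toy_embed, k=1))
```

Passing the embedding function in as a parameter mirrors the choice of embedding model described above: swapping the model does not change the loading code.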

Open source and cloud native

The Sycamore data processing engine is 100% open source (Apache License v2.0) with no lock-in. It is highly customizable with your choice of AI models, prompts, and user-defined functions (UDFs). The Aryn Partitioning Service is a serverless endpoint, and its base AI model is available on Hugging Face.

Use cases

Developers use Aryn in financial services, healthcare, manufacturing, eCommerce, and customer support. 

Research and discovery

Prepare data for apps that enable analysts and researchers to ask hard questions on complex documents that include tables, infographics, and complicated layouts. Discover and use critical information that would otherwise be missed.

Reporting on unstructured data feeds

Create structured reports from unstructured data to answer key business questions. Run scheduled pipelines that extract, enrich, and store information from diverse datasets, such as Salesforce data, health records, or contracts.

Technical knowledge bases

Empower technical knowledge workers with AI assistants by processing manuals, technical documents, installation guides, and catalogs for RAG systems. Answer technical questions and find information from properly chunked data.

Customer support

Deliver high-quality data to copilots that assist customer support teams and healthcare professionals, or let customers directly query knowledge bases, support tickets, FAQs, healthcare records, and other information sources.

Installation

Installing the SDK for the Partitioning Service and the Sycamore library is quick and simple.

