February 11, 2025
Enhance your Document Processing workflows with our Asynchronous API
You can now simplify your document processing workloads with the asynchronous version of Aryn DocParse’s API! With the newly released partition_file_async_submit and partition_file_async_result functions in the aryn-sdk, you can submit many tasks at once without blocking your workflow while they complete.
Let’s dive into how you can do this through the Aryn SDK.
Using it through the Aryn SDK
Let’s say you want to parse several invoice documents that you have saved under the ‘/Documents/invoices’ directory. The code below is an example of how you can process all the documents in your directory asynchronously.
First, let's call the partition_file_async_submit function on all the files in our directory and get the corresponding partition_file task IDs back.
# Note: adjust YOUR_DIRECTORY_NAME to a directory that exists in your environment.
import os
import time

from aryn_sdk.partition import (
    partition_file_async_submit,
    partition_file_async_result,
)

YOUR_DIRECTORY_NAME = '/Documents/invoices/'

## Get a list of all files you are interested in parsing
files = os.listdir(YOUR_DIRECTORY_NAME)
task_ids = [None] * len(files)

## Iterate through the files to submit a task to partition each one
## and create a list of running tasks (None for skipped or failed entries)
for i, file_name in enumerate(files):
    file_path = os.path.join(YOUR_DIRECTORY_NAME, file_name)
    if os.path.isfile(file_path):
        try:
            task_ids[i] = partition_file_async_submit(file_path)["task_id"]
        except Exception as e:
            print(f"Failed to submit {file_name}: {e}")
Then, in a loop, let's call partition_file_async_result on each task to check its status. If the task_status is "done", the code stores the result in a list; while the task is still "pending", it sleeps for a second and polls again.
results = [None] * len(files)
## Wait for all tasks to finish
for i, task_id in enumerate(task_ids):
    if task_id is None:
        # Not a file, or the submission failed: nothing to wait for
        continue
    while True:
        result = partition_file_async_result(task_id)
        # Stop polling once the task is no longer pending
        if result["task_status"] != "pending":
            print(f"Task {task_id} finished with status {result['task_status']}.")
            break
        print(f"Waiting for {task_id} to complete.")
        # Otherwise wait a second before polling again
        time.sleep(1)
    if result["task_status"] == "done":
        results[i] = result["result"]
## Print the results; a result is None if its task failed or was never submitted
for file_name, result in zip(files, results):
    print(f"{file_name}: {result}")
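The loop above polls every second with no upper bound on how long it waits. A minimal sketch of a more patient poller, assuming you want exponential backoff and a timeout, is shown here. wait_for_task and its parameters are hypothetical helpers, not part of aryn-sdk; the status check mirrors the loop above (anything other than "pending" ends polling), and the result function and sleep are passed in so the sketch can be tried without network access.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_task(
    task_id: str,
    get_result: Callable[[str], Dict[str, Any]],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: Optional[float] = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_result(task_id) until the task is no longer "pending".

    Hypothetical helper (not part of aryn-sdk). The delay between polls
    doubles after each pending response, capped at max_delay. Raises
    TimeoutError if the next sleep would push the accumulated sleep time
    past `timeout` seconds (None means wait forever).
    """
    delay = initial_delay
    waited = 0.0
    while True:
        result = get_result(task_id)
        # Same rule as the loop above: any non-pending status ends polling
        if result["task_status"] != "pending":
            return result
        if timeout is not None and waited + delay > timeout:
            raise TimeoutError(f"Task {task_id} still pending after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Exponential backoff, capped so polls never get too far apart
        delay = min(delay * 2, max_delay)
```

With aryn-sdk installed, you would call it as wait_for_task(task_id, partition_file_async_result). Backoff keeps the number of status requests low for long-running documents; the timeout counts accumulated sleep rather than wall-clock time, which keeps the sketch simple.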
The code above prints output similar to the following (excerpt):
invoice_1:
{
    "type": "Section-header",
    "bbox": [
        0.5366226375804228,
        0.10792368802157315,
        0.8547122730928309,
        0.18332747025923296
    ],
    "properties": {
        "score": 0.4614534378051758,
        "page_number": 1
    },
    "text_representation": "Invoice"
    ...
}
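Each element in the output carries a type, a bounding box, a properties dictionary, and its text. As a sketch of post-processing, the hypothetical helper text_by_page groups element text by page and can filter by element type and detection score. It assumes elements shaped like the excerpt above; where the element list sits inside the result (for example under an "elements" key) is an assumption to check against the DocParse documentation.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Set


def text_by_page(
    elements: Iterable[Dict[str, Any]],
    types: Optional[Set[str]] = None,
    min_score: float = 0.0,
) -> Dict[int, List[str]]:
    """Group element text by page number (hypothetical helper).

    Each element is assumed to look like the excerpt above: a "type", a
    "properties" dict with "page_number" and "score", and an optional
    "text_representation". Elements whose type is not in `types` (when
    given), whose score is below `min_score`, or that have no text are skipped.
    """
    pages: Dict[int, List[str]] = defaultdict(list)
    for el in elements:
        if types is not None and el.get("type") not in types:
            continue
        props = el.get("properties", {})
        if props.get("score", 0.0) < min_score:
            continue
        text = el.get("text_representation")
        if text:
            # Page 0 is a placeholder for elements without a page number
            pages[props.get("page_number", 0)].append(text.strip())
    return dict(pages)
```

For example, text_by_page(results[0]["elements"], types={"Section-header"}) would collect the section headers of the first file, assuming the result does store its elements under "elements".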
Using a webhook
Optionally, you can set a webhook that Aryn's services call when your task completes:
partition_file_async_submit('/Documents/invoices/invoice_1.pdf',
                            webhook_url="")  # set webhook_url to your endpoint's URL
Aryn will send a POST request to the webhook URL with a body like the following:
{"done": [{"task_id": "aryn:t-47gpd3604e5tz79z1jro5fc"}]}
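To act on that notification, your endpoint only needs to accept the POST and read the task IDs from the "done" list. The following is a minimal sketch using Python's standard-library http.server, assuming the body format shown above. done_task_ids and make_handler are hypothetical names, and the sketch performs no authentication or request verification, so treat it as a starting point for local experiments rather than a production endpoint.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List


def done_task_ids(body: bytes) -> List[str]:
    """Extract task IDs from a body shaped like {"done": [{"task_id": "..."}]}.

    An empty body or a missing "done" key yields an empty list.
    """
    payload = json.loads(body or b"{}")
    return [item["task_id"] for item in payload.get("done", []) if "task_id" in item]


def make_handler(on_done: Callable[[str], None]):
    """Build a handler class that calls on_done(task_id) for each completed task."""

    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                task_ids = done_task_ids(body)
            except (ValueError, TypeError, AttributeError):
                # Malformed JSON or an unexpected shape: reject the request
                self.send_response(400)
                self.end_headers()
                return
            for task_id in task_ids:
                on_done(task_id)
            self.send_response(200)
            self.end_headers()

    return WebhookHandler
```

To run it locally, start HTTPServer(("", 8000), make_handler(print)).serve_forever(). Inside on_done you could call partition_file_async_result(task_id) to fetch the finished result instead of polling. Aryn must be able to reach the URL you pass as webhook_url, so a localhost address will not work on its own.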
If you want to list all the asynchronous partition_file tasks that are running or queued in your account, you can call the following function:
partition_file_async_list()
{'aryn:t-wewbyn5zyh9uxzgghgi5ehf': {'task_status': 'pending'},
'aryn:t-3kuln0wm0zqex2ks7ue0kvi': {'task_status': 'pending'},
'aryn:t-o38deeglw3hkl6p939gdyyk': {'task_status': 'pending'},
'aryn:t-zxzjrwmaifj8ql5ar1zttye': {'task_status': 'pending'},
'aryn:t-fn6j1sbuhsohrx51r4eom9n': {'task_status': 'pending'},
'aryn:t-luldbdt5d2kn8cact61mao8': {'task_status': 'pending'}}
If you want to cancel a particular asynchronous partition_file task, call the following function with its task ID:
partition_file_async_cancel("aryn:t-47gpd3604e5tz79z1jro5fc")
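Combining the list and cancel calls gives a simple way to clear out queued work. The sketch below cancels every task that the list call reports as "pending". cancel_pending is a hypothetical helper, and it assumes the list result has the {task_id: {"task_status": ...}} shape shown above. The list and cancel functions are passed in, so with aryn-sdk you would call cancel_pending(partition_file_async_list, partition_file_async_cancel).

```python
from typing import Callable, Dict, List


def cancel_pending(
    list_tasks: Callable[[], Dict[str, Dict[str, str]]],
    cancel_task: Callable[[str], object],
) -> List[str]:
    """Cancel every task that list_tasks() reports as "pending".

    Hypothetical helper: list_tasks is expected to return a mapping like
    {task_id: {"task_status": "pending"}, ...}. Returns the IDs for which
    cancel_task was called, in the order they were listed.
    """
    cancelled = []
    for task_id, info in list_tasks().items():
        if info.get("task_status") == "pending":
            cancel_task(task_id)
            cancelled.append(task_id)
    return cancelled
```

A task can finish between the list and cancel calls, so in practice you may want to catch errors from cancel_task rather than let one failure stop the loop.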
Getting Started
To get started, visit aryn.ai/get-started to get your API key! You can view the SDK documentation for the asynchronous APIs here and for the HTTP APIs here.
We’d love to hear how you're using DocParse and any feedback you have on the asynchronous APIs. Drop us a note at info@aryn.ai or on Slack.