February 11, 2025

Enhance your Document Processing workflows with our Asynchronous API

By Abhijit Pujare, Product Management, and Mark Lindblad, Engineer

You can now simplify your document processing workloads using the asynchronous version of Aryn DocParse’s API! With the newly released partition_file_async_submit and partition_file_async_result functions in the aryn-sdk, you can submit many tasks at once without holding up your workflow while you wait for them to complete.

Let’s dive into how you can do this through the Aryn SDK.

Using it through the Aryn SDK

Let’s say you want to parse several invoice documents that you have saved under the ‘/Documents/invoices’ directory. The code below is an example of how you can process all the documents in your directory asynchronously.

First, let's call the partition_file_async_submit function on all the files in our directory and get the corresponding partition_file task IDs back.


  # Note: adjust the path below to a directory available in your environment.

  import os
  import time

  from aryn_sdk.partition import (
      partition_file_async_submit,
      partition_file_async_result,
  )

  YOUR_DIRECTORY_NAME = '/Documents/invoices/'
  ## Get a list of all files you are interested in parsing
  files = os.listdir(YOUR_DIRECTORY_NAME)
  task_ids = [None] * len(files)

  ## Iterate through the files to submit a task to partition each one
  ## and create a list of running tasks
  for i, file_name in enumerate(files):
    file_path = os.path.join(YOUR_DIRECTORY_NAME, file_name)
    if os.path.isfile(file_path):
      try:
        task_ids[i] = partition_file_async_submit(file_path)["task_id"]
      except Exception as e:
        print(f"Failed to submit {file_name}: {e}")

Then, in a loop, let's call the partition_file_async_result function on each task to get its result. While a task's status is "pending", the code sleeps for a second and checks again. Once the status changes, the code stops polling that task and, if the status is "done", adds the result to a list.

  results = [None] * len(files)

  ## Wait for all tasks to finish
  for i, task_id in enumerate(task_ids):
    # Skip entries that were never submitted (not a file, or the submit failed)
    if task_id is None:
      continue

    while True:
      result = partition_file_async_result(task_id)
      # If this task is no longer pending, stop polling it
      if result["task_status"] != "pending":
        print(f"Task {task_id} finished with status {result['task_status']}.")
        break

      # Otherwise, wait a second before checking again
      print(f"Waiting for {task_id} to complete.")
      time.sleep(1)

    if result["task_status"] == "done":
      results[i] = result["result"]

  ## Print the results; a result will be None if its task failed
  for file_name, result in zip(files, results):
    print(f"{file_name}: {result}")

The code above will print the results as follows:

  invoice_1: 
  {
    "type": "Section-header",
    "bbox": [
      0.5366226375804228,
      0.10792368802157315,
      0.8547122730928309,
      0.18332747025923296
    ],
    "properties": {
      "score": 0.4614534378051758,
      "page_number": 1
    },
    "text_representation": "Invoice"
    ...
  }
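
The polling loop above waits indefinitely and checks one task at a time. As a minimal sketch of one alternative, the helper below polls all outstanding tasks in rounds, backs off between rounds, and gives up after a timeout. The function name wait_for_tasks and its parameters are hypothetical and not part of the aryn-sdk; the only assumption about the SDK is the response shape used above, a dict with a "task_status" key.

```python
import time
from typing import Callable, Dict, List, Optional


def wait_for_tasks(
    task_ids: List[Optional[str]],
    fetch_result: Callable[[str], dict],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[dict]]:
    """Poll every task until none are pending or the timeout expires.

    Returns a dict mapping each task ID to its last response, or to None
    if the task was still pending when the timeout was reached.
    """
    # Ignore slots that were never submitted.
    pending = [t for t in task_ids if t is not None]
    responses: Dict[str, Optional[dict]] = {t: None for t in pending}
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s

    while pending:
        still_pending = []
        for task_id in pending:
            response = fetch_result(task_id)
            if response["task_status"] == "pending":
                still_pending.append(task_id)
            else:
                responses[task_id] = response
        pending = still_pending
        if not pending or time.monotonic() >= deadline:
            break
        # Back off exponentially, capped at max_delay_s.
        sleep(delay)
        delay = min(delay * 2, max_delay_s)

    return responses
```

With the SDK installed, this could be called as wait_for_tasks(task_ids, partition_file_async_result). The timeout and delay defaults are arbitrary placeholders; tune them to your document sizes and volume.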

Using a webhook

Optionally, you can also set a webhook for Aryn's services to call when your task is completed:

  partition_file_async_submit(
      '/Documents/invoices/invoice_1.pdf',
      webhook_url="",
  )

Aryn will send a POST request with a body like the one below to the webhook URL:

  {"done": [{"task_id": "aryn:t-47gpd3604e5tz79z1jro5fc"}]}
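
To illustrate the receiving side, here is a minimal sketch of a webhook endpoint built on Python's standard http.server module. It assumes only the body shape shown above. The names extract_done_task_ids, WebhookHandler, and serve, the port, and the 204 and 400 status codes are illustrative choices, not something the Aryn documentation specifies.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


def extract_done_task_ids(body: bytes) -> List[str]:
    """Return the task IDs listed under "done" in a webhook body."""
    payload = json.loads(body)
    return [entry["task_id"] for entry in payload.get("done", [])]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            task_ids = extract_done_task_ids(self.rfile.read(length))
        except (ValueError, KeyError, TypeError, AttributeError):
            # Malformed or unexpected body: reject it.
            self.send_response(400)
            self.end_headers()
            return
        for task_id in task_ids:
            # A real handler might fetch the result for this task here.
            print(f"Task {task_id} is done; its result can now be fetched.")
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the webhook receiver until interrupted (hypothetical entry point)."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling serve() starts the receiver. The URL you pass as webhook_url would then need to route to this server from the public internet.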

If you want to list all the asynchronous partition_file tasks that are running or queued in your account, you can call the following function:

  partition_file_async_list()
  {'aryn:t-wewbyn5zyh9uxzgghgi5ehf': {'task_status': 'pending'},
   'aryn:t-3kuln0wm0zqex2ks7ue0kvi': {'task_status': 'pending'},
   'aryn:t-o38deeglw3hkl6p939gdyyk': {'task_status': 'pending'},
   'aryn:t-zxzjrwmaifj8ql5ar1zttye': {'task_status': 'pending'},
   'aryn:t-fn6j1sbuhsohrx51r4eom9n': {'task_status': 'pending'},
   'aryn:t-luldbdt5d2kn8cact61mao8': {'task_status': 'pending'}}

If you want to cancel a particular asynchronous partition_file task, you can call the following function:

  partition_file_async_cancel("aryn:t-47gpd3604e5tz79z1jro5fc")
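
The list and cancel calls can be combined to clear out everything still pending. The sketch below is one way to do that, assuming the listing has the shape shown above (task ID mapped to a dict with a 'task_status' key). The helper cancel_pending_tasks is hypothetical; it takes the cancel function as an argument and treats the absence of an exception as success, which is an assumption about how cancellation failures are reported.

```python
from typing import Callable, Dict, List


def cancel_pending_tasks(
    listing: Dict[str, dict],
    cancel: Callable[[str], object],
) -> List[str]:
    """Cancel every task whose status is "pending"; return the IDs cancelled."""
    cancelled = []
    for task_id, info in listing.items():
        if info.get("task_status") != "pending":
            continue
        try:
            cancel(task_id)
        except Exception as e:
            # Keep going so one failure does not block the other cancellations.
            print(f"Could not cancel {task_id}: {e}")
        else:
            cancelled.append(task_id)
    return cancelled
```

With the SDK installed, a call might look like cancel_pending_tasks(partition_file_async_list(), partition_file_async_cancel).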

Getting Started

To get started, visit aryn.ai/get-started to get your API key! You can view the SDK documentation for the asynchronous APIs here and for the HTTP APIs here.

We’d love to hear how you're using DocParse and any feedback you have on the asynchronous APIs. Drop us a note at info@aryn.ai or on Slack.