February 26, 2025

Summarize images in documents with DocParse's new GenAI feature

By Karan Sampath, Engineer, and Jon Fritz, CPO

When processing documents containing images, it's often useful to add metadata about those images for downstream processes to use. For instance, if the end target for your data is a search or RAG application powered by vector search, your vector embedding model might only take text as an input. If so, each image in your document needs a text description to supply to the embedding model, so that the image can be retrieved in the vector search. But how do you create and add a summary to each image? We've got an answer.

We're excited to add image summarization as a new option in DocParse for Pay As You Go (PAYG) customers, making it easy to enrich your images with text descriptions! When enabled, DocParse uses GenAI to create a summary of each image in your document, and stores that summary in the text_representation field of the element for that image. Summaries can describe images, charts, graphs, and more. You can then use these summaries with downstream applications and vector embedding models.

Let's go through examples of enabling image summarization using the DocParse Playground and the Aryn SDK!

For the DocParse UI, go to the Aryn Console. Select DocParse in the left nav, and then add a document. Enable "Summarize Images" along with the other DocParse settings you need for your document. Note that you do not need to extract images in order to use image summarization: image extraction stores the binary of the image in the DocParse output, while image summarization creates the LLM-generated summary and stores it in the DocParse output.

Next, click "Parse Document" to process the document. For the document in this example, DocParse has segmented and labeled three of the document chunks (the bar graph, line graph, and map) as images.

We can download the JSON output using the button on the right panel and view the summaries DocParse created for our images. Open the JSON output in a text editor, and find an element whose type is Image. DocParse includes the summary of the image in the text_representation field for that element.

This part of the DocParse output shows the summary of the bar graph image in the top left of the document.
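
If you would rather not search the JSON by hand, the lookup described above can be scripted. The following is a minimal sketch, not an official Aryn utility: the helper names are hypothetical, and it assumes the downloaded JSON holds its elements in a top-level "elements" list, with each element carrying the "type" and "text_representation" fields mentioned above. The sample data is hand-written for illustration and is not real DocParse output.

```python
import json


def image_summaries(docparse_output: dict) -> list[str]:
    """Return the text_representation of every element whose type is Image."""
    # Assumption: elements are stored under a top-level "elements" key.
    elements = docparse_output.get("elements", [])
    return [
        el.get("text_representation") or ""
        for el in elements
        if el.get("type") == "Image"
    ]


def load_image_summaries(path: str) -> list[str]:
    """Read a downloaded DocParse JSON file and return its image summaries."""
    with open(path, encoding="utf-8") as f:
        return image_summaries(json.load(f))


# Hand-written stand-in for a downloaded file (illustrative only).
sample_output = {
    "elements": [
        {"type": "Text", "text_representation": "Quarterly results overview."},
        {"type": "Image", "text_representation": "A bar graph comparing values by region."},
    ]
}

for summary in image_summaries(sample_output):
    print(summary)
```

To run it against your own download, call load_image_summaries with the path where you saved the JSON file.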

You can also enable image summarization in the Aryn SDK when processing documents:

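from aryn_sdk.partition import partition_file

file = "path/to/document.pdf"
aryn_api_key = "YOUR_ARYN_API_KEY"  # placeholder: replace with your own key

# summarize_images=True adds an LLM-generated summary to each image element
partitioned_file = partition_file(file, aryn_api_key,
	extract_table_structure=True, use_ocr=True,
	summarize_images=True)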
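
Once the summaries are in place, image elements can be handed to a text-only embedding model in the same way as any other element. The sketch below shows one way to gather those inputs. It assumes the result of partition_file is a dict with an "elements" list (an assumption, not confirmed here), and embedding_inputs is a hypothetical helper rather than part of the Aryn SDK. The embedding call itself is left out because it depends on the model you use.

```python
def embedding_inputs(elements: list[dict]) -> list[dict]:
    """Collect one text record per element that has a non-empty text_representation."""
    records = []
    for index, el in enumerate(elements):
        text = (el.get("text_representation") or "").strip()
        if not text:
            # Nothing to embed, e.g. an image parsed without summarization.
            continue
        records.append({"element_index": index, "type": el.get("type"), "text": text})
    return records


# Hand-written sample elements (illustrative only, not real DocParse output).
sample_elements = [
    {"type": "Text", "text_representation": "Sales grew in every region."},
    {"type": "Image", "text_representation": "A line graph showing growth over time."},
    {"type": "Image"},  # no summary available, so it is skipped
]

for record in embedding_inputs(sample_elements):
    print(record["type"], "->", record["text"])
```

With a real result, you would pass partitioned_file["elements"] instead of the sample list, assuming that key is present in your output.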

We're excited to see how customers integrate image summarization into their document processing pipelines. If you have any questions or feedback, please contact us on Slack!