Create a custom extractor

This tutorial explains how to create a Smart Extractor in Cogniflow to extract any specific text from images, text, or PDF documents.

From the dashboard view, click “Create new experiment”.
In this tutorial, we will use an Image smart extractor to extract information from images or PDF documents, so click "Image based". Choose "Text" instead if you are integrating Cogniflow with an external service such as email, Twitter, or WhatsApp and the content arrives as plain text.

Image Smart Extractor
Choose a title and click “Next step”.
In this step, you specify the entities you want to use as extraction criteria. For this example, let’s say we want to extract information from a receipt that you photographed with your cell phone. Click "Add manually".

Add Entity
Then, in the modal that pops up, you have to specify each field.

Define entity

- Name: This is the entity or field name you want to extract, and it will be used in the output as the key identifier.
- Description: This is optional, but extra instructions here can help the extractor identify an entity that is not a common type such as a date, number, or currency.
- Output format: You can use this to convert an extracted value to a specific format. For example, for dates, you can use “MM/DD/YYYY” or “MM-DD-YYYY,” etc.
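
To make the output format patterns concrete, here is a minimal Python sketch of what a pattern such as “MM/DD/YYYY” means for a date value. It does not show how Cogniflow applies the pattern internally. The token mapping and the `render_date` helper are hypothetical and exist only for this illustration.

```python
from datetime import date

# Hypothetical mapping from the pattern tokens used in this tutorial
# to Python strftime codes. This is an assumption for illustration only.
TOKEN_TO_STRFTIME = {"YYYY": "%Y", "MM": "%m", "DD": "%d"}


def render_date(value: date, pattern: str) -> str:
    """Render a date with a pattern such as "MM/DD/YYYY"."""
    for token, code in TOKEN_TO_STRFTIME.items():
        pattern = pattern.replace(token, code)
    return value.strftime(pattern)


receipt_date = date(2023, 9, 18)
print(render_date(receipt_date, "MM/DD/YYYY"))  # 09/18/2023
print(render_date(receipt_date, "MM-DD-YYYY"))  # 09-18-2023
```

Only the separator changes between the two patterns; the month, day, and year values stay the same.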

In this example, we added receipt_date and total_amount entities, as you can see in the image below:

Initial entities definition
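
If you prefer to plan your entities before typing them into the modal, the three fields can be sketched as a small data structure. This is only a planning aid under stated assumptions: the `Entity` class, the `check_unique_names` helper, and the description texts are hypothetical and are not part of Cogniflow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    """One row of the "Define entity" modal (hypothetical representation)."""
    name: str                            # used as the key in the output
    description: Optional[str] = None    # optional extra instructions
    output_format: Optional[str] = None  # optional, e.g. "MM/DD/YYYY"


# The two entities from this tutorial. The descriptions are illustrative.
entities = [
    Entity("receipt_date",
           description="Date printed on the receipt",
           output_format="MM/DD/YYYY"),
    Entity("total_amount",
           description="Final amount paid, including taxes"),
]


def check_unique_names(items: List[Entity]) -> List[str]:
    """Return the entity names, raising if any name is repeated."""
    names = [e.name for e in items]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate entity names: {duplicates}")
    return names


print(check_unique_names(entities))  # ['receipt_date', 'total_amount']
```

Because each name becomes a key in the output, checking for duplicates up front avoids two entities competing for the same key.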

Click the “Next step” button, then click “Create and run experiment”.

Use your custom extractor

Once the experiment is created, click the “Use this model” button.
Upload an image of a receipt to test the model. Let's imagine you forgot to add the seller's name, so let's add that entity as well. You can do this from "Edit entities" or from the "Settings" tab.

Edit entities from test page

Add the entity "seller_name" and click save.

Under Settings, you can change the GPT model used. To use GPT-4, you need a Cogniflow Team plan.

Settings Page

Try again on the Test tab with the same receipt, and voilà, the new information has been extracted:

Final Extraction
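
Since each entity name is used as the key in the output, downstream code can read the extracted values by those names. The sketch below assumes the result is available as a JSON object keyed by entity name. The exact response shape is an assumption, and the sample values are placeholders, not real Cogniflow output. The `read_fields` helper is hypothetical.

```python
import json

# Entity names defined in this tutorial.
EXPECTED_KEYS = ("receipt_date", "total_amount", "seller_name")

# Placeholder payload for illustration only. Both the shape and the
# values are assumptions, not output captured from Cogniflow.
sample = (
    '{"receipt_date": "09/18/2023", '
    '"total_amount": "42.50", '
    '"seller_name": "Example Store"}'
)


def read_fields(raw: str) -> dict:
    """Parse a JSON object and keep only the expected entity keys.

    Keys that are absent from the payload are returned as None.
    """
    data = json.loads(raw)
    return {key: data.get(key) for key in EXPECTED_KEYS}


fields = read_fields(sample)
missing = [key for key, value in fields.items() if value is None]

print(fields["seller_name"])
print(missing)
```

Listing the missing keys makes it easy to spot an entity that was not extracted, such as `seller_name` before it was added.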

Updated on: 18/09/2023
