Create a custom extractor
This tutorial explains how to create a Smart Extractor in Cogniflow to extract any specific text from images, text, or PDF documents.
From the dashboard view, click “Create new experiment”.
In this tutorial, we will use an Image smart extractor to extract from images or PDF documents, so click on "Image based". You can use Text if you are integrating Cogniflow and receiving the information from an external service such as email, Twitter, or WhatsApp.

Choose a title and click “Next step”.
In this step, you specify the entities that you want to use as extraction criteria. For this example, let’s say we want to extract information from a receipt that you photographed with your cell phone. Click on "Add manually".

Then, in the modal that pops up, you have to specify each field.

- Name: This is the entity or field name you want to extract, and it will be used in the output as the key identifier.
- Description: This is optional, but extra instructions can help identify an entity that is not a common type such as a date, number, or currency.
- Output format: You can use this to convert an extracted value to a specific format. For example, for dates, you can use “MM/DD/YYYY” or “MM-DD-YYYY,” etc.
In this example, we added receipt_date and total_amount entities, as you can see in the image below:
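To make the three form fields concrete, here is a minimal Python sketch that models them locally. It is not a Cogniflow API: the `Entity` class, the `format_date` helper, and the description strings are hypothetical and only mirror the Name, Description, and Output format fields described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entity:
    """Local stand-in for one row of the "Add manually" form (hypothetical)."""
    name: str                # used as the key identifier in the output
    description: str = ""    # optional hint for entities of an uncommon type
    output_format: str = ""  # e.g. "MM/DD/YYYY"; empty means no conversion


# The two entities from this example; the descriptions are made up.
entities = [
    Entity("receipt_date", "Date printed on the receipt", "MM/DD/YYYY"),
    Entity("total_amount", "Final amount paid, including tax"),
]


def format_date(value: date, pattern: str) -> str:
    """Render a date using the MM, DD, and YYYY tokens of an output format."""
    return (
        pattern.replace("YYYY", f"{value.year:04d}")
        .replace("MM", f"{value.month:02d}")
        .replace("DD", f"{value.day:02d}")
    )


print(format_date(date(2023, 9, 18), entities[0].output_format))  # 09/18/2023
```

Swapping the pattern for “MM-DD-YYYY” changes only the separators, which is the kind of conversion the Output format field is meant for.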

Click “Next step” and then “Create and run experiment”.
Use your custom extractor
Once the experiment is created, click the “Use this model” button.
Upload an image of a receipt to test the model. Let's imagine you forgot to add the seller's name, so let's add that entity as well. You can use "Edit entities" or the "Settings" tab.

Add the entity "seller_name" and click Save.
Under Settings, you can change the GPT model used. To use GPT-4, you need a Cogniflow Team plan.

Try again on the Test tab with the same receipt, and voilà, the new information has been extracted:
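Because each entity name is used as the key identifier in the output, the result can be pictured as a mapping from entity name to extracted value. The Python sketch below assumes that shape. The values are placeholders rather than real extraction results, and `missing_entities` is a hypothetical helper for checking that every expected entity came back.

```python
expected_entities = ["receipt_date", "total_amount", "seller_name"]

# Placeholder values for illustration only; real values come from your receipt.
result = {
    "receipt_date": "09/18/2023",
    "total_amount": "42.50",
    "seller_name": "Example Store",
}


def missing_entities(result: dict, expected: list) -> list:
    """Return the expected entity names that are absent or empty in the result."""
    return [name for name in expected if not result.get(name)]


# Before "seller_name" was added, the same check would have flagged it.
before = {k: v for k, v in result.items() if k != "seller_name"}
print(missing_entities(before, expected_entities))  # ['seller_name']
print(missing_entities(result, expected_entities))  # []
```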

Updated on: 18/09/2023