How to transcribe audio with speaker recognition using the Cogniflow API and Webhooks

Cogniflow audio transcription model API and Webhooks

This feature is available from the Starter Plan. If you need more information about our plans, please check the Cogniflow pricing page.

Transcribing long audio takes time, so we are introducing a new way of sending transcription requests using Webhooks. We process your audio in the background and, when processing finishes, notify the Webhook you specify, so you do not need to wait for the result.

This article explains how to do this using Python from a developer's perspective. If you are non-technical, a similar approach works with this model from Zapier, Make, Bubble, or any other no-code tool that supports Webhooks.

What is speaker diarization?

Speaker diarization is the process of identifying and separating the different speakers in an audio recording. It is used in speech processing and transcription to determine who is speaking at any given time in an audio file. Diarization can be done automatically using algorithms that analyze features such as pitch, intensity, and timing to differentiate between speakers.

How it works

As with any of our AI model endpoints, you need to send a POST request.

Below are the parameters of the request body, followed by an example of what the JSON request looks like:

Request body parameters:

format: The format of the audio file, for example mp3, wav, flac, ogg, wma, or any other audio format supported by ffmpeg.
url: The public URL of the audio file. Public Google Drive links and YouTube URLs are also supported.
ground_truth: The reference transcription of the audio, if known. It is used to calculate the Word Error Rate (WER) returned in the response.
long_transcription: A boolean value indicating whether long audio transcription should be enabled. Set it to true to use our new feature, which allows the service to process longer audio files. If it is set to false, the 30-second limit still applies.
number_of_speakers: The number of speakers in the audio. If you know it in advance, set this value to help the service identify speakers more accurately. If it is omitted or set to -1, the service detects the number of speakers automatically.
webhook_url: The webhook URL to which a notification is sent when the transcription is ready. It is only required when long_transcription is set to true.
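The ground_truth value is used to compute the wer field of the response. WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference into the transcript, and N is the number of words in the reference. The sketch below computes that standard definition with a word-level edit distance. word_error_rate is a hypothetical helper, and the lowercase/whitespace normalization is an assumption: the article does not say how Cogniflow normalizes text, so its wer values may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace (punctuation is kept).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (dynamic programming, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, a transcript that misses one word of a two-word reference has a WER of 0.5.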

Here is an example request that transcribes an mp3 audio file using the new endpoint:

```python
import requests

url = "https://predict.cogniflow.ai/audio/speech2text/transcribe-from-web/511b0ead-c671-4c39-b98d-48408737545d"  # URL of the API endpoint
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "x-api-key": "<YOUR-API-KEY-HERE>",  # replace with your Cogniflow API key
}
payload = {
    "format": "mp3",
    "url": "https://drive.google.com/file/d/9uC80ICnjYAnbAPIVdOfN-GATRzBQQ0ah/view?usp=sharing",
    "ground_truth": "if we know what has been said in the audio this is the text, it helps to evaluate model performance",
    "long_transcription": True,   # process the audio in the background
    "number_of_speakers": -1,     # -1 = detect the number of speakers automatically
    "webhook_url": "https://www.your_webhook.com/",  # URL of your webhook
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # raise an exception on 4xx/5xx HTTP errors
data = response.json()
print(data["transcription_id"])  # keep this ID to track the transcription
```

Once the transcription job is complete, the service sends the results to your Webhook.

Service Response:

As soon as the request is sent, you immediately receive a JSON response containing a transcription_id, which you can use later to track the transcription.

{
  "processing_time": 2.35,
  "result": "Hello world",
  "wer": 0.5,
  "transcription_id": "07c85f34-54c4-4fdd-8b9c-b993d1e78574",
  "result_description": "Your audio file is being processed in 07c85f34-54c4-4fdd-8b9c-b993d1e78574 transcription..."
}

Where:

processing_time: The processing time in seconds.
result: The transcription result. It is empty for long transcriptions, whose results are delivered to the webhook instead.
wer: The Word Error Rate (WER) of the transcription, calculated against ground_truth.
transcription_id: The generated transcription ID.
result_description: A description of the transcription status. For long transcriptions, the job runs in the background and the results are sent to the webhook you provided.
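Because the full transcript of a long audio arrives later, the immediate response is mainly useful for its transcription_id. The following minimal sketch extracts the fields listed above from the decoded response (the output of response.json()). Submission and parse_submission are hypothetical names; keeping the ID so it can later be matched with the webhook payload assumes the webhook carries the same transcription_id, which the article implies but does not state explicitly.

```python
from typing import NamedTuple, Optional


class Submission(NamedTuple):
    transcription_id: str
    immediate_text: Optional[str]  # None when the transcript will arrive at the webhook
    wer: Optional[float]           # only meaningful when ground_truth was sent


def parse_submission(response: dict) -> Submission:
    """Extract the useful fields from the immediate JSON response."""
    return Submission(
        transcription_id=response["transcription_id"],
        # An empty "result" means the transcript is delivered to the webhook later.
        immediate_text=response.get("result") or None,
        wer=response.get("wer"),
    )
```

You could store each Submission in a database keyed by transcription_id and update it when the webhook notification arrives.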

Obtaining Results at the Webhook

The webhook is the URL you provide in webhook_url when making the transcription request, and the service uses it to notify you once the results are available. The service sends a POST request to that URL with the transcription results in the request body.
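To receive the notification, you need an HTTP endpoint that accepts POST requests with a JSON body. The sketch below uses only Python's standard library (http.server). WebhookHandler, RESULTS, and start_server are hypothetical names, and keeping results in an in-memory dictionary keyed by transcription_id is an assumption for illustration. The endpoint must be reachable from the internet at the URL you pass as webhook_url; in production you would usually run it behind HTTPS. The article does not describe any request signing, so no authentication is shown.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store: transcription_id -> webhook payload.
# A real deployment would persist results (database, queue, etc.).
RESULTS = {}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly Content-Length bytes of the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:  # invalid JSON or invalid text encoding
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # Store the full result under its transcription_id.
        RESULTS[payload.get("transcription_id")] = payload
        # Acknowledge quickly; do any heavy processing elsewhere.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_server(host="0.0.0.0", port=8000):
    """Serve the webhook in a background thread and return the server."""
    server = HTTPServer((host, port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, call start_server(port=8000) on a publicly reachable host and pass that host's URL as webhook_url; each completed transcription then appears in RESULTS.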

The webhook POST request body has the following JSON structure:

{
  "result": [
    {
      "start": "0:00:00",
      "end": "0:00:03",
      "speaker": "SPEAKER 3",
      "text": " Audio 6.5 "
    },
    {
      "start": "0:00:04",
      "end": "0:00:07",
      "speaker": "SPEAKER 1",
      "text": " Tell us more about controlling the weather. "
    },
    {
      "start": "0:00:07",
      "end": "0:00:24",
      "speaker": "SPEAKER 2",
      "text": " Well, scientists have been working on techniques to create or prevent rain for quite some time now.  The best known method is called cloud seeding.  This involves putting chemicals into the air to encourage any water in the air to form clouds and hopefully rain. "
    },
    {
      "start": "0:00:24",
      "end": "0:00:30",
      "speaker": "SPEAKER 1",
      "text": " So, if we can make it rain when we want it to, why do we still have problems with droughts? "
    },
    {
      "start": "0:00:30",
      "end": "0:00:46",
      "speaker": "SPEAKER 2",
      "text": " Well, unfortunately it isn't quite as simple as that.  If there is a drought, there probably won't be any clouds in the sky at all.  The only thing you could do is to do cloud seeding when there are clouds and then save the water for when there is a drought. "
    },
    {
      "start": "0:00:46",
      "end": "0:00:51",
      "speaker": "SPEAKER 1",
      "text": " That could be helpful, I guess. And can it help with storms and hurricanes as well? "
    },
    {
      "start": "0:00:51",
      "end": "0:01:07",
      "speaker": "SPEAKER 2",
      "text": " Yes, hurricanes form in warm tropical waters. That's why global warming is having an impact.  As the seas get warmer, there are likely to be more hurricanes.  But it seems possible that we could use cloud seeding to cool the seas down. "
    },
    {
      "start": "0:01:07",
      "end": "0:01:20",
      "speaker": "SPEAKER 1",
      "text": " That sounds incredible, but is it actually a good idea to try and change the weather?  I mean, what about putting chemicals into the atmosphere? That can't be a good idea, can it? "
    },
    {
      "start": "0:01:20",
      "end": "0:01:46",
      "speaker": "SPEAKER 2",
      "text": " Well, this is one of the things we need to find out.  There is some concern that creating rain in one area of the world might take it away from somewhere else.  But in terms of the chemicals, it seems that one group of scientists have found a solution.  Professor Jean-Pierre Wolfe and Dr. Jerome Casparian at the University of Geneva have been experimenting with using lasers to control the weather. "
    },
    {
      "start": "0:01:46",
      "end": "0:01:47",
      "speaker": "SPEAKER 1",
      "text": " Lasers? "
    },
    {
      "start": "0:01:47",
      "end": "0:02:01",
      "speaker": "SPEAKER 2",
      "text": " Their experiments have shown that pulses of light from a laser can be used to make rain clouds without using any chemicals.  They also think that lasers can be used to direct storms away from certain buildings, such as airports. "
    },
    {
      "start": "0:02:01",
      "end": "0:02:09",
      "speaker": "SPEAKER 1",
      "text": " Wow, that is quite amazing. I still feel that perhaps we shouldn't be playing with the weather like this. "
    },
    {
      "start": "0:02:09",
      "end": "0:02:17",
      "speaker": "SPEAKER 2",
      "text": " Yes, a lot of people would agree with you, but you've got to remember that we have been changing the weather for a long time anyway through global warming.  This type of technology is nothing compared with that, and it could be helpful rather than harmful. "
    }
  ],
  "detected_language": "en",
  "duration": 146.7,
  "complete_text": " Audio 6.5 Tell us more about controlling the weather. Well, scientists have been working on techniques to create or prevent rain for quite some time now. The best known method is called cloud seeding. This involves putting chemicals into the air to encourage any water in the air to form clouds and hopefully rain. So, if we can make it rain when we want it to, why do we still have problems with droughts? Well, unfortunately it isn't quite as simple as that. If there is a drought, there probably won't be any clouds in the sky at all. The only thing you could do is to do cloud seeding when there are clouds and then save the water for when there is a drought. That could be helpful, I guess. And can it help with storms and hurricanes as well? Yes, hurricanes form in warm tropical waters. That's why global warming is having an impact. As the seas get warmer, there are likely to be more hurricanes. But it seems possible that we could use cloud seeding to cool the seas down. That sounds incredible, but is it actually a good idea to try and change the weather? I mean, what about putting chemicals into the atmosphere? That can't be a good idea, can it? Well, this is one of the things we need to find out. There is some concern that creating rain in one area of the world might take it away from somewhere else. But in terms of the chemicals, it seems that one group of scientists have found a solution. Professor Jean-Pierre Wolfe and Dr. Jerome Casparian at the University of Geneva have been experimenting with using lasers to control the weather. Lasers? Their experiments have shown that pulses of light from a laser can be used to make rain clouds without using any chemicals. They also think that lasers can be used to direct storms away from certain buildings, such as airports. Wow, that is quite amazing. I still feel that perhaps we shouldn't be playing with the weather like this. Yes, a lot of people would agree with you, but you've got to remember that we have been changing the weather for a long time anyway through global warming. This type of technology is nothing compared with that, and it could be helpful rather than harmful.",
  "audio_url": "https://tmpfiles.org/dl/1726889/t_6_5.mp3",
  "transcription_id": "a8bfb006-473b-4eb3-8c9b-3889efcdc58b",
  "message": "Audio converted to wav, 16 KHz sampling rate, one channel",
  "complete_text_with_speakers": "SPEAKER 3\n Audio 6.5 \nSPEAKER 1\n Tell us more about controlling the weather. \nSPEAKER 2\n Well, scientists have been working on techniques to create or prevent rain for quite some time now.  The best known method is called cloud seeding.  This involves putting chemicals into the air to encourage any water in the air to form clouds and hopefully rain. \nSPEAKER 1\n So, if we can make it rain when we want it to, why do we still have problems with droughts? \nSPEAKER 2\n Well, unfortunately it isn't quite as simple as that.  If there is a drought, there probably won't be any clouds in the sky at all.  The only thing you could do is to do cloud seeding when there are clouds and then save the water for when there is a drought. \nSPEAKER 1\n That could be helpful, I guess. And can it help with storms and hurricanes as well? \nSPEAKER 2\n Yes, hurricanes form in warm tropical waters. That's why global warming is having an impact.  As the seas get warmer, there are likely to be more hurricanes.  But it seems possible that we could use cloud seeding to cool the seas down. \nSPEAKER 1\n That sounds incredible, but is it actually a good idea to try and change the weather?  I mean, what about putting chemicals into the atmosphere? That can't be a good idea, can it? \nSPEAKER 2\n Well, this is one of the things we need to find out.  There is some concern that creating rain in one area of the world might take it away from somewhere else.  But in terms of the chemicals, it seems that one group of scientists have found a solution.  Professor Jean-Pierre Wolfe and Dr. Jerome Casparian at the University of Geneva have been experimenting with using lasers to control the weather. \nSPEAKER 1\n Lasers? \nSPEAKER 2\n Their experiments have shown that pulses of light from a laser can be used to make rain clouds without using any chemicals.  They also think that lasers can be used to direct storms away from certain buildings, such as airports. \nSPEAKER 1\n Wow, that is quite amazing. I still feel that perhaps we shouldn't be playing with the weather like this. \nSPEAKER 2\n Yes, a lot of people would agree with you, but you've got to remember that we have been changing the weather for a long time anyway through global warming.  This type of technology is nothing compared with that, and it could be helpful rather than harmful. \n",
  "transcription_start": "0:00:00",
  "transcription_end": "0:02:17"
}

Where:

transcription_id: The unique ID of the transcription job. It can be used to query the status and retrieve the results of the transcription in future requests.
detected_language: The detected language of the audio, in this case en for English.
duration: The total duration of the audio file in seconds.
complete_text: The complete transcription of the audio content.
complete_text_with_speakers: The complete transcription of the audio content, with the identified speaker at the start of each segment.
audio_url: The URL of the uploaded audio file.
message: Additional information about the transcription, such as how the audio was converted before processing (format, sampling rate, and channels).
transcription_start: The start time of the transcription in h:mm:ss format (for example, 0:00:00).
transcription_end: The end time of the transcription in h:mm:ss format (for example, 0:02:17).
result: An array of objects representing the transcribed segments of the audio. Each object includes the start and end times of the segment, the speaker label, and the transcribed text.

In summary, this JSON response from the audio transcription service contains detailed information about a completed transcription, including the transcription ID, language, audio duration, and a list of speech segments with speaker identifiers, start and end times, and transcribed text for each segment.
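The result array is convenient for post-processing. As an illustration, the sketch below converts each segment's start and end timestamps to seconds, sums the talk time of each speaker, and renders a readable transcript with one line per segment. to_seconds, talk_time_per_speaker, and format_transcript are hypothetical helpers, and the code assumes timestamps in the h:mm:ss form shown in the example (fractional seconds are also accepted).

```python
from collections import defaultdict
from typing import Dict, List


def to_seconds(timestamp: str) -> float:
    """Convert an "h:mm:ss" timestamp such as "0:01:07" to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def talk_time_per_speaker(segments: List[dict]) -> Dict[str, float]:
    """Sum the duration (end - start) of every segment for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += to_seconds(segment["end"]) - to_seconds(segment["start"])
    return dict(totals)


def format_transcript(segments: List[dict]) -> str:
    """Render one "[start-end] SPEAKER: text" line per segment."""
    lines = []
    for segment in segments:
        # Segment text has leading/trailing and doubled spaces; collapse them.
        text = " ".join(segment["text"].split())
        lines.append(f'[{segment["start"]}-{segment["end"]}] {segment["speaker"]}: {text}')
    return "\n".join(lines)
```

Applied to the first three segments of the example, talk_time_per_speaker returns {'SPEAKER 3': 3.0, 'SPEAKER 1': 3.0, 'SPEAKER 2': 17.0}.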

Upcoming Endpoints

For those who find it inconvenient to set up a webhook to listen for result notifications, we are working on two new endpoints in our API that will allow you to query the status of a transcription and retrieve the results once they are ready.

The first endpoint will enable you to query the status of a particular transcription by providing the ID generated when creating or queuing it. The response will include information about the current status of the transcription, such as whether it is in progress, completed, or if there has been any error.

The second endpoint will allow you to retrieve the results of a particular transcription by also providing the ID of the previously generated transcription. The response will include the complete transcription of the audio, as well as other details such as processing time, word error rate (WER), and a description of the result.

Updated on: 14/07/2023
