How to Create a Dataset for an Audio Classification Experiment

This tutorial shows how to create a dataset for an audio classification experiment in Cogniflow.

In this example, a two-class experiment is created to train models that recognize whether an audio recording is of a dog or a cat, so dog_bark and cat_meow are the two classes.

1. Create a Folder for Each Audio Class

On your computer (or wherever you keep your audio files), create a folder named after each class or type you want to identify. In this case, two folders are created: dog_bark and cat_meow.

After all folders are created, move each audio file to its corresponding folder: if the audio is a cat, it goes to cat_meow; if it is a dog, it goes to dog_bark.

We recommend using audio files of short duration, between 1 and 15 seconds.
You can use mp3, wav, flac, ogg, wma, or any other audio format supported by ffmpeg.
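The folder-per-class layout above can be automated. Below is a minimal sketch that moves loose audio files into their class folders based on a keyword in the filename; the function name `organize_audios` and the keyword mapping are illustrative, not part of Cogniflow:

```python
import shutil
from pathlib import Path

def organize_audios(source_dir, class_keywords):
    """Move each audio file in source_dir into the class folder whose
    keyword appears in its filename (e.g. 'cat' -> cat_meow)."""
    source = Path(source_dir)
    # Create one folder per class, named exactly like the class label.
    for class_name in class_keywords:
        (source / class_name).mkdir(exist_ok=True)
    moved = 0
    # Materialize the listing first so moving files does not disturb iteration.
    for audio in list(source.iterdir()):
        if audio.is_file():
            for class_name, keyword in class_keywords.items():
                if keyword in audio.name.lower():
                    shutil.move(str(audio), str(source / class_name / audio.name))
                    moved += 1
                    break
    return moved
```

For the dog/cat example you would call it as `organize_audios("my_audios", {"cat_meow": "cat", "dog_bark": "dog"})`, assuming your filenames contain "cat" or "dog".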

2. Create the ZIP File

Select all folders and compress them into a single ZIP file. Now the dataset is ready to be uploaded to Cogniflow.
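If you prefer scripting this step, Python's standard library can build the ZIP directly from the dataset directory. A minimal sketch (the name `create_dataset_zip` is illustrative):

```python
import shutil

def create_dataset_zip(dataset_dir, output_name="audio_dataset"):
    """Compress the dataset directory (containing the class folders,
    e.g. dog_bark/ and cat_meow/) into a single ZIP file."""
    # make_archive appends the .zip extension and returns the archive path.
    return shutil.make_archive(output_name, "zip", root_dir=dataset_dir)
```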

3. Upload your Dataset

When creating your experiment, you will reach the step where a dataset has to be uploaded. Click on "Browse your files" and upload the ZIP file generated before. Cogniflow will automatically split your dataset into train and validation subsets, with 80% of the data for training and 20% for validation. The split is done randomly.

If you prefer, you can also upload a ZIP file containing an already generated validation subset by clicking "Advanced options" and then "Browse your files". In that case, Cogniflow will not split the dataset itself. The validation subset is structured the same way as described in step 1.
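If you build your own validation subset, a random 80/20 split like the one Cogniflow applies can be sketched as follows (the function name and seed are illustrative, not Cogniflow's actual implementation):

```python
import random

def split_dataset(files, train_fraction=0.8, seed=42):
    """Randomly split a list of file paths into train and validation subsets."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = files[:]        # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```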

When all file uploads are complete, click on "Next step".

4. Check if Everything is OK

After uploading your dataset, the training process starts. You can double-check that the dataset was correctly created, uploaded, and split by clicking the "Dataset" tab, where you can see how much data there is for each category and subset and how it is distributed. You can also download the data.
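You can also verify the per-class distribution locally before uploading. A minimal sketch that counts the files inside each class folder of the ZIP (the function name `class_distribution` is illustrative):

```python
import zipfile
from collections import Counter

def class_distribution(zip_path):
    """Count how many files each top-level class folder contains in the ZIP."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Count only real files that sit inside a class folder.
            if len(parts) >= 2 and parts[-1]:
                counts[parts[0]] += 1
    return dict(counts)
```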

5. How Many Examples do I Need?

We recommend between 300 and 500 audios per category at a minimum. The more examples you provide, the more accurate the model will be.
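Given a per-class count (for example, the one shown in the "Dataset" tab), this guideline can be checked with a one-liner; the helper name `check_minimum_examples` is hypothetical:

```python
def check_minimum_examples(counts, minimum=300):
    """Return the classes whose example count falls below the recommended minimum."""
    return [c for c, n in counts.items() if n < minimum]
```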

Example Datasets

Environmental Sounds Classification: Train a model capable of recognizing environmental sounds. The dataset has sixteen classes, ranging from birds singing to a helicopter flying, including sirens, clocks ticking, and people sneezing. Each of the sixteen classes has 32 examples, for a total of 512 training sounds.