First, you need to annotate the events in your sound streams, i.e. specify time boundaries and a label for each event.
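A minimal sketch of what such annotations might look like (the file name, labels and times are made up for illustration):

```python
# Hypothetical annotation format: one (onset_sec, offset_sec, label) tuple per event
# in a given recording. Whatever tool you use, you want to end up with something like this.
annotations = {
    "recording.wav": [
        (0.50, 1.20, "dog_bark"),
        (2.10, 2.75, "car_horn"),
    ]
}
```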
Then, convert your sounds into sequences of feature vectors using signal framing. Typical choices are MFCCs or log-mel filterbank features (the latter corresponds to a spectrogram of the sound). This gives you, for each recording, a sequence of fixed-size feature vectors that can be fed into a classifier. See this for a better explanation.
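For example, with librosa the feature extraction could look roughly like this (a sketch; the file name and exact frame/mel settings are assumptions you should tune for your data):

```python
import librosa

# Load audio and extract frame-level features.
y, sr = librosa.load("recording.wav", sr=16000)

# Log-mel filterbank features: one 40-dimensional vector per 10 ms frame
# (25 ms analysis window, 10 ms hop at 16 kHz).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
log_mel = librosa.power_to_db(mel)                  # shape: (40, n_frames)

# Alternatively, MFCCs computed from the same log-mel spectrogram.
mfcc = librosa.feature.mfcc(S=log_mel, n_mfcc=13)   # shape: (13, n_frames)
```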
Since typical sounds last longer than a single analysis frame, you will probably need to stack several contiguous feature vectors with a sliding window and use these stacked frames as input to your NN.
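A simple way to do the stacking with NumPy (the context size and hop are assumptions; pick them so one window roughly covers one event):

```python
import numpy as np

def stack_frames(features, context=15, hop=5):
    """Stack `context` consecutive feature vectors into one window,
    sliding by `hop` frames.

    features: array of shape (n_frames, feat_dim)
    returns:  array of shape (n_windows, context * feat_dim)
    """
    windows = []
    for start in range(0, features.shape[0] - context + 1, hop):
        windows.append(features[start:start + context].reshape(-1))
    return np.array(windows)

# Example: 40-dim log-mel frames (note: transpose librosa's (40, n_frames) output first).
frames = np.random.randn(1000, 40)   # placeholder feature matrix
stacked = stack_frames(frames)       # shape: (n_windows, 15 * 40)
```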
Now you have a) input data and b) annotations for each analysis window. So you can try to train a DNN, a CNN, or an RNN to predict a sound class for each window. This task is known as spotting. I suggest you read Sainath, T. N., & Parada, C. (2015). Convolutional Neural Networks for Small-footprint Keyword Spotting. In Proceedings of INTERSPEECH (pp. 1478–1482), and follow its references for more details.
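To make the last step concrete, here is a minimal per-window CNN classifier in PyTorch, in the spirit of (but not identical to) the architectures in that paper; the context size, mel dimension, number of classes and layer sizes are all assumptions:

```python
import torch
import torch.nn as nn

class WindowClassifier(nn.Module):
    """Classify one window of stacked log-mel frames, treated as a 1-channel image."""
    def __init__(self, n_classes, context=15, n_mels=40):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Two 2x poolings shrink each spatial dimension by a factor of 4.
        self.fc = nn.Linear(64 * (context // 4) * (n_mels // 4), n_classes)

    def forward(self, x):                # x: (batch, 1, context, n_mels)
        h = self.conv(x)
        return self.fc(h.flatten(1))     # per-window class logits

model = WindowClassifier(n_classes=10)
logits = model(torch.randn(8, 1, 15, 40))   # a batch of 8 windows
```

Train it with a standard cross-entropy loss against the per-window labels derived from your annotations; at test time, slide it over the stream and post-process (smooth/threshold) the per-window predictions to obtain detected events.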