First, observe that your code plots up to 100 spectrograms (if processBlock is called multiple times) on top of each other, so you only see the last one. You may want to fix that. Furthermore, I assume you know why you want to work with 30ms audio recordings. Personally, I can't think of a practical application where 30ms recorded by a laptop microphone could give interesting insights. It hinges on what you are recording and how you trigger the recording, but this issue is tangential to the actual question.
Otherwise the code works perfectly. With just a few small changes in the processBlock function and some background knowledge, you can get informative and aesthetically pleasing spectrograms.
So let's talk about actual spectrograms. I'll take the SoX output as reference. The colorbar annotation says that it is dBFS [1], which is a logarithmic measure (dB is short for decibel). So, let's first convert the spectrogram to dB:
f, t, Sxx = signal.spectrogram(snd_block, RATE)
dBS = 10 * np.log10(Sxx) # convert to dB
plt.pcolormesh(t, f, dBS)
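One caveat with the conversion above: if a block contains digital silence, Sxx holds exact zeros and np.log10 returns -inf (with a runtime warning), which can upset the color scale. A minimal sketch of a guarded conversion follows. The helper name power_to_db and the floor of 1e-12 are my own assumptions for illustration, not part of SciPy or of your code.

```python
import numpy as np

def power_to_db(Sxx, floor=1e-12):
    """Convert a power spectrogram to dB, clamping values below `floor`.

    `floor` is an arbitrary illustrative choice; pick one that suits your data.
    """
    Sxx = np.asarray(Sxx, dtype=float)
    # Clamp before taking the logarithm so that zeros cannot produce -inf.
    return 10.0 * np.log10(np.maximum(Sxx, floor))
```

You would then write dBS = power_to_db(Sxx) instead of calling np.log10 directly.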
Converting to dB improved the color scale. Now we see noise in the higher frequency bands that was hidden before. Next, let's tackle time resolution. The spectrogram divides the signal into segments (the default length is 256 samples) and computes the spectrum for each. This means we have excellent frequency resolution but very poor time resolution, because only a few such segments fit into the signal window (which is about 1300 samples long). There is always a trade-off between time and frequency resolution. This is related to the uncertainty principle. So let's trade some frequency resolution for time resolution by splitting the signal into shorter segments:
f, t, Sxx = signal.spectrogram(snd_block, RATE, nperseg=64)
Great! Now we have a relatively balanced resolution on both axes - but wait! Why is the result so pixelated?! Actually, this is all the information there is in the short 30ms time window. There are only so many ways 1300 samples can be distributed in two dimensions. However, we can cheat a bit and use a higher FFT resolution (zero padding via nfft) and overlapping segments. This makes the result smoother, although it does not provide additional information:
f, t, Sxx = signal.spectrogram(snd_block, RATE, nperseg=64, nfft=256, noverlap=60)
Behold pretty spectral interference patterns. (These patterns depend on the window function used, but let's not get caught up in details here. See the window argument of the spectrogram function to play with these.) The result looks nice, but actually does not contain any more information than the previous image.
To make the result more SoX-like, observe that the SoX spectrogram is rather smeared on the time axis. You get this effect by using the original low time resolution (long segments) but letting them overlap for smoothness:
f, t, Sxx = signal.spectrogram(snd_block, RATE, noverlap=250)
I personally prefer the 3rd solution, but you will need to find your own preferred time/frequency trade-off.
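To compare the four settings numerically, here is a small sketch that counts how many time columns and what frequency bin spacing each one produces. It assumes a sample rate of 44100 Hz (which would match the roughly 1300 samples in 30ms, but is not stated in the question) and relies on SciPy's documented defaults for spectrogram (noverlap = nperseg // 8, nfft = nperseg, no padding). The helper name segment_count is made up for this illustration.

```python
def segment_count(n_samples, nperseg, noverlap):
    """Number of full segments when each segment starts nperseg - noverlap samples after the last."""
    step = nperseg - noverlap
    return (n_samples - noverlap) // step

RATE = 44100      # assumed sample rate in Hz (not stated in the question)
N_SAMPLES = 1323  # 30 ms at the assumed rate

# (label, nperseg, noverlap, nfft); the first two rows use the default noverlap = nperseg // 8
settings = [
    ("default", 256, 32, 256),
    ("short segments", 64, 8, 64),
    ("short + overlap + zero padding", 64, 60, 256),
    ("long + overlap", 256, 250, 256),
]

for label, nperseg, noverlap, nfft in settings:
    columns = segment_count(N_SAMPLES, nperseg, noverlap)
    bin_spacing = RATE / nfft          # spacing of the plotted frequency bins in Hz
    window_ms = 1000 * nperseg / RATE  # duration covered by one segment
    print(f"{label}: {columns} time columns, "
          f"{bin_spacing:.0f} Hz bin spacing, {window_ms:.1f} ms per segment")
```

Under these assumptions the four variants yield 5, 23, 315 and 178 time columns. Note that overlap and zero padding only add columns and bins. The true resolution is still set by the segment length, which is why the smoother images contain no extra information.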
Finally, let's use a colormap that is more like SoX's:
plt.pcolormesh(t, f, dBS, cmap='inferno')
A short comment on the following line:
THRESHOLD = 40 # dB
The threshold is compared against the RMS of the input signal, which is not measured in dB but in raw amplitude units, so the "# dB" comment is misleading.
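If you do want a threshold in dB, one option is to convert the RMS before comparing. The following is only a sketch: it assumes signed 16-bit samples (full scale 32768), and the names rms_to_dbfs, is_loud_enough and THRESHOLD_DBFS, as well as the value -40, are hypothetical and not taken from your code.

```python
import math

FULL_SCALE = 32768.0  # assumption: signed 16-bit samples

def rms_to_dbfs(rms, full_scale=FULL_SCALE):
    """Convert a raw RMS amplitude to dB relative to full scale."""
    if rms <= 0:
        return float("-inf")  # digital silence has no finite dB value
    # 20 * log10 because RMS is an amplitude; power quantities use 10 * log10.
    return 20.0 * math.log10(rms / full_scale)

THRESHOLD_DBFS = -40.0  # hypothetical threshold, now genuinely in dB

def is_loud_enough(rms):
    """Return True if the block's RMS level exceeds the dBFS threshold."""
    return rms_to_dbfs(rms) > THRESHOLD_DBFS
```

Alternatively, keep comparing raw amplitudes and simply change the comment so it no longer claims the unit is dB.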
[1] Apparently FS is short for full scale. dBFS means that the dB measure is relative to the maximum range. 0 dB is the loudest signal possible in the current representation, so actual values must be <= 0 dB.