Indeed, if the frame lasts T
seconds, the frequencies of the DFT are k/T
Hz, where k is an integer. As a consequence, oversampling does not improve the accuracy of the estimated frequency, as long as these frequencies are identifed as maxima of the magnitude of the DFT. On the contrary, considering longer frames lasting 100s would induce a spacing between the DFT frequencies of 0.01Hz, which might be good enough to produce the expected frequency. It is possible to due much better, by estimating the frequency of a peak as its mean frequency wih respect to power density.
Figure 1: even after applying a Tuckey window, the DFT of the windowed signal is not a sum of Dirac: there is still some spectral leakage at the bottom of the peaks. This power must be accounted for as the frequencies are estimated.
Another issue is that the length of the frame is not a multiple of the period of the signal, which may not be periodic anyway. Nevertheless, the DFT is computed as if the signal were periodic but discontinuous at the edge of the frame. It induce spurous frequencies described as spectral leakage. Windowing is the reference method to deal with such problems and mitigate the problem related to the artificial discontinuity. Indeed, the value of a window continuously decrease to zero near the edges of the frame. There is a list of window functions and a lot of window functions are available in scipy.signal. A window is applied as:
tuckey_window=signal.tukey(len(data),0.5,True)
data=data*tuckey_window
At that point, the frequencies exibiting the largest magnitude still are 262, 330 and 392. Applying a window only makes the peaks more visible: the DFT of the windowed signal features three distinguished peaks, each featuring a central lobe and side lobes, depending on the DFT of the window. The lobes of these windows are symmetric: the central frequency can therefore be computed as the mean frequency of the peak, with respect to power density.
import numpy as np
from scipy import signal
import scipy
sr = 44100 # sample rate
x = np.linspace(0, 1, sr) # one second of signal
tpi = 2 * np.pi
data = np.sin(261.63 * tpi * x) + np.sin(329.63 * tpi * x) + np.sin(392.00 * tpi * x)
#a window...
tuckey_window=signal.tukey(len(data),0.5,True)
data=data*tuckey_window
data -= np.mean(data)
fft = np.fft.rfft(data, norm="ortho")
def abs2(x):
return x.real**2 + x.imag**2
fftmag=abs2(fft)[:1000]
peaks, _= signal.find_peaks(fftmag, height=np.max(fftmag)*0.1)
print "potential frequencies ", peaks
#compute the mean frequency of the peak with respect to power density
powerpeak=np.zeros(len(peaks))
powerpeaktimefrequency=np.zeros(len(peaks))
for i in range(1000):
dist=1000
jnear=0
for j in range(len(peaks)):
if dist>np.abs(i-peaks[j]):
dist=np.abs(i-peaks[j])
jnear=j
powerpeak[jnear]+=fftmag[i]
powerpeaktimefrequency[jnear]+=fftmag[i]*i
powerpeaktimefrequency=np.divide(powerpeaktimefrequency,powerpeak)
print 'corrected frequencies', powerpeaktimefrequency
The resulting estimated frequencies are 261.6359 Hz, 329.637Hz and 392.0088 Hz: it much better than 262, 330 and 392Hz and it satisfies the required 0.01Hz accuracy for such a pure noiseless input signal.