I have this question regarding scipy's splrep function, which I think is a bug, so I'll post every piece of code so you can reproduce it on your computers. Suppose I want to find the b-spline representation of some data, say, the one obtained by the following code, which creates a dataset as a mixture of ten gaussians with some addeded noise:
import numpy as np
# First we define the number of datapoints:
ndata = 100
x = np.arange(0,1,1./np.double(ndata))
means = np.random.uniform(0,1,10)
y = 0.0
for i in range(len(means)):
y = y+np.exp(-(x-means[i])**2./0.01)
# We add some noise to obtain the data:
data = y + np.random.normal(0,0.05,len(y))
Which should see like this:
Now, let's use splrep and splev functions to obtain the b-spline representation of this curve:
from scipy.interpolate import splrep,splev
# First define the number of knots. Let's put, say, 10 knots:
nknots = 10
# Now we crate the array of knots:
knots = np.arange(x[1],x[len(x)-1],(x[len(x)-1]-x[1])/np.double(nknots))
tck = splrep(x,data,t=knots)
fit = splev(x,tck)
If you plot everything up to here, everything seems ok:
However, there are problems with certain combinations of the number of datapoints and number of knots. For example, if you try the above code with ndata = 1931
and nknots = 796
, I get the following error:
File "/usr/lib/python2.7/dist-packages/scipy/interpolate/fitpack.py", line 465, in
splrep raise _iermess[ier][1](_iermess[ier][0])
ValueError: Error on input data
This is giving me problems because the above code is not automatizable. I'm playing with datasets which have ~19000 datapoints, where a while
cycle with try
and except
can be very computationally demanding. So my questions are:
- Can you reproduce this problem? And if you can...
- Do you know what's going on?
See Question&Answers more detail:
os