I'm trying to do the first scikit-learn exercise, but even when I run their solution code (shown below) I get the error in the traceback that follows. Does anyone know why this is happening, and how I can resolve it?
The predict method also fails on this dataset, yet it seems to work fine for the iris dataset using the code at the very bottom of the question. Sorry if I am missing something very obvious; I am not an actual programmer.
Traceback (most recent call last):
  File "C:\Users\user2491873\Desktop\scikit_exercise.py", line 30, in <module>
    print(knn.fit(X_train, y_train).score(X_test, y_test))
  File "C:\Python33\lib\site-packages\sklearn\base.py", line 279, in score
    return accuracy_score(y, self.predict(X))
  File "C:\Python33\lib\site-packages\sklearn\neighbors\classification.py", line 131, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "C:\Python33\lib\site-packages\sklearn\neighbors\base.py", line 254, in kneighbors
    warn_equidistant()
  File "C:\Python33\lib\site-packages\sklearn\neighbors\base.py", line 33, in warn_equidistant
    warnings.warn(msg, NeighborsWarning, stacklevel=3)
  File "C:\Python33\lib\idlelib\PyShell.py", line 59, in idle_showwarning
    file.write(warnings.formatwarning(message, category, filename,
AttributeError: 'NoneType' object has no attribute 'write'
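The last frame of the traceback is in IDLE's warning hook (idle_showwarning), not in scikit-learn itself, so the crash may come from printing the warning rather than from the classifier. One way to check this would be to suppress warnings around the call. Below is a minimal sketch using only the standard warnings module; noisy_call is a hypothetical stand-in for knn.score(X_test, y_test), and its return value is a dummy, not a real score.

```python
import warnings


def noisy_call():
    # Hypothetical stand-in for knn.score(X_test, y_test): it emits a
    # warning and then returns a dummy value (not a real accuracy).
    warnings.warn("equidistant neighbors found", UserWarning, stacklevel=2)
    return 1.0


def call_quietly(func):
    # Run func with every warning ignored, so the installed warning
    # hook (IDLE's, in the traceback above) is never invoked.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func()


result = call_quietly(noisy_call)
print(result)
```

If the real call succeeds when wrapped this way, that would suggest the problem lies in how the warning is displayed and not in the data or the model.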
Here is the code:
"""
================================
Digits Classification Exercise
================================
This exercise is used in the :ref:`clf_tut` part of the
:ref:`supervised_learning_tut` section of the
:ref:`stat_learn_tut_index`.
"""
from sklearn import datasets, neighbors, linear_model
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target
n_samples = len(X_digits)
# Slice indices must be integers, so cast the 90% cut-off with int()
X_train = X_digits[:int(.9 * n_samples)]
y_train = y_digits[:int(.9 * n_samples)]
X_test = X_digits[int(.9 * n_samples):]
y_test = y_digits[int(.9 * n_samples):]
knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()
print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
      % logistic.fit(X_train, y_train).score(X_test, y_test))
This is the code for the iris dataset, which seems to work fine:
>>> import numpy as np
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> iris_X = iris.data
>>> iris_y = iris.target
>>> np.unique(iris_y)
array([0, 1, 2])
>>> # Split iris data in train and test data
>>> # A random permutation, to split the data randomly
>>> np.random.seed(0)
>>> indices = np.random.permutation(len(iris_X))
>>> iris_X_train = iris_X[indices[:-10]]
>>> iris_y_train = iris_y[indices[:-10]]
>>> iris_X_test = iris_X[indices[-10:]]
>>> iris_y_test = iris_y[indices[-10:]]
>>> # Create and fit a nearest-neighbor classifier
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier()
>>> knn.fit(iris_X_train, iris_y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, n_neighbors=5, p=2,
warn_on_equidistant=True, weights='uniform')
>>> knn.predict(iris_X_test)
array([1, 2, 1, 0, 0, 0, 2, 1, 2, 0])
>>> iris_y_test
array([1, 1, 1, 0, 0, 0, 2, 1, 2, 0])
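For reference, the score method that fails on the digits data reports mean accuracy, which can also be worked out by hand from the two iris arrays above. A minimal sketch in plain Python, with the values copied from that output (the variable names predicted and actual are just for illustration):

```python
# Values copied from the knn.predict(iris_X_test) and iris_y_test output above.
predicted = [1, 2, 1, 0, 0, 0, 2, 1, 2, 0]
actual = [1, 1, 1, 0, 0, 0, 2, 1, 2, 0]

# Count the positions where the prediction matches the true label.
matches = sum(p == a for p, a in zip(predicted, actual))

# Mean accuracy is the fraction of correct predictions.
accuracy = matches / len(actual)
print(accuracy)  # 0.9 (9 of the 10 test samples are predicted correctly)
```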