I'm trying to cluster text comments using pandas and sklearn:
import pandas
import pprint
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.feature_extraction.text import TfidfVectorizer
dataset = pandas.read_csv('text.csv', encoding='utf-8')
dataset_list = dataset.values.tolist()
vectors = TfidfVectorizer()
X = vectors.fit_transform(dataset_list)
clusters_number = 20
model = KMeans(n_clusters = clusters_number, init = 'k-means++', max_iter = 300, n_init = 1)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
clusters = {}
for comment, label in zip(dataset_list, labels):
    print ('Comment:', comment)
    print ('Label:', label)
    try:
        clusters[str(label)].append(comment)
    except:
        clusters[str(label)] = [comment]
pprint.pprint(clusters)
But I get the following error, even though I never call lower() myself:
  File "clustering.py", line 19, in <module>
    X = vetorizer.fit_transform(dataset_list)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 1381, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 792, in _count_vocab
    for feature in analyze(doc):
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 266, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/usr/lib/python3/dist-packages/sklearn/feature_extraction/text.py", line 232, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'list' object has no attribute 'lower'
I don't understand: my text (text.csv) is already lowercase, and I never call lower() anywhere in my code.
Data:
hello wish to cancel order thank you confirmation
hello would like to cancel order made today store house world
dimensions bed not compatible would like to know how to pass cancellation refund send today cordially
hello possible cancel order cordially
hello wants to cancel order request refund
hello wish to cancel this order can indicate process cordially
hello seen date delivery would like to cancel order thank you
hello wants to cancel matching order good delivery n ° 111111
hi would like to cancel this order
hello order product store cancel act doublon advance thank you cordially
hello wishes to cancel order thank you kindly refund greetings
hello possible cancel order please thank you in advance forward cordially
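
A likely cause, sketched under the assumption that text.csv has a single text column: dataset.values.tolist() returns one list per row rather than one string per row, and TfidfVectorizer lowercases every document by default (lowercase=True), so it ends up calling lower() on a list. The snippet below uses plain Python only; the `rows` sample and the names `error_message` and `documents` are illustrative and not part of the original script.

```python
# Shape that dataset.values.tolist() produces for a one-column CSV:
# a list of rows, where each row is itself a list (assumed sample data).
rows = [
    ["hello wish to cancel order thank you confirmation"],
    ["hello possible cancel order cordially"],
    ["hi would like to cancel this order"],
]

# The vectorizer's preprocessing calls .lower() on each document.
# Doing the same on a row (a list) reproduces the AttributeError.
try:
    rows[0].lower()
except AttributeError as exc:
    error_message = str(exc)

# One possible fix: keep only the text cell so every document is a str.
documents = [row[0] for row in rows]

print(error_message)
print(documents[0])
```

With the real data, something like dataset.iloc[:, 0].astype(str).tolist() should give the same kind of flat list, which can then be passed to vectors.fit_transform(...). If text.csv has no header row, passing header=None to read_csv may also be needed so that the first comment is not consumed as the column name.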