python - ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

Question

Welcome To Ask or Share your Answers For Others

python - ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

I am trying to calculate silhouette score as I find the optimal number of clusters to create, but get an error that says:

ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)

I am unable to understand the reason for this. Here is the code, that I am using to cluster and calculate silhouette score.

I read the csv that contains the text to be clustered and run K-Means on the n cluster values. What could be the reason I am getting this error?

  #Create cluster using K-Means
#Only creates graph
import matplotlib
#matplotlib.use('Agg')
import re
import os
import nltk, math, codecs
import csv
from nltk.corpus import stopwords
from gensim.models import Doc2Vec
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import silhouette_score

model_name = checkpoint_save_path
loaded_model = Doc2Vec.load(model_name)

#Load the test csv file
data = pd.read_csv(test_filename)
overview = data['overview'].astype('str').tolist()
overview = filter(bool, overview)
vectors = []

def split_words(text):
  return ''.join([x if x.isalnum() or x.isspace() else " " for x in text ]).split()

def preprocess_document(text):
  sp_words = split_words(text)
  return sp_words

for i, t in enumerate(overview):
  vectors.append(loaded_model.infer_vector(preprocess_document(t)))

sse = {}
silhouette = {}


for k in range(1,15):
  km = KMeans(n_clusters=k, max_iter=1000, verbose = 0).fit(vectors)
  sse[k] = km.inertia_
  #FOLLOWING LINE CAUSES ERROR
  silhouette[k] = silhouette_score(vectors, km.labels_, metric='euclidean')

best_cluster_size = 1
min_error = float("inf")

for cluster_size in sse:
    if sse[cluster_size] < min_error:
        min_error = sse[cluster_size]
        best_cluster_size = cluster_size

print(sse)
print("====")
print(silhouette)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

1.4m articles

1.4m replys

5 comments

57.0k users

Most popular tags

javascript python c# java How android c++ php ios html sql r c node.js .net iphone asp.net css reactjs jquery ruby What Android objective mysql linux Is git Python windows Why regex angular swift amazon excel algorithm macos Java visual how bash Can multithreading PHP Using scala angularjs typescript apache spring performance postgresql database flutter json rust arrays C# dart vba django wpf xml vue.js In go Get google jQuery xcode jsf http Google mongodb string shell oop powershell SQL C++ security assembly docker Javascript Android: Does haskell Convert azure debugging delphi vb.net Spring datetime pandas oracle math Django

联盟问答网站-Union QA website

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

广告位招租

深蓝 · Answer 1 · 2021-10-23T17:49:57+0000

The error is produced because you have a loop for different number of clusters n. During the first iteration, n_clusters is 1 and this leads to all(km.labels_ == 0)to be True.

In other words, you have only one cluster with label 0 (thus, np.unique(km.labels_) prints array([0], dtype=int32)).

`silhouette_score` requires more than 1 cluster labels. This causes the error. The error message is clear.

Example:

from sklearn import datasets
from sklearn.cluster import KMeans
import numpy as np

iris = datasets.load_iris()
X = iris.data
y = iris.target

km = KMeans(n_clusters=3)
km.fit(X,y)

# check how many unique labels do you have
np.unique(km.labels_)
#array([0, 1, 2], dtype=int32)

We have 3 different clusters/cluster labels.

silhouette_score(X, km.labels_, metric='euclidean')
0.38788915189699597

The function works fine.

Now, let's cause the error:

km2 = KMeans(n_clusters=1)
km2.fit(X,y)

silhouette_score(X, km2.labels_, metric='euclidean')

ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)

Categories

python - ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

python - ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

`silhouette_score` requires more than 1 cluster labels. This causes the error. The error message is clear.

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags

Categories

python - ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

python - ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

silhouette_score requires more than 1 cluster labels. This causes the error. The error message is clear.

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags

`silhouette_score` requires more than 1 cluster labels. This causes the error. The error message is clear.