Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

numpy - Question about clustering the Iris data set in Python

I just installed Python and I'm a complete beginner. My first task was to build a chart in JupyterLab using the Iris data set. Below is the code I use to cluster it in a Jupyter notebook:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

In [2]:
iris_frame=pd.read_csv("Iris.csv")
iris_frame.head()

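In [3]:
# Features: the four measurement columns (Id and the Species label are dropped)
x = iris_frame.drop(columns=["Species", "Id"], axis=1)
# Labels: species names encoded as integers 0, 1, 2 (KMeans below never uses them)
y = iris_frame.Species
from sklearn.preprocessing import LabelEncoder
encode = LabelEncoder()
y = encode.fit_transform(y)
y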
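`LabelEncoder` maps each distinct species string to an integer, with the classes sorted alphabetically. As a rough numpy-only illustration of the same mapping, `np.unique(..., return_inverse=True)` gives the sorted classes and each row's code. The labels below are a few hypothetical samples, not rows read from Iris.csv:

```python
import numpy as np

# A few hypothetical species labels standing in for iris_frame.Species
species = np.array(["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"])

# np.unique returns the sorted distinct labels; return_inverse=True also gives,
# for every original label, its position in that sorted array, which is the
# same integer code LabelEncoder.fit_transform assigns
classes, codes = np.unique(species, return_inverse=True)

print(classes)  # ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']
print(codes)    # [2 0 1 0]
```

In the notebook, the encoded `y` is only displayed; the clustering call in the next cell is `model.fit_predict(x)`, which never receives `y`.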


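In [4]:
from sklearn.cluster import KMeans

# Fit k-means with 3 clusters on all four feature columns of x
model = KMeans(n_clusters=3, random_state=1)
y_pred = model.fit_predict(x)

# Convert the DataFrame to a NumPy array so it can be sliced by position below
x = x.values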
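To see that the fitting step works on every column of `x`, here is a minimal sketch of Lloyd's algorithm, the basic k-means procedure. It is not scikit-learn's implementation: `KMeans` uses k-means++ initialisation by default and other optimisations this sketch leaves out, and the function name `simple_kmeans` and the toy data are hypothetical. The key line is the distance computation, which sums over all feature columns:

```python
import numpy as np

def simple_kmeans(x, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows of x chosen at random
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid,
        # summed over ALL feature columns (axis=2)
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 4-column data with two obvious groups of rows
x = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [4.9, 3.1, 1.4, 0.2],
    [6.7, 3.0, 5.6, 2.3],
    [6.5, 3.0, 5.5, 2.1],
])
labels, centroids = simple_kmeans(x, k=2)
print(labels)           # rows 0-1 share one label, rows 2-3 the other
print(centroids.shape)  # (2, 4): every centroid has all four coordinates
```

scikit-learn's `model.cluster_centers_` has the same layout, one row per cluster and one column per feature, which is why the plotting cell can take just its `[:, 0]` and `[:, 1]` columns.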

In [5]:
# Visualising the clusters on the first two columns (sepal length, sepal width).
# KMeans numbers its clusters arbitrarily, so cluster 0 is not guaranteed to be
# Iris-setosa; the species names in the legend are guesses unless checked against y.
plt.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1],
            s=100, c='magenta', label='Iris-setosa')
plt.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1],
            s=100, c='blue', label='Iris-versicolour')
plt.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1],
            s=100, c='green', label='Iris-virginica')

# Plotting the centroids of the clusters (their first two coordinates)
plt.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1],
            s=100, c='black', label='Centroids')

plt.xlabel('SepalLengthCm')
plt.ylabel('SepalWidthCm')
plt.legend()
plt.show()
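In the plotting cell, an expression such as `x[y_pred == 0, 0]` combines two indices: the boolean mask `y_pred == 0` selects the rows assigned to cluster 0, and the `0` after the comma selects column 0. A small numpy sketch with made-up values shows what each piece returns:

```python
import numpy as np

# Hypothetical feature matrix (3 rows, 4 columns) and cluster labels
x = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [4.9, 3.0, 1.4, 0.2],
])
y_pred = np.array([0, 2, 0])

mask = y_pred == 0        # True, False, True: rows 0 and 2 are in cluster 0
print(x[mask, 0])         # column 0 of those rows: 5.1 and 4.9
print(x[mask, 1])         # column 1 of those rows: 3.5 and 3.0
print(x[mask, 2:4])       # columns 2 and 3 of the same rows
```

Swapping the column indices (for example `2` and `3`) changes only what is drawn; the cluster labels in `y_pred` stay the same.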

The data set has 6 columns: Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm and Species.
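`drop(columns=["Species", "Id"])` keeps the remaining columns in their original order, so the positional indices used later on `x` map onto the names listed above. A plain-Python sketch of that mapping:

```python
# Column names as listed in the question
all_columns = ["Id", "SepalLengthCm", "SepalWidthCm",
               "PetalLengthCm", "PetalWidthCm", "Species"]

# Mimic drop(columns=["Species", "Id"]): keep the rest in their original order
feature_columns = [c for c in all_columns if c not in ("Species", "Id")]

for index, name in enumerate(feature_columns):
    print(index, name)
# 0 SepalLengthCm
# 1 SepalWidthCm
# 2 PetalLengthCm
# 3 PetalWidthCm
```

So columns 0 and 1, the ones the plotting cell draws, are the sepal measurements; the petal measurements are columns 2 and 3.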

But as far as I can tell from the code below, only two columns are used, as the X and Y axes, right? Namely:

In [3]:
x = iris_frame.drop(columns=["Species", "Id"], axis=1)
y = iris_frame.Species
from sklearn.preprocessing import LabelEncoder
encode = LabelEncoder()
y = encode.fit_transform(y)
y

If so, why don't they use the other columns in the clustering? To cluster accurately, it should use the data in all columns, right? I'd appreciate some explanation.
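A concrete way to see the gap between the clustering and the chart: `KMeans` assigns each point to the nearest centroid by Euclidean distance over every feature column, while a 2-D scatter plot can show only two of them. In the sketch below the centroids and the point are invented numbers and `nearest` is a hypothetical helper; it shows that the nearest centroid can differ depending on whether all four columns or only the two plotted ones are used:

```python
import numpy as np

# Two hypothetical 4-D centroids and one point (all values made up)
centroids = np.array([
    [5.0, 3.0, 1.5, 0.3],   # centroid A
    [5.2, 3.1, 4.5, 1.5],   # centroid B
])
point = np.array([5.1, 3.0, 4.3, 1.4])

def nearest(p, c):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return int(((c - p) ** 2).sum(axis=1).argmin())

print(nearest(point, centroids))              # 1: centroid B, using all 4 columns
print(nearest(point[:2], centroids[:, :2]))   # 0: centroid A, using only columns 0 and 1
```

This is why some points in a two-column scatter plot can appear to sit closer to another cluster's centroid than to their own.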

P.S. I know nothing about Python; this is my first day. :)

This is the link I used to build it:

https://github.com/MeghanaKankanala/TSF/blob/main/Iris_clustering.ipynb

Thank you very much

