I have some interesting user data. It gives some information on the timeliness of certain tasks the users were asked to perform. I am trying to find out, if late
- which tells me if users are on time (0
), a little late (1
), or quite late (2
) - is predictable/explainable. I generate late
from a column giving traffic light information (green = not late, red = super late).
Here is what I do:
#imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn import svm
import sklearn.metrics as sm
#load user data
df = pd.read_csv('April.csv', error_bad_lines=False, encoding='iso8859_15', delimiter=';')
#convert objects to datetime data types
cols = ['Planned Start', 'Actual Start', 'Planned End', 'Actual End']
df = df[cols].apply(
pd.to_datetime, dayfirst=True, errors='ignore'
).join(df.drop(cols, 1))
#convert datetime to numeric data types
cols = ['Planned Start', 'Actual Start', 'Planned End', 'Actual End']
df = df[cols].apply(
pd.to_numeric, errors='ignore'
).join(df.drop(cols, 1))
#add likert scale for green, yellow and red traffic lights
df['late'] = 0
df.ix[df['End Time Traffic Light'].isin(['Yellow']), 'late'] = 1
df.ix[df['End Time Traffic Light'].isin(['Red']), 'late'] = 2
#Supervised Learning
#X and y arrays
# X = np.array(df.drop(['late'], axis=1))
X = df[['Planned Start', 'Actual Start', 'Planned End', 'Actual End', 'Measure Package', 'Measure' , 'Responsible User']].as_matrix()
y = np.array(df['late'])
#preprocessing the data
X = preprocessing.scale(X)
#Supper Vector Machine
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, y)
print(clf.score(X, y))
I am now trying to understand how to plot the decision boundaries.My goal is to plot a 2-way scatter with Actual End
and Planned End
. Naturally, I checked the documentation (see e.g. here). But I can't wrap my head around it. How does this work?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…