Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Split dataset containing multiple labels

I have a dataset with multiple labels, i.e., for each sample in X I have two labels (y1 and y2), and I need to split the data into training and test sets.

I tried with the sklearn function train_test_split():

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1,10,10)
y2 = np.random.randint(1,3,10)

X_train, X_test, [Y1_train, Y2_train], [Y1_test, Y2_test] = train_test_split(X, [y1, y2], test_size=0.4, random_state=42)

But I get an error message:

ValueError: Found input variables with inconsistent numbers of samples: [10, 2]
Question from: https://stackoverflow.com/questions/66056596/split-dataset-containing-multiple-labels


1 Reply


This code should work for you. It combines y1 and y2 into a single list holding one [y1, y2] pair per sample before splitting:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1,10,10)
y2 = np.random.randint(1,3,10)
# Pair the labels sample by sample so that y has one [y1, y2] row per entry of X
y = [[a, b] for a, b in zip(y1, y2)]

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.4, random_state=42)

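It produces output like the following (the exact values differ from run to run, because X, y1 and y2 are generated randomly without a seed):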

print(X_train)
[ 0.42534237  1.35471168  0.00640736  1.34057234  0.50608562 -1.73341641]

and

print(Y_train)
[[3, 1], [7, 1], [6, 2], [4, 2], [6, 2], [2, 2]]

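In your code, the label list [y1, y2] has the shape (2, 10), but the input array has the shape (10,). train_test_split counts samples along the first axis, so it sees 10 samples in X and only 2 in the labels, which is what the error message reports.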

print([y1,y2])
[array([2, 3, 7, 6, 4, 9, 2, 3, 6, 6]), array([2, 2, 1, 2, 2, 2, 2, 1, 1, 2])]

print(np.array([y1,y2]).shape)
(2, 10)

print(X.shape)
(10,)

The shape you need for the labels is (10, 2), with one row per sample:

print(y)
[[2, 2], [3, 2], [7, 1], [6, 2], [4, 2], [9, 2], [2, 2], [3, 1], [6, 1], [6, 2]]

print(np.array(y).shape)
(10, 2)
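
The same (10, 2) layout can also be built directly with NumPy instead of a list comprehension. The snippet below is only a sketch: the seeded generator `rng` and the names `y_stacked` and `y_transposed` are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np

# Seeded generator, used here only so the example is reproducible (an assumption)
rng = np.random.default_rng(0)
y1 = rng.integers(1, 10, size=10)  # first label, values 1..9
y2 = rng.integers(1, 3, size=10)   # second label, values 1..2

# Put y1 and y2 side by side as columns: one row per sample
y_stacked = np.column_stack((y1, y2))

# Equivalent: stack them as rows, giving (2, 10), then transpose to (10, 2)
y_transposed = np.array([y1, y2]).T

print(y_stacked.shape)
print(np.array_equal(y_stacked, y_transposed))
```

Either array can be passed to train_test_split as y, since its first axis has the same length as X.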

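X and y must have the same number of samples, i.e., the same length along the first axis. The remaining dimensions may differ: here X has the shape (10,) and y has the shape (10, 2).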
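
If you would rather keep the two label arrays separate, which is what the unpacking in the question seems to be after, train_test_split also accepts more than two arrays and returns a train/test pair for each one, in the order they were passed. A minimal sketch, assuming a seeded generator `rng` for reproducibility (not part of the original code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Seeded generator, used here only so the example is reproducible (an assumption)
rng = np.random.default_rng(0)
X = rng.standard_normal(10)
y1 = rng.integers(1, 10, size=10)
y2 = rng.integers(1, 3, size=10)

# Every array is split with the same shuffled indices, so the rows stay aligned
X_train, X_test, y1_train, y1_test, y2_train, y2_test = train_test_split(
    X, y1, y2, test_size=0.4, random_state=42
)

print(X_train.shape, y1_train.shape, y2_train.shape)
print(X_test.shape, y1_test.shape, y2_test.shape)
```

This avoids merging the labels and splitting them apart again afterwards; all arrays passed in only need to have the same length.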

