Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
469 views
in Technique[技术] by (71.8m points)

python - How to get a non-shuffled train_test_split in sklearn

If I want a random train/test split, I use the sklearn helper function:

In [1]: from sklearn.model_selection import train_test_split
   ...: train_test_split([1,2,3,4,5,6])
   ...:
Out[1]: [[1, 6, 4, 2], [5, 3]]

What is the most concise way to get a non-shuffled train/test split, i.e.

[[1,2,3,4], [5,6]]

EDIT Currently I am using

train, test = data[:int(len(data) * 0.75)], data[int(len(data) * 0.75):] 

but hoping for something a little nicer. I have opened an issue on sklearn https://github.com/scikit-learn/scikit-learn/issues/8844

EDIT 2: My PR has been merged, in scikit-learn version 0.19, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'm not adding much to Psidom's answer except an easy to copy paste function:

def non_shuffling_train_test_split(X, y, test_size=0.2):
    i = int((1 - test_size) * X.shape[0]) + 1
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test

Update: At some point this feature became built in, so now you can do:

from sklearn.model_selection import train_test_split
train_test_split(X, y, test_size=0.2, shuffle=False)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...