python - How to split data into trainset and testset randomly?

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a large dataset and want to split it into a training set (50%) and a testing set (50%).

Say I have 100 examples stored in the input file, one example per line. I need to choose 50 lines as the training set and the other 50 lines as the testing set.

My idea is to first generate a random permutation of the numbers 1 to 100, then use the first 50 elements as the line numbers of the 50 training examples and the remaining 50 as the line numbers of the testing examples.

This can be done easily in Matlab:

fid=fopen(datafile);
C = textscan(fid, '%s', 'delimiter', '\n');
plist=randperm(100);
for i=1:50
    trainstring = C{plist(i)};
    fprintf(train_file,trainstring);
end
for i=51:100
    teststring = C{plist(i)};
    fprintf(test_file,teststring);
end

But how can I do this in Python? I'm new to Python and don't know whether I can read the whole file into an array and pick out certain lines.

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:43:13+0000

This can be done similarly in Python using lists. Note that random.shuffle shuffles the whole list in place.

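import random

# Read in text mode; splitlines() strips the newline from each line
# and does not leave an empty string after the last one.
with open("datafile.txt") as f:
    data = f.read().splitlines()

# Shuffle the list in place, then cut it in half.
random.shuffle(data)

train_data = data[:50]
test_data = data[50:]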
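
If the file does not hold exactly 100 lines, or the split needs to be repeatable, the same idea can be wrapped in a small helper. This is only a sketch: the names split_lines and write_lines, the train_fraction and seed parameters, and the output file names in the note below are made up for illustration and are not part of the original question.

```python
import random


def split_lines(lines, train_fraction=0.5, seed=None):
    """Return (train, test) lists drawn from lines with no overlap."""
    rng = random.Random(seed)      # private generator; a fixed seed repeats the split
    shuffled = list(lines)         # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def write_lines(path, lines):
    """Write each item of lines to path, one per line."""
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)
```

With data read as above, train_data, test_data = split_lines(data, seed=0) gives the two halves, and write_lines("train.txt", train_data) and write_lines("test.txt", test_data) save them, which mirrors the two fprintf loops in the Matlab version.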
