I have a really huge CSV files. There are about 1700 columns and 40000 rows like below:
x,y,z,x1,x2,x3,x4,x5,x6,x7,x8,x9,...(about 1700 more)...,x1700
0,0,0,a1,a2,a3,a4,a5,a6,a7,a8,a9,...(about 1700 more)...,a1700
1,1,1,b1,b2,b3,b4,b5,b6,b7,b8,b9,...(about 1700 more)...,b1700
// (about 40000 more rows below)
I need to split this CSV file into multiple files which contain a less number of columns like:
# file1.csv
x,y,z
0,0,0
1,1,1
... (about 40000 more rows below)
# file2.csv
x1,x2,x3,x4,x5,x6,x7,x8,x9,...(about 1000 more)...,x1000
a1,a2,a3,a4,a5,a6,a7,a8,a9,...(about 1000 more)...,a1000
b1,b2,b3,b4,b5,b6,b7,b8,b9,...(about 1000 more)...,b1700
// (about 40000 more rows below)
#file3.csv
x1001,x1002,x1003,x1004,x1005,...(about 700 more)...,x1700
a1001,a1002,a1003,a1004,a1005,...(about 700 more)...,a1700
b1001,b1002,b1003,b1004,b1005,...(about 700 more)...,b1700
// (about 40000 more rows below)
Is there any program or library doing this?
I've googled for it , but programs that I found only split a file by rows not by columns.
Or which language could I use to do this efficiently?
I can use R, shell script, Python, C/C++, Java
See Question&Answers more detail:
os