Here's a dataframe:
A B C
0 6 2 -5
1 2 5 2
2 10 3 1
3 -5 2 8
4 3 6 2
I could retrieve a column which is basically a tuple of columns from the original df
using df.apply
:
out = df.apply(tuple, 1)
print(out)
0 (6, 2, -5)
1 (2, 5, 2)
2 (10, 3, 1)
3 (-5, 2, 8)
4 (3, 6, 2)
dtype: object
But if I want a list of values instead of a tuple of them, I can't do it, because it doesn't give me what I expect:
out = df.apply(list, 1)
print(out)
A B C
0 6 2 -5
1 2 5 2
2 10 3 1
3 -5 2 8
4 3 6 2
Instead, I need to do:
out = pd.Series(df.values.tolist())
print(out)
0 [6, 2, -5]
1 [2, 5, 2]
2 [10, 3, 1]
3 [-5, 2, 8]
4 [3, 6, 2]
dtype: object
Why can't I use df.apply(list, 1)
to get what I want?
Appendix
Timings of some possible workarounds:
df_test = pd.concat([df] * 10000, 0)
%timeit pd.Series(df.values.tolist()) # original workaround
10000 loops, best of 3: 161 μs per loop
%timeit df.apply(tuple, 1).apply(list, 1) # proposed by Alexander
1000 loops, best of 3: 615 μs per loop
See Question&Answers more detail:
os