I just discovered the assign method for pandas dataframes, and it looks nice and very similar to dplyr's mutate in R. However, I've always managed by simply initializing a new column 'on the fly'. Is there a reason why assign is better?
For instance (based on the example in the pandas documentation), to create a new column in a dataframe, I could just do this:
import numpy as np
from pandas import DataFrame
df = DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
df['ln_A'] = np.log(df['A'])
but the pandas.DataFrame.assign documentation recommends doing this:
df.assign(ln_A = lambda x: np.log(x.A))
# or
newcol = np.log(df['A'])
df.assign(ln_A=newcol)
Both approaches produce a dataframe with the same contents. In fact, the first method (my 'on the fly' method) is about 1.7 times faster (0.20225788200332318 seconds for 1000 iterations) than the .assign method (0.3526602769998135 seconds for 1000 iterations).
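For reference, here is a rough sketch of how a comparison like this could be run with timeit. The helper names add_in_place and add_with_assign are made up for illustration, and the df.copy() in the first helper is my own assumption, added so that df is not modified between runs. It is not necessarily how the numbers above were measured, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})


def add_in_place():
    # Hypothetical helper: direct column assignment on a copy,
    # so the original df stays unchanged between iterations.
    tmp = df.copy()
    tmp['ln_A'] = np.log(tmp['A'])
    return tmp


def add_with_assign():
    # Hypothetical helper: assign returns a new dataframe
    # and leaves df itself untouched.
    return df.assign(ln_A=lambda x: np.log(x.A))


# Sanity check: both approaches give the same contents.
assert add_in_place().equals(add_with_assign())

t_direct = timeit.timeit(add_in_place, number=1000)
t_assign = timeit.timeit(add_with_assign, number=1000)
print(f"direct: {t_direct:.4f}s, assign: {t_assign:.4f}s for 1000 iterations")
```

Note that the explicit copy adds overhead to the 'on the fly' version, so the gap measured this way may differ from the figures quoted above.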
So is there a reason I should stop using my old method in favour of df.assign?