python - Why use pandas.assign rather than simply initialize new column?

posted Oct 24, 2021 in Technique [Technology] by 深蓝 (71.8m points)

I just discovered the assign method for pandas dataframes, and it looks nice and very similar to dplyr's mutate in R. However, I've always managed by simply initializing a new column 'on the fly'. Is there a reason why assign is better?

For instance (based on the example in the pandas documentation), to create a new column in a dataframe, I could just do this:

import numpy as np
from pandas import DataFrame

df = DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
df['ln_A'] = np.log(df['A'])

but the pandas.DataFrame.assign documentation recommends doing this:

df.assign(ln_A = lambda x: np.log(x.A))
# or 
newcol = np.log(df['A'])
df.assign(ln_A=newcol)

Both methods produce a dataframe with the same contents. In fact, the first method (my 'on the fly' method) is significantly faster (0.20225788200332318 seconds for 1000 iterations) than the .assign method (0.3526602769998135 seconds for 1000 iterations).

So is there a reason I should stop using my old method in favour of df.assign?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:40:17+0000

The difference is whether you want to modify an existing frame or create a new frame while keeping the original as it was.

In particular, DataFrame.assign returns a new object that holds a copy of the original data with the requested changes applied; the original frame remains unchanged.

In your particular case:

>>> df = DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})

Now suppose you wish to create a new frame in which A is 1 everywhere, without destroying df. Then you could use .assign:

>>> new_df = df.assign(A=1)

If you do not need to keep the original values, then df["A"] = 1 is clearly more appropriate. This also explains the speed difference: by necessity, .assign must copy the data, while df[...] = ... does not.
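
As a minimal sketch of the distinction described above (the variable names new_df, with_log and had_ln_before are only illustrative, not from the original answer), the following shows that .assign leaves df untouched while bracket assignment changes df itself:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 11), "B": np.random.randn(10)})

# .assign returns a new frame; df keeps its original values in A.
new_df = df.assign(A=1)
print(new_df["A"].tolist())
print(df["A"].tolist())

# The callable form receives the frame and returns the new column.
with_log = df.assign(ln_A=lambda x: np.log(x["A"]))

# df still has no ln_A column at this point.
had_ln_before = "ln_A" in df.columns
print(had_ln_before)

# Bracket assignment modifies df in place instead of returning a new frame.
df["ln_A"] = np.log(df["A"])
print("ln_A" in df.columns)
```

Which form to prefer then depends mainly on whether the original frame is still needed afterwards.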
