python - What is the difference between combine_first and fillna?


posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

These two functions seem equivalent to me. You can see that they accomplish the same goal in the code below, as columns c and d are equal. So when should I use one over the other?

Here is an example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))
df.loc[::2, 'a'] = np.nan

Returns:

     a  b
0  NaN  4
1  8.0  7
2  NaN  2
3  3.0  0
4  NaN  0
5  2.0  4
6  NaN  0
7  2.0  6
8  NaN  4
9  4.0  6

This is my starting point. Now I will add two columns, one using combine_first and one using fillna, and they will produce the same result:

df['c'] = df.a.combine_first(df.b)
df['d'] = df['a'].fillna(df['b'])

Returns:

     a  b    c    d
0  NaN  4  4.0  4.0
1  8.0  7  8.0  8.0
2  NaN  2  2.0  2.0
3  3.0  0  3.0  3.0
4  NaN  0  0.0  0.0
5  2.0  4  2.0  2.0
6  NaN  0  0.0  0.0
7  2.0  6  2.0  2.0
8  NaN  4  4.0  4.0
9  4.0  6  4.0  4.0
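
As a quick sanity check, here is a minimal sketch that rebuilds columns a and b from the table above as standalone Series (the variable names simply mirror the column names) and compares the two results element by element:

```python
import numpy as np
import pandas as pd

# Values copied from columns a and b of the table above.
a = pd.Series([np.nan, 8, np.nan, 3, np.nan, 2, np.nan, 2, np.nan, 4])
b = pd.Series([4, 7, 2, 0, 0, 4, 0, 6, 4, 6])

c = a.combine_first(b)  # take a, fall back to b where a is null
d = a.fillna(b)         # fill nulls in a with the aligned values of b

# Element-wise comparison; True when every value matches.
same_values = bool((c == d).all())
print(same_values)
```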

Credit to this question for the data set: Combine Pandas data frame column values into new column


1 Reply

深蓝 · Answer 1 · 2021-10-23T17:59:04+0000

combine_first is intended for cases where the two objects have non-overlapping indices. It fills in nulls, and it also supplies values for index labels and columns that do not exist in the first object.

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])

   w    x  y
a  1  2.0  3
b  4  NaN  5

dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

   x  y  z
b  1  2  3
c  3  4  5

dfa.combine_first(dfb)

     w    x    y    z
a  1.0  2.0  3.0  NaN
b  4.0  1.0  5.0  3.0  # 1.0 filled from `dfb`; 5.0 was in `dfa`; 3.0 new column
c  NaN  3.0  4.0  5.0  # whole new index

Notice that all indices and columns from both frames are included in the result.
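
One way to confirm this is to compare the result's labels against the union of the input labels. This is only a sketch; the names combined, rows_are_union and cols_are_union are illustrative:

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

combined = dfa.combine_first(dfb)

# The result carries every row and column label found in either frame.
rows_are_union = combined.index.equals(dfa.index.union(dfb.index))
cols_are_union = combined.columns.equals(dfa.columns.union(dfb.columns))
print(rows_are_union, cols_are_union)
```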

Now, if we use fillna instead:

dfa.fillna(dfb)

   w    x  y
a  1  2.0  3
b  4  1.0  5  # 1.0 filled in from `dfb`

Notice no new columns or indices from dfb are included. We only filled in the null value where dfa shared index and column information.
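
Put another way, for this example the fillna result matches the combine_first result once it is cut back down to dfa's own labels. A rough sketch of that relationship (restricted and same_values are illustrative names, and the comparison looks at values only, not dtypes):

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

filled = dfa.fillna(dfb)

# Keep only the rows and columns that dfa already had.
restricted = dfa.combine_first(dfb).reindex(index=dfa.index, columns=dfa.columns)

# Value-by-value comparison (no nulls remain in either frame here).
same_values = bool(restricted.eq(filled).all().all())
print(same_values)
```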

In your case, you call fillna and combine_first on a single column, with a second column that shares exactly the same index. With nothing new to add from the other side, the two are effectively the same thing.
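
The two stop being interchangeable for single columns as soon as the indices differ. A small sketch with two made-up Series (s1 and s2 are hypothetical, not from your data):

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([20.0, 30.0, 40.0], index=['b', 'c', 'd'])

filled = s1.fillna(s2)           # keeps only s1's labels: a, b, c
combined = s1.combine_first(s2)  # also picks up label d from s2

print(filled)
print(combined)
```

Both fill the null at b from s2, but only combine_first brings in the extra label d.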
