pandas is index sensitive, which means it checks the index when you assign. When you assign a Series, the df as a whole does not change, because the index has not changed: after sort_index it still shows the values in the same order. If you assign a numpy array instead, the index is not considered, so the values themselves are assigned back to the original df by position, which yields the output.
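To make the difference concrete, here is a minimal sketch. The frame and the column names `value`, `aligned` and `positional` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy frame, only for illustration.
df = pd.DataFrame({"value": [3, 1, 2]})

# Assigning a Series: pandas aligns on the index labels,
# so the sorted order is lost and the column looks unchanged.
df["aligned"] = df["value"].sort_values()

# Assigning a NumPy array: no index is involved,
# so the values are written back by position.
df["positional"] = df["value"].sort_values().to_numpy()

print(df)
```

The `aligned` column should match `value` row by row, while `positional` holds the sorted values.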
An edge-case example:
df['string3']=pd.Series(['aaa','aaa','aaa','aaa'],index=[100,111,112,113])
df
Out[462]:
string1 string2 string3
0 abc vwx NaN
1 ghi jkl NaN
2 mno dfe NaN
3 stu pqr NaN
Because of this index sensitivity, when you do a conditional assignment with .loc
you can always write:
df.loc[df.condition,'value']=df.value*100
# rows that are not selected are left unchanged
This is the same as what you would do with np.where:
df['value']=np.where(df.condition,df.value*100 ,df.value)
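As a quick check that the two forms agree, here is a small sketch on a made-up frame. The data is hypothetical, and `via_loc` / `via_where` are just illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"condition": [True, False, True], "value": [1, 2, 3]})

# .loc: the right-hand Series is aligned on the index,
# and only the selected rows are overwritten.
via_loc = df.copy()
via_loc.loc[via_loc["condition"], "value"] = via_loc["value"] * 100

# np.where: purely positional, so the "else" branch must be spelled out.
via_where = df.copy()
via_where["value"] = np.where(
    via_where["condition"], via_where["value"] * 100, via_where["value"]
)

print(via_loc["value"].tolist())
print(via_where["value"].tolist())
```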
Another use case: when I use groupby.apply with a non-aggregating function and try to assign the result back, why does it fail?
df['String4']=df.groupby('string1').apply(lambda x :x['string2']+'aa')
TypeError: incompatible index of inserted column with frame index
Let us look at what groupby.apply returns:
df.groupby('string1').apply(lambda x : x['string2']+'aa')
Out[466]:
string1
abc 0 vwxaa
ghi 1 jklaa
mno 2 dfeaa
stu 3 pqraa
Name: string2, dtype
Notice that it adds one more level to the index, so the result has a MultiIndex, while the original df has only a single-level index. That mismatch is what causes the error message.
How to fix it? Reset the index, keeping the original index (which is the second level of the groupby result), then assign it back:
df['String4']=df.groupby('string1').apply(lambda x : x['string2']+'aa').reset_index(level=0,drop=True)
df
Out[469]:
string1 string2 string3 String4
0 abc vwx NaN vwxaa
1 ghi jkl NaN jklaa
2 mno dfe NaN dfeaa
3 stu pqr NaN pqraa
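As an alternative to resetting the index (not what was done above, just another option), `transform` on the grouped column returns a result that is already indexed like the original frame, so it can be assigned back directly. A minimal sketch, with toy data that only loosely mirrors the frame above:

```python
import pandas as pd

# Hypothetical frame with repeated group keys, only for illustration.
df = pd.DataFrame(
    {
        "string1": ["abc", "ghi", "abc", "ghi"],
        "string2": ["vwx", "jkl", "dfe", "pqr"],
    }
)

# transform keeps the original index, so no extra index level appears
# and the assignment aligns row by row.
df["String4"] = df.groupby("string1")["string2"].transform(lambda s: s + "aa")

print(df)
```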
As Erfan mentioned in the comments: how can we prevent accidentally assigning unwanted values to a pandas.DataFrame?
There are two different ways of assigning.
1st, with an array, list, or tuple: these CANNOT ALIGN, which means that when the length of the assigned object differs from the length of the df, the assignment fails.
2nd, with a pandas object: this ALWAYS aligns, and no error is raised even when the lengths differ.
However, when the assigned object has a duplicated index, it raises an error:
df['string3']=pd.Series(['aaa','aaa','aaa','aaa'],index=[100,100,100,100])
ValueError: cannot reindex from a duplicate axis
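The three behaviours can be seen side by side in the sketch below. The frame, the column names and the `assign_strict` helper are all hypothetical. The helper is only one possible guard (it refuses a Series whose index differs from the frame's), not a pandas feature.

```python
import pandas as pd

# Hypothetical frame, only for illustration.
df = pd.DataFrame({"string1": ["abc", "ghi", "mno", "stu"]})

# 1) list / array / tuple: no alignment, so a length mismatch raises.
try:
    df["from_list"] = ["x", "y"]
    list_failed = False
except ValueError:
    list_failed = True

# 2) pandas object: aligned on the index, missing labels become NaN.
df["from_series"] = pd.Series(["x", "y"], index=[0, 1])

# 3) pandas object with a duplicated index: alignment is ambiguous and raises.
try:
    df["from_dupes"] = pd.Series(["x", "x"], index=[100, 100])
    dupes_failed = False
except ValueError:
    dupes_failed = True


def assign_strict(frame, column, values):
    """Hypothetical guard: reject a Series whose index differs from the frame's."""
    if isinstance(values, pd.Series) and not values.index.equals(frame.index):
        raise ValueError("index of values does not match the frame index")
    frame[column] = values


try:
    assign_strict(df, "bad", pd.Series(["aaa"] * 4, index=[100, 111, 112, 113]))
    strict_failed = False
except ValueError:
    strict_failed = True
```

With such a guard, the silent all-NaN column from the edge case earlier in this answer would raise instead.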