Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to create a new data frame based on conditions from another data frame

Just getting into Python, so hopefully I'm not asking a stupid question here...

I have a pandas dataframe named "df_complete" with, let's say, 100 rows, containing columns named "type", "writer", "status", "col a" and "col c". I want to create a new dataframe named "temp_df" from the rows of "df_complete" that meet certain conditions.

temp_df = pandas.DataFrame()

if ((df_complete['type'] == 'NDD') & (df_complete['writer'] == 'Mary') & (df_complete['status'] != '7')):
    temp_df['col A'] = df_complete['col a']
    temp_df['col B'] = 'good'
    temp_df['col C'] = df_complete['col c']

However, when I do this, I get the following error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I read this thread and changed my "and" to "&": Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I also read this thread and put every comparison in parentheses: comparing dtyped [float64] array with a scalar of type [bool] in Pandas DataFrame

But the error is still present. What is causing this, and how can I fix it?

Follow-up question: how can I obtain the index values of the rows that meet the condition?


1 Reply


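The if statement expects a single True/False value, but your condition is a boolean Series with one value per row, so the ValueError is raised even after switching to "&" and adding parentheses. I think you need boolean indexing with loc instead, selecting only the columns col a and col c: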

temp_df = df_complete.loc[(df_complete['type'] == 'NDD') &
                         (df_complete['writer'] == 'Mary') &
                         (df_complete['status'] != '7'), ['col a', 'col c']]
# rename columns
temp_df = temp_df.rename(columns={'col a': 'col A', 'col c': 'col C'})
# add new column
temp_df['col B'] = 'good'
# reorder columns
temp_df = temp_df[['col A', 'col B', 'col C']]
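To see why the original if statement fails, it may help to look at the condition on its own. The sketch below uses a small made-up frame (the values are an assumption, chosen to mirror the column names in the question) and shows that the combined condition is a Series of booleans, that using it directly in an if raises the ValueError, and that reducing it with any() gives a single value when that is really what is wanted.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_complete = pd.DataFrame({
    'type': ['NDD', 'NDD', 'NT'],
    'writer': ['Mary', 'Mary', 'John'],
    'status': ['4', '5', '6'],
})

# The combined condition is a boolean Series: one True/False per row.
mask = ((df_complete['type'] == 'NDD')
        & (df_complete['writer'] == 'Mary')
        & (df_complete['status'] != '7'))
mask_values = mask.tolist()

# Using the Series directly in an "if" is ambiguous and raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Reducing the Series to one value is unambiguous.
any_match = bool(mask.any())

print(mask_values, raised, any_match)
```

Passing the same mask to loc, as in the answer above, applies the condition row by row instead of asking for one overall truth value.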

Sample:

import pandas as pd

df_complete = pd.DataFrame({'type':  ['NDD','NDD','NT'],
                            'writer':['Mary','Mary','John'],
                            'status':['4','5','6'],
                            'col a': [1,3,5],
                            'col b': [5,3,6],
                            'col c': [7,4,3]}, index=[3,4,5])

print (df_complete)
   col a  col b  col c status type writer
3      1      5      7      4  NDD   Mary
4      3      3      4      5  NDD   Mary
5      5      6      3      6   NT   John

temp_df = df_complete.loc[(df_complete['type'] == 'NDD') & 
                         (df_complete['writer'] == 'Mary') & 
                         (df_complete['status'] != '7'), ['col a','col c']]

print (temp_df)  
   col a  col c
3      1      7
4      3      4

temp_df = temp_df.rename(columns={'col a': 'col A', 'col c': 'col C'})
# add new column
temp_df['col B'] = 'good'
# reorder columns
temp_df = temp_df[['col A', 'col B', 'col C']]
print (temp_df)  
   col A col B  col C
3      1  good      7
4      3  good      4
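For the follow-up question about index values, one possible approach is to reuse the boolean mask. Rows selected with loc keep their original index labels, so the labels can be read either from the filtered frame or from the original frame's index. The sketch below rebuilds the sample data so that it runs on its own; the names mask and matching_index are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained.
df_complete = pd.DataFrame({'type':  ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'status': ['4', '5', '6'],
                            'col a': [1, 3, 5],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])

mask = ((df_complete['type'] == 'NDD') &
        (df_complete['writer'] == 'Mary') &
        (df_complete['status'] != '7'))

# Index labels of the rows that satisfy the condition.
matching_index = df_complete.index[mask]
print(matching_index.tolist())  # [3, 4]

# The filtered frame carries the same labels.
temp_df = df_complete.loc[mask, ['col a', 'col c']]
same_labels = temp_df.index.equals(matching_index)
```

Calling tolist() is only needed if a plain Python list is wanted; the Index object can be used directly with loc.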
