Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - In Pandas, how to assign a function manipulating strings on a column?

I am trying to create or modify a column by applying a function that manipulates strings from one or two columns.

To give a concrete example, I have the following dataframe and function:

import pandas as pd

def get_sign(number: str, name: str) -> str:
    """Function to apply on two columns to produce another one"""
    if number.startswith("-"):
        return "Negative-" + name[0]
    else:
        return "Positive-" + name[0]

df = pd.DataFrame({
    "name": ["John", "Jack", "Jeff", "Kate"],
    "number": ["123_456", "-123", "+456", "-0"],
    "age": [10, 20, 30, 36],
})

I am trying to get this Dataframe:

    name    number  age sign
0   John    123_456 10  Positive-J
1   Jack    -123    20  Negative-J
2   Jeff    +456    30  Positive-J
3   Kate    -0      36  Negative-K

I tried to use assign, passing each column through the .str accessor so it would be treated as strings, but got the following error:

df.assign(sign=lambda x:get_sign(x["number"].str, x["name"].str))
<ipython-input-64-e72c1bf8f4bf> in <lambda>(x)
      7 
      8 df = pd.DataFrame({'name': ["John", "Jack", "Jeff", "Kate"], "number":["123_456", "-123", "+456", "-0"], "age": [10, 20, 30, 36]})
----> 9 df.assign(sign=lambda x:get_sign(x["number"].str, x["name"].str))
     10 df["sign"] = pd.Series([get_sign(el) for el in df["number"]])
     11 df
 
...

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
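Why this fails: x["number"].str is pandas' string accessor for the whole column, not a single string, so inside get_sign the call number.startswith("-") returns a boolean Series with one value per row. The if statement then needs a single True/False, which pandas refuses to guess. A minimal reproduction, assuming the question's get_sign and a trimmed copy of the sample data:

```python
import pandas as pd

def get_sign(number: str, name: str) -> str:
    """Same function as in the question."""
    if number.startswith("-"):
        return "Negative-" + name[0]
    else:
        return "Positive-" + name[0]

df = pd.DataFrame({"name": ["John", "Jack", "Jeff", "Kate"],
                   "number": ["123_456", "-123", "+456", "-0"]})

# .str is an accessor over the whole column, so startswith gives one bool per row
per_row = df["number"].str.startswith("-")
print(per_row.tolist())  # [False, True, False, True]

# Inside get_sign, `if <boolean Series>` has no single truth value, so pandas raises
try:
    get_sign(df["number"].str, df["name"].str)
    caught = None
except ValueError as err:
    caught = err
print(caught)
```

In other words, get_sign is written for one pair of strings at a time, so it has to be called once per row (or rewritten to work on whole columns).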

How can I achieve it? Thanks

Question from: https://stackoverflow.com/questions/65848098/in-pandas-how-to-assign-a-function-manipulating-strings-on-a-column

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Reply


For your function, you can try apply with axis=1:

def get_sign(number: str, name: str) -> str:
    """Function to apply on two columns to produce another one"""
    if number.startswith("-"):
        return "Negative-" + name[0]
    else:
        return "Positive-" + name[0]

# axis=1 passes each row as a Series; *x unpacks its values in column order (number, name)
out = df.assign(sign=df[["number", "name"]].apply(lambda x: get_sign(*x), axis=1))
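A common alternative to apply(axis=1), not part of the original answer, is a plain list comprehension over zip of the two columns. It still calls the unchanged get_sign once per row, but skips building a Series for every row, so it is typically faster than apply. A sketch, assuming the question's df and get_sign:

```python
import pandas as pd

def get_sign(number: str, name: str) -> str:
    """Function to apply on two columns to produce another one"""
    if number.startswith("-"):
        return "Negative-" + name[0]
    return "Positive-" + name[0]

df = pd.DataFrame({
    "name": ["John", "Jack", "Jeff", "Kate"],
    "number": ["123_456", "-123", "+456", "-0"],
    "age": [10, 20, 30, 36],
})

# zip pairs the two columns row by row, in the (number, name) order get_sign expects
out = df.assign(sign=[get_sign(num, nm) for num, nm in zip(df["number"], df["name"])])
print(out)
```

The list has the same length and row order as df, so assign attaches it as the new sign column.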

Note that you can also vectorize the function, so you don't need apply, which is slow because it calls the Python function once per row:

import numpy as np

def get_sign_modified(dataframe, number: str, name: str) -> np.ndarray:
    # number and name are column labels; the .str methods work on whole columns at once
    first = dataframe[name].str[0]
    return np.where(dataframe[number].str.startswith("-"),
                    "Negative-" + first, "Positive-" + first)

out = df.assign(sign=get_sign_modified(df, "number", "name"))
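The vectorized version can also stay entirely within pandas by using Series.where instead of np.where; unlike np.where, it returns a Series that keeps the DataFrame's index. This where variant is an alternative sketch, not from the original answer, checked here against the np.where result on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jack", "Jeff", "Kate"],
    "number": ["123_456", "-123", "+456", "-0"],
    "age": [10, 20, 30, 36],
})

first = df["name"].str[0]                       # first letter of each name
is_negative = df["number"].str.startswith("-")  # boolean Series, one value per row

# Series.where keeps its own value where the condition is True, else takes `other`
sign = ("Negative-" + first).where(is_negative, "Positive-" + first)

# Same values as the np.where version from the answer
expected = np.where(is_negative, "Negative-" + first, "Positive-" + first)

out = df.assign(sign=sign)
print(out)
```

Because sign is index-aligned, it stays correct even if df has a non-default index.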

   name   number  age        sign
0  John  123_456   10  Positive-J
1  Jack     -123   20  Negative-J
2  Jeff     +456   30  Positive-J
3  Kate       -0   36  Negative-K


...