Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

'date_sub' is not defined pyspark

I am trying to use date_sub in the expression below, but I get an error:

def get_dateid_1(datetime):
    datetime = str(datetime).rsplit(" ")[0]
    return datetime

df123 = F.expr(date_sub(get_dateid_1(datetime.now())), 1)
print(df123)
ERROR:
date_sub() missing 1 required positional argument: 'days'
Traceback (most recent call last):
TypeError: date_sub() missing 1 required positional argument: 'days'

Even though I passed the days argument, I still get this error.

Any help would be much appreciated

question from:https://stackoverflow.com/questions/65927011/date-sub-is-not-defined-pyspark

1 Reply

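Your code has several problems, and I'm not sure I fully understand what you want to do. The TypeError itself comes from a misplaced parenthesis: date_sub receives only the date, and the 1 is passed to F.expr instead.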
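The TypeError from the question can be reproduced without Spark. In the sketch below, date_sub_stub and expr_stub are hypothetical stand-ins (they are not PySpark functions) that only mirror the positional parameters of date_sub(start, days) and expr(str). The point is where the closing parenthesis lands.

```python
def date_sub_stub(start, days):
    """Hypothetical stand-in mirroring the (start, days) parameters of date_sub."""
    return f"date_sub({start}, {days})"


def expr_stub(expression):
    """Hypothetical stand-in that accepts a single string, like expr."""
    return f"Column<{expression}>"


def call_as_in_question():
    # Same parenthesization as the question: date_sub_stub gets ONE argument,
    # and the 1 is handed to expr_stub instead.
    return expr_stub(date_sub_stub("2021-01-27"), 1)


def call_with_days_inside():
    # The days argument moved inside the date_sub_stub call.
    return date_sub_stub("2021-01-27", 1)


try:
    call_as_in_question()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(call_with_days_inside())
```

Python evaluates the inner call first, so the error is raised by the date_sub stand-in before the outer call ever runs, which matches the traceback in the question.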

expr takes a SQL expression string, but you're passing it a Column. It should look like this:

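from datetime import datetime
from pyspark.sql import functions as F

# Quote the date so Spark SQL parses it as a string literal;
# unquoted, 2021-01-27 is read as the arithmetic (2021 - 1) - 27.
df123 = F.expr(f"date_sub('{get_dateid_1(datetime.now())}', 1)")
print(df123)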
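It can help to look at the string that the f-string actually hands to expr. The sketch below needs no Spark session; build_date_sub_expr is a hypothetical helper name, and the fixed timestamp is an assumption chosen only so the output is reproducible.

```python
from datetime import datetime


def get_dateid_1(dt):
    # Same idea as the helper in the question: keep only the date part.
    return str(dt).split(" ")[0]


def build_date_sub_expr(dt, days):
    # Hypothetical helper: wraps the date in single quotes so that a SQL
    # parser sees a string literal rather than 2021 - 1 - 27.
    return f"date_sub('{get_dateid_1(dt)}', {days})"


fixed = datetime(2021, 1, 27, 10, 30)  # assumed timestamp for the demo

expr_unquoted = f"date_sub({get_dateid_1(fixed)}, 1)"
expr_quoted = build_date_sub_expr(fixed, 1)

print(expr_unquoted)  # date_sub(2021-01-27, 1)
print(expr_quoted)    # date_sub('2021-01-27', 1)
```

In the unquoted form, the first argument is the expression 2021 minus 1 minus 27 once it is parsed as SQL, so the date has to be quoted to be treated as a date string.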

Or, if you prefer the PySpark functions API, wrap the date string returned by your function in lit:

df123 = F.date_sub(F.lit(get_dateid_1(datetime.now())), 1)
print(df123)

# Column<b'date_sub(2021-01-27, 1)'>

However, if your intent is to subtract one day from the current date, you should use Spark's built-in function current_date:

df123 = F.date_sub(F.current_date(), 1)
print(df123)

# Column<b'date_sub(current_date(), 1)'>
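As a driver-side sanity check, plain Python can compute the date you would expect that column to hold. This is only a sketch: days_before is a hypothetical helper, and it assumes the driver's local date matches the date Spark uses for current_date, which may not hold if the session time zone differs.

```python
from datetime import date, timedelta


def days_before(day, days=1):
    # Hypothetical helper: calendar subtraction, the same idea as date_sub.
    return day - timedelta(days=days)


print(days_before(date(2021, 1, 27)))  # 2021-01-26
print(days_before(date.today()))       # yesterday on the driver machine
```

This does not replace the Spark expression; it just gives a value to compare against when inspecting the result.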
