I'm trying to create a column and then apply a filter, but I'm getting an error:
TypeError: when() missing 1 required positional argument: 'value'
Here is the code used:
from pyspark.sql import functions as f

df = (
    spark.table(f'nn_squad7_{country}.fact_table')
    .filter(f.col('date_key').between(start, end))
    .filter(f.col('is_client_plus') == 1)
    .filter(f.col('source') == 'tickets')
    .filter(f.col('subtype') == 'trx')
    .filter(f.col('is_trx_ok') == 1)
    .withColumn('week', f.date_format(f.date_sub(f.col('date_key'), 1), 'YYYY-ww'))
    .withColumn('month', f.date_format(f.date_sub(f.col('date_key'), 1), 'M'))
    .withColumn('local_time', f.from_utc_timestamp(f.col('trx_begin_date_time'), 'Europe/Brussels'))
    .withColumn('Hour', f.hour(f.col('local_time')))
    # this is the line that raises the TypeError
    .filter(f.when(f.col('Hour') >= 4) & (f.col('Hour') <= 8))
)
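Reading the docs, I think the problem is that f.when() expects both a condition and a value: it builds a conditional column rather than filtering rows. Its intended use seems to be something like the sketch below (the 'shift' column and its labels are made up purely for illustration):

# when(condition, value) derives a value per row; it doesn't filter rows
df = df.withColumn(
    'shift',
    f.when((f.col('Hour') >= 4) & (f.col('Hour') <= 8), 'early')
     .otherwise('other')
)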
The filter I'm trying to apply relates to the client's opening hours: I only want data that took place between 4:00 and 8:00 pm on Fridays. So if this filter works, I probably need to add another one for the day of the week, as sketched below.
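To make the goal concrete, this is roughly what I'm aiming for, if I understand filter() correctly: a plain boolean expression instead of f.when(), plus f.dayofweek(), which as far as I know returns 6 for Friday (df_filtered is just a placeholder name):

# filter() takes a plain boolean Column, so no f.when() is needed
df_filtered = (
    df
    # 4 and 8 as in my snippet above; 16 and 20 if I really mean 4-8 pm
    .filter((f.col('Hour') >= 4) & (f.col('Hour') <= 8))
    # dayofweek(): 1 = Sunday ... 7 = Saturday, so Friday == 6
    .filter(f.dayofweek(f.col('local_time')) == 6)
)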
Any clue about the mistake? Thanks
question from:
https://stackoverflow.com/questions/65917099/include-filter-after-withcolumn-pyspark