
apache spark - concat two columns in pyspark & add a text in between

Hi, I'm using PySpark 3.0.1 in Databricks. My PySpark DataFrame df contains a column Year with values like 2012, and another column Quarter with the numbers 1, 2, 3 and 4. I want to join Year and Quarter to create another column year_qtr that should contain values like 2012 Quarter-1. I tried the following code:

import pyspark.sql.functions as f
col_list = ['Year'," Quarter-",'Quarter']
df.withColumn("year_qtr", f.format_string('Year',' Quarter-','Quarter')).show()  

But I'm getting the error message:

AnalysisException: cannot resolve '` Quarter-`'

Can you help me resolve this issue?

Question from: https://stackoverflow.com/questions/65917586/concat-two-columns-in-pyspark-add-a-text-in-between


1 Reply


You need to specify a format string as the first argument, just as you would in Python. The subsequent arguments correspond to the placeholders inside the format string.

import pyspark.sql.functions as F

df2 = df.withColumn("year_qtr", F.format_string('%d Quarter-%d', 'Year', 'Quarter'))

Use %s if the columns are of string type; %d is only suitable for integer columns.
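
For example, if Year and Quarter happened to be stored as string columns rather than integers (a hypothetical variant, not stated in the question), the same call would work with %s placeholders:

import pyspark.sql.functions as F

# Hypothetical case: Year and Quarter are string columns, so %s is used instead of %d
df2 = df.withColumn("year_qtr", F.format_string('%s Quarter-%s', 'Year', 'Quarter'))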


Alternatively, you can use concat:

import pyspark.sql.functions as F

df2 = df.withColumn("year_qtr", F.concat('Year', F.lit(' Quarter-'), 'Quarter'))
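
As a quick sanity check, here is a minimal self-contained sketch with made-up sample data (concat implicitly casts the integer columns to string), showing roughly what the result looks like:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Made-up sample data just to illustrate the output format
df = spark.createDataFrame([(2012, 1), (2012, 4)], ["Year", "Quarter"])
df.withColumn("year_qtr", F.concat("Year", F.lit(" Quarter-"), "Quarter")).show()

# +----+-------+--------------+
# |Year|Quarter|      year_qtr|
# +----+-------+--------------+
# |2012|      1|2012 Quarter-1|
# |2012|      4|2012 Quarter-4|
# +----+-------+--------------+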
