Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - What is the method to add new column in existing dataframe in pyspark

I added a new column to an existing dataframe, but the change is not reflected in the dataframe.

customerDf.withColumn("fullname",expr("concat(firstname,'|',lastname)"))

customerDf.show()  # still shows the original records, without the new column

We can see the result if we assign the returned dataframe to another variable:

test = customerDf.withColumn("fullname",expr("concat(firstname,'|',lastname)"))
test.show()

Is there any way to add a new column to an existing dataframe without copying the dataframe? In pandas we have an option for this (inplace=True). Is there a similar function in PySpark?

question from:https://stackoverflow.com/questions/65896658/what-is-the-method-to-add-new-column-in-existing-dataframe-in-pyspark


1 Reply


Short answer: no, there is no such thing in PySpark.

Spark DataFrames are immutable. This means that when you add a new column (or apply any other transformation), you are not changing the DataFrame but creating a new one.

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.withColumn:

Returns a new DataFrame by adding a column or replacing the existing column that has the same name.

In Python you can, however, reassign the result to the same variable:

customerDf = customerDf.withColumn("fullname",expr("concat(firstname,'|',lastname)"))
