Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How can I add or replace a row at a specific index in a PySpark DataFrame?

I want to add the list L1 as a row at the first index. How can I insert a row at a specific index in a PySpark DataFrame?

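from pyspark.sql import SparkSession, functions as F, Window
from pyspark.sql.types import StructType, StructField, StringType, FloatType

spark = SparkSession.builder.getOrCreate()

L1 = ['na', 5.6, 2.4]

data = [('fr', 8.8, 6.6),
        ('nr', 4.4, 2.5),
        ('cc', 2.3, 3.9)]
data_schema = [StructField('loc', StringType(), True),
               StructField('col', FloatType(), True),
               StructField('io', FloatType(), True)]
final = StructType(fields=data_schema)

df = spark.createDataFrame(data, schema=final)

# Number the rows by descending 'col' so that idx matches the output shown
# (ascending order would give cc = 1, nr = 2, fr = 3)
df = df.withColumn("idx", F.row_number().over(Window.orderBy(F.desc('col'))))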

df.show()
+---+---+---+---+
|loc|col| io|idx|
+---+---+---+---+
| fr|8.8|6.6|  1|
| nr|4.4|2.5|  2|
| cc|2.3|3.9|  3|
+---+---+---+---+

Question from: https://stackoverflow.com/questions/65868171/how-can-i-add-a-row-or-replace-in-a-specific-index-in-pyspark-dataframe

He who fights the dragon too long becomes a dragon himself; gaze long into the abyss, and the abyss gazes back into you…

1 Reply


You can filter out the row with idx == 1 and then add the replacement row using union:

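from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()

L1 = ['na', 5.6, 2.4]
data = [('fr', 8.8, 6.6),
        ('nr', 4.4, 2.5),
        ('cc', 2.3, 3.9)]

df = spark.createDataFrame(data, ['loc', 'col', 'io'])

# Number the rows by 'loc', drop the row at idx 1, then append L1 with idx = 1.
# union() matches columns by position, so L1 must list values in column order.
df2 = df.withColumn(
    "idx",
    F.row_number().over(Window.orderBy('loc'))
).filter('idx != 1').union(spark.createDataFrame([L1 + [1]]))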

df2.show()
+---+---+---+---+
|loc|col| io|idx|
+---+---+---+---+
| fr|8.8|6.6|  2|
| nr|4.4|2.5|  3|
| na|5.6|2.4|  1|
+---+---+---+---+


...