I have the following DataFrame in PySpark (this is the result of a take(3); the DataFrame itself is very large):
sc = SparkContext()
df = [Row(owner=u'u1', a_d=0.1), Row(owner=u'u2', a_d=0.0), Row(owner=u'u1', a_d=0.3)]
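For reproducibility, here is a minimal sketch that rebuilds a DataFrame with these three rows (the SparkSession setup is my assumption; in the real code the data is loaded elsewhere):

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

# Rebuild the three rows returned by take(3) above.
df = spark.createDataFrame([
    Row(owner=u'u1', a_d=0.1),
    Row(owner=u'u2', a_d=0.0),
    Row(owner=u'u1', a_d=0.3),
])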
The same owner appears in multiple rows. What I need is to sum the values of the field a_d per owner after grouping, as in:
b = df.groupBy('owner').agg(sum('a_d').alias('a_d_sum'))
but this throws an error:
TypeError: unsupported operand type(s) for +: 'int' and 'str'
However, the schema says a_d contains double values, not strings (this is the output of printSchema()):
root
|-- owner: string (nullable = true)
|-- a_d: double (nullable = true)
So what is happening here?
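For reference, here is the same aggregation with the Spark SQL function referenced explicitly through a module alias, which sidesteps any shadowing by Python's builtin sum (that shadowing is my guess at the cause; this reuses the rebuilt df from the sketch above):

import pyspark.sql.functions as F

# F.sum is the Spark SQL aggregate; a bare, unimported sum is Python's builtin,
# and sum('a_d') would try 0 + 'a', raising exactly this TypeError.
b = df.groupBy('owner').agg(F.sum('a_d').alias('a_d_sum'))
b.show()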