I have the following DataFrame in PySpark (this is the result of a take(3); the DataFrame itself is very large):
sc = SparkContext()
df = [Row(owner=u'u1', a_d=0.1), Row(owner=u'u2', a_d=0.0), Row(owner=u'u1', a_d=0.3)]
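For reproducibility, here is a minimal sketch that rebuilds a DataFrame with these three rows (the SparkSession setup is my assumption; in the real code the data is loaded elsewhere):

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

# Rebuild the three rows returned by take(3) above.
df = spark.createDataFrame([
    Row(owner=u'u1', a_d=0.1),
    Row(owner=u'u2', a_d=0.0),
    Row(owner=u'u1', a_d=0.3),
])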
The same owner appears in multiple rows. What I need is to sum the values of the field a_d per owner after grouping, as in:
b = df.groupBy('owner').agg(sum('a_d').alias('a_d_sum'))
but this throws an error:
TypeError: unsupported operand type(s) for +: 'int' and 'str'
However, the schema says a_d contains double values, not strings (this is the output of printSchema()):
root
|-- owner: string (nullable = true)
|-- a_d: double (nullable = true)
So what is happening here?
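For reference, here is the same aggregation with the Spark SQL function referenced explicitly through a module alias, which sidesteps any shadowing by Python's builtin sum (that shadowing is my guess at the cause; this reuses the rebuilt df from the sketch above):

import pyspark.sql.functions as F

# F.sum is the Spark SQL aggregate; a bare, unimported sum is Python's builtin,
# and sum('a_d') would try 0 + 'a', raising exactly this TypeError.
b = df.groupBy('owner').agg(F.sum('a_d').alias('a_d_sum'))
b.show()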