apache spark - pyspark Column is not iterable

Question

Welcome To Ask or Share your Answers For Others

apache spark - pyspark Column is not iterable

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

apache spark - pyspark Column is not iterable

Having this dataframe I am getting Column is not iterable when I try to groupBy and getting max:

linesWithSparkDF
+---+-----+
| id|cycle|
+---+-----+
| 31|   26|
| 31|   28|
| 31|   29|
| 31|   97|
| 31|   98|
| 31|  100|
| 31|  101|
| 31|  111|
| 31|  112|
| 31|  113|
+---+-----+
only showing top 10 rows


ipython-input-41-373452512490> in runlgmodel2(model, data)
     65     linesWithSparkDF.show(10)
     66 
---> 67     linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(max(col("cycle")))
     68     print "linesWithSparkGDF"
     69 

/usr/hdp/current/spark-client/python/pyspark/sql/column.py in __iter__(self)
    241 
    242     def __iter__(self):
--> 243         raise TypeError("Column is not iterable")
    244 
    245     # string methods

TypeError: Column is not iterable

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:53:34+0000

It's because, you've overwritten the max definition provided by apache-spark, it was easy to spot because max was expecting an iterable.

To fix this, you can use a different syntax, and it should work.

inesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg({"cycle": "max"})

or alternatively

from pyspark.sql.functions import max as sparkMax

linesWithSparkGDF = linesWithSparkDF.groupBy(col("id")).agg(sparkMax(col("cycle")))

Categories

apache spark - pyspark Column is not iterable

apache spark - pyspark Column is not iterable

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags