First off, I apologize if my issue is simple. I did spend a lot of time researching it.
I am trying to set up a scalar Pandas UDF in a PySpark script as described here.
Here is my code:
from pyspark import SparkContext
from pyspark.sql import functions as F
from pyspark.sql.types import *
from pyspark.sql import SQLContext
sc.install_pypi_package("pandas")
import pandas as pd
sc.install_pypi_package("PyArrow")
df = spark.createDataFrame(
[("a", 1, 0), ("a", -1, 42), ("b", 3, -1), ("b", 10, -2)],
("key", "value1", "value2")
)
df.show()
@F.pandas_udf("double", F.PandasUDFType.SCALAR)
def pandas_plus_one(v):
return pd.Series(v + 1)
df.select(pandas_plus_one(df.value1)).show()
# Also fails
#df.select(pandas_plus_one(df["value1"])).show()
#df.select(pandas_plus_one("value1")).show()
#df.select(pandas_plus_one(F.col("value1"))).show()
The script fails at the last statement:
An error occurred while calling o209.showString. :
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 2 in stage 8.0 failed 4 times, most recent failure: Lost task 2.3
in stage 8.0 (TID 30, ip-10-160-2-53.ec2.internal, executor 3):
java.lang.IllegalArgumentException at
java.nio.ByteBuffer.allocate(ByteBuffer.java:334) at
org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543)
at
org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:58)
at
org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:132)
at
org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:181)
at
org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:172)
at
org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:65)
at
org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:162)
at
org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:122)
at
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410)
...
What am I missing here? I am just following the manual. Thanks for your help
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…