I've been searching for a while to find out whether there is any way to use a Scala class in PySpark, and I haven't found any documentation or guide about this subject.

Let's say I create a simple class in Scala that uses some Apache Spark libraries, something like:
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

class SimpleClass(sqlContext: SQLContext, df: DataFrame, column: String) {
  def exe(): DataFrame = {
    import sqlContext.implicits._
    // Return a DataFrame containing only the requested column
    df.select(col(column))
  }
}
- Is there any possible way to use this class in PySpark?
- Is it too tough?
- Do I have to create a .py file?
- Is there any guide that shows how to do that?
By the way, I also looked at the Spark source code, and I felt a bit lost; I was unable to replicate its functionality for my own purposes.
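The closest I've come is guessing at something like the sketch below, based on PySpark's internal Py4J gateway (sc._jvm), where df._jdf and sqlContext._ssql_ctx expose the underlying Java objects. It assumes the Scala class is compiled into a JAR shipped with --jars and lives under a package such as com.example (the package name, JAR name, and column name here are placeholders I made up), but I'm not sure whether this is the intended approach:

from pyspark.sql import DataFrame

# Assumes SimpleClass was compiled into a JAR under the package com.example
# and shipped to the driver, e.g.:
#   spark-submit --jars simple-class.jar my_script.py
# sc._jvm is PySpark's Py4J gateway into the driver JVM.
java_instance = sc._jvm.com.example.SimpleClass(
    sqlContext._ssql_ctx,  # underlying Java SQLContext
    df._jdf,               # underlying Java DataFrame
    "some_column")         # column name as a plain Python string

# Wrap the Java DataFrame returned by exe() back into a PySpark DataFrame.
result = DataFrame(java_instance.exe(), sqlContext)
result.show()

Since _jvm, _jdf, and _ssql_ctx are underscore-prefixed internals rather than documented API, I don't know whether relying on them like this is reasonable.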