Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

scala - Import custom udf from jar to Spark

I am using a Jupyter notebook to run Spark. My problem is registering a UDF from a custom jar that I import.

This is how I create the UDF in the jar:

package com.udf;

import org.apache.spark.sql.api.java.UDF1;

public class TestUDF implements UDF1<String, String> {

    @Override
    public String call(String arg) throws Exception {
        return doSomething(arg);  // "doSomething" stands in for the real logic
    }
    // ...
}

Then I try to load it in the Jupyter notebook like this:

val spark = SparkSession.builder
.master("yarn")
.appName("Spark SQL")
.config("spark.jars", "/user/.../test.udf.jar")
.getOrCreate()

or like this:

spark.sparkContext.addJar("/user/.../test.udf.jar")
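(Worth noting for context: `addJar` ships the jar to the executors, but in a notebook it does not necessarily put the classes on the driver's classpath, which is where the REPL resolves imports. A common workaround, sketched here with a hypothetical path standing in for the elided one above, is to supply the jar when the shell or kernel is launched:)

```shell
# Hypothetical launch command; the real jar path is elided in the question.
# Supplying --jars at startup makes the classes visible to both the driver
# REPL and the executors, unlike adding the jar after the JVM has started.
spark-shell --master yarn --jars /path/to/test.udf.jar
```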

I am not sure whether these imports actually work, but at least there is no error message. Then I try to register my UDF like this:

spark.udf.register("myUDF", TestUDF.call)

I get an error message:

not found: value TestUDF

(I tried some other names, but they were also not found.)

This approach seems legitimate, but I couldn't find an explanation that covers both importing a jar and accessing a UDF from it. Am I missing something important? Could anyone help me with this?

Edit:

Maybe TestUDF should be explicitly imported like this:

import com.udf.TestUDF

but this import attempt fails with the error: object udf is not a member of package com

and registering it with

spark.udf.register("myUDF", new TestUDF(), StringType)

fails with: not found: type TestUDF
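(For reference, a minimal sketch of how a Java `UDF1` packaged in a jar is usually registered from a Scala notebook, assuming the jar is actually on the driver classpath, e.g. supplied at launch; `spark.udf.register` takes the UDF instance plus an explicit return type:)

```scala
import org.apache.spark.sql.types.DataTypes

// Assumes the jar (/user/.../test.udf.jar in the question) was supplied when
// the session was started, so com.udf.TestUDF is resolvable by the
// notebook's classloader.
import com.udf.TestUDF

// Register the Java UDF under a SQL-visible name with its return type,
// then invoke it from Spark SQL.
spark.udf.register("myUDF", new TestUDF(), DataTypes.StringType)
spark.sql("SELECT myUDF('some input')").show()
```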

question from: https://stackoverflow.com/questions/65942156/import-custom-udf-from-jar-to-spark


1 Reply

Waiting for answers
