Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
783 views
in Technique[技术] by (71.8m points)

scala - Filter spark DataFrame on string contains

I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. This following code works well

val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")

But what if I needed to see if the doctor string contains a substring? Since we are writing our expression inside of a string. What do I do to do a "contains"?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use contains (this works with an arbitrary sequence):

df.filter($"foo".contains("bar"))

like (SQL like with SQL simple regular expression whith _ matching an arbitrary character and % matching an arbitrary sequence):

df.filter($"foo".like("bar"))

or rlike (like with Java regular expressions):

df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...