Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

scala - What is the best way to define custom methods on a DataFrame?

I need to define custom methods on DataFrame. What is the best way to do it? The solution should scale well, as I intend to define a significant number of custom methods.

My current approach is to create a class (say MyClass) that takes a DataFrame as its constructor parameter, define my custom method (say customMethod) in it, and define an implicit method that converts a DataFrame to MyClass.

implicit def dataFrametoMyClass(df: DataFrame): MyClass = new MyClass(df)

Thus I can call:

dataFrame.customMethod()

Is this the correct way to do it? I am open to suggestions.

1 Reply

Your way is the way to go (see [1]). I solved it a little differently, but the approach is essentially the same:

Possibility 1

Implicits

import org.apache.spark.sql.DataFrame
import scala.language.implicitConversions

object ExtraDataFrameOperations {
  object implicits {
    // an explicit return type is recommended for implicit definitions
    implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations = DFWithExtraOperations(df)
  }
}

case class DFWithExtraOperations(df: DataFrame) {
  def customMethod(param: String): DataFrame = {
    // do something fancy with the df
    // or delegate to some implementation
    //
    // here, just as an illustrating example: do a select
    df.select(df(param))
  }
}

Usage

To use the new customMethod method on a DataFrame:

import ExtraDataFrameOperations.implicits._
val df = ...
val otherDF = df.customMethod("hello")
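
To make the mechanism concrete: when the compiler sees df.customMethod("hello") and finds no such method on DataFrame, it looks for an implicit conversion in scope and inserts the call itself. The following is a minimal end-to-end sketch of that, assuming a local SparkSession; the object name ConversionDemo, the sample data and the column names are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.language.implicitConversions

case class DFWithExtraOperations(df: DataFrame) {
  // Illustrative body only: select a single column by name
  def customMethod(param: String): DataFrame = df.select(df(param))
}

object ExtraDataFrameOperations {
  object implicits {
    implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations =
      DFWithExtraOperations(df)
  }
}

// Hypothetical demo object, not part of the original answer
object ConversionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("conversion-demo")
      .getOrCreate()
    import spark.implicits._
    import ExtraDataFrameOperations.implicits._

    // Made-up sample data with a column called "hello"
    val df = Seq((1, "a"), (2, "b")).toDF("hello", "other")

    // Implicit form: the compiler inserts the conversion
    val viaImplicit = df.customMethod("hello")
    // Explicit form: what the call above is rewritten to
    val viaExplicit = dFWithExtraOperations(df).customMethod("hello")

    assert(viaImplicit.columns.sameElements(viaExplicit.columns))
    spark.stop()
  }
}
```

Both calls build the same selection; the implicit form only saves writing the wrapper by hand.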

Possibility 2

Instead of using an implicit method (see above), you can also use an implicit class:

Implicit class

import org.apache.spark.sql.DataFrame

object ExtraDataFrameOperations {
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = {
      // do something fancy with the df
      // or delegate to some implementation
      //
      // here, just as an illustrating example: do a select
      df.select(df(param))
    }
  }
}

Usage

import ExtraDataFrameOperations._
val df = ...
val otherDF = df.customMethod("hello")
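
Since the question asks for something that scales to many methods, one option is to group related methods into several small implicit classes inside a single object, so that one import brings them all into scope. This is only a sketch of that idea: the names DataFrameExtensions, ColumnOps, RowOps, withConstant and keepNonNull are invented for illustration and do not come from Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical container object; all names below are illustrative
object DataFrameExtensions {

  // Helpers that add or change columns
  implicit class ColumnOps(df: DataFrame) {
    // Adds a column holding the same string value in every row
    def withConstant(name: String, value: String): DataFrame =
      df.withColumn(name, lit(value))
  }

  // Helpers that filter rows
  implicit class RowOps(df: DataFrame) {
    // Keeps only the rows where the given column is not null
    def keepNonNull(name: String): DataFrame =
      df.filter(col(name).isNotNull)
  }
}
```

After import DataFrameExtensions._, the methods chain like built-in ones because each returns a DataFrame, for example df.keepNonNull("id").withConstant("source", "import").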

Remark

If you want to avoid the additional import inside your own package, turn the object ExtraDataFrameOperations into a package object and store it in a file called package.scala within that package. Code in other packages still has to import it.
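
The package object variant could look roughly like the sketch below. The package name com.example.etl and the object Job are placeholders. Both packagings are shown in one listing for brevity; in a real project the first would live in com/example/etl/package.scala and the second in its own file.

```scala
// Would normally be the content of package.scala
package com.example {
  import org.apache.spark.sql.DataFrame

  package object etl {
    implicit class DFWithExtraOperations(df: DataFrame) {
      // Illustrative body only: select a single column by name
      def customMethod(param: String): DataFrame = df.select(df(param))
    }
  }
}

// Any other file in the same package (placeholder object Job)
package com.example.etl {
  import org.apache.spark.sql.DataFrame

  object Job {
    // No import of the extension is needed here: members of the
    // package object are visible throughout com.example.etl
    def selectHello(df: DataFrame): DataFrame = df.customMethod("hello")
  }
}
```

From outside the package, import com.example.etl._ brings customMethod into scope in the same way as in the earlier usage examples.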

Official documentation / references

[1] The original blog post "Pimp my library" by M. Odersky is available at http://www.artima.com/weblogs/viewpost.jsp?thread=179766


...