Multiple column RDD
There's no such thing, really, but you don't need one either. You can create an RDD of objects of any type T. This type should model a record, so a record with multiple columns can be of type Array[String], Seq[AnyRef], or whatever best models your data. In Scala, the best choice (for type safety and code readability) is usually a case class that represents a record.
For example, if your CSV looks like this:
+---------+-------------------+--------+-------------+
| ID | Name | Age | Department |
+---------+-------------------+--------+-------------+
| 1 | John Smith | 29 | Dev |
| 2 | William Black | 31 | Finance |
| 3 | Nancy Stevens | 32 | Dev |
+---------+-------------------+--------+-------------+
You could, for example:
case class Record(id: Long, name: String, age: Int, department: String)

val input: RDD[String] = sparkContext.textFile("./file")
val parsed: RDD[Record] = input.map { line =>
  val Array(id, name, age, dept) = line.split(",").map(_.trim) // assumes comma-separated values, no header line
  Record(id.toLong, name, age.toInt, dept)
}
Now you can conveniently perform transformations on this RDD; for example, to turn it into a PairRDD with the ID as the key, simply call keyBy:
val keyed: RDD[(Long, Record)] = parsed.keyBy(_.id)
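The result supports the full PairRDD API. As a purely illustrative sketch (the department aggregation is an assumption, not part of the question), keying by department instead lets you count records per group:

// key by department, then count records per department
val byDept: RDD[(String, Record)] = parsed.keyBy(_.department)
val perDept: RDD[(String, Int)] = byDept.mapValues(_ => 1).reduceByKey(_ + _)
perDept.collect().foreach(println) // with the sample data: (Dev,2), (Finance,1)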
That said, even though you're more interested in "batch processing" than analysis, this could still be achieved more easily (and, depending on what you do with the data, perhaps perform better) using the DataFrames API. It has good facilities for reading CSVs safely (e.g. spark-csv) and for treating data as named columns, without the need to create a case class matching each type of record.
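For illustration, here is a minimal sketch of that route, assuming the external spark-csv package is on the classpath (on Spark 2.x and later an equivalent CSV reader is built into spark.read):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line holds the column names
  .option("inferSchema", "true") // derive column types from the data
  .load("./file")
df.groupBy("Department").count().show()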