Let's say you have this data setup (so that results are reproducible):
// declaring data types
case class Company(cName: String, cId: String, details: String)
case class Employee(name: String, id: String, email: String, company: Company)
// setting up example data
val e1 = Employee("n1", null, "[email protected]", Company("c1", "1", "d1"))
val e2 = Employee("n2", "2", "[email protected]", Company("c1", "1", "d1"))
val e3 = Employee("n3", "3", "[email protected]", Company("c1", "1", "d1"))
val e4 = Employee("n4", "4", "[email protected]", Company("c2", "2", "d2"))
val e5 = Employee("n5", null, "[email protected]", Company("c2", "2", "d2"))
val e6 = Employee("n6", "6", "[email protected]", Company("c2", "2", "d2"))
val e7 = Employee("n7", "7", "[email protected]", Company("c3", "3", "d3"))
val e8 = Employee("n8", "8", "[email protected]", Company("c3", "3", "d3"))
val employees = Seq(e1, e2, e3, e4, e5, e6, e7, e8)
val df = sc.parallelize(employees).toDF
The data looks like this:
+----+----+-----------------+---------+
|name|  id|            email|  company|
+----+----+-----------------+---------+
|  n1|null|[email protected]|[c1,1,d1]|
|  n2|   2|[email protected]|[c1,1,d1]|
|  n3|   3|[email protected]|[c1,1,d1]|
|  n4|   4|[email protected]|[c2,2,d2]|
|  n5|null|[email protected]|[c2,2,d2]|
|  n6|   6|[email protected]|[c2,2,d2]|
|  n7|   7|[email protected]|[c3,3,d3]|
|  n8|   8|[email protected]|[c3,3,d3]|
+----+----+-----------------+---------+
Now, to filter the employees with null ids, you can do:
df.filter("id is null").show
which correctly shows the following:
+----+----+-----------------+---------+
|name|  id|            email|  company|
+----+----+-----------------+---------+
|  n1|null|[email protected]|[c1,1,d1]|
|  n5|null|[email protected]|[c2,2,d2]|
+----+----+-----------------+---------+
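The same filter can also be written with the Column API instead of a SQL string. The snippet below is only a sketch: it assumes Spark 2.x or later, builds its own local SparkSession (in spark-shell, `getOrCreate` simply returns the existing one), and uses a reduced two-column stand-in for the employees data rather than the full case classes above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("null-id-filter").getOrCreate()
import spark.implicits._

// Reduced stand-in for the employees data: (name, id)
val rows: Seq[(String, String)] = Seq(("n1", null), ("n2", "2"), ("n5", null))
val df = rows.toDF("name", "id")

// Column-based equivalent of df.filter("id is null")
val nullIds = df.filter(col("id").isNull)

// The complementary filter keeps only rows that have an id
val nonNullIds = df.filter(col("id").isNotNull)

nullIds.show()
nonNullIds.show()
```

Here `nullIds` holds the n1 and n5 rows and `nonNullIds` holds n2. Whether you use the string form or the Column form is mostly a matter of taste; the Column form catches misspelled method names at compile time.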
Coming to the second part of your question, you can replace the null ids with 0 and all other values with 1 like this:
import org.apache.spark.sql.functions.when  // already in scope in spark-shell
df.withColumn("id", when($"id".isNull, 0).otherwise(1)).show
This results in:
+----+---+-----------------+---------+
|name| id|            email|  company|
+----+---+-----------------+---------+
|  n1|  0|[email protected]|[c1,1,d1]|
|  n2|  1|[email protected]|[c1,1,d1]|
|  n3|  1|[email protected]|[c1,1,d1]|
|  n4|  1|[email protected]|[c2,2,d2]|
|  n5|  0|[email protected]|[c2,2,d2]|
|  n6|  1|[email protected]|[c2,2,d2]|
|  n7|  1|[email protected]|[c3,3,d3]|
|  n8|  1|[email protected]|[c3,3,d3]|
+----+---+-----------------+---------+
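Note that `withColumn("id", ...)` overwrites the original ids. If you would rather keep them, one option is to write the 0/1 indicator to a separate column. This is a sketch under a few assumptions: Spark 2.x or later, a local SparkSession, a reduced stand-in for the employees data, and a column name `has_id` that is purely illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Assumption: a local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("id-flag").getOrCreate()
import spark.implicits._

// Reduced stand-in for the employees data: (name, id)
val rows: Seq[(String, String)] = Seq(("n1", null), ("n2", "2"), ("n5", null))
val df = rows.toDF("name", "id")

// Keep the original id and add the indicator next to it:
// 0 where id is null, 1 otherwise ("has_id" is a hypothetical name)
val flagged = df.withColumn("has_id", when(col("id").isNull, 0).otherwise(1))

flagged.show()
```

The `id` column is left untouched, so the null rows can still be told apart from the others later on.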