Wednesday, February 8, 2017

Matching Option in Spark SQL Row.unapply


I wonder whether I can match optional (nullable) fields using Row's unapply method in the Spark 1.6 Scala API.

Consider the following example (for simplicity, I use only one field):

case class MyRow(id: Option[Int])
val data = Seq(
      MyRow(Some(1)),
      MyRow(Some(2)),
      MyRow(None))

val df = sc.parallelize(data).toDF()
df.show()

+----+
|  id|
+----+
|   1|
|   2|
|null|
+----+

I can do:

val myUDF1 = udf((r: Row) =>
  r match {
    case r:Row if !r.isNullAt(0) => r.getInt(0)
    case r:Row if r.isNullAt(0) => 999
  }
)

df.withColumn("udf_result", myUDF1(struct($"id")))

But this is kind of ugly.

I've found out that I can also do:

val myUDF = udf((r: Row) =>
  r match {
    case Row(i: Int) => i
    case _ => 999
  }
)

This gives the same result.

Both the above examples become ugly if Row consists of multiple nullable fields.
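To illustrate (with a hypothetical second nullable column, say `score`), every combination of nulls needs its own case:

```scala
// Hypothetical two-column version of the UDF above: the number of cases
// grows with every additional nullable field.
val myUDF2 = udf((r: Row) =>
  r match {
    case Row(i: Int, s: Int) => i + s       // both present
    case Row(i: Int, _)      => i + 999     // second field null
    case Row(_, s: Int)      => 999 + s     // first field null
    case _                   => 999 + 999   // both null
  }
)
```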

What I would like to have is this:

val myUDF = udf((r: Row) =>
  r match {
    case Row(i: Option[Int]) => i.getOrElse(999)
  }
)

Unfortunately, this does not work; I get a

scala.MatchError
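If I understand Row's extractor correctly, this is because the pattern receives the raw column values (a plain Int, or null for a missing value), never Options, and the `Option[Int]` type parameter is erased at runtime anyway. A sketch of what I think the UDF actually sees (using Spark's `Row` factory):

```scala
import org.apache.spark.sql.Row

val row     = Row(1)     // what the UDF receives for MyRow(Some(1))
val nullRow = Row(null)  // what the UDF receives for MyRow(None)

// The extracted value is a plain Int, never an Option:
row match {
  case Row(i: Int) => i  // matches, i == 1
}

// nullRow would throw scala.MatchError on the same pattern: null fails
// the `Int` type test, and `i: Option[Int]` cannot match it either,
// which would explain the error I am seeing.
```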

Is there a way to do this?
