Scala Spark udf java.lang.UnsupportedOperationException

I have created this curried function to check for null values for endDateStr inside a udf. The code is as follows (the type of column x is ArrayType(TimestampType)):
import java.sql.Timestamp
import org.apache.spark.sql.functions.{col, lit, udf}

def _getCountAll(dates: Seq[Timestamp]) = Option(dates).map(_.length)

def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]) =
  Option(dates).map(_.count(!_.after(endDate)))

val getCountUDF = udf((endDateStr: Option[String]) => {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) => _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
})

df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"))(col("x")))
But I am getting this exception while executing:
java.lang.UnsupportedOperationException: Schema for type
Seq[java.sql.Timestamp] => Option[Int] is not supported
Can anyone please help me to figure out my mistake?

You cannot curry a udf like this: the udf above returns a function (Seq[java.sql.Timestamp] => Option[Int]), and Spark cannot derive a schema for a function type, which is exactly what the exception says. If you want curry-like behavior, return the udf from the outer function:
def getCountUDF(endDateStr: Option[String]) = udf {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) =>
      _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
}

df.withColumn("distinct_dx_count", getCountUDF(Some("2009-09-10"))(col("x")))
Otherwise, drop the currying and provide both arguments at the same time:
val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
  endDateStr match {
    case null => _getCountAll(dates)
    case _ =>
      _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
  }
)

df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
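For the second variant, a missing end date can be expressed as a typed null literal, so the case null branch fires. A minimal sketch, assuming Spark's standard lit/cast API:

// Hypothetical call with no end date: the udf receives null,
// so the `case null` branch counts all timestamps.
df.withColumn("distinct_dx_count", getCountUDF(lit(null).cast("string"), col("x")))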

Related

Ambiguous reference to overloaded definition in Scala udf

I have the following overloaded methods, whose input can be an Option[String] or an Option[Seq[String]]:
def parse_emails(email: => Option[String]): Seq[String] = {
  email match {
    case Some(e: String) if e.isEmpty => null
    case Some(e: String) => Seq(e)
    case _ => null
  }
}

def parse_emails(email: Option[Seq[String]]): Seq[String] = {
  email match {
    case Some(e: Seq[String]) if e.isEmpty => null
    case Some(e: Seq[String]) => e
    case _ => null
  }
}
I want to use these methods from Spark, so I tried to wrap them in a udf:
def parse_emails_udf = udf(parse_emails _)
But I am getting the following error:
error: ambiguous reference to overloaded definition,
both method parse_emails of type (email: Option[Seq[String]])Seq[String]
and method parse_emails of type (email: => Option[String])Seq[String]
match expected type ?
def parse_emails_udf = udf(parse_emails _)
Is it possible to define a udf which could wrap both alternatives?
Or would it be possible to create two udfs with the same name, each pointing to one of the overloaded options? I tried the approach below, but it throws another error:
def parse_emails_udf = udf(parse_emails _ : Option[Seq[String]])
error: type mismatch;
found : (email: Option[Seq[String]])Seq[String] <and> (email: => Option[String])Seq[String]
required: Option[Seq[String]]
def parse_emails_udf = udf(parse_emails _ : Option[Seq[String]])
Option[String] and Option[Seq[String]] have the same erasure (Option), so even if Spark supported udf overloading it wouldn't work.
What you can do is create one function that accepts anything, then match on the argument and handle the different cases:
def parseEmails(arg: Option[AnyRef]) = arg match {
  case Some(x) =>
    x match {
      case str: String =>
        ??? // todo
      case s: Seq[String] =>
        ??? // todo
      case _ =>
        throw new IllegalArgumentException()
    }
  case None =>
    ??? // todo
}
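Filling in the todos with the logic from the question (empty input yields null, a single string becomes a one-element Seq, a sequence is returned as-is) gives a sketch like the following; the @unchecked annotation just silences the erasure warning the original Seq[String] match would raise:

def parseEmails(arg: Option[AnyRef]): Seq[String] = arg match {
  case Some(str: String) if str.isEmpty => null
  case Some(str: String) => Seq(str)
  case Some(s: Seq[String @unchecked]) if s.isEmpty => null
  case Some(s: Seq[String @unchecked]) => s
  case _ => null
}

parseEmails(Some("a@b.com"))     // Seq("a@b.com")
parseEmails(Some(Seq("a", "b"))) // Seq("a", "b")
parseEmails(None)                // null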

Scala get value from lambda

I want to get the value from a function that is passed as a parameter and returns Option[Int]; if it is None, throw an exception, otherwise return the value.
I tried to do like this:
def foo[T](f: T => Option[Int]) = {
  def helper(x: T) = f(x)
  val res = helper _
  res match {
    case None => throw new Exception()
    case Some(z) => z
  }
}
I call it like this:
val test = foo[String](myFunction(_))
test("Some string")
I get a compilation error about mismatched types in the match section (Some[A] passed, T => Option[Int] required).
As I understand it, the res variable is a reference to the function, so I can neither match it against an Option nor call get/getOrElse on it.
Moreover, I probably just don't get how the underscore works and am doing something really wrong. I'm using it here to pass something as a parameter to the function f. Can you explain where I made a mistake?
helper is a method taking a T and returning an Option[Int]; res is a function T => Option[Int] (see "Difference between method and function in Scala" for the distinction).
You can't match a function T => Option[Int] against None or Some(z). You need an actual Option[Int] (for example, the function applied to some T) to do such matching.
Probably you would like to have
def foo[T](f: T => Option[Int]) = {
  def helper(x: T) = f(x)
  val res = helper _
  (t: T) => res(t) match {
    case None => throw new Exception()
    case Some(z) => z
  }
}
or just
def foo[T](f: T => Option[Int]): T => Int = {
  t => f(t) match {
    case None => throw new Exception()
    case Some(z) => z
  }
}
or
def foo[T](f: T => Option[Int]): T => Int =
  t => f(t).getOrElse(throw new Exception())
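With any of these versions, the call site from the question works as intended. A quick example with a stand-in for the question's myFunction (the stand-in is hypothetical, for illustration only):

// Hypothetical helper standing in for the question's myFunction.
def myFunction(s: String): Option[Int] = if (s.nonEmpty) Some(s.length) else None

val test = foo[String](myFunction)
test("Some string") // 11
// test("") throws an Exception, because myFunction returns None.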

In Scala, why does for/yield return an Option instead of a String?

I'm new to Scala and trying to understand for/yield, and I don't understand why the following code returns an Option and not a String:
val opString: Option[String] = Option("test")

val optionStr: Option[String] = for {
  op <- opString
} yield {
  opString match {
    case Some(s) => s
    case _ => "error"
  }
}
A for-expression is syntactic sugar for a series of map, flatMap and withFilter calls. Your specific for-expression is translated to something like this:
opString.map(op => opString match {
  case Some(s) => s
  case _ => "error"
})
As you can see, your expression will just map over opString and not unwrap it in any way.
The desugared form of your for ... yield expression is:
val optionStr = opString.map { op =>
  opString match {
    case Some(s) => s
    case _ => "error"
  }
}
The type of opString match {...} is String, so mapping a String => String function over an Option[String] yields an Option[String] again.
What you're looking for is getOrElse:
opString.getOrElse("error")
This is equivalent to:
opString match {
  case Some(s) => s
  case _ => "error"
}
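Putting both behaviors side by side (a minimal sketch):

val opString: Option[String] = Option("test")

// for/yield keeps the Option wrapper:
val stillWrapped: Option[String] = for (op <- opString) yield op // Some("test")

// getOrElse unwraps and supplies a fallback for None:
opString.getOrElse("error")               // "test"
(None: Option[String]).getOrElse("error") // "error"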

Scala: How to simplify nested pattern matching statements

I am writing a Hive UDF in Scala (because I want to learn Scala). To do this, I have to override three functions: evaluate, initialize and getDisplayString.
In the initialize function I have to:
Receive an array of ObjectInspector and return an ObjectInspector
Check if the array is null
Check if the array has the correct size
Check if the array contains the object of the correct type
To do this, I am using pattern matching and came up with the following function:
override def initialize(genericInspectors: Array[ObjectInspector]): ObjectInspector = genericInspectors match {
  case null =>
    throw new UDFArgumentException(functionNameString + ": ObjectInspector is null!")
  case _ if genericInspectors.length != 1 =>
    throw new UDFArgumentException(functionNameString + ": requires exactly one argument.")
  case _ =>
    listInspector = genericInspectors(0) match {
      case concreteInspector: ListObjectInspector => concreteInspector
      case _ => throw new UDFArgumentException(functionNameString + ": requires an input array.")
    }
    PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(
      listInspector.getListElementObjectInspector.asInstanceOf[PrimitiveObjectInspector].getPrimitiveCategory)
}
Nevertheless, I have the impression that the function could be made more legible and, in general, prettier, since I don't like code with too many levels of indentation.
Is there an idiomatic Scala way to improve the code above?
It's typical for patterns to include other patterns. The type of x here is String.
scala> val xs: Array[Any] = Array("x")
xs: Array[Any] = Array(x)
scala> xs match {
| case null => ???
| case Array(x: String) => x
| case _ => ???
| }
res0: String = x
The idiom for "any number of args" is "sequence pattern", which matches arbitrary args:
scala> val xs: Array[Any] = Array("x")
xs: Array[Any] = Array(x)
scala> xs match { case Array(x: String) => x case Array(_*) => ??? }
res2: String = x
scala> val xs: Array[Any] = Array(42)
xs: Array[Any] = Array(42)
scala> xs match { case Array(x: String) => x case Array(_*) => ??? }
scala.NotImplementedError: an implementation is missing
at scala.Predef$.$qmark$qmark$qmark(Predef.scala:230)
... 32 elided
scala> Array("x","y") match { case Array(x: String) => x case Array(_*) => ??? }
scala.NotImplementedError: an implementation is missing
at scala.Predef$.$qmark$qmark$qmark(Predef.scala:230)
... 32 elided
This answer should not be construed as advocating matching your way back to type safety.
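Applied to the initialize from the question, the Array patterns collapse the null check, the arity check, and the type check into a single match. A hedged sketch reusing the question's functionNameString and listInspector:

override def initialize(genericInspectors: Array[ObjectInspector]): ObjectInspector =
  genericInspectors match {
    case null =>
      throw new UDFArgumentException(functionNameString + ": ObjectInspector is null!")
    // Exactly one argument, of the expected inspector type:
    case Array(concreteInspector: ListObjectInspector) =>
      listInspector = concreteInspector
      PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(
        listInspector.getListElementObjectInspector
          .asInstanceOf[PrimitiveObjectInspector].getPrimitiveCategory)
    // Exactly one argument, but of the wrong type:
    case Array(_) =>
      throw new UDFArgumentException(functionNameString + ": requires an input array.")
    // Wrong number of arguments:
    case _ =>
      throw new UDFArgumentException(functionNameString + ": requires exactly one argument.")
  }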

Can't deal with UUID in Play (jdbc)

I'm using Joda DateTime and UUID in my Play project, and I'm struggling to put them into and get them from PostgreSQL:
import java.nio.ByteBuffer
import java.sql.{Blob, Clob, PreparedStatement}
import java.util.UUID
import anorm._
import org.joda.time.DateTime
import play.api.db.DB
import play.api.Play.current

case class MyClass(id: Pk[UUID], name: String, addedAt: DateTime)

object MyClass {
  val simple =
    SqlParser.get[Pk[UUID]]("id") ~
      SqlParser.get[String]("name") ~
      SqlParser.get[DateTime]("added_at") map {
        case id ~ name ~ addedAt => MyClass(id, name, addedAt)
      }

  implicit def rowToId = Column.nonNull[UUID] { (value, meta) =>
    maybeValueToUUID(value) match {
      case Some(uuid) => Right(uuid)
      case _ => Left(TypeDoesNotMatch(s"Cannot convert $value: ${value.asInstanceOf[Any].getClass} to UUID"))
    }
  }

  implicit def idToStatement = new ToStatement[UUID] {
    def set(s: PreparedStatement, index: Int, aValue: UUID): Unit =
      s.setObject(index, toByteArray(aValue))
  }

  def getSingle(id: UUID): Option[MyClass] = {
    DB withConnection { implicit con =>
      SQL("SELECT my_table.id, my_table.name, my_table.added_at FROM my_table WHERE id = {id}")
        .on('id -> id)
        .as(MyClass.simple.*)
    } match {
      case List(x) => Some(x)
      case _ => None
    }
  }
The implicit functions for Joda DateTime are omitted because they don't cause any error at this point. What causes an error is getSingle(...), i.e. the conversion from and to UUID. The error is:
org.postgresql.util.PSQLException: operator does not exist: uuid = bytea
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Here are the 4 helper functions:
private def maybeValueToUUID(value: Any): Option[UUID] = maybeValueToByteArray(value) match {
  case Some(bytes) => Some(fromByteArray(bytes))
  case _ => None
}

private def maybeValueToByteArray(value: Any): Option[Array[Byte]] =
  try {
    value match {
      case bytes: Array[Byte] => Some(bytes)
      case clob: Clob => None // todo
      case blob: Blob => None // todo
      case _ => None
    }
  } catch {
    case e: Exception => None
  }

def toByteArray(uuid: UUID) = {
  val buffer = ByteBuffer.wrap(new Array[Byte](16))
  buffer putLong uuid.getMostSignificantBits
  buffer putLong uuid.getLeastSignificantBits
  buffer.array
}

def fromByteArray(b: Array[Byte]) = {
  val buffer = ByteBuffer.wrap(b)
  val high = buffer.getLong
  val low = buffer.getLong
  new UUID(high, low)
}
Note that the record I'm trying to retrieve exists and has the correct format.
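The error message itself points at the mismatch: the ToStatement above sends a byte array, so the bound parameter arrives as bytea while the id column is uuid. A hedged fix, assuming the standard PostgreSQL JDBC driver (which binds java.util.UUID directly via setObject):

import java.sql.PreparedStatement
import java.util.UUID
import anorm.ToStatement

implicit def idToStatement: ToStatement[UUID] = new ToStatement[UUID] {
  // Pass the UUID itself instead of a byte array; the driver maps
  // java.util.UUID to the uuid column type, avoiding "uuid = bytea".
  def set(s: PreparedStatement, index: Int, aValue: UUID): Unit =
    s.setObject(index, aValue)
}

On the read side the driver typically hands back a java.util.UUID for uuid columns, so maybeValueToUUID would also need a case uuid: UUID => Some(uuid) branch.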