ambiguous reference to overloaded definition in scala udf

I have the following overloaded method whose input can be an Option[String] or an Option[Seq[String]]:
def parse_emails(email: => Option[String]): Seq[String] = {
  email match {
    case Some(e: String) if e.isEmpty => null
    case Some(e: String) => Seq(e)
    case _ => null
  }
}

def parse_emails(email: Option[Seq[String]]): Seq[String] = {
  email match {
    case Some(e: Seq[String]) if e.isEmpty => null
    case Some(e: Seq[String]) => e
    case _ => null
  }
}
I want to use these methods from Spark, so I tried to wrap them in a UDF:
def parse_emails_udf = udf(parse_emails _)
But I am getting the following error:
error: ambiguous reference to overloaded definition,
both method parse_emails of type (email: Option[Seq[String]])Seq[String]
and method parse_emails of type (email: => Option[String])Seq[String]
match expected type ?
def parse_emails_udf = udf(parse_emails _)
Is it possible to define a UDF which could wrap both alternatives?
Or would it be possible to create two UDFs with the same name, each pointing to one of the overloaded alternatives? I tried the approach below, but it throws another error:
def parse_emails_udf = udf(parse_emails _ : Option[Seq[String]])
error: type mismatch;
found : (email: Option[Seq[String]])Seq[String] <and> (email: => Option[String])Seq[String]
required: Option[Seq[String]]
def parse_emails_udf = udf(parse_emails _ : Option[Seq[String]])

Option[String] and Option[Seq[String]] have the same erasure, Option, so even if Spark supported UDF overloading it wouldn't work.
What you can do is create one function that accepts anything, then match on the argument and handle the different cases:
def parseEmails(arg: Option[AnyRef]): Seq[String] = arg match {
  case Some(x) =>
    x match {
      case str: String =>
        if (str.isEmpty) null else Seq(str)
      case seq: Seq[_] => // the element type is erased, so match on Seq[_]
        if (seq.isEmpty) null else seq.asInstanceOf[Seq[String]]
      case _ =>
        throw new IllegalArgumentException(s"Unsupported argument type: ${x.getClass}")
    }
  case None =>
    null
}
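As a quick sanity check outside Spark, this is how the single function behaves for the three shapes from the question (a small sketch; the expected results mirror the original overloads):

parseEmails(Some("a@b.com"))                 // Seq("a@b.com")
parseEmails(Some(Seq("a@b.com", "c@d.com"))) // Seq("a@b.com", "c@d.com")
parseEmails(None)                            // null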

Related

flink scala map with dead letter queue

I am trying to write some Scala helper functions for Flink map and filter operations that redirect their errors to a dead letter queue.
However, I'm struggling with Scala's type erasure, which prevents me from making them generic. The implementation of mapWithDeadLetterQueue below does not compile.
sealed trait ProcessingResult[T]

case class ProcessingSuccess[T, U](result: U) extends ProcessingResult[T]
case class ProcessingError[T: TypeInformation](errorMessage: String, exceptionClass: String, stackTrace: String, sourceMessage: T) extends ProcessingResult[T]

object FlinkUtils {

  // https://stackoverflow.com/questions/1803036/how-to-write-asinstanceofoption-in-scala
  implicit class Castable(val obj: AnyRef) extends AnyVal {
    def asInstanceOfOpt[T <: AnyRef : ClassTag] = {
      obj match {
        case t: T => Some(t)
        case _ => None
      }
    }
  }

  def mapWithDeadLetterQueue[T: TypeInformation, U: TypeInformation](source: DataStream[T], func: (T => U)): (DataStream[U], DataStream[ProcessingError[T]]) = {
    val mapped = source.map(x => {
      val result = Try(func(x))
      result match {
        case Success(value) => ProcessingSuccess(value)
        case Failure(exception) => ProcessingError(exception.getMessage, exception.getClass.getName, exception.getStackTrace.mkString("\n"), x)
      }
    })
    val mappedSuccess = mapped.flatMap((x: ProcessingResult[T]) => x.asInstanceOfOpt[ProcessingSuccess[T, U]]).map(x => x.result)
    val mappedFailure = mapped.flatMap((x: ProcessingResult[T]) => x.asInstanceOfOpt[ProcessingError[T]])
    (mappedSuccess, mappedFailure)
  }
}
I get:
[error] FlinkUtils.scala:35:36: overloaded method value flatMap with alternatives:
[error] [R](x$1: org.apache.flink.api.common.functions.FlatMapFunction[Product with Serializable with ProcessingResult[_ <: T],R], x$2: org.apache.flink.api.common.typeinfo.TypeInformation[R])org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator[R] <and>
[error] [R](x$1: org.apache.flink.api.common.functions.FlatMapFunction[Product with Serializable with ProcessingResult[_ <: T],R])org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator[R]
[error] cannot be applied to (ProcessingResult[T] => Option[ProcessingSuccess[T,U]])
[error] val mappedSuccess = mapped.flatMap((x: ProcessingResult[T]) => x.asInstanceOfOpt[ProcessingSuccess[T,U]]).map(x => x.result)
Is there a way to make this work?

OK, I'm going to answer my own question. I made a couple of mistakes:
First of all, I accidentally imported the Java DataStream class instead of the Scala DataStream class (this happens all the time). The Java variant obviously doesn't accept a Scala lambda for map/filter/flatMap.
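For reference, here is the import distinction involved (the Java package name is the one visible in the error output above):

// the Scala API DataStream, which accepts Scala lambdas for map/filter/flatMap
import org.apache.flink.streaming.api.scala.DataStream
// the Java API DataStream, easy to pick up by accident via IDE auto-import:
// import org.apache.flink.streaming.api.datastream.DataStream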
Second, sealed traits are not supported by Flink's serialization. There is a project that should solve this, but I haven't tried it yet.
Solution: first, instead of a sealed trait I used a simple case class with two Options (a bit less expressive, but it still works):
case class ProcessingError[T](errorMessage: String, exceptionClass: String, stackTrace: String, sourceMessage: T)
case class ProcessingResult[T: TypeInformation, U: TypeInformation](result: Option[U], error: Option[ProcessingError[T]])
Then I could get everything working like so:
object FlinkUtils {

  def mapWithDeadLetterQueue[T: TypeInformation: ClassTag, U: TypeInformation: ClassTag]
      (source: DataStream[T], func: (T => U)):
      (DataStream[U], DataStream[ProcessingError[T]]) = {

    implicit val typeInfo = TypeInformation.of(classOf[ProcessingResult[T, U]])

    val mapped = source.map((x: T) => {
      val result = Try(func(x))
      result match {
        case Success(value) => ProcessingResult[T, U](Some(value), None)
        case Failure(exception) => ProcessingResult[T, U](None, Some(
          ProcessingError(exception.getMessage, exception.getClass.getName,
            exception.getStackTrace.mkString("\n"), x)))
      }
    })

    val mappedSuccess = mapped.flatMap((x: ProcessingResult[T, U]) => x.result)
    val mappedFailure = mapped.flatMap((x: ProcessingResult[T, U]) => x.error)
    (mappedSuccess, mappedFailure)
  }
}
The flatMap and filter functions look very similar, but they use a ProcessingResult[T, List[T]] and a ProcessingResult[T, T] respectively; a sketch of the filter variant is shown below.
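As an illustration of that similarity, here is a sketch of what filterWithDeadLetterQueue could look like under the same pattern (a reconstruction, assuming the same imports and case classes as above; items filtered out simply produce neither a result nor an error):

def filterWithDeadLetterQueue[T: TypeInformation: ClassTag]
    (source: DataStream[T], func: (T => Boolean)):
    (DataStream[T], DataStream[ProcessingError[T]]) = {

  implicit val typeInfo = TypeInformation.of(classOf[ProcessingResult[T, T]])

  val mapped = source.map((x: T) => {
    Try(func(x)) match {
      case Success(true)      => ProcessingResult[T, T](Some(x), None) // kept
      case Success(false)     => ProcessingResult[T, T](None, None)    // filtered out
      case Failure(exception) => ProcessingResult[T, T](None, Some(
        ProcessingError(exception.getMessage, exception.getClass.getName,
          exception.getStackTrace.mkString("\n"), x)))
    }
  })

  val kept   = mapped.flatMap((x: ProcessingResult[T, T]) => x.result)
  val errors = mapped.flatMap((x: ProcessingResult[T, T]) => x.error)
  (kept, errors)
}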
I use the functions like this:
val (result, errors) = FlinkUtils.filterWithDeadLetterQueue(input, (x: MyMessage) => {
  x.`type` match {
    case "something" => throw new Exception("how how how")
    case "something else" => false
    case _ => true
  }
})

Generic function to Spray JSON data types throws type mismatch error

I'm using spray-json and I need to parse the given request body (PATCH, POST). Request body attributes can have the following possibilities, represented by Either[Unit.type, Option[A]]:
value not given                        -> Left[Unit.type]
value = null  (explicit null)          -> Right[None]
value = XXX   (some value is provided) -> Right[Some(value)]
Using the above possibilities I need to create an entity from the request body. While parsing, I need to validate each field with some business logic (string length, integer range, ...).
I have the following function for the business logic validation:
def validateValue[T](fieldName: String,
                     maybeValue: Try[T],
                     businessValidation: T => Boolean): Option[T] = {
  maybeValue match {
    case Success(value) if businessValidation(value) => Some(value)
    case _ => None
  }
}
Similarly, there is another function readFieldWithValidation: here I parse each attribute based on the input type and apply the business validation.
def readFieldWithValidation[S, T](fields: Map[String, JsValue], fieldName: String, businessValidation: T => Boolean)(
    parse: S => T
): Option[T] = {
  fields.get(fieldName) match {
    case None => None
    case Some(jsValue) =>
      jsValue match {
        case jsString: JsString =>
          validateValue(fieldName, Try(parse(jsString.value)), businessValidation)
        case JsNumber(jsNumber) =>
          validateValue(fieldName, Try(parse(jsNumber.intValue)), businessValidation)
        case _ => None
      }
  }
}
I have S (source) and T (target): given a JsValue, the function extracts an S and parses it into the T type. Here I only care about JsString and JsNumber.
The above code gives a type mismatch error:
<console>:112: error: type mismatch;
found : jsString.value.type (with underlying type String)
required: S
validateValue(fieldName, Try(parse(jsString.value)), businessValidation)
^
<console>:114: error: type mismatch;
found : Int
required: S
validateValue(fieldName, Try(parse(jsNumber.intValue)), businessValidation)
Can someone help me overcome this error?
This is how I would use the above function:
val attributes = Map("String" -> JsString("ThisIsString"), "Int" -> JsNumber(23))
def stringLengthConstraint(min: Int, max: Int)(value: String) = value.length > min && value.length < max
readFieldWithValidation[JsString, String](attributes, "String", stringLengthConstraint(1, 10))(_.toString)
Your example is still not quite clear, because it does not show the role of parse and actually looks contradictory to the rest of the code: in particular, you specify the generic parameter S as JsString in readFieldWithValidation[JsString, String], but given the current (broken) readFieldWithValidation implementation your parse argument is probably expected to be of type String => String, because jsString.value is a String.
Anyway, here is a piece of code that seems to implement something that is hopefully sufficiently close to what you want:
trait JsValueExtractor[T] {
  def getValue(jsValue: JsValue): Option[T]
}

object JsValueExtractor {
  implicit val decimalExtractor = new JsValueExtractor[BigDecimal] {
    override def getValue(jsValue: JsValue) = jsValue match {
      case JsNumber(jsNumber) => Some(jsNumber)
      case _ => None
    }
  }
  implicit val intExtractor = new JsValueExtractor[Int] {
    override def getValue(jsValue: JsValue) = jsValue match {
      case JsNumber(jsNumber) => Some(jsNumber.intValue)
      case _ => None
    }
  }
  implicit val doubleExtractor = new JsValueExtractor[Double] {
    override def getValue(jsValue: JsValue) = jsValue match {
      case JsNumber(jsNumber) => Some(jsNumber.doubleValue)
      case _ => None
    }
  }
  implicit val stringExtractor = new JsValueExtractor[String] {
    override def getValue(jsValue: JsValue) = jsValue match {
      case JsString(string) => Some(string)
      case _ => None
    }
  }
}
def readFieldWithValidation[S, T](fields: Map[String, JsValue], fieldName: String, businessValidation: T => Boolean)
                                 (parse: S => T)(implicit valueExtractor: JsValueExtractor[S]) = {
  fields.get(fieldName)
    .flatMap(jsValue => valueExtractor.getValue(jsValue))
    .flatMap(rawValue => Try(parse(rawValue)).toOption)
    .filter(businessValidation)
}
And a usage example:
def test(): Unit = {
  val attributes = Map("String" -> JsString("ThisIsString"), "Int" -> JsNumber(23))

  def stringLengthConstraint(min: Int, max: Int)(value: String) = value.length > min && value.length < max

  val value = readFieldWithValidation[String, String](attributes, "String", stringLengthConstraint(1, 10))(identity)
  println(value)
}
Your current code uses Option[T] as the return type. If I were using code like this, I'd probably add some error logging and/or handling for the case where the code contains a bug and attributes does contain a value for the key fieldName, but of some different, unexpected type (like JsNumber instead of JsString).
Update
It is not clear from your comment whether you are satisfied with my original answer or want to add some error handling. If you want to report the type mismatch errors, and since you are using cats, something like ValidatedNel is an obvious choice:
type ValidationResult[A] = ValidatedNel[String, A]

trait JsValueExtractor[T] {
  def getValue(jsValue: JsValue, fieldName: String): ValidationResult[T]
}

object JsValueExtractor {
  implicit val decimalExtractor = new JsValueExtractor[BigDecimal] {
    override def getValue(jsValue: JsValue, fieldName: String): ValidationResult[BigDecimal] = jsValue match {
      case JsNumber(jsNumber) => jsNumber.validNel
      case _ => s"Field '$fieldName' is expected to be decimal".invalidNel
    }
  }
  implicit val intExtractor = new JsValueExtractor[Int] {
    override def getValue(jsValue: JsValue, fieldName: String): ValidationResult[Int] = jsValue match {
      case JsNumber(jsNumber) => Try(jsNumber.toIntExact) match {
        case scala.util.Success(intValue) => intValue.validNel
        case scala.util.Failure(e) => s"Field '$fieldName' is expected to be int".invalidNel
      }
      case _ => s"Field '$fieldName' is expected to be int".invalidNel
    }
  }
  implicit val doubleExtractor = new JsValueExtractor[Double] {
    override def getValue(jsValue: JsValue, fieldName: String): ValidationResult[Double] = jsValue match {
      case JsNumber(jsNumber) => jsNumber.doubleValue.validNel
      case _ => s"Field '$fieldName' is expected to be double".invalidNel
    }
  }
  implicit val stringExtractor = new JsValueExtractor[String] {
    override def getValue(jsValue: JsValue, fieldName: String): ValidationResult[String] = jsValue match {
      case JsString(string) => string.validNel
      case _ => s"Field '$fieldName' is expected to be string".invalidNel
    }
  }
}
def readFieldWithValidation[S, T](fields: Map[String, JsValue], fieldName: String, businessValidation: T => Boolean)
                                 (parse: S => T)(implicit valueExtractor: JsValueExtractor[S]): ValidationResult[T] = {
  fields.get(fieldName) match {
    case None => s"Field '$fieldName' is required".invalidNel
    case Some(jsValue) => valueExtractor.getValue(jsValue, fieldName)
      .andThen(rawValue => Try(parse(rawValue).validNel).getOrElse(s"Field '$fieldName' could not be parsed".invalidNel))
      .andThen(parsedValue => if (businessValidation(parsedValue)) parsedValue.validNel else s"Business validation for field '$fieldName' has failed".invalidNel)
  }
}
And the test example remains the same. Probably in your real code you want to use something more specific than just String for errors, but that's up to you.
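To make the behaviour concrete, here is a small usage sketch of the ValidatedNel variant (reusing the attributes map and stringLengthConstraint from the test above; note that "ThisIsString" is 12 characters, so the 1 < length < 10 constraint fails):

readFieldWithValidation[String, String](attributes, "String", stringLengthConstraint(1, 10))(identity)
// Invalid: "Business validation for field 'String' has failed"

readFieldWithValidation[Int, Int](attributes, "Int", (_: Int) > 0)(identity)
// Valid(23)

readFieldWithValidation[String, String](attributes, "Missing", stringLengthConstraint(1, 10))(identity)
// Invalid: "Field 'Missing' is required"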

Scala Spark udf java.lang.UnsupportedOperationException

I have created this curried function to check for null values of endDateStr inside a UDF; the code is as follows (the type of column x is ArrayType[TimestampType]):
def _getCountAll(dates: Seq[Timestamp]) = Option(dates).map(_.length)
def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]) = Option(dates).map(_.count(!_.after(endDate)))

val getCountUDF = udf((endDateStr: Option[String]) => {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) => _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
})

df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"))(col("x")))
But I am getting this exception while executing:
java.lang.UnsupportedOperationException: Schema for type
Seq[java.sql.Timestamp] => Option[Int] is not supported
Can anyone please help me to figure out my mistake?
You cannot curry a udf like this. If you want curry-like behavior you should return the udf from the outer function:
def getCountUDF(endDateStr: Option[String]) = udf {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) =>
      _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
}

df.withColumn("distinct_dx_count", getCountUDF(Some("2009-09-10"))(col("x")))
Otherwise, just drop the currying and provide both arguments at the same time:
val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
  endDateStr match {
    case null => _getCountAll(dates)
    case _ =>
      _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
  }
)

df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
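As a side note, both variants also cover the unfiltered case (a small sketch, reusing df and column x from the question; the all_dx_count column name is just for illustration):

df.withColumn("all_dx_count", getCountUDF(None)(col("x")))                     // first variant: no end date, count all
df.withColumn("all_dx_count", getCountUDF(lit(null).cast("string"), col("x"))) // second variant: null end date, count all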

Getting (Long, String) from (String) => Try[(Long, String)]

I have a method that returns a (String) => Try[(Long, String)] and I want to get a (Long, String) out of it. Any suggestions?
I thought map/flatMap would help, but it looks like they don't.
Update
def someMethod(): (Long, String) = {
  val result: (String) => Try[(Long, String)] = someOperation()
  // Need to get (Long, String) from result
}
There are several options:
val exceptional: Try[(Long, String)] = ???
val default: (Long, String) = (0, "")
Providing a fallback value:
exceptional.getOrElse(default)
Handling the exception and then safely calling get:
exceptional.recover { case exception => default }.get
Or using pattern matching:
exceptional match {
  case Success(v) => v
  case Failure(exception) => default
}
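Note that in the original someMethod the value is a function, not a Try, so you first have to apply it to a String before any of the options above apply. A minimal sketch (the input string and the default are placeholders):

def someMethod(): (Long, String) = {
  val result: (String) => Try[(Long, String)] = someOperation()
  val default: (Long, String) = (0L, "")
  // apply the function to obtain a Try, then fall back to the default on failure
  result("some input").getOrElse(default)
}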

enrich PartialFunction with unapply functionality

A PartialFunction is a natural extractor: its lift method provides exactly the extractor functionality. So it would be very convenient to use partial functions as extractors. That would allow combining pattern matching expressions in more complicated ways than the plain orElse that is available for PartialFunction.
So I tried the pimp-my-library approach and failed.
Update: as @Archeg showed, there is another approach to the conversion that works, so I'm including it in the provided code.
I also tried some more complex solutions and they failed:
object Test {

  class UnapplyPartial[-R, +T](val fun: PartialFunction[R, T]) {
    def unapply(source: R): Option[T] = fun.lift(source)
  }
  implicit def toUnapply[R, T](fun: PartialFunction[R, T]): UnapplyPartial[R, T] = new UnapplyPartial(fun)

  class PartialFunOps[-R, +T](val fun: PartialFunction[R, T]) {
    def u: UnapplyPartial[R, T] = new UnapplyPartial(fun)
  }
  implicit def toPartFunOps[R, T](fun: PartialFunction[R, T]): PartialFunOps[R, T] = new PartialFunOps(fun)

  val f: PartialFunction[String, Int] = {
    case "bingo" => 0
  }

  val u = toUnapply(f)

  def g(compare: String): PartialFunction[String, Int] = {
    case `compare` => 0
  }

  // error while trying to use the implicit conversion directly
  def testF(x: String): Unit = x match {
    case f(i) => println(i)
    case _ => println("nothing")
  }

  // external explicit conversion is OK
  def testU(x: String): Unit = x match {
    case u(i) => println(i)
    case _ => println("nothing")
  }

  // embedded explicit conversion fails
  def testA(x: String): Unit = x match {
    case toUnapply(f)(i) => println(i)
    case _ => println("nothing")
  }

  // implicit conversion combined with an explicit .u call is OK
  def testI(x: String): Unit = x match {
    case f.u(i) => println(i)
    case _ => println("nothing")
  }

  // nested case sentences fail
  def testInplace(x: String): Unit = x match {
    case { case "bingo" => 0 }.u(i) => println(i)
    case _ => println("nothing")
  }

  // building on the fly fails
  def testGen(x: String): Unit = x match {
    case g("bingo").u(i) => println(i)
    case _ => println("nothing")
  }

  // implicit conversion outside of pattern matching is also OK
  def testFA(x: String): Option[Int] =
    f.unapply(x)
}
I got the following error messages:
UnapplyImplicitly.scala:16: error: value f is not a case class, nor does it have an unapply/unapplySeq member
case f(i) => println(i)
UnapplyImplicitly.scala:28: error: '=>' expected but '(' found.
case toUnapply(f)(i) => println(i)
These errors may be avoided with the proposed form, as testI shows. But I'm curious whether it is possible to avoid the testInplace error:
UnapplyImplicitly.scala:46: error: illegal start of simple pattern
case { case "bingo" => 0 }.u(i) => println(i)
^
UnapplyImplicitly.scala:47: error: '=>' expected but ';' found.
case _ => println("nothing")
UnapplyImplicitly.scala:56: error: '=>' expected but '.' found.
case g("bingo").u(i) => println(i)
^
I'm not sure what you are trying to achieve in the end, but as far as I understand extractors should always be objects; there is no way you can get this with a class. It is actually called an Extractor Object in the documentation. Consider this:
class Wrapper[R, T](fun: PartialFunction[R, T]) {
  object PartialExtractor {
    def unapply(p: R): Option[T] = fun.lift(p)
  }
}

implicit def toWrapper[R, T](fun: PartialFunction[R, T]): Wrapper[R, T] = new Wrapper(fun)

val f: PartialFunction[String, Int] = {
  case "bingo" => 0
}

def testFF(x: String): Unit = x match {
  case f.PartialExtractor(i) => println(i)
  case _ => println("nothing")
}
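A quick check of the wrapper in action (the expected output follows from the definition of f above):

testFF("bingo") // prints 0
testFF("other") // prints "nothing"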
Update
The best I could think of:
def testInplace(x: String): Unit = {
  val ff = { case "bingo" => 0 }: PartialFunction[String, Int]
  x match {
    case ff.PartialExtractor(i) => println(i)
    case "sd" => println("nothing")
  }
}