Task not serializable exception in dataframe map function - scala

I need to convert the data types of columns in a DataFrame and catch every data type conversion failure. I tried the code below, but it throws "Task not serializable".
import java.text.SimpleDateFormat
import scala.collection.mutable.ListBuffer
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.Row

// Caution: appends made inside the closure run on the executors and are
// not guaranteed to reach this driver-side buffer.
val errorListBuffer = new ListBuffer[Map[String, String]]()
val converted = df.rdd.map(r => {
  val index = r.fieldIndex(columnName)
  Try {
    // Null cells stay null; everything else is trimmed
    val cleanValue = if (r.isNullAt(index)) null else r.get(index).toString.trim
    new_type match {
      case "date" => new SimpleDateFormat("yyyy-MM-dd").format(new SimpleDateFormat(dateFormat).parse(cleanValue))
      // HH (0-23) rather than hh (1-12), since the output pattern has no AM/PM marker
      case "datetime" => new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new SimpleDateFormat(dateFormat).parse(cleanValue))
      case "string" => toLower match {
        case "1" => cleanValue.toLowerCase
        case _ => cleanValue
      }
      case _ => cleanValue
    }
  } match {
    // Append the converted value as one extra column (`:+`, not `++`, which would append characters)
    case Success(v) => Row.fromSeq(r.toSeq :+ v)
    case Failure(e) =>
      errorListBuffer += Map(
        LOADER_COLUMN_NAME -> columnName,
        LOADER_LEVEL -> "ERROR",
        LOADER_ERROR_MESSAGE -> e.getMessage,
        LOADER_RECORD_UUID -> r.getAs[Any](LOADER_UUID).toString)
      Row.fromSeq(r.toSeq :+ null)
  }
})
val dfnew = sqlContext.createDataFrame(converted, schema)
Please let me know how I can resolve this.
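One way to collect conversion failures without mutating a driver-side ListBuffer is to have the per-cell conversion return its error as data. Spark's programming guide notes that updating a variable captured by a closure from executor code does not reliably update the driver's copy, so in a Spark job the error messages would travel back inside the resulting rows instead. The sketch below is plain Scala and does not by itself address the serialization error; it swaps SimpleDateFormat for java.time, and the object name ConvertSketch and its parameter names are hypothetical.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.{Failure, Success, Try}

object ConvertSketch {
  // Converts one raw cell. Left carries the error message, Right the converted value.
  // newType mirrors the question's new_type; dateFormat is the *input* pattern.
  def convert(raw: String, newType: String, dateFormat: String, toLower: Boolean): Either[String, String] =
    Option(raw).map(_.trim) match {
      case None => Right(null) // null cells pass through unchanged
      case Some(clean) =>
        Try {
          newType match {
            case "date" =>
              // LocalDate.toString is ISO yyyy-MM-dd
              LocalDate.parse(clean, DateTimeFormatter.ofPattern(dateFormat)).toString
            case "datetime" =>
              LocalDateTime.parse(clean, DateTimeFormatter.ofPattern(dateFormat))
                .format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
            case "string" => if (toLower) clean.toLowerCase else clean
            case _        => clean
          }
        } match {
          case Success(v) => Right(v)
          case Failure(e) => Left(e.getMessage)
        }
    }
}
```

Inside a Spark job, one would call convert within rdd.map and keep any Left message in the output row (or split such rows out afterwards) rather than appending to a shared buffer.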

Related

scala pattern matching on value in sequence of strings

The variable someKey can be either "a", "b" or "c".
I can do this:
someKey match {
case "a" => someObjectA.execute()
case "b" => someOther.execute()
case "c" => someOther.execute()
case _ => throw new IllegalArgumentException("Unknown")
}
How can I compress this pattern match so that I check whether someKey is in a sequence such as Seq("b", "c") and, if it is, handle it with one case instead of two?
EDIT:
someKey match {
case "a" => someObjectA.execute()
case someKey if Seq("b","c").contains(someKey) => someOther.execute()
case _ => throw new IllegalArgumentException("Unknown")
}
You can use "or" (an alternative pattern, written |) in the case clause:
someKey match {
case "a" => someObjectA.execute()
case "b"|"c" => someOther.execute()
case _ => ???
}
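To see the alternatives in action, here is a small runnable version in which the execute() calls are replaced by hypothetical string results (KeyMatchSketch and its methods are illustrative names). It also shows that an alternative can be bound to a name with @ when the branch needs the matched key, and a guard using a Set in place of the Seq from the EDIT.

```scala
object KeyMatchSketch {
  // Hypothetical stand-ins for someObjectA.execute() and someOther.execute()
  def dispatch(someKey: String): String = someKey match {
    case "a"             => "A"
    case k @ ("b" | "c") => s"other($k)" // both alternatives bound to k
    case _               => throw new IllegalArgumentException("Unknown")
  }

  // Guard-based variant; the Set plays the role of Seq("b", "c")
  private val otherKeys = Set("b", "c")

  def dispatchWithGuard(someKey: String): String = someKey match {
    case "a"               => "A"
    case k if otherKeys(k) => s"other($k)"
    case _                 => throw new IllegalArgumentException("Unknown")
  }
}
```

The alternative pattern is checked by the compiler against literal values, while the guard version is handy when the set of keys is only known at runtime.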
For this particular case, I'd probably go with a map from keys to tasks:
// likely in some companion object so these get constructed once
val otherExecute = { () => someOther.execute() }
val keyedTasks = Map(
  "a" -> { () => someObjectA.execute() },
  "b" -> otherExecute,
  "c" -> otherExecute
)
// no idea on the result type of the execute calls? Unit?
def someFunction(someKey: String) =
  keyedTasks.get(someKey) match {
    case Some(task) => task()
    case None => throw new IllegalArgumentException("Unknown")
  }

How to sort a List[HashMap[String, Any]] with dynamic fields

The following function takes the keys and the events to sort. For now, keys can have at most 4 elements.
def sortNow(keys: List[(String, String)], events: List[java.util.HashMap[String, Any]]): Any = {
  keys.length match {
    case 1 => events.sortBy(event => {
      val values = getValues(keys, event)
      values match {
        case List(a: String) => a
      }
    })
    case 2 => events.sortBy(event => {
      val values = getValues(keys, event)
      values match {
        case List(a, b) => (a.asInstanceOf[String], b.asInstanceOf[String])
      }
    })
    case 3 => events.sortBy(event => {
      val values = getValues(keys, event)
      values match {
        case List(a, b, c) => (a.asInstanceOf[String], b.asInstanceOf[String], c.asInstanceOf[String])
      }
    })(Ordering[(String, String, String)].reverse)
    case 4 => events.sortBy(event => {
      val values = getValues(keys, event)
      values match {
        case List(a, b, c, d) => (a.asInstanceOf[String], b.asInstanceOf[String], c.asInstanceOf[String], d.asInstanceOf[String])
      }
    })
    case n => throw new NotImplementedError(s"Sorting by $n keys is not supported")
  }
}
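One way to avoid a separate case per key count is to sort by a List[String] of extracted values and supply a lexicographic Ordering for that list. The sketch below assumes each key is just a field name (the meaning of the question's (String, String) pairs isn't shown) and that values are compared by their toString; it does not reproduce the question's getValues helper, and DynamicSortSketch, lexicographic and sortByFields are hypothetical names.

```scala
import java.util.{HashMap => JHashMap}

object DynamicSortSketch {
  // Lexicographic ordering on lists: compare element by element; on a common prefix, the shorter list comes first.
  val lexicographic: Ordering[List[String]] = new Ordering[List[String]] {
    def compare(x: List[String], y: List[String]): Int = (x, y) match {
      case (Nil, Nil) => 0
      case (Nil, _)   => -1
      case (_, Nil)   => 1
      case (a :: as, b :: bs) =>
        val c = a.compareTo(b)
        if (c != 0) c else compare(as, bs)
    }
  }

  // Sorts events by the given fields, in order; a missing field sorts as "".
  def sortByFields(fields: List[String], events: List[JHashMap[String, Any]],
                   descending: Boolean = false): List[JHashMap[String, Any]] = {
    val ord = if (descending) lexicographic.reverse else lexicographic
    events.sortBy(e => fields.map(f => Option(e.get(f)).fold("")(_.toString)))(ord)
  }
}
```

Because the key is a list of any length, the same function covers one to four (or more) keys, and reversing the Ordering replaces the per-arity reverse used for three keys.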

Why are the empty elements of a List not replaced with a default value in Scala?

I have the following Scala code:
object ReplaceNulls {
  def main(args: Array[String]) = {
    val myList = List("surender", "", null)
    val myUpdatedList = myList.map {
      case a: String => a
      case null => "OTHERS"
      case "" => "OTHERS"
    }
    println(myUpdatedList)
  }
}
The code above gives the following output:
List(surender, , OTHERS)
but the expected output is:
List(surender, OTHERS, OTHERS)
What went wrong in my code?
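Because "" is also a String, it matches the first case, case a: String. Cases are tried from top to bottom, so case "" is never reached. (null, on the other hand, never matches a type pattern, which is why it fell through to case null.) Try changing the order of the case statements: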
object ReplaceNulls {
def main(args:Array[String])={
val myList = List("surender","",null)
val myUpdatedList = myList.map { x => x match{
case "" =>"OTHERS"
case a:String => a
case null => "OTHERS"
}
}
println(myUpdatedList)
}
}
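Since a type pattern such as a: String never matches null and cases are tried in order, another option is to put both special values into one alternative ahead of a catch-all. A small sketch (ReplaceNullsSketch and normalize are hypothetical names):

```scala
object ReplaceNullsSketch {
  // null and "" share one alternative; everything else passes through unchanged.
  def normalize(s: String): String = s match {
    case null | "" => "OTHERS"
    case other     => other
  }

  def main(args: Array[String]): Unit =
    println(List("surender", "", null).map(normalize)) // prints List(surender, OTHERS, OTHERS)
}
```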

type mismatch scala.concurrent.Future with Slick & Play Framework

Hello,
def getMessages(code:String,offset:Option[Int]) = Action.async(parse.json){request =>
val forPC = (request.body \ "forPC").asOpt[Boolean]
Talk.findByCode(code).map(_ match{
case Success(talk) =>
val talkId = talk.id
Message.getMessages(talkId=talkId,offset=offset.getOrElse(0), forPC ).map(_ match {
case Seq(Tables.Messages) => Ok("Cool")
case _ => Ok("No")
})
case Failure(e) =>
Logger.error(e.getMessage)
NotAcceptable(Json.toJson(Map(
"error" -> "failed"
)))
})
}
And in the model I have:
// talks by Code
def findByCode(code: String, conferenceid : Int) = {
val query = talks.filter(talk => talk.conferenceId === conferenceid && talk.code === code)
db.run(query.result.head.asTry)
}
def getMessages(talkId:Int ,offset:Int, forPC: Option[Boolean]) = {
val forPCVal = forPC.getOrElse(false)
//ordering by talkId because it's faster than timestamp
val query = messages.filter(msg => msg.talkId === talkId && msg.forpc === forPCVal ).sortBy(_.idmessage.desc).drop(offset).take(10).result
db.run(query)
}
So Play is expecting a Result (for the Action), and the compiler reports this error:
type mismatch;
found : scala.concurrent.Future[play.api.mvc.Result]
required: play.api.mvc.Result
Can anyone explain why this error occurs and give me some hints to solve it?
Thank you
It seems that your Message.getMessages returns a Future[Something], which in turn makes your whole match block return a Future[Result] in the Success case and a Result in the Failure case.
Try something like the following. Notice the flatMap, which makes sure you end up with a Future[Result] rather than a Future[Future[Result]], and Future.successful, which wraps the Failure branch so that both branches have the same type:
Talk.findByCode(code).flatMap(_ match{
case Success(talk) =>
val talkId = talk.id
Message.getMessages(talkId=talkId,offset=offset.getOrElse(0), forPC ).map(_ match {
case Seq(Tables.Messages) => Ok("Cool")
case _ => Ok("No")
})
case Failure(e) =>
Logger.error(e.getMessage)
Future.successful(NotAcceptable(Json.toJson(Map(
"error" -> "failed"
))))
})
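The difference between map and flatMap here can be reproduced without Play or Slick. In the sketch below, findTalkId and fetchMessages are hypothetical stand-ins for Talk.findByCode and Message.getMessages, and the global ExecutionContext is assumed for brevity (in Play you would normally use the one available to the controller). Both branches of the match have type Future[String], so flatMap yields a Future[String]; with map instead, the Success branch would produce a nested Future.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success, Try}

object FlatMapSketch {
  // Hypothetical stand-in for Talk.findByCode: a Future of a Try, like db.run(query.result.head.asTry)
  def findTalkId(code: String): Future[Try[Int]] =
    Future.successful(if (code == "ok") Success(42) else Failure(new NoSuchElementException(code)))

  // Hypothetical stand-in for Message.getMessages: a Future of rows
  def fetchMessages(talkId: Int): Future[Seq[String]] =
    Future.successful(Seq(s"message for talk $talkId"))

  // Every branch returns Future[String]; flatMap removes the outer layer of Future.
  def handle(code: String): Future[String] =
    findTalkId(code).flatMap {
      case Success(id) => fetchMessages(id).map(ms => if (ms.nonEmpty) "Cool" else "No")
      case Failure(e)  => Future.successful(s"failed: ${e.getMessage}")
    }
}
```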

Simplification or alternative for this Scala pattern match

I have implemented a Play! 2 QueryStringBindable in Scala for a Range type. A Range consists of a min value, a max value, or both (of type Float). In my QueryStringBindable implementation I use the inner Float binder (intBinder) to convert the two possible parameters min and max to Option[Either[String, Float]], combine them in a tuple, pattern match over it, and finally return an Option[Either[String, Range]]. This works, but as you can see in the code below, the pattern match is very verbose. Is there a more concise way of doing this in Scala?
Could I perhaps leverage higher-order functions to get the same result structure back?
import play.api.mvc.QueryStringBindable
case class Range(min: Option[Float], max: Option[Float])
object Range {
implicit def rangeQueryStringBindable(implicit intBinder: QueryStringBindable[Float]) = new QueryStringBindable[Range] {
override def bind(key: String, params: Map[String, Seq[String]]): Option[Either[String, Range]] = {
val minOpt = intBinder.bind("min", params)
val maxOpt = intBinder.bind("max", params)
(minOpt, maxOpt) match {
case (None, None) => None
case (Some(Right(min)), Some(Right(max))) => Some(Right(Range(Some(min), Some(max))))
case (None, Some(Right(max))) => Some(Right(Range(None, Some(max))))
case (Some(Right(min)), None) => Some(Right(Range(Some(min), None)))
case (Some(Left(minError)), Some(Left(maxError))) => Some(Left(minError))
case (Some(Left(minError)), None) => Some(Left(minError))
case (None, Some(Left(maxError))) => Some(Left(maxError))
case (Some(Right(_)), Some(Left(maxError))) => Some(Left(maxError))
case (Some(Left(minError)), Some(Right(_))) => Some(Left(minError))
}
}
override def unbind(key: String, range: Range): String = {
(range.min, range.max) match {
case (Some(min), Some(max)) => intBinder.unbind("min", min) + "&" + intBinder.unbind("max", max)
case (Some(min), None) => intBinder.unbind("min", min)
case (None, Some(max)) => intBinder.unbind("max", max)
case (None, None) => throw new IllegalArgumentException("Range without values makes no sense")
}
}
}
}
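Handle the error cases first; whatever is left can only contain Right values:
(minOpt, maxOpt) match {
  case (None, None) => None
  case (Some(Left(m)), _) => Some(Left(m))
  case (_, Some(Left(m))) => Some(Left(m))
  case _ => Some(Right(Range(minOpt.collect { case Right(v) => v }, maxOpt.collect { case Right(v) => v })))
}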
With a couple of functions that convert an Option[Either[Error, A]] into an Either[Error, Option[A]], you can end up with something a bit cleaner, in my view. I also recommend renaming Range, since it conflicts with the class of the same name in scala.collection.immutable (which is also available by default as scala.Range).
import play.api.mvc.QueryStringBindable
case class RealRange(min: Option[Float], max: Option[Float])
object BindingEitherUtils {
implicit class OptionWithEitherFlatten[A, B](value: Option[Either[A, B]]) {
def flattenRight: Either[A, Option[B]] = {
value.map { either =>
either.right.map{ right => Some(right) }
}.getOrElse{ Right(None) }
}
}
implicit class EitherWithUnflatten[A, B](value: Either[A, Option[B]]) {
def unflattenRight: Option[Either[A, B]] = {
value.fold(left => Some(Left(left)), _.map{ right => Right(right) })
}
}
}
object RealRange {
import BindingEitherUtils._
val minError = "Invalid minimum value for RealRange"
val maxError = "Invalid maximum value for RealRange"
implicit def rangeQueryStringBindable(implicit floatBinder: QueryStringBindable[Float]) = new QueryStringBindable[RealRange] {
override def bind(key: String, params: Map[String, Seq[String]]): Option[Either[String, RealRange]] = {
val minOpt = floatBinder.bind("min", params).flattenRight
val maxOpt = floatBinder.bind("max", params).flattenRight
minOpt.left.map{ _ => minError }.right.flatMap { min =>
maxOpt.left.map{ _ => maxError }.right.flatMap { max =>
(min, max) match {
case (None, None ) =>
Right(None)
case (Some(minVal), Some(maxVal)) if minVal > maxVal =>
Left("Minimum value is larger than maximum value")
case _ =>
Right(Some(RealRange(min, max)))
}
}
}.unflattenRight
}
override def unbind(key: String, range: RealRange): String = {
(range.min, range.max) match {
case (Some(min), Some(max)) => floatBinder.unbind("min", min) + "&" + floatBinder.unbind("max", max)
case (Some(min), None) => floatBinder.unbind("min", min)
case (None, Some(max)) => floatBinder.unbind("max", max)
case (None, None) => throw new IllegalArgumentException("RealRange without values makes no sense")
}
}
}
def test(): Unit = {
val binder = rangeQueryStringBindable
Seq[(String, String)](
("10", "20"),
("10", null),
(null, "10"),
(null, null),
("asd", "asd"),
("10", "asd"),
("asd", "10"),
("asd", null),
(null, "asd"),
("20", "10")
).foreach{ case (min, max) =>
val params = Seq(
Option(min).map{ m => "min" -> Seq(m) },
Option(max).map{ m => "max" -> Seq(m) }
).flatten.toMap
val result = binder.bind("", params)
println(s"$params => $result" )
}
}
}
Which results in:
Map(min -> List(10), max -> List(20)) => Some(Right(RealRange(Some(10.0),Some(20.0))))
Map(min -> List(10)) => Some(Right(RealRange(Some(10.0),None)))
Map(max -> List(10)) => Some(Right(RealRange(None,Some(10.0))))
Map() => None
Map(min -> List(asd), max -> List(asd)) => Some(Left(Invalid minimum value for RealRange))
Map(min -> List(10), max -> List(asd)) => Some(Left(Invalid maximum value for RealRange))
Map(min -> List(asd), max -> List(10)) => Some(Left(Invalid minimum value for RealRange))
Map(min -> List(asd)) => Some(Left(Invalid minimum value for RealRange))
Map(max -> List(asd)) => Some(Left(Invalid maximum value for RealRange))
Map(min -> List(20), max -> List(10)) => Some(Left(Minimum value is larger than maximum value))
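The core of the approach above is turning an Option[Either[E, A]] inside out into an Either[E, Option[A]]. The Play-independent sketch below does that with a plain pattern match and then combines the two bound values with the same "first error wins" rule. FloatRange, flip and combine are hypothetical names, and the min > max check from the answer is omitted.

```scala
object OptionEitherSketch {
  final case class FloatRange(min: Option[Float], max: Option[Float])

  // Option[Either[E, A]] => Either[E, Option[A]]: a missing value counts as a successful None.
  def flip[E, A](oe: Option[Either[E, A]]): Either[E, Option[A]] = oe match {
    case None           => Right(None)
    case Some(Left(e))  => Left(e)
    case Some(Right(a)) => Right(Some(a))
  }

  // Combines two optional bindings: the first error wins; no values at all gives None.
  def combine(min: Option[Either[String, Float]],
              max: Option[Either[String, Float]]): Option[Either[String, FloatRange]] =
    (flip(min), flip(max)) match {
      case (Left(e), _)               => Some(Left(e))
      case (_, Left(e))               => Some(Left(e))
      case (Right(None), Right(None)) => None
      case (Right(lo), Right(hi))     => Some(Right(FloatRange(lo, hi)))
    }
}
```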
Yes, it can be simplified.
For the bind method, you can use wildcards in the error cases to simplify it. That way only four cases remain for assembling the Range. I wouldn't do much more magic than that here, as it would make your code harder to understand.
override def bind(key: String, params: Map[String, Seq[String]]): Option[Either[String, Range]] = {
val minOpt = intBinder.bind("min", params)
val maxOpt = intBinder.bind("max", params)
(minOpt, maxOpt) match {
case (None, None) => None
case (Some(Right(min)), Some(Right(max))) => Some(Right(Range(Some(min), Some(max))))
case (None, Some(Right(max))) => Some(Right(Range(None, Some(max))))
case (Some(Right(min)), None) => Some(Right(Range(Some(min), None)))
// Error handling
case (Some(Left(minError)), _) => Some(Left(minError))
case (_, Some(Left(maxError))) => Some(Left(maxError))
}
}
For unbind I would use a different approach: map each Option with intBinder.unbind, then combine the two into an Iterable and call mkString("&"). With one string, mkString returns it unchanged; with two, it joins them with &. The example below is annotated with types so it is easier to follow.
def unbind(key: String, range: Range): String = {
val minString: Option[String] = range.min.map(min => intBinder.unbind("min", min))
val maxString: Option[String] = range.max.map(max => intBinder.unbind("max", max))
val strings: Iterable[String] = minString ++ maxString
if (strings.isEmpty) throw new IllegalArgumentException("Range without values makes no sense")
else strings.mkString("&")
}
And if you're into short code:
def unbind(key: String, range: Range): String = {
val minString = range.min.map(min => intBinder.unbind("min", min))
val maxString = range.max.map(max => intBinder.unbind("max", max))
(minString ++ maxString) match {
case strings if strings.isEmpty => throw new IllegalArgumentException("Range without values makes no sense")
case strings => strings.mkString("&")
}
}
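The Option-to-Iterable trick used in unbind can be checked without Play. In this sketch, render is a hypothetical stand-in for intBinder.unbind that simply produces key=value, and UnbindSketch is an illustrative name. The emptiness check uses isEmpty, which does not depend on the concrete collection type that ++ returns.

```scala
object UnbindSketch {
  // Hypothetical stand-in for intBinder.unbind: renders "key=value"
  private def render(key: String, value: Float): String = s"$key=$value"

  def unbind(min: Option[Float], max: Option[Float]): String = {
    // Each Option contributes zero or one string; toList makes the concatenation a List
    val parts: List[String] = min.map(render("min", _)).toList ++ max.map(render("max", _)).toList
    if (parts.isEmpty) throw new IllegalArgumentException("Range without values makes no sense")
    parts.mkString("&") // one element: returned as-is; two: joined with "&"
  }
}
```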