Scala Pickling usage MyObject -> Array[Byte] -> MyObject

I was trying to get into the new Scala Pickling library that was presented at ScalaDays 2013: Scala Pickling
What I am really missing are some simple examples of how the library is used.
I understood that I can pickle some object and unpickle it again like this:
import scala.pickling._
val pckl = List(1, 2, 3, 4).pickle
val lst = pckl.unpickle[List[Int]]
In this example, pckl is of type Pickle. What exactly is the use of this type, and how can I, for example, get an Array[Byte] out of it?

If you want to pickle into bytes, the code will look like this:
import scala.pickling._
import binary._
val pckl = List(1, 2, 3, 4).pickle
val bytes = pckl.value
If you wanted JSON, the code would look almost exactly the same, with a minor change of imports:
import scala.pickling._
import json._
val pckl = List(1, 2, 3, 4).pickle
val json = pckl.value
How the object is pickled depends on which format you import from scala.pickling (either binary or json). Import binary and the value property is an Array[Byte]; import json and it is a JSON String.
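For the full round trip in the question's title (MyObject -> Array[Byte] -> MyObject), here is a minimal sketch with the binary format, assuming a hypothetical case class MyObject (the library generates picklers for case classes):
import scala.pickling._
import binary._
case class MyObject(id: Int, name: String) // hypothetical type for illustration
val obj = MyObject(1, "foo")
val pckl = obj.pickle                  // binary pickle of obj
val bytes: Array[Byte] = pckl.value    // raw bytes you can store or send
val back = pckl.unpickle[MyObject]     // back to MyObject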

Related

Change Java List to Scala Seq?

I have the following list from my configuration:
val markets = Configuration.getStringList("markets");
To create a sequence out of it I write this code:
JavaConverters.asScalaIteratorConverter(markets.iterator()).asScala.toSeq
I wish I could do it in a less verbose way, such as:
markets.toSeq
And then from that list I get the sequence. I will have more configuration in the near future; is there a solution that provides this kind of simplicity?
I want a sequence regardless of the configuration library I am using. I don't want to have the stated verbose solution with the JavaConverters.
JavaConversions is deprecated since Scala 2.12.0. Use JavaConverters; you can import scala.collection.JavaConverters._ to make it less verbose:
import scala.collection.JavaConverters._
val javaList = java.util.Arrays.asList("one", "two")
val scalaSeq = javaList.asScala.toSeq
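Applied to the original example, and assuming Configuration.getStringList returns a java.util.List[String], this becomes a one-liner:
import scala.collection.JavaConverters._
val markets: Seq[String] = Configuration.getStringList("markets").asScala.toSeq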
Yes. Just import implicit conversions:
import java.util
import scala.collection.JavaConversions._
val jlist = new util.ArrayList[String]()
jlist.toSeq

scala json4s, can't convert LocalDate

I'm having an issue with org.json4s (Scala), org.joda.time.LocalDate and org.json4s.ext.JodaTimeSerializers. Given that JodaTimeSerializers.all has a LocalDate conversion in it, I was hoping that I could do the following, but I get the exception shown after:
scala> import org.json4s.JString
import org.json4s.JString
scala> import org.joda.time.LocalDate
import org.joda.time.LocalDate
scala> import org.json4s.ext.JodaTimeSerializers
import org.json4s.ext.JodaTimeSerializers
scala> import org.json4s._
import org.json4s._
scala> implicit val formats: Formats = DefaultFormats ++ JodaTimeSerializers.all
formats: org.json4s.Formats = org.json4s.Formats$$anon$3@693d3d7f
scala> val jDate = JString("2016-01-26")
jDate: org.json4s.JsonAST.JString = JString(2016-01-26)
scala> jDate.extract[LocalDate]
org.json4s.package$MappingException: Can't convert JString(2016-01-26) to class org.joda.time.LocalDate
On the other hand, this works (not surprisingly):
scala> val jodaDate = LocalDate.parse(jDate.values)
I've tried to create a custom Serializer, which never gets called because it seems to fall into the JodaSerializer realm. I have also created a custom Deserializer that works with java.time.LocalDate (ints and bytes from strings), but java.time.LocalDate messes with some other code, which is likely a different question. This one is: I'm looking for clues as to why JodaTimeSerializers.all cannot parse JString(2016-01-26), or any date string.
The top of the exception is: org.json4s.package$MappingException:
Can't convert JString(2016-01-01) to class org.joda.time.LocalDate (JodaTimeSerializers.scala:126)
Edit
This is still biting me, so I dug a bit further and it's reproducible with the following.
import org.joda.time.LocalDate
import org.json4s.ext.JodaTimeSerializers
import org.json4s._
implicit val formats: Formats = DefaultFormats ++ JodaTimeSerializers.all
case class MyDate(myDate: LocalDate)
val stringyDate =
  """
  {
    "myDate" : "2016-01-01"
  }
  """
import org.json4s.jackson.JsonMethods.parse
parse(stringyDate).extract[MyDate]
org.json4s.package$MappingException: No usable value for myDate
Can't convert JString(2016-01-01) to class org.joda.time.LocalDate
This seems to happen because, on line 125 of JodaTimeSerializers.scala, the value is not a JObject but a JString, so it falls into the value case on line 126, which throws the error.
Adding this here in case it bites someone else, and hopefully to get some help fixing it... but now I'm late. I have pulled the code down locally and hope to come up with a fix tomorrow.
This works. I define a custom serializer for LocalDate.
import org.json4s.JString
import org.joda.time.LocalDate
import org.json4s._
case object LocalDateSerializer
  extends CustomSerializer[LocalDate](format =>
    (
      // deserialization: parse an ISO yyyy-MM-dd string into a LocalDate
      { case JString(s) => LocalDate.parse(s) },
      // serialization: write the LocalDate back out as an ISO string
      { case d: LocalDate => JString(d.toString) }
    )
  )
implicit val formats: Formats = DefaultFormats + LocalDateSerializer
val jDate = JString("2016-01-26")
jDate.extract[LocalDate] // res173: org.joda.time.LocalDate = 2016-01-26
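With LocalDateSerializer registered in the implicit formats, the reproduction case from the edit should also extract cleanly; a quick sketch reusing the case class from the question:
import org.json4s.jackson.JsonMethods.parse
case class MyDate(myDate: LocalDate) // same shape as in the question's edit
parse("""{ "myDate" : "2016-01-01" }""").extract[MyDate] // should yield MyDate(2016-01-01)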
The new serializers are included in the library, but not in the default formats:
implicit val formats: Formats = DefaultFormats ++ JavaTimeSerializers.all

Scala Queue import error

If I import scala.collection._ and create a queue:
import scala.collection._
var queue = new Queue[Component]();
I get the following error:
error: not found: type Queue
However, if I also add
import scala.collection.mutable.Queue
The error will disappear.
Why is this happening?
Shouldn't scala.collection._ contain scala.collection.mutable.Queue?
You have to know how the Scala collections library is structured: it splits collections based on whether they are mutable or immutable.
Queue exists in both the scala.collection.mutable and scala.collection.immutable packages, so you have to specify which one you want, e.g.
scala> import scala.collection.mutable._
import scala.collection.mutable._
scala> var q = new Queue[Int]()
q: scala.collection.mutable.Queue[Int] = Queue()
scala> import scala.collection.immutable._
import scala.collection.immutable._
scala> var q = Queue[Int]()
q: scala.collection.immutable.Queue[Int] = Queue()
After import scala.collection._ you can use mutable.Queue; you could write just Queue if there were a scala.collection.Queue (or a scala.Queue, java.lang.Queue, or scala.Predef.Queue, since members of those are imported by default), but there isn't one.
It should be easy to see why it works this way: otherwise, the compiler (or anyone reading your code) would have no idea where they should look for the type: do you want scala.collection.Queue, scala.collection.mutable.Queue, or scala.collection.some.subpackage.from.a.library.Queue?
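With only the blanket import, a small sketch of disambiguating at the use site:
import scala.collection._
val mq = mutable.Queue[Int]()     // explicitly the mutable queue
mq.enqueue(1)
val iq = immutable.Queue(1, 2, 3) // explicitly the immutable queue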

getOrElse method not being found in Scala Spark

Attempting to follow an example in Sandy Ryza's book Advanced Analytics with Spark, coding in IntelliJ. Below I seem to have imported all the right libraries, so why is getOrElse not recognized?
Error:(84, 28) value getOrElse is not a member of org.apache.spark.rdd.RDD[String]
bArtistAlias.value.getOrElse(artistID, artistID)
^
Code:
import org.apache.spark.rdd.RDD
import org.apache.spark.rdd._
import org.apache.spark.rdd.PairRDDFunctions
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.recommendation._
val trainData = rawUserArtistData.map { line =>
  val Array(userID, artistID, count) = line.split(' ').map(_.toInt)
  val finalArtistID = bArtistAlias.value.getOrElse(artistID, artistID)
  Rating(userID, finalArtistID, count)
}.cache()
I can only make an assumption, as the code listed is missing pieces, but my guess is that bArtistAlias is supposed to be a Map that SHOULD be broadcast, but isn't.
I went and found the piece of code in Sandy's book and it corroborates my guess. So, you seem to be missing this piece:
val bArtistAlias = sc.broadcast(artistAlias)
I am not even sure what you did without seeing the rest of the code, but it looks like you broadcast an RDD[String], hence the error. That would not work anyway, as you cannot use one RDD inside another RDD.
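A minimal sketch of the intended pattern, reusing sc and rawUserArtistData from the question and a hypothetical alias map (in the book, artistAlias is built from the artist alias data):
// an ordinary Map on the driver, not an RDD
val artistAlias: Map[Int, Int] = Map(6803336 -> 1000010) // hypothetical entry for illustration
val bArtistAlias = sc.broadcast(artistAlias)             // broadcast the Map
val trainData = rawUserArtistData.map { line =>
  val Array(userID, artistID, count) = line.split(' ').map(_.toInt)
  // bArtistAlias.value is now a Map[Int, Int], so getOrElse resolves
  val finalArtistID = bArtistAlias.value.getOrElse(artistID, artistID)
  Rating(userID, finalArtistID, count)
}.cache()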

Creating a scala Reader from a file

How do you instantiate a scala.util.parsing.input.Reader to read from a file? The API mentions in passing something about PagedSeq and java.io.Reader, but it's not clear at all how to accomplish that.
You create a FileInputStream, pass that to an InputStreamReader and pass that to the apply method of the StreamReader companion object, which returns a StreamReader, a subtype of Reader.
scala> import scala.util.parsing.input.{StreamReader,Reader}
import scala.util.parsing.input.{StreamReader, Reader}
scala> import java.io._
import java.io._
scala> StreamReader(new InputStreamReader(new FileInputStream("test")))
res0: scala.util.parsing.input.StreamReader = scala.util.parsing.input.StreamReader@1e5a0cb
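Since the question mentions PagedSeq: depending on your Scala / scala-parser-combinators version, there is also a PagedSeqReader that can be built straight from a file. A sketch, assuming PagedSeq.fromFile and PagedSeqReader are available in your version:
import scala.util.parsing.input.PagedSeqReader
import scala.collection.immutable.PagedSeq
val reader = new PagedSeqReader(PagedSeq.fromFile("test"))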