Scala Pickling: Writing a custom pickler / unpickler for nested structures - scala

I'm trying to write a custom SPickler / Unpickler pair to work around some the current limitations of scala-pickling.
The data type I'm trying to pickle is a case class, where some of the fields already have their own SPickler and Unpickler instances.
I'd like to use these instances in my custom pickler, but I don't know how.
Here's an example of what I mean:
// Here's a class for which I want a custom SPickler / Unpickler.
// One of its fields can already be pickled, so I'd like to reuse that logic.
case class MyClass[A: SPickler: Unpickler: FastTypeTag](myString: String, a: A)
// Here's my custom pickler.
class MyClassPickler[A: SPickler: Unpickler: FastTypeTag](
implicit val format: PickleFormat) extends SPickler[MyClass[A]] with Unpickler[MyClass[A]] {
override def pickle(
picklee: MyClass[A],
builder: PBuilder) {
builder.beginEntry(picklee)
// Here we save `myString` in some custom way.
builder.putField(
"mySpecialPickler",
b => b.hintTag(FastTypeTag.ScalaString).beginEntry(
picklee.myString).endEntry())
// Now we need to save `a`, which has an implicit SPickler.
// But how do we use it?
builder.endEntry()
}
override def unpickle(
tag: => FastTypeTag[_],
reader: PReader): MyClass[A] = {
reader.beginEntry()
// First we read the string.
val myString = reader.readField("mySpecialPickler").unpickle[String]
// Now we need to read `a`, which has an implicit Unpickler.
// But how do we use it?
val a: A = ???
reader.endEntry()
MyClass(myString, a)
}
}
I would really appreciate a working example.
Thanks!

Here is a working example:
case class MyClass[A](myString: String, a: A)
Note that the type parameter of MyClass does not need context bounds. Only the custom pickler class needs the corresponding implicits:
class MyClassPickler[A](implicit val format: PickleFormat, aTypeTag: FastTypeTag[A],
aPickler: SPickler[A], aUnpickler: Unpickler[A])
extends SPickler[MyClass[A]] with Unpickler[MyClass[A]] {
private val stringUnpickler = implicitly[Unpickler[String]]
override def pickle(picklee: MyClass[A], builder: PBuilder) = {
builder.beginEntry(picklee)
builder.putField("myString",
b => b.hintTag(FastTypeTag.ScalaString).beginEntry(picklee.myString).endEntry()
)
builder.putField("a",
b => {
b.hintTag(aTypeTag)
aPickler.pickle(picklee.a, b)
}
)
builder.endEntry()
}
override def unpickle(tag: => FastTypeTag[_], reader: PReader): MyClass[A] = {
reader.hintTag(FastTypeTag.ScalaString)
val tag = reader.beginEntry()
val myStringUnpickled = stringUnpickler.unpickle(tag, reader).asInstanceOf[String]
reader.endEntry()
reader.hintTag(aTypeTag)
val aTag = reader.beginEntry()
val aUnpickled = aUnpickler.unpickle(aTag, reader).asInstanceOf[A]
reader.endEntry()
MyClass(myStringUnpickled, aUnpickled)
}
}
In addition to the custom pickler class, we also need an implicit def which returns a pickler instance specialized for concrete type arguments:
implicit def myClassPickler[A: SPickler: Unpickler: FastTypeTag](implicit pf: PickleFormat) =
new MyClassPickler

Related

Instantiate a class with a specified name

I'm writing a Scala library to operate upon Spark DataFrames. I have a bunch of classes, each of which contain a function that operates upon the supplied DataFrame:
class Foo(){val func = SomeFunction(,,,)}
class Bar(){val func = SomeFunction(,,,)}
class Baz(){val func = SomeFunction(,,,)}
The user of my library passes a parameter operation: String to indicate class to instantiate, the value passed has to be the name of one of those classes hence I have code that looks something like this:
operation match {
case Foo => new Foo().SomeFunction
case Bar => new Bar().SomeFunction
case Baz => new Baz().SomeFunction
}
I'm a novice Scala developer but this seems rather like a clunky way of achieving this. I'm hoping there is a simpler way to instantiate the desired class based on the value of operation given that it will be the same as the name of the desired class.
The reason I want to do this is that I want external contributors to contribute their own classes and I want to make it at easy as possible for them to do that, I don't want them to have to know they also need to go and change a pattern match.
For
case class SomeFunction(s: String)
class Foo(){val func = SomeFunction("Foo#func")}
class Bar(){val func = SomeFunction("Bar#func")}
class Baz(){val func = SomeFunction("Baz#func")}
//...
reflection-based version of
def foo(operation: String) = operation match {
case "Foo" => new Foo().func
case "Bar" => new Bar().func
case "Baz" => new Baz().func
// ...
}
is
import scala.reflect.runtime.universe
import scala.reflect.runtime.universe._
def foo(operation: String): SomeFunction = {
val runtimeMirror = universe.runtimeMirror(getClass.getClassLoader)
val classSymbol = runtimeMirror.staticClass(operation)
val constructorSymbol = classSymbol.primaryConstructor.asMethod
val classMirror = runtimeMirror.reflectClass(classSymbol)
val classType = classSymbol.toType
val constructorMirror = classMirror.reflectConstructor(constructorSymbol)
val instance = constructorMirror()
val fieldSymbol = classType.decl(TermName("func")).asTerm
val instanceMirror = runtimeMirror.reflect(instance)
val fieldMirror = instanceMirror.reflectField(fieldSymbol)
fieldMirror.get.asInstanceOf[SomeFunction]
}
Testing:
foo("Foo") //SomeFunction(Foo#func)
foo("Bar") //SomeFunction(Bar#func)
foo("Baz") //SomeFunction(Baz#func)

Access Spark broadcast variable in different classes

I am broadcasting a value in Spark Streaming application . But I am not sure how to access that variable in a different class than the class where it was broadcasted.
My code looks as follows:
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreachRDD(rdd => {
val obj = AppObject1
rdd.filter(p => obj.apply(p))
rdd.count
}
}
object AppObject1: Boolean{
def apply(str: String){
AnotherObject.process(str)
}
}
object AnotherObject{
// I want to use broadcast variable in this object
val B = broadcastA.Value // compilation error here
def process(): Boolean{
//need to use B inside this method
}
}
Can anyone suggest how to access broadcast variable in this case?
There is nothing particularly Spark specific here ignoring possible serialization issues. If you want to use some object it has to be available in the current scope and you can achieve this the same way as usual:
you can define your helpers in a scope where broadcast is already defined:
{
...
val x = sc.broadcast(1)
object Foo {
def foo = x.value
}
...
}
you can use it as a constructor argument:
case class Foo(x: org.apache.spark.broadcast.Broadcast[Int]) {
def foo = x.value
}
...
Foo(sc.broadcast(1)).foo
method argument
case class Foo() {
def foo(x: org.apache.spark.broadcast.Broadcast[Int]) = x.value
}
...
Foo().foo(sc.broadcast(1))
or even mixed-in your helpers like this:
trait Foo {
val x: org.apache.spark.broadcast.Broadcast[Int]
def foo = x.value
}
object Main extends Foo {
val sc = new SparkContext("local", "test", new SparkConf())
val x = sc.broadcast(1)
def main(args: Array[String]) {
sc.parallelize(Seq(None)).map(_ => foo).first
sc.stop
}
}
Just a short take on performance considerations that were introduced earlier.
Options proposed by zero233 are indeed very elegant way of doing this kind of things in Scala. At the same time it is important to understand implications of using certain patters in distributed system.
It is not the best idea to use mixin approach / any logic that uses enclosing class state. Whenever you use a state of enclosing class within lambdas Spark will have to serialize outer object. This is not always true but you'd better off writing safer code than one day accidentally blow up the whole cluster.
Being aware of this, I would personally go for explicit argument passing to the methods as this would not result in outer class serialization (method argument approach).
you can use classes and pass the broadcast variable to classes
your psudo code should look like :
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreach(rdd => {
val obj = new AppObject1(broadcastA)
rdd.filter(p => obj.apply(p))
rdd.count
})
}
}
class AppObject1(bc : Broadcast[String]){
val anotherObject = new AnotherObject(bc)
def apply(str: String): Boolean ={
anotherObject.process(str)
}
}
class AnotherObject(bc : Broadcast[String]){
// I want to use broadcast variable in this object
def process(str : String): Boolean = {
val a = bc.value
true
//need to use B inside this method
}
}

How to define a parametric type alias

I try to define a parametric type alias :
case class A
case class B
case class C
// We need an Int to load instances of A and B, and a String to load C
object Service {
def loadA(i: Int) : A = ???
def loadB(i: Int) : B = ???
def loadC(s: String) : C = ???
}
trait Location[T] { def get : T}
class IntLocation(val i: Int)
class StringLocation(val s: String)
trait EntityLocation[E] extends Location[_]
// Aim : make the loader typesafe
// Problem : I need something like that : type EntityLocation[Composite] = IntLocation
object Family {
trait EntityLoader[EntityT] extends (EntityLocation[EntityT] => EntityT)
val ALoader = new EntityLoader[A] {def load[A](l: EntityLocation[A]) = Service.loadA(l.get)
}
I am not sure what you are trying to achieve here. Could you please explain how you want to use these types in your code?
Assuming just want to use the types IdLocation and FileLocation in your code, maybe you want to try
trait Location[T] { def get : T }
type IdLocation = Location[Id]
type FileLocation = Location[java.io.File]
Seems rather convoluted, so I'm not sure I follow exactly what your purpose here is. You seem to go into many layers of factories that create factories, that call factory methods, etc.
Seems to me that at the end of the day you need you want to have a val ALoader value that you can use to get instances of A from Location[Int] objects, so I'll go with that assumption:
// Not sure what you want this one, but let's assume that you need a wrapper class per your example.
trait Location[P] { def get: P }
class IntLocation(val i: Int) extends Location[Int]
{
override def get: Int = i
}
// P for parameter, O for output class.
def loader[O, P](creator: P => O)(param: Location[P]) = { creator(param.get) }
object Service
{
// A function somewhere, capable of taking your parameter and creating something else (in your example, an Int to an 'A')
// here Int to String to make something concrete.
// This could be any function, anywhere
def loadA(someParam: Int) = someParam.toString
}
def main(args: Array[String])
{
val myStringLoader: Location[Int] => String = loader(Service.loadA)
// Alternatively, you could have written `val myStringLoader = loader(Service.loadA)(_)`. Either the type or the underscore are needed to tell the compiler that you expect a function, not a value.
// Some definition for you wrapper class
val location3 = new Location[Int]{
override def get: Int = 3
}
// ... or just a plain old instance of it.
val otherLocation = new IntLocation(5)
// This would 'load' the kind of thing you want using the method you specified.
val myString = myStringLoader(location3)
val myOtherString = myStringLoader(otherLocation)
// This prints "3 - 5"
print(myString + " - " + myOtherString)
}
This might seem like a long answer, but in truth the line def loader[O, P](creator: P => O)(param: Location[P]) = { creator(param.get) } is the one that does it all, the rest is to make it as similar to your sample as possible and to provide a working main you can use to start from.
Of course, this would be even simpler if you don't really need the Location wrapper for your integer.

Scala Reflection Conundrum: Can you explain these weird results?

I wrote some Scala code, using reflection, that returns all vals in an object that are of a certain type. Below are three versions of this code. One of them works but is ugly. Two attempts to improve it don't work, in very different ways. Can you explain why?
First, the code:
import scala.reflect.runtime._
import scala.util.Try
trait ScopeBase[T] {
// this version tries to generalize the type. The only difference
// from the working version is [T] instead of [String]
def enumerateBase[S: universe.TypeTag]: Seq[T] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[S].decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[T])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
}
trait ScopeString extends ScopeBase[String] {
// This version works but requires passing the val type
// (String, in this example) explicitly. I don't want to
// duplicate the code for different val types.
def enumerate[S: universe.TypeTag]: Seq[String] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[S].decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[String])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
// This version tries to avoid passing the object's type
// as the [S] type parameter. After all, the method is called
// on the object itself; so why pass the type?
def enumerateThis: Seq[String] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[this.type].decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[String])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
}
// The working example
object Test1 extends ScopeString {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerate[Test1.type]
}
// This shows how the attempt to generalize the type doesn't work
object Test2 extends ScopeString {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerateBase[Test2.type]
}
// This shows how the attempt to drop the object's type doesn't work
object Test3 extends ScopeString {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerateThis
}
val test1 = Test1.fields // List(test)
val test2 = Test2.fields // List(13, test)
val test3 = Test3.fields // List()
The "enumerate" method does work. However, as you can see from the Test1 example, it requires passing the object's own type (Test1.type) as a parameter, which should not have been necessary. The "enumerateThis" method tries to avoid that but fails, producing an empty list. The "enumerateBase" method attempts to generalize the "enumerate" code by passing the val type as a parameter. But it fails, too, producing the list of all vals, not just those of a certain type.
Any idea what's going on?
Your problem in your generic implementation is the loss of the type information of T. Also, don't use exceptions as your primary method of control logic (it's very slow!). Here's a working version of your base.
abstract class ScopeBase[T : universe.TypeTag, S <: ScopeBase[T, S] : universe.TypeTag : scala.reflect.ClassTag] {
self: S =>
def enumerateBase: Seq[T] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[S].baseClasses.map(_.asType.toType).flatMap(
_.decls
.filter(_.typeSignature.resultType <:< universe.typeOf[T])
.filter(_.isMethod)
.map(_.asMethod)
.filter(_.isAccessor)
.map(decl => mirror.reflectMethod(decl).apply().asInstanceOf[T])
.filter(_ != null)
).toSeq
}
}
trait Inherit {
val StringField2: String = "test2"
}
class Test1 extends ScopeBase[String, Test1] with Inherit {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerateBase
}
object Test extends App {
println(new Test1().fields)
}
Instead of getting the type from universe.typeOf you can use the runtime class currentMirror.classSymbol(getClass).toType, below is an example that works:
def enumerateThis: Seq[String] = {
val mirror = currentMirror.reflect(this)
currentMirror.classSymbol(getClass).toType.decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[String])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
//prints List(test)
With everyone's help, here's the final version that works:
import scala.reflect.runtime.{currentMirror, universe}
abstract class ScopeBase[T: universe.TypeTag] {
lazy val enumerate: Seq[T] = {
val mirror = currentMirror.reflect(this)
currentMirror.classSymbol(getClass).baseClasses.map(_.asType.toType).flatMap {
_.decls
.filter(_.typeSignature.resultType <:< universe.typeOf[T])
.filter(_.isMethod)
.map(_.asMethod)
.filterNot(_.isConstructor)
.filter(_.paramLists.size == 0)
.map(decl => mirror.reflectField(decl.asMethod).get.asInstanceOf[T])
.filter(_ != null).toSeq
}
}
}
trait FieldScope extends ScopeBase[Field[_]]
trait DbFieldScope extends ScopeBase[DbField[_, _]] {
// etc....
}
As you see from the last few lines, my use cases are limited to scope objects for specific field types. This is why I want to parameterize the scope container. If I wanted to enumerate the fields of multiple types in a single scope container, then I would have parameterized the enumerate method.

Scala Reflection to update a case class val

I'm using scala and slick here, and I have a baserepository which is responsible for doing the basic crud of my classes.
For a design decision, we do have updatedTime and createdTime columns all handled by the application, and not by triggers in database. Both of this fields are joda DataTime instances.
Those fields are defined in two traits called HasUpdatedAt, and HasCreatedAt, for the tables
trait HasCreatedAt {
val createdAt: Option[DateTime]
}
case class User(name:String,createdAt:Option[DateTime] = None) extends HasCreatedAt
I would like to know how can I use reflection to call the user copy method, to update the createdAt value during the database insertion method.
Edit after #vptron and #kevin-wright comments
I have a repo like this
trait BaseRepo[ID, R] {
def insert(r: R)(implicit session: Session): ID
}
I want to implement the insert just once, and there I want to createdAt to be updated, that's why I'm not using the copy method, otherwise I need to implement it everywhere I use the createdAt column.
This question was answered here to help other with this kind of problem.
I end up using this code to execute the copy method of my case classes using scala reflection.
import reflect._
import scala.reflect.runtime.universe._
import scala.reflect.runtime._
class Empty
val mirror = universe.runtimeMirror(getClass.getClassLoader)
// paramName is the parameter that I want to replacte the value
// paramValue is the new parameter value
def updateParam[R : ClassTag](r: R, paramName: String, paramValue: Any): R = {
val instanceMirror = mirror.reflect(r)
val decl = instanceMirror.symbol.asType.toType
val members = decl.members.map(method => transformMethod(method, paramName, paramValue, instanceMirror)).filter {
case _: Empty => false
case _ => true
}.toArray.reverse
val copyMethod = decl.declaration(newTermName("copy")).asMethod
val copyMethodInstance = instanceMirror.reflectMethod(copyMethod)
copyMethodInstance(members: _*).asInstanceOf[R]
}
def transformMethod(method: Symbol, paramName: String, paramValue: Any, instanceMirror: InstanceMirror) = {
val term = method.asTerm
if (term.isAccessor) {
if (term.name.toString == paramName) {
paramValue
} else instanceMirror.reflectField(term).get
} else new Empty
}
With this I can execute the copy method of my case classes, replacing a determined field value.
As comments have said, don't change a val using reflection. Would you that with a java final variable? It makes your code do really unexpected things. If you need to change the value of a val, don't use a val, use a var.
trait HasCreatedAt {
var createdAt: Option[DateTime] = None
}
case class User(name:String) extends HasCreatedAt
Although having a var in a case class may bring some unexpected behavior e.g. copy would not work as expected. This may lead to preferring not using a case class for this.
Another approach would be to make the insert method return an updated copy of the case class, e.g.:
trait HasCreatedAt {
val createdAt: Option[DateTime]
def withCreatedAt(dt:DateTime):this.type
}
case class User(name:String,createdAt:Option[DateTime] = None) extends HasCreatedAt {
def withCreatedAt(dt:DateTime) = this.copy(createdAt = Some(dt))
}
trait BaseRepo[ID, R <: HasCreatedAt] {
def insert(r: R)(implicit session: Session): (ID, R) = {
val id = ???//insert into db
(id, r.withCreatedAt(??? /*now*/))
}
}
EDIT:
Since I didn't answer your original question and you may know what you are doing I am adding a way to do this.
import scala.reflect.runtime.universe._
val user = User("aaa", None)
val m = runtimeMirror(getClass.getClassLoader)
val im = m.reflect(user)
val decl = im.symbol.asType.toType.declaration("createdAt":TermName).asTerm
val fm = im.reflectField(decl)
fm.set(??? /*now*/)
But again, please don't do this. Read this stackoveflow answer to get some insight into what it can cause (vals map to final fields).