Serialization in Scala / Akka

I am writing a distributed application in Scala that uses Akka actors. I have some data structures that my remote actors happily serialize, send over the wire, and deserialize without any extra help from me.
For logging I would like to serialize a case class containing these objects. I read the serialization docs on the Akka project site, but am wondering if there is an easier way to get this done, since Akka apparently knows how to serialize these objects already.
Edit 5 Nov 2011 in response to Viktor's comment
The application is a distributed Markov Decision Process engine.
I am trying to serialize one of these things:
case class POMDPIteration(
  observations: Set[(AgentRef, State)],
  rewards: Set[(AgentRef, Float)],
  actions: Set[(AgentRef, Action)],
  state: State
)
here is the definition of AgentRef:
case class AgentRef(
  clientManagerID: Int,
  agentNumber: Int,
  agentType: AgentType
)
Action and AgentType are just type aliases for Symbol.
To keep this shorter, the definition of State is here:
https://github.com/ConnorDoyle/EnMAS/blob/master/src/main/scala/org/enmas/pomdp/State.scala
I am successfully sending case classes containing objects of type State among remote actors with no problem. I am just wondering if there is a way to get at the serialization routines that Akka uses for my own purposes.
Akka's implicit serialization when doing message passing is easy, but it appears from the docs that asking Akka for a serialized version explicitly is hard. Perhaps I have misunderstood the documentation, or am missing something important.

This is the magic sauce: https://github.com/akka/akka/blob/master/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala
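In later Akka 2.x versions there is in fact a public entry point to the same machinery: akka.serialization.SerializationExtension. The sketch below (names like Snapshot are hypothetical stand-ins for POMDPIteration; the exact API may differ in the Akka version the question was written against) asks Akka for the serializer it would use for a message and round-trips the object through it:

```scala
import akka.actor.ActorSystem
import akka.serialization.SerializationExtension

// Hypothetical stand-in for the POMDPIteration class from the question.
case class Snapshot(id: Int, label: String)

object AkkaRoundTrip {
  // Uses the same machinery Akka applies to remote messages: look up the
  // serializer registered for this type, write bytes, and read them back.
  def roundTrip(system: ActorSystem, msg: AnyRef): AnyRef = {
    val serialization = SerializationExtension(system)
    val serializer = serialization.findSerializerFor(msg)
    val bytes: Array[Byte] = serializer.toBinary(msg)
    // fromBinary needs the class so it can pick the right deserializer
    serializer.fromBinary(bytes, Some(msg.getClass))
  }
}
```

Because this goes through Akka's serializer registry, whatever configuration makes your messages work over the wire (Java serialization, kryo, protobuf, ...) applies here unchanged.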

Related

How to test scala classes for kryo deserialization incompatibility

I want to use kryo to serialize and deserialize a hierarchy of classes, like this:
case class Apple(bananas: Map[String, Banana], color: Option[String])
case class Banana(cherries: Seq[Cherry], countryOfOrigin: String)
case class Cherry(name: Option[String], age: Int, isTomato: Boolean)
Sometimes I want to add and remove fields somewhere in this hierarchy, e.g. to Cherry.
I would like to write a unit test which looks at the type hierarchy starting at Apple and concludes that data previously serialized with kryo will not deserialize properly—i.e. the deserialized object would not be == to the serialized object, if I could have both in memory simultaneously.
In that case, I can update a namespace key in my Redis cache, forget all the old data and rebuild it from scratch. I just need an automated reminder so that I'll remember to do this when I need to.
Some false positives are acceptable; false negatives are not. I'm happy to hardcode something like a serial version UID into my test case and update it whenever I change the underlying class hierarchy. It's acceptable if the test only works on DAG-shaped hierarchies, but handling cycles is definitely welcome.
Is there some way of computing the bit I want by using e.g. the TypeTag machinery to walk a description of the type hierarchy? Exactly which aspects of source type declaration does kryo compatibility depend on, and how do I plop out a representation of those features using e.g. TypeTag?
I use io.altoo.akka.serialization.kryo.KryoSerializer to (de)serialize, see https://github.com/altoo-ag/akka-kryo-serialization.
One trick I've used in this area is to check in samples (ScalaCheck and its generators may prove useful here) of data serialized with "important" versions of the old serialization. Then you write tests that literally check that the new code properly deserializes those samples.
You may run into a developer under pressure to land a change who makes the deserialization test green by regenerating the serialized test data (this happened to me). You can address that by also checking in checksums of the serialized test data and validating them at the start of CI: changing those checksums makes it pretty apparent in review that something questionable is going on.
I suspect that this approach will have a somewhat better return-on-effort than the alternative of reimplementing a portion of kryo's type system and figuring out a way to serialize a representation of that type system for comparison against future versions of the code.
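A minimal sketch of the checked-in-samples-plus-checksums idea described above. Plain Java serialization stands in for kryo here (the scheme is the same for any serializer), and GoldenSamples is a hypothetical helper, not part of any library:

```scala
import java.io._
import java.security.MessageDigest

// Example leaf of the hierarchy from the question; Scala case classes
// are Serializable out of the box.
case class Cherry(name: Option[String], age: Int, isTomato: Boolean)

object GoldenSamples {
  // Serialize an object to the byte form that would be checked in as a
  // golden sample (with kryo, this would be kryo.writeObject instead).
  def serialize(obj: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    bos.toByteArray
  }

  // The test then asserts that current code still reads the old bytes.
  def deserialize(bytes: Array[Byte]): AnyRef =
    new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject()

  // Checksum of the checked-in sample; verifying it in CI makes silent
  // regeneration of the golden data visible in review.
  def sha256(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
      .map("%02x".format(_)).mkString
}
```

In CI you would first assert that sha256 of each sample file matches the recorded value, then assert that deserialize of each sample equals the expected in-memory object.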

shared domain with scala.js, how?

Probably a basic question, but I'm confused with the various documentations and examples around scala.js.
I have a domain model I would like to share between scala and scala.js, let's say:
class Estimator(val nickname: String)
... and of course I would like to send objects between the web-client (scala.js with angular via angulate) and the server (scala with spring-mvc on spring-boot).
Should the class extend js.Object? And be annotated with @ScalaJSDefined (not yet deprecated in v0.6.15)?
If yes, it would be an unwanted dependency that also leaks into the server part. Neither @ScalaJSDefined nor js.Object are in the dummy scalajs-stubs. Or am I missing something?
If no, how do I pass them through $http.post, which expects a js.Any? I also get some TypeErrors in other places. Should I pickle/unpickle everywhere, or is there an automatic way?
EDIT 2017-03-30:
Actually this relates to Angulate, the facade for AngularJS I chose. For two features (communicating with an HTTP server and displaying model fields in HTML), the domain classes have to be JavaScript classes. In Angulate's example, the domain model is duplicated.
There is also (sadly) no plan to include js.Object in scalajs-stubs to overcome this problem. Details in https://github.com/scala-js/scala-js/issues/2564 . Perhaps js.Object wouldn't hurt so much on the JVM...
So, which web frameworks and facades for Scala.js do / don't nicely support a shared domain? Not Angulate, probably Udash, perhaps React?
(Caveat: I don't know Angulate, which might affect some of this. Speaking generally, though...)
No, those shared objects shouldn't derive from js.Object or use @ScalaJSDefined -- those are only for objects that are designed to interface with JavaScript itself, and it doesn't sound like that's what you have in mind. Objects that are just for Scala don't need them.
But yes -- in general, you're usually going to need to pickle the communications in one way or another. Which pickling library you use is up to you (there are several), but remember that the communication is simply a stream of bytes -- you have to tell the system how to serialize and deserialize between your domain objects and those bytes.
There isn't anything automatic in Scala.js per se -- that's just a language, and doesn't dictate your library choices. You can use implicits to make the pickling semi-automatic, but I recommend being a bit careful with that. I don't see anything obvious in the Angulate docs that indicate that it does the pickling automatically.
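To make the "you have to tell the system how to map domain objects to bytes" point concrete, here is a deliberately tiny hand-rolled pickler for the Estimator class from the question. In a real project you would use a cross-compiled library such as upickle or BooPickle rather than writing this by hand; the regex-based parser here is a toy that ignores escaping and error handling:

```scala
class Estimator(val nickname: String)

object EstimatorPickler {
  // Shared code (compiled for both JVM and JS) turns the object into a
  // string the transport layer can carry...
  def pickle(e: Estimator): String =
    s"""{"nickname":"${e.nickname}"}"""

  // ...and back. A real pickler would handle escaping, nesting, and
  // malformed input instead of a single regex.
  def unpickle(json: String): Estimator = {
    val Pattern = """\{"nickname":"(.*)"\}""".r
    json match {
      case Pattern(name) => new Estimator(name)
    }
  }
}
```

The point is that both halves live in the shared source tree, so client and server agree on the wire format without either side depending on js.Object.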

Is there a way to send messages typed with my own classes to an Akka remote actor?

Is there a way to send messages typed with my own classes to a remote actor?
For instance I would like to be able to received in my remote actor a message like this:
case myClass: MyClass => doSomething()
But I get a "local class incompatible" error because the serialVersionUIDs are different.
The sole way to send a message of type MyClass that I have found is to serialize it as JSON. But then I have to serialize/deserialize it myself, and, more problematic, I don't have a clean way to receive two kinds of typed messages...
So is there a way to send strongly typed messages to a remote actor? If not, what is the workaround?
You most certainly can!
When sending objects over the network, they must be turned into bytes on one end, and turned back into objects on the other end. This is called 'Serialization'.
In Akka the serialization mechanism used for messages travelling from one actor system to another is highly configurable: you shouldn't do it in your own actors, but leave it up to Akka's serialization infrastructure (and configure that to your liking).
By default Akka uses the built-in Java serialization. This mostly works, but as you noticed it is pretty picky about having the exact same class on both sides of the connection. Also, it is not particularly fast. You should have seen a warning in the logging:
Using the default Java serializer for class [{}] which is not
recommended because of performance implications. Use another
serializer or disable this warning using the setting
akka.actor.warn-about-java-serializer-usage
To fix your problem you can either:
Keep using Java serialization, and at least pin the serialVersionUID as described in Vitaliy's answer.
Switch to another serialization mechanism such as Protobuf.
If you don't care too much about performance and don't expect to do 'rolling upgrades' (where you might need to convert between different versions of the same message), Java serialization is certainly the easiest. It's important to be aware of its limitations, though.
Further documentation on how to configure Akka's serialization mechanisms can be found at http://doc.akka.io/docs/akka/current/scala/serialization.html#serialization-scala
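For reference, switching serializers is purely a configuration change in application.conf. A sketch (the message class com.example.MyMessage and the serializer name proto are placeholders; ProtobufSerializer ships with akka-remote):

```
akka {
  actor {
    serializers {
      # name -> fully-qualified serializer class
      proto = "akka.remote.serialization.ProtobufSerializer"
    }
    serialization-bindings {
      # bind message types (by class or marker trait) to a serializer
      "com.example.MyMessage" = proto
    }
  }
}
```

Actors themselves stay unchanged; Akka consults these bindings whenever a message crosses an actor-system boundary.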
From Serializable javadoc:
it is strongly recommended that all serializable classes explicitly
declare serialVersionUID values, since the default serialVersionUID
computation is highly sensitive to class details that may vary
depending on compiler implementations, and can thus result in
unexpected InvalidClassExceptions during deserialization.
So, you should define serialVersionUID in your message classes like this:
@SerialVersionUID(42L)
class Message extends Serializable
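A runnable sketch of the pinned-serialVersionUID approach, using a case class (which is Serializable out of the box) and plain JDK streams. With the UID pinned, recompiling the class with a new field no longer makes old bytes fail with InvalidClassException; fields absent in the stream just come back as JVM defaults:

```scala
import java.io._

// Pinning the UID overrides the compiler-sensitive default computation
// that the Serializable javadoc warns about.
@SerialVersionUID(42L)
case class Message(text: String)

object JavaSerDemo {
  // Serialize to bytes and immediately deserialize, as would happen on
  // the two ends of a remote-actor connection.
  def roundTrip(msg: Message): Message = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(msg)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      .readObject()
      .asInstanceOf[Message]
  }
}
```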

How to use scala actor based event sourcing app from another language (python)

I'm somewhat familiar with Scala and less familiar with Akka, although I know what the actor model is (the idea seems quite simple).
So let's say that right now this is my code (in reality what I need is an event-sourcing application). I need to be able to use it from any language, not just from the JVM.
So of course I googled about that and I've found this. The problem is that, if my understanding is correct, I would need to create some custom protocol, deserialization, and dispatching for zmq messages, and that is totally uncool. Maybe a solution for that already exists? If not, then how to do it in the most efficient way? Maybe I need to create some message case classes and something like a facade actor that would do the deserialization?
class HelloActor extends Actor {
  def receive = {
    case "hello" => println("well, helllo!")
    case _       => println("huh?")
  }
}

object Main extends App {
  val system = ActorSystem("HelloSystem")
  val helloActor = system.actorOf(Props[HelloActor], name = "helloactor")
  helloActor ! "hello"
  helloActor ! "buenos dias"
}
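The "facade actor that does deserialization" idea from the question can be sketched independently of the transport: keep the wire format trivial and translate it into typed messages in exactly one place. Everything below (the verb:payload format, the Command hierarchy, WireFacade) is hypothetical illustration; a real system would use JSON or Protobuf instead of string splitting:

```scala
// Typed messages the rest of the actor system works with.
sealed trait Command
case class Greet(name: String)      extends Command
case class Shutdown(reason: String) extends Command
case class Unknown(raw: String)     extends Command

object WireFacade {
  // Decode a trivial "verb:payload" string coming from the non-JVM
  // client (e.g. over zmq) into a typed message. The facade actor would
  // call this and forward the result to the real actors.
  def decode(raw: String): Command = raw.split(":", 2) match {
    case Array("greet", name)      => Greet(name)
    case Array("shutdown", reason) => Shutdown(reason)
    case _                         => Unknown(raw)
  }
}
```

The rest of the system then pattern-matches on Command and never sees the wire format, so swapping the protocol touches only the facade.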
There are many ways to do this; it depends on the protocol you are using, etc. For a language-specific way, you can use Pyro. Just like Java, Python can serialize generic objects and transfer them over the network, which is what you can use Pyro for. You can take advantage of the fact that Python is implemented both on the JVM (Jython) and natively. I'm not sure it's a great idea to write this just in Scala and Python; I would create the API in Java and then add that to the Scala classpath, so that any other JVM language can also use your API. In addition, it's more common to use Jython with Java, so there are other benefits that come with being in the majority.
But anyway, the common language that the JVM and Python will understand will be these serialized Python objects. So what you will need to know is:
How to use Jython with Java
How to use Pyro
And yes, using Scala with Jython is only a matter of adding the jars to the classpath, as you probably already know.
EDIT: OK, I think I might not have made this method clear enough. So basically:
The JVM uses Jython to create a Jython instance, which is sent to a remote Python object. The communication is done with the Pyro module. The Python program can send serialized Python objects back as well.
This is what happens normally with remote actors in Java, except that the messages implement Serializable. Python and Java are not in the same process, or using native methods, or anything like that. They can be on the same machine or on different machines. This method is not platform specific.
Hopefully this method is useful to someone.
In my case the Akka actor solution was a little bit overkill, so I ended up implementing my own event-sourcing solution in this open source project.
The persistence layer is a decision for the developer, but I provide practical examples of execution using Couchbase.
Take a look in case you find it useful.
https://github.com/politrons/Scalaydrated

Easy Scala Serialization?

I'd like to do serialization in Scala -- I've seen the likes of sjson and the @serializable annotation -- however, I have been unable to see how to get them past one major hurdle: type erasure and generics in libraries.
Take for example the Graph for Scala library. I make heavy use of it in my code and would like to write several objects holding graphs to disk throughout my code for later analysis. However, many times the node and edge types are encapsulated in generic type arguments of another class I have. How can I properly serialize these classes without either modifying the library itself to deal with reflection or "dirtying" my code by importing a large number of type classes (serialization according to how an object is being viewed is wholly unsatisfying anyway...)?
Example,
class Container[N](val g: Graph[N, DiEdge]) {
  ...
}

// in another file
def myMethod[N](container: Container[N]): Unit = {
  <serialize container somehow here>
}
To report on my findings, Java's XStream does a phenomenal job -- anything and everything, generics or otherwise, can be automatically serialized without any additional input. If you need a quick and no-work way to get serialization going, XStream is it!
However, it should be noted that the output XML will not be particularly concise without your own input. For example, every memory block used by Scala's HashMap will be recorded, even if most of them don't contain anything!
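A sketch of the XStream approach described above, assuming the com.thoughtworks.xstream dependency is on the classpath. Box is a hypothetical stand-in for the Container/Graph types from the question; note that recent XStream versions lock down deserialization by default, so permissions have to be opened explicitly (here: everything, acceptable only for a demo):

```scala
import com.thoughtworks.xstream.XStream
import com.thoughtworks.xstream.security.AnyTypePermission

// Hypothetical stand-in for a generic domain class holding a graph.
case class Box(name: String)

object XStreamDemo {
  // One shared, thread-safe instance is the usual pattern.
  val xstream = new XStream()
  // Demo-only: allow every type; production code should whitelist
  // specific classes instead.
  xstream.addPermission(AnyTypePermission.ANY)

  // XStream walks the object graph reflectively, so generics and
  // erasure are not an obstacle.
  def toXml(obj: AnyRef): String = xstream.toXML(obj)
  def fromXml(xml: String): AnyRef = xstream.fromXML(xml)
}
```

The reflective walk is exactly why it handles erased generic payloads that annotation-driven serializers struggle with, and also why the XML is so verbose.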
If you are using Graphs for Scala and if JSON is your serialization format, you can directly use graph-json.
Here is the code and the doc.