Overcoming changes to persistent message classes in Akka Persistence - scala

Let's say I start out with an Akka Persistence system like this:

case class MyMessage(x: Int)

class MyProcessor extends Processor {
  def receive = {
    case Persistent(m: MyMessage) => m.x
    //...
  }
}
And then someday I change it to this:

case class MyMessage(x: Int, y: Int)

class MyProcessor extends Processor {
  def receive = {
    case Persistent(m: MyMessage) => m.x + m.y
    //...
  }
}
After I deploy my new system, when the instance of MyProcessor tries to restore its state, the journaled messages will be of the former case class. Because it is expecting the latter type, replay will fail, rendering the processor useless. The question is: if we were to assume an absent y can equal 0 (or whatever), is there a best practice to overcome this? For example, perhaps using an implicit to convert from the former message to the latter on recovery?

Akka uses Java serialization by default and says that for a long-term project we should use a proper alternative. This is because Java serialization is very difficult to evolve over time. Akka recommends using either Google Protocol Buffers, Apache Thrift or Apache Avro.
With Google Protocol Buffers, for example, in your case you'd be writing something like:
if (p.hasY) p.getY else 0
Akka explains all that in a nice article (admittedly it's not very Google-able):
http://doc.akka.io/docs/akka/current/scala/persistence-schema-evolution.html
and even explains your particular use case of adding a new field to an existing message type:
http://doc.akka.io/docs/akka/current/scala/persistence-schema-evolution.html#Add_fields
A blog post the Akka documentation recommends for a comparison of the different serialization toolkits:
http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
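Outside of protobuf, the "absent field gets a default" idea can be sketched in plain Scala with an adapter applied during recovery. This is only an illustration, not Akka API: OldMyMessage stands in for the previously journaled shape, and fromJournal mimics the role of Akka's EventAdapter.fromJournal.

```scala
// Hypothetical sketch: map old journaled events onto the new schema during
// recovery, defaulting the missing field. All names here are illustrative.
case class OldMyMessage(x: Int)      // the shape that was journaled
case class MyMessage(x: Int, y: Int) // the new shape the processor expects

// In the spirit of Akka's EventAdapter.fromJournal: upcast before delivery.
def fromJournal(event: Any): MyMessage = event match {
  case m: MyMessage    => m               // already the new shape
  case OldMyMessage(x) => MyMessage(x, 0) // absent y defaults to 0
}
```

With an adapter like this registered for the journal, the processor only ever pattern matches on the current MyMessage shape.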

Related

What are the dangers of using an object as an AKKA actor?

In the code below I'm using an AKKA actor MonitorActor even though it's an object. I never see this pattern in production code although it seems to work well.
Does the below code have concurrency issues as a result of using an object as an Actor?
Are there any AKKA actor related 'gotchas' on show here?
case class SomeEvent(member: String)

class Example(eventBus: EventBus)(implicit actorSystem: ActorSystem) {
  val members: AtomicReference[Set[String]] = new AtomicReference(Set())

  actorSystem.actorOf(Props(MonitorActor))

  private object MonitorActor extends Actor {
    eventBus.subscribe(classOf[SomeEvent])
    var isEnough = false

    override def receive: Receive = {
      case SomeEvent(member: String) =>
        val newMembers = members.updateAndGet(_ + member)
        if (newMembers.size >= 10) {
          isEnough = true
        }
    }
  }
}
One immediate question arising from this "pattern" is: what happens if the Actor is added to the actorSystem twice:
actorSystem.actorOf(Props(MonitorActor))
actorSystem.actorOf(Props(MonitorActor))
This is not a trivial question. In large code bases there can be multiple files/packages where an Actor is materialized so the above scenario will likely come up if only by accident.
At best, each SomeEvent is processed twice by the exact same logic. At worst you will get into nasty race conditions with isEnough. So let's assume the best case.
Even in the best case scenario each SomeEvent will be processed by the exact same logic. This isn't bad in the question's example because members is a Set. But if it were a List you would start to get double insertions of the same event.
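The Set-vs-List point is easy to demonstrate in isolation: adding the same event twice is idempotent for a Set but not for a List.

```scala
// Processing the same event twice is harmless for a Set but duplicates
// entries in a List.
val event = "member-1"
val asSet  = Set.empty[String] + event + event   // second add is a no-op
val asList = List.empty[String] :+ event :+ event // second add duplicates
```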
Another issue is having to protect ourselves from race conditions involving members. A good reason for members to be an AtomicReference is to resolve the situation where the two "independent" Actors are trying to access members at the same time. But this goes against the entire purpose of the Actor model. From the original 1973 formalism (emphasis mine):
The architecture is general with respect to control structure and does
not have or need goto, interrupt, or semaphore primitives.
A similar description can be found in the akka documentation's introduction (emphasis mine):
The Actor Model provides a higher level of abstraction for writing
concurrent and distributed systems. It alleviates the developer from
having to deal with explicit locking and thread management, making it
easier to write correct concurrent and parallel systems.
So we have effectively broken the Actor model framework and all we got was not having to call a constructor. Contrast the question's example code with the "preferable" implementation:
class MonitorActor(eventBus: EventBus) extends Actor {
  // plain mutable state is safe here: an actor processes one message at a time
  val members: mutable.Set[String] = mutable.Set.empty[String]
  eventBus.subscribe(classOf[SomeEvent])
  var isEnough = false

  override def receive: Receive = {
    case SomeEvent(member: String) =>
      members += member
      isEnough = members.size >= 10
  }
}
Now the developer doesn't have to worry about semaphores, race conditions, thread contention, ... All of the logic and functionality within an Actor can be understood from a serial perspective.
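The core hazard, that a singleton object is one shared instance, can be shown without any Akka machinery. The Handler names below are made up purely for illustration:

```scala
// A singleton object means exactly one instance of its mutable state, no
// matter how many times it is handed out; a class gives a fresh copy each time.
trait Handler {
  var isEnough = false
  def handle(n: Int): Unit = if (n >= 10) isEnough = true
}
object SharedHandler extends Handler // stands in for `object MonitorActor`
class FreshHandler extends Handler   // stands in for `class MonitorActor`

val a = SharedHandler; val b = SharedHandler // same instance handed out twice
a.handle(10)                                 // flips b's flag too, since a eq b

val c = new FreshHandler; val d = new FreshHandler
c.handle(10)                                 // d is untouched
```

This is exactly what happens when Props(MonitorActor) is materialized twice: both "actors" close over the same object and therefore the same mutable state.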

scala api design: can i avoid either an unsafe cast or generics

I am writing a library that provides a distributed algorithm. The idea being that existing applications can add the library to use the algorithm. The algorithm is in a library module and it abstracts the actual transmission of data over the network behind a trait. An application that uses the algorithm has to provide the actual network transport code. In code it looks something like the following:
// the library is really a separate project, not a single object
object Library {
  // handle to a remote server
  trait RemoteProcess

  // a concrete server needs to know how to actually send to a real remote process
  trait Server {
    def send(client: RemoteProcess, msg: String)
  }
}

// the application uses the library and provides the concrete transport
object AkkaDemoApplication {
  // the concrete ref is a wrapper around an ActorRef in the demo app
  case class ConcreteRemoteProcess(ref: akka.actor.ActorRef) extends Library.RemoteProcess

  class AkkaServer extends Library.Server {
    // **WARNING** this won't compile; it's here to illustrate my question
    override def send(client: ConcreteRemoteProcess, msg: String): Unit = client.ref ! msg
  }
}
A couple of options I have considered:
Have the signature of the AkkaServer method overload the library trait method, then perform an unsafe cast to ConcreteRemoteProcess. Yuk!
Have the signature of the AkkaServer method overload the library trait method, then pattern match on the RemoteProcess argument to get a ConcreteRemoteProcess. This is no better than an unsafe cast in terms of blowing up at runtime if the wrong thing is passed.
Make the library server generic in terms of the RemoteProcess.
An example of option 3 looks like:
object Library {
  trait Server[RemoteProcess] {
    def send(client: RemoteProcess, msg: String)
  }
}

object Application {
  class AkkaServer extends Library.Server[ActorRef] {
    override def send(client: ActorRef, msg: String): Unit = client ! msg
  }
}
I tried option 3 and it worked, but the generic type ended up being stamped on just about every type throughout the entire library module. There were then a lot of covariance and contravariance hassles to get the algorithmic code to compile. Simply to get compile-time certainty at the one integration point, the cognitive overhead was very large. Visually, the library code is dominated by the generic signatures, as though understanding them were critical to understanding the library, when in fact they are a total distraction from the library logic.
So using the generic works and gave me compile-time certainty, but now I wish I had gone with option 2 (the pattern match), with the excuse "it would fail fast at startup if someone got it wrong; let's keep it simple".
Am I missing some Scala feature or idiom here to get compile time certainty without the cognitive overhead of a "high touch" generic that all the library code touches but ignores?
Edit: I have considered that perhaps my library is badly factored, such that a refactor could move the generic to the boundaries. Yet the library has already been refactored for testability, and that breakup into testable responsibilities is part of the problem of the generic getting smeared around the codebase. Hence my question is: in general, is there another technique I don't know about for providing a concrete implementation to an abstract API without a generic?
I think you are coupling your algorithm and Akka too closely. Furthermore, I assume the server sends data to a remote client, which performs some operation and sends back the result.
Why not
object Library {
  // handle to a remote server
  trait RemoteProcessInput
  trait RemoteProcessResult

  // a concrete server needs to know how to actually send to a real remote
  // process and how to deal with the result
  trait Server {
    def handle(clientData: RemoteProcessInput): Future[RemoteProcessResult]
  }
}
A concrete implementation then provides the transport with Akka:
object Application {
  class AkkaServerImpl(system: ActorSystem) extends Library.Server {
    import system.dispatcher
    implicit val timeout: Timeout = Timeout(5.seconds)

    override def handle(clientData: RemoteProcessInput): Future[RemoteProcessResult] = {
      // send data to the client and expect a result; you can derive the
      // target and the message from the concrete input
      val ref: ActorRef = ??? // (resolve the client actor)
      val msg: Any = ???      // (create your message based on the concrete impl)
      // using the ask pattern here; alternatively, have an actor living on the
      // server side that sends msgs and receives the clients' results,
      // triggered by the handle method
      (ref ? msg).mapTo[RemoteProcessResult]
    }
  }
}
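Stripped of Akka, the shape of this answer is: the library defines opaque input/output traits plus a Server that maps one to the other in a Future, and only the concrete implementation knows the transport. A minimal in-memory sketch (Echo, Echoed and InMemoryServer are invented for the example; a real implementation would talk to Akka):

```scala
import scala.concurrent.Future

trait RemoteProcessInput
trait RemoteProcessResult

// the library only ever sees these traits, never a concrete client type
trait Server {
  def handle(in: RemoteProcessInput): Future[RemoteProcessResult]
}

// application-side types and implementation
case class Echo(msg: String) extends RemoteProcessInput
case class Echoed(msg: String) extends RemoteProcessResult

class InMemoryServer extends Server {
  def handle(in: RemoteProcessInput): Future[RemoteProcessResult] = in match {
    case Echo(m) => Future.successful(Echoed(m))
    case other   => Future.failed(new IllegalArgumentException(s"unsupported input: $other"))
  }
}
```

Note that no generic parameter appears anywhere: the library code compiles against the traits alone, and the application decides what the concrete inputs and results are.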

Registering a class with Kryo via twitter chill-scala

I'm trying to serialize an instance of a Scala class using Kryo via Twitter's chill-scala library. It's from a library (external jar) and thus, I think, needs to be registered with Kryo.
How do I register a class to (de)serialize using chill-scala?
Here's the core of my code, based mostly on examining the chill-scala test suite.
// This is from the chill-scala test suite
def serialize[T](t: T): Array[Byte] =
  ScalaKryoInstantiator.defaultPool.toBytesWithClass(t)

def deserialize[T](bytes: Array[Byte]): T =
  ScalaKryoInstantiator.defaultPool.fromBytes(bytes).asInstanceOf[T]

/**
 * Saves a value in the cache.
 */
def save[T](key: String, value: T, expiration: Int = 0): Future[T] = {
  cache.put(key, serialize[T](value), expiration, TimeUnit.SECONDS)
  Future.successful(value)
}

/**
 * Finds a value in the cache.
 */
def find[T: ClassTag](key: String): Future[Option[T]] = Future {
  val result = deserialize[T](cache.get(key).asInstanceOf[Array[Byte]])
  Option(result)
}
When I run it, it throws
com.esotericsoftware.kryo.KryoException: Unable to find class: <name_of_external_class>
More generally, is there documentation someplace on how to use chill-scala? The authors of that package have obviously done a substantial amount of work and I've seen a number of positive references to it -- but no documentation.
Thanks for any pointers,
Byron
I believe that chill is just an extension of Kryo that provides serializers for supported Scala classes, which work better than Kryo's default FieldSerializer.
So if you do not know how to use chill, you should read the Kryo docs.
And the exception
  Unable to find class: <name_of_external_class>
is probably because the class is not on your classpath. It may not be related to Kryo at all.
Note that even a class that is not registered with Kryo can still be serialized by Kryo.
If you need a good example of using chill, the source code of Apache Spark is a good choice. Also, the small example in this repo is acceptable.

Is there a way to avoid untyped ActorRef in scala cookbook actor communication example?

In scala cookbook: 13.3. How to Communicate Between Actors I see this
class Ping(pong: ActorRef) extends Actor { // OMG - ActorRef - no type, help!
  var count = 0
  def incrementAndPrint { count += 1; println("ping") }
  def receive = {
    case StartMessage =>
      incrementAndPrint
I have also got a few places in my own code where I have this ActorRef. I don't like it, as I like type safety. Is there a way to avoid that in the above ping-pong example?
Side Note: I understand I can use "actorFor" with naming, but as a DI freak I rather pass it in constructor / parameter.
Some stuff is in the works for Akka 3.0, e.g. see this teaser thread: https://mobile.twitter.com/RayRoestenburg/status/510511346040197120
There is a pattern for type safety now using a custom ask (the question mark). Here is a blog post about it:
http://www.warski.org/blog/2013/05/typed-ask-for-akka/
This is a little clunky though and may not be worth the trouble.
Another approach is to create typed APIs and wrap your actors in them.
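That last suggestion, wrapping an untyped send behind a typed API, can be sketched in plain Scala. TypedRef and the Any => Unit stand-in for ActorRef.! are invented for the example:

```scala
// A thin typed facade: only messages of type M compile, while the
// underlying send stays untyped, just as ActorRef.! is.
class TypedRef[M](untypedSend: Any => Unit) {
  def !(m: M): Unit = untypedSend(m)
}

sealed trait PingMsg
case object StartMessage extends PingMsg

// Collect sends in a buffer instead of a real actor, for demonstration.
val delivered = scala.collection.mutable.Buffer.empty[Any]
val ping = new TypedRef[PingMsg](delivered += _)

ping ! StartMessage // compiles
// ping ! "hello"   // would not compile: String is not a PingMsg
```

The wrapped actor still receives Any, but every caller that goes through the facade is checked at compile time.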

Akka actors + Play! 2.0 Scala Edition best practices for spawning and bundling actor instances

My app gets a new instance of Something via an API call on a stateless controller. After I do my mission critical stuff (like saving it to my Postgres database and committing the transaction) I would now like to do a bunch of fire-and-forget operations.
In my controller I send the model instance to the post-processor:
import _root_.com.eaio.uuid.UUID
import akka.actor.Props
// ... skip a bunch of code
play.api.libs.concurrent.Akka.system.actorOf(
  Props[MySomethingPostprocessorActor],
  name = "somethingActor" + new UUID().toString()
) ! something
The MySomethingPostprocessorActor actor looks like this:
class MySomethingPostprocessorActor extends Actor with ActorLogging {
  def receive = {
    case Something(thing, alpha, beta) => try {
      play.api.libs.concurrent.Akka.system.actorOf(
        Props[MongoActor],
        name = "mongoActor" + new UUID().toString()
      ) ! Something(thing, alpha, beta)

      play.api.libs.concurrent.Akka.system.actorOf(
        Props[PubsubActor],
        name = "pubsubActor" + new UUID().toString()
      ) ! Something(thing, alpha, beta)

      // ... and so forth
    } catch {
      case e => log.error("MySomethingPostprocessorActor error=[{}]", e)
    }
  }
}
So, here's what I'm not sure about:
I know Actor factories are discouraged as per the warning on this page. My remedy for this is to name each actor instance with a unique string provided by UUID, to get around the your-actor-is-not-unique errors:
play.core.ActionInvoker$$anonfun$receive$1$$anon$1:
Execution exception [[InvalidActorNameException:
actor name somethingActor is not unique!]]
Is there a better way to do the above, i.e. instead of giving everything a unique name? All examples in the Akka docs I encountered give actors a static name, which is a bit misleading.
(any other comments are welcome too, e.g. the if the bundling pattern I use is frowned upon, etc)
As far as I'm aware, the name parameter is optional.
This may or may not be the case with Akka + Play (haven't checked). When working with standalone actor systems though, you usually only name an actor when you need that reference for later.
From the sounds of it you're tossing out these instances after using them, so you could probably skip the naming step.
Better yet, you could probably save the overhead of creating each actor instance by just wrapping your operations in Futures and using callbacks if need be: http://doc.akka.io/docs/akka/2.0.3/scala/futures.html
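For fire-and-forget work, the Future-based alternative needs no actor creation or naming at all; each operation just runs on the execution context. A minimal standard-library sketch (postprocess and its body are illustrative, not Play API):

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global

case class Something(thing: String, alpha: Int, beta: Int)

// Each side effect runs asynchronously on the execution context;
// nothing is named, registered, or kept around afterwards.
def postprocess(s: Something)(implicit ec: ExecutionContext): Seq[Future[Unit]] = Seq(
  Future { /* write s to Mongo */ },
  Future { /* publish s on the pubsub channel */ }
)
```

If a step can fail and you care about the error, attach an onComplete or recover callback to the corresponding Future instead of the actor-side try/catch.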