How do I implement "write-behind" in a StatefulActor - azure-service-fabric

Azure Service Fabric documentation says that:
Actors provide flexibility for the developer to define rich object
structures as part of the actors or reference object graphs outside of
the actors. In caching terms actors can write-behind or write-through,
or we can use different techniques at a member variable granularity.
In terms of a StatefulActor or StatefulActor<T>, how would one go about implementing write-behind to improve throughput of state-changing methods?

A possible implementation would be to mark your mutation method as [Readonly], so that the service fabric runtime would not persist the State to the cluster replicas. You could therefore modify in memory member variables with the change that you have described and success or failure will be quickly returned to the calling code. At the same time as modifying the member variables, you would register a reminder that will modify the State property and hence asynchronously distribute the change across the replicas in the cluster.
You would need to consider the possibility of the actor being moved to a different node before the reminder fires, resulting in read calls to return stale data read from the State property before the gets updated.

Related

How Axon framework's sequencing policy works in terms of statefulness

In Axon's reference guide it is written that
Besides these provided policies, you can define your own. All policies must implement the SequencingPolicy interface. This interface defines a single method, getSequenceIdentifierFor, that returns the sequence identifier for a given event. Events for which an equal sequence identifier is returned must be processed sequentially. Events that produce a different sequence identifier may be processed concurrently.
Even more, in this thread's last message it says that
with the sequencing policy, you indicate which events need to be processed sequentially. It doesn't matter whether the threads are in the same JVM, or in different ones. If the sequencing policy returns the same value for 2 messages, they will be guaranteed to be processed sequentially, even if you have tracking processor threads across multiple JVMs.
So does this mean that event processors are actually stateless? If yes, then how do they manage to synchronise? Is the token store used for this purpose?
I think this depends on what you count as state, but I assume that from the point of view your looking at it, yes, the EventProcessor implementations in Axon are indeed stateless.
The SubscribingEventProcessor receives it's events from a SubscribableMessageSource (the EventBus implements this interface) when they occur.
The TrackingEventProcessor retrieves it's event from a StreamableMessageSource (the EventStore implements this interface) on it's own leisure.
The latter version for that needs to keep track of where it is in regards to events on the event stream. This information is stored in a TrackingToken, which is saved by the TokenStore.
A given TrackingEventProcessor thread can only handle events if it has laid a claim on the TrackingToken for the processing group it is part of. Hence, this ensure that the same event isn't handled by two distinct threads to accidentally update the same query model.
The TrackingToken also allow multithreading this process, which is done by segmented the token. The number of segments (adjustable through the initialSegmentCount) drives the number of pieces the TrackingToken for a given processing group will be partitioned in. From the point of view of the TokenStore, this means you'll have several TrackingToken instances stored which equal the number of segments you've set it to.
The SequencingPolicy its job is to drive which events in a stream belong to which segment. Doing so, you could for example use the SequentialPerAggregate SequencingPolicy to ensure all the events with a given aggregate identifier are handled by one segment.

Kafka validate messages in stateful processing

I have an application where multiple users can send REST operations to modify the state of shared objects.
When an object is modified, then multiple actions will happen (DB, audit, logging...).
Not all the operations are valid for example you can not Modify an object after it was Deleted.
Using Kafka I was thinking about the following architecture:
Rest operations are queuing in a Kafka topic.
Operations to the same object are going to the same partition. So all the object's operations will be in sequence and processed by a consumer
Consumers are listening to a partition and validate the operation using an in-memory database
If the operation was valid then is sent to a "Valid operation topic" otherways is sent to an "Invalid operation topic"
Other consumers (db, log, audit) are listening to the "Valid operation topic"
I am not very sure about point number 3.
I don't like the idea to keep the state of all my objects. (I have billions of objects and even if an object can be of 10mb in size, what I need to store to validate its state is just few Kbytes...)
However, is this a common pattern? Otherwise how can you verify the validity of certain operations?
Also what would do you use as a in-memory database? Surely it has to be highly available, fault-tolerant and support transaction (read and write).
I believe this is a very valid pattern, and is essentially a variation to an event-sourced CQRS pattern.
For example, Lagom implements their CQRS persistence in a very similar fashion (although based on completely different toolset)
A few points:
you are right about the need for sequencial operations: since all your state mutations need to be based on the result of the previous mutation, there must be a strong order in their execution. This is very often the case for such things, so we like to be able to scale those operations horizontally as much as possible so that each of those sequences operations is happening in parallel to many other sequences. In your case we have one such sequence per shared object.
Relying on Kafka partitioning by key is a good way to achieve that (assuming you do not set max.in.flight.requests.per.connection higher than the default value 1). Here again Lagom has a similar approach by having their persistent entity distributed and single-threaded. I'm not saying Lagom is better, I'm just comforting you in the fact that is approach is used by others :)
a key aspect of your pattern is the transformation of a Command into an Event: in that jargon a command is seen as a request to impact the state and may be rejected for various reasons. An event is a description of a state update that happened in the past and is irrefutable from the point of view of those who receive it: a event always tells the truth. The process you are describing would be a controller that is at the boundary between the two: it is responsible for transforming commands into events.
In that sense the "Valid operation topic" you mention would be an event-sourced description of the state updates of your process. Since it's all backed by Kafka it would be arbitrarily partionable and thus scalable, which is awesome :)
Don't worry about the size of the sate of all your object, it must sit somewhere somehow. Since you have this controller that transforms the commands into events, this one becomes the primary source of truth related to that object, and this one is responsible for storing it: this controller handles the primary storage for your events, so you must cater space for it. You can use Kafka Streams's Key value store: those are local to each of your processing instance, though if you make them persistent they have no problem in handling data much bigger that the available RAM. Behind the scene data is spilled to disk thanks to RocksDB, and even more behind the scene it's all event-sourced to a kafka topic so your state store is replicated and will be transparently re-created on another machine if necessary
I hope this helps you finalise your design :)

Distributed DDD entities with Akka

Suppose we'd have a large number of persistent Person actors, each constructed with an identity and a name argument. What would be the best way to distribute these actors in a cluster, in such a manner that:
new actors are appointed a node by strategy X (round robin, consistent hash, etc.)
a "coordinator" actor contains a mapping from identity to ActorRef
one or more nodes can fail and the affected actors are recovered on other nodes
there is no SPF
I've considered the following, which doesn't seem to solve the problem:
Cluster sharding; all actors are initialised equally and created by coordinator
Cluster aware routing; groups or pools are fixed size and can't be modified dynamically
Sounds like you pretty much exactly are describing Akka cluster sharding and there isn't enough information to see why it would not fit.
The common solution to deal with such a design problem is to have an uninitialized state of the sharded entity where it only accepts an initialize command containing the needed values (so something like CreateUser(id, name)) and when it gets that it toggles to its "normal" behavior.
Another option could be to introduce an intermediate actor that doesn't start the actual actor until it has extracted the name value if you have no means to change the design of your Person actor.
Ofc. you could also drop down to the Akka cluster APIs directly and build something that exactly matches your use case, but handling redistribution on cluster topology change (add, remove nodes etc) is far from trivial to get right.
I think you would also come to the realisation that achieving such a tool that is entirely non-invasive for your entities without the sharding solution being tightly coupled with you business logic is very hard.

Starting Actors on-demand by identifier in Akka

I'm currently implementing a system that that receives inbound messages from an external monitoring system. I'm translating these messages into more concise 'events', and I'm using these to alter the state of 'Managed System' objects. Akka Actors seemed like a good use case for encapsulating mutable state in concurrent applications.
The managed systems are identified by a name (99% of the time this is a hostname). Whenever a proper event is received, the system routes the message to the correct actor based on the name property. At first I used to use actorSelection and the complete paths of said actors, but that was very ugly, and I saw several people advise against relying on the fully qualified name of an actor to deliver message.
So I've set up a simple EventBus, which is great as I can now simply do:
eventBus.subscribe(subscriber1, "/managedSystem01")
eventBus.subscribe(subscriber2, "/managedSystem02")
eventBus.publish(MonitoringEvent("/managedSystem01", MonitoringMessage("managedSystem01", "N", "CPU_LOAD_HIGH", True)))
eventBus.publish(MonitoringEvent("/managedSystem02", MonitoringMessage("managedSystem02", "Y", "DISK_USAGE_HIGH", True)))
Of course, I now have the issue that, should I receive and event that concerns a managed system for which I've not spawned an actor yet (this is entirely possibly, it is impossible for me to get an absolute list of managed systems unfortunately), the message will be routed to the dead-letter mailbox.
Ideally I don't want this to happen. When it is unable to address a specific actor, I want to spawn a new one dynamically.
I suppose that, theoretically, I could subscribe to DeadLetter messages but:
That sounds a little 'hacky', since those message are essentially reserved for the system
Is it even possible to recover the original message (in my case, the MonitoringMessage) that is sent to the DeadLetter mailbox?
Alternatively is there a way to check if there are ZERO subscribers to a certain "topic"?
What you describe ("send to Actor by some identifier, if it does not exist buffer until it gets created and then deliver to that newly on-demand created Actor") is implemented in Akka as Cluster Sharding.
While it is designed primarily for sharding load (work) across a cluster, you could use it locally as well, since your requirement is essentially a scaled down (to one node) version of problem that it solves. It takes care of starting new Actors if they don't exist for a given identifier etc, so you'd simply subscribe the shard-region to the events and it'll take care of creating the actors for you.

Service with background jobs, how to ensure jobs only run periodically ONCE per cluster

I have a play framework based service that is stateless and intended to be deployed across many machines for horizontal scaling.
This service is handling HTTP JSON requests and responses, and is using CouchDB as its data store again for maximum scalability.
We have a small number of background jobs that need to be run every X seconds across the whole cluster. It is vital that the jobs do not execute concurrently on each machine.
To execute the jobs we're using Actors and the Akka Scheduler (since we're using Scala):
Akka.system().scheduler.schedule(
Duration.create(0, TimeUnit.MILLISECONDS),
Duration.create(10, TimeUnit.SECONDS),
Akka.system().actorOf(LoggingJob.props),
"tick")
(etc)
object LoggingJob {
def props = Props[LoggingJob]
}
class LoggingJob extends UntypedActor {
override def onReceive(message: Any) {
Logger.info("Job executed! " + message.toString())
}
}
Is there:
any built in trickery in Akka/Actors/Play that I've missed that will do this for me?
OR a recognised algorithm that I can put on top of Couchbase (distributed mutex? not quite?) to do this?
I do not want to make any of the instances 'special' as it needs to be very simple to deploy and manage.
Check out Akka's Cluster Singleton Pattern.
For some use cases it is convenient and sometimes also mandatory to
ensure that you have exactly one actor of a certain type running
somewhere in the cluster.
Some examples:
single point of responsibility for certain cluster-wide consistent decisions, or coordination of actions across the cluster system
single entry point to an external system
single master, many workers
centralized naming service, or routing logic
Using a singleton should not be the first design choice. It has
several drawbacks, such as single-point of bottleneck. Single-point of
failure is also a relevant concern, but for some cases this feature
takes care of that by making sure that another singleton instance will
eventually be started.