Event-sourcing with akka-persistence: growing state as list?


I am designing a backend using CQRS + Event sourcing, using Akka + Scala. I am not sure about how to handle a growing state. For instance, I will have a growing list of users. To my understanding, each user will be created following a UserCreated event, such events will be replayed by the PersistentActor, and the users will be stored in a collection. Something like:
class UsersActor extends PersistentActor {
override def persistenceId = ....
private case class UsersState(users: List[User] = Nil)
private var state = UsersState()
....
}
Obviously such state would eventually grow too big to be held in memory by this actor, so I guess I'm doing something wrong.
I found this example project: the idea seems that each user should be held by a different actor, and loaded (from the event history) as needed.
What is the right way to do this? Thanks a lot.

The answer is: each aggregate/entity (in my example, each User) gets its own actor, which embeds the state for that particular entity and that one only.
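As a rough illustration of that per-entity idea, here is a plain-Scala sketch with no Akka dependency. The event types, `UserState`, and `UserEntity` are hypothetical names, not taken from the question. In akka-persistence each entity would be its own PersistentActor whose persistenceId includes the user id, and the journal would replay only that entity's events to it.

```scala
// Hypothetical events for a single user aggregate (names are assumptions).
sealed trait UserEvent { def userId: String }
final case class UserCreated(userId: String, name: String) extends UserEvent
final case class UserRenamed(userId: String, name: String) extends UserEvent

// State of ONE user only; it never holds the other users.
final case class UserState(name: Option[String] = None) {
  def applyEvent(event: UserEvent): UserState = event match {
    case UserCreated(_, n) => copy(name = Some(n))
    case UserRenamed(_, n) => copy(name = Some(n))
  }
}

object UserEntity {
  // One event stream per entity, e.g. "user-42".
  def persistenceId(userId: String): String = s"user-$userId"

  // Rebuild a single entity on demand by folding over its own events.
  def recover(userId: String, journal: Seq[UserEvent]): UserState =
    journal
      .filter(_.userId == userId)
      .foldLeft(UserState())((state, event) => state.applyEvent(event))
}
```

Because each entity is rebuilt only from its own events, memory use depends on how many entities are loaded at a given moment, not on the total number of users.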

Related

How to use multiple counters in Flink

(kinda related to How to create dynamic metric in Flink)
I have a stream of events(someid:String, name:String) and for monitoring reasons, I need a counter per event ID.
In all the Flink documentation and examples, I can see that the counter is, for instance, initialised with a name in the open method of a map function.
But in my case I cannot initialise the counter that way, as I will need one per eventId and I do not know the values in advance. Also, I understand how expensive it would be to create a new counter every time an event passes through the map() method of the MapFunction.
Finally, I cannot keep a "cache" of counters as it would be too big.
Ideally, I would like something like this :
class Event(id: String, name: String)
class ExampleMapFunction extends RichMapFunction[Event, Event] {
@transient private var counter: Counter = _
override def open(parameters: Configuration): Unit = {
counter = new Counter()
}
override def map(event: Event): Event = {
counter.inc(event.id)
event
}
}
Or basically, could I implement my own counter that allows me to pass a dimension? If yes, how?
Any advice or best practice for this kind of use case?

If keeping a cache of the counters would be too big, then I don't think using metrics is going to scale in a way that will satisfy your requirements.
A few alternatives:
Use side outputs to collect meaningful events in some external, queryable/visualizable data store, e.g., InfluxDB.
Hold the info in keyed state, and use broadcast messages to trigger output of relevant portions of it as desired (again using side outputs).
Hold the info in keyed state, and take periodic savepoints, which you then analyze via queries using the state processor API.

How to setup domain model as actor?

I'm fairly new to both Scala and Akka and I'm trying to figure out how you would create a proper domain model, which also is an Actor.
Let's imagine we have a simple business case where you can open a new Bank Account. Let's say that one of the rules is that you can only create one bank account per last name (not realistic, but just for the sake of simplicity). My first approach, without applying any business rules, would look something like this:
object Main {
def main(args: Array[String]): Unit = {
implicit val system = ActorSystem("account")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
val account = system.actorOf(Props[Account])
account ! CreateAccount("Doe")
}
}
case class CreateAccount(lastName: String)
class Account extends Actor {
var lastName: String = null
override def receive: Receive = {
case createAccount: CreateAccount =>
this.lastName = createAccount.lastName
}
}
Eventually you would persist this data somewhere. However, when adding the rule that there can only be one Bank Account per last name, a query to some data storage needs to be done. Let's say we put that logic inside a repository and the repository eventually returns an Account, we get to the problem where Account isn't an Actor anymore, since the repository won't be able to create Actors.
This is definitely a wrong implementation and not how Actors should be used. My question is, what are ways to solve this kind of problem? I am aware that my knowledge of Akka is not at a decent level yet, so this might be a weirdly formulated question.

This might be a long answer and I am sorry there isn't a TLDR version. :)
Ok, so you want to "Actorize" your domain model? Bad idea. Domain models are not necessarily actors. Sometimes they are, but often they are not. It would be an anti-pattern to deploy one actor per domain model, because you are simply replacing method calls with message passing while losing the single-threaded simplicity of method calls. You cannot guarantee the timing of the messages hitting your actor, and programming around the ask pattern is a good way to build a system that does not scale: eventually you have too many threads and too many futures, and the system bogs down and chokes. So what does that mean for your particular problem?
First you have to stop thinking of the domain model as a single thing and definitely stop using POJO entities. I entirely agree with Martin Fowler when he discusses the anemic domain model. In a well built actor system there will often be three domain models. One is the persisted model, which has entities that model your database. The second is the immutable model. This is the model that the actors use to communicate with each other. All the entities are immutable from the bottom up: all collections are unmodifiable, all objects only have getters, and all constructors copy the collections to new immutable collections. The immutable model means your actors never have to copy anything, they just pass around references to data. Lastly you will have the API model, which is usually the set of entities that model the JSON for the clients to consume. The API model is there to insulate the back end from client code changes and vice versa; it's the contract between the systems.
To create your actors stop thinking about your persistent model and what you will do with it but instead start thinking of the use cases. What does your system have to do? Model your actors based on the use cases and that will change the implementation of the actors and their deployment strategies.
For example, consider a server that delivers inventory information to users, including current stock levels, reviews by users and so on, for products by a single vendor. The users hammer this information and it changes quickly as stock levels change. This information is likely stored in half a dozen different tables. We don't model an actor for each table but rather a single actor to serve this use case. In this case the information is accessed by a large group of people in a heavy-load environment. So we are best off creating an actor to aggregate all of this data and replicating the actor to each node, and whenever the data changes we inform all replicas on all nodes of the changes. This means the user getting the overview doesn't even touch the database. They hit the actors, get the immutable model, convert that to the API model and then return the data.
On the other hand, if a user wants to change the stock levels, we need to make sure that two users don't do it concurrently, yet large DB transactions slow down the system massively. So instead we pick one node that will hold the stock management actor for that vendor, and we cluster shard the actor. Any requests are routed to that actor and handled serially. The company user logs in and notes the receipt of a delivery of 20 new items. The message goes from whatever node they hit to the node holding the actor for that vendor. The vendor actor then makes the appropriate database changes and then broadcasts the change, which is picked up by all the replicated inventory view actors so they can update their data.
Now this is simplistic because you have to deal with lost messages (read the articles on why reliable messaging is not necessary). However once you start to go down that road you soon realize that simply making your domain model an actor system is an anti-pattern and there are better ways to do things.
Anyway that is my 2 cents :)

General Design
Actors should generally be simple dispatchers to business logic and contain as little functionality as possible. Think of Actors as similar to a Future: when you want concurrency in Scala you don't extend the Future class, you just use Future functionality around your existing logic.
Limiting your Actors to bare-bones responsibility has several advantages:
Testing the code can be done without having to construct ActorSystems, probes, ActorRefs, etc...
The business logic can easily be transplanted to other asynchronous libraries, e.g. Futures and akka streams.
It's easier to create a "proper domain model" with plain old classes and functions than it is with Actors.
Placing business logic in Actors naturally emphasizes a more object-oriented code/system design rather than a functional approach (we picked Scala for a reason).
Business Logic (No Akka)
Here we will set up all of the domain-specific logic without using any Akka-related "stuff".
import scala.collection.immutable.HashMap
object BusinessLogicDomain {
type FirstName = String
type LastName = String
type Balance = Double
val defaultBalance : Balance = 0.0
case class Account(firstName : FirstName,
lastName : LastName,
balance : Balance = defaultBalance)
Let's model your account directory as a HashMap:
type AccountDirectory = HashMap[LastName, Account]
val emptyDirectory : AccountDirectory = HashMap.empty[LastName, Account]
We can now create a function that matches your requirements for distinct account per last name:
val addAccount : (AccountDirectory, Account) => AccountDirectory =
(accountDirectory, account) =>
if(accountDirectory contains account.lastName)
accountDirectory
else
accountDirectory + (account.lastName -> account)
}//end object BusinessLogicDomain
Repository (Akka)
Now that the unpolluted business code is complete, and isolated, we can add the concurrency layer on top of the foundational logic.
We can use the become functionality of Actors to store the state and respond to requests:
import akka.actor.Actor
import BusinessLogicDomain.{Account, AccountDirectory, emptyDirectory, addAccount}
case object QueryAccountDirectory
class RepoActor(accountDirectory : AccountDirectory = emptyDirectory) extends Actor {
val statefulReceive : AccountDirectory => Receive =
currentDirectory => {
case account : Account =>
context become statefulReceive(addAccount(currentDirectory, account))
case QueryAccountDirectory =>
sender ! currentDirectory
}
override def receive : Receive = statefulReceive(accountDirectory)
}
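One consequence of keeping the business rule outside the actor is that it can be exercised directly, with no ActorSystem involved. A minimal sketch, restating a trimmed copy of the domain above so the snippet stands alone (the `DirectoryCheck` object and the sample names are made up for illustration):

```scala
import scala.collection.immutable.HashMap

object DirectoryCheck {
  final case class Account(firstName: String, lastName: String, balance: Double = 0.0)
  type AccountDirectory = HashMap[String, Account]

  // Same rule as addAccount above: keep the first account per last name.
  def addAccount(directory: AccountDirectory, account: Account): AccountDirectory =
    if (directory.contains(account.lastName)) directory
    else directory + (account.lastName -> account)

  val empty: AccountDirectory = HashMap.empty[String, Account]

  // The first "Doe" is accepted, the second one leaves the directory unchanged.
  val afterJohn: AccountDirectory = addAccount(empty, Account("John", "Doe"))
  val afterJane: AccountDirectory = addAccount(afterJohn, Account("Jane", "Doe"))
}
```

The actor then only has to decide when to call the function and where to keep the resulting directory.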

Share state between different actor types

I created two actors Actor1 and Actor2 in one project and registered them in Program.cs:
ActorRuntime.RegisterActorAsync<Actor1>();
ActorRuntime.RegisterActorAsync<Actor2>();
but at runtime I see that each of them is hosted in a separate Actor Service. When I save some data in Reliable Collections via StateManager in Actor1, this state is not available to Actor2.
Is it possible to share state between different actor types?

First of all, it looks like you are using actors the wrong way. You should register one actor type per host process. For example:
ActorRuntime.RegisterActorAsync<Actor1>((context, actorType) =>
new ActorService(
context,
actorType,
() => new Actor1()))
.GetAwaiter()
.GetResult();
If you want communication between actors, you need to create an actor instance and call a method on its interface to save the data. Then, if you want to save the same data in another actor, you need to create another instance. Actors are by design unique, separate entities. You can do it from one place, creating two actors and saving the data separately, or you can create the second actor from the first. For example:
var actor1Proxy = ActorProxy.Create<IActor1>(actorId, new Uri("fabric:/MyApp/Actor1"));
var actor2Proxy = ActorProxy.Create<IActor2>(actorId, new Uri("fabric:/MyApp/Actor2"));
actor1Proxy.SaveData(data);
actor2Proxy.SaveData(data);
You probably need a reliable service to store your data. Please read this and this article for more information about how to use actors.

Play framework storing Akka actors in collections

I'm creating WebSocket actors in Play (Scala).
Actors are being created somewhere else in the system, and I just need to keep them in one place, grouped by some variables.
What is the best practice to store them, and which one takes up the smallest amount of memory:
Seq[Actor]
Seq[ActorRef]
Something else?

You should NEVER store actors. The only way to access an actor should be through its ActorRef.
There are a few patterns/practices that you could use to find your actors.
The first is ActorSelection, which requires building the right actor hierarchy. For instance, if you have users split by geographical location, you might want to have actor selections like
/user/..../US/PA/18900/user1
/user/..../US/PA/18900/user2
/user/..../US/NJ/07000/user3
This way you could find all actors using a selection with a wildcard, although you will be stuck with just one property to filter them by.
The other way is to have a data structure that stores all your flags/properties, for instance:
case class UserRef(ref: ActorRef, name: String, country: String, zip: Integer, active: Boolean)
Then your 'directory' will store them as users: List[UserRef], and you will be able to query this structure in one pass using users.filter(_.active) or users.find(_.name == "superuser").
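A small sketch of that directory lookup. To keep it runnable without Akka, the `ref` field is a plain String standing in for an ActorRef, `zip` is a String so that leading zeros such as 07000 survive, and the sample entries are invented. The comparisons use `==`.

```scala
// `ref` would be an akka.actor.ActorRef in a real system; a String path stands in here.
final case class UserRef(ref: String, name: String, country: String, zip: String, active: Boolean)

object UserDirectory {
  val users: List[UserRef] = List(
    UserRef("/user/US/PA/18900/user1", "superuser", "US", "18900", active = true),
    UserRef("/user/US/PA/18900/user2", "guest", "US", "18900", active = false),
    UserRef("/user/US/NJ/07000/user3", "alice", "US", "07000", active = true)
  )

  // One pass over the list per query.
  val activeUsers: List[UserRef] = users.filter(_.active)
  val superuser: Option[UserRef] = users.find(_.name == "superuser")

  // Grouping by several properties is possible, unlike a single path hierarchy.
  val byZip: Map[String, List[UserRef]] = users.groupBy(_.zip)
}
```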

Using Akka actors in a CRUD web application

I'm working on a web application written in Scala, using the Play! framework and Akka. The code is organized basically like this: Play controllers send messages to Akka actors. The actors, in turn, talk to a persistence layer, that abstracts database access. A typical example of usage of these components in the application:
class OrderController(orderActor: ActorRef) extends Controller {
def showOrders(customerId: Long) = {
implicit request => Async {
val futureOrders = orderActor ? FindOrdersByCustomerId(customerId)
// Handle the result, showing the orders list to the user or showing an error message.
}
}
}
class OrderActor extends Actor {
def receive = {
case FindOrdersByCustomerId(id) =>
sender ! OrderRepository.findByCustomerId(id)
case InsertOrder(order) =>
sender ! OrderRepository.insert(order)
//Trigger some notification, like sending an email. Maybe calling another actor.
}
}
object OrderRepository {
def findByCustomerId(id: Long): Try[List[Order]] = ???
def insert(order: Order): Try[Long] = ???
}
As you can see, this is the basic CRUD pattern, much like what you would see in other languages and frameworks. A query gets passed down to the layers below and, when the application gets a result from the database, that result comes back up until it reaches the UI. The only relevant difference is the use of actors and asynchronous calls.
Now, I'm very new to the concept of actors, so I don't quite get it yet. But, from what I've read, this is not how actors are supposed to be used. Observe, though, that in some cases (e.g. sending an email when an order is inserted) we do need true asynchronous message passing.
So, my question is: is it a good idea to use actors in this way? What are the alternatives for writing CRUD applications in Scala, taking advantage of Futures and the other concurrency capabilities of Akka?

Although actor-based concurrency doesn't fit with transactional operations out of the box, that doesn't stop you from using actors this way if you play nicely with the persistence layer. If you can guarantee that the insert (write) is atomic, then you can safely have a pool of actors doing it for you. Databases normally have thread-safe reads, so find should also work as expected. If the insert is not thread-safe, you can have a single WriteActor dedicated to write operations, and the sequential processing of messages will ensure the writes never run concurrently.

One thing to be aware of is that an actor processes one message at a time, which would be rather limiting in this case. You can use a pool of actors by using routers.
Your example defines a blocking API for the repository, which might be the only thing you can do, depending on your database driver. If possible you should go for an async API there too, i.e. one returning Futures. In the actor you would then pipe the result of the Future to the sender instead.
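A minimal sketch of that asynchronous repository shape, in plain Scala. `Order`, the in-memory data, and `AsyncOrderRepository` are placeholders. A real implementation would call a non-blocking driver rather than wrap an in-memory lookup. Inside an actor, the returned Future would be handed to the sender with Akka's pipe pattern (`pipeTo`) rather than awaited.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Placeholder entity; the question does not show the real Order type.
final case class Order(id: Long, customerId: Long)

object AsyncOrderRepository {
  // Placeholder data standing in for a database table.
  private val orders = List(Order(1L, 10L), Order(2L, 10L), Order(3L, 20L))

  // Returns immediately; a failure is carried inside the Future instead of a Try.
  def findByCustomerId(id: Long)(implicit ec: ExecutionContext): Future[List[Order]] =
    Future(orders.filter(_.customerId == id))
}
```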
