Scala concurrency problem for transactions system - scala

I am trying to build a transaction system in Scala. I am using akka-http as the server; the rest of the work is currently done by plain classes and methods.
The steps are:
1. Read the balance from the account.
2. Do some processing (for example, taking loans from other linked accounts).
3. Update the balance on the account (deduct the amount).
When we fire multiple transactions for the same account concurrently, some of them read a stale balance at step 1, because the first transaction has not yet updated the balance.
Ideally all read/update operations to some account should happen sequentially irrespective of the api call.
I tried making an actor per account so that all messages sent to it are processed in sequence.
But in that case reading from the account is painful, because the ask pattern (?) returns Future[Any], not Future[SomeAccountDetailsObject].
Even if casting with .asInstanceOf[Future[SomeAccountDetailsObject]] works, I would have to manage and pass the Future through the rest of the code, which I would rather avoid: I have to read from various linked accounts within a transaction, and this is legacy code that I want to touch as little as possible.
Is there a way to tackle this problem, preferably with the Akka stack, that isolates the new account-handling code behind something like a plain method call?
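One way to picture "sequential per account, but still a plain method call" is a lock per account id. The sketch below is in Python only to stay library-free; it is not Akka code, and AccountStore, transact and deduct are hypothetical names. It assumes all transactions run inside one process, which is the same guarantee a single actor per account would give.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class AccountStore:
    """In-memory stand-in for the real account storage (hypothetical)."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._locks = defaultdict(threading.Lock)  # one lock per account id
        self._registry_guard = threading.Lock()    # protects the lock registry

    def _lock_for(self, account_id):
        with self._registry_guard:
            return self._locks[account_id]

    def transact(self, account_id, operation):
        """Run read -> process -> update for one account as one critical section."""
        with self._lock_for(account_id):
            balance = self._balances[account_id]      # step 1: read
            new_balance = operation(balance)          # step 2: process
            self._balances[account_id] = new_balance  # step 3: update
            return new_balance

    def balance(self, account_id):
        with self._lock_for(account_id):
            return self._balances[account_id]


def deduct(amount):
    def operation(balance):
        if balance < amount:
            raise ValueError("insufficient funds")
        return balance - amount
    return operation


# 50 concurrent deductions of 10 against the same account.
store = AccountStore({"acc-1": 1000})
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(store.transact, "acc-1", deduct(10)) for _ in range(50)]
    results = [f.result() for f in futures]
final_balance = store.balance("acc-1")
```

Callers see a blocking method and never handle a Future. If one transaction has to hold several linked accounts at once, the locks would need to be taken in a fixed order (for example sorted by account id) to avoid deadlock.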

Related

Designing event-based architecture for the customer service

Being a developer with solid experience, I am only now entering the world of microservices and event-driven architecture. Loose coupling, independent scalability and proper implementation of asynchronous business processes are things that I feel should become simpler compared with the traditional monolith approach. So I am giving it a try and making a simple PoC for myself.
I am considering a simple application where a user can register, log in and change their customer details. However, I want to react to certain events asynchronously:
customer logs in - we send them an email, if the IP address used is new to the system.
customer changes their name, we send them an email notifying of the change.
The idea is to make a separate application that reacts to "CustomerLoggedIn" and "CustomerChangeName" events.
Here I can think of three approaches to implementing this simple functionality, each with some drawbacks. So, when a customer submits their name change:
1. Store the changed name in the DB and send an event to Kafka once the DB transaction has completed. One of the big problems here is that if a customer has two tabs open and almost simultaneously submits a change from the initial name "Bob" to "Alice" in one tab and from "Bob" to "Jim" in the other, then at the database level one of the updates overwrites the other, which is OK, but we cannot guarantee that the events are emitted in the same order. We can use a check to ensure that the DB update is only done when "the last version" has been seen, which rejects the second update entirely, so only one event is emitted. But in the general case this pattern does not preserve the same order of events in the DB as in Kafka, unless we do the DB change and the Kafka send in one distributed transaction, which is an anti-pattern afaik.
2. Change the name in the DB and use Debezium or a similar DB CDC tool to capture the change and stream it. Here we get a single event source, so the ordering problem is solved. What bothers me is that I lose the ability to enrich the events with business information. A related drawback is that CDC will stream all updates to the "customer" table regardless of the business meaning of the change. So in this case I will probably need to build a Kafka Streams application to convert the DB CDC events to business events and decouple the DB structure from the event structure. The potential benefit of this approach is that I will be able to capture "direct" DB changes in the same manner as those originating in the application.
3. Emit an event from the application without storing it in the DB. One of the subscribers might do the DB persistence, another will do the email sending, etc. The biggest problem I see here is: what do I return to the client? I cannot say "Ok, your name is changed"; it's more like "Ok, your request has been recorded and will be processed". If the customer quickly hits refresh, they expect to see their new name, and we don't want to explain to customers what eventual consistency is, do we? Also, the order in which "email sender" and "db updater" process the same event is not guaranteed, so I can send an email before the change is persisted.
I am looking for advice on any of these three approaches (and maybe some others I am missing), and perhaps on the use cases where one is preferable over the others.
It sounds to me like you want event sourcing. In event sourcing, all you need to store is the event: the current state of a customer is derived from replaying the events (either from the beginning of time, or since a snapshot: the snapshot is just an optional optimization). Some other process (there are a few ways to go about this) can then project the events to Kafka for consumption by interested parties. Since every event has a sequence number, you can use the sequence number to prevent concurrent modification (alternatively, the more actor modely event-sourcing implementations can use techniques like cluster sharding in Akka to achieve the same ends).
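To make the sequence-number idea concrete, here is a minimal in-memory sketch in Python (not tied to any event-sourcing library). EventStore, NameChanged and current_name are hypothetical names, and the "expected sequence number" check stands in for whatever conditional write the real journal offers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameChanged:
    customer_id: str
    new_name: str
    sequence_nr: int


class ConcurrentModification(Exception):
    pass


class EventStore:
    """Append-only log per customer; an in-memory stand-in for a real journal."""

    def __init__(self):
        self._streams = {}

    def events(self, customer_id):
        return list(self._streams.get(customer_id, []))

    def append(self, customer_id, new_name, expected_sequence_nr):
        stream = self._streams.setdefault(customer_id, [])
        # Reject the write if someone else appended since the caller last read.
        if len(stream) != expected_sequence_nr:
            raise ConcurrentModification(
                f"expected sequence {expected_sequence_nr}, log is at {len(stream)}"
            )
        event = NameChanged(customer_id, new_name, expected_sequence_nr + 1)
        stream.append(event)
        return event


def current_name(events, initial="Bob"):
    """Current state is derived by replaying the events in order."""
    name = initial
    for event in events:
        name = event.new_name
    return name


# Two browser tabs both loaded the customer at sequence 0.
store = EventStore()
seen = len(store.events("c1"))
store.append("c1", "Alice", seen)
try:
    store.append("c1", "Jim", seen)  # stale view of the log
    conflict = False
except ConcurrentModification:
    conflict = True
name_now = current_name(store.events("c1"))
```

The second tab's write is rejected rather than silently reordered, so the log that gets projected to Kafka has exactly one, unambiguous order.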
Doing this, you can have a "write-side" which processes the updates in a strongly consistent manner and can respond to queries which only involve a single customer having seen every update to that point (the consistency boundary basically makes customer in this case an aggregate in domain-driven-design terms). "Read-sides" consuming events are eventually consistent: the latencies are typically fairly short: in this case your services sending emails are read-sides (as would be a hypothetical panel showing names of all customers), but the customer's view of their own data could be served by the write-side.
(The separation into read-sides and write-side (the pluralization is significant) is Command Query Responsibility Segregation, which sometimes gets interpreted as "reads can only be served by a read-side". This is not totally accurate: for one thing a write-side's model needs to be read in order for the write-side to perform its task of validating commands and synchronizing updates, so nearly any CQRS-using project violates that interpretation. CQRS should instead be interpreted as "serve reads from the model that makes the most sense and avoid overcomplicating a model (including that model in the write-side) to support a new read".)
I think I qualify to answer this, having used Debezium extensively to simplify architectures.
I would prefer Option 2:
Every transaction always results in an event emitted in correct order
Options 1 and 3 have a corner case: what if the transaction succeeds but the application fails to emit the event?
To your point:
Another related drawback is that CDC will stream all the updates in
the "customer" table regardless of the business meaning of the event.
So, in this case, i will probably need to build a Kafka Streams
application to convert the DB CDC events to business events and
decouple the DB structure from event structure.
I really don't think that is a roadblock. The benefit is that other use cases may crop up where another consumer of this topic wants to read other columns of the table.
Options 1 and 3 tie this to your core application logic, which does you no favors from a simplification point of view. With option 2, a developer can work on the events independently, with zero code changes to the core application APIs and no need to understand the core logic.
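The conversion step from row-level changes to business events can stay quite small. The sketch below is plain Python over dictionaries, not a Kafka Streams application. It assumes the change record carries "op", "before" and "after" fields in the style of a Debezium envelope, that "before" is populated for updates, and that the customer table has "id" and "name" columns; the column names and the shape of the output event are hypothetical.

```python
def to_business_events(change):
    """Translate one row-level change on the customer table into business events.

    Updates that carry no business meaning (for example a touched timestamp)
    produce no event, which decouples consumers from the table structure.
    """
    before = change.get("before")
    after = change.get("after")
    if change.get("op") != "u" or before is None or after is None:
        return []
    if before["name"] == after["name"]:
        return []
    return [
        {
            "type": "CustomerChangeName",
            "customer_id": after["id"],
            "old_name": before["name"],
            "new_name": after["name"],
        }
    ]


rename = {
    "op": "u",
    "before": {"id": 1, "name": "Bob", "updated_at": 1},
    "after": {"id": 1, "name": "Alice", "updated_at": 2},
}
noise = {
    "op": "u",
    "before": {"id": 1, "name": "Alice", "updated_at": 2},
    "after": {"id": 1, "name": "Alice", "updated_at": 3},
}
rename_events = to_business_events(rename)
noise_events = to_business_events(noise)
```

Because the input stream is already ordered by the database, the derived business events inherit that order.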

Improve Hasura Subscription Performance

We developed a web app that relies on real-time interaction between our users. We use Angular for the frontend and Hasura with GraphQL on Postgres as our backend.
What we noticed is that when more than 300 users are active at the same time, we experience severe performance losses.
Therefore, we want to improve our subscription setup. We think the possible issues are:
1. Too many subscriptions
2. Subscriptions that are too large and complex, with too many forks
Concerning 1: each user has approximately 5-10 subscriptions active when using the web app. Concerning 2: we have complex subscriptions that join up to 6 tables.
The solutions we think of:
Use more queries and limit subscriptions to fields that absolutely need to be real-time.
Split up complex queries/subscriptions into multiple smaller ones.
Are we missing another possible cause? What else can we use to improve the overall performance?
Thank you for your input!
Preface
OP's question is quite broad and impossible to answer in the general case.
So what I describe here reflects my experience with optimizing subscriptions; it's for OP to decide whether it reflects their situation.
Short description of system
Users of the system upload documents, extract information, prepare new documents and converse during the process (IM-like functionality); there are AI bots that try to reduce the burden of repetitive tasks, and services that exchange data with external systems.
There are a lot of entities and a lot of interaction between human and robot participants, plus quite complex authorization rules: visibility of data depends on organization, department and the content of documents.
What was on start
At first it was:
programmer wrote a graphql-query for whole data needed for application
changed query to subscription
finish
It was OK for the first 2-3 months, then:
queries became more complex and then even more complex
amount of subscriptions grew
UI became lagging
the DB instance was always near 100% load, even during nights and weekends, because somebody had not closed the application
First we optimized the queries themselves, but it did not suffice:
some things are rightfully costly: JOINs, existence predicates, and the data itself had grown significantly
network part: you can optimize the DB, but just transferring all the needed data has its cost
Optimization of subscriptions
Step I. Split subscriptions: subscribe for change date, query on change
Instead of one complex subscription for the whole data, split it into parts:
A. A subscription for a single field that indicates that an entity was changed
E.g.
Instead of:
subscription{
  document{
    id
    title
    # other fields
    pages{ # array relation
      ...
    }
    tasks{ # array relation
      ...
    }
    # multiple other array/object relations
    # pagination and ordering
  }
}
that returns thousands of rows.
Create a function that:
accepts hasura_session - so that results are individual per user
returns just one field: max_change_date
So it became:
subscription{
  doc_change_date{
    max_change_date
  }
}
Always one row and always one field
B. Change of application logic
Query whole data
Subscribe for doc_change_date
memorize value of max_change_date
if max_change_date changed - requery data
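The client-side logic of step B can be sketched without any GraphQL library. In the Python sketch below, ChangeDateWatcher and on_subscription_value are hypothetical names; run_query stands for whatever function executes the full (expensive) query, and on_subscription_value is assumed to be called for every value pushed by the doc_change_date subscription.

```python
class ChangeDateWatcher:
    """Requery the full data only when the cheap change marker moves."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._last_seen = None
        self.data = run_query()  # query the whole data once up front

    def on_subscription_value(self, max_change_date):
        """Returns True if the pushed value triggered a requery."""
        if self._last_seen is None:
            # First push after subscribing: just memorize the value.
            self._last_seen = max_change_date
            return False
        if max_change_date == self._last_seen:
            return False
        self._last_seen = max_change_date
        self.data = self._run_query()
        return True


# Simulated run: the subscription pushes five values, two of them new.
query_calls = []

def run_query():
    query_calls.append(1)
    return f"snapshot-{len(query_calls)}"

watcher = ChangeDateWatcher(run_query)
triggered = [watcher.on_subscription_value(v) for v in ["t1", "t1", "t2", "t2", "t3"]]
```

A change that lands between the initial query and the first pushed value would be missed by this version; requerying on the first push as well closes that gap at the price of one extra query.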
Notes
It's absolutely OK if subscription function sometimes returns false positives.
There is no need to replicate all predicates from source query to subscription function.
E.g.
In our case: visibility of data depends on organizations and departments (and even more).
So if a user of one department creates/modifies a document, this change is not visible to a user of another department.
But those changes happen about once or twice a minute per organization.
So for the subscription function we can ignore that granularity and calculate max_change_date for the whole organization.
It's beneficial to have a faster and cruder subscription function: it will trigger a refresh of data more frequently, but the overall cost will be lower.
Step II. Multiplex subscriptions
The first step is a crucial one.
And Hasura has subscription multiplexing: https://hasura.io/docs/latest/graphql/core/databases/postgres/subscriptions/execution-and-performance.html#subscription-multiplexing
So in theory Hasura could be smart enough and solve your problems.
But if you think "explicit is better than implicit", there is another step you can take.
In our case:
user(s) uploads documents
combines them in dossiers
creates new document types
converses with others
So the subscriptions become: doc_change_date, dossier_change_date, msg_change_date and so on.
But actually it could be beneficial to have just one subscription: "hey! there are changes for you!"
So instead of multiple subscriptions the application makes just one.
Note
We thought about 2 formats of multiplexed subscription:
A. Subscription returns just one field {max_change_date} that is accumulative for all entities
B. Subscription returns more granular result: {doc_change_date, dossier_change_date, msg_change_date}
Right now "A" works for us. But maybe we change to "B" in future.
Step III. What we would do differently with hasura 2.0
This is something we have not tried yet.
Hasura 2.0 allows registering VOLATILE functions for queries.
That allows creation of functions with memoization in DB:
you define a cache for function call presumably in a table
then on function call you first look in cache
if not exists: add values to cache
return result from cache
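The lookup order in the list above can be tried out with any SQL database. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Postgres; the table name fn_cache, the memoized helper and the per-organization key are hypothetical, and cache invalidation is deliberately left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fn_cache (arg TEXT PRIMARY KEY, result TEXT NOT NULL)")


def memoized(conn, arg, compute):
    """Look in the cache table first; compute and store only on a miss."""
    row = conn.execute(
        "SELECT result FROM fn_cache WHERE arg = ?", (arg,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO fn_cache (arg, result) VALUES (?, ?)", (arg, compute(arg))
        )
    # Always answer from the cache so hits and misses return the same thing.
    return conn.execute(
        "SELECT result FROM fn_cache WHERE arg = ?", (arg,)
    ).fetchone()[0]


compute_calls = []

def expensive(org_id):
    compute_calls.append(org_id)
    return f"computed-for-{org_id}"

first = memoized(conn, "org-1", expensive)
second = memoized(conn, "org-1", expensive)
```

The function writes to a table on a miss, which is why it cannot be declared STABLE honestly and why the VOLATILE registration matters.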
That allows further optimizations both for subscription functions and query functions.
Note
Actually it's possible to do that without waiting for Hasura 2.0, but it requires trickery on the PostgreSQL side:
you create a VOLATILE function that does the real work
and another function, defined as STABLE, that calls the VOLATILE function. This second function can be registered in Hasura
It works, but the trick is hard to recommend.
Who knows, maybe future PostgreSQL versions or updates will make it impossible.
Summary
That's everything that I can say on the topic right now.
I would have been glad to read something like this a year ago.
If somebody sees some pitfalls - please comment, I would be glad to hear opinions and maybe alternative ways.
I hope that this explanation will help somebody or at least provoke thought how to deal with subscriptions in other ways.

Lagom | Return Values from read side processor

We are using Lagom to develop our set of microservices. The trick here is that although we are using event sourcing and persisting events into Cassandra, we also have to store the data in a graph DB, since that is the one that will serve most of the queries in our use case.
As per Lagom's documentation, all insertion into the graph database (or any other database) has to be done in a ReadSideProcessor after the command handler has persisted the events into Cassandra, following the philosophy of CQRS.
Now here is the problem we are facing. We believe that the ReadSideProcessor is a listener which gets triggered after the events are generated and persisted. What we want is to return a response from the ReadSideProcessor back to the ServiceImpl. For example, when a user is added to the system, the unique id generated by the graph has to be returned as one of the response headers. How can that be achieved in Lagom, since the response is constructed in setCommandHandler and not in the ReadSideProcessor?
Also, we need to make sure that if any error occurs on the graph side, the API notifies the client that the request has failed, but exceptions occurring in the ReadSideProcessor are not propagated to either the PersistentEntity or the ServiceImpl class. How can that be achieved as well?
Any help is much appreciated.
The read side processor is not a listener that is attached to the command - it is actually completely disconnected from the persistent entity, it may be running on a different node, at a different time, perhaps even years in the future if you add a new read side processor that first comes up to speed with all the old events in history. If the read side processor were connected synchronously to the command, then it would not be CQRS, there would not be segregation between the command and the query side.
Read side processors essentially poll the database for new events, processing them as they detect them. You can add a new read side processor at any time, and it will get all events from all of history, not just the new ones that are added, this is one of the great things about event sourcing, you don't need to anticipate all your query needs from the start, you can add them as the query need comes.
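The polling-with-offset idea can be illustrated in a few lines. This is a plain Python sketch, not Lagom's ReadSideProcessor API; the class, its poll method and the event shape are hypothetical, and the event log is a simple list standing in for the journal.

```python
class UserNamesView:
    """A read side that tracks its own offset into a shared event log."""

    def __init__(self):
        self.offset = 0  # number of events already processed
        self.view = {}

    def handle(self, event):
        self.view[event["user_id"]] = event["name"]

    def poll(self, event_log):
        """Process every event past the stored offset, oldest first."""
        for event in event_log[self.offset:]:
            self.handle(event)
            self.offset += 1


event_log = [
    {"user_id": "u1", "name": "Ann"},
    {"user_id": "u2", "name": "Raj"},
]

early = UserNamesView()
early.poll(event_log)

# The write side keeps appending, independent of any read side.
event_log.append({"user_id": "u1", "name": "Anna"})

# A processor added later starts at offset 0 and replays all of history.
late = UserNamesView()
late.poll(event_log)
early.poll(event_log)
```

Nothing in the log knows about either processor, which is what makes adding the second one after the fact possible.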
To further explain why you don't want a connection between the two - what happens if the event persist succeeds, but the update on the graph db fails? Perhaps the graph db has crashed. Does the command have to retry? Does the event have to be deleted? What happens if the node doing the update itself crashes before it has an opportunity to fix the problem? Now your read side is in an inconsistent state from your entities. Connecting them leads to inconsistency in many failure scenarios - for example, like when you update your address with a utility company, but your bills still go to the old address, and you contact them, and they say "yes, your new address is updated in our system", but they still go to the old address - that's the sort of terrible user experience that you are signing your users up for if you try to connect your read side and write side together. Disconnecting allows Lagom to ensure consistency between the events you have emitted on the write side, and the consumption of them on the read side.
So to address your specific concerns: ID generation should be done on the write side, or, if a subsequent ID is generated on the read side, it should also provide a way of mapping the IDs on the write side to the read side ID. And as for handling errors on the read side - all validation should be done on the write side - the write side should ensure that it never emits an event that is invalid.
Now if the read side processor encounters something that is invalid, then it has two options. One option is it could fail. In many cases, this is a good option, since if something is invalid or inconsistent, then it's likely that either you have a bug or some form of corruption. What you don't want to do is continue processing as if everything is happy, since that might make the data corruption or inconsistency even worse. Instead the read side processor stops, your monitoring should then detect the error, and you can go in and work out either what the bug is or fix the corruption. Of course, there are downsides to doing this, your read side will start lagging behind the write side while it's unable to process new events. But that's also an advantage of CQRS - the write side is able to continue working, continue enforcing consistency, etc, the failure is just isolated to the read side, and only in updating the read side. Instead of your whole system going down and refusing to accept new requests due to this bug, it's isolated to just where the problem is.
The other option that the read side has is it can store the error somewhere - eg, store the event in a dead letter table, or raise some sort of trouble ticket, and then continue processing. This way, you can go and fix the event after the fact. This ensures greater availability, but does come at the risk that if that event that it failed to process was important to the processing of subsequent events, you've potentially just got yourself into a bigger mess.
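The two options can be compared side by side in a small sketch. This is plain Python rather than Lagom code; run_read_side, the on_error switch and the event shape are hypothetical, and a list stands in for the dead letter table.

```python
def run_read_side(events, apply, on_error="stop"):
    """Process events in order and report how far we got.

    on_error="stop":        halt at the first failing event, keep the offset
                            there so processing resumes once the bug is fixed.
    on_error="dead_letter": record the failing event and carry on.
    Returns (offset, dead_letters).
    """
    dead_letters = []
    offset = 0
    for event in events:
        try:
            apply(event)
        except Exception as exc:
            if on_error == "stop":
                break  # the read side now lags; the write side is unaffected
            dead_letters.append((event, repr(exc)))
        offset += 1
    return offset, dead_letters


events = [{"name": "a"}, {"name": "b"}, {}, {"name": "d"}]  # third one is invalid

graph_stop = []
stop_offset, stop_dead = run_read_side(
    events, lambda e: graph_stop.append(e["name"].upper()), on_error="stop"
)

graph_dl = []
dl_offset, dl_dead = run_read_side(
    events, lambda e: graph_dl.append(e["name"].upper()), on_error="dead_letter"
)
```

With "stop" the view is behind but never wrong; with "dead_letter" the view is current but has a hole that someone must repair later.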
Now this does introduce specific constraints on what you can and can't do, but I can't really anticipate those without specific knowledge of your use case to know how to address them. A common constraint is set validation - for example, how do you ensure that email addresses are unique to a single user in your system? Greg Young (the CQRS guy) wrote this blog post about those types of problems:
http://codebetter.com/gregyoung/2010/08/12/eventual-consistency-and-set-validation/

How to manage a user's game state using akka

I am trying to figure out how to manage a user's game state using Akka.
The game state will be persisted to mysql and this cannot change because we have other services that require this.
Anything that happens in a game is considered an "event".
Then I have "Levels" which someone can achieve. A level is achieved when you complete all the "events" associated with it.
So you have:
Level
- event1 e.g. reach a point in the game
- event2 e.g. pickup a sword
- event3 e.g. defeat a monster
So in a game there are many levels, and hundreds of events that are linked to levels.
So all "events" are sent via HTTP to my backend, and I save the event in the database.
I then have to load the user's game profile into memory and re-calculate the levels achieved, since a new event has happened.
Note: This calculation cannot be done at the database level because it is a little more complicated than what I am writing here.
The problem I see is that if I use akka, I can't have multiple actors processing the events for the same user, because the data can become stale.
Just to be clear, so when a new event arrives, I have to load the game profile in memory, loop through the levels and see if any of them have been achieved, if they have, update the database
e.g. update levels set achieved=true where level_id = 123 and user_id=234
e.g. actor1 loads the profile (all the levels and events for this user) and then processes the new event that just arrived in the inbox.
At the same time, actor2 loads the profile (same as actor1) and then processes the new event. When it persists the changes to mysql, the data will be out of sync.
If I were using threads, I would have to lock during the game profile calculation and while persisting to the db.
How can I do this using Akka and still handle things in parallel, or does this scenario not allow for it?
Let's think about how you would manage it without actors. In a nutshell, you have the following problem scenario:
two (or more) update requests arrive at the same time, both are going to modify the same data
both requests read some stable data state, then update it each in its own manner and persist to the DB
the modifications from the request which checked in first are lost, more precisely - overridden by the later request.
This is a classical problem. There are at least two classical solutions to it:
Optimistic locking
Pessimistic locking: it's usually achieved by applying Serializable isolation level for transactions.
It is worth reading this answer with a nice comparison of both worlds.
As you're using Akka, you most probably prefer better concurrency and occasional failures that are easy to recover from. That is in line with the Akka motto "let it crash".
So, you need to make the next steps:
Add version column to your table(s). It can be numeric or string (with hash). Numeric is the simplest one.
When you insert new record - initialize versions.
When you update the record - check version value has not changed. So, here's your update strategy:
Read record and its version.
Update record in memory.
Execute update query with criteria where rec_id=$id and version=$version.
If the updated record count is 1, you're good. If it is 0, throw an OptimisticLockException or something similar.
Finally, it's time for Akka to do its job: come up with an appropriate supervision strategy (I'd pick something like "try again in 1 second"). In the actor's preRestart method, return the update message to the actor's mailbox (see the Restart Hooks chapter in the Akka docs).
With this strategy, even if two requests try to update the same record at a time, one of them will fail and will be immediately processed again.
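The version-column steps can be exercised end to end with an in-memory database. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and a plain retry loop as a stand-in for the supervision strategy; the table layout, OptimisticLockError and the helper names are hypothetical, while the ids 234 and 123 are taken from the question.

```python
import sqlite3


class OptimisticLockError(Exception):
    pass


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE levels (user_id INTEGER, level_id INTEGER, "
    "achieved INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO levels VALUES (234, 123, 0, 0)")  # version starts at 0


def read_level(conn, user_id, level_id):
    """Read the record together with its version."""
    return conn.execute(
        "SELECT achieved, version FROM levels WHERE user_id = ? AND level_id = ?",
        (user_id, level_id),
    ).fetchone()


def save_level(conn, user_id, level_id, achieved, expected_version):
    """Update only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE levels SET achieved = ?, version = version + 1 "
        "WHERE user_id = ? AND level_id = ? AND version = ?",
        (achieved, user_id, level_id, expected_version),
    )
    if cur.rowcount != 1:
        raise OptimisticLockError("row was modified concurrently")


def update_with_retry(conn, user_id, level_id, compute, attempts=3):
    """Re-read and re-apply on conflict, like redelivering the message."""
    for _ in range(attempts):
        achieved, version = read_level(conn, user_id, level_id)
        try:
            save_level(conn, user_id, level_id, compute(achieved), version)
            return
        except OptimisticLockError:
            continue
    raise OptimisticLockError("gave up after repeated conflicts")


# Two actors read the same version, then both try to write.
seen_by_actor1 = read_level(conn, 234, 123)
seen_by_actor2 = read_level(conn, 234, 123)
save_level(conn, 234, 123, 1, seen_by_actor1[1])
try:
    save_level(conn, 234, 123, 1, seen_by_actor2[1])
    stale_write_rejected = False
except OptimisticLockError:
    stale_write_rejected = True

update_with_retry(conn, 234, 123, lambda achieved: 1)
final_row = read_level(conn, 234, 123)
```

The important part is that the retry re-reads the record before recomputing, so the second attempt works from the data the first writer left behind.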

CQRS + Event Sourcing: (is it correct that) Commands are generally communicated point-to-point, while Domain Events are communicated through pub/sub?

Didn't know how to shorten that title.
I'm basically trying to wrap my head around the concept of CQRS (http://en.wikipedia.org/wiki/Command-query_separation) and related concepts.
Although CQRS doesn't necessarily incorporate Messaging and Event Sourcing it seems to be a good combination (as can be seen with a lot of examples / blogposts combining these concepts )
Given a use-case for a state change for something (say to update a Question on SO), would you consider the following flow to be correct (as in best practice) ?
The system issues an aggregate UpdateQuestionCommand which might be separated into a couple of smaller commands: UpdateQuestion, which is targeted at the Question Aggregate Root, and UpdateUserAction (to count points, etc.), targeted at the User Aggregate Root. These are sent asynchronously using point-to-point messaging.
The aggregate roots do their thing and if all goes well fire events QuestionUpdated and UserActionUpdated respectively, which contain state that is outsourced to an Event Store.. to be persisted yadayada, just to be complete, not really the point here.
These events are also put on a pub/sub queue for broadcasting. Any subscriber (among which likely one or multiple Projectors which create the Read Views) are free to subscribe to these events.
The general question: Is it indeed best practice that Commands are communicated point-to-point (i.e. the receiver is known) whereas events are broadcast (i.e. the receivers are unknown)?
Assuming the above, what would be the advantages/disadvantages of allowing Commands to be broadcast through pub/sub instead of point-to-point?
For example: broadcasting Commands while using Sagas (http://blog.jonathanoliver.com/2010/09/cqrs-sagas-with-event-sourcing-part-i-of-ii/) could be a problem, since the mediation role a Saga needs to play when one of the aggregate roots fails is hindered, because the saga doesn't know which aggregate roots participate to begin with.
On the other hand, I see advantages (flexibility) when broadcasting commands would be allowed.
Any help in clearing my head is highly appreciated.
Yes, for a Command or Query there is one and exactly one receiver (so you can still load balance), but for Events there can be zero or more receivers (subscribers).
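The difference in cardinality is easy to see in code. The following is a minimal in-process sketch in Python, not a real messaging library: MessageBus and its methods are hypothetical, and the command and event names are borrowed from the question.

```python
class MessageBus:
    def __init__(self):
        self._command_handlers = {}  # command type -> exactly one handler
        self._subscribers = {}       # event type -> zero or more subscribers

    def register_handler(self, command_type, handler):
        if command_type in self._command_handlers:
            raise ValueError(f"{command_type} already has a handler")
        self._command_handlers[command_type] = handler

    def subscribe(self, event_type, subscriber):
        self._subscribers.setdefault(event_type, []).append(subscriber)

    def send(self, command_type, payload):
        """Point-to-point: fails loudly if nobody handles the command."""
        handler = self._command_handlers[command_type]
        return handler(payload)

    def publish(self, event_type, payload):
        """Broadcast: fine with no subscribers; returns how many were notified."""
        subscribers = self._subscribers.get(event_type, [])
        for subscriber in subscribers:
            subscriber(payload)
        return len(subscribers)


bus = MessageBus()
read_view, audit_log = {}, []

def handle_update_question(cmd):
    # The aggregate does its work, then announces what happened.
    return bus.publish("QuestionUpdated", cmd)

bus.register_handler("UpdateQuestion", handle_update_question)
bus.subscribe("QuestionUpdated", lambda e: read_view.update({e["id"]: e["title"]}))
bus.subscribe("QuestionUpdated", lambda e: audit_log.append(e["id"]))

notified = bus.send("UpdateQuestion", {"id": 7, "title": "CQRS?"})
unheard = bus.publish("UserActionUpdated", {"user": 1})

try:
    bus.register_handler("UpdateQuestion", lambda cmd: None)
    second_handler_allowed = True
except ValueError:
    second_handler_allowed = False
```

The sender of a command gets a result or an error from its single receiver, whereas the publisher of an event neither knows nor cares who is listening.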