Axon Framework - is it possible to have an aggregate handle commands from multiple sagas?

I'd like to use one aggregate to handle commands from multiple sagas. Unfortunately, if a saga sends a command while the aggregate is busy handling another command, the command is lost with an AggregateNotFoundException written to the log.
I can use one aggregate per saga, but I'd like to know if it is possible with one aggregate for all sagas.

In Axon, a Command Handler isn't interested in the source of the command. Therefore, it doesn't matter whether multiple Sagas send commands, or if there is just one source.
I think the problem here is more related to a race condition. If a Command results in an AggregateNotFoundException, it means that the Command creating the Aggregate hasn't been processed yet.
Most likely, there is an issue in the model/design that causes these race conditions to appear. However, to be able to judge that, I would need much more information about your design and what you're trying to achieve with it.
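For what it's worth, the aggregate itself stays completely unaware of who dispatched a command. A minimal sketch, assuming Axon 4 with made-up Ticket* commands and events (registration with the command bus / Spring configuration omitted):

```java
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.modelling.command.TargetAggregateIdentifier;

public class TicketAggregate {

    // Hypothetical commands: the @TargetAggregateIdentifier field is what routes
    // a command (no matter which saga sent it) to the right aggregate instance.
    public static class CreateTicketCommand {
        @TargetAggregateIdentifier
        public final String ticketId;
        public CreateTicketCommand(String ticketId) { this.ticketId = ticketId; }
    }

    public static class AssignTicketCommand {
        @TargetAggregateIdentifier
        public final String ticketId;
        public final String assignee;
        public AssignTicketCommand(String ticketId, String assignee) {
            this.ticketId = ticketId;
            this.assignee = assignee;
        }
    }

    // Hypothetical events.
    public static class TicketCreatedEvent {
        public final String ticketId;
        public TicketCreatedEvent(String ticketId) { this.ticketId = ticketId; }
    }

    public static class TicketAssignedEvent {
        public final String ticketId;
        public final String assignee;
        public TicketAssignedEvent(String ticketId, String assignee) {
            this.ticketId = ticketId;
            this.assignee = assignee;
        }
    }

    @AggregateIdentifier
    private String ticketId;

    protected TicketAggregate() {
        // no-arg constructor required by Axon for event-sourced reconstruction
    }

    // Creation handler: until the event applied here is stored, any other command
    // targeting this ticketId will fail with AggregateNotFoundException.
    @CommandHandler
    public TicketAggregate(CreateTicketCommand cmd) {
        AggregateLifecycle.apply(new TicketCreatedEvent(cmd.ticketId));
    }

    // Handled identically whether dispatched by saga A, saga B, or a controller.
    @CommandHandler
    public void handle(AssignTicketCommand cmd) {
        AggregateLifecycle.apply(new TicketAssignedEvent(ticketId, cmd.assignee));
    }

    @EventSourcingHandler
    public void on(TicketCreatedEvent evt) {
        this.ticketId = evt.ticketId;
    }
}
```

The relevant part for the race condition is the creation handler: the event it applies must be stored before any saga dispatches a follow-up command for the same ticketId.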

Related

Is defining a window mandatory for an unbounded PCollection?

New to Beam/Flink and would appreciate assistance with this issue.
I have a pipeline that reads Avro messages from Kafka, does some object transformation, and writes back to Kafka. I did not define any window, since currently we would like to handle each event separately with no aggregation.
I wonder if this is correct. From what I understand from the docs, it seems we cannot use the default behaviour and must define some kind of window and relevant triggers.
Is my understanding correct?
Thanks
If you do not specify a windowing strategy for a pipeline, it will automatically use the global window. Since you are not doing any aggregations, this is fine.
https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/windowing.ipynb says:
All pipelines use the GlobalWindow by default. This is a single window that covers the entire PCollection.
In many cases, especially for batch pipelines, this is what we want since we want to analyze all the data that we have.
ℹ️ GlobalWindow is not very useful in a streaming pipeline unless you only need element-wise transforms. Aggregations, like GroupByKey and Combine, need to process the entire window, but a streaming pipeline has no end, so they would never finish.
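To illustrate, an element-wise Kafka-to-Kafka job needs no explicit Window.into(...) at all; it simply runs in the default GlobalWindow. A rough Java sketch (broker address, topic names, String serializers and the map function are placeholders; the question's actual pipeline would use Avro coders instead):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ElementWisePipeline {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();

        p.apply("ReadFromKafka", KafkaIO.<String, String>read()
                .withBootstrapServers("kafka:9092")            // placeholder
                .withTopic("input-topic")                      // placeholder
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                            // PCollection<KV<String, String>>
         // Element-wise transform: no GroupByKey/Combine, so the default
         // GlobalWindow never has to "close" and the streaming job can run forever.
         .apply("Transform", MapElements
                .into(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.strings()))
                .via((KV<String, String> kv) -> KV.of(kv.getKey(), kv.getValue().toUpperCase())))
         .apply("WriteToKafka", KafkaIO.<String, String>write()
                .withBootstrapServers("kafka:9092")            // placeholder
                .withTopic("output-topic")                     // placeholder
                .withKeySerializer(StringSerializer.class)
                .withValueSerializer(StringSerializer.class));

        p.run().waitUntilFinish();
    }
}
```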

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are viable tools to use in something like a ticketing system where users may be attempting to read/write to the same collection/document, and whoever made the request first will be handled first, whoever made it second will be handled second, etc.
If not what would be a good structure for such a need with firestore?
Transactions just guarantee an atomic, consistent update among the documents involved in the transaction. They don't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve (please read that link to get a sense of that). There is no simple database structure that will impose an order on changes as they are made. I suggest you think carefully about your need for ordering, and come up with a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.
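A rough sketch of that timestamp approach with the Firestore Java client (the collection and field names here are made up):

```java
import com.google.cloud.firestore.FieldValue;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;
import com.google.cloud.firestore.QueryDocumentSnapshot;

import java.util.HashMap;
import java.util.Map;

public class TicketRequests {

    private final Firestore db = FirestoreOptions.getDefaultInstance().getService();

    // When a request comes in, record it with a server-side timestamp.
    public void recordRequest(String userId) throws Exception {
        Map<String, Object> data = new HashMap<>();
        data.put("userId", userId);
        data.put("requestedAt", FieldValue.serverTimestamp());
        db.collection("ticketRequests").add(data).get();
    }

    // A backend job can periodically process pending requests in timestamp order.
    public void processPending() throws Exception {
        for (QueryDocumentSnapshot doc :
                db.collection("ticketRequests")
                  .orderBy("requestedAt")
                  .limit(100)
                  .get()    // ApiFuture<QuerySnapshot>
                  .get()    // block for the result
                  .getDocuments()) {
            // handle the request here, then mark or delete the document
        }
    }
}
```

This gives a best-effort ordering by server timestamp; it still does not make the original transactions themselves execute in request order.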

Nested queries with MediatR/CQRS

I'm just wondering if the design I will be trying to implement is valid CQRS.
I'm going to have a query handler that itself will send more queries to other sub-handlers. Its main task is going to be aggregating results from multiple services.
Is it OK to send queries from within handlers? I can already think of 3-level-deep hierarchies of these in my application.
No, MediatR is designed for a single level of requests and handlers. A better design may be to create a service/manager of some kind which invokes multiple, isolated queries using MediatR and aggregates the results. The implementation may be similar to what you have in mind, except that it's not a request handler itself but rather an aggregation of multiple request handlers.
This will badly affect the system's resilience and compute time, and it will increase coupling.
If any of the sub-handlers fails then the entire handler fails. If the queries are sent synchronously, the total compute time is the sum of the individual query times.
One way to reuse the sub-handlers is to query them in the background, outside the client's request, if possible. That way, when a client request comes in you already have the data locally, which improves both resilience and response time. You are left only with the coupling, but that can be worth it if the benefit of the reuse outweighs the coupling.
I don't know how much of this is possible with MediatR; these are just general principles of system architecture.
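MediatR itself is a .NET library, so the following is only a language-agnostic sketch (written in Java with made-up query interfaces) of the "plain aggregator service in front of isolated handlers" shape suggested above; issuing the sub-queries concurrently also addresses the compute-time concern:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Made-up stand-ins for the isolated query handlers (in MediatR these would be
// separate IRequestHandler implementations invoked through the mediator).
interface CustomerQueries { CompletableFuture<Customer> byId(String customerId); }
interface OrderQueries { CompletableFuture<List<Order>> forCustomer(String customerId); }

record Customer(String id, String name) {}
record Order(String id, double total) {}
record CustomerDashboard(Customer customer, List<Order> orders) {}

// The aggregator is a plain service, not a request handler itself:
// it fans out to the isolated queries and merges their results.
class CustomerDashboardService {
    private final CustomerQueries customers;
    private final OrderQueries orders;

    CustomerDashboardService(CustomerQueries customers, OrderQueries orders) {
        this.customers = customers;
        this.orders = orders;
    }

    CompletableFuture<CustomerDashboard> load(String customerId) {
        // Run the sub-queries concurrently so the total latency is roughly
        // max(query times) rather than their sum.
        CompletableFuture<Customer> customer = customers.byId(customerId);
        CompletableFuture<List<Order>> orderList = orders.forCustomer(customerId);
        return customer.thenCombine(orderList, CustomerDashboard::new);
    }
}
```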

CQRS, Event Sourcing and Scaling

It's clear that a system based on these patterns is easily scalable. But I would like to ask you: how, exactly? I have a few questions regarding scalability:
How to scale aggregates? If I create multiple instances of aggregate A, how do I sync them? If one of the instances processes a command and creates an event, should this event be propagated to every instance of that aggregate?
Shouldn't there be some business logic that decides which instance of the aggregate to request? So if I am issuing multiple commands which apply to aggregate A (ORDERS) and to one specific order, it makes sense to deliver them to the same instance. Or?
In this article: https://initiate.andela.com/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8,
they are using Kafka with partitioning. So the user management service aggregate is scaled, but each instance is subscribed only to a specific partition of the topic, which contains all events of a particular user.
Thanks!
How to scale aggregates?
Choose aggregates carefully and make sure your commands are spread reasonably among many aggregates. You don't want an aggregate that is likely to receive a high number of commands from concurrent users.
Serialize commands sent to an aggregate instance. This can be done with an aggregate repository and a command bus/queue, but for me the simplest way is optimistic locking with aggregate versioning, as described in this post by Michiel Rook.
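A minimal sketch of that optimistic-locking idea, assuming a hypothetical event store whose append checks an expected stream version:

```java
import java.util.List;

// Hypothetical event store contract: append fails if the stream has moved on.
interface EventStore {
    List<Object> load(String aggregateId);   // all previously stored events
    void append(String aggregateId, long expectedVersion, List<Object> newEvents)
            throws ConcurrencyException;     // thrown if another command was stored first
}

class ConcurrencyException extends RuntimeException {}

class OrderCommandHandler {
    private final EventStore store;

    OrderCommandHandler(EventStore store) { this.store = store; }

    void handle(String aggregateId, Object command) {
        List<Object> history = store.load(aggregateId);
        long expectedVersion = history.size();          // version = number of stored events
        List<Object> newEvents = decide(history, command);
        // If a concurrent command for the same aggregate won the race, this throws
        // ConcurrencyException and the command can simply be retried from the top.
        store.append(aggregateId, expectedVersion, newEvents);
    }

    private List<Object> decide(List<Object> history, Object command) {
        // rebuild state from history, validate the command, emit new events
        return List.of();
    }
}
```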
which instance of the agregate to request?
In our reSolve framework we create an instance of the aggregate on every command and don't keep it between requests. This works surprisingly fast: it is faster to fetch 100 events and reduce them to the aggregate state than to find the right aggregate instance in a cluster.
This approach is scalable and lets you go serverless: one lambda invocation per command and no shared state in between. The rare cases where an aggregate has too many events are handled by snapshots.
How to scale aggregates?
The Aggregate instances are represented by their stream of events. Every Aggregate instance has its own stream of events. Events from one Aggregate instance are NOT used by other Aggregate instances. For example, if Order Aggregate with ID=1 creates an OrderWasCreated event with ID=1001, that Event will NEVER be used to rehydrate other Order Aggregate instances (with ID=2,3,4...).
That being said, you scale the Aggregates horizontally by creating shards on the Event store based on the Aggregate ID.
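For example, the shard can be derived purely from the Aggregate ID, so one instance's stream never spans storage nodes (the hash and shard count here are arbitrary):

```java
// Hypothetical routing helper: every event of a given aggregate instance
// lands on the same shard of the event store.
static int shardFor(String aggregateId, int shardCount) {
    return Math.floorMod(aggregateId.hashCode(), shardCount);
}
```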
If I create multiple instances of aggregate A, how do I sync them? If one of the instances processes a command and creates an event, should this event be propagated to every instance of that aggregate?
You don't. Each Aggregate instance is completely separated from other instances.
In order to be able to scale horizontally the processing of commands, it is recommended to load each time an Aggregate instance from the Event store, by replaying all its previously generated events. There is one optimization that you can do to boost performance: Aggregate snapshots, but it is recommended to do it only if it's really needed. This answer could help.
Shouldn't there be some business logic that decides which instance of the aggregate to request? So if I am issuing multiple commands which apply to aggregate A (ORDERS) and to one specific order, it makes sense to deliver them to the same instance. Or?
You assume that the Aggregate instances are running continuously in some servers' RAM. You could do that, but such an architecture is very complex. For example, what happens when one of the servers goes down and must be replaced by another? It's hard to determine which instances were living there and to restart them. Instead, you could have many stateless servers that can handle commands for any of the aggregate instances. When a command arrives, you identify the Aggregate ID, you load the instance from the Event store by replaying all its previous events, and then it can execute the command. After the command is executed and the new events are persisted to the Event store, you can discard the Aggregate instance. The next command that arrives for the same Aggregate instance could be handled by any other stateless server. So, scalability is dictated only by the scalability of the Event store itself.
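A rough sketch of that rehydrate-execute-discard cycle, with a made-up Order aggregate and plain Object events:

```java
import java.util.List;

// Hypothetical order aggregate rebuilt from its own event stream on every command.
class OrderAggregate {
    private String orderId;
    private boolean shipped;

    // Rehydration: left-fold the previously stored events into the current state.
    static OrderAggregate replay(List<Object> events) {
        OrderAggregate order = new OrderAggregate();
        for (Object event : events) {
            order.apply(event);
        }
        return order;
    }

    private void apply(Object event) {
        if (event instanceof OrderCreated e) {
            this.orderId = e.orderId();
        } else if (event instanceof OrderShipped) {
            this.shipped = true;
        }
    }

    // Command execution only returns new events; it never touches other instances.
    List<Object> ship() {
        if (shipped) {
            throw new IllegalStateException("Order " + orderId + " already shipped");
        }
        return List.of(new OrderShipped(orderId));
    }

    record OrderCreated(String orderId) {}
    record OrderShipped(String orderId) {}
}
```

A stateless server would call OrderAggregate.replay(eventsForThisId), invoke ship(), append the returned events to the Event store, and then simply drop the instance.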
How to scale aggregates?
Each piece of information in the system has a single logical authority. Multiple authorities for a single piece of data get you contention. You scale the writes by creating smaller, non-overlapping boundaries: each authority has a smaller area of responsibility.
To borrow from your example, an example of smaller responsibilities would be to shift from one aggregate for all ORDERS to one aggregate for each ORDER.
It's analogous to the difference between having a key-value store with all ORDERS stored in a document under one key, vs each ORDER being stored under its own key.
Reads are safe, you can scale them out with multiple copies. Those copies are only eventually consistent, however. This means that if you ask "what is the bid price of FCOJ now?" you may get different answers from each copy. Alternatively, if you ask "what was the bid price of FCOJ at 10:09:02?" then each copy will either give you a single answer or say "I don't know yet".
But if the granularity is already one command per aggregate, which in my opinion is often not possible, and you still have many concurrent accesses, how do you solve it? How do you spread the load and stay conflict-free as much as possible?
Rough sketch: each aggregate is stored under a key that can be computed from the contents of the command message. An update to the aggregate is achieved by a compare-and-swap operation using that key (see the code sketch after the steps).
Acquire a message
Compute the storage key
Load a versioned representation from storage
Compute a new versioned representation
Compare-and-swap the new representation for the old in storage
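In code, that loop could look roughly like this (the versioned storage interface and the key derivation are made up):

```java
import java.util.Optional;

// Hypothetical versioned key-value storage with an atomic compare-and-swap.
interface VersionedStore {
    Optional<Versioned> load(String key);
    boolean compareAndSwap(String key, long expectedVersion, byte[] newValue);
}

record Versioned(long version, byte[] value) {}

class MessageProcessor {
    private final VersionedStore store;

    MessageProcessor(VersionedStore store) { this.store = store; }

    void process(byte[] message) {
        // 1. Compute the storage key from the message contents.
        String key = storageKeyFor(message);
        while (true) {
            // 2. Load the current versioned representation (or an empty one).
            Versioned current = store.load(key).orElse(new Versioned(0, new byte[0]));
            // 3. Compute the new representation.
            byte[] next = applyMessage(current.value(), message);
            // 4. Compare-and-swap; retry if someone else updated the key first.
            if (store.compareAndSwap(key, current.version(), next)) {
                return;
            }
        }
    }

    private String storageKeyFor(byte[] message) {
        return "order-" + message.length;  // placeholder: derive from the aggregate id in the message
    }

    private byte[] applyMessage(byte[] state, byte[] message) {
        return state;                      // placeholder: real domain logic goes here
    }
}
```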
To provide additional traffic throughput, you add more stateless compute.
To provide storage throughput, you distribute the keys across more storage appliances.
A routing layer can be used to group messages together: the router uses the same storage key calculation as before, but uses it to choose where in the compute farm to forward the message. The compute can then check each batch of messages it receives for duplicate keys, and process those messages together (trading some extra compute to reduce the number of compare-and-swaps).
Sane message protocols are important; see Marc de Graauw's Nobody Needs Reliable Messaging.

Performance of repeatedly executing the same javascript on MongoDb?

I'm looking at using some JavaScript in a MongoDb query. I have a couple of choices:
db.system.js.save the function in the db then execute it
db.myCollection.find with a $where clause and send the JS each time
exec_js in MongoEngine (which I imagine uses one of the above)
I plan to use the JavaScript in a regularly used query that's executed as part of a request to a site or API (i.e. not a batch administrative job), so it's important that the query executes with reasonable speed.
I'm looking at a 30ish line function.
Is the Javascript interpreted fresh each time? Will the performance be ok? Is it a sensible basis upon which to build queries?
Is the Javascript interpreted fresh each time?
Pretty much. MongoDB only has one "javascript instance" per running instance of MongoDB. You'll notice this if you try to run two different Map/Reduces at the same time.
Will the performance be ok?
Obviously, there are different definitions of "OK" here. The $where clause cannot use indexes, but you can combine it with another, indexed query. In either case each candidate object will need to be pushed from BSON over to the JavaScript run-time and then acted on inside the run-time.
The process is definitely not what you would call "performant". Of course, by that measure Map/Reduce is also not very performant and people use that on production systems.
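For example, with the Java driver you can keep the selective part of the query as a normal, index-friendly filter and use $where only for the predicate that genuinely needs JavaScript (database, collection, field names and the expression are made up):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class WhereQueryExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("shop").getCollection("orders");

            // The eq() part can use an index on "status"; only the candidate
            // documents it selects are handed to the JavaScript engine for $where.
            for (Document doc : orders.find(Filters.and(
                    Filters.eq("status", "open"),
                    Filters.where("this.total > this.creditLimit * 0.9")))) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```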
Is it a sensible basis upon which to build queries?
The real barrier here isn't the number of lines in the code, it's the number of possible documents this code will interpret. Even though it's "server-side" javascript, it's still a bunch of work that the server has to do. (in one thread, in an interpreted environment)
If you can test it and scope it correctly, it may well work out. Just don't expect miracles.
What is your point here? Write a JS script and call it regularly through cron. What would be the problem with that?