Automatically scale Axon's tracking event processors - cqrs

I am using Axon framework 4.0.3 with Spring Boot to have event sourcing, and have one tracking processor which is configured to have multiple segments/threads to process events concurrently:
axon.eventhandling.processors[my_processor].initial-segment-count = 6
axon.eventhandling.processors[my_processor].thread-count = 3
It is meant to have 2 nodes of my_processor using 3 threads each.
However, the problem with this solution is that it's not scalable. I have to know from the very beginning how many nodes and threads I must have, as it's not possible to change it later: if I increase the initial-segment-count and restart the processor, nothing happens. Even worse if I decrease the segment count: events that were meant for the "removed" segments are never processed!
Ideally it should be able to specify just the number of threads that each node should use. After that, when new nodes are added to the processor, the number of segments should scale up accordingly. Similarly if I remove nodes, the number of segments should scale down. Is this possible with Axon, or is it not designed to be scaled this way at all?

This assumption from your part is completely right - just adjusting these fields is not scalable, at runtime.
This is exactly why we have introduced the Split and Merge operations, to split/merge segments at runtime of your Axon application. See this GitHub pull request for it's introduction into the framework.
This feature will be part of Axon 4.1, which will be released today.
Do note that if you're only using the framework, this feature does not give you the automatic scaling. It will require implementation from your part, leveraging the provided split and merge API, to make it automatic.
Axon Server on the other hand provides you with a split/merge button in the UI, thus relieving you of the necessity to build this yourself.
I am fairly certain Axon Server will also introduce an auto scaling solution eventually, but not as part of release 4.1.
Hope this gives you some background Archie!

Related

What is meant by Distributed System?

I am reading about distributed systems and getting confused with what is really means?
I understand on high level, it means that set of different machines that work together to achieve a single goal.
But this definition seems too broad and loose. I would like to give some points to explain the reasons for my confusion:
I see lot of people referring the micro-services as distributed system where the functionalities like Order, Payment etc are distributed in different services, where as some other refer to multiple instances of Order service which possibly trying to serve customers and possibly use some consensus algorithm to come to consensus on shared state (eg. current Inventory level).
When talking about distributed database, I see lot of people talk about different nodes which possibly use to store/serve a part of user request like records with primary key from 'A-C' in first node 'D-F' in second node etc. On high level it looks like sharding.
When talking about distributed rate limiting. Some refer to multiple application nodes (so called distributed application nodes) using a single rate limiter, some other mention that the rate limiter itself has multiple nodes with a shared cache (like redis).
It feels that people use distributed systems to mention about microservices architecture, horizontal scaling, partitioning (sharding) and anything in between.
I am reading about distributed systems and getting confused with what is really means?
As commented by #ReinhardMänner, the good general term definition of distributed system (DS) is at https://en.wikipedia.org/wiki/Distributed_computing
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. The components interact with one another in order to achieve a common goal.
Anything that fits above definition can be referred as DS. All mentioned examples such as micro-services, distributed databases, etc. are specific applications of the concept or implementation details.
The statement "X being a distributed system" does not inherently imply any of such details and for each DS must be explicitly specified, eg. distributed database does not necessarily meaning usage of sharding.
I'll also draw from Wikipedia, but I think that the second part of the quote is more important:
A distributed system is a system whose components are located on
different networked computers, which communicate and coordinate their
actions by passing messages to one another from any system. The
components interact with one another in order to achieve a common
goal. Three significant challenges of distributed systems are:
maintaining concurrency of components, overcoming the lack of a global clock, and managing the independent failure of components. When
a component of one system fails, the entire system does not fail.
A system that constantly has to overcome these problems, even if all services are on the same node, or if they communicate via pipes/streams/files, is effectively a distributed system.
Now, trying to clear up your confusion:
Horizontal scaling was there with monoliths before microservices. Horizontal scaling is basically achieved by division of compute resources.
Division of compute requires dealing with synchronization, node failure, multiple clocks. But that is still cheaper than scaling vertically. That's where you might turn to consensus by implementing consensus in the application, or using a dedicated service e.g. Zookeeper, or abusing a DB table for that purpose.
Monoliths present 2 problems that microservices solve: address-space dependency (i.e. someone's component may crash the whole process and thus your component) and long startup times.
While microservices solve these problems, these problems aren't what makes them into a "distributed system". It doesn't matter if the different processes/nodes run the same software (monolith) or not (microservices), it matters that they are different processes that can't easily communicate directly (e.g. via function calls that promise not to fail).
In databases, scaling horizontally is also cheaper than scaling vertically, The two components of horizontal DB scaling are division of compute - effectively, a distributed system - and division of storage - sharding - as you mentioned, e.g. A-C, D-F etc..
Sharding of storage does not define distributed systems - a single compute node can handle multiple storage nodes. It's just that it's much more useful for a database that divides compute to also shard its storage, so you often see them together.
Distributed rate limiting falls under "maintaining concurrency of components". If every node does its own rate limiting, and they don't communicate, then the system-wide rate cannot be enforced. If they wait for each other to coordinate enforcement, they aren't concurrent.
Usually the solution is "approximate" rate limiting where components synchronize "occasionally".
If your components can't easily (= no latency) agree on a global rate limit, that's usually because they can't easily agree on a global anything. In that case, you're effectively dealing with a distributed system, even if all components just threads in the same process.
(that could happen e.g. if you plan to scale out but haven't done so yet, so you don't allow your threads to communicate directly.)

Real-time processing: Storm / flink vs standard application (java, c#...)

I am wondering about the choice of implementing an application processing events coming from Kafka, I have in mind two architecture patterns:
an application developed using the Apache Storm or Apache Flink framework that would process events consumed from Kafka
a Java application (or python, C#...), deployed X times (scalable depending on traffic), which would process events coming from Kafka
I find it difficult to see which of the scenarios is the most interesting.
Someone could help me on this topic ?
It's hard to give some definitive advice with so little information available. So I leave my response vague until you provide more specific information:
Choosing a processing framework over a native implementation gives you the following advantages:
Parallel processing with (in theory) infinite scalability: If you ever expect that you cannot process all events in a single thread in a timely manner, you first need to scale up (more threads) and eventually scale out (more machines). A frameworks takes care of all synchronization between threads and machines, so you just need to write sequential code glued together with some high-level primitives (similar to LINQ in C#).
Fault tolerance: What happens when your code screws up (some edge case not implemented)? When you run out of resources? When network (to Kinesis or other machines) temporarily breaks? A framework takes care of all these nasty little details.
In case of failure, when you restart application, most frameworks give you some form of exactly once processing: How do you avoid losing data? How do you avoid duplicates when reprocessing old data?
Managed state: If your application needs to remember things for a certain time (calculating sums/average or joining data), how do you ensure that the state is kept in sync with data in case of failure?
Advanced features: time triggers, complex event processing (=pattern matching on events), writing to different sinks (Kafka for low latency, s3 for batch processing)
Flexibility of storage: if you want to try out a different storage system, it's much easier to change source/sink in an application writing in a framework.
Integration in deployment platforms: If you want to scale to several machines, it's usually much easier to scale a platform that already offers related integration (at the time of writing that should be mostly Kubernetes). But all frameworks also support simple local setups where you just scale-up on one (bigger) machine.
Low-level optimizations: When using new engines with higher abstractions, it's possible that the frameworks generate code that is much more efficient than what you can implement yourself (with specific memory layout or serialized data processing).
The big downsides are usually:
Complexity of the framework: you need to understand how the framework works from a user's perspective. However, you usually save time by not going into the details of writing a custom consumer/producer, so it's not as bad as it initially seems.
Flexibility in code: you cannot write arbitrary code anymore. Since the framework handles parallelism for you, you need to think in terms of chunks of data and adjust your algorithms accordingly. Standard SQL operations are usually directly supported though in one form or another.
Less control over resource usage: since the platform schedules the task across machines, you may end up with unfortunate assignments and the platform may give you too little options to fix it. Note that most applications are more intrinsically bound to bad resource utilization because of data skew and suboptimal algorithms though.

How to run a Kafka Canary Consumer

We have a Kafka queue with two consumers, both read from the same partition (fan-out scenario). One of those consumers should be the canary and process 1% of the messages, while the other processes the 99% remaining ones.
The idea is to make the decision based on a property of the message, eg the message ID or timestamp (e.g. mod 100), and accept or drop based on that, just with a reversed logic for canary and non-canary.
Now we are facing the issue of how to do so robustly, e.g. reconfigure percentages while running and avoid loosing messages or processing them twice. It appears this escalates to a distributed consensus problem to keep the decision logic in sync, which we would very much like to avoid, even though we could just use ZooKeeper for that.
Is this a viable strategy, or are there better ways to do this? Possibly one that avoids consensus?
Update: Unfortunately the Kafka Cluster is not under our control, and we cannot make any changes.
Update 2 Latency of messages is not a huge issues, a few hundred 100ms added are okay and won't be noticed.
I dont see any way to change the "sampling strategy" across 2 machines without "ignoring" or double-processing records. Since different Kafka consumers could be in different positions in the partition, and could also get the new config at different times, you'd inevitably run into one of 2 scenarios:
Double processing of the same record by both machines
"Skipping" a record because neither machine thinks it should "own" it when it sees it.
I'd suggest a small change to your architecture instead:
Have the 99% machine (the non-canary) pick up all records, then decide for every record if it wants to handle it, or if it belongs to the canary
If it belongs to the canary, send the record to a 2nd topic (from the 99% machine)
Canary machine only listens on the 2nd topic, and processes every arriving record
And now you have a pipeline setup where decisions are only ever made in one point and no records are missed or double processed.
The obvious downside is somewhat higher latency on the canary machine. If you absolutely cannot tolerate the latency push the decision of which topic to produce to upstream to producers? (I don't know how feasible that is to you)
Variant in case a 2nd topic isnt allowed
If (as youve stated above) you cant have a 2nd topic, you could still make the decision only on the 99% machine, then for records that need to go to the canary, re-produce them into the origin partition with some sort of "marker" (either in the payload or as a kafka header, up to you).
The 99% machine will ignore any incoming records with a marker, and the canary machine will only process records with a marker.
Again, the major downside is added latency.

High availability options for Drools Fusion?

I have been digging and it seems:
1) There is no native/built-in failover solution for Drools Fusion 6
2) There is support for persistent sessions but it appears they are limited to save all/retrieve all, e.g. no ability to efficiently add and remove single events like hibernate would add/remove a single record from a DB. This would be expensive for a large, long running data set (STREAM mode)
3) Persistent sessions is a partial solution and I am unclear how we would even operate a cold/warm/hot standby
On the other hand Storm and Trident handle all aspects of failover but have limited support for CEP, I am debating using a custom solution with storm and storm tick tuples, but hate to reinvent the wheel.
I think in Storm Trident the state has to be relatively simple so it can fit into a key-value(s) pair, and the value cannot be too large. Such as a count or sum or some simple aggregation per key. Most people seem to use some time-based key and total up stuff with Trident. If there is complex state and multiple keys Storm Trident seems to falls down and cannot guarantee fully consistency between all states. Complex event processing keeps rich state such as intermediate pattern matches, derived indexes or data windows for many queries and many contexts. All that doesn't map well to Trident. Depending on your requirements Trident may be good enough.

Why do we need message brokers like RabbitMQ over a database like PostgreSQL?

I am new to message brokers like RabbitMQ which we can use to create tasks / message queues for a scheduling system like Celery.
Now, here is the question:
I can create a table in PostgreSQL which can be appended with new tasks and consumed by the consumer program like Celery.
Why on earth would I want to setup a whole new tech for this like RabbitMQ?
Now, I believe scaling cannot be the answer since our database like PostgreSQL can work in a distributed environment.
I googled for what problems does the database poses for the particular problem, and I found:
polling keeps the database busy and low performing
locking of the table -> again low performing
millions of rows of tasks -> again, polling is low performing
Now, how does RabbitMQ or any other message broker like that solves these problems?
Also, I found out that AMQP protocol is what it follows. What's great in that?
Can Redis also be used as a message broker? I find it more analogous to Memcached than RabbitMQ.
Please shed some light on this!
Rabbit's queues reside in memory and will therefore be much faster than implementing this in a database. A (good)dedicated message queue should also provide essential queuing related features such as throttling/flow control, and the ability to choose different routing algorithms, to name a couple(rabbit provides these and more). Depending on the size of your project, you may also want the message passing component separate from your database, so that if one component experiences heavy load, it need not hinder the other's operation.
As for the problems you mentioned:
polling keeping the database busy and low performing: Using Rabbitmq, producers can push updates to consumers which is far more performant than polling. Data is simply sent to the consumer when it needs to be, eliminating the need for wasteful checks.
locking of the table -> again low performing: There is no table to lock :P
millions of rows of task -> again polling is low performing: As mentioned above, Rabbitmq will operate faster as it resides RAM, and provides flow control. If needed, it can also use the disk to temporarily store messages if it runs out of RAM. After 2.0, Rabbit has significantly improved on its RAM usage. Clustering options are also available.
In regards to AMQP, I would say a really cool feature is the "exchange", and the ability for it to route to other exchanges. This gives you more flexibility and enables you to create a wide array of elaborate routing typologies which can come in very handy when scaling. For a good example, see:
(source: springsource.com)
and: http://blog.springsource.org/2011/04/01/routing-topologies-for-performance-and-scalability-with-rabbitmq/
Finally, in regards to Redis, yes, it can be used as a message broker, and can do well. However, Rabbitmq has more message queuing features than Redis, as rabbitmq was built from the ground up to be a full-featured enterprise-level dedicated message queue. Redis on the other hand was primarily created to be an in-memory key-value store(though it does much more than that now; its even referred to as a swiss army knife). Still, I've read/heard many people achieving good results with Redis for smaller sized projects, but haven't heard much about it in larger applications.
Here is an example of Redis being used in a long-polling chat implementation: http://eflorenzano.com/blog/2011/02/16/technology-behind-convore/
PostgreSQL 9.5
PostgreSQL 9.5 incorporates SELECT ... FOR UPDATE ... SKIP LOCKED. This makes implementing working queuing systems a lot simpler and easier. You may no longer require an external queueing system since it's now simple to fetch 'n' rows that no other session has locked, and keep them locked until you commit confirmation that the work is done. It even works with two-phase transactions for when external co-ordination is required.
External queueing systems remain useful, providing canned functionality, proven performance, integration with other systems, options for horizontal scaling and federation, etc. Nonetheless, for simple cases you don't really need them anymore.
Older versions
You don't need such tools, but using one may make life easier. Doing queueing in the database looks easy, but you'll discover in practice that high performance, reliable concurrent queuing is really hard to do right in a relational database.
That's why tools like PGQ exist.
You can get rid of polling in PostgreSQL by using LISTEN and NOTIFY, but that won't solve the problem of reliably handing out entries off the top of the queue to exactly one consumer while preserving highly concurrent operation and not blocking inserts. All the simple and obvious solutions you think will solve that problem actually don't in the real world, and tend to degenerate into less efficient versions of single-worker queue fetching.
If you don't need highly concurrent multi-worker queue fetches then using a single queue table in PostgreSQL is entirely reasonable.