On-demand horizontal scaling of event-driven architectures - apache-kafka

What is the best way to horizontally scale an event-driven architecture when load increases?
Many people suggest using Kafka as the message queue for EDA; however, Kafka only allows one consumer in a consumer group per partition. Repartitioning, especially during heavy load, can be costly and time consuming.
Having many consumers in a consumer group that take work and acknowledge quickly would give some horizontal scaling, but then message order needs to be considered, as well as load completion.
With RabbitMQ, queues can be created and deleted on the fly; however, that would require an additional orchestrator to help manage and distribute load.
Also, none of this addresses the load-balancing problem that comes with the territory.
Any help would be appreciated. Thanks

A bit late in answering, but here goes.
Your reasoning that the scaling should occur at the message bus layer is not entirely correct. If we take an end-to-end scenario, increased load means an increase in incoming requests to the front end (the API layer). See the reference event-driven architecture in the link below.
Assuming some form of autoscaling is in place (Kubernetes replica scaling, the Amazon autoscaler), the front end will scale out to handle the extra load. After initial pre-processing, the service posts the event to the message queue.
In Kafka specifically, the topic partition is the unit of scale-out, since each partition is consumed by at most one consumer in a consumer group. Typically you would define the number of partitions in advance, based on the throughput of a single partition.
As the reference article mentions, if single-partition throughput is p and you need a throughput of t, then you need t/p partitions (rounded up).
If t is the throughput of the normal expected load, you can provision in advance for 2x, 3x or 10x the normal throughput by creating correspondingly more partitions.
Typically, throughput on a single partition is in excess of 10 MB/s.
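As a worked illustration of the t/p rule with headroom, here is a small Python sketch. The function name and the 45 and 10 figures are made-up examples (both in the same throughput unit), not measurements; real sizing would also need to check that consumers can keep up with a single partition.

```python
import math


def partitions_needed(target: float, per_partition: float, headroom: float = 1.0) -> int:
    """Partition count for a target throughput: ceil(headroom * t / p).

    `target` (t) and `per_partition` (p) must use the same unit, e.g. MB/s.
    `headroom` is the multiple of normal load to provision for (2x, 3x, 10x...)."""
    if target <= 0 or per_partition <= 0 or headroom <= 0:
        raise ValueError("target, per_partition and headroom must be positive")
    return math.ceil(headroom * target / per_partition)


if __name__ == "__main__":
    # Hypothetical figures: 45 units/s of normal load, 10 units/s per partition.
    for factor in (1, 2, 3, 10):
        print(f"{factor}x -> {partitions_needed(45, 10, factor)} partitions")
```

Rounding up matters: 4.5 partitions' worth of load still needs 5 partitions.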

Related

How to scale out microservices when Kafka is used?

I have built a microservice platform based on Kubernetes, and Kafka is used as the MQ in the services. Now a very confusing question has arisen. Kubernetes is designed to make it easy to scale out microservices. However, when the number of replicas exceeds the number of Kafka partitions, some microservice instances do not consume anything. What should I do?
This is a Kafka limitation and has nothing to do with your service scheduler.
Kafka consumer groups simply cannot scale beyond the partition count. So, if you have a single-partition topic because you care about strict event ordering, then only one replica of your service can be active and consuming from the topic, and you'd need to handle failover in specific ways that are outside the scope of Kafka itself.
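A simplified Python sketch of why the extra replicas sit idle. The round-robin assignment below is a stand-in for Kafka's real partition assignors (it handles a single topic only), and the pod names are hypothetical; the point is that each partition goes to exactly one consumer, so consumers beyond the partition count get nothing.

```python
from __future__ import annotations


def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer, cycling through consumers."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def idle_consumers(assignment: dict[str, list[int]]) -> list[str]:
    """Consumers that were assigned no partitions at all."""
    return [c for c, parts in assignment.items() if not parts]


if __name__ == "__main__":
    # 3 partitions but 5 replicas of the service.
    a = assign_round_robin([0, 1, 2], [f"pod-{i}" for i in range(5)])
    print(a)
    print("idle:", idle_consumers(a))
```

The idle replicas are not useless, though: if an active one dies, the group rebalances and one of them takes over its partitions.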
If your concern is the k8s autoscaler, then you can look into the KEDA autoscaler for Kafka services.
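The idea behind lag-based autoscaling can be sketched as follows. This is a simplified illustration with hypothetical names, not KEDA's actual code or configuration: the desired replica count grows with total consumer lag, but is capped at the partition count, since replicas beyond that would only sit idle in the consumer group.

```python
import math


def desired_replicas(total_lag: int, lag_per_replica: int, partitions: int,
                     min_replicas: int = 1) -> int:
    """Replicas needed to work off `total_lag`, at `lag_per_replica` messages each,
    clamped to [min_replicas, partitions]."""
    if lag_per_replica <= 0 or partitions <= 0:
        raise ValueError("lag_per_replica and partitions must be positive")
    wanted = math.ceil(total_lag / lag_per_replica)
    return max(min_replicas, min(wanted, partitions))


if __name__ == "__main__":
    for lag in (0, 300, 10_000):
        print(lag, "->", desired_replicas(lag, lag_per_replica=50, partitions=12))
```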
Kafka, as OneCricketeer notes, bounds the parallelism of consumption by the number of partitions.
If you couple processing with consumption, this limits the number of instances which will be performing work at any given time to the number of partitions to be consumed. Because the Kafka consumer group protocol includes support for reassigning partitions consumed by a crashed (or non-responsive...) consumer to a different consumer in the group, running more instances of the service than there are partitions at least allows for the other instances to be hot spares for fast failover.
It's possible to decouple processing from consumption. The broad outline could be to have every instance of your service join the consumer group. At most as many instances as there are partitions will actually consume from the topic. They can then make a load-balanced network request to another (or the same) instance, based on the message they consume, to do the processing. If you allow each consumer to have multiple requests in flight, this expands your scaling horizon to max-in-flight-requests * number-of-partitions.
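A minimal asyncio sketch of this decoupling for a single partition. The `process()` coroutine is a hypothetical stand-in for the load-balanced network request to a processing instance, and the instance names are made up. A semaphore caps outstanding requests at `max_in_flight`, so across P partitions concurrency is bounded by P x max_in_flight. Requests are dispatched round-robin and can complete out of order, so as written this only suits partitions whose messages don't need ordered processing.

```python
import asyncio
import itertools


async def process(instance: str, msg: int, stats: dict) -> None:
    """Hypothetical remote processing call; here it just sleeps."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulated network round trip + work
    stats["in_flight"] -= 1
    stats["done"].append((instance, msg))


async def consume_partition(messages, instances, max_in_flight: int, stats: dict) -> None:
    sem = asyncio.Semaphore(max_in_flight)
    targets = itertools.cycle(instances)  # simple round-robin load balancing

    async def dispatch(instance: str, msg: int) -> None:
        try:
            await process(instance, msg, stats)
        finally:
            sem.release()  # free a slot once the request has finished

    tasks = []
    for msg in messages:
        await sem.acquire()  # waits while max_in_flight requests are outstanding
        tasks.append(asyncio.create_task(dispatch(next(targets), msg)))
    await asyncio.gather(*tasks)


async def main() -> dict:
    stats = {"in_flight": 0, "peak": 0, "done": []}
    await consume_partition(range(20), ["proc-a", "proc-b", "proc-c"], 4, stats)
    return stats


if __name__ == "__main__":
    s = asyncio.run(main())
    print("processed:", len(s["done"]), "peak in flight:", s["peak"])
```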
If it happens that the messages in a partition don't need to be processed in order, simple round-robin load-balancing of the requests is sufficient.
Conversely, if there are effectively multiple logical streams of messages multiplexed into a given partition (e.g. if messages are keyed by equipment ID: the second message for ID A needs to be processed after the first message, but could be processed in any order relative to messages from ID B), you can still do this, but it needs some care around ensuring ordering. Additionally, given the throughput you should be able to get from a consumer of a single partition, needing to scale out to the point where you have more processing instances than partitions suggests that you'll want to investigate load-balancing approaches where, if request B needs to be processed after request A (presumably because A could affect the result of B), A and B get routed to the same instance. That way they can leverage local in-memory state rather than doing a read-from-db then write-to-db pas de deux.
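A minimal sketch of key-affinity routing, assuming a fixed list of processing instances (hypothetical names): every message for a given key goes to the same instance's FIFO queue, so per-key order is preserved while different keys spread across instances. A production version would need to cope with instances joining and leaving, since naive modulo hashing remaps most keys when the instance count changes.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


def route(key: str, instances: list[str]) -> str:
    """Map a key to a processing instance deterministically.

    zlib.crc32 is used because Python's built-in hash() for str is randomized
    per process, which would send the same key to different instances."""
    return instances[zlib.crc32(key.encode("utf-8")) % len(instances)]


def dispatch(messages: list[tuple[str, int]], instances: list[str]) -> dict[str, list[tuple[str, int]]]:
    """Append each (key, seq) message to the FIFO queue of the instance owning its key."""
    queues = defaultdict(list)
    for key, seq in messages:
        queues[route(key, instances)].append((key, seq))
    return dict(queues)


if __name__ == "__main__":
    # Messages for equipment IDs A, B, C interleaved as they might appear in one partition.
    msgs = [("A", 1), ("B", 1), ("A", 2), ("C", 1), ("B", 2), ("A", 3)]
    for instance, items in sorted(dispatch(msgs, ["inst-0", "inst-1", "inst-2"]).items()):
        print(instance, items)
```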
This sort of architecture can be implemented in any language, though maintaining a reasonable level of availability and consistency is going to be difficult. There are frameworks and toolkits which can deliver a lot of this functionality: Akka (JVM), Akka.Net, and Protoactor all implement useful primitives in this area (disclaimer: I'm employed by Lightbend, which maintains and provides commercial support for one of those, though I'd have (and actually have) made the same recommendations prior to my employment there).
When consuming messages from Kafka in this style of architecture, you will definitely have to choose between at-most-once and at-least-once delivery guarantees, and that will drive decisions around when you commit offsets. Note particularly that, if doing at-least-once, you need to be careful not to commit until every message up to that offset has been processed (or discarded), lest you end up with "at-least-zero-times", which isn't a useful guarantee. If doing at-least-once, you may also want to aim for effectively-once: at-least-once with idempotent processing.
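A minimal sketch of the commit rule for at-least-once with out-of-order completion, for one partition (the class name is hypothetical). It follows Kafka's convention that the committed offset is the next offset to read, so the committable position only advances over a contiguous run of finished offsets; a finished message beyond a gap does not move it.

```python
class OffsetTracker:
    """Reports the offset that is safe to commit for one partition."""

    def __init__(self, start_offset: int) -> None:
        self.next_to_commit = start_offset  # lowest offset not yet finished
        self._done = set()                  # finished offsets beyond a gap

    def mark_done(self, offset: int) -> None:
        """Record a message as processed (or deliberately discarded)."""
        if offset >= self.next_to_commit:
            self._done.add(offset)
        # Advance over the contiguous run of finished offsets.
        while self.next_to_commit in self._done:
            self._done.remove(self.next_to_commit)
            self.next_to_commit += 1

    def committable(self) -> int:
        """Offset to commit: everything below it has been handled."""
        return self.next_to_commit


if __name__ == "__main__":
    t = OffsetTracker(100)
    for off in (101, 102, 100, 104):
        t.mark_done(off)
        print(f"done {off} -> commit {t.committable()}")
```

Committing 103 after the last step is safe; committing 105 would skip the still-unfinished message 103 if the consumer crashed.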

How to implement fair scheduling between multiple tenants writing to one stream

Currently I have a single Kafka topic with 10 partitions. We have 10,000 clients who keep dumping uncontrolled amounts of data into it. The current problem is that:
A tenant, without any (or with little) notice, floods the topic.
The messages from other tenants then suffer, because their handful of messages are queued behind and take several hours to get their turn for processing.
Question:
Can I somehow read maybe 1k messages per tenant and round-robin between tenants, essentially like the fair scheduler in Hadoop YARN?
Can Apache Pulsar help me with this? If yes, is there an example you can point me to?
I already went through https://www.confluent.io/blog/prioritize-messages-in-kafka/, but given the number of clients it may not be practical to have 100k partitions, etc.
I'm not aware of any way to get what you want out of the box. You could probably have the consumer pause some partitions to prioritize consumption from the ones with more messages (for example, by checking the lag per partition after every few poll iterations).
I'm not familiar enough with Apache Pulsar to have a clear answer.
I have a similar problem: a single customer can monopolize the resources and delay execution from all other customers, just because their events arrived first.
On a different application with a low volume of messages, we just load all the events into memory, creating an in-memory queue for every customer, then dequeue up to N events from each customer queue and enqueue them into a different queue; let's call it the re-ordered queue. The re-ordered queue has a capacity limit (let's say 100*N), so no additional elements are queued until there is space. This guarantees equal treatment of all customers.
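A minimal in-memory sketch of that re-ordered queue (class and method names are hypothetical). Customers take turns in round-robin order, each turn moving up to `batch` events (N) into a queue bounded by `capacity`; a customer that still has events goes to the back of the line, so a flooding tenant cannot starve the others.

```python
from collections import deque


class FairReorderer:
    """Per-customer FIFO queues drained round-robin into a bounded queue."""

    def __init__(self, batch: int, capacity: int) -> None:
        self.batch = batch            # N: max events per customer per turn
        self.capacity = capacity      # bound on the re-ordered queue
        self.per_customer = {}        # customer -> deque of pending events
        self.active = deque()         # customers with pending events, in turn order
        self.reordered = deque()      # (customer, event) pairs ready for processing

    def submit(self, customer: str, event) -> None:
        q = self.per_customer.setdefault(customer, deque())
        if not q:  # customer had nothing pending, so it joins the rotation
            self.active.append(customer)
        q.append(event)

    def fill(self) -> None:
        """Top up the re-ordered queue until it is full or nothing is pending."""
        while self.active and len(self.reordered) < self.capacity:
            customer = self.active.popleft()
            q = self.per_customer[customer]
            room = self.capacity - len(self.reordered)
            for _ in range(min(self.batch, len(q), room)):
                self.reordered.append((customer, q.popleft()))
            if q:  # still has events: back of the line for its next turn
                self.active.append(customer)


if __name__ == "__main__":
    r = FairReorderer(batch=2, capacity=100)
    for i in range(10):
        r.submit("flood", i)  # one tenant floods first...
    r.submit("a", 0)
    r.submit("b", 0)          # ...but a and b are still served early
    r.fill()
    print([c for c, _ in r.reordered])
    # ['flood', 'flood', 'a', 'b', 'flood', 'flood', 'flood', 'flood', 'flood', 'flood', 'flood', 'flood']
```

As the next paragraph points out, keeping every customer's backlog in memory like this does not scale to billions of messages.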
I am facing the same problem now with an application that processes billions of messages. The solution above is impossible there; there is just not enough RAM to keep all the data in memory. Creating a topic for each customer also sounds like overkill, especially if you have a variable set of active customers at any given point in time. Nevertheless, Pulsar seems to handle thousands, even millions, of topics well.
So the technique above may work well for you (and for me):
just read from thousands of topics, write a limited number of messages to another topic, and then wait for it to have "space" before continuing to enqueue.

Streaming audio streams through MQ (scalability)

My question is rather specific, so I will be OK with a general answer that points me in the right direction.
Description of the problem:
I want to deliver specific task data from multiple producers to the particular consumer working on the task (both are Docker containers running in k8s). The relation is many-to-many: any producer can create a data packet for any consumer. Each consumer processes ~10 streams of data at any given moment, while each data stream consists of 100 messages of 160 b per second (from different producers).
Current solution:
In our current solution, each producer keeps a cache mapping each task to the (IP:PORT) pair of its consumer and uses UDP packets to send the data directly. It scales nicely but is rather messy to deploy.
Question:
Could this be realized with a message queue of sorts (Kafka, Redis, RabbitMQ...)? E.g., having a channel for each task, where producers send data and the consumer, well, consumes it? How many streams would be feasible for the MQ to handle (I know it would differ; give me your best estimate)?
Edit: Would 1000 streams, which equal 100 000 messages per second, be feasible? (Throughput for 1000 streams is 16 Mb/s.)
Edit 2: Fixed packet size to 160b (typo).
Unless you need disk persistence, do not even look in the message broker direction. You are just adding one problem to another. Direct network code is the proper way to solve audio broadcast. If your code is messy and you want a simpler programming model, a good alternative to raw sockets is the ZeroMQ library. It will give you all the message-broker functionality you care about: a) discrete messaging instead of streams, b) client discoverability; without going overboard with another software layer.
When it comes to "feasible": 100 000 messages per second with 160 kb messages is a lot of data; it comes to 16 Gb/s even without any messaging protocol on top of it. In general, Kafka shines at throughput of small messages, as it batches messages on many layers. Given this, sustained Kafka performance is usually constrained by disk speed, as Kafka is intentionally written this way (the slowest component is the disk). However, your messages are very large and you need to both write and read messages at the same time, so I don't see it happening without a large cluster installation, as your problem is actual data throughput, not the number of messages.
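A quick check of the arithmetic. This assumes "b" means bits, as the question's "160b" and "16 Mb/s" suggest; if it means bytes, the same numbers hold in bytes. It compares the question's final 160 b messages with the 160 kb reading used above, ignoring protocol and batching overhead.

```python
def data_rate(streams: int, msgs_per_stream_per_s: int, msg_size: int) -> int:
    """Raw payload rate in msg_size units per second (no protocol overhead)."""
    return streams * msgs_per_stream_per_s * msg_size


if __name__ == "__main__":
    print(1000 * 100, "messages/s")                              # 100000 messages/s
    print(data_rate(1000, 100, 160) / 1e6, "Mb/s at 160 b")      # 16.0 Mb/s
    print(data_rate(1000, 100, 160_000) / 1e9, "Gb/s at 160 kb") # 16.0 Gb/s
```

The thousandfold gap between the two readings is why the message size matters so much for the broker question.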
Because you are data-limited, even classic MQ software like ActiveMQ, IBM MQ, etc. is actually able to cope very well with your situation. In general, classic brokers are much more "chatty" than Kafka and cannot reach Kafka's message throughput when handling small messages. But as long as you are using large non-persistent messages (and proper broker configuration), you can expect decent performance in MB/s from those too. Classic brokers will, with proper configuration, directly connect a producer's socket to a consumer's socket without hitting a disk. In contrast, Kafka will always persist to disk first. So they even have some latency advantages over Kafka.
However, this direct socket-to-socket "optimisation" just brings us full circle to the start of this answer. Unless you need audio stream persistence, all you are doing with a broker in the middle is finding an indirect way of binding producing sockets to consuming ones and then sending discrete messages over this connection. If that is all you need, ZeroMQ is made for this.
There is also a messaging protocol called MQTT, which may be of interest to you if you choose to pursue a broker solution, as it is meant to be an extremely scalable solution with low overhead.
A basic approach
From the Kafka perspective, each stream in your problem can map to one topic, and therefore there is one producer-consumer pair per topic.
Con: If you have lots of streams, you will end up with lots of topics, and IMO the solution can get messier here too as you increase the number of topics.
An alternative approach
Alternatively, the best way is to map multiple streams to one topic, where each stream is identified by a key (like your IP:Port combination), and then have multiple consumers, each subscribing to a specific set of partitions as determined by the key. Partitions are the unit of scalability in Kafka.
Con: Though you can increase the number of partitions, you cannot decrease them.
Type of data matters
If your streams are heterogeneous, in the sense that it would not be apt for all of them to share a common topic, you can create more topics.
Usually, topics are determined by the data they host and/or what their consumers do with the data in the topic. If all of your consumers do the same thing i.e. have the same processing logic, it is reasonable to go for one topic with multiple partitions.
Some points to consider:
Unlike in your current solution (I suppose), a message does not get lost once it is received and processed; rather, it stays in the topic until the configured retention period expires.
Take proper care in determining the keying strategy, i.e. which messages land in which partitions. As said earlier, if all of your consumers do the same thing, all of them can be in one consumer group to share the workload.
Consumers belonging to the same group do a common task and will subscribe to a set of partitions determined by the partition assignor. Each consumer will then get a set of keys; in other words, a set of streams or, as per your current solution, a set of one or more IP:Port pairs.
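A simplified sketch of how keys end up grouped per consumer: keys are hashed to partitions, and partitions are handed to consumers. Two assumptions to flag: crc32 stands in for Kafka's default partitioner (which actually uses murmur2), and the round-robin mapping stands in for the real partition assignor. The IP:Port keys and consumer names are hypothetical.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Illustrative key -> partition hash (Kafka's default uses murmur2, not crc32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def streams_per_consumer(keys: list[str], num_partitions: int,
                         consumers: list[str]) -> dict[str, set[str]]:
    """Keys -> partitions via hashing, partitions -> consumers round-robin."""
    owner = {p: consumers[p % len(consumers)] for p in range(num_partitions)}
    result = defaultdict(set)
    for key in keys:
        result[owner[partition_for(key, num_partitions)]].add(key)
    return dict(result)


if __name__ == "__main__":
    keys = [f"10.0.0.{i}:5000" for i in range(12)]
    for consumer, streams in sorted(streams_per_consumer(keys, 6, ["c1", "c2", "c3"]).items()):
        print(consumer, sorted(streams))
```

Because the key-to-partition mapping depends on the partition count, adding partitions later moves some keys to different partitions.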

Open source multi region, consistent at-least-once FIFO solution: Dedicated Queue (e.g. Kafka) vs Database (e.g. Cassandra, RethinkDB)?

I've been searching for a FIFO solution where producers and consumers can be deployed in multiple data centers, in different regions (e.g. >20ms ping). Accepting the price of increased latency, the main goal is to transparently handle the increased latency, latency spikes and link failures.
This theoretical use case looks like this:
Super Fast Producer --sticky-load-balancing-with-fail-over--> Multi-Region Processors -->
Queue(FIFO based on order established by the producer) --> Multi-Region Consumers with fail-over
Consumers should not consume from the same "queue" at the same time, however, let's not consider the scaling aspect here. If the replication and fail-over work well for one "queue" the partitioning can be applied even at the application level with a decent amount of effort.
Thoughts:
In order for fail-over to work correctly, the Queue (e.g. messages, consumer offsets) must be active-active synchronously replicated between data centers. I don't see how an active-standby asynchronous topology can work without losing messages or break FIFO in failure scenarios.
A Kafka stretch cluster would be perfect; although it can span multiple availability zones (<2ms ping and stable connections), most people advise against spanning multiple regions (>15ms ping, unstable connections).
Confluent Platform 5.4 has a synchronous replication feature in preview; with it, we could fail over consumers at the application level in case the local cluster is down. Since data is replicated synchronously, we should not break FIFO or lose messages during failover. To ensure a more active-active setup, we could rotate the consumers periodically between data centers (e.g. once or twice a day in off-peak hours).
A DB (like Cassandra) can handle consistency across multiple data-center/regions. However, a queue use-case is an anti-pattern (Using Cassandra as a Queue).
The first point is about the pure insert/delete workload, which will make the DB work really hard to remove tombstones. It is a sub-optimal use of the DB, but if it can handle the workload reliably, then it is not a problem IMHO.
The second point is about polling: consumers will generate a large number of quorum reads just for polling the DB, even if there is no data. Again, IMHO Cassandra will handle this reliably, even if it is a poor use of its capabilities.
Using a DB with notifications, like CouchDB/RethinkDB. CouchDB's replication is asynchronous, so I do not see how consumers can have a consistent view of the queue. For RethinkDB, I am not sure how reliably it works across regions with majority reads and writes.
Have you deployed such "queues" in production, which would you choose?
Kafka supports two patterns: publish-subscribe and message queue. The differences have been discussed in several places, for example here.
The problem you stated can be solved using Kafka. A FIFO queue can be implemented with keyed messages in a topic's partitions: all messages with the same key belong to the same partition, so they are consumed in FIFO order relative to each other. If you want to increase the consuming throughput, you just need to increase the number of partitions per topic and the number of consumers.
With other queues, such as RabbitMQ, this is not as easy: to load-balance the workload, we must use separate queues, which increases the management cost.
You can implement many kinds of delivery semantics, such as at-most-once, at-least-once and exactly-once (literally), at the producer side and the consumer side. Kafka also supports multi-data-center deployments.
Cassandra is not designed for queue modeling, and as you said, using Cassandra as a queue is an anti-pattern. It can quickly turn into a nightmare.
The main problem with the queue is the deletes (Cassandra doesn't perform well for frequently updated data anyway).
Here is a link that might help you understand deletes and queues in Cassandra:
https://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/

Apache Kafka isolate different producers

I'm working on a project where different producers (each one representing a different customer) can send events to my service.
This service is responsible for receiving those events and storing them in an intermediate Kafka topic; later we fetch and process those events.
The problem is that one customer can flood us with events and affect the processing of other customers' events. I'm trying to find the best way to create a level of isolation between different customers.
So far, I was able to solve this by creating a different topic for each customer.
Although this solution temporarily solved the issue, it seems that Kafka is not designed to handle a huge number of topics (100k+) well: as the number of our producers (customers) grew, we started to see a controlled restart of a single broker take up to a few hours.
Can anyone suggest a better way to create level of isolation between producers?
You can take a look at Kafka quotas, which are enforced at the Kafka broker level. By configuring each producer with a different user / client-id, you could achieve some level of rate limiting (so that one producer does not flood the others).
See https://kafka.apache.org/documentation.html#design_quotas
With the number you mentioned (100k+), I think you will probably need to solve this issue in the service that sits in front of Kafka.
Kafka can most probably (without knowing exact numbers) handle the load that you throw at it, but there is a limit to the number of partitions per broker that can be handled in a performant way. As usual, there are no fixed limits for this, but I'd say the number of partitions per broker is more in the lower four figures, so unless you have a fairly large cluster you probably have many more than that. This can lead to longer restart times, as all these partitions have to be recovered. You could try experimenting with the num.recovery.threads.per.data.dir parameter and setting it higher, which could bring your restart times down.
I'd recommend consolidating topics to get the number down, though, and implementing some sort of flow control in the service that your customers talk to, maybe adding a load balancer to be able to scale that service.
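One common form of that flow control is a per-customer token bucket in the ingestion service, checked before an event is written to Kafka. A minimal sketch with hypothetical class names and a pluggable clock (so it can be tested without sleeping); it is separate from Kafka's broker-side quotas. What to do with a rejected event (return HTTP 429, delay it, or park it elsewhere) is a design choice left open here.

```python
import time


class TokenBucket:
    """Allows `rate` events/s sustained, with bursts of up to `burst` events."""

    def __init__(self, rate: float, burst: float, now=time.monotonic) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = burst      # start full so a new customer can burst
        self.now = now
        self.last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerCustomerLimiter:
    """One bucket per customer, so a flooding customer only drains its own tokens."""

    def __init__(self, rate: float, burst: float, now=time.monotonic) -> None:
        self.rate, self.burst, self.now = rate, burst, now
        self.buckets = {}

    def allow(self, customer: str, cost: float = 1.0) -> bool:
        bucket = self.buckets.get(customer)
        if bucket is None:
            bucket = self.buckets[customer] = TokenBucket(self.rate, self.burst, self.now)
        return bucket.allow(cost)


if __name__ == "__main__":
    limiter = PerCustomerLimiter(rate=10, burst=5)
    flood = [limiter.allow("noisy") for _ in range(8)]
    print("noisy accepted:", sum(flood), "quiet accepted:", limiter.allow("quiet"))
```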