How to implement fair scheduling between multiple tenants writing to one stream

I currently have a single Kafka topic with 10 partitions. We have 10,000 clients that keep dumping uncontrolled data into the stream. The problem is:
A tenant floods the topic with little or no notice.
Messages from other tenants then suffer: their handful of messages are queued behind the flood and wait several hours for their turn to be processed.
Question:
Can I somehow read, say, 1k messages per tenant in round-robin fashion, essentially like the fair scheduler in Hadoop YARN?
Can Apache Pulsar help me with this? If yes, is there an example you can point me to?
I have already gone through https://www.confluent.io/blog/prioritize-messages-in-kafka/, but given the number of clients it may not be practical to have 100k partitions, etc.

I'm not aware of any way to get what you want out of the box. You could probably have the consumer pause some partitions to prioritize consumption from the ones with more messages (for example, by checking the lag per partition after every few poll iterations).
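The lag-based idea can be sketched in plain Python, without any Kafka client. `partitions_to_pause` is a hypothetical helper, not part of any library: it takes the current lag per partition and returns the partitions to pause so that only the `keep` most-lagging ones are read. In a real consumer the result would be passed to the client's pause/resume calls; how the lag is measured is left out and is an assumption here.

```python
def partitions_to_pause(lag_by_partition, keep=3):
    """Return the partitions to pause so only the `keep` most-lagging ones are read.

    lag_by_partition: dict mapping partition number -> number of unread messages.
    Ties are broken by partition number so the result is deterministic.
    """
    ranked = sorted(lag_by_partition, key=lambda p: (-lag_by_partition[p], p))
    return sorted(ranked[keep:])


lag = {0: 120, 1: 50_000, 2: 0, 3: 900, 4: 30_000}
print(partitions_to_pause(lag, keep=2))  # [0, 2, 3]
```

Re-evaluating this after every few poll iterations, as suggested above, keeps the paused set from going stale.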
I'm not familiar enough with Apache Pulsar to have a clear answer.

I have a similar problem: a single customer can monopolize the resources and delay execution from all other customers, just because their events arrived first.
On a different application with a low volume of messages, we just load all the events into memory, creating an in-memory queue for every customer, then dequeue up to N events from each customer queue and re-queue them into a different queue; let's call it the re-ordered queue. The re-ordered queue has a capacity limit (let's say 100*N), so no additional elements are queued until there is space. This guarantees equal treatment for all customers.
I am facing the same problem now with an application that processes billions of messages. The solution above is impossible; there is just not enough RAM. We can't keep all the data in memory. Creating a topic for each customer also sounds like overkill, especially if you have a variable set of active customers at any given point in time. Nevertheless, Pulsar seems to handle thousands, even millions, of topics well.
So the technique above may work well for you (and for me): read from thousands of topics, write a limited number of messages to another topic, and then wait for it to have "space" before enqueuing more.
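A minimal in-memory sketch of the re-ordering technique described above, assuming one deque per customer. The function name `fair_drain` and its parameters are made up for illustration; with per-customer topics, each deque would be replaced by a read from that customer's topic and the output list by the bounded downstream topic.

```python
from collections import deque


def fair_drain(tenant_queues, batch_size, capacity):
    """Move up to `batch_size` messages per tenant per round into a bounded list.

    tenant_queues: dict mapping tenant id -> deque of pending messages.
    Stops when the output holds `capacity` messages or every queue is empty.
    """
    reordered = []
    while len(reordered) < capacity and any(tenant_queues.values()):
        for tenant, pending in tenant_queues.items():
            for _ in range(batch_size):
                if not pending or len(reordered) >= capacity:
                    break
                reordered.append((tenant, pending.popleft()))
    return reordered


queues = {
    "noisy": deque(range(1000)),  # tenant that flooded the stream
    "quiet": deque(["a", "b"]),   # tenant with a handful of messages
}
print(fair_drain(queues, batch_size=2, capacity=6))
# [('noisy', 0), ('noisy', 1), ('quiet', 'a'), ('quiet', 'b'), ('noisy', 2), ('noisy', 3)]
```

The quiet tenant's two messages come out in the first round instead of waiting behind the other tenant's 1000.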

Related

Consuming messages in a Kafka topic ASAP

Imagine a scenario in which a producer is producing 100 messages per second, and consuming messages as soon as possible matters a lot: even a 5-second delay might mean the message is no longer worth handling. Also, the order of messages does not matter.
So I don't want to use a basic queue and a single pod listening on a single partition, since to consume a message the consumer needs to make multiple remote API calls, and this might take time.
In such a scenario, I'm thinking of a single Kafka topic with 100 partitions, with a separate machine (pod) listening on each of partitions 0 to 99.
Am I thinking right? This is my first project with Kafka, and this seems a little weird to me.

For your use case, think of partitions = max number of instances of the service consuming data. Don't create extra partitions if you'll only have 8 instances. Extra partitions have a negative impact when consumers need to be rebalanced and probably won't give you any performance improvement. Also, 100 messages/s is very, very little; you can make this work with almost any technology.
To get the maximum performance I would suggest:
Use a round robin partitioner
Find a Parallel consumer implementation for your platform (for jvm)
And there are a few producer and consumer properties that you'll need to change, but they depend on your environment, for example batch.size, linger.ms, etc. I would also check whether you need acks=all, as it might be OK for you to lose data if a broker dies, given that old data is of no use.
One warning: In Java, the standard kafka consumer is single threaded. This surprises many people and I'm not sure if the same is true for other platforms. So having 100s of partitions won't give any performance benefit with these consumers, and that's why it's important to use a Parallel Consumer.
One more warning: Kafka is a complex broker. It's trivial to start using it, but it's a very bumpy journey to use it correctly.
And a note: one of the benefits of Kafka is that it keeps messages rather than deleting them once they are consumed. If messages older than 5 seconds are useless to you, Kafka might be the wrong technology, and a more traditional broker might be easier (ActiveMQ, RabbitMQ, or a blazing fast one like ZeroMQ).

Your bottleneck is your application processing the event, not Kafka.
When you have ten consumers, there is overhead for connecting each consumer to Kafka, so it will lower the performance.
I advise focusing on your application performance rather than message broker.
Kafka p99 Latency is 5 ms with 200 MB/s load.
https://developer.confluent.io/learn/kafka-performance/

Kafka and Event Streaming On Client Side?

I need to consume messages from a event source (represented as a single Kafka topic) producing about 50k to 250k events per second. It only provides a single partition and the ping is quite high (90-100ms).
As far as I have learned by reading the Kafka client code, during polling a fetch request is issued and once the response is fully read, the events/messages are parsed and deserialized and once enough events/messages are available consumer.poll() will provide the subset to the calling application.
In my case this makes the whole thing not worthwhile. The best throughput I achieve is with about 2s duration per fetch request (about 2.5MB fetch.max.bytes). Smaller fetch durations increase the idle time (time during which the consumer does not receive any bytes) between the last byte of the previous response, parsing, deserialization, sending the next request, and waiting for the first byte of its response.
Using a fetch duration of about 2s results in a max latency of 2s, which is highly undesirable. What I would like is for messages to become available to the consumer while the fetch response is still being received, as soon as an individual message is fully transmitted.
Since every message has an individual id and the messages are sent in a particular order while only a single consumer (+thread) for a single partition is active, it is not a problem to suppress retransmitted messages in case a fetch response is aborted or fails and its messages were partially processed and later retransmitted.
So the big question is whether the Kafka client provides a way to consume messages from a not-yet-completed fetch response.

That is a pretty large amount of messages coming in through a single partition. Since you can't control anything on the Kafka server, the best you can do is configure your client to be as efficient as possible, assuming you have access to Kafka client configuration parameters. You didn't mention anything about needing to consume the messages as fast as they're generated, so I'm assuming you don't need that. Also I didn't see any info about average message size, how much message sizes vary, but unless those are crazy values, the suggestions below should help.
The first thing you need to do is set max.poll.records on the client side to a smallish number, say, start with 10000, and see how much throughput that gets you. Make sure to consume without doing anything with the messages, just dump them on the floor, and then call poll() again. This is just to benchmark how much performance you can get with your fixed server setup. Then, increase or decrease that number depending on if you need better throughput or latency. You should be able to get a best scenario after playing with this for a while.
After having done the above, the next step is to change your code so it dumps all received messages into an internal in-memory queue and then calls poll() again. This is especially important if processing each message requires DB access, hitting external APIs, etc. If you take even 100ms to process 1K messages, that can cut your throughput in half in your case (100ms to poll/receive, and then another 100ms to process the messages received before you start the next poll()).
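The "cut in half" arithmetic can be checked with a small idealized model. It assumes each loop iteration costs the poll time plus the processing time when processing is inline, and only the larger of the two when processing is handed off to another thread; the function name and the numbers are illustrative, not measurements.

```python
def messages_per_second(batch, poll_ms, process_ms, overlapped):
    """Throughput of a poll loop with processing inline or handed off.

    overlapped=False: poll, then process, then poll again (times add up).
    overlapped=True: processing runs in parallel with the next poll.
    """
    cycle_ms = max(poll_ms, process_ms) if overlapped else poll_ms + process_ms
    return batch * 1000 / cycle_ms


inline = messages_per_second(1000, poll_ms=100, process_ms=100, overlapped=False)
handed_off = messages_per_second(1000, poll_ms=100, process_ms=100, overlapped=True)
print(inline, handed_off)  # 5000.0 10000.0
```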
Without having access to Kafka configuration parameters on the server side, I believe the above should get you pretty close to an optimal throughput for your configuration.
Feel free to post more details in your question, and I'd be happy to update my answer if that doesn't help.

With throughput this high, these are the community recommendations for choosing the number of partitions on a source topic. All of these factors are worth considering:
• What is the throughput you expect to achieve for the topic?
• What is the maximum throughput you expect to achieve when consuming from a single partition?
• If you are sending messages to partitions based on keys, adding partitions later can be very challenging, so calculate throughput based on your expected future usage, not the current usage.
• Consider the number of partitions you will place on each broker and the available disk space and network bandwidth per broker.
So if you want to be able to write and read 1 GB/sec from a topic, and each consumer can only process 50 MB/s, then you need at least 20 partitions. This way, you can have 20 consumers reading from the topic and achieve 1 GB/sec.
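The same calculation as a short sketch; `min_partitions` is a name chosen here, and 1 GB/s is taken as 1000 MB/s.

```python
import math


def min_partitions(target_mb_per_s, per_consumer_mb_per_s):
    """Smallest partition count that lets enough consumers share the target load."""
    return math.ceil(target_mb_per_s / per_consumer_mb_per_s)


print(min_partitions(1000, 50))  # 20
```

Rounding up matters when the numbers do not divide evenly: a target of 1000 MB/s with consumers that each handle 30 MB/s needs 34 partitions, not 33.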
Also, regarding fetch.max.bytes, I am sure you have already had a glance at this one: Kafka fetch max bytes doesn't work as expected.

Streaming audio streams through MQ (scalability)

My question is rather specific, so I will be OK with a general answer that points me in the right direction.
Description of the problem:
I want to deliver specific task data from multiple producers to a particular consumer working on the task (both are Docker containers run in k8s). The relation is many-to-many: any producer can create a data packet for any consumer. Each consumer is processing ~10 streams of data at any given moment, while each data stream consists of 100 messages of 160b each per second (from different producers).
Current solution:
In our current solution, each producer has a cache of task: (IP:PORT) pairs for the consumers and uses UDP data packets to send the data directly. It is nicely scalable but rather messy in deployment.
Question:
Could this be realized in the form of a message queue of sorts (Kafka, Redis, RabbitMQ...)? E.g., having a channel for each task where producers send data while the consumer, well, consumes it? How many streams would be feasible for the MQ to handle? (I know it would differ; suggest your best.)
Edit: Would 1000 streams, which equals 100 000 messages per second, be feasible? (Throughput for 1000 streams is 16 Mb/s.)
Edit 2: Fixed packet size to 160b (typo)

Unless you need disk persistence, do not even look in the message broker direction. You are just adding one problem to another. Direct network code is the proper way to solve audio broadcast. Now, if your code is messy and you want a simplified programming model, a good alternative to sockets is the ZeroMQ library. This will give you all the message broker functionality you care about: a) discrete messaging instead of streams, b) client discoverability; without going overboard with another software layer.
When it comes to "feasible": 100 000 messages per second with 160kb messages is a lot of data; it comes to 1.6 Gb/sec even without any messaging protocol on top of it. In general, Kafka shines at throughput of small messages, as it batches messages on many layers. Sustained Kafka performance is usually constrained by disk speed, as Kafka is intentionally written this way (the slowest component is the disk). However, your messages are very large and you need to both write and read messages at the same time, so I don't see it happening without a large cluster installation, as your problem is actual data throughput and not the number of messages.
Because you are data limited, even other classic MQ software like ActiveMQ, IBM MQ, etc. is actually able to cope very well with your situation. In general, classic brokers are much more "chatty" than Kafka and are not able to hit Kafka's message throughput when handling small messages. But as long as you are using large non-persistent messages (and proper broker configuration), you can expect decent performance in MB/sec from those too. Classic brokers will, with proper configuration, directly connect the socket of a producer to the socket of a consumer without hitting a disk. In contrast, Kafka will always persist to disk first. So they even have some latency advantages over Kafka.
However, this direct socket-to-socket "optimisation" just brings us full circle to the start of this answer. Unless you need audio stream persistence, all you are doing with a broker in the middle is finding an indirect way of binding producing sockets to consuming ones and then sending discrete messages over this connection. If that is all you need, ZeroMQ is made for this.
There is also a messaging protocol called MQTT, which may be of interest if you choose to pursue a broker solution, as it is meant to be an extremely scalable solution with low overhead.

A basic approach
From Kafka's perspective, each stream in your problem can map to one topic in Kafka, and therefore there is one producer-consumer pair per topic.
Con: If you have lots of streams, you will end up with lots of topics, and IMO the solution can get messy here too as you increase the number of topics.
An alternative approach
Alternatively, the best way is to map multiple streams to one topic where each stream is separated by a key (like you use IP:Port combination) and then have multiple consumers each subscribing to a specific set of partition(s) as determined by the key. Partitions are the point of scalability in Kafka.
Con: Though you can increase the no. of partitions, you cannot decrease them.
Type of data matters
If your streams are heterogeneous, in the sense that it would not be apt for all of them to share a common topic, you can create more topics.
Usually, topics are determined by the data they host and/or what their consumers do with the data in the topic. If all of your consumers do the same thing i.e. have the same processing logic, it is reasonable to go for one topic with multiple partitions.
Some points to consider:
Unlike in your current solution (I suppose), a message doesn't get lost once it is received and processed; it stays in the topic until the configured retention period expires.
Take proper care in determining the keying strategy, i.e. which messages land in which partitions. As said earlier, if all of your consumers do the same thing, all of them can be in one consumer group to share the workload.
Consumers belonging to the same group do a common task and will subscribe to a set of partitions determined by the partition assignor. Each consumer will then get a set of keys; in other words, a set of streams or, as per your current solution, a set of one or more IP:Port pairs.
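A rough sketch of how keys end up with consumers, under two stated simplifications: CRC32 stands in for the hash (Kafka's default partitioner uses murmur2 on the key bytes), and a plain round-robin spread stands in for the group's partition assignor. Both function names are made up for illustration.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Stable key -> partition mapping (CRC32 here, as a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_round_robin(num_partitions, consumers):
    """Spread partitions over the consumers of one group (simplified assignor)."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


# The same key always maps to the same partition, so one stream stays in order.
p = partition_for("10.0.0.7:5000", 6)
print(p == partition_for("10.0.0.7:5000", 6))  # True

print(assign_round_robin(6, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Because the mapping depends on the partition count, changing the number of partitions later moves keys to different partitions, which is why the keying strategy deserves care up front.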

Handling catastrophic failover in Kafka

Let's imagine a simple message processing pipeline, like the one in the image below:
A group of consumers listens to a topic, picks messages one by one, does some sort of processing and sends them over to the next topic.
Some messages crash the consumer or make it stuck forever (so then a liveness probe kills the consumer after timeout).
In this case the consumer is not able to commit the offset, so the malicious message gets picked up by another consumer and makes it crash as well.
Ideally we want to move the message to a dead letter topic after N such attempts.
This can be achieved by introducing a shared storage:
But this creates coupling between the services and introduces a Single Point of Failure (SPOF) which is the shared database.
I'm looking for ideas on how to work around this with stateless services.

If your context is correct with this approach (that's something you should judge, as I'm only trying to give a suggestion), please consider decoupling the consumption and the processing.
In your case, the consumer is stopped not because it was unable to read from Kafka, or because the Kafka broker was unable to provide messages, but because the processing of the message was too slow and/or unsuccessful.
The consumer, in fact, was correctly receiving the messages. It was the processing of them that caused it to be declared dead.
First of all, see the KafkaConsumer javadoc section on this (just above the constructor summary). The second option is the one quoted here:
2. Decouple Consumption and Processing
Another alternative is to have one or more consumer threads that do
all data consumption and hands off ConsumerRecords instances to a
blocking queue consumed by a pool of processor threads that actually
handle the record processing. This option likewise has pros and cons:
PRO: This option allows independently scaling the number of consumers
and processors. This makes it possible to have a single consumer that
feeds many processor threads, avoiding any limitation on partitions.
CON: Guaranteeing order across the processors requires particular care
as the threads will execute independently an earlier chunk of data may
actually be processed after a later chunk of data just due to the luck
of thread execution timing. For processing that has no ordering
requirements this is not a problem.
CON: Manually committing the position becomes harder as it requires
that all threads co-ordinate to ensure that processing is complete for
that partition.
Essentially, it works like this: the consumer keeps reading and hands the responsibility for processing and process-timeout management to the processor threads.
Error handling for message processing would be the responsibility of the processor threads as well. For example, if a timeout is hit or an exception occurs, the processor sends the message to your defined "dead" queue, or handles it however you wish, without involving the consumer. Regardless of whether the processor threads succeed or fail, the consumer continues its job and is never considered dead for not calling poll() within the specified timeout.
You should control the number of messages the consumer retrieves in each poll call so as not to saturate the processors. It's a balance between how fast the processors finish their job, how many messages the consumer retrieves (max.poll.records) at each iteration, and the timeout specified for the consumer.
Decoupled workflow
The first element to mention is the queue (with a limited size, which you should also manage so it doesn't fill up and cause an OOM).
This queue would be the link between the consumer and the processor threads: essentially a buffer that can grow or shrink depending on the workload at any given time. It absorbs overloads, something like a dam or barrier, to give an analogy.
                                         ----->WORKERTHREAD1
KAFKA <------> CONSUMER ----> QUEUE -----|
                                         ----->WORKERTHREAD2
What you get is a second queue-lag mechanism:
1. Kafka Consumer LAG (the messages still to be read from the partition/topic)
2. Queue LAG (received messages still need to be processed)
                                                --->WORKERTHREAD1
KAFKA <--(LAG)--> CONSUMER ----> QUEUE --(LAG)--|
                                                --->WORKERTHREAD2
The queue could be some kind of thread-safe queue, such as a ConcurrentLinkedQueue, for example. Or you could manage the synchronization yourself with a custom queue.
Essentially, the duties are divided, and the consumer is given the easiest one (as it's the one that is most crucial).
Responsibilities:
Consumer
  consume-->send to queue
Workers
  read from queue|-->[manage timeout]
                 |==>PROCESS MESSAGE ==> send to topic
                 |-->[handle failed messages]
You should also handle processor threads that die or deadlock, but those mechanisms are usually already implemented in most ThreadPool variants.
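The division of duties above can be sketched with Python's standard library, with no Kafka involved: a loop over `records` stands in for the consumer's poll() loop, a bounded `queue.Queue` plays the blocking queue, and a list plays the "dead" queue. The name `run_pipeline` and its parameters are made up for illustration, and per-message timeouts are left out; only exceptions are routed to the dead-letter list.

```python
import queue
import threading


def run_pipeline(records, process, num_workers=2, queue_size=4):
    """Feed `records` through a bounded queue to worker threads.

    Returns (processed, dead_letters). A record whose processing raises is
    routed to dead_letters instead of stopping the consuming loop.
    """
    buffer = queue.Queue(maxsize=queue_size)  # bounded: put() blocks when full
    processed, dead_letters = [], []
    lock = threading.Lock()
    stop = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            record = buffer.get()
            if record is stop:
                break
            try:
                result = process(record)
                with lock:
                    processed.append(result)
            except Exception:
                with lock:
                    dead_letters.append(record)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for record in records:  # stands in for the consumer's poll() loop
        buffer.put(record)
    for _ in workers:
        buffer.put(stop)
    for w in workers:
        w.join()
    return processed, dead_letters


def process(n):
    if n == 3:
        raise ValueError("poison message")
    return n * 10


done, dead = run_pipeline(range(6), process)
print(sorted(done), dead)  # [0, 10, 20, 40, 50] [3]
```

Because the queue is bounded, a slow worker pool eventually makes `put()` block, which is the "dam" behaviour: the consuming side slows down only once the buffer is full.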
I suggest that the workers share a single KafkaProducer. The producer is thread safe, and since the output topic would be the same for the group of consumers, this would also improve performance. Also, from the KafkaProducer javadoc:
The producer is thread safe and sharing a single producer instance
across threads will generally be faster than having multiple
instances.
In summary, each consumer thread feeds n processor threads. Some variants could be:
- 1 consumer - 1 worker (no processing parallelization, just division of duties)
- 1 consumer - 2 workers
- 1 consumer - 4 workers
- 2 consumers - 4 workers (2 for each)
- 2 consumers - 8 workers (4 for each)
...
Read the pros and cons of this mechanism in the javadoc carefully, and judge whether it could be a solution for your specific case.
In my opinion, there's a PRO that isn't reflected in the docs, and it is the root of this answer/suggestion:
Consumption shouldn't be affected by processing. This approach avoids any consumer thread being considered dead because of slow message processing, and offers an extra "safety window" thanks to the queue. I'm not saying the consumer would carry on happily if all processors failed for every message or the queue hit its maximum size; it will in fact be stopped by processing, but much, much later and for bigger reasons that couldn't be avoided. This approach buys some extra time, or an extra shield, before that happens. Just like a dam, it can still fail if it can't hold any more water.
Well, I hope you take this as a suggestion and that it is helpful somehow. It may avoid most of the dead-consumer issues you're having. If well managed, it's a good approach for a 24/7 real-time data workflow.

Apache Kafka isolate different producers

I'm working on a project where different producers (each one represented by another customer) can send events to my service.
This service is responsible for receiving those events and storing them in intermediate Kafka topic, later we are fetching and processing those events.
The problem is that one customer can flood the service with events and affect the processing of other customers' events. I'm trying to find the best way to create a level of isolation between different customers!
So far, I was able to solve this by creating a different topic for each customer.
Although this solution temporarily solved the issue, it seems that Kafka is not designed to handle a huge number of topics (100k+) well: as the number of our producers (customers) grew, we started to see a controlled restart of a single broker take up to a few hours.
Can anyone suggest a better way to create level of isolation between producers?

You can take a look at Kafka quotas, which are enforced at the broker level. By configuring each producer with a different user / client-id, you could achieve some level of limiting (so that one producer does not flood the others).
See https://kafka.apache.org/documentation.html#design_quotas

With the number (100k+) that you mentioned I think that you will probably need to solve this issue in your service that sits before Kafka.
Kafka can most probably (without knowing exact numbers) handle the load that you throw at it, but there is a limit to the number of partitions per broker that can be handled in a performant way. As usual, there are no fixed limits for this, but I'd say the number of partitions per broker is in the low four figures, so unless you have a fairly large cluster you probably have many more than that. This can lead to longer restart times, as all these partitions have to be recovered. What you could try is to experiment with the num.recovery.threads.per.data.dir parameter and set it higher, which could bring your restart times down.
I'd recommend consolidating topics to get the number down, though, and implementing some sort of flow control in the service that your customers talk to; maybe add a load balancer to be able to scale that service.
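One possible form of that flow control is a per-customer token bucket in the receiving service, sketched below. This is an illustrative technique choice, not something the answer prescribes; the class names, the rate and burst values, and passing the clock in as `now` (in a real service, something like `time.monotonic()`) are all assumptions.

```python
class TokenBucket:
    """Allows `rate` events per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerCustomerLimiter:
    """Keeps one bucket per customer so a flood only exhausts its own budget."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.buckets = {}

    def allow(self, customer, now):
        if customer not in self.buckets:
            self.buckets[customer] = TokenBucket(self.rate, self.burst, now)
        return self.buckets[customer].allow(now)


limiter = PerCustomerLimiter(rate=1, burst=2)
print([limiter.allow("noisy", now=0.0) for _ in range(3)])  # [True, True, False]
print(limiter.allow("quiet", now=0.0))                      # True
print(limiter.allow("noisy", now=1.0))                      # True
```

Events that are not allowed can be rejected or delayed before they ever reach the shared topic, so all customers can share a small number of topics.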
