Apache Kafka: isolate different producers - apache-kafka

I'm working on a project where different producers (each one representing a different customer) can send events to my service.
This service is responsible for receiving those events and storing them in an intermediate Kafka topic; later we fetch and process those events.
The problem is that one customer can flood the service with events and affect the processing of other customers' events. I'm trying to find the best way to create a level of isolation between different customers!
So far, I was able to solve this by creating a different topic for each customer.
Although this solution temporarily solved the issue, it seems that Kafka is not designed to handle a huge number of topics (100k+). As our number of producers (customers) grew, we started to see controlled restarts of a single broker take up to a few hours.
Can anyone suggest a better way to create a level of isolation between producers?

You can take a look at Kafka quotas, which are enforced at the broker level. By configuring each producer with a different user / client-id, you can achieve some level of rate limiting (so that one producer does not flood the others).
See https://kafka.apache.org/documentation.html#design_quotas
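As a hedged example (broker address, client-id and limit are placeholders, and this assumes Kafka 2.6+ where kafka-configs accepts --bootstrap-server; older versions use --zookeeper), a per-client-id produce quota can be set with the kafka-configs tool that ships with Kafka:

bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'producer_byte_rate=1048576' \
  --entity-type clients --entity-name customer-a

The same tool also accepts --entity-type users, so if your producers authenticate you can key the quota to the principal instead of (or in addition to) the client-id.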

With the number (100k+) that you mentioned, I think you will probably need to solve this issue in the service that sits in front of Kafka.
Kafka can most probably (without knowing exact numbers) handle the load that you throw at it, but there is a limit to the number of partitions per broker that can be handled in a performant way. As usual there are no fixed limits for this, but I'd say the number of partitions per broker is more in the lower four figures, so unless you have a fairly large cluster you probably have many more than that. This can lead to long restart times, as all of these partitions have to be recovered. What you could try is to experiment with the num.recovery.threads.per.data.dir parameter and set it higher, which could bring your restart times down.
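A minimal sketch of that change in server.properties (the value 8 is only an illustration; the setting is per data directory and defaults to 1, so size it against your disk count and CPU budget):

# server.properties on each broker
num.recovery.threads.per.data.dir=8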
I'd recommend consolidating topics to get the number down, though, and implementing some sort of flow control in the service that your customers talk to; maybe add a load balancer so you can scale that service.

Related

Kafka Monitoring With New Relic

Problem: There's random consumer lag across multi-regional consumers; I can't figure out why, and I can't get decent information from New Relic.
Context ~
I have some Kafka consumers in 2 different regions and there's significant latency between these 2 regions. Let's say Region A and Region B.
Most of my services are in Region B, including my Kafka producers, brokers and some consumers. Some consumers are in Region A, but when they consume, they still have to make calls to Region B because my database resides in Region B.
Last week, my Kafka consumers saw huge lag for 3 consecutive days and then it came back to normal. I checked the logs: no 5xx or 4xx errors. As a matter of fact, everything was 200.
I'm trying to build some graphs in New Relic to see how often my consumers are consuming messages in the different regions. The problem is that New Relic is set up to get metrics from the broker, and it has information about offsets and so on for consumers. When I construct any query, it shows everything under one region. The only difference I can see is the IP address of my consumers, which should be good enough to build a graph and see how many messages each consumer consumed in what time frame.
What I did ~
I wrote this query
SELECT rate(average(consumer.offset), 1 day) FROM KafkaOffsetSample FACET topic, clientHost TIMESERIES AUTO
But the graph that I get from this seems wrong, because the offset keeps on increasing (which makes sense). If the consumers recovered after 3 days, then this value should go down as well. Well, at least that is my understanding.
The templates New Relic has are pretty much useless: bytes in and out, but nothing on offsets or the relationship between consumers and producers.
There isn't enough info here to determine whether this is network related. Lag caused by suboptimal network performance is not uncommon; not sure if that's the case here.
There's a possibility that you may need client-side metrics; broker-side (which sounds like what you have) isn't likely to get you what you want in this case.
Also, the specific integration they are using isn't what NR uses internally for monitoring, so unfortunately there's not much internal expertise on what information may be available.

The Java agent does have some built-in Kafka client instrumentation that may help, but there isn't much in the way of dashboards available AFAIK, and it likely won't help without some targeted queries.
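If the broker-side integration does expose a lag attribute on KafkaOffsetSample, a lag-oriented query is usually more telling than an offset rate, because lag falls back toward zero once consumers catch up. The attribute names below are assumptions to verify against your account (for example with SELECT keyset() FROM KafkaOffsetSample):

SELECT max(consumer.lag) FROM KafkaOffsetSample FACET consumerGroup, topic, clientHost TIMESERIES AUTO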

Consuming messages in a Kafka topic ASAP

Imagine a scenario in which a producer is producing 100 messages per second, and we're working on a system where consuming messages ASAP matters a lot; even a 5-second delay might result in a decision not to handle that message anymore. Also, the order of messages does not matter.
So I don't want to use a basic queue and a single pod listening on a single partition to consume messages, since in order to consume a message, the consumer needs to make multiple remote API calls and this might take time.
In such a scenario, I'm thinking of a single Kafka topic with 100 partitions, and for each partition I'm going to have a separate machine (pod) listening, covering partitions 0 to 99.
Am I thinking about this the right way? This is my first project with Kafka, and the approach seems a little weird to me.
For your use case, think of partitions = the maximum number of instances of the service consuming data. Don't create extra partitions if you'll only have 8 instances; that will have a negative impact if consumers need to be rebalanced and probably won't give you any performance improvement. Also, 100 messages/s is very, very little; you can make this work with almost any technology.
To get the maximum performance I would suggest:
Use a round robin partitioner
Find a parallel consumer implementation for your platform (for the JVM, e.g. Confluent's Parallel Consumer)
And there are a few producer and consumer properties that you'll need to change, but they depend on your environment: for example batch.size, linger.ms, etc. I would also reconsider the need for acks=all, as it might be OK for you to lose data if a broker dies, given that old data is of no use to you.
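A possible starting point for the producer side, with illustrative values only (tune linger.ms and batch.size for your latency budget, and only relax acks if losing records on broker failure is truly acceptable):

partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
# allow small batches to form without adding noticeable latency
linger.ms=5
batch.size=32768
# acks=1 trades durability for latency; keep acks=all if you cannot lose records
acks=1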
One warning: in Java, the standard Kafka consumer is single-threaded. This surprises many people, and I'm not sure if the same is true for other platforms. So having hundreds of partitions won't give any performance benefit with these consumers on their own, and that's why it's important to use a parallel consumer.
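If you don't want to pull in a parallel-consumer library, below is a minimal hand-rolled sketch of the same idea in plain Java: a single KafkaConsumer thread hands records to a worker pool and commits only after the whole batch has been processed. The broker address, topic, group id and pool size are placeholders, and this simple variant only polls again once the slowest record of a batch finishes; a real parallel consumer pipelines this much better.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandOffConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");           // placeholder broker address
        props.put("group.id", "fast-consumers");                    // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");                    // commit only after processing succeeds

        ExecutorService pool = Executors.newFixedThreadPool(16);     // parallelism independent of partition count

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));                    // placeholder topic
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(200));
                List<Future<?>> inFlight = new ArrayList<>();
                for (ConsumerRecord<String, String> record : batch) {
                    inFlight.add(pool.submit(() -> handle(record)));  // slow remote API calls run here
                }
                for (Future<?> f : inFlight) {
                    f.get();                                          // wait so the commit below is safe
                }
                consumer.commitSync();                                // mark the whole batch as done
            }
        }
    }

    static void handle(ConsumerRecord<String, String> record) {
        // the "multiple remote API calls" from the question would go here
    }
}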
One more warning: Kafka is a complex broker. It's trivial to start using it, but it's a very bumpy journey to use it correctly.
And a note: one of the benefits of Kafka is that it keeps messages rather than deleting them once they are consumed. If messages older than 5 seconds are useless for you, Kafka might be the wrong technology, and using a more traditional broker might be easier (ActiveMQ, RabbitMQ, or blazing-fast ones like ZeroMQ).
Your bottleneck is your application processing the events, not Kafka.
When you have ten consumers, there is overhead in connecting each consumer to Kafka, which will lower performance somewhat.
I advise focusing on your application's performance rather than on the message broker.
Kafka's p99 latency is around 5 ms under a 200 MB/s load.
https://developer.confluent.io/learn/kafka-performance/

Streaming audio streams through MQ (scalability)

My question is rather specific, so I will be OK with a general answer that points me in the right direction.
Description of the problem:
I want to deliver specific task data from multiple producers to a particular consumer working on the task (both are Docker containers running in k8s). The relation is many-to-many: any producer can create a data packet for any consumer. Each consumer is processing ~10 streams of data at any given moment, while each data stream consists of 100 messages of 160b per second (from different producers).
Current solution:
In our current solution, each producer has a cache of task → (IP:PORT) pairs for consumers and uses UDP packets to send the data directly. It scales nicely but is rather messy to deploy.
Question:
Could this be realized in the form of a message queue of sorts (Kafka, Redis, RabbitMQ...)? E.g., having a channel for each task where producers send data and the consumer, well, consumes it? How many streams would be feasible for the MQ to handle (I know it would differ; suggest your best estimate)?
Edit: Would 1000 streams, which equal 100 000 messages per second, be feasible? (Throughput for 1000 streams is 16 Mb/s.)
Edit 2: Fixed packet size to 160b (typo)
Unless you need disk persistence, do not even look in the message broker direction; you are just adding one problem on top of another. Direct network code is the proper way to solve audio broadcast. If your code is messy and you want a simplified programming model, a good alternative to raw sockets is the ZeroMQ library. It will give you all of the message-broker functionality you actually care about: a) discrete messaging instead of streams, b) client discoverability; without going overboard with another software layer.
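As a rough illustration of that point, here is what a per-task channel could look like with ZeroMQ's Java binding (JeroMQ), using PUB/SUB with the task id as the subscription prefix. Endpoints, task ids and the payload size are placeholders, and in a real deployment the two halves live in different pods.

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class TaskPubSub {
    public static void main(String[] args) throws InterruptedException {
        try (ZContext ctx = new ZContext()) {
            // Consumer side: subscribe only to the tasks this worker owns.
            ZMQ.Socket sub = ctx.createSocket(SocketType.SUB);
            sub.connect("tcp://localhost:5556");                 // placeholder endpoint
            sub.subscribe("task-42".getBytes(ZMQ.CHARSET));

            // Producer side: publish small audio packets tagged with a task id.
            ZMQ.Socket pub = ctx.createSocket(SocketType.PUB);
            pub.bind("tcp://*:5556");
            Thread.sleep(200);                                    // PUB/SUB joins asynchronously ("slow joiner")
            pub.send("task-42".getBytes(ZMQ.CHARSET), ZMQ.SNDMORE);
            pub.send(new byte[160], 0);                           // placeholder payload frame

            byte[] taskId = sub.recv();                           // first frame: the task id
            byte[] payload = sub.recv();                          // second frame: the data
            System.out.println(new String(taskId, ZMQ.CHARSET) + ": " + payload.length + " bytes");
        }
    }
}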
When it comes to "feasible": 100 000 messages per second with 160kb messages is a lot of data; it comes to 1.6 Gb/sec even without any messaging protocol on top of it. In general, Kafka shines at high message throughput with small messages, as it batches messages on many layers. Because of this, Kafka's sustained performance is usually constrained by disk speed, as Kafka is intentionally written this way (the slowest component is the disk). However, your messages are very large and you need to both write and read messages at the same time, so I don't see it happening without a large cluster installation, as your problem is actual data throughput and not the number of messages.
Because you are data limited, even other classic MQ software like ActiveMQ, IBM MQ etc. is actually able to cope very well with your situation. In general, classic brokers are much more "chatty" than Kafka and are not able to hit Kafka's message throughput when handling small messages. But as long as you are using large non-persistent messages (and a proper broker configuration) you can expect decent performance in MB/sec from those too. Classic brokers will, with proper configuration, directly connect a producer's socket to a consumer's socket without hitting a disk, whereas Kafka will always persist to disk first. So they even have some latency advantages over Kafka.
However, this direct socket-to-socket "optimisation" is just a full-circle turn back to the start of this answer. Unless you need audio stream persistence, all you are doing with a broker in the middle is finding an indirect way of binding producing sockets to consuming ones and then sending discrete messages over this connection. If that is all you need, ZeroMQ is made for this.
There is also a messaging protocol called MQTT which may be of interest to you if you choose to pursue a broker solution, as it is meant to be an extremely scalable solution with low overhead.
A basic approach
From a Kafka perspective, each stream in your problem can map to one topic in Kafka, and therefore there is one producer-consumer pair per topic.
Con: If you have lots of streams, you will end up with a lot of topics, and IMO the solution can get messier here too as you increase the number of topics.
An alternative approach
Alternatively, a better way is to map multiple streams to one topic, where each stream is identified by a key (such as the IP:Port combination you already use), and then have multiple consumers, each consuming a specific set of partition(s) as determined by the key. Partitions are the unit of scalability in Kafka.
Con: Though you can increase the number of partitions, you cannot decrease them.
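A minimal sketch of that keying approach, assuming a shared topic named "streams" and the IP:Port string used as the record key (both are placeholders): every packet of a given stream hashes to the same partition, so exactly one consumer in the group sees that whole stream.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class StreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            String streamId = "10.0.0.7:4000";               // e.g. the IP:Port pair you already track
            byte[] packet = new byte[20];                    // one small data packet for this stream
            // Same key -> same partition -> same consumer in the group.
            producer.send(new ProducerRecord<>("streams", streamId, packet));
        }
    }
}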
Type of data matters
If your streams are heterogeneous, in the sense that it would not be apt for all of them to share a common topic, you can create more topics.
Usually, topics are determined by the data they host and/or what their consumers do with the data in the topic. If all of your consumers do the same thing i.e. have the same processing logic, it is reasonable to go for one topic with multiple partitions.
Some points to consider:
Unlike in your current solution (I suppose), a message doesn't get lost once it is received and processed; rather, it stays in the topic until the configured retention period expires.
Take proper care in determining the keying strategy, i.e. which messages land in which partitions. As said earlier, if all of your consumers do the same thing, they can all be in one consumer group to share the workload.
Consumers belonging to the same group perform a common task and will be assigned a set of partitions by the partition assignor. Each consumer will then get a set of keys; in other words, a set of streams, or, as per your current solution, a set of one or more IP:Port pairs.

Multi-tenancy support in apache-kafka

I have a scenario where multiple clients want to produce and consume messages. Is there a way to achieve multi-tenancy in Apache Kafka so that one client remains unaffected by a huge inflow from another client?
Basically, I want a way to tag a client to brokers so that all topics/partitions of that client fall under the tagged brokers.
Here, one client produces and consumes messages internally. They are doing this to achieve distributed processing, e.g. a parent message is broken down into children, and these children are produced/processed on any of their machines.
One reason that clients would suffer because of other clients is I/O saturation on the brokers caused by those other clients.
The way to protect against that would be to enforce quotas.
Not sure I understand what you mean by "tag", since you cannot control partition placement within a cluster; sending data from particular clients to certain brokers is not possible unless you were to first create a topic and then manually reassign the partition and replica placements.
By client, do you mean a customer tenant? Regardless, if the concern is to have a clear load separation between clients (to avoid denial-of-service or starvation), one possibility is to have different topics for different clients. This will separate load to some extent (partition-based) but not fully, as various resources are still shared. It also has its own demerits:
The partitions are not shared. If one client does not have enough load and its partitions are not busy with reads/writes, those partitions cannot be used for another client in another topic.
Another pitfall is that the number of topics will grow with the number of clients rather than with the message load. Again, not good.
Another approach, probably a better one for avoiding denial of service or starvation of a client, is to limit (throttle) the rate of messages produced by a client.

Kafka: multiple consumers in the same group

Let's say I have a Kafka cluster with several topics spread over several partitions. Also, I have a cluster of applications acting as clients for Kafka. Each application in that cluster has a client that is subscribed to the same set of topics, identical across the whole cluster. Also, all of these clients share the same Kafka group ID.
Now, speaking of commit mode: I really do not want to specify offsets manually, but I do not want to use autocommit either, because I need to do some handling after I receive my data from Kafka.
With this setup, I expect the "same data received by different consumers" problem to occur, because I do not specify an offset before reading (consuming), and I read data concurrently from different clients.
Now, my question: what are the solutions to get rid of these multiple reads? Several options come to mind:
1) Exclusive (sequential) Kafka access: until one consumer commits its read, no other consumer accesses Kafka.
2) Somehow specify the offset before each read. I do not even know how to do that given that a read might fail (and the offset would not be committed); we would need some complicated distributed offset storage.
I'd like to ask people experienced with Kafka to recommend something to achieve the behavior I need.
Every partition is consumed by only one client in the group; another client with the same group ID won't get access to that partition, so concurrent reads won't occur...
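A hedged sketch of what that looks like with the Java client (broker address, topic and group id are placeholders): every instance uses the same group.id, so the group coordinator hands each one a disjoint set of partitions, and disabling auto-commit lets you finish your handling before committing.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");       // placeholder
        props.put("group.id", "my-shared-group");                // identical on every instance
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");                 // commit only after your own handling

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));              // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Do your handling here; only this instance sees this partition's records.
                }
                consumer.commitSync();                            // acknowledge the processed batch
            }
        }
    }
}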