Kafka Replication factor JMX

I'm trying to collect some Kafka telemetry when I replicate my messages.
I presume my bottleneck is the network when I replicate each record across 3 instances (RF=3).
I need data to support my theory, so is there any JMX data that can tell me in Grafana how much time it takes for a record to be replicated across the three machines?
Regards.

Take a look at the kafka.network:type=RequestMetrics metrics. There are a few metrics that track the time spent processing produce requests on the leader and the time spent waiting for followers to replicate records. They are highlighted in the Monitoring section of the Kafka docs:
kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce: total time spent processing produce requests
kafka.network:type=RequestMetrics,name=LocalTimeMs,request=Produce: time spent by the leader processing produce requests
kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=Produce: time spent waiting for followers
There are a few other metrics, including RequestQueueTimeMs, ResponseQueueTimeMs and ResponseSendTimeMs, each measuring a different step brokers take when handling requests.
All of these metrics expose several attributes, such as various percentiles, min, max, etc., that you should monitor to identify potential bottlenecks in your clusters.
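If it helps, here is a minimal sketch of reading one of these MBeans directly over JMX (it assumes the broker exposes JMX on port 9999 and that the usual histogram attributes such as Mean and 99thPercentile are available; a Prometheus JMX exporter feeding Grafana exposes the same values):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: read the produce-request RemoteTimeMs histogram from a broker's JMX endpoint.
// broker-host:9999 and the attribute names are assumptions to adapt to your setup.
public class ProduceRemoteTime {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName remoteTime = new ObjectName(
                    "kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=Produce");
            Object mean = mbsc.getAttribute(remoteTime, "Mean");
            Object p99 = mbsc.getAttribute(remoteTime, "99thPercentile");
            System.out.println("Produce RemoteTimeMs mean=" + mean + " p99=" + p99);
        }
    }
}
```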

Related

Is Kafka suitable for computation request (Use case: Trading system)?

To learn Kafka, I am creating a microservices application: a trading system that can be used by GUIs or trading bots. Some microservices are responsible for getting market data from different brokers and "producing" that market data into Kafka topics. A trading bot or a GUI can then consume the market data from those topics. For this use case, Kafka is the PERFECT tool!
But what if the trading bots want to consume metrics computed on the real-time data? What is the best solution for producing the results of computed metrics (windowed or not)?
For instance, Trading-Bot-1 wants the real-time moving average of the BTCUSD price over the last 200 minutes, and Trading-Bot-2 wants the same information plus the variance of the ETHBTC price from the time I opened a position until now.
I have the feeling that Kafka is not a good choice. During my research, I have seen that it is considered bad practice to use Kafka as a wire between two microservices. Throughout my reading, I got the feeling that Kafka is not suitable for "request-style" messages, which are not purely events since we expect a result.
The dirtiest solution would be to create Topic-1 to send metrics requests (e.g., "Request XXX, need moving average of the last 150 prices for BTCEUR") and Topic-2 to send the results ("For request XXX, the result is YYY")... Very inefficient, I suppose.
Another solution would be to create a consumer group on the market data topic, where the consumers in that group are the services responsible for producing the metrics into a Kafka topic named "Metrics". That is not satisfying either: it computes metrics that may never be used by the current consumers of the system, and it also limits me to a set of metrics predefined in the metrics services, so trading bots and GUIs won't be able to request metrics with custom inputs (e.g., computing in real time the present value of a future with a custom risk-free rate).
Please note that I would like these metrics to be provided by my services; I don't want the consumers of the system (GUIs or trading bots) to be responsible for computing the metrics themselves.
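To make that second option concrete, here is roughly what I mean by a metrics service, as a plain consumer/producer sketch (the topic names, the group id and the idea that prices arrive as doubles keyed by symbol are assumptions):

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.*;
import java.time.Duration;
import java.util.*;

// Sketch of a metrics service: consume market data, keep a moving window per symbol,
// publish the moving average to a "Metrics" topic. Names and formats are placeholders.
public class MovingAverageService {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cp.put(ConsumerConfig.GROUP_ID_CONFIG, "metrics-service");
        cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, DoubleDeserializer.class.getName());

        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, DoubleSerializer.class.getName());

        Map<String, Deque<Double>> windows = new HashMap<>();
        int windowSize = 200; // e.g. last 200 prices per symbol

        try (KafkaConsumer<String, Double> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, Double> producer = new KafkaProducer<>(pp)) {
            consumer.subscribe(Collections.singletonList("market-data"));
            while (true) {
                for (ConsumerRecord<String, Double> rec : consumer.poll(Duration.ofMillis(100))) {
                    Deque<Double> window = windows.computeIfAbsent(rec.key(), k -> new ArrayDeque<>());
                    window.addLast(rec.value());
                    if (window.size() > windowSize) window.removeFirst();
                    double avg = window.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
                    producer.send(new ProducerRecord<>("Metrics", rec.key() + "-sma", avg));
                }
            }
        }
    }
}
```

The limitation is exactly the one I described: the window size and the metric are fixed inside the service, so a bot cannot ask for custom inputs.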
What do you guys think? Is Kafka suitable for computing metrics "in real time", or should I consider another solution?
Many thanks!

Do 3k Kafka topics decrease performance?

I have a Kafka cluster (using Aiven on AWS):
Kafka Hardware
Startup-2 (2 CPU, 2 GB RAM, 90 GB storage, no backups) 3-node high availability set
Ping between my consumers and the Kafka Broker is 0.7ms.
Background
I have a topic such that:
It contains data about 3000 entities.
Entity lifetime is a week.
Each week there will be different 3000 entities (on avg).
Each entity may have between 15k to 50k messages in total.
There can be at most 500 messages per second.
Architecture
My team built an architecture in which there is a group of consumers. They parse this data, perform some transformations (without any filtering!!) and then send the final messages back to Kafka, to topic=<entity-id>.
In other words, the data is uploaded back to Kafka, into a topic that contains only the data of a specific entity.
Questions
At any given time, there can be up to 3-4k topics in Kafka (1 topic for each unique entity).
Can my Kafka cluster handle this well? If not, what do I need to change?
Do I need to delete topics, or is it fine to have (a lot of!!) unused topics accumulate over time?
Each consumer that consumes the final messages will consume 100 topics at the same time. I know Kafka clients can consume multiple topics concurrently, but I'm not sure what the best practices are for that.
Please share your concerns.
Requirements
Please focus on the potential problems of this architecture and try not to suggest alternative architectures (fewer topics, more consumers, etc.).
The number of topics is not so important in itself, but each Kafka topic is partitioned and the total number of partitions could impact performance.
The general recommendation from the Apache Kafka community is to have no more than 4,000 partitions per broker (this includes replicas). The linked KIP explains some of the issues you may face if this limit is breached, and with 3,000 topics it would be easy to do so unless you choose a low partition count and/or replication factor for each topic. For example, 3,000 topics with 2 partitions each and a replication factor of 3 make 18,000 partition replicas, or 6,000 per broker on a 3-node cluster, already above the recommendation.
Choosing a low partition count for a topic is sometimes not a good idea, because it limits the parallelism of reads and writes, leading to performance bottlenecks for your clients.
Choosing a low replication factor for a topic is also sometimes not a good idea, because it increases the chance of data loss upon failure.
Generally it's fine to have unused topics on the cluster, but be aware that there is still a performance cost for the cluster to manage the metadata for all these partitions, and some operations will still take longer than if the topics were not there at all.
There is also a per-cluster limit but that is much higher (200,000 partitions). So your architecture might be better served simply by increasing the node count of your cluster.
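As a quick way to check where a cluster stands against that guideline, here is a rough sketch using the Java AdminClient to count partition replicas per broker (the bootstrap address is a placeholder; allTopicNames() needs a recent client, older ones use all()):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;
import java.util.*;

// Sketch: list all topics, describe them, and count how many partition replicas
// each broker hosts, to compare against the ~4,000-per-broker recommendation.
public class PartitionCountPerBroker {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            Map<Integer, Integer> replicasPerBroker = new HashMap<>();
            Map<String, TopicDescription> descriptions =
                    admin.describeTopics(topics).allTopicNames().get();
            for (TopicDescription description : descriptions.values()) {
                for (TopicPartitionInfo partition : description.partitions()) {
                    partition.replicas().forEach(node ->
                            replicasPerBroker.merge(node.id(), 1, Integer::sum));
                }
            }
            replicasPerBroker.forEach((broker, count) ->
                    System.out.println("broker " + broker + ": " + count + " partition replicas"));
        }
    }
}
```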

Kafka Random Access to Logs

I am trying to implement a way to randomly access messages in Kafka, using KafkaConsumer.assign(partition) and KafkaConsumer.seek(partition, offset),
and then polling for a single message.
Yet I can't get past 500 messages per second in this case. In comparison, if I "subscribe" to the partition I get 100,000+ msg/sec (with ~1000-byte messages).
I've tried:
Broker, Zookeeper and consumer on the same host and on different hosts (no replication is used)
1 and 15 partitions
The default thread configuration in server.properties, and the io and network thread counts increased to 20
A single consumer assigned to a different partition each time, and one consumer per partition
A single consuming thread, and multiple consuming threads (calling multiple different consumers)
Adding two brokers and a new topic with its partitions spread across both brokers
Starting multiple Kafka consumer processes
Changing message sizes: 5k, 50k, 100k
In all cases the minimum I get is ~200 msg/sec, and the maximum is 500 if I use 2-3 threads. But going above that makes the .poll() call take longer and longer (starting from 3-4 ms on a single thread up to 40-50 ms with 10 threads).
My naive understanding of Kafka is that the consumer opens a connection to the broker and sends a request to retrieve a small portion of its log. While all of this involves some latency, and retrieving a batch of messages would obviously be better, I would imagine that it would scale with the number of receivers involved, at the expense of increased server usage on both the VM running the consumers and the VM running the broker. But both of them are idling.
So apparently there is some synchronization happening on the broker side, but I can't figure out whether it is due to my usage of Kafka or some inherent limitation of using .seek.
I would appreciate some hints on whether I should try something else, or whether this is all I can get.
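For reference, the access pattern is roughly the following (topic name, partition and offsets are placeholders):

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

// Sketch of the random-access pattern: assign a partition, seek to an offset,
// poll for a single record, repeat for the next offset.
public class RandomRead {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1"); // one record per poll

        TopicPartition tp = new TopicPartition("my-topic", 0);
        long[] offsets = {42L, 1337L, 90210L}; // arbitrary offsets to read

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singleton(tp));
            for (long offset : offsets) {
                consumer.seek(tp, offset);
                for (ConsumerRecord<String, byte[]> rec : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.println("offset " + rec.offset() + ", " + rec.value().length + " bytes");
                }
            }
        }
    }
}
```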
Kafka is a streaming platform by design, which means many, many things in it have been built to accelerate sequential access; storing messages in batches is just one of them. When you use poll() you are using Kafka that way, and Kafka does its best. Random access is not what Kafka was designed for.
If you want fast random access to distributed big data, you want something else, for example a distributed database like Cassandra or an in-memory system like Hazelcast.
You could also transform the Kafka stream into another form that allows you to access it in a sequential way.

Can one consumer thread against many partitions per topic in Kafka cause latency?

Our kafka setup is as follows:
30 partitions per topic
1 consumer thread
We configured it this way to be able to scale up in the future.
We wanted to minimize the number of times we rebalance when we need to scale up by adding partitions, because latency is very important to us, and during rebalances messages can be stuck until the coordination phase is done.
Can having 1 consumer thread with many partitions per topic somehow affect the overall message consumption latency?
More partitions in a Kafka cluster lead to higher throughput; however, you need to be aware that the number of partitions has an impact on availability and latency as well.
In general, more partitions:
Lead to Higher Throughput
Require More Open File Handles
May Increase Unavailability
May Increase End-to-end Latency
May Require More Memory In the Client
You need to study the trade-offs and make sure that you've picked the number of partitions that satisfies your requirements regarding throughput, latency and required resources.
For further details refer to this blog post from Confluent.
My opinion: run some tests and write down your findings. For example, try running a single consumer over a topic with 5, 10, 15, ... partitions, measure the impact and pick the configuration that meets your requirements. Finally, ask yourself whether you will ever need x partitions. At the end of the day, if you need more partitions you should not worry too much about rebalancing etc.; Kafka was designed to be scalable.
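A rough sketch of such a test: a single consumer subscribed to one topic (recreate it with 5, 10, 15, ... partitions between runs), reporting how far behind the record timestamps it runs. The topic name, group id and the assumption that the topic uses CreateTime timestamps are mine:

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

// Sketch: measure the delta between each record's producer timestamp (CreateTime)
// and the time it is consumed, as a proxy for end-to-end latency.
public class LatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "latency-probe");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        long count = 0, totalLagMs = 0, maxLagMs = 0;
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            while (count < 100_000) { // sample a fixed number of records, then report
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(200))) {
                    long lagMs = System.currentTimeMillis() - rec.timestamp();
                    totalLagMs += lagMs;
                    maxLagMs = Math.max(maxLagMs, lagMs);
                    count++;
                }
            }
        }
        System.out.printf("records=%d avg lag=%.1f ms max lag=%d ms%n",
                count, (double) totalLagMs / count, maxLagMs);
    }
}
```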

Large number of channels in Kafka

I was wondering whether Kafka has any limitation, or starts slowing down (due to GC or other reasons), if we have a large number of channels. We have a heavy volume of data that we will be sending through Kafka (over 2B data points). We were thinking of having about 1600 channels to start with.
Has anyone come across issues with such a large number of channels in Kafka? Similarly, do you see issues with local DC replication with this many channels, and lastly, are there any foreseeable issues if we use MirrorMaker for cross-DC replication with such a large number of channels?
Any pointers are highly appreciated
Thanks
I believe there is no hard limit on the number of topics in Kafka itself. However, since Kafka stores topic info in Zookeeper (/brokers/topics/), and Zookeeper has a 1MB limit on the maximum node size, there can only be a finite number of topics. Also, Kafka brokers store data for the different topics in /var/kafka/data/. Performance may suffer if there are too many subdirectories in /var/kafka/data/.
I haven't tried thousands of topics, but Kafka with a few hundred topics works fine for my purposes. The only area where I had problems was dynamic topic creation while using the high-level consumer: it required a client reconnection to pick up the new topics on all consumer boxes, which caused time-consuming consumer rebalancing (and that sometimes failed, preventing reading from some topics). As a result I had to switch to the simple consumer and take care of read coordination in my code.
I'd recommend creating a simple test app that generates some random data for the number of topics you expect going forward, and verifying that performance is acceptable.
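A rough sketch of such a test app (topic count, partition count, replication factor and payloads are made-up values to adapt): create the topics up front, then time a produce loop of random data spread across them.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch: create 1600 single-partition topics (RF=3), then produce 1M random
// messages spread across them and report how long the loop took.
public class TopicLoadTest {
    public static void main(String[] args) throws Exception {
        int topicCount = 1600;
        Properties common = new Properties();
        common.put("bootstrap.servers", "localhost:9092");

        List<String> topics = IntStream.range(0, topicCount)
                .mapToObj(i -> "load-test-" + i)
                .collect(Collectors.toList());

        try (AdminClient admin = AdminClient.create(common)) {
            List<NewTopic> newTopics = topics.stream()
                    .map(name -> new NewTopic(name, 1, (short) 3)) // 1 partition, RF=3
                    .collect(Collectors.toList());
            admin.createTopics(newTopics).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.putAll(common);
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        Random random = new Random();
        long start = System.currentTimeMillis();
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            for (int i = 0; i < 1_000_000; i++) {
                String topic = topics.get(random.nextInt(topicCount));
                producer.send(new ProducerRecord<>(topic, UUID.randomUUID().toString()));
            }
            producer.flush();
        }
        System.out.println("produced 1,000,000 records in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```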