Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 months ago.
I am trying to load test my Kafka cluster with multiple producers and multiple consumers. I came across a lot of tools and articles, but all of them generate load (producer) from a single machine and similarly read (consumer) from a single machine.
I am looking for a tool that can be deployed across multiple machines, or can spawn multiple producers and consumers, to load test a given Kafka cluster.
As input, we can give the number of producers and consumers.
It can then spawn that number of machines with producers and consumers (on AWS, Azure or GCP). Or we can spawn the machines manually and the tool can then start a producer and a consumer on them.
After that, it load tests the target Kafka cluster.
At the end, it reports test results such as writes/sec, reads/sec, etc.
Tools/Articles I checked are:
https://www.blazemeter.com/blog/apache-kafka-how-to-load-test-with-jmeter
https://medium.com/selectstarfromweb/lets-load-test-kafka-f90b71758afb
Load testing with Kafka and Storm
Load test kafka consumer
Load testing a kafka consumer
The very first article neither mentions nor assumes any limitations regarding the number of consumers/producers.
Just put the Samplers for different Kafka instances (or different topics or whatever is your test scenario) under different JMeter Thread Groups and you will be able to concurrently stress multiple endpoints.
If you prefer doing it from different machines, you can run JMeter in distributed mode and point different JMeter slave machines at different endpoints using a combination of the If Controller and the __machineName() or __machineIP() functions.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
I would like to transfer data from one DB system to other DB systems. Which messaging system (Kafka, ActiveMQ, RabbitMQ, and the like) would be better for achieving this with high throughput and performance?
I guess the answer to this type of question is "it depends".
You can probably find a lot of information on the internet comparing these message brokers.
From our experience, Kafka and its ecosystem tools such as Kafka Connect provide the behavior you are asking for: source connectors and sink connectors, with Kafka in the middle.
Kafka Connect is a framework that lets you add plugins called connectors:
Sink connectors read from Kafka and send that data to a target system.
Source connectors read from a source store and write to Kafka.
Using Kafka Connect is "no code": you call its REST API to set the configuration of the connectors.
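As a rough illustration of the "no code" point, here is a minimal Python sketch that builds the JSON bodies for one source and one sink connector and prepares the `POST /connectors` requests. It assumes the Confluent JDBC connector plugin is installed on the Connect worker (the answer does not name a specific connector), that the worker listens on `localhost:8083`, and that the connector names, JDBC URLs, column and topic names are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: a Kafka Connect worker's REST API is reachable at this address.
CONNECT_URL = "http://localhost:8083"


def jdbc_source_config(name, jdbc_url, column, topic_prefix):
    """Build the body for a source connector (database -> Kafka)."""
    return {
        "name": name,
        "config": {
            # Assumption: the Confluent JDBC connector plugin is installed.
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": jdbc_url,
            "mode": "incrementing",
            "incrementing.column.name": column,
            "topic.prefix": topic_prefix,
        },
    }


def jdbc_sink_config(name, jdbc_url, topics):
    """Build the body for a sink connector (Kafka -> database)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "connection.url": jdbc_url,
            "topics": ",".join(topics),
            "auto.create": "true",
        },
    }


def build_request(payload, base_url=CONNECT_URL):
    """Prepare (but do not send) the POST /connectors request."""
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical names, URLs and topics, for illustration only.
    source = jdbc_source_config(
        "orders-source", "jdbc:postgresql://source-db:5432/shop", "id", "shop-"
    )
    sink = jdbc_sink_config(
        "orders-sink", "jdbc:postgresql://target-db:5432/warehouse", ["shop-orders"]
    )
    for payload in (source, sink):
        request = build_request(payload)
        print(request.get_method(), request.full_url)
        print(json.dumps(payload, indent=2))
        # Uncomment to register the connector on a running worker:
        # urllib.request.urlopen(request)
```

The script only prints the requests by default, so it can be run without a Connect worker; the actual call is left commented out.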
Kafka is a distributed system that supports very high throughput with low latency. It supports near-real-time streaming of data.
Kafka is widely adopted by some of the biggest companies around the world.
There are many tools and vendors that support your use case, and they vary in price and support. The choice depends on which sources you need to take data from, which targets you wish to write to, and whether it should be CDC/near-real-time or a "batch" copy.
Closed 2 years ago.
I am currently choosing between Kafka Streams and Logstash for real-time log collection, transformation and enrichment, with the results finally sent to Elasticsearch. The logs come from different IT network devices such as firewalls, switches, access points, etc.
Since Kafka Streams and Logstash have fairly similar functionality, are there benefits to choosing one over the other? (Performance? Ease of deployment?)
Thanks
Kafka Streams and Logstash are two completely different things.
Kafka Streams is a client library that you can use to write an application that streams and processes data stored in Kafka brokers. You need to write your own application in Java.
Logstash is an ETL tool that you can use to extract/receive data from multiple sources, process this data using a wide range of filters and send it to different outputs, like elasticsearch, file, s3, kafka and many others.
It is very common to use Logstash and Kafka together, with Kafka working as a message queue for the messages that Logstash will consume and process. You have shippers like Filebeat sending data to Kafka brokers, and then you use Logstash to consume this data.
You can build your own applications in Java using the Kafka Streams library to collect, process and ship the data to Elasticsearch, but this will be very complex compared with using the tools of the stack: Filebeat to collect logs, Logstash to receive and process them, Elasticsearch to store them.
Closed 3 years ago.
I want to debug some Kafka topics so I know if the consumer or producer is at fault here.
Is there a UI for Kafka where I can see what messages a topic contains?
A dumper would also be nice so I can search for stuff on my own.
We use Landoop's Kafka Topics UI, which is pretty good. You can see topic contents and information (e.g. number of partitions, configuration, etc) and also export topic contents.
I'll second Yoni Gibb's suggestion of the Landoop product. I also use it in development and find it very useful; although you may need to tweak a few settings around timeout and size in order to see all messages. Easy to install, just pull the Docker image.
Kafkacat is useful too, but it's not quite as good for being able to monitor many topics at once and be left running.
Closed 4 years ago.
I'm very naive about data engineering but it seems to me that a popular pipeline for data used to be Kafka to Storm to something.... but as I understand it Kafka now seems to have data processing capabilities that may often render Storm unnecessary. So my question is simply, in what scenarios might this be true that Kafka can do it all, and in what scenarios might Storm still be useful?
EDIT:
Question was flagged for "opinion based".
This question tries to understand what capabilities Apache Storm offers that Apache Kafka Streaming does not (now that Kafka Streaming exists). The accepted answer touches on that. No opinions are requested by this question nor are they necessary to address the question. Question title edited to seem more objective.
You still need to deploy the Kafka code somewhere, e.g. YARN if using Storm.
Plus, Kafka Streams can only process data within a single Kafka cluster, reading from and writing to that same cluster; Storm has other spouts and bolts. Kafka Connect is one alternative for that, though.
Kafka has no external dependency on a cluster scheduler, and while you may deploy Kafka clients in almost any popular programming language, they still require external instrumentation, whether that's a Docker container or a bare-metal deployment.
If anything, I'd say Heron or Flink are the truly comparable replacements for Storm.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I have a couple of questions about Kafka.
1) Does Kafka have a default web UI?
2) How can we gracefully shut down a standalone Kafka server and the Kafka console-consumer/console-producer?
Any solutions will be highly appreciated.
Thank you.
1) No, Kafka does not have a default UI.
There are however a number of third party tools that can graphically display Kafka resources. Just Google for kafka ui and pick the tool that displays what you want and you like the most.
2) To gracefully shut down a Kafka broker, just send a SIGTERM to the Kafka process and it will shut down properly. This can be done via the ./bin/kafka-server-stop.sh tool.
If it's part of a cluster, new leaders will be elected on other brokers, otherwise it will simply cleanly close all its resources. Note that depending on the number of partitions, this can take a few minutes.
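If you want to script the SIGTERM approach yourself instead of using the stop script, it could look roughly like the Python sketch below. This is only an illustration for POSIX systems: the function name, the timeout and the polling interval are my own assumptions, and you have to look up the broker's PID yourself.

```python
import os
import signal
import time


def stop_gracefully(pid, timeout=60.0, poll_interval=0.5):
    """Send SIGTERM to `pid`, then wait until the process is gone.

    Returns True if the process disappeared within `timeout` seconds,
    False otherwise. It never escalates to SIGKILL, so the process
    gets the chance to close its resources cleanly.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Signal 0 delivers nothing; it only checks that the PID exists.
            # Note: an exited child of this very process still counts as
            # existing until its parent reaps it.
            os.kill(pid, 0)
        except ProcessLookupError:
            return True
        time.sleep(poll_interval)
    return False
```

For example, `stop_gracefully(12345, timeout=300)` with a hypothetical PID leaves room for the "few minutes" a broker with many partitions may need.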
You can try Landoop Kafka UI: https://github.com/Landoop/fast-data-dev
They provide a nice Web-UI for Kafka topics, Avro schemata, Kafka Connect and much more.