I have a Kafka cluster running on Mesos. I'm trying to increase number of partitions on a topic. That usually works with bin/kafka-topics.sh --alter command. Is this option exposed via dcos cli or kafka-mesos rest API? From what I see its not exposed.
If not, what is the best way to access kafka's cli within mesos installation?
Right now I use dcos cli to get broker IP and then in an adhoc way get to
/var/lib/mesos/slave/slaves/11aaafce-f12f-4aa8-9e5c-200b2a657225-S3/frameworks/11aaafce-f12f-4aa8-9e5c-200b2a657225-0001/executors/broker-1-7cf26bed-aa40-464b-b146-49b45b7800c7/runs/849ba6fb-b99e-4194-b90b-8c9b2bfabd7c/kafka_2.10-0.9.0.0/bin/kafka-console-consumer.sh
Is there a more direct way?
We've just released a new version of the Kafka framework with DC/OS 1.7. The new version supports changing the partition count via dcos kafka [--name=frameworkname] topic partitions <topicname> <count>.
See also: Service documentation ("Alter Topic Partition Count" reference)
Related
I am trying to achieve cronjob which sends a Kafka message once a day. Kafka broker is running on remote instance.
I am currently accomplishing this by installing entire Kafka package, and then having cronjob call a script which look like:
less msg.txt | <kafka_path>/bin/kafka-console-producer.sh --broker-list <AWS_INSTANCE_IP> --topic <topic> --producer.config <property.file>
Is it possible to isolate the Jar(s) kafka-console-producer.sh require so I can do this without dragging in rest of the stuff in kafka directory (i.e broker related stuff) into my system? Since they aren't required. Is there some pre-existing solution for this?
Thank you.
If you must use JVM solutions, you'd have to build your own tooling with a dependency on org.apache.kafka:kafka-clients
If you can use other solutions, kafkacat is its own standalone, portable script.
I am looking to migrate a small 4 node Kafka cluster with about 300GB of data on the each brokers to a new cluster. The problem is we are currently running Cloudera's flavor of Kafka (CDK) and we would like to run Apache Kafka. For the most part CDK is very similar to Apache Kafka but I am trying to figure out the best way to migrate. I originally looked at using MirrorMaker, but to my understanding it will re-process messages once we cut over the consumers to the new cluster so I think that is out. I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster (not sure how this will work yet, if at all) then decommission the CDK server one at a time. Otherwise I am out of ideas other than spinning up a new Apache Kafka cluster and just making code changes to every producer/consumer to point to the new cluster. which I am not really a fan of as it will cause down time.
Currently running 3.1.0 which is equivalent to Apache Kafka 1.0.1
MirrorMaker would copy the data, but not consumer offsets, so they'd be left at their configured auto.offset.reset policies.
I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster
If possible, that would be the most effective way to migrate the cluster. For each new broker, give it a unique broker ID and the same Zookeeper connection string as the others, then it'll be part of the same cluster.
Then, you'll need to manually run the partition reassignment tool to move all existing topic partitions off of the old brokers and onto the new ones as data will not automatically be replicated
Alternatively, you could try shutting down the CDK cluster, backing up the data directories onto new brokers, then starting the same version of Kafka from your CDK on those new machines (as the stored log format is important).
Also make sure that you backup a copy of the server.properties files for the new brokers
Can we add more brokers dynamically to Kafka Cluster and achieve Broker fail-over? Don't we get issues with metadata update?
Yes brokers can be added to increase the size of Kafka clusters. However, Kafka does not rebalance partitions onto the new brokers automatically, so this needs to be done manually or via an external tool.
There's a section in the docs that details the steps to expand a cluster: http://kafka.apache.org/documentation/#basic_ops_cluster_expansion
I am using Apache Kafka in (Plain Vanilla) Hadoop Cluster for the past few months and out of curiosity I am asking this question. Just to gain additional knowledge about it.
Kafka server.properties file already has the below parameter :
zookeeper.connect=localhost:2181
And I am starting Kafka Server/Broker with the following command :
bin/kafka-server-start.sh config/server.properties
So I assume that Kafka automatically infers the Zookeeper details by the time we start the Kafka server itself. If that's the case, then why do we need to explicitly mention the zookeeper properties while we create Kafka topics the syntax for which is given below for your reference :
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 1 --partitions 1 --topic test
As per the Kafka documentation we need to start zookeeper before starting Kafka server. So I don't think Kafka can be started by commenting out the zookeeper details in Kafka's server.properties file
But atleast can we use Kafka to create topics and to start Kafka Producer/Consumer without explicitly mentioning about zookeeper in their respective commands ?
The zookeeper.connect parameter in the Kafka properties file is needed for having each Kafka broker in the cluster connecting to the Zookeeper ensemble.
Zookeeper will keep information about connected brokers and handling the controller election. Other than that, it keeps information about topics, quotas and ACL for example.
When you use the kafka-topics.sh tool, the topic creation happens at Zookeeper level first and then thanks to it, information are propagated to Kafka brokers and topic partitions are created and assigned to them (thanks to the elected controller). This connection to Zookeeper will not be needed in the future thanks to the new Admin Client API which provides some admin operations executed against Kafka brokers directly. For example, there is a opened JIRA (https://issues.apache.org/jira/browse/KAFKA-5561) and I'm working on it for having the tool using such API for topic admin operations.
Regarding producer and consumer ... the producer doesn't need to connect to Zookeeper while only the "old" consumer (before 0.9.0 version) needs Zookeeper connection because it saves topic offsets there; from 0.9.0 version, the "new" consumer saves topic offsets in real topics (__consumer_offsets). For using it you have to use the bootstrap-server option on the command line insteand of the zookeeper one.
I want to know if Kafka and Kafka-connect can run on different servers? So a connector would be started on server A and send data from a kafka topic on server B to HDFS or S3 etc. Thanks
Yes, and for Production deployments this is typically recommended for resource reasons. Generally you'd deploy a cluster of Kafka Brokers (3+ for HA), and then a cluster of Kafka Cluster workers (as many as needed for throughput capacity / resilience) -- all on separate nodes.
For more details, see the Confluent Enterprise Reference Architecture.
Yes, you can do it.
I have my set of kafka servers and kafka connect applications are running in different machines and writing data in hdfs. you have to mention list of brokers in bootstrap.servers under worker properties file (config/connect-distributed.properties or config/connect-standalone.properties) instead of localhost:9092