I have Kafka Connect running in a Marathon container. To update a connector plugin (JAR), I have to upload the new one and then restart the Connect task.
Is it possible to do that without restarting/downtime?
The updated jar for the connector plugin needs to be added to the classpath and then the classloader for the worker needs to pick it up. The best way to do this currently is to take an outage as described here.
Depending on your connector, you might be able to do rolling upgrades, but the generic answer is that if you need to upgrade the connector plugin, you currently have to take an outage.
Related
I have already seen a similar question on SO, but it did not clearly answer my doubts.
We have several Kafka clusters and many operational habits built around them. We have our own way to start and stop a cluster, plus lots of operations scripts that help maintain it.
Now we would like to use Kafka Connect connectors for new needs, but from what I have seen, Kafka Connect is tightly coupled to confluent-hub.
It seems I can't even use the connectors without installing a full operational confluent-hub.
This makes it very difficult for us to use Kafka Connect connectors. I understand that confluent-hub might be a framework that helps run those connectors, but it seems we can't even use them with a separate Kafka cluster (one not managed by confluent-hub).
But maybe I am missing something.
Do you know if there is a proper way to use Kafka connectors on an already existing Kafka cluster (completely independent from confluent-hub)?
EDITED :
This is more a question about the tight coupling between confluent-hub and Kafka Connect. All the features that come with Kafka Connect (distributed workers to handle different failover scenarios, etc.) seem unusable without confluent-hub, hence a "need" to run the Kafka cluster exclusively via confluent-hub, which is not an easy task when you already have a big existing Kafka cluster with lots of ops habits around it.
Kafka Connect is part of Apache Kafka. It's a pluggable framework for streaming integration between systems in and out of Kafka.
To use Kafka Connect you need connectors for the specific technology with which you want to integrate. For example, S3 sink, Elasticsearch sink, JDBC source or sink, and so on.
The connector API is part of Apache Kafka, and available for anyone who wants to develop a connector.
Connectors are written by various people and organisations, and are available in various ways. How you obtain a connector depends on which connector you want, how it's licensed, and how the author has made it available for distribution. It could be that you go to GitHub, clone the repo, and build the JAR. It could be that you can download the JAR directly.
All that Confluent Hub does is make lots of these connectors available for you in one place, easily searchable, and with an optional CLI tool that will install them for you.
Do you have to use Confluent Hub? No, not at all. Might it make your life easier in locating connectors that you want to use, and make it easier to install them? Hopefully :)
Disclaimer: I work for Confluent.
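As a rough illustration of the point above, a distributed Connect worker can be pointed at an existing Apache Kafka cluster and at a directory of connector JARs you downloaded or built yourself. The Java sketch below only assembles such a worker configuration. The broker address, the plugin directory, the group id and the three internal topic names are placeholder assumptions, and `plugin.path` assumes Kafka 0.11.0 or later.

```java
import java.util.Properties;

// Sketch: assembles a distributed Kafka Connect worker configuration.
// Every value passed in or hard-coded here is a placeholder, not a recommendation.
final class ConnectWorkerConfigSketch {

    static Properties workerProperties(String bootstrapServers, String pluginDir) {
        Properties props = new Properties();
        // The existing Kafka cluster the worker connects to.
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Workers sharing the same group.id form one Connect cluster.
        props.setProperty("group.id", "connect-cluster");
        props.setProperty("key.converter", "org.apache.kafka.connect.json.JsonConverter");
        props.setProperty("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        // Internal topics where distributed workers keep connector configs, offsets and status.
        props.setProperty("config.storage.topic", "connect-configs");
        props.setProperty("offset.storage.topic", "connect-offsets");
        props.setProperty("status.storage.topic", "connect-status");
        // Directory holding the connector JARs, however they were obtained.
        props.setProperty("plugin.path", pluginDir);
        return props;
    }

    public static void main(String[] args) throws Exception {
        Properties props = workerProperties("broker1:9092", "/opt/connect-plugins");
        // Prints the settings in .properties format.
        props.store(System.out, "Kafka Connect distributed worker (sketch)");
    }
}
```

Saved to a file, settings like these are what the connect-distributed.sh script shipped with Apache Kafka takes as its argument.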
I have an existing Kafka cluster. I want to install the Kafka REST Proxy:
https://github.com/confluentinc/kafka-rest
If I install Confluent, does that come with Kafka? I am afraid that if I install it on my master Kafka node, Confluent will override all my settings and mess up my Kafka cluster.
How do you install Kafka REST when you have an existing Kafka cluster?
This is not made clear on their website. I have CentOS and was going to try:
sudo yum install confluent-platform-oss-2.11
Any help would be great....
Download the Confluent Platform tarball and extract it (or preferably use APT/YUM), then configure and run only the REST proxy via kafka-rest-start.
I wouldn't recommend using APT/YUM to install the entire confluent platform if you already have an existing Kafka. You might be able to only install kafka-rest using it, though.
Alternatively, back up your existing Kafka and Zookeeper property files, then place the Confluent Platform on top of the existing files, keeping the original files. If your Kafka is an old release, take this as a good opportunity to schedule an upgrade. Downloading Confluent isn't going to overwrite anything from the upstream Apache project's version for the corresponding release. If anything, it's an extension.
Currently, at my company we are migrating from Kafka 0.8 to 0.11. The broker migration steps are clearly stated in the Kafka documentation here.
What I am stuck on is upgrading the Kafka clients (producers, consumers, spark-streaming). I can't find any documentation or articles that clearly list the required changes or the steps to follow to upgrade the clients; all I found is the Javadoc for the Producer client.
What I have done so far is change the Kafka client version in my Gradle build to kafka-clients-0.11.0.0, and everything compiled fine with no code changes at all.
What I need help with: are there any expected problems I should take care of, or any pointers for client changes other than the kafka-client version?
I went through lots of experiments to get this done.
For the consumers and producers, I just used the kafka consumers and producers 0.11.0.
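For illustration only: the thread doesn't say which consumer API the application used before. Assuming it was the ZooKeeper-based high-level consumer from 0.8, the configuration keys change when moving to the 0.11 consumer, and the sketch below translates a few of them. The class and method names are hypothetical, only a handful of keys are covered, and the String deserializers are an assumption about the message format.

```java
import java.util.Properties;

// Sketch: translates a few legacy (0.8 high-level consumer) settings
// into their equivalents for the newer Java consumer.
final class ConsumerConfigMigrationSketch {

    static Properties toNewConsumerConfig(Properties legacy, String bootstrapServers) {
        Properties updated = new Properties();
        // The newer consumer talks to the brokers directly, so zookeeper.connect
        // is dropped and the broker list has to be supplied separately.
        updated.setProperty("bootstrap.servers", bootstrapServers);
        copyIfPresent(legacy, "group.id", updated, "group.id");
        // Same meaning, renamed key.
        copyIfPresent(legacy, "auto.commit.enable", updated, "enable.auto.commit");
        // Same key, renamed values.
        String reset = legacy.getProperty("auto.offset.reset");
        if ("smallest".equals(reset)) {
            updated.setProperty("auto.offset.reset", "earliest");
        } else if ("largest".equals(reset)) {
            updated.setProperty("auto.offset.reset", "latest");
        }
        // Deserializers are mandatory for the newer consumer; String is an assumption.
        String stringDeserializer = "org.apache.kafka.common.serialization.StringDeserializer";
        updated.setProperty("key.deserializer", stringDeserializer);
        updated.setProperty("value.deserializer", stringDeserializer);
        return updated;
    }

    private static void copyIfPresent(Properties from, String fromKey, Properties to, String toKey) {
        String value = from.getProperty(fromKey);
        if (value != null) {
            to.setProperty(toKey, value);
        }
    }

    public static void main(String[] args) {
        Properties legacy = new Properties();
        legacy.setProperty("zookeeper.connect", "zk1:2181");
        legacy.setProperty("group.id", "my-group");
        legacy.setProperty("auto.offset.reset", "smallest");
        toNewConsumerConfig(legacy, "broker1:9092").list(System.out);
    }
}
```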
The tricky part was replacing spark-streaming: the latest spark-streaming version only supports up to Kafka 0.10.X, which doesn't contain any updates related to the new broker.
What I recommend here: if you are about to write an application from scratch and your main goal is real-time streaming, go for the Kafka Streams API; it is just AWESOME! If you already have a Spark Streaming app (which was my case), you have to judge which is more important to you, and whether you can accept being stuck with Kafka broker version 0.10.X and spark-streaming, which was experimental, by the way.
The benefits of doing the streaming inside Kafka rather than Spark are the following:
Kafka streaming is a normal jar that can be injected in any java application, so you don't care that much about deployment, and environment
Auto-scaling is so easy when using kafka-streaming using any scaleset provided by any cloud service provider, unlike scaling a HDP cluster.
Monitoring using something like prometheus would be much easier.
I have a question about Apache Storm. My current application has multiple projects that run continuously to fetch the latest data from other services. When a project completes, it is started again after a configurable interval to update the data.
We're moving to Apache Storm and Kafka for the new version. Our design puts project metadata into Kafka; Storm then reads from Kafka and starts the projects. But since we have many Bolts processing the data, we haven't found a way to tell when a project has finished updating its data, so that its status can be updated and a scheduler knows the project is done and can put it into Kafka again after a specific interval.
So my question is: for a design like this, is there any way to handle project status and know when a project has finished in Storm? Thank you in advance.
I am new to kafka.
I have downloaded kafka 2.9.2-0.8.1.1. I have started zookeeper, broker, producer and consumer from command prompt. I successfully created a topic and sent a message.
Now I want to run the producer from Eclipse, but I don't know how to do that. I found some links like http://vulab.com/blog/?p=611, but I am still not able to run it. Is the process described in the link correct? Do I really need to create a Maven project in Eclipse, or is there another way to do that?
You need to use the Kafka Java API to create a Kafka producer (if you intend to use Java). Using Maven is highly recommended, as it will manage all the dependencies for you. But you are free to bypass Maven if you are ready to manage all the required JARs yourself.
The Kafka wiki is another good place to look.
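As a minimal sketch of what that looks like for the 0.8.1.1 release mentioned in the question: the producer from that era is configured through a plain java.util.Properties object. The code below builds only that configuration, so it runs without any Kafka JARs. The broker address, the topic name and the choice of String messages are assumptions, and the producer calls are shown as comments because they need the Kafka 0.8.1.1 JARs on the Eclipse build path.

```java
import java.util.Properties;

// Sketch: configuration for the Kafka 0.8.x Java producer API.
final class LegacyProducerConfigSketch {

    static Properties producerProperties(String brokerList) {
        Properties props = new Properties();
        // Brokers the producer contacts to discover the cluster (host:port, comma-separated).
        props.setProperty("metadata.broker.list", brokerList);
        // Encoder for message values; String messages are an assumption.
        props.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        // Wait for the partition leader to acknowledge each message.
        props.setProperty("request.required.acks", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProperties("localhost:9092");
        props.list(System.out);
        // With the Kafka 0.8.1.1 JARs on the build path (via Maven or added by hand),
        // and imports for kafka.javaapi.producer.Producer, kafka.producer.ProducerConfig
        // and kafka.producer.KeyedMessage, sending a message would look like:
        //   Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        //   producer.send(new KeyedMessage<String, String>("my-topic", "hello from Eclipse"));
        //   producer.close();
    }
}
```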