How to run a Kafka Connect worker in YARN? - apache-kafka

I'm playing with Kafka-Connect. I've got the HDFS connector working both in standalone mode and distributed mode.
They advertise that the workers (which are responsible for running the connectors) can be managed via YARN. However, I haven't seen any documentation that describes how to achieve this.
How do I go about getting YARN to execute workers? If there is no specific approach, are there generic how-tos for getting an application to run within YARN?
I've used YARN with Spark via spark-submit; however, I cannot figure out how to get the connector to run in YARN.

You can theoretically run anything on YARN, even a simple hello-world program, which is why saying Kafka-Connect runs on YARN is technically correct. The caveat is that getting Kafka-Connect to run on YARN currently takes a fair amount of elbow grease. There are two ways to do it:
Directly talk to the YARN API to acquire a container, deploy the Kafka-Connect binaries, and launch Kafka-Connect.
Use the separate Slider project (https://slider.incubator.apache.org/docs/getting_started.html) that Stephane has already mentioned in the comments.
Slider
You'll have to read quite a bit of documentation to get it working, but the idea behind Slider is that you can get any program to run on YARN, without dealing with the YARN API or writing a YARN application master, by doing the following:
Create a Slider package out of your program
Define a configuration for your package
Use the Slider CLI to deploy your application onto YARN
Slider handles container deployment and recovery of failed containers for you, which is nice. Slider is also slated to become a native part of YARN when YARN 3.0 is released.
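As a rough sketch of what that workflow could look like on the CLI (the package name, application name, and JSON config files below are hypothetical placeholders, not a tested recipe):

# Upload a Slider package you have built for Kafka-Connect (name is hypothetical).
slider package --install --name KAFKA_CONNECT --package kafka-connect-package.zip
# Launch it on YARN using your appConfig.json / resources.json definitions.
slider create kafka-connect --template appConfig.json --resources resources.json
# Inspect the running application, then scale the worker component to 3 containers.
slider status kafka-connect
slider flex kafka-connect --component worker 3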
Alternatives
As a side note, getting Kafka-Connect to deploy on Kubernetes or Mesos / Marathon is probably going to be easier. The basic workflow to do that would be:
Create a Kafka-Connect Docker image, or just use Confluent's
Create a deployment config for Kubernetes or Marathon (a minimal Kubernetes example is sketched after this list)
Click a button / run a command
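For illustration, a minimal Kubernetes Deployment for Kafka-Connect in distributed mode might look like the sketch below, assuming Confluent's cp-kafka-connect image; the broker address, topic names, and replica count are placeholders to adapt to your cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-connect
spec:
  replicas: 2                        # each replica is one Connect worker
  selector:
    matchLabels:
      app: kafka-connect
  template:
    metadata:
      labels:
        app: kafka-connect
    spec:
      containers:
      - name: kafka-connect
        image: confluentinc/cp-kafka-connect:5.0.0
        ports:
        - containerPort: 8083        # Connect REST API
        env:
        - name: CONNECT_BOOTSTRAP_SERVERS
          value: "kafka:9092"        # placeholder broker address
        - name: CONNECT_GROUP_ID
          value: "connect-cluster"   # workers sharing a group id form one cluster
        - name: CONNECT_CONFIG_STORAGE_TOPIC
          value: "connect-configs"
        - name: CONNECT_OFFSET_STORAGE_TOPIC
          value: "connect-offsets"
        - name: CONNECT_STATUS_STORAGE_TOPIC
          value: "connect-status"
        - name: CONNECT_KEY_CONVERTER
          value: "org.apache.kafka.connect.json.JsonConverter"
        - name: CONNECT_VALUE_CONVERTER
          value: "org.apache.kafka.connect.json.JsonConverter"
        - name: CONNECT_REST_ADVERTISED_HOST_NAME
          value: "kafka-connect"

Once this is applied with kubectl apply, Kubernetes does for you what Slider would do on YARN: schedule the workers and restart them when they fail.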
Tutorials
A good Mesos / Marathon tutorial can be found here
Kubernetes tutorial here
Confluent Kubernetes Helm Charts here

Related

GitLab Autoscaling in Kubernetes

I am using GitLab Runner in Kubernetes for building our application. Since ours is a Docker-in-Docker use case, we are using Kaniko to build images from a Dockerfile.
I am having a hard time figuring out how to implement horizontal/vertical scaling for pods and instances.
Looking at this article, https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnersmachine-section, it says to use the docker+machine image, but I don't want to have any Docker dependency in our build process, especially since Kubernetes has deprecated Docker support in newer versions.
Any advice?

Install Custom Connector To Kafka Connect on Kubernetes

I'm running the Kafka Kubernetes Helm deployment; however, I am unsure how to install a custom plugin.
When running a custom plugin on my local version of Kafka, I mount the volume /myplugin into the Docker container and then set the plugin path environment variable.
I am unsure how to apply this workflow to the Helm charts / Kubernetes deployment, mainly how to go about mounting the plugin into the Kafka Connect pod such that it can be found on the default plugin.path=/usr/share/java.
Have a look at the last few slides of https://talks.rmoff.net/QZ5nsS/from-zero-to-hero-with-kafka-connect. You can mount your plugins, but the best way is either to build a new image that extends cp-kafka-connect-base, or to install the plugin at runtime - both using Confluent Hub.
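For example, the image-extension route can be as small as the following Dockerfile sketch (the base-image tag and the JDBC connector are stand-ins for whatever plugin you actually need):

FROM confluentinc/cp-kafka-connect-base:5.4.0
# confluent-hub ships inside the base image; --no-prompt makes it non-interactive.
# The plugin lands on the image's default plugin.path, so nothing needs mounting.
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:5.4.0

Build and push that image, then point your Helm values at it instead of the stock Kafka Connect image.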

Why does Flink use Yarn?

I am taking a deep look inside Flink to see how I can use it on a project, and I had a question for the creators / high-level thinkers... why does Flink use YARN as the default resource manager?
Was Kubernetes considered? Or is it one of those things where we started on YARN, and it works pretty well...
I have come across many projects and articles that allow Kubernetes and YARN to work together, including the Myriad project that allows YARN to be deployed on Mesos (but I am on Kubernetes...).
I have a very large compute cluster, 2000 or so nodes, that I use, and I want to use the super cool CEP features of Flink feeding off a Kafka infrastructure (also deployed onto this Kubernetes environment).
I am looking to understand the reasons behind using YARN as the resource manager underneath Flink, and whether it would be possible (with some effort and contribution to the project) to make Kubernetes an option alongside YARN.
Please note - I am new to YARN - just reading up about it. Also new to Flink and learning about the deployment and scale-out architecture.
Flink is not tied to YARN. It can also run on Apache Mesos, and there are users running it on Kubernetes. In the current version (Flink 1.4.1), there are a few things to consider when running Flink on Kubernetes (see this talk by Patrick Lucas).
The Flink community is also currently working on improving Flink's support for container setups. The effort is called FLIP-6 and will be included in the next release (Flink 1.5.0).

How to run a Flink Job on a remote YARN cluster

I have some issues deploying a Flink job remotely through the Scala API.
I have no problem launching a YARN session on my cluster and then running my job from the command line with a JAR.
What I want is to run my job directly from my IDE. How do I do it in Scala?
val env = ExecutionEnvironment.createRemoteEnvironment("mymaster", 6123, "myjar-with-dependencies.jar")
This is not working, and I do realize that I am not declaring any YARN deployment with it.
Any help?
Flink currently (March 2017, Flink 1.2) does not allow you to deploy on YARN programmatically through an ExecutionEnvironment.
You could look into Flink's internal, undocumented APIs for deploying it on YARN, and then submit through the remote environment.
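If you are happy to keep the YARN session itself outside your code, one hedged sketch of the IDE-driven flow is: start a session with bin/yarn-session.sh, then point a remote environment at the JobManager it reports. The host, port, and JAR path below are placeholders:

import org.apache.flink.api.scala._

object RemoteWordCount {
  def main(args: Array[String]): Unit = {
    // Placeholder address: use the JobManager host/port that
    // yarn-session.sh prints when the session starts.
    val env = ExecutionEnvironment.createRemoteEnvironment(
      "jobmanager-host", 6123, "target/myjar-with-dependencies.jar")

    // Trivial job to verify that remote submission works; the job's
    // classes must also be packaged inside the JAR passed above.
    env.fromElements("to be", "or not", "to be")
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(0)
      .sum(1)
      .print() // print() triggers execution on the remote cluster
  }
}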

Learning to use Kubernetes on a single computer

I need to learn how to use Kubernetes. I've read the opening sections of a couple of introductory tutorials, and I have never found one that explains to me, step by step, how to build a simulated real-world example on a single computer.
Is Kubernetes by nature so distributed that even the 101-level tutorials can only be performed on clusters?
Or can I learn the important stuff (execute important examples) using just my laptop, without needing a stack of Raspberry Pis, AWS, or GCP?
The easiest might be minikube.
Minikube is a tool that makes it easy to run Kubernetes locally. Minikube runs a single-node Kubernetes cluster inside a VM on your laptop for users looking to try out Kubernetes or develop with it day-to-day.
For a resource that explains how to use this, try this getting started guide. It runs through an entire example application using a local development environment.
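To give a flavor of it, a first session might look something like this (taken from the hello-minikube style quickstarts of the time; exact flags can differ between versions):

minikube start                                            # boot a single-node cluster in a local VM
kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.4 --port=8080
kubectl expose deployment hello-minikube --type=NodePort  # make the pod reachable from the host
minikube service hello-minikube --url                     # print the URL you can curl
minikube stop                                             # shut the VM down when finished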
If you are okay with using Google Cloud Platform (I think one gets free credits initially), there is hello-node.
If you want to run the latest and greatest (not necessarily stable) and you're using Linux, it is also possible to spin up a local cluster from a cloned copy of the Kubernetes sources, using hack/local-up-cluster.sh.