what are the advantages of flink vs a golang program using for loop consuming kafka messages

when consuming kafka messages, what are the advantages of flink over a simple golang/java program using for loops to consume kafka messages?
I understand that flink has better support for exactly-once guarantees, but are there any other advantages?
thanks in advance~

Forget processing guarantees. Flink is an ETL tool. It requires a scheduler to actually deploy any job, and it has built-in libraries for SQL and connectors to external systems.
If all you care about is a lightweight binary for a Kafka-only process, then absolutely use Go, Rust, whatever. But you still need to bring your own scheduler (Kubernetes, perhaps?). That app won't run in Flink/YARN on its own...
BTW, Apache Beam has Go and Java SDKs, and a Flink runner, so perhaps the best of both?
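For reference, the "simple for-loop consumer" the question describes is only a few dozen lines with the plain Java kafka-clients library. A minimal sketch (broker address, topic name, and group id are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleLoopConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Everything beyond plain consumption -- state, windowing, fault
                    // tolerance, scaling -- is on you here; that is what Flink adds.
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}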

Related

Can kafka server be installed with dotNet

Can the Kafka server be installed with .NET?
I'm finding that it depends on the JVM. Can it work in a .NET tech stack without the JVM?
The Kafka server requires the JVM, yes. It cannot run without the JVM.
Kafka clients, however, support multiple languages, including .NET.
Currently the Microsoft Kafka client is not actively developed or supported. Instead, the author recommends using rdkafka-dotnet, which has better documentation and is easy to use:
https://github.com/Microsoft/CSharpClient-for-Kafka
https://github.com/ah-/rdkafka-dotnet
But Confluent has a great .NET Kafka library. It makes implementing consumer groups easy and is highly configurable.
https://www.nuget.org/packages/Confluent.Kafka/

Kafka - Confluent Hub - Exploit only part of it

I already saw a similar question on SO, but it doesn't clearly answer my doubts.
We have different Kafka clusters and a lot of operational habits around them. We have our own way to start/stop the clusters, lots of operational scripts that help maintain them, etc.
Now we would like to use Kafka Connect connectors for new needs, but from what I saw, Kafka Connect is extremely coupled to confluent-hub.
It's as if I can't even use the connectors without having to install a fully operational confluent-hub.
This makes it very difficult for us to use Kafka Connect connectors. I understand that confluent-hub might be a framework that helps run those connectors, but it seems we can't even use a standalone Kafka cluster (one not operated through confluent-hub).
But maybe I'm missing something..
Do you know if there is any way to properly use Kafka connectors on an already existing Kafka cluster (completely independent from confluent-hub)?
EDITED:
It's more a question about the tight coupling between confluent-hub and Kafka Connect. All the features that come with Kafka Connect (distributed workers to handle different failover scenarios, etc.) seem unusable without confluent-hub, hence a "need" to run the Kafka cluster exclusively via confluent-hub, which is not an easy task when you already have a big existing Kafka cluster with lots of ops habits around it.
Kafka Connect is part of Apache Kafka. It's a pluggable framework for streaming integration between Kafka and other systems, in both directions.
To use Kafka Connect you need connectors for the specific technology with which you want to integrate. For example, S3 sink, Elasticsearch sink, JDBC source or sink, and so on.
The connector API is part of Apache Kafka, and available for anyone who wants to develop a connector.
Connectors are written by various people and organisations, and are available in various different ways. How you obtain a connector depends on which connector you want, how it's licensed, and how the author has made it available for distribution. It could be that you go to GitHub, clone the repo and build the JAR. It could be that you can download the JAR directly.
All that Confluent Hub does is make lots of these connectors available for you in one place, easily searchable, and with an optional CLI tool that will install them for you.
Do you have to use Confluent Hub? No, not at all. Might it make your life easier in locating connectors that you want to use, and make it easier to install them? Hopefully :)
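For example, a connector JAR you downloaded or built yourself can be used with the plain Apache Kafka distribution: put it in a directory and point the standard Connect worker configuration at that directory. A minimal sketch of connect-distributed.properties (the broker address and plugin directory are placeholders):

# Worker config for the connect-distributed.sh script that ships with Apache Kafka
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status

# Directory containing the connector JARs you downloaded or built
plugin.path=/opt/kafka/connect-plugins

You then start the worker with connect-distributed.sh and create connector instances through the Connect REST API; none of this requires Confluent Hub.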
Disclaimer: I work for Confluent.

Upgrading Kafka client from 0.8.2.0 to 0.11.0.0

Currently at my company we are migrating from Kafka 0.8 to 0.11; the broker migration steps are clearly stated in the Kafka documentation here.
What I am stuck on is upgrading the Kafka clients (producers, consumers, spark-streaming). I can't find any documentation or articles clearly listing the required changes or steps to follow to upgrade the clients; all I found is the Javadoc for the producer client.
What I did so far is change the Kafka client version in my Gradle build to kafka-clients-0.11.0.0, and everything from the compilation point of view went fine with no code changes at all.
What I seek help with is: are there any expected problems I should take care of, or any pointers for client changes other than the kafka-clients version?
I went through lots of experiments to get this done.
For the consumers and producers, I just used the kafka consumers and producers 0.11.0.
The tricky part was replacing spark-streaming: the latest spark-streaming version only supports up to Kafka 0.10.x, which doesn't contain any of the updates related to the new broker.
What I recommend here: if you are about to write an application from scratch and your main goal is real-time streaming, go for the Kafka Streams API, it is just AWESOME! If you already have a Spark Streaming app (which was my case), you have to judge which matters more, because it means staying stuck with the Kafka 0.10.x broker and spark-streaming's Kafka integration, which was experimental btw.
The benefits of having the streaming inside Kafka rather than Spark are the following (a small sketch follows the list):
Kafka Streams is a normal JAR that can be embedded in any Java application, so you don't have to care that much about deployment and environment.
Auto-scaling is very easy when using Kafka Streams with any scale set provided by a cloud service provider, unlike scaling an HDP cluster.
Monitoring with something like Prometheus is much easier.
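To illustrate the first point, here is a minimal Kafka Streams sketch (using the current Streams API; the application id and topic names are placeholders). The whole streaming application is an ordinary Java program you package and run like any other JAR:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from one topic, transform each value, write to another topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase()).to("output-topic");

        // No cluster to deploy to: the "cluster" is however many copies of this
        // JVM process you start, and Kafka rebalances the work between them.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}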

understand and analyse the flink code, specially the scheduling part

I am working on improving the resource and task scheduling of Apache Flink. Please share any tutorial or document on the internal architecture of Apache Flink, as well as a proper method for debugging the Flink code.
Your help and experience analyzing the Flink code will be appreciated.
For how job scheduling works in Flink, I would start looking at https://ci.apache.org/projects/flink/flink-docs-release-1.2/internals/job_scheduling.html. Generally, the last part of the current Flink documentation contains some good information about the internals of Apache Flink.

Apache Kafka and supported platforms

Basic question, which platforms and languages does Apache Kafka currently support?
Kafka is written in Scala, which means it runs on the JVM, so you can effectively run it on any OS that supports the JVM. However, the brokers get a huge performance boost from using the OS's kernel buffer cache, and I'm not sure how good this is on a non-Unix system like Windows. The Kafka source code base provides first-class support for Scala and Java clients. You can also find producer and consumer clients in languages like PHP, C++, Python, etc. under the contrib directory.
Apache Kafka runs well and is most stable and performant on Linux (either bare metal Linux, Linux VMs in private or public clouds, or Linux based docker containers). Kafka has been known to run on Windows but most vendors that commercially support Kafka do not extend their support to Windows for production servers so it's "community supported" by the Kafka community. Kafka also runs quite well on macOS for development.
The Apache Kafka distribution includes support for Java and Scala clients only but the larger Kafka community has created a long list of clients for other languages. A good list of the available options for clients is on the apache kafka wiki here: https://cwiki.apache.org/confluence/display/KAFKA/Clients
You will find that for some languages (like C#/.NET, Python, or Go) there are 2 or 3 or even more options for client libraries. Some are up to date with the newest Kafka wire protocol changes, such as exactly-once semantics and message headers (added in Apache Kafka 0.11), timestamps (added in 0.10), or the security enhancements and new consumer API (added in 0.9), and others are not. Some have the full set of functions/methods provided in Java (like seek(), consumer group management, or interceptors) and others do not. Some are written purely in the target language and others are wrappers around the librdkafka C/C++ library. Some are commercially supported by a vendor and others are not, so choose based on your needs in terms of functionality, stability, execution environment, and supportability.
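As a concrete example of the kind of feature that varies between clients, here is a small sketch of the Java consumer's seek() API, rewinding one partition to a specific offset instead of relying on the committed position (the topic, partition, and offset are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp)); // manual assignment, no rebalance
            consumer.seek(tp, 42L);                         // start reading from offset 42
            consumer.poll(Duration.ofMillis(500))
                    .forEach(r -> System.out.println(r.offset() + ": " + r.value()));
        }
    }
}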