I have a Kafka broker upgraded from 0.8 to 0.11, and now I am trying to upgrade the Spark Streaming job code to be compatible with the new Kafka (I am using Spark 1.6.2).
I searched a lot for steps to follow for this upgrade, but I didn't find any article, either official or unofficial.
The only article I found useful is this one; however, it covers Spark 2.2 and Kafka 0.10, and it contains a line saying:
However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change
Has anyone tried to integrate Spark Streaming 1.6 with Kafka 0.11, or is it better to upgrade Spark to 2.x first, given the lack of information and support for this mix of Spark Streaming and Kafka versions?
After a lot of investigation, I found no way to make this move, as Spark Streaming only supports Kafka versions up to 0.10 (which differs significantly from Kafka 0.11 and 1.0.x).
That's why I decided to move from Spark Streaming to the new Kafka Streams API, and it was simply awesome: easy to use, very flexible, and the big advantage is that IT IS A LIBRARY. You just add it to your project; it is not a framework that wraps your code.
The Kafka Streams API supports almost all the functionality provided by Spark (aggregation, windowing, filtering, map/reduce).
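To give a feel for it, here is a minimal sketch of a Kafka Streams topology doing filtering plus a windowed count per key. The application id, topic name, and serdes are placeholder assumptions, and it is written against a recent kafka-streams API, so treat it as an illustration rather than my exact code:

    import java.time.Duration;
    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class FilterAndCountApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-count-app"); // placeholder id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> events = builder.stream("input-topic"); // placeholder topic

            events.filter((key, value) -> value != null && !value.isEmpty()) // filtering
                  .groupByKey()                                              // per-key aggregation
                  .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))         // 5-minute windows
                  .count()
                  .toStream()
                  .foreach((windowedKey, count) ->
                      System.out.println(windowedKey + " -> " + count));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }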
I'm running a Kafka Streams application. Here's my current version setup:
kafka-streams v1.1.0 (with custom changes on top of it)
kafka cluster v2.1.1
I'm planning to upgrade my Kafka cluster to v2.8.1. Is the same kafka-streams library compatible with the newer Kafka version? I'm not able to find any official compatibility matrix for Streams. I could find one from Confluent, but none from the official Kafka website.
I ran my Kafka Streams app against Kafka v2.8.1 and it does seem to run. I'd like to avoid any surprises at a later point in time, hence I'm looking for any pointers to compatibility or support.
The Kafka Streams API uses the Kafka consumer and producer APIs underneath. The Kafka protocol is backward compatible, so you should not have any problem (i.e. a new Kafka cluster works with old Kafka clients).
Just take into account that starting with Kafka 4.0.0 (not released yet), they are planning to remove the old client protocol API versions, which breaks part of this backward compatibility. See KIP-896 for more details: https://cwiki.apache.org/confluence/display/KAFKA/KIP-896%3A+Remove+old+client+protocol+API+versions+in+Kafka+4.0
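If you want to sanity-check a specific client/broker mix before relying on it, the kafka-broker-api-versions.sh tool that ships with Kafka shows which API versions each broker accepts, and a quick AdminClient smoke test from the client side works too. A minimal sketch, assuming a placeholder bootstrap address:

    import java.util.Properties;
    import java.util.concurrent.ExecutionException;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.DescribeClusterResult;

    public class BrokerSmokeTest {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                DescribeClusterResult cluster = admin.describeCluster();
                // If the brokers reject the client's protocol versions, these calls fail;
                // otherwise we get basic cluster metadata back.
                System.out.println("Cluster id: " + cluster.clusterId().get());
                System.out.println("Nodes: " + cluster.nodes().get());
            }
        }
    }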
I am using the Kafka client & streams library version 2.7.0 to build my application. However, the Kafka brokers (2 different clusters) are at older versions (2.4.1 & 2.6.0).
As I understand it, we can use the latest client & Streams libraries and they should run fine against older Kafka brokers. Am I correct? Is there any compatibility matrix between the client & streams libraries and Kafka brokers?
I tried running my application (with the 2.7.0 client library) in a local environment (with Kafka version 2.6.0) and it worked fine, but I wanted to confirm the supported compatibility between them.
Update: As onecricketeer has helpfully pointed out, you can refer to the Kafka Compatibility Matrix. He also notes:
There is a general answer. Clients above 0.10.2 work with brokers down to that version for all basic functionality until stated otherwise. Extra functionality includes transactional/idempotence and record headers, which Spring may depend on, but Kafka Streams natively has no dependency on.
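As an illustration of that extra functionality, producer idempotence and transactions only work against brokers that support them (0.11.0 and above). A hedged sketch of opting in, with the bootstrap address, transactional id, and topic as placeholder assumptions:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            // Both settings below require broker-side support (brokers >= 0.11):
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "example-tx-id"); // placeholder id

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("example-topic", "key", "value")); // placeholder topic
                producer.commitTransaction();
            }
        }
    }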
Additionally, the upgrade section of the Kafka Documentation provides guidance on upgrade order for various Kafka versions.
The compatibility matrix provided by the spring-cloud-stream project may also be of assistance.
While starting a topology with a Kafka spout on the new Kafka version 2.1.0 and Storm version 1.2.2, I am getting a java.lang.ClassNotFoundException: kafka.api.OffsetRequest. I don't get this when I use Kafka version 0.10.0.1. Can you guys please help, as I want to be on the latest Kafka version?
I have tried all the latest Kafka versions starting with 2.*, but none of them work. Caused by: java.lang.ClassNotFoundException: kafka.api.OffsetRequest
kafka.api contains the old Scala classes. Many of these were removed in Kafka 2.x.
The majority of those classes were moved to org.apache.kafka.common.requests, and there are ListOffsetRequest and OffsetFetchRequest, so I'm not sure which one you're trying to use.
If Storm itself depends on these old APIs, then you are bound to them; your own processor cannot use the new APIs.
Plus, the Kafka server version itself only supports certain API calls of these new request classes.
Adding to this answer: I suspect you're using the storm-kafka library for Kafka integration. You need to migrate to storm-kafka-client, which is based on the new Kafka APIs. The documentation for the new module can be found here.
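To give an idea of what the new module looks like, here is a minimal storm-kafka-client spout sketch. The bootstrap address, topic, group id, and component names are placeholders, and the exact builder methods vary a bit between Storm versions, so treat it as a starting point only:

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;
    import org.apache.storm.topology.TopologyBuilder;

    public class KafkaSpoutTopology {
        public static void main(String[] args) throws Exception {
            // Spout configuration built on the new Kafka consumer API.
            KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "example-topic") // placeholders
                    .setProp("group.id", "example-group")                   // placeholder group
                    .build();

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
            // ... attach your bolts here, e.g. builder.setBolt(...).shuffleGrouping("kafka-spout");

            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("kafka-spout-demo", new Config(), builder.createTopology());
            Thread.sleep(60_000); // let it run for a minute in local mode
            cluster.shutdown();
        }
    }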
If you need to migrate committed offsets from storm-kafka, you can use the utility at https://github.com/apache/storm/tree/master/external/storm-kafka-migration. It will let you migrate without having to start over on your Kafka partitions.
Currently, at my company we are migrating from Kafka 0.8 to 0.11; the broker migration steps are clearly stated in the Kafka documentation here.
What I am stuck on is upgrading the Kafka clients (producers, consumers, spark-streaming). I can't find any documentation or articles clearly listing the required changes or steps to follow to upgrade the clients; all I found is the Javadoc of the Producer Client.
What I did so far is change the Kafka client version in my Gradle build to kafka-clients-0.11.0.0, and from a compilation point of view everything went fine with no code changes at all.
What I'm seeking help with is: are there any problems I should expect, or any pointers for client changes other than the kafka-clients version?
I went through lots of experiments to get this done.
For the consumers and producers, I just used the Kafka 0.11.0 consumers and producers.
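For reference, a minimal 0.11-style consumer looks roughly like the sketch below; the bootstrap address, group id, and topic are placeholder assumptions:

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SimpleConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("example-topic")); // placeholder
                while (true) {
                    // poll(long) is the 0.11-era signature; poll(Duration) came later (2.0)
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }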
The tricky part was replacing Spark Streaming: the latest Spark Streaming version only supports up to Kafka 0.10.x, which doesn't contain any of the updates related to the new broker.
What I recommend here: if you are about to write an application from scratch and your main goal is real-time streaming, go for the Kafka Streams API, it is just AWESOME! If you already have a Spark Streaming app (which was my case), you have to judge which matters more, since staying with Spark Streaming means being stuck on Kafka broker 0.10.x, and its Kafka integration was experimental btw.
The benefits of having the streaming inside Kafka rather than Spark are the following:
Kafka Streams is a normal jar that can be injected into any Java application, so you don't have to care much about deployment and environment.
Auto-scaling is very easy when using Kafka Streams with any scale set provided by a cloud service provider, unlike scaling an HDP cluster.
Monitoring with something like Prometheus is much easier.
Referring to the Python programming guide online:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/python.html
I didn't see any material related to streaming programming, like a Kafka connector and so on.
Also, in the GitHub examples (https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming), I didn't see any Python code either.
If Flink does support Python for streaming programming, could you show me some examples to start with?
Thanks!
Maybe Flink 1.4 will support Python for streaming; you can see the recent pull request for FLINK-5886 - Python API for streaming applications.
No, Flink 1.2 does not support Python for streaming.
Flink 1.3 doesn't support it either.