How do I connect Kafka to Airflow? - apache-kafka

I need to connect Kafka with Airflow: send data from Kafka to Airflow and save it to a local file. What is the best way to do this? I think airflow-provider-kafka is the correct approach - is there anything else?

You can install a Kafka client library and consume messages from a PythonOperator, but "airflow-provider-kafka" already wraps that work, so it makes sense to use it and extend it if needed.
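For context, here is a minimal sketch of that do-it-yourself approach, assuming the confluent-kafka client and Airflow 2.4+; the broker address, topic, group id, and file path are all placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from confluent_kafka import Consumer

    def consume_to_file():
        # plain Kafka consumer; all connection values below are placeholders
        consumer = Consumer({
            "bootstrap.servers": "localhost:9092",
            "group.id": "airflow-demo",
            "auto.offset.reset": "earliest",
        })
        consumer.subscribe(["my_topic"])
        try:
            with open("/tmp/kafka_dump.jsonl", "a") as f:
                for _ in range(100):  # bounded loop so the task can finish
                    msg = consumer.poll(5.0)
                    if msg is None:
                        break
                    if msg.error():
                        continue  # skip errored messages in this sketch
                    f.write(msg.value().decode("utf-8") + "\n")
        finally:
            consumer.close()

    with DAG(
        dag_id="kafka_to_file_diy",
        start_date=datetime(2023, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        PythonOperator(task_id="consume", python_callable=consume_to_file)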
The only reason not to use "airflow-provider-kafka" is that it is developed and maintained outside the Airflow repo, so in the future you could hit compatibility issues or breaking changes that prevent you from upgrading your Airflow version.

Use the provider.
The Kafka provider in Airflow will not break when you upgrade your Airflow version - this is guaranteed by the Airflow project's adherence to SemVer. Furthermore, the provider is maintained by Astronomer, which has a vested commercial interest in these providers.
(Unless/Until Airflow 3 comes out, at which time it will probably be upgraded)
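For comparison, a rough sketch with the provider's ConsumeFromTopicOperator, under the same Airflow assumptions as above (the import path and parameters may differ between provider versions; kafka_default, my_topic, and the file path are assumptions):

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.kafka.operators.consume import (
        ConsumeFromTopicOperator,
    )

    def append_to_file(message):
        # called for each consumed message; message is a confluent-kafka Message
        with open("/tmp/kafka_dump.jsonl", "a") as f:
            f.write(message.value().decode("utf-8") + "\n")

    with DAG(
        dag_id="kafka_to_file_provider",
        start_date=datetime(2023, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        ConsumeFromTopicOperator(
            task_id="consume",
            kafka_config_id="kafka_default",  # Airflow connection with broker config
            topics=["my_topic"],
            apply_function=append_to_file,
            max_messages=100,  # stop after a bounded number of messages
        )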

Related

Rundeck upgrade from 3.3.5 to 3.4.10

Can someone share the procedure to upgrade Rundeck 3.3.5 to 3.4.10, in order to overcome the Log4j security vulnerability?
The process is described here. Your instance is 3.3.5, so you don't need to follow the database migration process; you can test in a non-prod environment by launching the new instance over the old one.
In any case, as a precaution, back up all your instance data and test in a non-prod environment first.
UPDATE 04/22/2022:
Rundeck 4.1.0 uses H2 v2 as the default testing backend; please take a look at this if you're using H2 as your backend.

Can I use different versions of Cassandra in a cluster?

Can I use different versions of Cassandra in a single cluster? My goal is to transfer data from one DC(A) to a new DC(B) and decommission DC(A), but DC(A) is on version 3.11.3 and DC(B) is going to be 3.11.7+*
*I want to use a K8ssandra deployment with metrics and other features. The K8ssandra project cannot deploy Cassandra versions older than 3.11.7.
Thank you!
K8ssandra itself is purposefully an "opinionated" stack, which is why you can only use certain recent Cassandra versions that are not known to include major issues.
But if you already have an existing cluster, that doesn't mean you can't migrate between them. Check out this blog for an example: https://k8ssandra.io/blog/tutorials/cassandra-database-migration-to-kubernetes-zero-downtime/

How to upgrade Apache Kafka 2.0 to Apache Kafka 2.6 in a running environment?

We are using Apache Kafka 2.0 in our production environment and are now planning to upgrade to 2.6.
We are running a three-broker cluster setup.
I have the following questions:
1) Is it possible to upgrade Kafka from one version to a higher version?
2) Can any data loss happen while upgrading?
3) Is it possible to perform the upgrade while the cluster is running?
4) How do we roll back to the previous version if something goes wrong?
Any thoughts you can share would be helpful.
Yes, upgrades are possible - http://kafka.apache.org/26/documentation.html#upgrade
Data that's already written to the topics shouldn't get lost if you follow the guide. Active clients might experience network exceptions, retries, and potential dropped packets while individual brokers are restarting.
A rolling upgrade is possible, which prevents downtime.
Depending on the exact versions, rollbacks are not possible due to internal log format changes (as indicated in the documentation).
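For illustration, the rolling-upgrade flow looks roughly like this (a sketch based on the upgrade guide linked above; paths and exact property values depend on your setup):

    # server.properties, before touching any binaries: pin the wire protocol
    # and message format to the version you are upgrading FROM
    inter.broker.protocol.version=2.0
    log.message.format.version=2.0

    # 1. Restart one broker at a time on the new 2.6 binaries with the
    #    settings above, waiting for the cluster to be healthy in between.
    # 2. Once ALL brokers run 2.6, bump the protocol version and do another
    #    rolling restart:
    inter.broker.protocol.version=2.6

    # 3. Optionally, once all clients are upgraded, bump the message format.
    #    After this step a rollback to 2.0 is no longer possible:
    log.message.format.version=2.6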

Kafka - Confluent Hub - Use only part of it

I already saw a similar question on SO, but it didn't clearly answer my doubts.
We have several Kafka clusters and a lot of operational habits around them. We have our own way to start/stop a cluster, lots of ops scripts that help maintain it, etc.
Now we would like to use Kafka Connect connectors for new needs, but from what I saw, Kafka Connect is extremely coupled to Confluent Hub.
It's as if I can't even use the connectors without having to install a fully operational Confluent Hub.
This makes it very difficult for us to use Kafka Connect connectors. I understand that Confluent Hub might be a framework that helps run those connectors, but it seems we can't even use a standalone Kafka cluster (one not operated through Confluent Hub).
But maybe I'm missing something.
Do you know if there is any way to properly use Kafka connectors on an already existing Kafka cluster (completely independent from Confluent Hub)?
EDIT:
It's more a question about the tight coupling between Confluent Hub and Kafka Connect. All the features that come with Kafka Connect (distributed workers to handle different failover scenarios, etc.) seem unusable without Confluent Hub, hence a "need" to run the Kafka cluster exclusively via Confluent Hub, which is not an easy task when you already have a big existing Kafka cluster with lots of ops habits around it.
Kafka Connect is part of Apache Kafka. It's a pluggable framework for streaming integration between systems in and out of Kafka.
To use Kafka Connect you need connectors for the specific technology with which you want to integrate. For example, S3 sink, Elasticsearch sink, JDBC source or sink, and so on.
The connector API is part of Apache Kafka, and available for anyone who wants to develop a connector.
Connectors are written by various people and organisations, and made available in various different ways. How you obtain a connector depends on which connector you want, how it's licensed, and how the author has made it available for distribution. It could be that you go to GitHub, clone the repo, and build the JAR; it could be that you can download the JAR directly.
All that Confluent Hub does is make lots of these connectors available for you in one place, easily searchable, and with an optional CLI tool that will install them for you.
Do you have to use Confluent Hub? No, not at all. Might it make your life easier in locating connectors that you want to use, and make it easier to install them? Hopefully :)
Disclaimer: I work for Confluent.
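To make that concrete, here is roughly what a Hub-free setup looks like with a vanilla Apache Kafka distribution (a sketch; the plugin directory and storage topic names are assumptions):

    # config/connect-distributed.properties (ships with Apache Kafka)
    bootstrap.servers=broker1:9092
    group.id=my-connect-cluster
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    config.storage.topic=connect-configs
    offset.storage.topic=connect-offsets
    status.storage.topic=connect-status
    # directory where you drop connector JARs you built or downloaded yourself
    plugin.path=/opt/kafka/connect-plugins

    # start a worker with the script bundled in the Apache Kafka tarball:
    #   bin/connect-distributed.sh config/connect-distributed.properties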

Redeploy Service Fabric application without version change

I've read about partial upgrades, but they always require changing some parts of the application package. I'd like to know if there's a way to redeploy a package without a version change, similar to what VS does when deploying to the dev cluster.
On your local dev cluster, VS simply deletes the application before it starts the re-deployment. You could do the same in your production cluster; however, this results in downtime, since the application is not accessible during that time.
Is there a reason you wouldn't want to use the regular monitored upgrade? It has many advantages, like automatic rollbacks and so on.