Apache Kafka patch release process

How does Kafka release patch updates?
How do users get to know about Kafka patch updates?

Kafka is typically distributed as a zip/tar archive containing the binaries used to start/stop/manage Kafka. To keep track of new releases you may want to:
Subscribe to https://kafka.apache.org/downloads by generating a feed for it.
Subscribe to any other feeds that announce Kafka updates.
Write a script that periodically checks https://downloads.apache.org/kafka/ for new releases and notifies you or downloads them (a minimal sketch follows below).
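For the last option, here is a minimal sketch. It assumes the requests package is installed and that the download index keeps listing version directories such as 3.7.0/; the known-version value and the notification step are placeholders for your environment.

```python
# Minimal release-check sketch: scrape the Apache download index for
# version-like directory names and compare against the last seen version.
import re
import requests

DOWNLOAD_INDEX = "https://downloads.apache.org/kafka/"
KNOWN_VERSION = (3, 7, 0)  # the version you are currently running (example value)

def latest_available_version():
    html = requests.get(DOWNLOAD_INDEX, timeout=10).text
    # Directory links look like href="3.7.0/"; collect and compare them numerically.
    versions = {
        tuple(int(part) for part in match.group(1).split("."))
        for match in re.finditer(r'href="(\d+\.\d+\.\d+)/"', html)
    }
    return max(versions) if versions else None

if __name__ == "__main__":
    latest = latest_available_version()
    if latest and latest > KNOWN_VERSION:
        # Replace this print with your notification or download step.
        print("New Kafka release available:", ".".join(map(str, latest)))
    else:
        print("No newer release found.")
```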
Kafka versions typically follow the major.minor.patch format.
Every time there is a new Kafka release, download the latest archive, reuse the old configuration files (making changes if required) and start Kafka with the new binaries. The upgrade process is fully documented in the Upgrading section at https://kafka.apache.org/documentation
For production environments, we have several options:
1. Using a managed Kafka service (e.g. on AWS, Azure, or Confluent)
In this case we need not worry about patching and security updates, because they are taken care of by the service provider. On AWS, for example, you typically get notifications in the console about when your Kafka update is scheduled.
A managed Kafka service is an easy way to get started for production environments.
2. Using self-hosted Kafka on Kubernetes (e.g. with the Strimzi operator)
If you are running Kafka in a Kubernetes environment, you can use the Strimzi operator and helm upgrade to move to the version you require. Refresh the chart information from the repository first with helm repo update (a sketch follows below).
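A minimal sketch of scripting those helm steps, assuming the Strimzi chart repository has already been added under the alias strimzi; the release name, namespace and chart version shown are placeholders for your installation.

```python
# Sketch of automating the operator upgrade steps by driving the helm CLI.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Refresh chart metadata from the configured repositories.
run(["helm", "repo", "update"])

# Upgrade the Strimzi operator release to the chart version you require.
run([
    "helm", "upgrade", "strimzi-kafka-operator",
    "strimzi/strimzi-kafka-operator",   # repo/chart; assumes the repo alias is already added
    "--namespace", "kafka",             # example namespace
    "--version", "0.40.0",              # example chart version
    "--reuse-values",
])
```

Note that with Strimzi the brokers' Kafka version is typically controlled separately through the Kafka custom resource, so upgrading the operator chart is only part of the rollout.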
Managed services and Kubernetes operators make this kind of maintenance easy; managing Kafka clusters fully by hand is comparatively difficult.

Related

How to start/trigger a job when a new version of a deployment is released (image updated) on Kubernetes?

I have two environments (clusters), Production and Staging, with two independent databases. Both are deployed on Kubernetes, and production doesn't have a fixed schedule for new deployments, but they happen roughly weekly.
I would like to sync the production database with the staging database every time a new release is deployed to production (the Kubernetes deployment is updated with a new image).
Is there a way to set up a job/cronjob to be triggered every time this event happens?
The deployments are done with ArgoCD, which pulls the changes to the deployment manifests from a GitHub repository.
I don't think this functionality is inherent to Kubernetes; you are asking about something custom that can be implemented in a variety of ways, depending on your tool stack (a minimal custom sketch follows after this list).
E.g.:
If you are using Helm to install to Production, you can use a post-install hook that triggers a Job that does what you want.
Perhaps ArgoCD has some post-install functionality that can also create a Job resource doing what you want.
I think you can also use a tool like Kyverno and write a policy that generates a K8s Job upon any resource created in K8s.
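As one illustration of the custom route, here is a rough sketch using the Kubernetes Python client: it watches the production Deployment and creates a Job whenever the container image changes. The namespace, deployment name, sync image and command are all placeholders, and the watcher needs RBAC permission to watch Deployments and create Jobs.

```python
# Sketch: watch a Deployment for image changes and create a database-sync Job.
from kubernetes import client, config, watch

NAMESPACE = "production"   # placeholder
DEPLOYMENT = "my-app"      # placeholder

def make_sync_job(image_tag: str) -> client.V1Job:
    container = client.V1Container(
        name="db-sync",
        image="registry.example.com/db-sync:latest",           # placeholder sync image
        command=["/bin/sh", "-c", "echo syncing for " + image_tag],
    )
    spec = client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        )
    )
    return client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(generate_name="db-sync-"),
        spec=spec,
    )

def main():
    config.load_incluster_config()   # or config.load_kube_config() outside the cluster
    apps = client.AppsV1Api()
    batch = client.BatchV1Api()

    last_image = None
    for event in watch.Watch().stream(apps.list_namespaced_deployment, namespace=NAMESPACE):
        dep = event["object"]
        if dep.metadata.name != DEPLOYMENT:
            continue
        image = dep.spec.template.spec.containers[0].image
        # Only react when the image actually changes, not on the initial listing.
        if last_image is not None and image != last_image:
            batch.create_namespaced_job(NAMESPACE, make_sync_job(image))
        last_image = image

if __name__ == "__main__":
    main()
```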
This is exactly the kind of case Argo Events is for.
https://argoproj.github.io/argo-events/
There are many ways to implement this; which is best depends on your exact situation.
E.g. if you can use a Git tag event's webhook, you could go with an HTTP trigger to initiate a Job or an Argo Workflow (a small test-delivery sketch follows below).
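For completeness, a tiny sketch of delivering a test payload to an Argo Events webhook EventSource by hand; the service URL, port and payload shape are assumptions you would replace with your actual EventSource configuration.

```python
# Post a test event to a webhook EventSource so the Sensor behind it can
# create the Job/Workflow. Endpoint and payload are placeholders.
import requests

WEBHOOK_URL = "http://webhook-eventsource-svc.argo-events:12000/deploy"  # example endpoint

payload = {"app": "my-app", "image": "registry.example.com/my-app:1.2.3"}
resp = requests.post(WEBHOOK_URL, json=payload, timeout=5)
resp.raise_for_status()
print("event accepted:", resp.status_code)
```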

Apache Ambari local repository (Cloudera)

I have a production cluster using Ambari from Hortonworks. Cloudera has now blocked all access to the HDP repository, because a paid support license is needed.
This hit us really hard because we have a big infrastructure using Ambari, Kafka, and Storm.
I'm trying to build Ambari from source, but I think a local HDP repository is needed.
Does anyone know how to build a repo starting from the Kafka and Storm sources?

Test/QA/Prod setup with a Kafka Schema Registry

We are working with Kafka, and have three environments set up: a test environment for the test system, a QA environment and a production environment. Pretty standard. Now we are getting started with Avro, and have set up a schema registry with the same installation setup: test, QA and production. But we are a bit uncertain about how to use the environments together. We have looked around a bit but haven't really found any examples of how to set up a Kafka test/qa/prod environment with a schema registry. These are the three approaches we have discussed internally:
1. Should we use Schema Registry Prod for all environments? Just as we do with our other artifact repositories.
With Nexus, Artifactory, Harbor etc. we use one instance to handle both developer versions and release versions of artifacts. So our initial approach was to do the same with the Schema Registry. But there is a difference here: with our other artifact repositories we have SNAPSHOT support and different spaces (builds/releases etc.), which we have not seen people use with a Schema Registry. So even though this was our initial approach, and it should work since we plan on using FULL_TRANSITIVE compatibility (a sketch of pinning that level follows after this question), we are now doubtful about sending each development/test version of a contract all the way to production. For example, FULL_TRANSITIVE would make it impossible to make incompatible changes even during development.
2. Use the test schema registry for the test environment, and the prod schema registry for the production environment.
Straightforward: use "Kafka Schema Registry Test" in our test environment, with our Kafka test installation, and "Kafka Schema Registry Prod" only in our production environment. Our build pipeline would then deploy the Avro schemas to the production schema registry at an appropriate stage.
3. Use snapshot schemas
This would be an attempt to mimic the setup of our other repositories. That is, we use one schema registry in all our environments (the "Schema Registry Prod"), but during development and test we use a "snapshot" version of the schema (perhaps appending "-snapshot" to the subject name). In the build pipeline we switch to a non-snapshot subject when ready to release.
So we would like to hear how other people work with Avro and Schema Registry. What does your setup look like?
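For reference, the compatibility pinning mentioned in approach 1 can be expressed against the Schema Registry REST API roughly as follows; the registry URL and subject name are placeholders, and this is only a sketch of the mechanism, not a recommendation for one of the approaches.

```python
# Pin a subject to FULL_TRANSITIVE compatibility via the Schema Registry REST API.
import requests

REGISTRY = "http://schema-registry-prod:8081"   # example URL
SUBJECT = "orders-value"                         # TopicNameStrategy subject for topic "orders"

resp = requests.put(
    f"{REGISTRY}/config/{SUBJECT}",
    json={"compatibility": "FULL_TRANSITIVE"},
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
)
resp.raise_for_status()
print(resp.json())   # e.g. {"compatibility": "FULL_TRANSITIVE"}
```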
Adding "-snapshot" to the subject name would cause the serializer's default TopicNameStrategy to fail.
In a primarily Java dev environment, we do have 3 registries, but they only use backward transitive compatibility, not full.
Where possible, we use the Maven plugins provided by Confluent to test schema compatibility before registering. This causes the CI pipeline to fail if the schema is not compatible, and we can use Maven profiles to override the registry URL per environment; the assumption is that if it fails in a lower environment, it will never reach a higher one. The test environment can have SNAPSHOT artifacts manually deployed to it, but these are only referenced by code artifacts and not by registry subjects, so schemas need to be manually deleted if there are any mis-registrations. That being said, "staging/qa" and prod are generally exactly the same, so barring network connectivity around "production", you would only need two registries.
For non-Java projects, we force the use of a standalone custom Maven archetype repo with the same plugins; that lets them version the schema artifacts in Maven, but still allows JVM consumers to use their data. It also simplifies build pipeline support to a standard lifecycle (a registry-level sketch of the same compatibility gate follows below).
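For pipelines that cannot use the Confluent Maven plugins, here is a rough sketch of the same gate against the Schema Registry REST API; the registry URL, subject and candidate schema are placeholders, and a brand-new subject needs a first version registered before the compatibility check returns a result.

```python
# Fail the pipeline if a candidate schema is incompatible with the latest
# registered version; only register once the check passes.
import json
import sys
import requests

REGISTRY = "http://schema-registry-qa:8081"   # example URL; override per environment
SUBJECT = "orders-value"                       # example subject

candidate_schema = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "id", "type": "string"}],
}
headers = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

# Note: this endpoint returns 404 if the subject has no versions yet.
resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    json={"schema": json.dumps(candidate_schema)},
    headers=headers,
)
resp.raise_for_status()
if not resp.json().get("is_compatible", False):
    sys.exit("Schema is not compatible with the latest registered version")

requests.post(
    f"{REGISTRY}/subjects/{SUBJECT}/versions",
    json={"schema": json.dumps(candidate_schema)},
    headers=headers,
).raise_for_status()
```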

Spring Cloud Config Server concurrency control

I have multiple consuming app instances connecting to Spring Cloud Config Server. The config server gets its configuration from an SVN repo.
I just want to understand how a config server instance manages possibly concurrent requests.
That's a bug in the SVN support (the git version has a synchronized method). https://github.com/spring-cloud/spring-cloud-config/issues/128

Spring Cloud Configuration recommended architecture in data center

I have been playing with Spring Cloud Configuration. I like the simplicity of the solution and the fact that it uses Git as its default configuration store.
There are two aspects I need to figure out before pushing it as a solution for centralized configuration management.
The aspects are:
High availability
How to gradually roll out configuration changes (to support canary releases)
If you have already implemented this in your data center, or are just playing with it, please share your ideas!
I would also like to hear from the creators how they see the recommended deployment in single and cross data-center environments.
The Config Server itself is stateless, so you can spin up as many of these as you need and find them via Eureka. Underneath the server itself, the Git implementation you point to needs to be highly available as well. So if you point to GitHub (private or public), Git is as available as GitHub is. If the config server can't reach Git, it will continue to serve what it has already checked out, even if it is stale.
As far as gradual config changes go, you could use a different branch, configure the canary to use that branch via spring.cloud.config.label, and then merge the branch. You could also use profiles (e.g. application-<profilename>.properties) and configure the canary to use the specified profile.
I think the branch makes a little more sense, because you wouldn't have to reconfigure the non-canary nodes to use the new profile each time; you would just configure the canary to use the branch.
Either way, the only time apps see config changes (when using the Spring Cloud Config client) is on startup or when you POST to /refresh on each node. You can also POST to /bus/refresh?destination=<servicename> if you use Spring Cloud Bus to refresh all instances of a service at once (a small sketch follows below).
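A small sketch of scripting that refresh step after merging the canary branch, assuming the endpoints are exposed at the paths mentioned above (newer Spring Cloud versions place them under /actuator/); the instance URLs and service name are placeholders.

```python
# Trigger a config refresh: per-instance /refresh, or one /bus/refresh call
# when Spring Cloud Bus is in place.
import requests

INSTANCES = [
    "http://app-1.example.internal:8080",   # placeholder instance URLs
    "http://app-2.example.internal:8080",
]

def refresh_each_instance():
    for base in INSTANCES:
        resp = requests.post(f"{base}/refresh", timeout=10)
        resp.raise_for_status()
        print(base, "refreshed keys:", resp.json())   # /refresh returns the changed keys

def refresh_via_bus(any_instance: str, service_name: str):
    # One call fans out to every instance of the service over the bus.
    requests.post(f"{any_instance}/bus/refresh?destination={service_name}",
                  timeout=10).raise_for_status()

if __name__ == "__main__":
    refresh_each_instance()
```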