I am planning to run Kafka on GCP (Google Cloud Platform).
What I am wondering is: what happens to the data in a Kafka topic when a pod fails? By default a new pod will be created, but will the data in the Kafka topic be lost? How can I avoid data loss in this situation?
I appreciate any help. Thanks in advance :)
Best Regards,
Kafka itself needs persistent storage, so you will probably want a cloud-native storage solution. At a high level: create a StorageClass that defines your storage requirements (replication factor, snapshot policy, performance profile), then deploy Kafka as a StatefulSet on Kubernetes so each broker gets its own PersistentVolumeClaim from that StorageClass.
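A minimal sketch of the StatefulSet side, assuming a pre-existing StorageClass; the names, image tag and sizes here are illustrative, not a production-ready Kafka deployment:

# Sketch only: names, image tag, sizes and the StorageClass are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.4.0   # example image/tag
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data                             # one PVC (and disk) per broker
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: kafka-sc             # hypothetical StorageClass name
        resources:
          requests:
            storage: 50Gi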
I don't fully understand your purpose, but in this case you cannot guarantee Kafka's data durability when a pod fails or is evicted. Maybe you should try a native VM with Kafka installed and configure it to be fully backed up (restorable at any time when a disaster happens).
It depends on what exactly you need. It's quite a general question.
There are some ready-made Kafka deployments available if you use the GCP Marketplace.
As you are asking about pods, I guess you want to use Google Kubernetes Engine (GKE). You can find many guides on the internet about running Kafka on Kubernetes.
For example, you can refer to Kafka with ZooKeeper on Portworx. One of the steps there provides a StorageClass YAML. In GKE the default StorageClass uses reclaimPolicy: Delete, but you can create a new StorageClass with reclaimPolicy: Retain, which will keep the disk in GCP after the pod and its PersistentVolumeClaim are deleted.
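As a sketch, such a StorageClass on GKE could look like this (the name and disk type are just examples):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-retain-ssd          # example name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd                    # example disk type
reclaimPolicy: Retain             # keep the disk after the PVC/pod is deleted
volumeBindingMode: WaitForFirstConsumer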
In GCP you also have the option to create disk snapshots.
In addition, you can find some best practices for running Kafka on Kubernetes here.
I have a Kafka cluster hosted in GKE. Google updates GKE nodes on a weekly basis, and whenever this happens Kafka becomes temporarily unavailable, which causes massive errors and rebalances until it gets back to a healthy state. Currently we rely on Kubernetes retries to eventually succeed once the upgrade completes and the cluster becomes available again. Is there a way to gracefully handle this type of situation in Kafka, or to avoid it if possible?
To give you better advice, we would need a little more information about your setup: the versions of Kubernetes and Kafka, how many Kafka and ZooKeeper pods you run, how you deploy your Kafka cluster (a plain Helm chart or an operator?), the exact symptoms you see when you upgrade your Kubernetes cluster, which errors you get, what state the Kafka cluster ends up in, and how you monitor it.
But here are some points worth investigating.
Are you spreading the Kafka/ZK pods correctly across the nodes/zones?
Do you set PodDisruptionBudgets (PDBs) with a reasonable maxUnavailable setting (see the sketch after this list)?
What are your readiness/liveness probes for your Kafka/ZK pods?
Are your topics correctly replicated?
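For the PDB point, a minimal sketch, assuming your broker pods carry an app: kafka label (adjust names and labels to your deployment):

# Sketch only: name and label selector are illustrative.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
spec:
  maxUnavailable: 1          # allow only one broker to be drained at a time
  selector:
    matchLabels:
      app: kafka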
I would strongly encourage you to take a look at https://strimzi.io/, which can be very helpful if you want to operate Kafka on Kubernetes. It is an open-source operator and very well documented.
You have control over GKE node auto-upgrades through the "upgrade maintenance window", which lets you decide when upgrades should occur. Based on your business criticality you can configure this option along with the Kubernetes retry behaviour.
I am deploying Kafka Connect using the KafkaConnect custom resource. I would like to mount a PersistentVolumeClaim into the Kafka Connect cluster. The idea is that another application, responsible for file transfer, will place a file there that will then be picked up by a Kafka connector.
I checked the KafkaConnect resource configuration docs and it seems that I cannot simply add a volume to the pod.
My understanding is that if I patch the pod directly, the Strimzi operator will recognise the modification and overwrite it on the next reconciliation.
Would anyone have an idea how I can still use the KafkaConnect CR and mount the PVC volume?
This is currently not supported by Strimzi. There is an enhancement issue for it (https://github.com/strimzi/strimzi-kafka-operator/issues/2571), but nobody has implemented it yet.
I wonder whether, for the use case you describe, something like running Kafka Connect as a sidecar would make more sense. You could then share the storage directly, without any networking involved. (This is not something supported by Strimzi, but Kafka Connect itself can of course be used like this.)
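A rough sketch of that idea as plain Kubernetes (not managed by Strimzi); the image names, mount path and PVC name are hypothetical:

# Sketch only: images, paths and the PVC name are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: file-transfer-with-connect
spec:
  containers:
    - name: file-transfer
      image: example/file-transfer:latest          # hypothetical image
      volumeMounts:
        - name: shared-files
          mountPath: /data/outbox
    - name: kafka-connect                          # Kafka Connect running as a sidecar
      image: confluentinc/cp-kafka-connect:7.4.0   # example image
      volumeMounts:
        - name: shared-files
          mountPath: /data/outbox                  # connector reads files from here
  volumes:
    - name: shared-files
      persistentVolumeClaim:
        claimName: file-transfer-pvc               # hypothetical PVC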
I need to deploy Grafana in a Kubernetes cluster in such a way that multiple persistent volumes stay in sync - similar to what they did here.
Does anybody know how I can use a master/slave architecture so that only one pod writes while the others read? How would I keep them in sync? Do I need additional scripts to do that? Can I use Grafana's built-in SQLite3 database or do I have to set up a different one (MySQL, Postgres)?
There's really not a ton of documentation out there about how to deploy StatefulSet applications other than MySQL or MongoDB.
Any guidance, experience, or even so much as a simple suggestion would be a huge help. Thanks!
StatefulSets are not what you think and have nothing to do with replication. They just handle the very basics of provisioning storage for each replica.
The way you do this is, as you said, by pointing Grafana at a "real" database (MySQL or Postgres) rather than the local SQLite file.
Once you do that, you can use a Deployment, because Grafana itself is then stateless like any other web app.
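As a sketch of that setup, assuming an existing Postgres service and secret (the host, database name and secret below are hypothetical), Grafana's database settings can be supplied via GF_DATABASE_* environment variables:

# Sketch only: the Postgres host, database name and secret are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 2                                # multiple stateless replicas
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:10.2.0      # example image/tag
          env:
            - name: GF_DATABASE_TYPE
              value: postgres
            - name: GF_DATABASE_HOST
              value: postgres.default.svc:5432   # hypothetical service
            - name: GF_DATABASE_NAME
              value: grafana
            - name: GF_DATABASE_USER
              value: grafana
            - name: GF_DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-db           # hypothetical secret
                  key: password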
I have a question about Kubernetes StorageClasses...
I am planning to install a Kafka cluster on Kubernetes with the help of the Confluent Helm charts...
Confluent Kafka Helm Charts
What I am not so sure about is how StorageClasses in Kubernetes work. Let's say I have a cluster with 3 broker instances; if I read the Helm charts correctly, I can only configure one single StorageClass for their StatefulSet configuration in
values.yaml
persistence:
  enabled: true
  storageClass: "kafka-storageclass"
  size: 50Gi
  disksPerBroker: 1
Now I am not exactly sure how the StorageClass works (I will probably use Azure Disks), but my question is: if I am only able to configure one StorageClass, will all my 3 brokers use the same physical disk? That is a scenario I definitely want to avoid.
I don't want my broker instances fighting each other for disk IO...
If my assumption is correct, how can I configure my Helm charts to use a different StorageClass for every instance? (Please consider that today I have 3 instances, but if the need arises I could also have 20 broker instances.)
There is also the additional parameter 'disksPerBroker'; I can't imagine what advantage it would have in a Kubernetes environment.
Thx for answers...
A StorageClass defines how a unit of storage is dynamically provisioned. Usually you have some pre-created StorageClasses with specific guaranteed performance (e.g. standard, premium); you can find the details in your cloud provider's documentation.
That said, with Apache Kafka you need to use block storage, because low latency is required. Both Azure Managed Disks and AWS EBS are known to work well, and you can think of them as if they were dedicated local disks: each broker's PersistentVolumeClaim is provisioned as its own disk, so using a single StorageClass does not mean all brokers share one physical disk. To avoid imbalances, you should always use the same StorageClass across all your brokers.
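A minimal sketch of an Azure Disk StorageClass, assuming the Azure Disk CSI driver (the name and SKU are just examples):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-premium-disk        # example name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS            # example SKU for good IO performance
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer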
The disksPerBroker property determines the number of disks attached to every single broker (JBOD). This is more about the flexibility to increase or decrease the amount of storage available to each broker than about performance.
At the moment I have a Kubernetes cluster deployed on AWS via kops. I have a question: is it possible to take a sort of snapshot of the Kubernetes cluster and recreate the same environment (master and worker nodes), for example for resilience or to migrate the cluster easily? I know that Heptio Ark exists and it looks very nice, but I'm curious whether there is an easier way. For example, is it enough to back up etcd (or, in my case, to snapshot the EBS volumes)?
Thanks a lot. All suggestions are welcome
kops stores its state in an S3 bucket identified by KOPS_STATE_STORE. So yes, if your cluster has been removed you can recreate it by running kops create cluster against that state store.
Keep in mind that this doesn't restore your etcd state, so for that you will need to set up etcd backups. You could also make use of Heptio Ark (now Velero).
Similar answers to this topic:
Recover kops Kubernetes cluster
How to restore kubernetes cluster using kops?
As mentioned by Rico in the earlier post, you can use Velero to back up your etcd state using its CLI client. Another option to consider for the scenario you described is CAPE, which provides an easy-to-use control plane for Kubernetes multi-cluster app and data management via a friendly user interface.
See below for resources:
How to create an on-demand K8s Backup:
https://www.youtube.com/watch?v=MOPtRTeG8sw&list=PLByzHLEsOQEB01EIybmgfcrBMO6WNFYZL&index=7
How to Restore/Migrate K8s Backup to Another Cluster:
https://www.youtube.com/watch?v=dhBnUgfTsh4&list=PLByzHLEsOQEB01EIybmgfcrBMO6WNFYZL&index=10