I have a question about Kubernetes StorageClasses...
I am planning to install a Kafka cluster on K8s with the help of the Confluent Helm Charts...
Confluent Kafka Helm Charts
What I am not so sure about is how StorageClasses in K8s function. Let's say I have a cluster with 3 broker instances; if I read the Helm charts correctly, I can only configure one single StorageClass for their StatefulSet configuration in values.yaml:
values.yaml
persistence:
  enabled: true
  storageClass: "kafka-storageclass"
  size: 50Gi
  disksPerBroker: 1
Now I am not exactly sure how the StorageClass functions (I will probably use an Azure Disk), but my question is: if I am only able to configure one StorageClass, will all my 3 brokers use the same physical disk? That is one scenario I definitely want to avoid.
I don't want my broker instances fighting each other for disk I/O...
If my assumption is correct, how can I configure my Helm charts to use a different StorageClass for every instance? (Please consider that I have 3 instances today, but if the need arises I could also have 20 broker instances.)
And there is also the additional parameter 'disksPerBroker'; I can't imagine what advantage it could have in a K8s environment.
Thx for answers...
A storage class defines how a unit of storage is dynamically provisioned. Usually you have some pre-created storage classes with specific guaranteed performance (e.g. standard, premium). You need to find the details in your cloud provider's documentation.
That said, with Apache Kafka you need to use block storage, because good latency is required. Both Azure Managed Disks and AWS EBS are known to work well. You can think of them as if they were dedicated local disks. To avoid imbalances, you should always use the same storage class across all your brokers.
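As a rough example, assuming you are on AKS with the Azure Disk CSI driver, a StorageClass for premium managed disks could look like the sketch below (the name and SKU are placeholders to check against Azure's documentation). Note that each PersistentVolumeClaim created from this class provisions its own managed disk, so your 3 (or 20) brokers will never share one physical disk.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-storageclass          # name referenced from values.yaml
provisioner: disk.csi.azure.com     # Azure Disk CSI driver
parameters:
  skuName: Premium_LRS              # premium SSD managed disks
reclaimPolicy: Retain               # keep the disk if the claim is deleted
volumeBindingMode: WaitForFirstConsumer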
The disksPerBroker property is used to determine the number of disks attached to every single broker (JBOD). This is more about the flexibility to increase/decrease the amount of storage available to each broker than about performance.
Related
I am planning to run Kafka on GCP (Google Cloud Platform).
What I wonder is what happens to the data in a Kafka topic when a pod fails. By default a new pod will be created, but will the data in the Kafka topic be lost? How can I avoid data loss in this situation?
I appreciate any help. Thanks in advance :)
Best Regards,
Kafka itself needs a solution for persistence, so you will probably need a cloud-native storage solution. At a high level: create a storage class defining your storage requirements, such as replication factor, snapshot policy, and performance profile, and deploy Kafka as a StatefulSet on Kubernetes.
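As a minimal sketch of that idea (names, image reference, and sizes are placeholders, and the Helm charts render something equivalent for you), the important part is the volumeClaimTemplates section: every broker gets its own PVC and therefore its own disk, and when a failed pod is recreated it re-attaches the same PVC, so the data survives the pod failure.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka   # placeholder image reference
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:                  # one PVC (and one disk) per broker
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard       # placeholder StorageClass name
        resources:
          requests:
            storage: 50Gi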
I have not understood your exact purpose, but in this case you cannot guarantee Kafka's data durability when a pod fails or is evicted. Maybe you should try a native VM with Kafka installed and configure it to be fully backed up (restorable at any time when disaster happens).
It depends on what exactly you need. It's quite a general question.
There are some ready-made Kafka deployments if you use the Marketplace.
As you are asking about pods, I guess you want to use Google Kubernetes Engine. On the internet you can find many guides about using Kafka on Kubernetes.
For example, you can refer to Kafka with ZooKeeper on Portworx. In one of the steps you have a StorageClass YAML. In GKE the default StorageClass is set to Delete, but you can create a new StorageClass with reclaimPolicy: Retain, which will keep the disk in GCP after the PVC is deleted.
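A sketch of such a StorageClass on GKE might look like this (the class name and disk type are placeholders; the provisioner is the GKE persistent disk CSI driver):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-ssd
provisioner: pd.csi.storage.gke.io   # GKE persistent disk CSI driver
parameters:
  type: pd-ssd                       # SSD persistent disks
reclaimPolicy: Retain                # keep the disk after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer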
In GCP you also have the option to create disk snapshots.
In addition, you can find some best practices for using Kafka on Kubernetes here.
I'm running the following Helm chart (https://github.com/helm/charts/tree/master/stable/prometheus) with server.replicaCount = 2 and server.statefulSet.enabled = true.
For storage I use two Managed Disks (not Azure Files, which is not POSIX); 2 PVs and 2 PVCs are created during the deployment of the chart.
My question is:
Is this an HA solution? Are the metrics written to both Prometheus instances (a service with a public IP and a headless service are created) and replicated to both disks?
How do these replicas really work?
Thanks,
Sadly, as Piotr noted, this is not a true HA offering, and Thanos is generally the preferred way to go for this kind of setup, though not without its own gotchas. The number of clusters you have is a factor, and you might need some sort of tooling account to be able to follow changes all the way through.
What I can offer you is this excellent talk, which includes a live demo and shows how this works in practice.
No, this is not an HA solution. It only scales the deployment to have 2 replicas at all times, both of which run as StatefulSets.
In order to achieve HA monitoring on Kubernetes, there need to be dynamic failure detection and routing tools involved.
There are a couple of articles about getting Prometheus to work with HA:
Deploying an HA Prometheus in Kubernetes on AWS — Multiple Availability Zone Gotchas
HA Kubernetes Monitoring using Prometheus and Thanos
The number of replicas only instructs the deployment to always have at least 2 running instances of its pods. You can find more information about replicas in the Kubernetes documentation.
In the Helm chart documentation there seem to be other options, like server.service.statefulsetReplica.enabled and server.service.statefulsetReplica.replica, but I think those are just tools that can help to create an HA Prometheus, not a ready-from-the-get-go solution.
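For context, the setup described in the question corresponds roughly to these chart values (only the keys the asker already mentioned; double-check them against your chart version). Each replica scrapes and stores data independently, so nothing is replicated between the two disks:

server:
  replicaCount: 2       # two independent Prometheus pods, each scraping on its own
  statefulSet:
    enabled: true       # run the server as a StatefulSet, one PVC per replica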
Hope it helps.
I'm running Zookeeper on K8s with PVC based on gp2.
I want to recreate the ZK cluster, but with a different PVC that is based on a different StorageClass.
Solutions like zcopy can't really help here, since they require having two clusters running, but in my case only one should be running at any time.
The last option would be to have two clusters running for a while, but that is less preferred.
Look at Velero to back up and restore your Kubernetes cluster resources and persistent volumes.
Refer to the link below: https://github.com/vmware-tanzu/velero
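As a very rough sketch (the namespaces and names are placeholders, and you still need the Velero server plus a backup storage location configured), the backup and restore objects look roughly like this; Velero's docs also describe how to change the StorageClass of restored volumes, which is the piece you'd need for this migration:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: zk-backup
  namespace: velero
spec:
  includedNamespaces:
    - zookeeper          # placeholder namespace running ZooKeeper
  snapshotVolumes: true  # also snapshot the persistent volumes
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: zk-restore
  namespace: velero
spec:
  backupName: zk-backup  # restore from the backup created above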
I have deployed two K8s clusters and I want that, if someone creates a PV in the first cluster, it automatically gets created in the second cluster. How can I achieve this?
Simply speaking, you can't: these are separate clusters and each of them has a separate configuration. There is no built-in mechanism for triggering between separate clusters. You would need to build your own program that watches both API servers and applies the changes.
I'm guessing, however, that you probably want to share filesystem data between clusters: if so, then have a look at volume types backed by network/distributed file systems such as NFS or Ceph.
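If you go the NFS route, a sketch of a PersistentVolume you could apply to both clusters (manually or via your own tooling) might look like this; the server address and export path are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                    # NFS volumes can be mounted by many nodes
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10                  # placeholder NFS server address
    path: /exports/shared              # placeholder export path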
I have a MySQL database pod with 3 replicas. Now I'm making some changes in one pod (pod data, not pod configuration), say I'm adding a table. How will the change be reflected on the other replicas of the pod?
I'm using kubernetes v1.13 with 3 worker nodes.
Pods do not sync. Think of them as independent processes.
If you want a clustered MySQL installation, the Kubernetes docs describe how to do this by using a StatefulSet: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/#deploy-mysql
In essence you have to configure master/slave instances of MySQL yourself.
Pods are independent from each other; if you modify one pod, the others will not be affected.
As per your configuration, changes applied in one pod won't be reflected in the others. These are isolated resources.
It is good practice to deploy such things using PersistentVolumeClaims and StatefulSets.
You can always find an explanation with examples and best practices in the Run a Replicated Stateful Application documentation.
If you have three MySQL server pods, then you have 3 independent databases, even though you created them from the same Deployment. So, depending on what you do, you might end up with a bunch of databases in the cluster.
I would create 1 MySQL pod with persistence, so if the pod dies, the next one would take over from where the other one left off. You would not lose data.
If what you want is high availability, or failover replica, you would need to manage it on your own.
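A minimal sketch of the single-pod-with-persistence idea above (names, image, and password handling are simplified placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1                          # a single MySQL instance
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate                     # never run two pods against the same volume
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme          # use a Secret in a real setup
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mysql-data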
Generally speaking, K8s should not be used for storage purposes.
It is good to have common storage among those 3 pods (a PVC), and you should also consider a StatefulSet (STS) when running databases on K8s.