Migrate zookeeper data on Kubernetes to new storage - kubernetes

I'm running ZooKeeper on K8s with PVCs based on the gp2 StorageClass.
I want to recreate the ZK cluster, but with PVCs based on a different StorageClass.
Solutions like zcopy can't really help here, since they require two clusters running at the same time, whereas in my case only one should be running at any time.
The last resort would be to have two clusters running for a while, but that is less preferred.

Look at Velero to back up and restore your Kubernetes cluster resources and persistent volumes.
See https://github.com/vmware-tanzu/velero
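To make that concrete, here is a minimal sketch of what a Velero-based migration could look like, expressed as Velero's Backup and Restore custom resources. The names (zk-backup, zk-restore) and the zookeeper namespace are assumptions for illustration, and the exact spec fields can vary between Velero versions, so check the docs linked above.

    # Back up the namespace holding the ZooKeeper StatefulSet and its PVCs
    apiVersion: velero.io/v1
    kind: Backup
    metadata:
      name: zk-backup            # hypothetical name
      namespace: velero
    spec:
      includedNamespaces:
        - zookeeper              # hypothetical namespace of the ZK cluster
      snapshotVolumes: true      # capture volume data, not just object manifests
    ---
    # After removing the old cluster and its PVCs, restore from the backup
    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: zk-restore           # hypothetical name
      namespace: velero
    spec:
      backupName: zk-backup

Velero's restore reference also documents a way to change the StorageClass of restored PVs/PVCs during restore, which is worth checking for a gp2-to-new-class migration like this one.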

When should I use a StatefulSet? Can I deploy a database in a StatefulSet?

I heard that a StatefulSet is suitable for databases.
But a StatefulSet creates a different PVC for each pod.
If I set replicas=3, then I get 3 pods and 3 different PVCs with different data.
Database users only want one database, not 3 databases.
So it seems clear we should not use a StatefulSet in this situation.
But when should we use a StatefulSet?
A StatefulSet does three big things differently from a Deployment:
It creates a new PersistentVolumeClaim for each replica;
It gives the pods sequential names, starting with statefulsetname-0; and
It starts the pods in a specific order (ascending numerically).
This is useful when the database itself knows how to replicate data between different copies of itself. In Elasticsearch, for example, indexes are broken up into shards. There are by default two copies of each shard. If you have five Pods running Elasticsearch, each one will have a different fraction of the data, but internally the database system knows how to route a request to the specific server that has the datum in question.
I'd recommend using a StatefulSet in preference to manually creating a PersistentVolumeClaim. For database workloads that can't be replicated, you can't set replicas: greater than 1 in either case, but the PVC management is valuable. You usually can't have multiple databases pointing at the same physical storage, containers or otherwise, and most types of Volumes can't be shared across Pods.
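As a rough sketch of those three behaviours, a minimal StatefulSet could look like the following; the db name, the mysql:8.0 image, the standard StorageClass and the sizes are placeholders for illustration only.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
    spec:
      serviceName: db                    # headless Service governing the pods' network identity
      replicas: 3
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: db
              image: mysql:8.0           # placeholder image
              volumeMounts:
                - name: data
                  mountPath: /var/lib/mysql
      volumeClaimTemplates:              # one PVC per replica: data-db-0, data-db-1, data-db-2
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: standard   # placeholder StorageClass
            resources:
              requests:
                storage: 10Gi

This would create the pods db-0, db-1 and db-2 in order, each bound to its own PVC.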
We can deploy a database to Kubernetes as a stateful application. Usually, when we deploy pods they have their own storage, but that storage is ephemeral: if the container is killed, its storage is gone with it.
Kubernetes has an object to tackle that scenario: when we want our data to persist, we attach the pod to a corresponding persistent volume claim. By doing so, if our container is killed, the data stays in the cluster, and the new pod will access it accordingly.
Some limitations of using a StatefulSet are:
1. It requires a persistent volume provisioner to provision storage for the pods, based on the requested storage class.
2. Deleting or scaling down the replicas will not delete the volumes attached to the StatefulSet. This ensures the safety of the data.
3. StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods (see the sketch after this list).
4. A StatefulSet doesn't provide any guarantee about deleting all pods when the StatefulSet is deleted, unlike a Deployment, which deletes all pods associated with it when the Deployment is deleted. You have to scale the pod replicas down to 0 before deleting the StatefulSet.
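For point 3, a headless Service is just a normal Service with clusterIP set to None. A minimal sketch, using the db name and MySQL port from the earlier example as placeholders:

    apiVersion: v1
    kind: Service
    metadata:
      name: db                 # must match serviceName in the StatefulSet
    spec:
      clusterIP: None          # headless: no virtual IP, DNS resolves to the individual pod IPs
      selector:
        app: db
      ports:
        - name: mysql
          port: 3306

Each pod then gets a stable DNS name such as db-0.db.<namespace>.svc.cluster.local.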
A StatefulSet is useful for running applications that store state.
A StatefulSet-based database runs multiple replicas of the pod, each with its own PVC, and the database itself keeps the data in sync across the pods and PVCs.
So it's usually the best option to use a StatefulSet with multiple replicas to get an HA database.
Which database to use then depends on your use case and on whether it supports replication, clustering, etc.
Here is a MySQL example with replication details: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/

Kubernetes Persistent Volume for Pod in Production

I executed a scenario where I deployed a Microsoft SQL database on my K8s cluster using a PV and PVC. It works well, but I see some strange behaviour: I created a PV, but it is only visible on one node and not on the other worker nodes. What am I missing here? Any inputs, please?
Background:
Server 1 - Master
Server 2 - Worker
Server 3 - Worker
Server 4 - Worker
Pod: "MyDb" is running on Server (Node) 4 without any ReplicaSet. I am guessing that because my pod is running on server-4, the PV got created on server 4 when the pod (which refers to the PVC/claim) was created.
Please let me know your thoughts on this issue or share your inputs on mounting a shared disk in a production cluster.
Those who want to deploy a SQL DB on a K8s cluster can refer to the blog posts by Phillips linked below:
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-1/ (without PV)
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-2/ (with PV and Claim)
Regards,
Farooq
Please see my findings on my original problem statement below.
Problem: The pod for SQL Server was created. At runtime K8s scheduled this pod on server-4 and hence created the PV on server-4. However, the PV path (/tmp/sqldata) wasn't created on the other nodes.
I shut down the server-4 node and ran the command to delete the SQL pod (no replica was used initially).
The status of the pod changed to "Terminating".
Nothing happened for a while.
I restarted server-4 and noticed that the pod got deleted immediately.
Next steps:
- I stopped server-4 again and created the same pod.
- The pod was created on the server-3 node at runtime, and I see the PV (/tmp/sqldata) was created on server-3 as well. However, all my data (e.g. the sample tables) was lost. It is a fresh new PV on server-3 now.
I was assuming a PV would be a mounted volume of external storage, not storage/disk from a node in the cluster.
"I am guessing that because my pod is running on server-4, the PV got created on server 4 when the pod (which refers to the PVC/claim) was created."
This is more or less correct, and you should be able to verify it by simply deleting the Pod and recreating it (since you say you do not have a ReplicaSet doing that for you). The PersistentVolume will then be visible on the node where the Pod is scheduled.
Edit: The above assumes that you are using an external storage provider such as NFS or AWS EBS (see possible storage providers for Kubernetes). With HostPath the above does NOT apply and a PV will be created locally on a node (and will not be mounted to another node).
There is no reason to mount the PersistentVolume on the other nodes as well. Imagine having hundreds of nodes: would you want to mount your PersistentVolume on all of them while your Pod is just running on one?
You are also asking about "shared" disks. The PersistentVolume created in the blog post you linked is using ReadWriteMany, so you actually can start multiple Pods accessing the same volume (given that your storage supports that as well). But your software (a database in your case) needs to support having multiple processes accessing the same data.
Especially when considering databases, you should also look into StatefulSets, as this basically allows you to define Pods that always use the same storage, which can be very interesting for databases. Whether or not you should run databases on Kubernetes is a whole different topic...
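To connect this back to the HostPath point above: if the goal is data that survives the pod being rescheduled to another node, the PV should be backed by network or cloud storage rather than a node-local path. A rough sketch with NFS (the server address, export path, names and size are placeholders) could look like:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: sqldata-pv
    spec:
      capacity:
        storage: 20Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      nfs:                           # network storage, reachable from every node
        server: 10.0.0.10            # placeholder NFS server
        path: /exports/sqldata       # placeholder export path
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: sqldata-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: ""           # bind statically to the pre-created PV above
      resources:
        requests:
          storage: 20Gi

With such a claim, the data lives on the NFS export, so a pod rescheduled from server-4 to server-3 mounts the same data instead of an empty /tmp directory.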

How pod replicas sync with each other - Kubernetes?

I have a MySQL database pod with 3 replicas. Now I'm making some changes in one pod (pod data, not pod configuration), say I'm adding a table. How will the change be reflected on the other replicas of the pod?
I'm using kubernetes v1.13 with 3 worker nodes.
Pods do not sync. Think of them as independent processes.
If you want a clustered MySQL installation, the Kubernetes docs describe how to do this by using a StatefulSet: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/#deploy-mysql
In essence you have to configure master/slave instances of MySQL yourself.
Pods are independent of each other; if you modify one pod, the others will not be affected.
As per your configuration, changes applied in one pod won't be reflected in the others. They are isolated resources.
It is good practice to deploy such things using PersistentVolumeClaims and StatefulSets.
You can always find explanation with examples and best practices in Run a Replicated Stateful Application documentation.
If you have three MySQL server pods, then you have 3 independent databases, even though you created them from the same Deployment. So, depending on what you do, you might end up with a bunch of databases in the cluster.
I would create 1 MySQL pod with persistence, so that if the pod dies, the next one takes over from where the other one left off and no data is lost.
If what you want is high availability, or a failover replica, you need to manage it on your own.
Generally speaking, K8s should not be used for storage purposes.
You would do well to have common storage (a PVC) shared among those 3 pods, and also consider an STS (StatefulSet) when running databases on k8s.

Kubernetes StatefulSets: does a pod see the same persisted volume after it restarts on the same node?

I am reading the local storage design of Kubernetes. It has a section on distributed databases, where the db replicates data by itself.
My question is: if any process of the db goes down, will it be restarted on the same node/machine? I think the answer is yes.
If so, will it have access to the local storage it had before it crashed?
I read an older article about StatefulSets when they were in beta. The article didn't encourage the use of local storage at that time.
I am new to Kubernetes, so please answer this question with the extra context a newcomer needs for understanding.
In the local storage design, as you can read here, local storage is used with StatefulSets. So, for example, if you want three MongoDB instances from a StatefulSet named mongodb, then k8s will create three pods for you:
mongodb-0
mongodb-1
mongodb-2
If mongodb-2 fails, then k8s will restart it with the same local storage or persistent volume.
If you increase the number of replicas, then k8s will create new persistent volumes through your volumeClaimTemplates. If you shrink back down to two, the newly created volumes won't be deleted and will be reused if you go back to the previous number of replicas.
If your persistent volume is bound to a specific node, then k8s knows to schedule your pod on that node.
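That node binding is expressed on the PersistentVolume itself. As a rough sketch (the names, path, capacity and the node-1 hostname are placeholders), a local PV pinned to one node looks roughly like this:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: local-pv-node1
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage    # placeholder StorageClass
      local:
        path: /mnt/disks/ssd1            # placeholder path on the node's disk
      nodeAffinity:                      # tells the scheduler which node holds this volume
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                    - node-1             # placeholder node name

Once a claim from the StatefulSet is bound to this PV, the pod that uses it will always be scheduled back onto node-1.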
You can read about a MongoDB cluster StatefulSet example here: https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets
Or you can check out a great talk (with demos) here:
https://www.youtube.com/watch?v=m8anzXcP-J8&feature=youtu.be
The use of StatefulSets and local storage is really well explained there.

Kubernetes - Persistent storage for PostgreSQL

We currently have a 2-node Kubernetes environment running on bare-metal machines (no GCE) and now we wish to set up a PostgreSQL instance on top of it.
Our plan was to map a data volume for the PostgreSQL data directory to the node using the volumeMounts option in Kubernetes. However, this would be a problem, because if the Pod ever gets stopped, Kubernetes will re-launch it at random on one of the other nodes, so we have no guarantee that it will use the correct data directory on re-launch...
So what is the best approach for maintaining a consistent and persistent PostgreSQL Data Directory across a Kubernetes cluster?
One solution is to deploy HA PostgreSQL, for example https://github.com/sorintlab/stolon
Another is to have some network storage attached to all nodes (NFS, GlusterFS) and use volumeMounts in the pods, as in the sketch below.
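As a rough sketch of the second approach, assuming an NFS- or GlusterFS-backed StorageClass named nfs already exists in the cluster (all names, sizes and the postgres:15 image below are placeholders):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pgdata
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: nfs              # placeholder network-backed StorageClass
      resources:
        requests:
          storage: 20Gi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postgres
    spec:
      replicas: 1                        # a single instance; HA needs something like stolon
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
            - name: postgres
              image: postgres:15         # placeholder image/version
              env:
                - name: PGDATA           # keep the data directory inside the mounted volume
                  value: /var/lib/postgresql/data/pgdata
              volumeMounts:
                - name: data
                  mountPath: /var/lib/postgresql/data
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: pgdata

Because the claim is backed by network storage, whichever node the pod lands on after a restart mounts the same data directory.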