How do pod replicas sync with each other in Kubernetes?

I have a MySQL database pod with 3 replicas. Now I'm making some changes in one pod (pod data, not pod configuration); say I'm adding a table. How will the change be reflected on the other replicas of the pod?
I'm using Kubernetes v1.13 with 3 worker nodes.

Pods do not sync. Think of them as independent processes.
If you want a clustered MySQL installation, the Kubernetes docs describe how to do this by using a StatefulSet: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/#deploy-mysql
In essence, you have to configure the master/slave MySQL instances yourself.
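A heavily trimmed sketch of the StatefulSet from that tutorial, to give an idea of the shape (the real version adds a ConfigMap, init containers, and the actual replication setup; image tag and storage size here are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql          # headless Service giving each Pod a stable DNS name
  replicas: 3                 # mysql-0 acts as primary once replication is configured
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.7
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:       # each replica gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```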

Pods are independent of each other; if you modify one pod, the others will not be affected.

As per your configuration, changes applied in one pod won't be reflected in the others. These are isolated resources.
It is good practice to deploy such workloads using PersistentVolumeClaims and StatefulSets.
You can find an explanation with examples and best practices in the Run a Replicated Stateful Application documentation.

If you have three MySQL server pods, then you have 3 independent databases, even though you created them from the same Deployment. So, depending on what you do, you might end up with a bunch of databases in the cluster.
I would create 1 MySQL pod with persistence, so if that pod dies, the next one would take over from where the other one left off, and you would not lose data.
If what you want is high availability or a failover replica, you would need to manage it on your own.
Generally speaking, K8s should not be used for storage purposes.

It is good to have common storage among those 3 pods (a PersistentVolumeClaim), and also consider a StatefulSet when running databases on k8s.
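For reference, a minimal standalone PersistentVolumeClaim might look like this (size and storage class are placeholders). Note that a ReadWriteOnce claim can typically only be mounted on a single node, which is why StatefulSets usually give each replica its own claim via volumeClaimTemplates, as in the sketch above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce        # mountable read-write by a single node at a time
  resources:
    requests:
      storage: 10Gi        # placeholder size
  # storageClassName: standard   # uncomment to pin a specific StorageClass
```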

Related

Run different replica count for different containers within same pod

I have a pod with 2 closely related services running as containers. I am running as a StatefulSet and have set replicas as 5. So 5 pods are created with each pod having both the containers.
Now my requirement is to have the second container run in only 1 pod; I don't want it to run in all 5 pods. But my first service should still run in 5 pods.
Is there a way to define this in the deployment yaml file for Kubernetes? Please help.
A "pod" is the smallest entity that is managed by Kubernetes, and one pod can contain multiple containers, but you can only specify one pod template per Deployment/StatefulSet, so there is no way to accomplish what you are asking for with only one Deployment/StatefulSet.
However, if you want to be able to scale them independently of each other, you can create two Deployments/StatefulSets. This is, in my opinion, the only way to do so.
see https://kubernetes.io/docs/concepts/workloads/pods/ for more information.
Containers are like processes,
Pods are like VMs,
and StatefulSets/Deployments are like the supervisor program controlling the VMs' horizontal scaling.
The only way for your scenario is to define the second container in a new Deployment's pod template and set its replicas to 1, while keeping the old StatefulSet with 5 replicas.
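As an illustration, the split could look roughly like this (names and images are hypothetical placeholders):

```yaml
# StatefulSet: first service, scaled to 5
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: main-service
spec:
  serviceName: main-service
  replicas: 5
  selector:
    matchLabels:
      app: main-service
  template:
    metadata:
      labels:
        app: main-service
    spec:
      containers:
      - name: main
        image: example/main-service:1.0    # hypothetical image
---
# Deployment: second service, pinned to a single replica
apiVersion: apps/v1
kind: Deployment
metadata:
  name: support-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: support-service
  template:
    metadata:
      labels:
        app: support-service
    spec:
      containers:
      - name: support
        image: example/support-service:1.0 # hypothetical image
```

The two workloads then talk to each other through their Services rather than over localhost, and each replica count can be changed independently.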
Here are some definitions from documentations (links in the references):
Containers are technologies that allow you to package and isolate applications with their entire runtime environment—all of the files necessary to run. This makes it easy to move the contained application between environments (dev, test, production, etc.) while retaining full functionality. [1]
Pods are the smallest, most basic deployable objects in Kubernetes. A Pod represents a single instance of a running process in your cluster. Pods contain one or more containers. When a Pod runs multiple containers, the containers are managed as a single entity and share the Pod's resources. [2]
A deployment provides declarative updates for Pods and ReplicaSets. [3]
StatefulSet is the workload API object used to manage stateful applications. Manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods. [4]
Based on all that information, it is impossible to meet your requirements with one Deployment/StatefulSet.
I advise you to try the idea David Maze mentioned in a comment under your question:
If it's possible to have 4 of the main application container not having a matching same-pod support container, then they're not so "closely related" they need to run in the same pod. Run the second container in a separate Deployment/StatefulSet (also with a separate Service) and you can independently control the replica counts.
References:
[1] Documentation about Containers
[2] Documentation about Pods
[3] Documentation about Deployments
[4] Documentation about StatefulSet

Kubernetes Autoscaling using VPA - Off or Auto update mode?

For the needs of a project I have created 2 Kubernetes clusters on GKE.
Cluster 1: 10 containers in one Pod
Cluster 2: 10 containers in 10 different Pods
All containers are connected and constitute an application.
What I would like to do is generate some load and observe how the VPA autoscales the containers.
Until now, using the "Auto" mode, I have noticed that the VPA changes values only once, at the beginning, and not while I generate load,
and
that the Upper Bound is so high that it never needs any change!
Would you suggest that I:
1) use "Auto" mode or recommendation-only ("Off") mode?
and
2) create 1 or 2 replicas of my application?
Also, I would like to note that 2 of the 10 containers are MySQL and MongoDB. So if I have to create 2 replicas, I should use StatefulSets or operators, right?
Thank you very much!
I'm not sure what you mean when you say this:
Cluster 1: 10 containers in one Pod
Cluster 2: 10 containers in different Pods
First of all, you are not following best practice; ideally, you should keep a single container in a single Pod.
Running 10 containers in one Pod is too much; if the containers are interdependent, your code should use the Kubernetes Service names to connect them to each other.
to create 1 or 2 replicas of my application?
Yes, it is always better to run multiple replicas of the application, so that if anything happens, even a node going down, your Pod on another node keeps running.
Also, I would like to note that 2 of the 10 containers are MySQL and MongoDB. So if I have to create 2 replicas, I should use StatefulSets or operators, right?
You can use both operators and StatefulSets; there is no harm in that. In fact, an operator ideally creates the StatefulSets for you.
Implementing MySQL replication across the replicas manually would be hard unless you have good experience as a DBA and know what you are doing.
With an operator you get benefits such as automatic backups, automatic replication management, and other such things.
An operator indirectly creates the StatefulSet or Deployment, but you won't have to manage much or worry about replication, failover planning, and DB strategy.
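Regarding the "Auto" vs "Off" question above: the update mode is set per VPA object, so you can switch between them and compare. A minimal sketch (the target workload name is hypothetical): "Off" only publishes recommendations for you to inspect, while "Auto" also applies them by evicting and recreating Pods with the new requests.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical workload to autoscale
  updatePolicy:
    updateMode: "Off"         # "Off" = recommendations only; "Auto" = apply by evicting Pods
```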

MariaDB Server vs MariaDB Galera Cluster HA Replication

I am planning to deploy an HA database cluster on my Kubernetes cluster. I am new to databases and I am confused by the various database terms. I have decided on MariaDB and I have found two charts, MariaDB and MariaDB Galera Cluster.
I understand that both can achieve the same goal, but what are the main differences between the two? Under what scenario should I use one or the other?
Thanks in advance!
I'm not an expert, so take my explanation with precaution (and double-check it).
The main difference between the MariaDB's Chart and the MariaDB Galera Cluster's Chart is that the first one will deploy the standard master-slave (or primary-secondary) database, while the second one is a resilient master-master (or primary-primary) database cluster.
What this means in more detail is the following:
MariaDB Chart will deploy a Master StatefulSet and a Slave StatefulSet which will spawn (with default values) one master Pod and 2 slave Pods. Once your database is up and running, you can connect to the master and write or read data, which is then replicated on the slaves, so that you have safe copies of your data available.
The copies can be used to read data, but only the master Pod can write new data to the database. Should the Pod crash, or the Kubernetes cluster node where the Pod is running malfunction, you will not be able to write new data until the master Pod is up and running again (which may require manual intervention), or until you perform a failover, promoting one of the other Pods to be the new temporary master (which also requires manual intervention or some setup with proxies, virtual IPs, and so on).
The Galera Cluster Chart, instead, will deploy something more resilient. With default values, it will create a single StatefulSet with 3 Pods, and each one of these Pods will be able to both read and write data, acting virtually as a master.
This means that if one of the Pods stops working for whatever reason, the other 2 will continue serving the database as if nothing happened, making the whole thing far more resilient. When the Pod that stopped working comes back up, it will fetch the new data from the other Pods and get back in sync.
In exchange for the resilience of the whole infrastructure (it would be too easy if the Galera Cluster solution offered extreme resilience with no drawbacks), a multi-master setup has some cons, the most common being added latency in operations (needed to keep everything in sync and consistent) and added complexity, which often brings headaches.
There are several other limits with Galera Cluster, such as explicit table LOCKs not working and all tables having to declare a primary key. You can find the full list here: https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/
Deciding between the two solutions mostly depends on the following question:
Do you need the database to keep working (and remain usable by your apps) as if nothing happened should one of your Kubernetes cluster nodes fail, even if one of the database's Pods was running on that particular node?

Can a deployment resource have multiple containers?

I am trying to deploy multiple pods in k8s, say MySQL, MongoDB, Redis, etc.
Can I create a single Deployment resource for this and have multiple containers defined in the template section? Is this allowed? If so, how will replication behave in this case?
Thanks
Pavan
I am trying to deploy multiple pods in k8s, say MySQL, MongoDB, Redis, etc.
From a microservices architecture perspective, it is actually quite a bad idea to place all those containers in a single Pod. Keep in mind that a Pod is the smallest deployable unit that can be created and managed by Kubernetes. There are quite a few good reasons you don't want to have all the above-mentioned services in a single Pod; difficulty in scaling such a solution is just one of them.
Can I create a single Deployment resource for this and have multiple containers defined in the template section? Is this allowed? If so, how will replication behave in this case?
Strictly speaking, you can define multiple containers in a single Pod template, but it will not do what you want. Deployments and StatefulSets (the latter being what you need for stateful applications such as databases) each manage Pods based on a single, identical Pod spec, so every replica would contain all of the containers, and they could only be scaled together; it is not possible to have a Deployment or StatefulSet consisting of different types of Pods based on different specs.
To sum up:
Multiple Deployment and StatefulSet objects, each serving a different purpose, are the right solution.
A deployment can have multiple containers inside of it.
Generally it's used to have one main container for the app plus some sidecar containers that the app needs. A rough sketch is shown below.
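For illustration, a minimal sketch of a Pod template with a main container and a log-shipping sidecar sharing a volume (image names are hypothetical; note that both containers always scale together as one unit):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-with-sidecar
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web                       # main application container
        image: example/web:1.0          # hypothetical image
        volumeMounts:
        - name: logs
          mountPath: /var/log/app
      - name: log-shipper               # sidecar reading the same log volume
        image: example/log-shipper:1.0  # hypothetical image
        volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
      volumes:
      - name: logs
        emptyDir: {}                    # shared scratch volume, lives as long as the Pod
```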
Still, it's a best practice to split deployments for scaling purposes: your frontend may need to scale more than the backend depending on caching, and you may not want pods that are too big. For caching purposes, like Redis, it's better to have a cluster on the side, as each time a pod starts or stops you would lose data.
It's common to have multiple containers per Pod in order to share namespaces and volumes between them: take as an example the Ambassador pattern, which is used to present the application to the outside while adding a layer for authentication, making it totally transparent to the main app.
Other examples using the sidecar pattern are log parsers or configurators that hot-reload credentials without the main app having to worry about it.
That's the theory; according to your needs, you have to use one deployment per component, so a Deployment for your app, a StatefulSet for the DB, and so on. Keep in mind to use one container per process and one Kubernetes resource per backing service.

How can I maintain a set of unique number crunching containers in kubernetes?

I want to run a "set" of containers in Kubernetes, each of which differs only in its Docker environment variables (each one searches its own dataset, which is located on network storage and then cached into the container's RAM). For example:
container 1 -> Dataset 1
container 2 -> Dataset 2
Over time, I'll want to add (and sometimes remove) containers from this "set", but I don't want to restart ALL of the containers when doing so.
From my (naive) knowledge of Kubernetes, the only way I can see to do this is:
Each container could be its own Deployment. However, there are thousands of containers, so it would be a pain to modify and manage them all.
So my questions are:
Can I use a StatefulSet to manage this?
1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?
1.2 Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?
Is there any kubernetes concept to "group" deployments into some logical unit?
Any other thoughts about how to implement this in kubernetes?
Would docker swarm (or another container management platform) be better suited to my use case?
According to your description, a StatefulSet is what you need.
1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?
You can choose a suitable update strategy; I suggest RollingUpdate, but you can try whatever suits you.
Also check out this tutorial.
1.2 Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?
Yes, because their naming is consistent (name-0, name-1, name-2, etc.), each Pod can derive its own configuration from the ordinal index in its hostname (Pod name).
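For example, a sketch of how each Pod could derive its own dataset from the ordinal in its hostname (image, dataset paths, and entrypoint are hypothetical):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cruncher
spec:
  serviceName: cruncher
  replicas: 2
  selector:
    matchLabels:
      app: cruncher
  template:
    metadata:
      labels:
        app: cruncher
    spec:
      containers:
      - name: worker
        image: example/cruncher:1.0               # hypothetical image
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Extract the ordinal from the Pod hostname (cruncher-0 -> 0)
          ORDINAL=${HOSTNAME##*-}
          export DATASET=/data/dataset-${ORDINAL} # hypothetical per-Pod dataset path
          exec /app/crunch                        # hypothetical entrypoint
```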
Please let me know if that helped.
If you expect your containers to eventually be done with their workload and terminate (as opposed to processing a single item loaded in RAM forever), you should use a job queue such as Celery on top of Kubernetes to manage the execution. In this case Celery will do all the orchestration, including restarting jobs if they fail. This is much more manageable than using Kubernetes directly.
Kubernetes even provides an official example of such a setup.
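For reference, a minimal sketch of the Job side of such a work-queue setup (the worker image is hypothetical and is assumed to pull tasks from a queue until it is empty; the official example wires this up with Redis):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: dataset-workers
spec:
  parallelism: 4                      # run up to 4 worker Pods at once
  # 'completions' is deliberately unset: in the work-queue pattern each Pod
  # keeps pulling items and exits successfully once the queue is empty
  template:
    spec:
      containers:
      - name: worker
        image: example/queue-worker:1.0  # hypothetical image pulling tasks from a queue
      restartPolicy: OnFailure
```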