If a Deployment uses ReplicaSets to scale Pods up and down, and StatefulSets don't have ReplicaSets...
So, how does it manage to scale Pods up and down? I mean, what resource is responsible? What requests does a StatefulSet make in order to scale?
In short, the StatefulSet controller itself handles StatefulSet replicas: it watches .spec.replicas and creates or deletes individually named Pods directly, with no intermediate ReplicaSet.
A StatefulSet is a Kubernetes API object for managing stateful application workloads. StatefulSets handle the deployment and scaling of sets of Kubernetes pods, providing guarantees about their uniqueness and ordering.
Similar to deployments, StatefulSets manage pods with identical container specifications. They differ in terms of maintaining a persistent identity for each pod. While the pods are all created based on the same spec, they are not interchangeable, so each pod is given a persistent identifier that is maintained through rescheduling.
Benefits of a StatefulSet deployment include:
Unique identifiers—every pod in the StatefulSet is assigned a unique, stable network identity, consisting of a hostname based on the StatefulSet name and an ordinal index. For example, a StatefulSet named web with three instances has pods named web-0, web-1 and web-2.
Persistent storage—every pod has its own stable, persistent volume, either by default or as defined per storage class. When the pods in a StatefulSet are scaled down or deleted, their associated volumes are not lost, and the data persists. Unneeded resources can be purged by scaling the StatefulSet down to 0 and then deleting the leftover PersistentVolumeClaims.
Ordered deployment and scaling—the pods in a StatefulSet are created and deployed in order, according to their ordinal index. Pods are also shut down in reverse order, ensuring that deployment and runtime are reliable and repeatable. The StatefulSet won’t scale until every required pod is running and ready, so if a pod fails, the controller recreates it before attempting to add more instances as per the scaling requirements (see the scaling sketch after this list).
Automated, ordered updates—a StatefulSet can handle rolling updates, shutting down and recreating each pod in reverse ordinal order, until every pod has been replaced and the older versions cleaned up. The persistent volumes can be reused, so data is migrated to the new version automatically.
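To tie this back to the original question: scaling is just a change to the StatefulSet's .spec.replicas, which the StatefulSet controller itself reconciles, one pod at a time and in ordinal order. A minimal sketch, assuming a StatefulSet named web (the name is illustrative):

$ kubectl scale statefulset web --replicas=5
# the controller creates web-3, then web-4, waiting for each pod to be Running and Ready;
# scaling back down removes the highest ordinals first (web-4, then web-3)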
When deploying a Kubernetes DaemonSet, what will happen when a single node (out of a few nodes) is almost out of resources, a pod can't be created, and there are no pods that can be evicted? Though Kubernetes can be scaled horizontally, I believe it is meaningless to scale horizontally here, as a DaemonSet needs a pod on every node.
Though Kubernetes can be scaled horizontally, I believe it is meaningless to scale horizontally here, as a DaemonSet needs a pod on every node.
A DaemonSet is a workload type that is mostly used for operational workloads, e.g. shipping logs from each node or similar "system services". It is rarely a good fit for workloads that serve your users, but it can be.
What will happen when a single node (out of a few nodes) is almost out of resources, a pod can't be created, and there are no pods that can be evicted?
As I described above, workload deployed with a DaemonSet is typically operational workload that has, e.g., an infrastructure role in your cluster. Since these pods may be more critical (or less, depending on what you want), I would use a higher Quality of Service class for them, so that other pods are evicted first when resources on the node run low.
See Configure Quality of Service for Pods for how to configure your Pods to be in a Quality of Service class, one of:
Guaranteed
Burstable
Best Effort
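A minimal sketch of a pod that lands in the Guaranteed class (pod name and image are illustrative); requests and limits must be set and equal for every container:

apiVersion: v1
kind: Pod
metadata:
  name: log-shipper
spec:
  containers:
  - name: shipper
    image: fluent/fluentd:v1.16    # illustrative image
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:                      # limits equal to requests => Guaranteed QoS
        cpu: "200m"
        memory: "256Mi"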
You might also consider using Pod Priority and Preemption.
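A sketch of a PriorityClass and how a pod template would reference it (the name and value here are illustrative):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: node-ops-critical
value: 1000000              # higher value = scheduled (and preempting) ahead of lower-priority pods
globalDefault: false
description: "For node-level operations pods such as log shippers"

The DaemonSet's pod template would then set priorityClassName: node-ops-critical.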
The question was about DaemonSets, but as a final note: workloads that serve requests from your users are typically deployed as a Deployment, and for those it is very easy to scale horizontally using the Horizontal Pod Autoscaler.
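For instance, a one-line sketch, assuming a hypothetical Deployment named web and the metrics server installed:

$ kubectl autoscale deployment web --cpu-percent=80 --min=2 --max=10
# keeps between 2 and 10 replicas, targeting roughly 80% average CPU utilisation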
I want to apply VPA (Vertical Pod Autoscaler) to database pods. Can we use VPA for vertical database autoscaling, given that VPA requires at least 2 replicas (ref: https://github.com/kubernetes/autoscaler/issues/1665#issuecomment-464679271) and deletes pods when the set criteria are reached? If pods are deleted, their data is deleted too.
What is good practice for using VPA with database pods?
As I understand it, the real question is how to run a stateful workload with multiple replicas.
Use a StatefulSet to configure n replicas for a database. StatefulSet pods have stable names which are preserved across pod restarts (and reincarnations). Combined with PersistentVolumeClaim templates (accepted as part of the StatefulSet spec) and a headless Service, it is capable of retaining the same volumes and network FQDNs across reincarnations.
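A minimal sketch, assuming a hypothetical PostgreSQL StatefulSet named db and a default StorageClass; the headless Service gives each pod a stable DNS name (db-0.db, db-1.db, ...), and the volumeClaimTemplates give each pod its own PersistentVolumeClaim that is re-attached across reschedules:

apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None              # headless: per-pod DNS names instead of a load-balanced VIP
  selector:
    app: db
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: postgres
        image: postgres:15                  # illustrative; use your database image
        env:
        - name: POSTGRES_PASSWORD           # illustrative; use a Secret in practice
          value: "change-me"
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi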
Take a look at Helm charts for various databases, e.g. MySQL chart, for useful insights.
On a side note, it might be worthwhile to consider using an operator for the database application you're using. Operators for most applications can be found on https://operatorhub.io.
VPA (Vertical Pod Autoscaler) can work in 2 ways:
Recommendation mode - it will recommend requests and limits for pods based on the resources used (a minimal sketch is included below, after the reference).
Auto mode - it will automatically analyze usage and set requests and limits on pods. This results in pod termination so the pod can be recreated with the new specification, as stated here:
Due to Kubernetes limitations, the only way to modify the resource requests of a running Pod is to recreate the Pod. If you create a VerticalPodAutoscaler with an updateMode of "Auto", the VerticalPodAutoscaler evicts a Pod if it needs to change the Pod's resource requests.
Cloud.google.com: Kubernetes Engine: Docs: Concepts: Vertical pod autoscaler
Please refer to the above link for more information regarding the concepts of VPA.
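Recommendation mode corresponds to updateMode: "Off" on the VerticalPodAutoscaler object. A minimal sketch targeting a hypothetical Deployment named my-app; the resulting recommendations can be inspected with kubectl describe vpa my-app-vpa:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"    # only compute recommendations, never evict pods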
The requirement of at least 2 replicas is most probably connected with high availability. As pods are evicted to apply the new limits, they are unable to process requests. If there were only 1 replica at the time, that replica would be unable to respond to requests while in the terminating/recreating state.
There is an official guide to run VPA on GKE:
Cloud.google.com: Kubernetes Engine: How to: Vertical pod autoscaling
VPA supports Deployments as well as StatefulSets.
StatefulSet
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
If you want to use storage volumes to provide persistence for your workload, you can use a StatefulSet as part of the solution.
Kubernetes.io: StatefulSet
Configuring a StatefulSet with PersistentVolumes will ensure that the data stored on a PV is not deleted when a pod is terminated.
To be able to use your database with replicas > 1 you will need to have replication implemented within your database environment.
There are guides/resources/solutions on running databases within Kubernetes environment. Please choose the solution most appropriate to your use case. Some of them are:
Kubernetes.io: Run replicated stateful application
Github.com: Zalando: Postgres operator
Github.com: Oracle: Mysql operator
After deploying your database you will be able to run the command below to extract the name of the StatefulSet:
$ kubectl get sts
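A hypothetical output, assuming a StatefulSet named my-db deployed with 3 replicas:

NAME    READY   AGE
my-db   3/3     5d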
You can then reference the name of the StatefulSet in the VPA manifest, like below:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: db-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet
    name: <INSERT_DB_STS_HERE>
  updatePolicy:
    updateMode: "Auto"
I encourage you also to read this article:
Cloud.google.com: Blog: To run or not to run a database on Kubernetes, what to consider
3 pods were running under ReplicationController 'rc1', then I deleted only rc1 (not the pods) and created a new ReplicaSet 'rs1' with the same label selector as rc1. So, as expected, rs1 matched the existing pods created by rc1.
After some time, I created the ReplicationController rc2 with the same manifest file as that of rc1. Now, rc2 spun up new pods instead of adopting the existing pods with the same labels.
So I was wondering if it is possible that a pod can be scoped under two different ReplicaSets/ReplicationsControllers?
A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
So I was wondering if it is possible that a pod can be scoped under two different ReplicaSets/ReplicationsControllers?
The link a ReplicaSet has to its Pods is via the Pods’ metadata.ownerReferences field, which specifies what resource the current object is owned by. All Pods acquired by a ReplicaSet have their owning ReplicaSet’s identifying information within their ownerReferences field. It’s through this link that the ReplicaSet knows of the state of the Pods it is maintaining and plans accordingly.
A ReplicaSet identifies new Pods to acquire by using its selector. If there is a Pod that has no OwnerReference or the OwnerReference is not a Controller and it matches a ReplicaSet’s selector, it will be immediately acquired by said ReplicaSet. That is explained very well (with examples) in the official documentation.
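For illustration, an adopted Pod's metadata contains something like the following (the ReplicaSet name and uid here are placeholders); you can inspect it with kubectl get pod <pod-name> -o yaml:

metadata:
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: rs1                                    # the owning ReplicaSet
    uid: 00000000-0000-0000-0000-000000000000    # placeholder uid
    controller: true
    blockOwnerDeletion: true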
After some time, I created the ReplicationController rc2 with the same manifest file as that of rc1. Now, rc2 spun up new pods instead of adopting the existing pods with the same labels.
Please note that a Deployment that configures a ReplicaSet is now the recommended way to set up replication.
A ReplicationController ensures that a specified number of pod replicas are running at any one time. In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is always up and available.
If there are too many pods, the ReplicationController terminates the extra pods. If there are too few, the ReplicationController starts more pods. Unlike manually created pods, the pods maintained by a ReplicationController are automatically replaced if they fail, are deleted, or are terminated.
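As a side note, the adoption described in the question (rs1 taking over rc1's pods) only happens if rc1 is deleted without cascading to its pods. A sketch, assuming a recent kubectl (older versions use --cascade=false) and a hypothetical rs1.yaml:

$ kubectl delete rc rc1 --cascade=orphan   # pods keep running, their ownerReference is cleared
$ kubectl apply -f rs1.yaml                # rs1's selector matches the orphaned pods, so it adopts them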
Hope that helps.
As per the Kubernetes documentation there is a 1:1 correspondence between a Deployment and its ReplicaSet. Similarly, depending on the replicas attribute, a ReplicaSet can manage n pods of the same nature. Is this a correct understanding?
Logically (assuming Deployment is a wrapper/controller) I feel a Deployment can have multiple ReplicaSets and each ReplicaSet can have multiple Pods (of the same or different kind). If this statement is correct, can someone share an example K8S template?
1.) Yes, a Deployment is a ReplicaSet, managed at a higher level.
2.) No, a Deployment cannot run multiple ReplicaSets with different Pod templates at the same time; a Deployment pretty much IS a ReplicaSet (older ReplicaSets are only kept around, scaled to zero, for rollbacks during updates). Typically you never use a ReplicaSet directly; a Deployment is all you need. And no, you can't have different Pod templates in one Deployment or ReplicaSet. The point of replication is to create copies of the same thing.
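To answer the request for a template: a minimal Deployment sketch (name and image are illustrative). Note that there is exactly one pod template, which the underlying ReplicaSet copies replicas times:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # the ReplicaSet keeps 3 identical pods running
  selector:
    matchLabels:
      app: web
  template:                    # one and only one pod template
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25      # illustrative image
        ports:
        - containerPort: 80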
As to how many pods can be run per Deployment, the limits aren't really per Deployment unless specified. Typically you either set the desired number of replicas in the Deployment or you use the Horizontal Pod Autoscaler with a minimum and a maximum number of Pods. And unless Node limits are smaller, the following cluster-wide limits apply:
No more than 110 pods per node
No more than 150000 total pods
https://kubernetes.io/docs/setup/best-practices/cluster-large/
As per the Kubernetes documentation there is a 1:1 correspondence between a Deployment and its ReplicaSet. Similarly, depending on the replicas attribute, a ReplicaSet can manage n pods of the same nature. Is this a correct understanding?
Yes. It will create a number of pods equal to the value of the replicas field.
A Deployment manages a ReplicaSet; you don't/shouldn't interact with the ReplicaSet directly.
Logically (assuming Deployment is a wrapper/controller) I feel a Deployment can have multiple ReplicaSets and each ReplicaSet can have multiple Pods (of the same or different kind). If this statement is correct, can someone share an example K8S template?
When you do a rolling deployment, it creates a new ReplicaSet with the new pods (updated containers) and scales down the pods running in the older ReplicaSet.
I guess it does not support running two different ReplicaSets (outside of deployment updates) with different pods/containers.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
After the deployment has been updated:
Run:
kubectl describe deployments
Output:
.
.
.
OldReplicaSets: <none>
NewReplicaSet: nginx-deployment-1564180365 (3/3 replicas created)
I want to deploy a single Pod on a Node to host my service (GitLab, for example). The problem is: a Pod will not be re-created after a Node failure (like a reboot). The solution(s): use a StatefulSet, ReplicaSet or DaemonSet to ensure the Pod is re-created after a Node failure. But which is best for this case?
This Pod is stateful (I am using a hostPath volume to keep the data) and is deployed using nodeSelector to keep it always on the same Node.
Here is a simple YAML file for the example : https://pastebin.com/WNDYTqSG
It creates 3 Pods (one for each Set) with a volume to keep the data stateful. In practice, all of these solutions can fit my needs, but I don't know if there are best practices for this case.
Can you help me to choose between these solutions to deploy a single stateful Pod please ?
Deployment is the most common option to manage a Pod or set of Pods. Deployments are normally used instead of ReplicaSets as they are more flexible, and creating a Deployment results in a ReplicaSet - see https://www.mirantis.com/blog/kubernetes-replication-controller-replica-set-and-deployments-understanding-replication-options/
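A sketch for the single-pod case in the question, assuming a node labeled kubernetes.io/hostname=node-1 and a hostPath directory (the names, label and path are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab
spec:
  replicas: 1
  strategy:
    type: Recreate                        # avoid two pods writing to the same hostPath during updates
  selector:
    matchLabels:
      app: gitlab
  template:
    metadata:
      labels:
        app: gitlab
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1    # pin to the node that holds the data
      containers:
      - name: gitlab
        image: gitlab/gitlab-ce:latest    # illustrative image
        volumeMounts:
        - name: data
          mountPath: /var/opt/gitlab
      volumes:
      - name: data
        hostPath:
          path: /srv/gitlab               # illustrative host directory
          type: DirectoryOrCreate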
You would only need a StatefulSet if you had multiple Pods and needed dedicated persistence per Pod or you had multiple Pods and the Pods need individual names because they relate to each other (e.g. one is a leader) - https://stackoverflow.com/a/48006210/9705485
A DaemonSet would be used when you want one Pod/replica per Node.