I'm moving from a Kubernetes StatefulSet to a Deployment. The StatefulSet has an associated persistent volume, but the Deployment will not.
To do this smoothly, I'd like half the replicas to run under the StatefulSet and half under the Deployment. I've read through some documentation and it seems that I could use spec.selector.matchLabels to control which pods belong to which controller, but I don't understand exactly how to achieve that.
I'm also somewhat confused about whether I'd be selecting "nodes" or "pods" here, but it seems like pods are what I should care about.
How do I use a StatefulSet for half my pods and a Deployment for the other half?
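To make the question concrete, here is a rough sketch of the split I'm imagining, with each controller getting its own selector but the pods sharing a common app label (all names and labels below are placeholders, and I'm not sure the selectors are right, which is part of what I'm asking):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app-sts
spec:
  replicas: 2                    # half of the pods stay on the StatefulSet
  serviceName: my-app
  selector:
    matchLabels:
      app: my-app
      variant: statefulset       # keeps this selector from overlapping with the Deployment's
  template:
    metadata:
      labels:
        app: my-app
        variant: statefulset
    spec:
      containers:
      - name: my-app
        image: my-app:latest     # placeholder image
  # (volumeClaimTemplates omitted for brevity)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deploy
spec:
  replicas: 2                    # the other half, with no persistent volume
  selector:
    matchLabels:
      app: my-app
      variant: deployment
  template:
    metadata:
      labels:
        app: my-app
        variant: deployment
    spec:
      containers:
      - name: my-app
        image: my-app:latest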
I need to start by saying that I have no experience using Cassandra and I am not the one who created this deployment.
I have Cassandra running in a cluster in AKS. The PVC as configured in the statefulset is 1000Gi. Currently the pods are out of storage and are in a constant unhealthy state.
I am looking to expand the volumes attached to the pods. The problem I am facing is that I cannot scale down the statefulset because the statefulsets only scale down when all their pods are healthy.
I even tried deleting the StatefulSet and then recreating it with a larger PVC (as recommended here).
However, I can't seem to delete the StatefulSet. It looks to me like the CassandraDatacenter CRD keeps recreating the StatefulSet as soon as I delete it, giving me no time to change anything.
My questions are as follows:
Is there a standard way to expand the volume without losing data?
What would happen if I scale down the replicas in the CassandraDatacenter? Will it delete the PVC or keep it?
If there is no standard way, does anyone have any ideas on how to accomplish expanding the volume size without losing data?
Ordinarily in a Cassandra cluster, the best practice is to scale horizontally (not vertically). You want more Cassandra nodes to spread the load out to achieve maximum throughput.
The equivalent in Kubernetes is to scale up your deployment. As you increase the node count, the amount of data on each individual Cassandra node will decrease proportionally.
If you really want to resize the PVC, you will only be able to do it dynamically if you have enabled allowVolumeExpansion. You won't lose data as you do this.
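For illustration, the StorageClass backing the PVCs is what needs allowVolumeExpansion. A sketch of such a class on AKS might look like this (the class name is a placeholder, and disk.csi.azure.com is the usual Azure Disk CSI provisioner, but check what your cluster actually uses):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium-expandable   # placeholder name
provisioner: disk.csi.azure.com      # assumption: Azure Disk CSI driver
allowVolumeExpansion: true           # this is the flag that permits resizing PVCs in place
parameters:
  skuName: Premium_LRS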
Deleting the STS isn't going to work because, by design, it will be automatically replaced, as you already know. You also won't be able to scale down because there won't be enough capacity (disk space) in your cluster if you do. Cheers!
Answer for: How can I expand a PVC for StatefulSet on AKS without losing data?
While the answer from @Erick Raminez is very good, Cassandra-specific advice, I would like to answer the more general question "How can I expand a PVC for my StatefulSet on AKS without losing data?".
If downtime is allowed, you can follow these 4 steps:
# Delete StatefulSet
# This is required on AKS since the azure disk must have status "Unattached"
kubectl delete statefulsets.apps STATEFULSET_NAME
# Edit capacity in
# - your statefulset yaml file
# - each pvc
kubectl patch pvc PVC_NAME -p '{"spec": {"resources": {"requests": {"storage": "3Gi"}}}}'
# Deploy updated statefulset yaml (or helm chart)
kubectl apply -f statefulset.yaml
# Verify Capacity
kubectl get pvc
If you don't want downtime, check the first reference for some additional steps.
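In short, the no-downtime variant described there boils down to removing the StatefulSet object without deleting its pods, roughly like this (the flag name depends on your kubectl version; older releases use --cascade=false):

# Delete the StatefulSet object but leave its pods (and their PVCs) running
kubectl delete statefulsets.apps STATEFULSET_NAME --cascade=orphan
# ...then patch the PVCs and re-apply the statefulset yaml as above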
References:
https://adamrushuk.github.io/increasing-the-volumeclaimtemplates-disk-size-in-a-statefulset-on-aks/
https://serverfault.com/a/989665/561107
We are looking at a Kubernetes scenario that requires us to maintain N pods for a given Deployment (let's assume for simplicity that N is static and N = 3). Currently we are using a Deployment and a ReplicaSet for this.
Within each pod, is there any way (through environment variable injection or similar) for us to get a unique identifier that tells us which pod it is (i.e. "1", "2", "3" or similar... the exact format is unimportant)?
What is especially important (because of the system these pods connect to) is that if pod "2" dies, the replacement pod also reports its identifier as "2", not as something new, e.g. "4"... in other words, the set of identifiers does not change over time unless the size of the set is increased / decreased. Currently we are using the pod name, but that is not stable in this way; the pod name is new and unique every time.
Is this what a StatefulSet is for? The documentation seems to focus in particular on storage volumes, but this is not a priority for us. How would we actually obtain the unique and stable ID inside the container in code?
Yes, a StatefulSet is the way to go if the pods need a stable identity of some kind.
Here is a quote from the relevant section of the docs:
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
So, if you have a StatefulSet object named myapp with 3 replicas, then the pods will be named myapp-0, myapp-1 and myapp-2.
Further, if any of the pods dies, say myapp-1, then the new pod created as its replacement will again be named myapp-1.
You can expose the pod name to the containers as an environment variable through the Downward API and use it inside your scripts:
env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
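If only the numeric ordinal is needed rather than the full name, one option is to strip it off in the container's entrypoint, for example (a minimal sketch using the MY_POD_NAME variable defined above):

# MY_POD_NAME is e.g. "myapp-2"; the ordinal is everything after the last "-"
ORDINAL="${MY_POD_NAME##*-}"
echo "I am replica ${ORDINAL}"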
A relevant note I forgot to mention: with StatefulSets the pods are brought up one at a time, unlike Deployments. So, for the myapp example above, myapp-1 will only get started after myapp-0 is ready.
I can't seem to find an answer to this, but what is the relationship between an HPA and a ReplicaSet? From what I know, we define a Deployment object which specifies replicas, that creates the RS, and the RS is responsible for supervising our pods and scaling them up and down.
Where does the HPA fit into this picture? Does it wrap over the Deployment object? I'm a bit confused as you define the number of replicas in the manifest for the Deployment object.
Thank you!
When we create a Deployment, it creates a ReplicaSet and the number of pods we specified in replicas. The Deployment controls the RS, and the RS controls the pods. Now, the HPA is another abstraction which gives instructions to the Deployment and, through the RS, makes sure the pods fulfil the desired scaling.
As per the k8s doc: The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). Note that Horizontal Pod Autoscaling does not apply to objects that can't be scaled, for example, DaemonSets.
A brief high-level overview: basically, it's all about controllers. Every k8s object has a controller; when a Deployment object is created, the respective controller creates the RS and the associated pods, the RS controls the pods, and the Deployment controls the RS. On the other hand, when the HPA controller sees that the number of pods should be higher or lower than it currently is, it talks to the Deployment.
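For example, a minimal HPA manifest that wraps a Deployment might look like this (names and thresholds are placeholders; older clusters use the autoscaling/v2beta2 API instead of autoscaling/v2):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:              # the HPA points at the Deployment, not at the RS or the pods
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80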
Read more from k8s doc
I have a Kubernetes deployment which can have multiple replica pods. I wish to horizontally increase and decrease the pods based on some logic in my python application (not custom metrics in hpa).
I have two ways to do this:
Using the Horizontal Pod Autoscaler and changing minReplicas, maxReplicas through my application by using the Kubernetes APIs
Directly updating the "/spec/replicas" field in my deployment using the APIs
Both of the above approaches work for scaling up and down.
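(For reference, the second approach is roughly equivalent to what these kubectl commands do; "my-deployment" is a placeholder:)

kubectl patch deployment my-deployment -p '{"spec": {"replicas": 2}}'
# or
kubectl scale deployment my-deployment --replicas=2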
But, when I scale down, I want to remove a particular Pod, and not any other pod.
If I update the minReplicas maxReplicas in HPA, then it randomly deletes a pod.
Same when I update the /spec/replicas field in the deployment.
How can I delete a particular pod while scaling down?
I am not aware of any way to ensure that a particular pod in a ReplicaSet will be deleted during a scale down. You could achieve this behavior with a StatefulSet which will always delete the last pod on scale down.
For example, if we had a StatefulSet foo that was scaled to 3 we would have pods:
foo-0
foo-1
foo-2
And if we scaled the StatefulSet to 2, the controller would delete foo-2. But note that there are other limitations to be aware of with StatefulSet.
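For example, a quick sketch of that scale-down (assuming the StatefulSet is named foo):

# Scaling from 3 to 2 removes the pod with the highest ordinal, i.e. foo-2
kubectl scale statefulset foo --replicas=2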
I want to deploy a single Pod on a Node to host my service (like GitLab, for example). The problem is: a Pod will not be re-created after a Node failure (like a reboot). The solution(s): use a StatefulSet, ReplicaSet or DaemonSet to ensure the Pod is recreated after a Node failure. But which is best for this case?
This Pod is stateful (I am using a hostPath volume to keep the data) and is deployed using a nodeSelector to keep it always on the same Node.
Here is a simple YAML file for the example : https://pastebin.com/WNDYTqSG
It creates 3 Pods (one for each Set) with a volume to keep the data statefully. In practice, all of these solutions can fit my needs, but I don't know if there are best practices for this case.
Can you help me choose between these solutions to deploy a single stateful Pod, please?
Deployment is the most common option to manage a Pod or set of Pods. These are normally used instead of ReplicaSets as they are more flexible and creating a Deployment results in a ReplicaSet - see https://www.mirantis.com/blog/kubernetes-replication-controller-replica-set-and-deployments-understanding-replication-options/
You would only need a StatefulSet if you had multiple Pods and needed dedicated persistence per Pod, or you had multiple Pods and the Pods needed individual names because they relate to each other (e.g. one is a leader) - https://stackoverflow.com/a/48006210/9705485
A DaemonSet would be used when you want one Pod/replica per Node.
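To make that concrete, a single stateful Pod as described in the question could look roughly like this as a Deployment (a sketch only; the image, node label and paths are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab
spec:
  replicas: 1
  strategy:
    type: Recreate                      # avoid two pods using the hostPath at once during updates
  selector:
    matchLabels:
      app: gitlab
  template:
    metadata:
      labels:
        app: gitlab
    spec:
      nodeSelector:
        kubernetes.io/hostname: my-node # placeholder: pin to the node that holds the data
      containers:
      - name: gitlab
        image: gitlab/gitlab-ce:latest  # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/opt/gitlab
      volumes:
      - name: data
        hostPath:
          path: /srv/gitlab/data        # placeholder host directory
          type: DirectoryOrCreate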