I have an application that I want to scale in parallel but would love a way to have a brief pause between each new pod getting created.
I need this because during pod creation I update a file, and if the file isn't closed by the time the next pod tries to open it and update it, it will fail.
So I just need a 1-second buffer between pods opening and closing this file.
Right now if a scale happens and more than 1 new pod is added, they hit the file at the same time. So one pod will work, the other will fail, and I have to wait for the pod to time out for k8s to kill it and recreate it, at which point it will work, but this wastes a lot of valuable time during scaling events when I need new pods as quickly as possible to handle the load.
Not sure if I'm just not wording my searches correctly, but I'm not able to find this on the k8s website. Any ideas would be helpful.
(Additional note: I know a StatefulSet scales one pod at a time by default. However, that method requires all pods to be healthy before it adds the next one, and if any pod becomes unhealthy it stops scaling until all pods are healthy again, which wouldn't be ideal during high-load situations.)
You will need to do some work in order to achieve it.
Few "hacks/solutions"
1. Init Container
Add an init container to your deployment.
The init container will write/delete a destination file like lock.txt.
The init container will check whether this file exists; if so, it will wait until the file is removed.
Once the file is removed, the init container will exit and the pod will "start". (See the sketch below.)
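A minimal sketch of that idea, assuming the lock file lives on a shared volume (the PVC name shared-lock, the mount path /shared, and the busybox image are placeholders, not taken from the question):

spec:
  template:
    spec:
      initContainers:
        - name: wait-for-lock
          image: busybox
          command:
            - sh
            - -c
            - |
              # wait until no other pod holds the lock, then take it
              while [ -f /shared/lock.txt ]; do sleep 1; done
              touch /shared/lock.txt
          volumeMounts:
            - name: shared
              mountPath: /shared
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-lock

The main container (or a postStart hook) would be responsible for deleting /shared/lock.txt once it has finished updating the shared file. Note that the check-then-create above is not atomic, so it reduces rather than eliminates the chance of two pods grabbing the file at once.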
2. 'postStart' lifecycle hook
Add a postStart lifecycle hook to your pods which will compare the desired number of pods with the actual ones and, if required, execute a scale:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
    - name: ...
      image: ...
      lifecycle:
        postStart:
          exec:
            command: [<get the current number of pods from the replica>
                      <check to see if you need to restart a new pod>
                      <scale a new pod if required>]
3. CronJob
Scale your pods with a CronJob. Use a file or ConfigMap to set the desired number of pods.
Apply a CronJob that scales your pods, and once you reach the desired number of pods, remove the CronJob. (A sketch follows below.)
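A minimal sketch of that CronJob, assuming a ServiceAccount named scaler that has RBAC permission to get and scale the target Deployment; the names scale-up-slowly, my-app and the bitnami/kubectl image are placeholders, not taken from the question:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-slowly
spec:
  schedule: "* * * * *"   # run once per minute
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          restartPolicy: Never
          containers:
            - name: scale
              image: bitnami/kubectl
              command:
                - sh
                - -c
                # add one replica per run; delete this CronJob once the desired count is reached
                - kubectl scale deployment my-app --replicas=$(( $(kubectl get deployment my-app -o jsonpath='{.spec.replicas}') + 1 ))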
I have a testing scenario to check whether API requests are handled by another pod if one goes down. I know this is the default behaviour, but I want to simulate the following scenario.
Pod replicas - 2 (pod A and B)
During my API requests, I want to kill/stop only pod A.
During downtime of A, requests should be handled by B.
I am aware that we can restart the deployment and also scale replicas to 0 and again to 2, but this won't work for me.
Is there any way to kill/stop/crash only pod A?
Any help will be appreciated.
If you want to simulate what happens if one of the pods just gets lost, you can scale down the deployment
kubectl scale deployment the-deployment-name --replicas=1
and Kubernetes will terminate all but one of the pods; you should almost immediately see all of the traffic going to the surviving pod.
But if instead you want to simulate what happens if one of the pods crashes and restarts, you can delete the pod
# kubectl scale deployment the-deployment-name --replicas=2
kubectl get pods
kubectl delete pod the-deployment-name-12345-f7h9j
Once the pod starts getting deleted, the Kubernetes Service should route all of the traffic to the surviving pod(s) (those with Running status). However, the pod is managed by a ReplicaSet that wants there to be 2 replicas, so if one of the pods is deleted, the ReplicaSet will immediately create a new one. This is similar to what would happen if the pod crashed and restarted, except that a crashed container restarts in place (same pod, same node), whereas a deleted pod's replacement might come back somewhere else.
As you mentioned, manually killing or restarting the pod is the only way to test this case; you can also try crashing that single pod, but in the end it produces the same scenario: the pod will auto-restart.
Alternatively, you can increase the graceful-shutdown period for the deployment so the pod takes longer to terminate and stays in the Terminating state for a good amount of time, during which you can perform the test (see the sketch below).
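A minimal sketch of extending the grace period in a Deployment's pod template; the 120-second value and the container name/image are only illustrations:

spec:
  template:
    spec:
      # pod stays in Terminating for up to 2 minutes if the container ignores SIGTERM
      terminationGracePeriodSeconds: 120
      containers:
        - name: app
          image: my-app:latest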
In Kubernetes, where pods are controlled by a ReplicaSet, if you kill a pod it will be recreated again. So the only way to do this is to scale down the number of replicas.
Let's say your deployment has 4 replicas. You can scale down to 3 by running the command below:
kubectl scale deployment <deployment-name> --replicas=3
My example is shown below:
kubectl scale deployment hello-world --replicas=3
deployment.apps/hello-world scaled
Before I start all the services, I need a pod that does some initialization. But I don't want this pod to be running after the init is done, and all the other pods/services should start only after this init is done. I am aware of init containers in a pod, but I don't think that solves this issue, as I want the pod to exit after initialization.
When you can, it is recommended to let Kubernetes manage pods automatically rather than managing them manually yourself. Consider a Job for run-once tasks like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name:
          image:
$ kubectl apply -f job.yml
Kubernetes will create a pod to run that job. The pod will complete as soon as the job exits. Note that the completed job and its pod will be released from consuming resources but not removed completely, to give you a chance to examine its status and log. Deleting the job will remove the pod completely.
Jobs can do advanced things like restarting on failure with exponential backoff, running tasks in parallel, and limiting how long they run.
It depends on the init task, but an init container is the best option you have (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/).
Kubernetes will run the init container before the other containers, and when it has accomplished its task, it will exit.
Make the init container exit gracefully (with exit code 0) so that k8s understands that the container completed the task it was meant to do and does not try to restart it.
Your init task will be done, and the container will no longer exist. A minimal sketch follows.
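Something along these lines, assuming the initialization can be expressed as a command inside the init container; the names and images are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: init
      image: busybox
      # replace with your real initialization; exiting with code 0 signals success
      command: ["sh", "-c", "echo initializing && exit 0"]
  containers:
    - name: app
      image: my-app:latest   # placeholder for your application image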
You can try attaching handlers to Container lifecycle events. Kubernetes supports the postStart and preStop events. Kubernetes sends the postStart event immediately after a Container is started, and it sends the preStop event immediately before the Container is terminated.
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
I need to scale a set of pods that run queue-based workers. Jobs for workers can run for a long time (hours) and should not get interrupted. The number of pods is based on the length of the worker queue. Scaling would be either using the horizontal autoscaler using custom metrics, or a simple controller that changes the number of replicas.
The problem with either solution is that, when scaling down, there is no control over which pod(s) get terminated. At any given time, most workers are likely working on short-running jobs, idle, or (more rarely) processing a long-running job. I'd like to avoid killing the workers on long-running jobs; idle workers or workers on short-running jobs can be terminated without issue.
What would be a way to do this with low complexity? One thing I can think of is to do this based on CPU usage of the pods. Not ideal, but it could be good enough. Another method could be that workers somehow expose a priority indicating whether they are the preferred pod to be deleted. This priority could change every time a worker picks up a new job though.
Eventually all jobs will be short running and this problem will go away, but that is a longer term goal for now.
Since version 1.22 there is a beta feature that helps you do this. You can add the annotation controller.kubernetes.io/pod-deletion-cost with a value in the range [-2147483647, 2147483647]; pods with a lower value are killed first. The default is 0, so setting a negative value on one pod makes it the first to be killed during downscaling, e.g.
kubectl annotate pods my-pod-12345678-abcde controller.kubernetes.io/pod-deletion-cost=-1000
Link to discussion about the implementation of this feature: Scale down a deployment by removing specific pods (PodDeletionCost) #2255
Link to the documentation: ReplicaSet / Pod deletion cost
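If you prefer to set it declaratively, the same annotation can go into the pod metadata. A small sketch (the pod name and image are just illustrations; in practice the workers, or a small controller, could patch this annotation on their own Pod as they pick up or finish long-running jobs):

apiVersion: v1
kind: Pod
metadata:
  name: short-job-worker
  annotations:
    # lower cost = preferred victim when the ReplicaSet scales down
    controller.kubernetes.io/pod-deletion-cost: "-1000"
spec:
  containers:
    - name: worker
      image: my-worker:latest   # placeholder image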
During the process of termination of a pod, Kubernetes sends a SIGTERM signal to the container of your pod. You can use that signal to gracefully shutdown your app. The problem is that Kubernetes does not wait forever for your application to finish and in your case your app may take a long time to exit.
In this case I recommend you use a preStop hook, which runs (and must complete) before Kubernetes sends the TERM signal to the container. There is an example here on how to use handlers:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
    - name: lifecycle-demo-container
      image: nginx
      lifecycle:
        postStart:
          exec:
            command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
        preStop:
          exec:
            command: ["/bin/sh", "-c", "nginx -s quit; while killall -0 nginx; do sleep 1; done"]
There is a kind of workaround that can give some control over the pod termination.
Not quite sure if it is the best practice, but at least you can try it and test whether it suits your app.
Increase the Deployment grace period with terminationGracePeriodSeconds: 3600, where 3600 is the time in seconds of the longest possible task in the app. This makes sure that the pods will not be forcibly killed before the end of the grace period. Read the docs about the pod termination process in detail.
Define a preStop handler. More details about lifecycle hooks can be found in docs as well as in the example. In my case, I've used the script below to create the file which will later be used as a trigger to terminate the pod (probably there are more elegant solutions).
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "touch /home/node/app/preStop"]
Stop your app as soon as the condition is met. When the app exits, the pod terminates as well. It is not possible to kill the process with PID 1 from the preStop shell script, so you need to add some logic to the app to terminate itself. In my case it is a Node.js app: a scheduler runs every 30 seconds and checks whether two conditions are met. !isNodeBusy identifies whether the app is allowed to finish, and fs.existsSync('/home/node/app/preStop') whether the preStop hook was triggered. The logic might be different for your app, but you get the basic idea.
schedule.scheduleJob('*/30 * * * * *', () => {
  if (!isNodeBusy && fs.existsSync('/home/node/app/preStop')) {
    process.exit();
  }
});
Keep in mind that this workaround works only with voluntary disruptions and obviously not helpful with involuntary disruptions. More info in docs.
I think running this type of workload using a Deployment or similar, and using a HorizontalPodAutoscaler for scaling, is the wrong way to go. One way you could go about this is to:
Define a controller (this could perhaps be a Deployment) whose task is to periodically create a Kubernetes Job object.
The spec of the Job should contain a value for .spec.parallelism equal to the maximum number of concurrent executions you will accept.
The Pods spawned by the Job then run your processing logic. They should each pull a message from the queue, process it, and then delete it from the queue (in the case of success).
The Pods must exit with the correct status (success or failure). This is how the Job recognises when the processing has completed, and so it will not spin up additional Pods.
Using this method, .spec.parallelism controls the autoscaling based on how much work there is to be done, and scale-down is an automatic benefit of using a Job (see the sketch below).
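A minimal sketch of such a Job, assuming a worker image that pulls a message from the queue, processes it, deletes it, and exits; the name and image are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker
spec:
  parallelism: 4        # maximum number of concurrent worker pods
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: my-queue-worker:latest   # pulls, processes and deletes one message, then exits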
You are looking for Pod Priority and Preemption. By configuring a high priority PriorityClass for your pods you can ensure that they won't be removed to make space for other pods with a lower priority.
Create a new PriorityClass
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class will not cause other pods to be preempted."
Set your new PriorityClass in your pods
priorityClassName: high-priority
The value: 1000000 in the PriorityClass configures the scheduling priority of the pod. The higher the value, the more important the pod is.
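For context, this is roughly where the field goes in a Deployment's pod template (only the relevant part is shown; the container name and image are placeholders):

spec:
  template:
    spec:
      priorityClassName: high-priority
      containers:
        - name: worker
          image: my-worker:latest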
For those who land on this page facing the issue of Pods getting killed while a Node is scaling down:
This is expected behaviour of the Cluster Autoscaler, as CA tries to pack the pods so that the cluster can run at its minimum size.
However, you can protect your Job pods from eviction (getting killed) by creating a PodDisruptionBudget with maxUnavailable=0 for them:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: <your_app_name>
I have a container that runs some data fetching from a MySQL database and simply displays the result in console.log(), and want to run this as a cron job in GKE. So far I have the container working on my local machine, and have successfully deployed this to GKE (in terms of there being no errors thrown so far as I can see).
However, the pods that were created were just left as Running instead of stopping after completion of the task. Are the pods supposed to stop automatically after executing all the code, or do they require explicit instruction to stop and if so what is the command to terminate a pod after creation (by the Cron Job)?
I'm reading that there is supposedly some kind of termination grace period of ~30s by default, but after running a minutely-executed cronjob for ~20 minutes, all the pods were still running. I'm not sure if there's a way to terminate the pods from inside the code; otherwise it seems a little silly to have a cronjob generating lots of pods that are left running idle. My cronjob.yaml is below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test
spec:
  schedule: "5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: test
              image: gcr.io/project/test:v1
              # env:
              #   - name: "DELAY"
              #     value: 15
          restartPolicy: OnFailure
A CronJob is essentially a cookie cutter for jobs. That is, it knows how to create jobs and execute them at a certain time. Now, that being said, when looking at garbage collection and clean up behaviour of a CronJob, we can simply look at what the Kubernetes docs have to say about this topic in the context of jobs:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml).
Adding a process.kill() call in the code to explicitly end the process after it has finished executing allowed the pod to stop automatically after execution.
A job in Kubernetes is intended to run a single instance of a pod and ensure it runs to completion. As another answer specifies, a CronJob is a factory for Jobs which knows how and when to spawn a job according to the specified schedule.
Accordingly, and unlike a service which is intended to run forever, the container(s) in the pod created by the job must exit upon completion of the job. There is a notable problem with the sidecar pattern, which often requires manual lifecycle handling: if your main container requires additional sidecar containers to provide logging or database access, you must arrange for these to exit when the main container completes, otherwise they will remain running and k8s will not consider the job complete. In such circumstances, the pod associated with the job will never terminate.
The termination grace period is not applicable here: this timer applies after Kubernetes has requested that your pod terminate (e.g. if you delete it). It specifies the maximum time the pod is afforded to shutdown gracefully before the kubelet will summarily terminate it. If Kubernetes never considers your job to be complete, this phase of the pod lifecycle will not be entered.
Furthermore, old pods are kept around after completion for some time to allow perusal of logs and such. You may see pods listed which are not actively running and so not consuming compute resources on your worker nodes.
If your pods are not completing, please provide more information regarding the code they are running so we can assist in determining why the process never exits.
I have been creating pods with type:deployment but I see that some documentation uses type:pod, more specifically the documentation for multi-container pods:
apiVersion: v1
kind: Pod
metadata:
  name: ""
  labels:
    name: ""
  namespace: ""
  annotations: []
  generateName: ""
spec:
  ? "// See 'The spec schema' for details."
  : ~
But to create pods I can just use a deployment type:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ""
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: ""
    spec:
      containers:
        etc
I noticed the pod documentation says:
The create command can be used to create a pod directly, or it can
create a pod or pods through a Deployment. It is highly recommended
that you use a Deployment to create your pods. It watches for failed
pods and will start up new pods as required to maintain the specified
number. If you don’t want a Deployment to monitor your pod (e.g. your
pod is writing non-persistent data which won’t survive a restart, or
your pod is intended to be very short-lived), you can create a pod
directly with the create command.
Note: We recommend using a Deployment to create pods. You should use
the instructions below only if you don’t want to create a Deployment.
But this raises the question of what kind: Pod is good for. Can you somehow reference pods in a deployment? I didn't see a way. It looks like what you get with pods is some extra metadata but none of the deployment options such as replicas or a restart policy. What good is a pod that doesn't persist data or survive a restart? I think I'd be able to create a multi-container pod with a deployment as well.
Radek's answer is very good, but I would like to pitch in from my experience: you will almost never use an object of kind Pod on its own, because that doesn't make much sense in practice.
That's because you need a Deployment object (or another Kubernetes API object like a ReplicationController or ReplicaSet) to keep the replicas (pods) alive; that's kind of the point of using Kubernetes.
What you will use in practice for a typical application are:
Deployment object (where you specify your app's container/containers and some other specifications), which will host your app's containers.
Service object (a grouping object that provides a so-called virtual IP (cluster IP) for the pods that carry a certain label; those pods are basically the app containers that you deployed with the former Deployment object).
You need to have the service object because the pods from the deployment object can be killed, scaled up and down, and you can't rely on their IP addresses because they will not be persistent.
So you need an object like a Service that gives those pods a stable IP (a minimal sketch follows).
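Roughly what such a Service could look like, assuming the Deployment's pod template carries the label app: my-app and the containers listen on port 8080 (both are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app        # matches the labels on the Deployment's pods
  ports:
    - port: 80         # stable, cluster-internal port of the Service
      targetPort: 8080 # port the containers listen on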
Just wanted to give you some context around pods, so you know how things work together.
Hope that clears a few things for you, not long ago I was in your shoes :)
Both Pod and Deployment are full-fledged objects in the Kubernetes API. Deployment manages creating Pods by means of ReplicaSets. What it boils down to is that Deployment will create Pods with spec taken from the template. It is rather unlikely that you will ever need to create Pods directly for a production use-case.
Kubernetes has three Object Types you should know about:
Pods - runs one or more closely related containers
Services - sets up networking in a Kubernetes cluster
Deployment - Maintains a set of identical pods, ensuring that they have the correct config and that the right number of them exist.
Pods:
Runs a single set of containers
Good for one-off dev purposes
Rarely used directly in production
Deployment:
Runs a set of identical pods
Monitors the state of each pod, updating as necessary
Good for dev
Good for production
And I would agree with the other answers: forget about Pods and just use a Deployment. Why? Look at the second bullet point: it monitors the state of each pod, updating as necessary.
Instead of struggling with error messages such as this one:
Forbidden: pod updates may not change fields other than spec.containers[*].image
just refactor or completely recreate your Pod as a Deployment that creates a pod to do what you need done (see the sketch below). With a Deployment you can change any piece of configuration you want to, and you need not worry about seeing that error message.
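A minimal sketch of that refactor: your former pod spec moves under the Deployment's template (the name, label and image here are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:            # this is where your former Pod spec goes
      containers:
        - name: my-app
          image: my-app:latest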
A Pod is a container instance.
That is what replicas: 3 below produces.
Think of it this way: one Deployment can have many running instances (replicas).
# deployment.yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: tomcat-deployment222
spec:
  selector:
    matchLabels:
      app: tomcat
  replicas: 3
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      containers:
        - name: tomcat
          image: tomcat:9.0
          ports:
            - containerPort: 8080
I want to add some information from the Kubernetes in Action book, so you can see the whole picture and how resources like Pod, Deployment and ReplicationController (ReplicaSet) relate to each other.
Pods
are the basic deployable unit in Kubernetes. But in real-world use cases, you want your deployments to stay up and running automatically and remain healthy without any manual intervention. For this, the recommended approach is to use a Deployment, which under the hood creates a ReplicaSet.
A ReplicaSet, as the name implies, is a set of replicas (Pods) maintained with their Revision history.
(ReplicaSet extends an older object called ReplicationController -- which is exactly the same but without the Revision history.)
A ReplicaSet constantly monitors the list of running pods and makes sure the running number of pods matching a certain specification always matches the desired number.
Removing a pod from the scope of the ReplicationController comes in handy
when you want to perform actions on a specific pod. For example, you might
have a bug that causes your pod to start behaving badly after a specific amount
of time or a specific event.
A Deployment
is a higher-level resource meant for deploying applications and updating them declaratively.
When you create a Deployment, a ReplicaSet resource is created underneath (eventually more of them). ReplicaSets replicate and manage pods as well. When using a Deployment, the actual pods are created and managed by the Deployment's ReplicaSets, not by the Deployment directly.
Let’s think about what has happened. By changing the pod template in your Deployment resource, you’ve updated your app to a newer version—by changing a single field!
Finally, rolling back a Deployment, either to the previous revision or to any earlier revision, is easy with the Deployment resource.
A Pod is a collection of containers and a basic object of Kubernetes. All containers of a pod run on the same node.
Not suitable for production
No rolling updates
Deployment is a kind of controller in Kubernetes.
Controllers use a Pod Template that you provide to create the Pods for which it is responsible.
A Deployment creates a ReplicaSet, which in turn makes sure that currentReplicas always matches desiredReplicas.
Advantages:
You can roll out and roll back your changes using a Deployment
Monitors the state of each pod
Best suitable for production
Supports rolling updates
In Kubernetes we can deploy our workloads using different types of API objects like Pods, Deployments, ReplicaSets, ReplicationControllers and StatefulSets.
Of those, Pods are the smallest deployable unit in Kubernetes. Any workload/application that runs in Kubernetes has to run inside a container that is part of a Pod. A Pod can run multiple containers (meaning multiple applications) within it. A Pod is a wrapper on top of one or many running containers. Using a Pod, Kubernetes can control, monitor and operate the containers.
With standalone Pods we can't do a lot of things: we can't change configurations or volumes inside Pods, and we can't restart the Pod if it goes down.
So another API object called Deployment comes into the picture, which maintains the desired state (how many instances, how much compute resource the application uses) of the application. The Deployment maintains multiple instances of the same application by running multiple Pods. Deployments, unlike Pods, are mutable. Deployments use another API object called ReplicaSet to maintain the desired state. Through the ReplicaSet, a Deployment spawns another Pod if one goes down.
So a Pod runs applications in containers; Deployments run Pods and maintain the desired state of the application.
Try to avoid Pods and use Deployments instead for managing containers, as objects of kind Pod will not be rescheduled (or self-healed) in the event of a node failure or pod termination.
A Deployment is generally preferable because it defines a ReplicaSet to ensure that the desired number of Pods is always available and specifies a strategy to replace Pods, such as RollingUpdate.
Maybe this example will be helpful for beginners!
1) Listing PODs
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
mysql 1/1 Running 0 92s
webapp-mysql-75dfdf859f-9c54j 1/1 Running 0 92s
2) Deleting the web-app pod, which was created using a deployment
controlplane $ kubectl -n my-namespace delete pod webapp-mysql-75dfdf859f-9c54j
pod "webapp-mysql-75dfdf859f-9c54j" deleted
3) Listing PODs (you can see it is recreated automatically)
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
mysql 1/1 Running 0 2m42s
webapp-mysql-75dfdf859f-mqrcx 1/1 Running 0 45s
4) Deleting the mysql POD, which was created directly (without a deployment)
controlplane $ kubectl -n my-namespace delete pod mysql
pod "mysql" deleted
5) Listing PODs (you can see the mysql POD is lost forever)
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
webapp-mysql-75dfdf859f-mqrcx 1/1 Running 0 76s
In Kubernetes, Pods are the smallest deployable units. Every time we create a Kubernetes object like a Deployment, ReplicaSet, StatefulSet or DaemonSet, it creates Pods.
As mentioned above, Deployments create Pods based on the desired state described in your Deployment object. For example, if you want 5 replicas of an application, you set replicas: 5 in your Deployment manifest. The Deployment controller is then responsible for creating 5 identical replicas (no fewer, no more) of the given application, with all metadata like RBAC policies, network policies, labels, annotations, health checks, resource quotas and taints/tolerations associated with each Pod it creates.
There are some cases where you do want to create a bare Pod, for example if you are running a test sidecar where you don't need the application to run forever, don't need multiple replicas, and only run it when you want to execute it; in that case a Pod is suitable. For example, a helm test is a Pod definition that specifies a container with a given command to run.
I am also a beginner in k8s so correct me if I am wrong.
We know that a Pod is created when we create a Deployment. What I observed is that if you look at the YAML file of the Deployment, you can see it has kind: Deployment, but if you look at the YAML file of the Pod, you see kind: Pod.